# Project 1

# Are global warming and climate change supported by data? How did land temperatures change following major innovations in transportation? How is this rate different in different places around the world?

Faisal Alkhalili


1004723427


ECO225

## Introduction

This research project uses records of average land temperatures, categorized globally, by country, and by city, to investigate whether climate change and global warming are supported by data. It also focuses on the warming effect of different milestones in the development of transportation technology. Essentially, this research looks at how the rate at which land temperatures rise has changed since 1750 and how it changed immediately following the widespread adoption of new transportation technologies. The key is to compare this rate in the period preceding the adoption of a new mode of transportation with the rate immediately after it. The two primary events that will be isolated are the widespread use of the car (patented in 1886) and, subsequently, of the aircraft (first used commercially in 1914).
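
The before-and-after comparison described above can be sketched as a pair of least-squares slopes. This is a minimal illustration under stated assumptions, not the method this project settles on: the helper `warming_rate` and the synthetic yearly series are hypothetical, and the break year 1886 is taken from the text.

```python
import numpy as np
import pandas as pd


def warming_rate(temps: pd.Series, start_year: int, end_year: int) -> float:
    """Return the least-squares slope (degrees C per year) between two years, inclusive."""
    window = temps.loc[start_year:end_year].dropna()
    slope, _intercept = np.polyfit(
        window.index.to_numpy(dtype=float), window.to_numpy(dtype=float), deg=1
    )
    return float(slope)


# Synthetic yearly series (hypothetical values, not Berkeley Earth data):
# flat at 8.0 until 1886, then rising by 0.01 degrees C per year.
years = np.arange(1850, 1921)
values = np.where(years <= 1886, 8.0, 8.0 + 0.01 * (years - 1886))
example = pd.Series(values, index=years)

rate_before = warming_rate(example, 1850, 1886)
rate_after = warming_rate(example, 1886, 1920)
print(f"before: {rate_before:.4f} C/yr, after: {rate_after:.4f} C/yr")
```

A difference between the two slopes is only suggestive, since anything else that changed around the same year would affect it as well.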

The data is sourced from Berkeley Earth, an organization that has made several archives of environmental data available. The data contains monthly land temperature recordings that begin in 1750. From 1850 onward, the data also includes maximum and minimum values for each month. Some subsets of the data contain the monthly land temperature values categorized by country, city, major city, and US state. It is important to note that this data begins around the same time that the Industrial Revolution is thought to have started, which is generally considered the time that industrial pollution began to have seriously adverse effects on the climate and the temperature of the Earth. The outcome considered in this research is the change in the average land temperature, and the three main independent variables are pollution generally (which is essentially represented by time), the adoption of new transportation technologies, and the distance of each major city from the equator.

## Raw Data

In [None]:
data_url = "https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data"

In [None]:
import matplotlib.colors as mplc
import matplotlib.patches as patches
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm #for linear regression: sm.ols
import geopandas as gpd

from shapely.geometry import Point

from pandas_datareader import DataReader

%matplotlib inline
#install the required packages first so that the qeds import below succeeds
! pip install qeds fiona geopandas xgboost gensim folium pyLDAvis descartes
#activate plot theme
import qeds
qeds.themes.mpl_style();

In [None]:
#Reading the first dataset, Global average land temperatures.
glob_land_temp = pd.read_csv('~/Desktop/School/UofT/Third Year/ECO225/ECO225 Project 1/GlobalTemperatures.csv')

In [None]:
#Turning the first dataset into a dataframe. The data is cleaned by dropping irrelevant columns.
glob_df = pd.DataFrame(glob_land_temp)
glob_df = glob_df.drop(['LandAverageTemperatureUncertainty', 'LandMaxTemperature', 'LandMaxTemperatureUncertainty', 
                        'LandMinTemperature', 'LandAndOceanAverageTemperatureUncertainty', 
                        'LandMinTemperatureUncertainty','LandAndOceanAverageTemperature'], axis=1)
glob_df

In [None]:
#Creating a new column in the dataframe glob_df that finds the percent change 
#in the average global land temperature, month over month.
glob_df['percent change in temp'] = glob_df['LandAverageTemperature'].pct_change()
glob_df['percent change in temp'] = glob_df['percent change in temp'] * 100
glob_df['percent change in temp'] = glob_df['percent change in temp'].round(2)
glob_df

The table above represents the raw data for the average global land temperature, per month, since 1750. This is going to be this research's main source of information for the global average land temperature.
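
As a small illustration of what the percent-change column contains, the sketch below applies `pct_change` to a handful of hypothetical temperatures (not values from the dataset). Because the series is in degrees Celsius, a month whose preceding value is close to zero produces a very large percent change, which is worth keeping in mind when reading the later plots.

```python
import pandas as pd

# Hypothetical monthly temperatures in degrees C (not taken from the dataset).
temps = pd.Series([2.0, 4.0, 3.0, 0.5, 1.0])

# pct_change computes (current - previous) / previous for each row;
# the first row has no previous value and is therefore NaN.
pct = (temps.pct_change() * 100).round(2)
print(pct.tolist())
```

Note that the small absolute move from 0.5 to 1.0 registers as a 100 percent increase, the same as the move from 2.0 to 4.0.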

In [None]:
#Reading the second dataset, monthly average land temperatures by major city.
#The data is cleaned by dropping irrelevant columns.
city_land_temp = pd.read_csv('~/Desktop/School/UofT/Third Year/ECO225/ECO225 Project 1/GlobalLandTemperaturesByMajorCity.csv')
city_df = city_land_temp.drop(['AverageTemperatureUncertainty', 'Longitude'], axis=1)
city_df.set_index('City')

The above table includes the land temperature, per month, for each major city around the world. This data will help show differences in the rate of the change of land temperatures around the world. It will be used to identify any relationship between a city's distance from the equator and the change in its land temperature.

In [None]:
#Create a new column in city_df that measures the percent change in temperature month over month
city_df['percent change in temp'] = city_df['AverageTemperature'].pct_change(fill_method = 'ffill')
city_df['percent change in temp'] = city_df['percent change in temp'] * 100
city_df['percent change in temp'] = city_df['percent change in temp'].round(2)
#Find the distance of each city from the equator: drop the trailing hemisphere letter (N or S) from the latitude
#string and multiply the remaining degrees by 111.045 km.
city_df['dist from equator'] = pd.to_numeric(city_df['Latitude'].str[:-1]) * 111.045
city_df.set_index('City')

## Summary Statistics

In [None]:
glob_df.describe().round(2)

The table above shows summary statistics for the dataframe glob_df: the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column.

In [None]:
city_df.describe().round(2)

The table above shows summary statistics for the dataframe city_df: the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column.

## Visual Representations

In [None]:
# Create a plot, from the dataframe glob_df, that has y as the percent change of land temperature and x as time.
# Some early values are omitted due to the high uncertainty around them, so the plot starts 1500 months after
# January 1750, that is, in January 1875.
print('Percent Change of global land temperatures per month since 1750')
glob_df.plot.line(y= 'percent change in temp', ylim = [-200,200], xlim=[1500,3500], xlabel='months since 1750')

In this graph, although an upward trend in the average global land temperature is not clear, there is clearly much less variance in the later points. With additional formatting, this graph may reveal relationships that are relevant to the research.
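
One way to put a number on the impression that the later points vary less is a rolling standard deviation. The sketch below is only an illustration: the series is synthetic (generated with a fixed random seed), and the 120-month window is an arbitrary assumption rather than a choice made in this project.

```python
import numpy as np
import pandas as pd

# Synthetic percent-change series (hypothetical): noisy early half, calmer later half.
rng = np.random.default_rng(0)
early = rng.normal(loc=0.0, scale=50.0, size=600)
late = rng.normal(loc=0.0, scale=10.0, size=600)
pct_change = pd.Series(np.concatenate([early, late]))

# Standard deviation over a trailing 120-month (10-year) window.
# The first 119 rows are NaN because the window is not yet full.
rolling_std = pct_change.rolling(window=120).std()

std_end_of_early = float(rolling_std.iloc[599])
std_end_of_late = float(rolling_std.iloc[-1])
print(round(std_end_of_early, 1), round(std_end_of_late, 1))
```

Applied to the `percent change in temp` column, the same call would show whether the spread really shrinks over time or whether the plot's axis limits create that impression.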

In [None]:
# Create a plot, from the dataframe city_df, that has y as the average land temperature and x as a city's distance
#from the equator in kilometres.
city_df.plot(x='dist from equator', y='AverageTemperature')

Although this plot is a bit difficult to understand at first, it provides some useful information. The graph gives an indication that land temperatures vary more the farther a city is from the equator. Part of this spread likely reflects the stronger seasonal cycle at higher latitudes, since the plot includes every month of the year; it would also be consistent with locations closer to the Earth's poles being more affected by climate change. This could also be used to pursue the question of whether seasons are getting harsher due to global warming.

In [None]:
#Create histogram using dataframe glob_df that plots the percent change in the global average temperature.
#This histogram has a small number of bins so it is easier to read given the numerous outliers that make the plot look
#somewhat odd.
glob_df.hist(column='percent change in temp', bins=10)

This histogram shows that there are far more observations with a negative percent change in temperature (month over month) than with a positive one. Again, this visualization requires further investigation to draw out the relationships present in it.

Below is what is to come in the future

## The role that transportation technologies have had in global warming 

## Confounding variables

As is the case with any research, there is always the possibility that there are open backdoor paths or underlying variables that could lead to misleading interpretations of data. This research could face this issue and, as such, this section will identify two possible confounding variables and demonstrate their summary statistics.

1. Governmental Policy

...

# Project 2

# Are global warming and climate change supported by data? What is the relationship between CO2 emissions and the global land temperature? How is the rate of warming different in different places around the world?

Faisal Alkhalili

1004723427

ECO225

## Introduction

This research project uses records of average land temperatures, categorized globally, by country, and by city, to investigate whether climate change and global warming are supported by data. CO2 emissions data is used in addition. There is also a focus on the warming effect of CO2 emissions, which serve as a measure of pollution. Essentially, this research looks at how the rate at which land temperatures rise has changed since 1750 and whether these changes correspond to changes in CO2 emissions. This report also examines whether the rate of warming differs between cities and whether the oceans are warming faster than land.

The data is sourced from Berkeley Earth, an organization that has made several archives of environmental data available. The data contains monthly land temperature recordings that begin in 1750. From 1850 onward, the data also includes maximum and minimum values for each month. Some subsets of the data contain the monthly land temperature values categorized by country, city, major city, and US state. It is important to note that this data begins around the same time that the Industrial Revolution is thought to have started, which is generally considered the time that industrial pollution began to have seriously adverse effects on the climate and the temperature of the Earth. The additional data, CO2 emissions, is sourced from the World Bank. This data corresponds to the level of pollution around the world. The dataset includes CO2 emissions categorized by country, including a subset that contains cumulative global CO2 emissions. The outcome considered in this research is the change in the average land temperature, and the two main independent variables are pollution generally (which is essentially represented by CO2 emissions) and location.

## Raw Data

In [None]:
data_url = "https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data"

In [None]:
import matplotlib.colors as mplc
import matplotlib.patches as patches
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm #for linear regression: sm.ols
import geopandas as gpd

from shapely.geometry import Point

from pandas_datareader import DataReader

%matplotlib inline
#install the required packages first so that the imports below succeed
! pip install qeds fiona geopandas xgboost gensim folium pyLDAvis descartes
! pip install bokeh
#activate plot theme
import qeds
qeds.themes.mpl_style();
import json
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure, ColumnDataSource
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar, HoverTool
from bokeh.palettes import brewer, OrRd
output_notebook()


In [None]:
#Reading the first dataset, Global average land temperatures.
glob_land_temp = pd.read_csv('~/Desktop/School/UofT/Third Year/ECO225/ECO225 Project 1/GlobalTemperatures.csv')

In [None]:
#Turning the first dataset into a dataframe. The data is cleaned by dropping irrelevant columns.
glob_df = pd.DataFrame(glob_land_temp)
glob_df = glob_df.drop(['LandAverageTemperatureUncertainty', 'LandMaxTemperature', 'LandMaxTemperatureUncertainty', 
                        'LandMinTemperature', 'LandAndOceanAverageTemperatureUncertainty', 
                        'LandMinTemperatureUncertainty','LandAndOceanAverageTemperature'], axis=1)
glob_df

The table above represents the raw data for the average global land temperature, per month, since 1750. This is going to be this research's main source of information for the global average land temperature.

In [None]:
#Creating a new column in the dataframe glob_df that finds the percent change 
#in the average global land temperature, month over month.
glob_df['percent change in temp'] = glob_df['LandAverageTemperature'].pct_change()
glob_df['percent change in temp'] = glob_df['percent change in temp'] * 100
glob_df['percent change in temp'] = glob_df['percent change in temp'].round(2)
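#Converting the dates to datetime and using them as the index so that the temperatures can be matched by date
#with the emissions dataframe.
glob_df['dt'] = pd.to_datetime(glob_df['dt'])
glob_df = glob_df.set_index('dt')
#The yearly means below are computed but not assigned, so glob_df itself stays at monthly intervals.
glob_df.resample('YS').mean()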
glob_df.rename(columns={'LandAverageTemperature': 'Land Average Temperature (˚C)'}, inplace=True)
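
If yearly averages are wanted for the comparison with annual emissions, the result of `resample` has to be assigned, because the call returns a new object rather than changing the dataframe it is called on. The following is a minimal sketch with hypothetical monthly values; the names `monthly` and `yearly` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

# Two years of hypothetical monthly temperatures (not Berkeley Earth values).
monthly = pd.DataFrame(
    {"Land Average Temperature (˚C)": np.r_[np.full(12, 8.0), np.full(12, 9.0)]},
    index=pd.date_range("2000-01-01", periods=24, freq="MS"),
)
monthly.index.name = "dt"

# 'YS' groups the rows by calendar year and labels each group with January 1.
yearly = monthly.resample("YS").mean()
print(yearly)
```

Since each yearly row is labelled with January 1, it lines up with a `Year` column that has been converted to datetime with `format='%Y'`.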





In [None]:
#Creating a new dataframe that contains the annual CO2 emissions for each country. This dataframe also has rows
#labeled "World" with the global emissions data
co2 = pd.read_csv('~/Desktop/School/UofT/Third Year/ECO225/ECO225 Project 1/co2_emission.csv')
co2_df = pd.DataFrame(co2)
co2_df['Year'] = pd.to_datetime(co2_df['Year'], format='%Y')

#Creating another dataframe that merges the global CO2 emissions with the global average land temperatures.
#Each year is stored as January 1 of that year, so it is matched with the January row of glob_df.
world_co2_df = co2_df[co2_df["Entity"] == "World"]
temp_co2 = glob_df.merge(world_co2_df,right_on='Year', left_on='dt', how='right')
temp_co2


In [None]:
#Reading the second dataset, monthly average land temperatures by major city.
#The data is cleaned by dropping irrelevant columns.
city_land_temp = pd.read_csv('~/Desktop/School/UofT/Third Year/ECO225/ECO225 Project 1/GlobalLandTemperaturesByMajorCity.csv')
city_df = city_land_temp.drop(['AverageTemperatureUncertainty', 'Longitude'], axis=1)
city_df.rename(columns={'AverageTemperature': 'City Average Temperature (˚C)'}, inplace=True)
city_df.set_index('City')

The above table includes the land temperature, per month, for each major city around the world. This data will help show differences in the rate of the change of land temperatures around the world. It will be used to identify any relationship between a city's distance from the equator and the change in its land temperature.

In [None]:
#Create a new column in city_df that measures the percent change in temperature month over month
city_df['percent change in temp'] = city_df['City Average Temperature (˚C)'].pct_change(fill_method = 'ffill')
city_df['percent change in temp'] = city_df['percent change in temp'] * 100
city_df['percent change in temp'] = city_df['percent change in temp'].round(2)
#Find the distance of each city from the equator: drop the trailing hemisphere letter (N or S) from the latitude
#string and multiply the remaining degrees by 111.045 km.
city_df['dist from equator (km)'] = pd.to_numeric(city_df['Latitude'].str[:-1]) * 111.045
city_df.set_index('City')

The above table includes additional information about the average land temperature for each major city.
It adds the percent change in the temperature month over month as well as each city's distance from the equator in km.
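
The conversion from a latitude string to a distance can be checked on a few values. This is a small worked example, assuming latitude strings of the form degrees followed by N or S; the three sample strings are illustrative. The point to note is that `.str[:-1]` removes the last character of every string, whereas slicing the Series itself with `[:-1]` would drop the last row instead.

```python
import pandas as pd

KM_PER_DEGREE = 111.045  # conversion factor used in this project

# Illustrative latitude strings in the degrees-plus-hemisphere format.
latitude = pd.Series(["5.63N", "23.31S", "55.45N"])

# Remove the trailing hemisphere letter from each string, then convert to numbers.
degrees = pd.to_numeric(latitude.str[:-1])

# Distance from the equator in km, regardless of hemisphere.
dist_km = (degrees * KM_PER_DEGREE).round(1)
print(dist_km.tolist())
```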

## Summary Statistics

In [None]:
glob_df.describe().round(2)

The table above shows summary statistics for the dataframe glob_df: the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column.

In [None]:
city_df.describe().round(2)

The table above shows summary statistics for the dataframe city_df: the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column.

## Visual Representations

In [None]:
# Create a scatter plot, from the dataframe temp_co2, that has y as the global average land temperature and x as
# the annual global CO2 emissions.

fig, ax = plt.subplots()
temp_co2.plot(
    kind = 'scatter', x='Annual CO₂ emissions (tonnes )', y= 'Land Average Temperature (˚C)', color='b',
    legend = False, ax=ax, ylim=[-1, 5]
)

ax.set_facecolor((0.96, 0.96, 0.96))
fig.set_facecolor((0.9, 0.9, 0.9))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_title("Avg land temp vs. annual CO2 emissions")


This scatter plot shows the relationship between the global average land temperature and global CO2 emissions. Because each year in the emissions data is matched on January 1, each point pairs a year's emissions with the land temperature recorded for January of that year. Although many points lie close to 0 on the emissions axis (which is likely due to some countries not reporting their emissions until recently), the positive relationship is clear: when global CO2 emissions are higher, a higher global land temperature is observed.
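
The strength of a relationship like this one can be summarized with a correlation coefficient and a fitted slope. The sketch below uses hypothetical numbers, not the project data, and the column names `emissions` and `temperature` are placeholders. A high correlation here would not by itself establish causation, since both series also trend upward over time.

```python
import numpy as np
import pandas as pd

# Hypothetical yearly values (not the project data):
# emissions in tonnes, temperature in degrees C.
sample = pd.DataFrame(
    {
        "emissions": [1.0e9, 2.0e9, 4.0e9, 8.0e9, 1.6e10],
        "temperature": [8.1, 8.2, 8.4, 8.7, 9.3],
    }
)

# Pearson correlation coefficient between the two columns.
r = sample["emissions"].corr(sample["temperature"])

# Slope of a least-squares line, in degrees C per billion tonnes.
slope, intercept = np.polyfit(
    (sample["emissions"] / 1e9).to_numpy(), sample["temperature"].to_numpy(), deg=1
)
print(f"r = {r:.3f}, slope = {slope:.3f} C per billion tonnes")
```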

In [None]:
# Create a boxplot, from the dataframe city_df, that shows the average land temperature by a city's distance
#from the equator in kilometres.
city_df.boxplot('City Average Temperature (˚C)', by='dist from equator (km)', figsize=(10,10))

This boxplot demonstrates that the interquartile range lies closer to the maximum value than the minimum. Coupled with the graph titled "Diff in city and global temp vs dist from equator", this shows that there is more variance in the temperatures of cities farther away from the equator, as seen in the difference between the maximum and minimum values and in the interquartile ranges.

Although it is not easy to read the ticks on the x-axis, the message of the graph is still clear, and this does not take away from its value.
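
One possible way to make the x-axis readable is to group the distances into a few bands before plotting, so that each band gets a single box and a single tick label. The sketch below is an illustration with hypothetical observations; the 2000 km band width and the column name `distance band (km)` are assumptions, not choices made in this project.

```python
import pandas as pd

# Hypothetical city observations (not the project data).
cities = pd.DataFrame(
    {
        "dist from equator (km)": [300.0, 900.0, 2500.0, 2600.0, 5200.0, 6100.0],
        "City Average Temperature (˚C)": [27.0, 26.0, 21.0, 18.0, 9.0, 4.0],
    }
)

# Assign each distance to a 2000 km band; each interval includes its upper edge.
cities["distance band (km)"] = pd.cut(
    cities["dist from equator (km)"],
    bins=[0, 2000, 4000, 6000, 8000],
    labels=["0-2000", "2000-4000", "4000-6000", "6000-8000"],
)

# Mean temperature per band as a quick numerical check.
band_means = cities.groupby("distance band (km)", observed=True)[
    "City Average Temperature (˚C)"
].mean()
print(band_means)
```

Passing `by='distance band (km)'` to `boxplot` would then draw one box per band instead of one per distinct distance.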

In [None]:
#Create histogram using dataframe glob_df that plots the percent change in the global average temperature.

fig, ax = plt.subplots()
glob_df.plot(
    kind = 'hist', y='percent change in temp', color='b',
    bins = 400, legend = False, density = False, ax=ax, xlim=(-200,200)
)

ax.set_facecolor((0.96, 0.96, 0.96))
fig.set_facecolor((0.9, 0.9, 0.9))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_title("Pct Change in Global Land Temp since 1750")

This histogram shows that there are more observations with a negative percent change in temperature (month over month) than with a positive one. This means that there are more months where the average global land temperature went down than months where it went up. However, the months with a negative percent change seem to have values closer to zero than the months with a positive percent change.
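
Both observations, the number of months in each direction and the typical size of the change, can be read off the data directly instead of from the histogram. This is a minimal sketch on hypothetical values; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical month-over-month percent changes (not the project data).
pct_change = pd.Series([-12.0, -3.5, 40.0, -1.0, 0.0, 15.0, -8.0])

# Number of months in which the temperature fell or rose.
n_negative = int((pct_change < 0).sum())
n_positive = int((pct_change > 0).sum())

# Typical size of the changes in each direction.
median_negative = float(pct_change[pct_change < 0].median())
median_positive = float(pct_change[pct_change > 0].median())
print(n_negative, n_positive, median_negative, median_positive)
```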

## The Message

The main question being investigated in this report is, as stated earlier:
    

Does data about the land temperature support claims about global warming? Are CO2 emissions related to any changes in the temperature of the land? If the data supports global warming, are some areas of the globe warming faster than others?


Thus far, although there is no definitive answer, the data I'm exploring provides evidence to back up the claim that the Earth is warming and that global warming is a reality. My visualizations also show that the average land temperature is rising and that there is a positive relationship between the land temperature and CO2 emissions. Additionally, the visualizations I've created and the ones that follow this portion show a relatively even warming of the Earth, with no geographic location warming especially faster than another.

Here are some more visual representations I would like to create:

![Caption](https://i.postimg.cc/FR4bLN7K/IMG-9196.png)

![Caption](https://i.postimg.cc/WzKwtrZB/IMG-9197.png)

## Additional Visual Representations

In [None]:
#Creating the first visual representation by merging the glob_df and city_df dataframes and adding a column that
#contains the difference between the global average temperature and a city's average temperature in January 2010

#First, I will convert all dates to datetime format
city_df['dt'] = pd.to_datetime(city_df['dt'])

#Merge glob_df and city_df and find difference in global and city temp
glob_city = glob_df.merge(city_df, left_on='dt', right_on='dt')
glob_city['temp difference (˚C)'] = glob_city['Land Average Temperature (˚C)'] - \
        glob_city['City Average Temperature (˚C)']
glob_city_2010 = glob_city[glob_city['dt'] == '2010-01-01']
glob_city_2010
glob_city_2010.drop(['Latitude','percent change in temp_x','percent change in temp_y'], axis=1)

In [None]:
#Create scatter plot with temp difference as the y variable and distance from the equator as the x variable.
fig, ax = plt.subplots()
glob_city_2010.plot(
    kind = 'scatter',x='dist from equator (km)', y='temp difference (˚C)', color='orange',
    legend = False, ax=ax,
)

ax.set_facecolor((0.96, 0.96, 0.96))
fig.set_facecolor((0.9, 0.9, 0.8))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_title("Diff in city and global temp vs dist from equator")

After creating a line plot, it was clear that a scatter plot would be more appropriate for this data. This graph is rather informative. It helps answer the part of the message that examines whether different parts of the world are warming at different rates. Cities very close to the equator and cities very far from it had temperatures that differed much more from the global average than cities at intermediate distances. This makes sense because cities that are a moderate distance from the equator have average temperatures that are close to the global average. However, the scatter points have a quadratic tendency and there is some heterogeneity.
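
The quadratic tendency mentioned above could be checked by fitting a second-degree polynomial and comparing it with a straight line. The sketch below does this on hypothetical points, not on the project data; the variable names and values are illustrative only.

```python
import numpy as np

# Hypothetical points (not the project data): distance from the equator in thousands
# of km and the difference between the global and the city temperature in degrees C.
dist_thousand_km = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
temp_difference = np.array([-24.0, -22.0, -16.0, -6.0, 8.0, 26.0, 48.0])

# Fit temp_difference = a * d**2 + b * d + c by least squares.
a, b, c = np.polyfit(dist_thousand_km, temp_difference, deg=2)
quadratic_fit = np.polyval([a, b, c], dist_thousand_km)

# Fit a straight line for comparison.
linear_fit = np.polyval(np.polyfit(dist_thousand_km, temp_difference, deg=1), dist_thousand_km)

# Sum of squared residuals for each fit; a smaller value means a closer fit.
sse_quadratic = float(np.sum((temp_difference - quadratic_fit) ** 2))
sse_linear = float(np.sum((temp_difference - linear_fit) ** 2))
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}")
```

A quadratic always fits at least as closely as a line on the same points, so the size of the improvement matters more than its existence.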

In [None]:
#I will read the global temperature data again so that it now includes the combined land and ocean temperatures
glob_data = pd.read_csv('~/Desktop/School/UofT/Third Year/ECO225/ECO225 Project 1/GlobalTemperatures.csv')

In [None]:
#Create new dataframe land_ocean_df that includes the average land temperature and the combined global land and
#ocean temperature
land_ocean_df = pd.DataFrame(glob_data)
land_ocean_df = land_ocean_df.drop(['LandAverageTemperatureUncertainty', 'LandMaxTemperature', 'LandMaxTemperatureUncertainty', 
                        'LandMinTemperature', 'LandAndOceanAverageTemperatureUncertainty', 
                        'LandMinTemperatureUncertainty'], axis=1)
land_ocean_df.rename(columns={'LandAverageTemperature': 'Land Average Temperature (˚C)',
                              'LandAndOceanAverageTemperature': 'Land and Ocean Avg Temp (˚C)'}, inplace=True)
land_ocean_df

This table includes the global average land temperature as well as the global average combined land and ocean temperature.

In [None]:
#Create scatter plot with Land Average Temperature as the x variable and Land and Ocean Avg Temp as the y variable.
fig, ax = plt.subplots()
land_ocean_df.plot(
    kind = 'scatter',x='Land Average Temperature (˚C)', y='Land and Ocean Avg Temp (˚C)', color='g',
    legend = False, ax=ax,
)

ax.set_facecolor((0.96, 0.96, 0.96))
fig.set_facecolor((0.85, 0.9, 0.9))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_title("Land temp vs. land and ocean temp")

This graph did not give the result that was expected. It shows the relationship between the land temperature and the combined land and ocean temperature. Plotting the combined land and ocean temperature against the land temperature alone shows a relationship that seems linear. This suggests that, on average, the ocean temperature increases similarly to the land temperature.
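
A more direct way to compare how fast the two series warm is to fit a trend over time to each of them, since the land temperature is itself part of the combined series. The sketch below uses hypothetical yearly values, not the project data, and reuses the two column names from the table above.

```python
import numpy as np
import pandas as pd

# Hypothetical yearly averages (not the project data).
years = np.arange(1900, 1910)
temps = pd.DataFrame(
    {
        "Land Average Temperature (˚C)": 8.0 + 0.02 * (years - 1900),
        "Land and Ocean Avg Temp (˚C)": 15.0 + 0.01 * (years - 1900),
    },
    index=years,
)

# Least-squares warming rate of each series in degrees C per year.
x = temps.index.to_numpy(dtype=float)
rates = {
    column: float(np.polyfit(x, temps[column].to_numpy(), deg=1)[0])
    for column in temps.columns
}
print(rates)
```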

In [None]:
#Read and clean new dataset that contains Land Temperatures by country
cntry_land_temp = pd.read_csv('~/Desktop/School/UofT/Third Year/ECO225/ECO225 Project 1/GlobalLandTemperaturesByCountry.csv')

cntry_df = pd.DataFrame(cntry_land_temp)
cntry_df = cntry_df.drop(['AverageTemperatureUncertainty'], axis = 1)
cntry_df.rename(columns={'AverageTemperature': 'Country Average Temperature (˚C)'}, inplace=True)

temp_1850 = cntry_df[cntry_df['dt'] == '1850-07-01']

In [None]:
#Read file with world map information and add geometric information to dataframe with land temperature
#according to country
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
world = world.set_index("iso_a3")
world.loc['USA' ,'name'] = 'United States'
world.loc['COD', 'name'] ='Congo (Democratic Republic Of The)'

In [None]:
#Plot world maps of countries with their color corresponding to their land temperature in July of 1900, 2000,
#and 2010

def plot_world_temperatures(temp_df, year):
    #Merge the world map with the land temperatures of the given year
    world_year = world.merge(temp_df, left_on = "name", right_on = "Country", how="left")
    fig, gax = plt.subplots(figsize=(50,5))

    #Plotting the countries with colors according to land temperatures
    world_year.plot(
        ax=gax, edgecolor='black', column='Country Average Temperature (˚C)', legend=True, cmap='RdBu_r', 
        vmin=-3, vmax=40 #range of the column values for the color legend
    )

    #Format axes and title
    gax.set_xlabel('longitude')
    gax.set_ylabel('latitude')
    gax.set_title(f'World Land Temperatures in July {year} (˚C)')
    gax.annotate('Land Temperature (˚C)', xy=(0.77, 0.06), xycoords='figure fraction')

    #Removing spines
    gax.spines['top'].set_visible(False)
    gax.spines['right'].set_visible(False)

    plt.show()
    return world_year

temp_1900 = cntry_df[cntry_df['dt'] == '1900-07-01']
world_1900 = plot_world_temperatures(temp_1900, 1900)

temp_2000 = cntry_df[cntry_df['dt'] == '2000-07-01']
world_2000 = plot_world_temperatures(temp_2000, 2000)

temp_2010 = cntry_df[cntry_df['dt'] == '2010-07-01']
world_2010 = plot_world_temperatures(temp_2010, 2010)


These three maps are colour coded according to each country's average land temperature in July 1900, 2000, and 2010, respectively. The aim of these maps is to show how each country's average temperature changed over time and, although the differences in the shades of each country are slight, the fact that there are any differences at all is significant.
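
Because differences in shade are hard to judge by eye, the change for each country can also be computed directly. The sketch below assumes a long table with the same three columns as `cntry_df`; the countries "A" and "B" and their temperatures are hypothetical.

```python
import pandas as pd

# Hypothetical July temperatures (not the project data), in the same long format as cntry_df.
sample = pd.DataFrame(
    {
        "dt": ["1900-07-01", "1900-07-01", "2010-07-01", "2010-07-01"],
        "Country": ["A", "B", "A", "B"],
        "Country Average Temperature (˚C)": [20.0, 10.0, 21.5, 10.4],
    }
)

# One row per country, one column per date.
wide = sample.pivot(index="Country", columns="dt", values="Country Average Temperature (˚C)")

# Change in degrees C between the two dates.
change = (wide["2010-07-01"] - wide["1900-07-01"]).round(2)
print(change)
```

Mapping this change column instead of the temperature itself would make the warming visible even where the shades of the three maps look alike.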

In [None]:
co2_1900 = co2_df[co2_df['Year']== '1900']
temp_co2_1900 = co2_1900.merge(temp_1900, left_on='Entity', right_on='Country', how='right')
temp_co2_1900.rename(columns={'Country Average Temperature (˚C)': 'Country temp'}, inplace=True)

In [None]:
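#Attach the country geometries to the temperature and emissions data. The world GeoDataFrame is on the left
#of the merge so that the result keeps its geometry column and can be exported as GeoJSON.
geo_1900 = world.merge(temp_co2_1900, left_on='name', right_on='Country', how='inner')
geo_1900 = geo_1900[['name', 'Annual CO₂ emissions (tonnes )', 'Country temp', 'geometry']]
temp_co2_geojson = GeoJSONDataSource(geojson=geo_1900.to_json())

#Use the same temperature range as the static maps for the color scale
color_mapper = LinearColorMapper(palette = brewer['RdBu'][10], low = -3, high = 40)
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
                     border_line_color=None,location = (0,0), orientation = 'horizontal')
#Column names that contain spaces have to be wrapped in braces inside a tooltip
hover = HoverTool(tooltips = [ ('Country','@name'),
                              ('Annual CO₂ emissions (tonnes)', '@{Annual CO₂ emissions (tonnes )}'),
                               ('Avg Temperature (˚C)','@{Country temp}')])
v = figure(title="Country land temperatures in July 1900 and CO₂ emissions", tools=[hover])
v.patches("xs","ys",source=temp_co2_geojson,
          fill_color = {'field' :'Country temp', 'transform' : color_mapper})
v.add_layout(color_bar, 'below')
show(v)

#Two things are needed for this interactive map to render: the data source has to be built from a GeoDataFrame
#(so that to_json() produces GeoJSON), and the columns used for the fill color and tooltips have to exist in it.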

## New Dataset

In this project, compared to the last one, I added a new major dataset that contains CO2 emissions. After all, it is difficult to have any discussion of global warming and climate change without considering carbon dioxide emissions. This new dataset helped me refine my message and allowed me to create new, more intuitive, and more useful graphs that help me respond to my question more directly.

## Conclusion

To conclude, the question I am trying to answer is whether data about the temperature of the Earth supports the claim that there is global warming. Part of my question is also whether CO2 emissions are associated with an increasing global land temperature. My findings thus far, which are demonstrated in the visualizations, support the claim that the Earth is steadily warming. The strongest evidence in favor of this hypothesis is the positive relationship between CO2 emissions and the temperature of the Earth.

My maps also support this hypothesis: although the differences in the shades of each country are slight, the fact that there is a difference is significant. It is widely accepted that even slight changes in the temperature of the Earth have catastrophic effects on the environment and warrant an overhaul in the way we live. Global warming is a serious problem and this data supports its existence. It is difficult to doubt the effect that pollution has on our environment after considering these visualizations.

# Project 3

# Are global warming and climate change supported by data? What is the relationship between CO2 emissions and the global land temperature? How is the rate of warming different in different places around the world?

Faisal Alkhalili

1004723427

ECO225

## Introduction

This research project uses records of average land temperatures, categorized globally, by country, and by city, to investigate whether climate change and global warming are supported by data. Global CO2 emissions data and per capita CO2 emissions categorized by country are used in addition. There is also a focus on the warming effect of CO2 emissions, which serve as a measure of pollution. Essentially, this research looks at how the rate at which land temperatures rise has changed since 1750 and whether these changes correspond to changes in CO2 emissions. This report also examines whether the rate of warming differs between cities and whether the oceans are warming faster than land.

The data is sourced from Berkeley Earth, an organization that has made several archives of environmental data available. The data contains monthly land temperature recordings that begin in 1750. From 1850 onward, the data also includes maximum and minimum values for each month. Some subsets of the data contain the monthly land temperature values categorized by country, city, major city, and US state. It is important to note that this data begins around the same time that the Industrial Revolution is thought to have started, which is generally considered the time that industrial pollution began to have seriously adverse effects on the climate and the temperature of the Earth. The additional data, CO2 emissions, is sourced from the World Bank. This data corresponds to the level of pollution around the world. The dataset includes CO2 emissions categorized by country, including a subset that contains cumulative global CO2 emissions. The outcome considered in this research is the change in the average land temperature, and the two main independent variables are pollution generally (which is essentially represented by CO2 emissions) and location. The per capita CO2 emissions data will be scraped from the Wikipedia article titled "List of countries by carbon dioxide emissions per capita".
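
To illustrate what scraping a table involves, the sketch below extracts the rows of a small HTML table. It is an illustration under stated assumptions only: it uses Python's built-in `html.parser` rather than the `requests` and `BeautifulSoup` packages imported in this notebook, so that it runs without downloading anything; the HTML fragment, the class `TableRows`, and the values are hypothetical and say nothing about the actual layout of the Wikipedia page.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical HTML fragment standing in for a downloaded page (values are made up).
html = """
<table>
  <tr><th>Country</th><th>Emissions per capita</th></tr>
  <tr><td>Country A</td><td>4.2</td></tr>
  <tr><td>Country B</td><td>15.1</td></tr>
</table>
"""

parser = TableRows()
parser.feed(html)
header, *records = parser.rows
per_capita = {name: float(value) for name, value in records}
print(header, per_capita)
```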

## Raw Data 

In [None]:
data_url = "https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data"

In [None]:
import matplotlib.colors as mplc
import matplotlib.patches as patches
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm #for linear regression: sm.ols
import geopandas as gpd
from IPython.display import display, Math, Latex

from shapely.geometry import Point

from pandas_datareader import DataReader

%matplotlib inline
import qeds
qeds.themes.mpl_style();
from bokeh.io import output_notebook
from bokeh.plotting import figure, ColumnDataSource
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar, HoverTool
from bokeh.palettes import brewer
output_notebook()
import json
from bokeh.palettes import OrRd
import seaborn as sns
import requests
from bs4 import BeautifulSoup
import urllib.request
import time

In [None]:
#Reading the first dataset, Global average land temperatures.
glob_land_temp = pd.read_csv('~/Desktop/School/UofT/Third Year/ECO225/ECO225 Project 1/GlobalTemperatures.csv')

In [None]:
#Turning the first dataset into a dataframe. The data is cleaned by dropping irrelevant columns.
glob_df = pd.DataFrame(glob_land_temp)
glob_df = glob_df.drop(['LandAverageTemperatureUncertainty', 'LandMaxTemperature', 'LandMaxTemperatureUncertainty', 
                        'LandMinTemperature', 'LandAndOceanAverageTemperatureUncertainty', 
                        'LandMinTemperatureUncertainty','LandAndOceanAverageTemperature'], axis=1)
glob_df

The table above represents the raw data for the average global land temperature, per month, since 1750. This is going to be this research's main source of information for the global average land temperature.

In [None]:
#Creating a new column in the dataframe glob_df that finds the percent change 
#in the average global land temperature, month over month.
glob_df['percent change in temp'] = glob_df['LandAverageTemperature'].pct_change()
glob_df['percent change in temp'] = glob_df['percent change in temp'] * 100
glob_df['percent change in temp'] = glob_df['percent change in temp'].round(2)
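#Convert dt to datetime and set it as the index so the series can be resampled by year.
glob_df['dt'] = pd.to_datetime(glob_df['dt'])
glob_df = glob_df.set_index('dt')
#Note: resample() returns a new dataframe and the result is not assigned here, so glob_df stays monthly.
#The merge with the yearly emissions data below therefore matches each year's January reading.
glob_df.resample('YS').mean()
glob_df.rename(columns={'LandAverageTemperature': 'Land Average Temperature (˚C)'}, inplace=True)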

In [None]:
#Creating a new dataframe that contains the global co2 emissions for each country. This dataframe also has a row
#labeled "World" with the global emissions data
co2 = pd.read_csv('~/Desktop/School/Uoft/Third Year/ECO225/ECO225 Project 1/co2_emission.csv')
co2_df = pd.DataFrame(co2)
co2_df['Year'] = pd.to_datetime(co2_df['Year'], format='%Y')
co2_df

#Creating another dataframe that takes global Co2 emissions and is merged with global average land temperatures
world_co2_df = co2_df[co2_df["Entity"] == "World"]
temp_co2 = glob_df.merge(world_co2_df,right_on='Year', left_on='dt', how='right')

In [None]:
#Reading the second dataset, monthly average land temperatures by major city. 
#The data is cleaned and irrelevant columns are dropped.
city_land_temp = pd.read_csv('~/Desktop/School/Uoft/Third Year/ECO225/ECO225 Project 1/GlobalLandTemperaturesByMajorCity.csv')
city_df = pd.DataFrame(city_land_temp)
city_df = city_land_temp.drop(['AverageTemperatureUncertainty', 'Longitude'], axis=1)
city_df.rename(columns={'AverageTemperature': 'City Average Temperature (˚C)'}, inplace=True)
city_df.set_index('City')

The above table includes the land temperature, per month, for each major city around the world. This data will help show differences in the rate of the change of land temperatures around the world. It will be used to identify any relationship between a city's distance from the equator and the change in its land temperature.

In [None]:
#Create a new column in city_df that measures the percent change in temperature month over month
city_df['percent change in temp'] = city_df['City Average Temperature (˚C)'].pct_change(fill_method = 'ffill')
city_df['percent change in temp'] = city_df['percent change in temp'] * 100
city_df['percent change in temp'] = city_df['percent change in temp'].round(2)
#Find the distance of each city from the equator by multiplying the degrees portion of its latitude by 111.045 km.
#.str[:-1] removes the trailing N/S letter from each latitude string (a plain [:-1] would drop the last row instead).
city_df['dist from equator (km)'] = city_df['Latitude'].str[:-1]
city_df['dist from equator (km)'] = pd.to_numeric(city_df['dist from equator (km)']) * 111.045
city_df.set_index('City')

The above table includes additional information about the average land temperature for each major city.
It adds the percent change in the temperature month over month as well as each city's distance from the equator in km.
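The latitude conversion described above can be sketched on its own. This is a minimal illustration, assuming latitudes are stored as strings such as `57.05N`; the helper name `latitude_to_km` and the sample values are hypothetical and not part of the notebook.

```python
import pandas as pd

KM_PER_DEGREE_LATITUDE = 111.045  # same constant the notebook uses


def latitude_to_km(latitude: pd.Series) -> pd.Series:
    """Convert strings such as '5.63N' or '37.78S' to km from the equator."""
    degrees = pd.to_numeric(latitude.str[:-1])  # drop the trailing N/S letter
    return degrees * KM_PER_DEGREE_LATITUDE


sample = pd.Series(["0.80N", "37.78S", "60.27N"])  # hypothetical values
print(latitude_to_km(sample).round(1))
```

Because the hemisphere letter is discarded, northern and southern cities at the same latitude get the same distance, which is what a distance from the equator should do.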







## Summary Statistics 

In [None]:
glob_df.describe().round(2)

The table above shows summary statistics for the dataframe glob_df: the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column.

In [None]:
city_df.describe().round(2)

The table above shows summary statistics for the dataframe city_df: the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column.



## Visual Representations

### Additional Dataset 

In [None]:
# Create a scatter plot, from the dataframe temp_co2, that has the average land temperature as y and
# annual CO2 emissions as x.

fig, ax = plt.subplots(figsize=(7.5,7.5))
temp_co2.plot(
    kind = 'scatter', x='Annual CO₂ emissions (tonnes )', y= 'Land Average Temperature (˚C)', color='b',
    legend = False, ax=ax, ylim=[-1, 5]
)

ax.set_facecolor((0.96, 0.96, 0.96))
fig.set_facecolor((0.9, 0.9, 0.9))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_title("Annual avg land temp vs. Co2 emissions")

This scatter plot shows the relationship between the average land temperature of the Earth and CO2 emissions. Although many points lie close to zero emissions (which is likely due to some countries not reporting their emissions until recently), the positive relationship is clear: when global CO2 emissions are higher, a higher global land temperature is observed.
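To put a number on a relationship like this, the monthly temperatures can be averaged by year, joined to yearly emissions, and summarised with a correlation coefficient. The sketch below uses made-up numbers purely for illustration; the column names `temp_c` and `co2_tonnes` are assumptions, not the names in the real files.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly temperatures for three years (illustrative numbers only).
months = pd.date_range("2000-01-01", periods=36, freq="MS")
monthly = pd.DataFrame(
    {"temp_c": np.tile(np.arange(12.0), 3) + np.repeat([0.0, 0.5, 1.0], 12)},
    index=months,
)

# One row per year: the mean of that year's twelve monthly readings.
yearly = monthly.resample("YS").mean()

# Hypothetical yearly emissions keyed by the first day of each year.
emissions = pd.DataFrame(
    {
        "Year": pd.to_datetime(["2000", "2001", "2002"], format="%Y"),
        "co2_tonnes": [1.0e9, 1.2e9, 1.5e9],
    }
)

merged = emissions.merge(yearly, left_on="Year", right_index=True, how="left")
correlation = merged["co2_tonnes"].corr(merged["temp_c"])
print(merged)
print(f"Pearson correlation: {correlation:.3f}")
```

A correlation close to 1 would support the visual impression of a positive relationship, although it says nothing by itself about causation.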

In [None]:
# Add columns to city_df with each city's log average temperature and log distance from the equator.
# (np.log returns NaN for temperatures below 0 ˚C.)
city_df['Log City Avg Temp (˚C)'] = np.log(city_df['City Average Temperature (˚C)'])
city_df['Log Dist from equator (km)'] = np.log(city_df['dist from equator (km)'])

# Take a copy of the July 2000 rows so the rounding below does not modify a slice of city_df.
dist_2000 = city_df[city_df['dt'] == '2000-07-01'].copy()
dist_2000['Log Dist from equator (km)'] = dist_2000['Log Dist from equator (km)'].round(1)
dist_2000['City Average Temperature (˚C)'] = dist_2000['City Average Temperature (˚C)'].round()


In [None]:
# Create a boxplot, from the dataframe dist_2000, of the city average temperature in July 2000 grouped by
# the rounded log distance from the equator in kilometres.
fig, ax = plt.subplots(figsize=(20, 10))
box = sns.boxplot(x='Log Dist from equator (km)', y='City Average Temperature (˚C)', data=dist_2000, ax=ax)

This boxplot shows that cities farther away from the equator have lower average temperatures, yet the relationship is not very strong. This visualization suggests that the relationship between how far a city is from the equator and its average temperature might not be as strong as expected, which may have strong implications for climate change.
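Rounding the log distance produces groups of very different widths in kilometres. One alternative, shown here only as a sketch with hypothetical city values, is to group cities into equal-width distance bands with `pd.cut` and compare the median temperature of each band.

```python
import pandas as pd

# Hypothetical city rows for a single month (illustrative values only).
cities = pd.DataFrame(
    {
        "dist_km": [150.0, 900.0, 2500.0, 3900.0, 5200.0, 6600.0],
        "temp_c": [27.0, 28.5, 24.0, 18.0, 12.5, 8.0],
    }
)

# Group distances into 2,000 km bands so each group covers the same range.
bands = pd.cut(cities["dist_km"], bins=[0, 2000, 4000, 6000, 8000])
median_by_band = cities.groupby(bands, observed=True)["temp_c"].median()
print(median_by_band)
```

The band edges are an arbitrary choice for this example; any set of edges that gives each band a reasonable number of cities would serve the same purpose.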

In [None]:
#Create histogram using dataframe glob_df that plots the percent change in the global average temperature.
   
fig, ax = plt.subplots(figsize=(15, 5))
glob_df.plot(
    kind = 'hist', y='percent change in temp', color='b',
    bins = 400, legend = False, density = False, ax=ax, xlim=(-200,200)
)

ax.set_facecolor((0.96, 0.96, 0.96))
fig.set_facecolor((0.9, 0.9, 0.9))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_title("Pct Change in Global Land Temp since 1750")

This histogram shows that there are more observations with a negative percent change in temperature (month over month) than there are with a positive one. This means that there are more months where the average global land temperature went down than months where it went up. However, the months with a negative percent change appear to have values closer to zero than the months where the percent change was positive.
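The claim that falls are more frequent but smaller than rises can be checked by counting them directly. The sketch below is an assumption-laden illustration: it uses absolute changes in ˚C rather than percent changes, because a percent change becomes very large whenever the previous month's temperature is close to 0 ˚C. The function name and sample values are hypothetical.

```python
import pandas as pd


def summarise_changes(temps: pd.Series) -> dict:
    """Count rises and falls and compare their typical size."""
    change = temps.diff().dropna()  # absolute month-over-month change in ˚C
    rises, falls = change[change > 0], change[change < 0]
    return {
        "n_rises": int(len(rises)),
        "n_falls": int(len(falls)),
        "mean_rise": float(rises.mean()),
        "mean_fall": float(falls.mean()),
    }


sample = pd.Series([2.0, 5.0, 4.0, 3.0, 2.5, 6.0])  # hypothetical monthly ˚C
print(summarise_changes(sample))
```

In this toy series there are more falls than rises, but the average rise is larger than the average fall, which is the same pattern the histogram is described as showing.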

## The Message

The main question being investigated in this report is, as stated earlier:

Does data about the land temperature support claims about global warming? Are CO2 emissions related to any changes in the temperature of the land? If the data supports global warming, are some areas of the globe warming faster than others?


Thus far, although there is no definitive answer, the data I'm exploring provides evidence to back up the claim that the Earth is warming and that global warming is a reality. My visualizations also show that the average land temperature is rising and that there is a positive relationship between the land temperature and CO2 emissions. Additionally, the visualizations I've created and the ones that follow this portion show a relatively even warming of the Earth, with no geographic location warming markedly faster than another.

## Additional Visual Representations

In [None]:
#Creating first visual representations by merging glob_df and city_df dataframes and adding a column that contains
#The difference between a city's average temperature and the global average temperature in 2010

#First, I will convert all dates to datetime format
city_df['dt'] = pd.to_datetime(city_df['dt'])

#Merge glob_df and city_df and find difference in global and city temp
glob_city = glob_df.merge(city_df, left_on='dt', right_on='dt')
glob_city['temp difference (˚C)'] = glob_city['Land Average Temperature (˚C)'] - \
        glob_city['City Average Temperature (˚C)']
glob_city_2010 = glob_city[glob_city['dt'] == '2010-01-01']
glob_city_2010 = glob_city_2010.drop(['Latitude','percent change in temp_x','percent change in temp_y'], axis=1)

In [None]:
#Create scatter plot with temp difference as the y variable and distance from the equator as the x variable.
fig, ax = plt.subplots(figsize=(7.5,7.5))
glob_city_2010.plot(
    kind = 'scatter',x='dist from equator (km)', y='temp difference (˚C)', color='orange',
    legend = False, ax=ax,
)

ax.set_facecolor((0.96, 0.96, 0.96))
fig.set_facecolor((0.9, 0.9, 0.8))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_title("Diff in city and global temp vs dist from equator")

After creating a line plot, it was clear that a scatter plot would be more appropriate for this data. This graph is rather informative. It helps answer the part of the message that examines whether different parts of the world are warming at different rates. Cities that are very far from or very close to the equator have average land temperatures that differ much more from the global average than cities at intermediate distances. This makes sense because cities at intermediate distances from the equator have average temperatures that are close to the global average. However, the scatter points have a quadratic tendency and there is some heterogeneity.
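One way to check a "quadratic tendency" is to fit both a straight line and a second-degree polynomial and compare how much error each leaves. The sketch below uses an exactly quadratic toy relationship, so the numbers are illustrative and do not come from the city data.

```python
import numpy as np

# Hypothetical distances (thousand km) and temperature gaps (˚C); illustrative only.
dist = np.array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5])
gap = 0.8 * (dist - 3.5) ** 2 - 6.0  # an exactly quadratic toy relationship

linear = np.polyfit(dist, gap, deg=1)
quadratic = np.polyfit(dist, gap, deg=2)


def sse(coeffs):
    """Sum of squared residuals for a fitted polynomial."""
    return float(np.sum((np.polyval(coeffs, dist) - gap) ** 2))


print("linear SSE:", round(sse(linear), 3))
print("quadratic SSE:", round(sse(quadratic), 3))
```

With real data the quadratic fit will never be perfect, so the comparison is about how much the error drops when the squared term is added.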

In [None]:
#I will recreate the glob_df dataframe to now include data about the cumulative land and ocean temperatures
glob_data = pd.read_csv('~/Desktop/School/UofT/Third Year/ECO225/ECO225 Project 1/GlobalTemperatures.csv')

In [None]:
#Create new dataframe land_ocean_df that includes the Average land temperature and the global land and ocean 
#temperature
land_ocean_df = pd.DataFrame(glob_data)  # use the file re-read in the previous cell
land_ocean_df = land_ocean_df.drop(['LandAverageTemperatureUncertainty', 'LandMaxTemperature', 'LandMaxTemperatureUncertainty', 
                        'LandMinTemperature', 'LandAndOceanAverageTemperatureUncertainty', 
                        'LandMinTemperatureUncertainty'], axis=1)
land_ocean_df.rename(columns={'LandAverageTemperature': 'Land Average Temperature (˚C)', \
                              'LandAndOceanAverageTemperature': 'Land and Ocean Avg Temp (˚C)'}, inplace=True)
land_ocean_df

This table includes the global average land temperature as well as the global average cumulative land and ocean temperatures.

In [None]:
#Create scatter plot with Land Average Temperature as the x variable and Land and Ocean Avg Temp as the y variable.
fig, ax = plt.subplots(figsize=(7.5,7.5))
land_ocean_df.plot(
    kind = 'scatter',x='Land Average Temperature (˚C)', y='Land and Ocean Avg Temp (˚C)', color='g',
    legend = False, ax=ax,
)

ax.set_facecolor((0.96, 0.96, 0.96))
fig.set_facecolor((0.85, 0.9, 0.9))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_title("Land temp vs. land and ocean temp")

This graph did not give the result that was expected. It shows the relationship between the land temperature and the combined land and ocean temperature. Plotting the combined land and ocean temperature against the land temperature alone shows a relationship that seems linear. This tells us that, on average, the ocean temperature increases similarly to the land temperature.
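A linear scatter shows that the two series move together, but it does not show whether they rise at the same rate. A more direct comparison, sketched here under the assumption that each series is available by year, is to fit a trend to each one and compare the slopes. The series below are hypothetical.

```python
import numpy as np


def trend_per_decade(years, temps):
    """Least-squares slope of temperature against time, in ˚C per decade."""
    slope, _intercept = np.polyfit(years, temps, deg=1)
    return float(slope * 10)


years = np.arange(1950, 2011)
# Hypothetical series: land warming 0.02 ˚C/yr, land-and-ocean 0.012 ˚C/yr.
land = 8.0 + 0.02 * (years - 1950)
land_ocean = 15.0 + 0.012 * (years - 1950)

print(round(trend_per_decade(years, land), 3))
print(round(trend_per_decade(years, land_ocean), 3))
```

In this toy case the two series are perfectly linearly related, yet their trends differ, which is why the slopes are the more informative quantity for a question about rates.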

## Scraped Data 

In [None]:
# Set URL of site from which I want to scrape the data. 
# Also, I find what html code separates each value I would like to isolate.
web_url = 'https://en.wikipedia.org/wiki/List_of_countries_by_carbon_dioxide_emissions_per_capita'
response = requests.get(web_url)
soup_object = BeautifulSoup(response.content, 'html.parser')  # name the parser explicitly to avoid a bs4 warning
data_table = soup_object.find_all('table', 'wikitable sortable')[0]
all_values = data_table.find_all('tr')

In [None]:
# Create an empty dataframe and a for loop which fills it with the scraped data I need.
c02_per_capita_df = pd.DataFrame(columns = ['country', '2015 per capita co2 emissions (tonnes)'])
ix = 0 # Initialise index to zero

for row in all_values[1:]:
    values = row.find_all('td')
    if not values:
        continue  # skip rows without data cells, such as extra header rows
    country = values[0].text
    emissions_2015 = values[-2].text

    c02_per_capita_df.loc[ix] = [country, emissions_2015]
    ix += 1

In [None]:
# Clean dataframe with scraped data
c02_per_capita_df['2015 per capita co2 emissions (tonnes)'] = pd.to_numeric(c02_per_capita_df[
    '2015 per capita co2 emissions (tonnes)'], errors='coerce')
c02_per_capita_df = c02_per_capita_df.sort_values(by=['2015 per capita co2 emissions (tonnes)'], ascending = False)
c02_per_capita_df = c02_per_capita_df.dropna()
c02_per_capita_df = c02_per_capita_df.rename(columns={'country':'Country'})

In [None]:
# Create barplot that sorts countries according to their per capita co2 emissions.
sns.set(font_scale=0.9)
fig, ax = plt.subplots(figsize=(5, 32))

sns.barplot(x=c02_per_capita_df['2015 per capita co2 emissions (tonnes)'], y=c02_per_capita_df['Country'], 
            color='g', ax=ax, data=c02_per_capita_df, orient='h')

ax.set_facecolor((0.96, 0.96, 0.96))
fig.set_facecolor((0.9, 0.9, 0.9))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

This barplot sorts all countries according to their per capita CO2 emissions. It is helpful to this discussion because the countries at the top are not the ones people would normally expect. This shows that per capita emissions are not a very good indicator of total emissions and tell us more about the size of a country's population than about its total emissions.
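The gap between per capita and total emissions comes down to simple arithmetic: total emissions equal per capita emissions multiplied by population. The two countries below are hypothetical and the numbers are chosen only to illustrate the point.

```python
# Hypothetical countries (illustrative numbers, not taken from the scraped table).
countries = {
    "Small high emitter": {"per_capita_t": 30.0, "population": 2_000_000},
    "Large low emitter": {"per_capita_t": 2.0, "population": 1_000_000_000},
}


def total_emissions_tonnes(per_capita_t: float, population: int) -> float:
    """Total emissions = per capita emissions x population."""
    return per_capita_t * population


totals = {name: total_emissions_tonnes(**row) for name, row in countries.items()}
for name, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {total:,.0f} tonnes")
```

The country with fifteen times the per capita emissions still has a far smaller total, so a ranking by per capita emissions can look very different from a ranking by total emissions.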

In [None]:
#Read and clean new dataset that contains Land Temperatures by country.
#This cell comes before the merge below because the merge needs cntry_df.
cntry_land_temp = pd.read_csv('~/Desktop/School/UofT/Third Year/ECO225/ECO225 Project 1/GlobalLandTemperaturesByCountry.csv')

cntry_df = pd.DataFrame(cntry_land_temp)
cntry_df = cntry_df.drop(['AverageTemperatureUncertainty'], axis = 1)
cntry_df.rename(columns={'AverageTemperature': 'Country Average Temperature (˚C)'}, inplace=True)

temp_1850 = cntry_df[cntry_df['dt'] == '1850-07-01']

In [None]:
# Create a new dataframe that merges scraped data with previous data on each country's average land temperature. 
# I also clean the merged dataframes
cntry_2010 = cntry_df[cntry_df['dt'] == '2010-07-01']
cntry_2000 = cntry_df[cntry_df['dt'] == '2000-07-01']
cntry_diff = cntry_2010.merge(cntry_2000, on='Country', how='outer')

cntry_diff = cntry_diff.rename(columns={'Country Average Temperature (˚C)_x': 'Avg Temp 2010 (˚C)', 
                   'Country Average Temperature (˚C)_y': 'Avg Temp 2000 (˚C)'})
cntry_diff['Temp diff (˚C)'] = cntry_diff['Avg Temp 2010 (˚C)'] - cntry_diff['Avg Temp 2000 (˚C)']
cntry_diff['Country'] = cntry_diff['Country'].str.strip()
c02_per_capita_df['Country'] = c02_per_capita_df['Country'].str.strip()
cntry_c02 = pd.merge(cntry_diff, c02_per_capita_df, on='Country', how='outer')
cntry_c02 = cntry_c02.dropna()


In [None]:
#Create scatter plot with the change in temperature as the y variable and 2015 per capita co2 emissions as the x variable.
fig, ax = plt.subplots(figsize=(5, 5))
cntry_c02.plot(
    kind = 'scatter',x='2015 per capita co2 emissions (tonnes)', y='Temp diff (˚C)', color='orange',
    legend = False, ax=ax,
)

ax.set_facecolor((0.96, 0.96, 0.96))
fig.set_facecolor((0.9, 0.9, 0.9))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_title("Change in temp vs per capita c02 emissions")

This graph plots the 2015 per capita CO2 emissions against each country's change in land temperature between 2000 and 2010. There are many outliers in this graph and no clear relationship appears, which suggests that per capita measures of pollution are not good indicators of how much each country is warming or of its overall contribution to global warming.

In [None]:
#Read file with world map information and add geometric information to dataframe with land temperature
#according to country
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
world = world.set_index("iso_a3")
world.loc['USA' ,'name'] = 'United States'
world.loc['COD', 'name'] ='Congo (Democratic Republic Of The)'

In [None]:
#Plot world map of countries with their color corresponding to their land temperature in July, 1900

temp_1900 = cntry_df[cntry_df['dt'] == '1900-07-01']
world_1900 = world.merge(temp_1900, left_on = "name", right_on = "Country", how="left")
fig, gax = plt.subplots(figsize=(50,5))

#Plotting the Countries with colors according to land temperatures
world_1900.plot(
    ax=gax, edgecolor='black', column='Country Average Temperature (˚C)', legend=True, cmap='RdBu_r', 
    vmin=-3, vmax=40 #range of your column value for the color legend
)

# Format axes and title
gax.set_xlabel('longitude')
gax.set_ylabel('latitude')
gax.set_title('World Land Temperatures in 1900 in (˚C)')
gax.annotate('Land Temperature in (˚C)', xy=(0.77, 0.06), xycoords='figure fraction')


# Removing spines
gax.spines['top'].set_visible(False)
gax.spines['right'].set_visible(False)

plt.show()

#Plot world map of countries with their color corresponding to their land temperature in July, 2000

temp_2000 = cntry_df[cntry_df['dt'] == '2000-07-01']
world_2000 = world.merge(temp_2000, left_on = "name", right_on = "Country", how="left")
fig, gax = plt.subplots(figsize=(50,5))

#Plotting the Countries with colors according to land temperatures
world_2000.plot(
    ax=gax, edgecolor='black', column='Country Average Temperature (˚C)', legend=True, cmap='RdBu_r', 
    vmin=-3, vmax=40 #range of your column value for the color legend
)

# Format axes and title
gax.set_xlabel('longitude')
gax.set_ylabel('latitude')
gax.set_title('World Land Temperatures in 2000 in (˚C)')
gax.annotate('Land Temperature in (˚C)', xy=(0.77, 0.06), xycoords='figure fraction')


# Removing spines
gax.spines['top'].set_visible(False)
gax.spines['right'].set_visible(False)

plt.show()

#Plot world map of countries with their color corresponding to their land temperature in July, 2010

temp_2010 = cntry_df[cntry_df['dt'] == '2010-07-01']
world_2010 = world.merge(temp_2010, left_on = "name", right_on = "Country", how="left")
fig, gax = plt.subplots(figsize=(50,5))

#Plotting the Countries with colors according to land temperatures
world_2010.plot(
    ax=gax, edgecolor='black', column='Country Average Temperature (˚C)', legend=True, cmap='RdBu_r', 
    vmin=-3, vmax=40 #range of your column value for the color legend
)

# Format axes and title
gax.set_xlabel('longitude')
gax.set_ylabel('latitude')
gax.set_title('World Land Temperatures in 2010 in (˚C)')
gax.annotate('Land Temperature in (˚C)', xy=(0.77, 0.06), xycoords='figure fraction')

# Removing spines
gax.spines['top'].set_visible(False)
gax.spines['right'].set_visible(False)

plt.show()


These three maps are colour coded according to each country's average land temperature in July 1900, 2000, and 2010 respectively. The aim of these maps is to show how each country's average temperature changed over time. Although the differences in the shades of each country are slight, the fact that there are any differences at all is significant.
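The maps rely on country names matching exactly between the map file and the temperature file, which is why two names were renamed by hand before plotting. A quick way to find any remaining mismatches, sketched here with hypothetical name columns, is to pass `indicator=True` to the merge and list the rows that found no partner.

```python
import pandas as pd

# Hypothetical name columns; the real map and temperature files may differ.
world_names = pd.DataFrame(
    {"name": ["Canada", "United States of America", "Brazil"]}
)
temps = pd.DataFrame(
    {
        "Country": ["Canada", "United States", "Brazil"],
        "temp_c": [15.0, 22.0, 24.0],
    }
)

checked = world_names.merge(
    temps, left_on="name", right_on="Country", how="left", indicator=True
)
unmatched = checked.loc[checked["_merge"] == "left_only", "name"].tolist()
print(unmatched)
```

Any country listed in `unmatched` would be drawn without a temperature colour on the map, so the list shows which names still need to be reconciled.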

## New Datasets

I added a new major dataset that contains global CO2 emissions. After all, it is difficult to have any discussion of global warming and climate change without considering carbon dioxide emissions. This new dataset helped me refine my message and allowed me to create new, more intuitive, and more useful graphs that help me respond to my question more directly.

Additionally, I scraped data about each country's CO2 emissions per capita. This new data will help determine whether or not per capita emissions are a good indicator of a country's contribution to global warming. Per capita CO2 emissions are often reported by countries to show their management of pollution, so I would like to explore whether this is truly a good indicator of a country's carbon footprint.

##  Analysis

Thus far, the different visual representations have been helpful at answering separate parts of the research question. This portion of the project aims to consolidate all the data collected and all the visual representations presented to provide a clear answer to the question.

Firstly, it is clear that data about the average land temperature throughout history does in fact support claims about global warming. In the histogram titled "Pct Change in Global Land Temp since 1750", we can observe that although it seems like there are more instances where the global average temperature is decreasing, the skew in the plot demonstrates that there are more extreme changes in the temperature in the positive direction than the negative. Also, the maps presented show a slight change in the shades of the countries towards darker hues and although the differences are subtle, they have substantial implications with regards to the future of our planet.

Next, the relationship between increases in CO2 emissions and the temperature of the Earth is practically undeniable. The graph titled "Annual avg land temp vs. Co2 emissions" shows the positive relationship between the increasing average global land temperature and the increasing level of CO2 emissions. However, the scraped data, which shows the 2015 per capita CO2 emissions per country, shows us that the per capita measure of emissions is not a very good indicator of a country's contribution to general global warming. After finding the change in temperature for each country between 2000 and 2010 and then plotting that difference against the per capita emissions in 2015, there is no clear relationship. The visual representation corresponding to this claim is titled "Change in temp vs per capita c02 emissions". This is likely due to large, populous countries which are very industrious having relatively low per capita emissions despite large changes in their land temperature.

Lastly, the final part of the question asks whether some parts of the world are warming faster than others. The data demonstrates that the oceans are not warming at a rate that is faster than land. The graph titled "Land temp vs. land and ocean temp" shows that there is an extremely strong linear relationship between the temperature of the land and the combined temperature of the land and oceans, which signals that the two measures grow linearly. As for cities further away from the equator, the graph "Diff in city and global temp vs dist from equator" shows that cities further away from the equator have a temperature that varies from the global average, with the relationship seemingly having quadratic tendencies. That is, as cities get further away, the difference between the local and global temperature grows faster than the distance grows.

## Conclusion

To conclude, the question I am trying to answer is whether data about the temperature of the Earth supports the claim that there is global warming. Part of my question is also whether or not CO2 emissions are associated with an increasing global land temperature. My findings, which are demonstrated in the visualizations, support the claim that the Earth is steadily warming. The strongest evidence in favor of this hypothesis is that there is a positive relationship between CO2 emissions and the temperature of the Earth. However, there is no evidence that distance from the equator or per capita CO2 emissions are strong indicators for global warming. Despite this, there is conclusive evidence to assert that global warming is in fact a reality that we are facing and that CO2 emissions are strongly related to this reality.


My maps also support this hypothesis: although the differences in the shades of each country are slight, the fact that there is a difference is significant. It is widely accepted that even slight changes in the temperature of the Earth have catastrophic effects on the environment, which warrants an overhaul in the way we live. Global warming is a serious problem and this data supports its existence. It is difficult to doubt the effect that pollution has on our environment after considering these visualizations.

# Final Project

Faisal Alkhalili

1004723427

ECO225

## Introduction

This research project aims to use records of the average land temperatures around the world categorized globally, by country, and by city to investigate whether climate change and global warming are supported by data. Additional data that will be used is global CO2 emissions data and per capita CO2 emissions categorized by country. There will also be a focus on the warming effect of CO2 emissions which translates to pollution. Essentially, this research will look at how the rate of the rise of land temperatures has generally changed since 1750 and if these changes correspond to changes in the emission of CO2. Also, this report will examine if the rate of warming is different in different cities as well as if the oceans are warming faster than land.

The data that will be used is sourced from Berkeley Earth, an agency that has made several archives of environmental data available. The data contains monthly land temperature recordings that begin in 1750. After the year 1850, the data started including maximum and minimum values for each month. Some subsets of the data contain the monthly land temperature values categorized according to the country, city, major city, and US state. It's important to consider that this data begins around the same time that the Industrial Revolution is thought to have started. This is generally considered to be the time that industrial pollution began to have seriously adverse effects on the climate and the temperature of the Earth. The additional data, CO2 emissions, is sourced from the World Bank. This data will correspond to the level of pollution around the world. The dataset includes CO2 emissions categorized by country, including a subset that contains cumulative global CO2 emissions. The outcome that is being considered in this research is the change in the average land temperature, and the two main independent variables are pollution generally (which is essentially represented by CO2 emissions) and location. The per capita CO2 emissions data will be scraped from the Wikipedia article titled "List of countries by carbon dioxide emissions per capita".

## Raw Data

In [None]:
data_url = "https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data"

In [None]:
import matplotlib.colors as mplc
import matplotlib.patches as patches
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm #for linear regression: sm.ols
import geopandas as gpd
from IPython.display import display, Math, Latex

from shapely.geometry import Point

from pandas_datareader import DataReader

%matplotlib inline
import qeds
qeds.themes.mpl_style();
from bokeh.io import output_notebook
from bokeh.plotting import figure, ColumnDataSource
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar, HoverTool
from bokeh.palettes import brewer
output_notebook()
import json
from bokeh.palettes import OrRd
import seaborn as sns
import requests
from bs4 import BeautifulSoup
import urllib.request
import time
import statsmodels.api as sm
from statsmodels.iolib.summary2 import summary_col
from linearmodels.iv import IV2SLS

In [None]:
#Reading the first dataset, Global average land temperatures.
glob_land_temp = pd.read_csv('~/Desktop/School/UofT/Third Year/ECO225/ECO225 Project 1/GlobalTemperatures.csv')

In [None]:
#Turning the first dataset into a dataframe. The data is cleaned by dropping irrelevant columns.
glob_df = pd.DataFrame(glob_land_temp)
glob_df = glob_df.drop(['LandAverageTemperatureUncertainty', 'LandMaxTemperature', 'LandMaxTemperatureUncertainty', 
                        'LandMinTemperature', 'LandAndOceanAverageTemperatureUncertainty', 
                        'LandMinTemperatureUncertainty','LandAndOceanAverageTemperature'], axis=1)
glob_df

The table above represents the raw data for the average global land temperature, per month, since 1750. This is going to be this research's main source of information for the global average land temperature.

In [None]:
#Creating a new column in the dataframe glob_df that finds the percent change 
#in the average global land temperature, month over month.
glob_df['percent change in temp'] = glob_df['LandAverageTemperature'].pct_change()
glob_df['percent change in temp'] = glob_df['percent change in temp'] * 100
glob_df['percent change in temp'] = glob_df['percent change in temp'].round(2)
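#Convert dt to datetime and set it as the index so the series can be resampled by year.
glob_df['dt'] = pd.to_datetime(glob_df['dt'])
glob_df = glob_df.set_index('dt')
#Note: resample() returns a new dataframe and the result is not assigned here, so glob_df stays monthly.
#The merge with the yearly emissions data below therefore matches each year's January reading.
glob_df.resample('YS').mean()
glob_df.rename(columns={'LandAverageTemperature': 'Land Average Temperature (˚C)'}, inplace=True)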

In [None]:
#Creating a new dataframe that contains the global CO2 emissions for each country. This dataframe also has a row
#labeled "World" with the global emissions data
co2 = pd.read_csv('~/Desktop/School/Uoft/Third Year/ECO225/ECO225 Project 1/co2_emission.csv')
co2_df = pd.DataFrame(co2)
co2_df['Year'] = pd.to_datetime(co2_df['Year'], format='%Y')
co2_df

#Creating another dataframe that takes global CO2 emissions and is merged with global average land temperatures
world_co2_df = co2_df[co2_df["Entity"] == "World"]
temp_co2 = glob_df.merge(world_co2_df,right_on='Year', left_on='dt', how='right')
#Drop the first 100 rows; axis is passed by keyword because recent pandas versions no longer accept it positionally.
temp_co2.drop(temp_co2.index[:100], axis=0, inplace=True)

In [None]:
#Reading the second dataset, monthly average land temperatures by major city. 
#The data is cleaned and irrelevant columns are dropped.
city_land_temp = pd.read_csv('~/Desktop/School/Uoft/Third Year/ECO225/ECO225 Project 1/GlobalLandTemperaturesByMajorCity.csv')
city_df = pd.DataFrame(city_land_temp)
city_df = city_land_temp.drop(['AverageTemperatureUncertainty', 'Longitude'], axis=1)
city_df.rename(columns={'AverageTemperature': 'City Average Temperature (˚C)'}, inplace=True)
city_df.set_index('City')

The above table includes the land temperature, per month, for each major city around the world. This data will help show differences in the rate of the change of land temperatures around the world. It will be used to identify any relationship between a city's distance from the equator and the change in its land temperature.

In [None]:
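#Create a new column in city_df that measures the percent change in temperature month over month.
#The change is computed within each city so that the first month of one city is not compared with
#the last month of the previous city. Missing temperatures are forward-filled within each city first.
filled_temp = city_df.groupby('City')['City Average Temperature (˚C)'].ffill()
city_df['percent change in temp'] = filled_temp.groupby(city_df['City']).pct_change(fill_method=None) * 100
city_df['percent change in temp'] = city_df['percent change in temp'].round(2)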
#Find the distance of each city from the equator by multiplying the degrees portion of its latitude by 111.045 km.
#The hemisphere letter (N or S) is stripped from the latitude string before it is converted to a number.
city_df['dist from equator (km)'] = city_df['Latitude'].str.replace('[NS]', '', regex=True)
city_df['dist from equator (km)'] = pd.to_numeric(city_df['dist from equator (km)']) * 111.045
city_df.set_index('City')

The above table includes additional information about the average land temperature for each major city.
It adds the percent change in the temperature month over month as well as each city's distance from the equator in km.
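
As a worked example of the distance calculation, the sketch below converts a single latitude string to kilometres. It assumes latitudes are stored as text with a trailing hemisphere letter (for example `'5.63N'`); the function name `dist_from_equator_km` and the sample value are illustrative only.

```python
KM_PER_DEGREE = 111.045  # conversion factor used in this report


def dist_from_equator_km(latitude: str) -> float:
    """Convert a latitude string such as '5.63N' to a distance in km."""
    # Drop the trailing hemisphere letter, keeping only the degrees.
    degrees = float(latitude.rstrip("NS"))
    return degrees * KM_PER_DEGREE


north = dist_from_equator_km("5.63N")
south = dist_from_equator_km("5.63S")
print(round(north, 2), round(south, 2))
```

Because the hemisphere letter is discarded, a city at 5.63˚ north and one at 5.63˚ south are treated as equally far from the equator.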

## Summary Statistics 

In [None]:
glob_df.describe().round(2)

The table above reports summary statistics for the dataframe glob_df (count, mean, standard deviation, minimum, quartiles, and maximum) that may be important to the research.

In [None]:
city_df.describe().round(2)

The table above reports summary statistics for the dataframe city_df (count, mean, standard deviation, minimum, quartiles, and maximum) that may be important to the research.

## Visual Representations

### Additional Dataset 

In [None]:
# Create a scatter plot, from the dataframe temp_co2, that has y as the average land temperature and x as CO2 emissions.
# Some early values are omitted due to the high uncertainty around them: the first 100 rows of temp_co2
# were dropped when the dataframe was created.

fig, ax = plt.subplots(figsize=(7.5,7.5))
temp_co2.plot(
    kind = 'scatter', x='Annual CO₂ emissions (tonnes )', y= 'Land Average Temperature (˚C)', color='b',
    legend = False, ax=ax, ylim=[-1, 5]
)

ax.set_facecolor((0.96, 0.96, 0.96))
fig.set_facecolor((0.9, 0.9, 0.9))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_title("Annual avg land temp vs. CO2 emissions")

This scatter plot shows the relationship between the average land temperature of the Earth and CO2 emissions. Although many points lie close to 0 emissions (likely because some countries did not report their emissions until recently), the positive relationship is clear: when global CO2 emissions are higher, a higher global land temperature is observed.

In [None]:
# Add columns to city_df with each city's log average temperature and its log distance from the equator.
# Note that np.log returns NaN for negative temperatures.
city_df['Log City Avg Temp (˚C)'] = np.log(city_df['City Average Temperature (˚C)'])
city_df['Log Dist from equator (km)'] = np.log(city_df['dist from equator (km)'])

# Take a copy of the July 2000 rows so that the rounding below does not modify a slice of city_df.
dist_2000 = city_df[city_df['dt'] == '2000-07-01'].copy()
dist_2000['Log Dist from equator (km)'] = dist_2000['Log Dist from equator (km)'].round(1)
dist_2000['City Average Temperature (˚C)'] = dist_2000['City Average Temperature (˚C)'].round()

In [None]:
# Create a boxplot, from the dataframe dist_2000, that shows the average land temperature by the log of each city's
# distance from the equator in kilometres.
fig, ax = plt.subplots(figsize=(20, 10))
box = sns.boxplot(x='Log Dist from equator (km)', y='City Average Temperature (˚C)', data=dist_2000, ax=ax)

This boxplot demonstrates that cities farther away from the equator have lower average temperatures, yet the relationship is not very strong. This visualization shows that the relationship between how far a city is from the equator and its average temperature might not be as strong as expected, which may have strong implications for climate change.

In [None]:
#Create histogram using dataframe glob_df that plots the percent change in the global average temperature.
   
fig, ax = plt.subplots(figsize=(15, 5))
glob_df.plot(
    kind = 'hist', y='percent change in temp', color='g',
    bins = 400, legend = False, density = False, ax=ax, xlim=(-200,200)
)

ax.set_facecolor((0.96, 0.96, 0.96))
fig.set_facecolor((0.9, 0.9, 0.9))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_title("Percent Change in Global Land Temp since 1750", fontsize='17')

This histogram shows that there are more observations with a negative percent change in temperature (month over month) than there are positive ones. This means that there are more months where the average global land temperature went down than months where it went up. However, it seems as if the months that had a negative percent change had values that were closer to zero compared to months where the percent change was positive.
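
One caveat when reading percent changes of temperatures in ˚C is that the result depends heavily on the starting value. The sketch below applies the usual formula (current minus previous, divided by previous) to made-up temperatures; the helper name `pct_change` is illustrative and is not the pandas method itself.

```python
def pct_change(previous: float, current: float) -> float:
    """Percent change from previous to current, relative to previous."""
    return (current - previous) / previous * 100


# The same 0.5 degree rise gives very different percentages.
near_zero = pct_change(0.5, 1.0)
mild = pct_change(15.0, 15.5)

# A rise that starts below 0 degrees is reported with a negative sign.
below_zero = pct_change(-0.5, 0.5)

print(near_zero, round(mild, 2), below_zero)
```

This may be part of why the histogram needs such a wide x-axis range: months with an average temperature close to 0 ˚C can produce very large percent changes, and changes that start below 0 ˚C flip sign.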

## The Message

The main question being investigated in this report is, as stated earlier:

Does data about the land temperature support claims about global warming? Are CO2 emissions related to any changes in the temperature of the land? If the data supports global warming, are some areas of the globe warming faster than others?


Thus far, although there is no definitive answer, the data I'm exploring demonstrates that there is evidence to back up the claim that the Earth is warming and that global warming is a reality. Also, my visualizations show that the average land temperature is rising and there is a positive relationship between the land temperature and CO2 emissions. Additionally, the visualizations I've created and the ones that will follow this portion show that there is a relatively even warming of the Earth with no geographic location warming especially faster than another.

## Additional Visual Representations

In [None]:
#Creating first visual representations by merging glob_df and city_df dataframes and adding a column that contains
#The difference between a city's average temperature and the global average temperature in 2010

#First, I will convert all dates to datetime format
city_df['dt'] = pd.to_datetime(city_df['dt'])

#Merge glob_df and city_df and find difference in global and city temp
glob_city = glob_df.merge(city_df, left_on='dt', right_on='dt')
glob_city['temp difference (˚C)'] = glob_city['Land Average Temperature (˚C)'] - \
        glob_city['City Average Temperature (˚C)']
glob_city_2010 = glob_city[glob_city['dt'] == '2010-01-01']
glob_city_2010 = glob_city_2010.drop(['Latitude','percent change in temp_x','percent change in temp_y'], axis=1)

In [None]:
#Create scatter plot with temp difference as the y variable and distance from the equator as the x variable.
fig, ax = plt.subplots(figsize=(7.5,7.5))
glob_city_2010.plot(
    kind = 'scatter',x='dist from equator (km)', y='temp difference (˚C)', color='blue',
    legend = False, ax=ax,
)

ax.set_facecolor((0.96, 0.96, 0.96))
fig.set_facecolor((0.9, 0.9, 0.9))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_title("Diff in city temp and global temp vs dist from equator", fontsize='17')
ax.set_ylabel("Temp difference (˚C)", fontsize='15')
ax.set_xlabel("Distance from equator (km)", fontsize='15')

After creating a line plot, it was clear that a scatter plot would be more appropriate for this data. This graph is rather informative. It helps answer the part of the message that examines if different parts of the world are warming at different rates. Cities that were either far from or close to the equator had average land temperatures that were much higher than those of cities somewhat far away. This makes sense because cities that are a moderate distance from the equator have average temperatures that are close to the global average. However, the scatter points have a quadratic tendency and there is some heterogeneity.
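
A quadratic tendency like the one described above can be checked by comparing a straight-line fit with a second-degree fit. The sketch below does this with NumPy on made-up, perfectly U-shaped data rather than on glob_city_2010; the variable names and the helper `r_squared` are illustrative.

```python
import numpy as np

# Made-up data: distance in thousands of km and a U-shaped temperature gap.
dist_thousand_km = np.linspace(0, 6, 25)
temp_gap = 2.0 * (dist_thousand_km - 3.0) ** 2 - 10.0


def r_squared(y, fitted):
    """Share of the variance in y that the fitted values account for."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot


# Fit a line (degree 1) and a parabola (degree 2) to the same points.
linear_fit = np.polyval(np.polyfit(dist_thousand_km, temp_gap, deg=1), dist_thousand_km)
quadratic_fit = np.polyval(np.polyfit(dist_thousand_km, temp_gap, deg=2), dist_thousand_km)

r2_linear = r_squared(temp_gap, linear_fit)
r2_quadratic = r_squared(temp_gap, quadratic_fit)
print(round(r2_linear, 3), round(r2_quadratic, 3))
```

On real data the gap between the two fits would be smaller, but a clearly higher r-squared for the second-degree fit would support adding a squared distance term.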

In [None]:
#I will recreate the glob_df dataframe to now include data about the cumulative land and ocean temperatures
glob_data = pd.read_csv('~/Desktop/School/UofT/Third Year/ECO225/ECO225 Project 1/GlobalTemperatures.csv')

In [None]:
#Create new dataframe land_ocean_df that includes the average land temperature and the global land and ocean
#temperature, using the file that was read into glob_data in the previous cell
land_ocean_df = pd.DataFrame(glob_data)
land_ocean_df = land_ocean_df.drop(['LandAverageTemperatureUncertainty', 'LandMaxTemperature', 'LandMaxTemperatureUncertainty', 
                        'LandMinTemperature', 'LandAndOceanAverageTemperatureUncertainty', 
                        'LandMinTemperatureUncertainty'], axis=1)
land_ocean_df.rename(columns={'LandAverageTemperature': 'Land Average Temperature (˚C)', \
                              'LandAndOceanAverageTemperature': 'Land and Ocean Avg Temp (˚C)'}, inplace=True)
land_ocean_df

This table includes the global average land temperature as well as the global average cumulative land and ocean temperatures.

In [None]:
#Create scatter plot with Land Average Temperature as the x variable and Land and Ocean Avg Temp as the y variable.
fig, ax = plt.subplots(figsize=(7.5,7.5))
land_ocean_df.plot(
    kind = 'scatter',x='Land Average Temperature (˚C)', y='Land and Ocean Avg Temp (˚C)', color='blue',
    legend = False, ax=ax,
)

ax.set_facecolor((0.96, 0.96, 0.96))
fig.set_facecolor((0.85, 0.9, 0.9))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_title("Land Temp vs. Land and Ocean Temp", fontsize='20')

This graph did not give the result that was expected. It demonstrates the relationship between the temperature of the land and the temperature of the land and oceans. Clearly, the combined land and ocean temperature, when plotted against just the land temperature, shows a relationship that seems linear. This tells us that, on average, the ocean temperature increases similarly to the land temperature.

## New Datasets

I added a new major dataset that contains global CO2 emissions (used in the visualizations above). After all, it is difficult to have any discussion of global warming and climate change without considering carbon dioxide emissions. This new dataset helped me refine my message and allowed me to create new, more intuitive, and more useful graphs that help me respond to my question more directly.

Additionally, I scraped data about each country's CO2 emissions per capita. This new data will help determine whether or not per capita emissions are a good indicator of a country's contribution to global warming. Per capita CO2 emissions are often reported by countries to show their management of pollution, so I would like to explore whether this is truly a good indicator of a country's carbon footprint. The last dataset includes the population by country and is sourced from Our World in Data, a research and data agency supported by Oxford University.

In [None]:
#Reading the population dataset, Population by Country.
population = pd.read_csv('~/Desktop/School/UofT/Third Year/ECO225/ECO225 Project 1/population.csv')

In [None]:
#Turning the population dataset into a dataframe and isolating the world population.
pop_df = pd.DataFrame(population)
pop_df.rename(columns={'Total population (Gapminder, HYDE & UN)':'Population'}, inplace=True)
world_pop = pop_df[pop_df['Entity'] == 'World']
world_pop = world_pop.reset_index()
world_pop = world_pop[30:]

## Scraped Data 

To supplement my data, I used web scraping to obtain data about each country's CO2 emissions per capita. This data will help my research because it allows me to discuss whether per capita data is useful when discussing environmental damage and climate change. This can be done by assessing whether countries with high per capita CO2 emissions are countries with significant carbon footprints or if they are simply low population countries. The web scraped data is obtained from a Wikipedia article about CO2 emissions per capita for different countries.

In [None]:
# Set URL of site from which I want to scrape the data. 
# Also, I find what html code separates each value I would like to isolate.
web_url = 'https://en.wikipedia.org/wiki/List_of_countries_by_carbon_dioxide_emissions_per_capita'
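response = requests.get(web_url)
response.raise_for_status()  # stop early if the page could not be downloaded
soup_object = BeautifulSoup(response.content, 'html.parser')  # name the parser explicitly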
data_table = soup_object.find_all('table', 'wikitable sortable')[0]
all_values = data_table.find_all('tr')

In [None]:
# Create an empty dataframe and a for loop which fills it with the scraped data I need.
co2_per_capita_df = pd.DataFrame(columns = ['country', '2015 per capita co2 emissions (tonnes)'])
ix = 0 # Initialise index to zero

# This for loop extracts the values for each column in my dataframe from the variable that contains the html
# code of the Wikipedia page. Then, it puts the extracted values into the dataframe.
for row in all_values[1:]:
    values = row.find_all('td') 
    if len(values) < 2:
        continue  # skip header or separator rows that have no data cells
    country = values[0].text
    emissions_2015 = values[-2].text
    
    co2_per_capita_df.loc[ix] = [country, emissions_2015]
    ix += 1

In [None]:
# Clean dataframe with scraped data
co2_per_capita_df['2015 per capita co2 emissions (tonnes)'] = pd.to_numeric(co2_per_capita_df[
    '2015 per capita co2 emissions (tonnes)'], errors='coerce')
co2_per_capita_df = co2_per_capita_df.sort_values(by=['2015 per capita co2 emissions (tonnes)'], ascending = False)
co2_per_capita_df = co2_per_capita_df.dropna()
co2_per_capita_df = co2_per_capita_df.rename(columns={'country':'Country'})
co2_per_capita_df = co2_per_capita_df.reset_index()
#Keep only the 60 countries with the highest per capita emissions
co2_per_capita_df.drop(co2_per_capita_df.index[60:], axis=0, inplace=True)

In [None]:
# Create barplot that sorts countries according to their per capita co2 emissions.

sns.set(font_scale=1)
fig, ax = plt.subplots(figsize=(6, 15))

sns.barplot(x=co2_per_capita_df['2015 per capita co2 emissions (tonnes)'], y=co2_per_capita_df['Country'], 
            color='orange', ax=ax, data=co2_per_capita_df, orient='h')

ax.set_facecolor((0.96, 0.96, 0.96))
fig.set_facecolor((0.9, 0.9, 0.9))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_title("2015 Co2 Emissions per Capita by Country",fontsize='20')
ax.set_xlabel('2015 per capita co2 emissions (tonnes)', fontsize='15')
ax.set_ylabel('Country', fontsize='17')

This barplot sorts countries according to their per capita CO2 emissions. It is helpful to this discussion because the countries at the top are not the countries people would normally expect. This shows that per capita emissions are not a very good indicator of total emissions and tell us more about the size of a country's population than about its emissions.
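
The gap between per capita and total emissions comes down to simple arithmetic: total emissions equal per capita emissions multiplied by population. The sketch below uses two hypothetical countries with made-up figures to show how the rankings can differ.

```python
# Hypothetical countries (all figures are made up for illustration).
countries = {
    "Small state": {"per_capita_tonnes": 30.0, "population": 2_000_000},
    "Large state": {"per_capita_tonnes": 6.0, "population": 1_000_000_000},
}

# Total emissions = per capita emissions x population.
totals = {
    name: figures["per_capita_tonnes"] * figures["population"]
    for name, figures in countries.items()
}

highest_per_capita = max(countries, key=lambda name: countries[name]["per_capita_tonnes"])
highest_total = max(totals, key=totals.get)
print(highest_per_capita, highest_total)
```

In this example the small state ranks first per capita while the large state emits one hundred times more in total.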

In [None]:
#Read and clean new dataset that contains Land Temperatures by country
cntry_land_temp = pd.read_csv('~/Desktop/School/UofT/Third Year/ECO225/ECO225 Project 1/GlobalLandTemperaturesByCountry.csv')

cntry_df = pd.DataFrame(cntry_land_temp)
cntry_df = cntry_df.drop(['AverageTemperatureUncertainty'], axis = 1)
cntry_df.rename(columns={'AverageTemperature': 'Country Average Temperature (˚C)'}, inplace=True)

In [None]:
# Create a new dataframe that merges scraped data with previous data on each country's average land temperature. 
# I also clean the merged dataframes
cntry_2010 = cntry_df[cntry_df['dt'] == '2010-07-01']
cntry_2000 = cntry_df[cntry_df['dt'] == '2000-07-01']
cntry_diff = cntry_2010.merge(cntry_2000, on='Country', how='outer')

cntry_diff = cntry_diff.rename(columns={'Country Average Temperature (˚C)_x': 'Avg Temp 2010 (˚C)', 
                   'Country Average Temperature (˚C)_y': 'Avg Temp 2000 (˚C)'})
cntry_diff['Temp diff (˚C)'] = cntry_diff['Avg Temp 2010 (˚C)'] - cntry_diff['Avg Temp 2000 (˚C)']
cntry_diff['Country'] = cntry_diff['Country'].str.strip()
co2_per_capita_df['Country'] = co2_per_capita_df['Country'].str.strip()
cntry_co2 = pd.merge(cntry_diff, co2_per_capita_df, on='Country', how='outer')
cntry_co2 = cntry_co2.dropna()


In [None]:
#Create scatter plot with the change in temperature as the y variable and per capita CO2 emissions as the x variable.
fig, ax = plt.subplots(figsize=(10, 7))
cntry_co2.plot(
    kind = 'scatter',x='2015 per capita co2 emissions (tonnes)', y='Temp diff (˚C)', color='blue',
    legend = False, ax=ax,
)

ax.set_facecolor((0.96, 0.96, 0.96))
fig.set_facecolor((0.9, 0.9, 0.9))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.set_title("Change in temp vs per capita CO2 emissions")

This graph plots the 2015 per capita CO2 emissions against each country's change in land temperature between 2000 and 2010. It is clear that there are many outliers in this graph and that per capita measures of pollution are not good indicators of the warming of each country or of its overall contribution to global warming.

In [None]:
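#Read file with world map information and add geometric information to dataframe with land temperature
#according to country.
#Note: gpd.datasets was removed in geopandas 1.0; with newer versions, the Natural Earth file has to be
#downloaded separately and its local path passed to gpd.read_file instead.
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))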
world = world.set_index("iso_a3")
world.loc['USA' ,'name'] = 'United States'
world.loc['COD', 'name'] ='Congo (Democratic Republic Of The)'

In [None]:
#Plot world maps of countries with their color corresponding to their land temperature in July of 1900, 2000, and 2010

for year in (1900, 2000, 2010):
    temp_year = cntry_df[cntry_df['dt'] == f'{year}-07-01']
    world_year = world.merge(temp_year, left_on="name", right_on="Country", how="left")
    fig, gax = plt.subplots(figsize=(50,5))

    #Plotting the countries with colors according to land temperatures
    world_year.plot(
        ax=gax, edgecolor='black', column='Country Average Temperature (˚C)', legend=True, cmap='RdBu_r',
        vmin=-3, vmax=45 #range of the column values for the color legend
    )

    # Format axes and title
    gax.set_xlabel('longitude')
    gax.set_ylabel('latitude')
    gax.set_title(f'World Land Temperatures in {year} in (˚C)')
    gax.annotate('Land Temperature in (˚C)', xy=(0.77, 0.06), xycoords='figure fraction')

    # Removing spines
    gax.spines['top'].set_visible(False)
    gax.spines['right'].set_visible(False)

    plt.show()


These three maps are colour coded according to each country's average land temperature in July 1900, 2000, and 2010 respectively. The aim of these maps is to demonstrate how each country's average temperature changed over time and, although the differences in the shades of each country are slight, the fact that there are any differences at all is significant.

## Statistical Analysis

In this section, I will be running separate regressions to determine the relationships between some of the variables I've introduced and how these relationships should be interpreted. Throughout this report, I have been commenting on the relationships shown in graphs and reaching some assumptions based on the visual representations. Now, I will be subjecting these assumptions to some statistical scrutiny and determining how said relationships should be interpreted and their relevance.

In all of the regression analyses that follow, some variation of land temperature, either the global or the city temperature, represents the outcome variable. The independent variables consist of geographic location, CO2 emissions, and CO2 emissions per capita.

However, the final regression involves the land temperature as the independent variable and the land and ocean temperature as the outcome variable. The importance of each regression and its contribution to my report will be explained under each respective section.

### Regression #1: CO2 Emissions versus Global Average Land Temperatures

The relationship between CO2 emissions and the global temperature around the world has been assumed to be a causal one throughout most discussions in this report. Although it is widely accepted that CO2 emissions are a direct cause for global warming and the rising land temperatures around the world, it would be valuable to assess the strength of this relationship as shown by this historical data used in this report.

This relationship is visualized in the scatter plot titled, "Annual avg land temp vs. CO2 emissions" and it seems like there is a linear relationship between the two. However, this regression is going to be more rigorous than a simple scatter plot and will include a key control variable: population. I will be including data about the world population as a control for the rising land temperatures. 

In [None]:
#Organizing the dataframe a little to prepare them for merging
temp_co2['year'] = pd.to_datetime(temp_co2['Year'], errors='coerce').dt.year
temp_co2 = temp_co2.drop('Year', axis=1)

#Merging the df with the global temperature and Co2 emissions with the population df.
temp_co2_pop = temp_co2.merge(world_pop, left_on='year', right_on='Year', how='inner')

In [None]:
#Cleaning the new dataframe to prepare it for a regression analysis. This involves dropping missing values
#and changing some of the units of the values in order to get results from the regression that are coherent.
temp_co2_pop = temp_co2_pop.dropna()
temp_co2_pop['Annual CO₂ emissions (tonnes )'] = temp_co2_pop['Annual CO₂ emissions (tonnes )']/10000000000
#Dividing by one billion so that the values match the "Population (billions)" label
temp_co2_pop['Population'] = temp_co2_pop['Population']/1000000000
temp_co2_pop.rename(columns={'Annual CO₂ emissions (tonnes )': 'Annual CO₂ emissions (ten billions tonnes)',
                            'Population': 'Population (billions)'}, inplace=True)

In [None]:
#Add a column for the constant and create regressions one with control variables and one without
temp_co2_pop['cons'] = 1

OLS1_X1 = ['cons', 'Annual CO₂ emissions (ten billions tonnes)']
OLS1_X2 = ['cons', 'Annual CO₂ emissions (ten billions tonnes)', 'Population (billions)']

OLS1_reg1 = sm.OLS(temp_co2_pop['Land Average Temperature (˚C)'], temp_co2_pop[OLS1_X1], missing='drop').fit()
OLS1_reg2 = sm.OLS(temp_co2_pop['Land Average Temperature (˚C)'], temp_co2_pop[OLS1_X2], missing='drop').fit()

In [None]:
#Specify the statistical analysis I would like to do on the fitted models and which statistical output to include.
info_dict={'R-squared' : lambda x: f"{x.rsquared:.2f}", 
           'No. observations' : lambda x: f"{int(x.nobs):d}"}

#Create a table with summarized regression results that includes the two fitted models above and the relevant
#statistical output.
results_table = summary_col(results=[OLS1_reg1,OLS1_reg2],
                            float_format='%0.2f',
                            stars = True,
                            model_names=['Model 1',
                                         'Model 2'],
                            info_dict=info_dict,
                            regressor_order=['cons',
                                             'Annual CO₂ emissions (ten billions tonnes)',
                                             'Land Average Temperature (˚C)',
                                             'Population (billions)'])
print(OLS1_reg1.summary())

From these regression results, the relationship between annual CO2 emissions, in ten billion tonnes, and the global average land temperature, in degrees Celsius, is statistically significant with a p-value of almost 0. The reported adjusted r-squared indicates that 42.3% of the variance in the land temperature is explained by variance in the annual CO2 emissions. An r-squared of this magnitude usually means that the relationship is moderate. The following table reports the same relationship while controlling for population and is the stronger regression of the two.
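
For readers less familiar with these statistics, the sketch below shows how r-squared and adjusted r-squared are computed. The observed and predicted values are made up, and the function names are illustrative; the adjustment formula assumes a model with a constant and `n_regressors` other explanatory variables.

```python
def r_squared(actual, predicted):
    """1 minus the ratio of residual variation to total variation."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_regressors):
    """Penalise r-squared for the number of regressors (constant excluded)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


# Made-up observed values and model predictions.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 2.5, 4.0, 5.0]

r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, n_obs=len(actual), n_regressors=1)
print(round(r2, 3), round(adj_r2, 3))
```

The adjusted value is always at most the unadjusted one, and the gap shrinks as the number of observations grows relative to the number of regressors.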

In [None]:
#Print regression summary with both regression specified in previous cell.
print(results_table)

From the above summary, the relationship that has been assumed throughout the report somewhat falls apart. An additional ten billion tonnes of CO2 emissions per year is associated with an almost 1 degree Celsius decrease in the annual average temperature. This relationship is statistically significant. What is also worrying is that the control variable also has a statistically significant relationship with the outcome variable. Although the p-values for these relationships are not reported in this summary, these relationships are all significant at the 1% significance level (99% confidence).

These models are important in the general scheme of the report because they suggest that the relationship that should have been further investigated is that between population growth and land temperature. Additionally, it may have been useful to include more controls in this regression. These models are also important because, although CO2 emissions are known to significantly contribute to general global warming, humans must be wary about other patterns of living such as high reproduction rates. 
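
One possible explanation for a coefficient changing sign once a control is added is that the two explanatory variables move together very closely over time. The sketch below is an illustration on made-up series, not the data in temp_co2_pop: it computes the correlation between two regressors and the corresponding variance inflation factor (VIF), which for two regressors equals 1 / (1 - correlation squared).

```python
import numpy as np

years = np.arange(50)

# Made-up series: population grows steadily and emissions track it closely.
population = 2.0 + 0.1 * years
emissions = 0.5 * population + 0.05 * np.sin(years)

# Correlation between the two regressors and the implied VIF.
corr = np.corrcoef(population, emissions)[0, 1]
vif = 1.0 / (1.0 - corr ** 2)
print(round(corr, 3), round(vif, 1))
```

A VIF well above 10 is commonly taken as a sign that the individual coefficients are hard to separate, so running the same check on the actual emissions and population columns would help in interpreting Model 2.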

### Regression #2: Distance from the equator versus variation in temperature from global average

The second relationship I would like to assess is how being further away from the equator affects the difference between a city's temperature and the global average. Although this relationship may not seem intuitive at first, it helps answer the question of whether different places are warming at different rates. A large variation means that a specific geographic region is much warmer than the global average. If this persists over time, then a specific region may be contributing too heavily to global warming. This regression will be a simple bivariate regression and will help clarify the relationship assumed in the graph titled "Diff in city temp and global temp vs dist from equator".

In [None]:
#Create column for constant
glob_city_2010['cons'] = 1

#Specify which type of regression to create and which statistical analysis to include in regression summary.
#A linear regression with which column to make up the endogenous and exogenous variables.
OLS2_reg1 = sm.OLS(endog=glob_city_2010['temp difference (˚C)'], 
                   exog=glob_city_2010[['cons','dist from equator (km)']],
                  missing='drop')

OLS2_results = OLS2_reg1.fit()
print(OLS2_results.summary())

The second relationship analyzed is that between distance from the equator, interpreted as geographic location and an important indicator for climate, and the difference between a city's land temperature and the global average land temperature. The above regression results report that every 1000 km that a city is further away from the equator is associated with a 7.1 degree Celsius difference between the city's average land temperature and the global average land temperature. This relationship is also statistically significant, as is shown by its p-value which is almost zero. Also, the adjusted r-squared of 0.747 reflects a strong relationship and is interpreted as meaning that 74.7% of the variance in the temperature difference is explained by variance in the distance from the equator.

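Additionally, this regression may be more accurate and produce a higher r-squared by controlling for some variables. The possible multicollinearity reported in note 2 under the regression summary is based on a large condition number; with a single regressor, this likely reflects the scale of the distance variable (measured in km) rather than correlation between regressors, so rescaling the distance may resolve it.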
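
The effect of the units of the distance variable can be illustrated with NumPy. The sketch below uses made-up distances and an outcome constructed to match the reported 7.1 degrees per 1000 km; it is not the regression on glob_city_2010, and the helper name `fit` is illustrative.

```python
import numpy as np

dist_km = np.linspace(500, 6500, 20)   # made-up distances in km
temp_gap = 0.0071 * dist_km            # made-up outcome: 7.1 degrees per 1000 km


def fit(x, y):
    """Least-squares fit of y on a constant and x; returns slope and condition number."""
    design = np.column_stack([np.ones_like(x), x])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[1], np.linalg.cond(design)


slope_km, cond_km = fit(dist_km, temp_gap)
slope_1000km, cond_1000km = fit(dist_km / 1000, temp_gap)
print(slope_km, slope_1000km, round(cond_km), round(cond_1000km))
```

Measuring distance in thousands of km leaves the fit unchanged (the slope is simply multiplied by 1000) while bringing the condition number of the design matrix down substantially.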

This regression is important for the general report because it helps answer the question of whether or not different parts of the world are warming at different rates. Through the regression results, it may be inferred that cities further away from the equator have largely varying temperatures compared to the global average. This suggests that warmer cities closer to the equator warm at a rate closer to the global average, an alarming indicator for their environmental future considering the astonishing rate at which the globe is warming.

### Regression #3: CO2 Emissions per Capita versus Land Temperature

The third part I would like to analyze is whether or not CO2 emissions per capita are a good indicator of global warming. After looking at the bar graph above that plots each country's CO2 emissions per capita, it seemed somewhat clear that this measure is more reflective of a country's population since the major industrial powerhouses were not among the top countries. To have a more concrete answer, I will be running a regression that has each country's 2015 CO2 emissions per capita plotted against the change in temperature between 2000 and 2010.

In [None]:
#Create column for constant
cntry_co2['cons'] = 1

#Specify which type of regression to create and which statistical analysis to include in regression summary.
#A linear regression with which column to make up the endogenous and exogenous variables.
OLS3_reg1 = sm.OLS(endog=cntry_co2['Temp diff (˚C)'],
                  exog=cntry_co2[['cons', '2015 per capita co2 emissions (tonnes)']],
                  missing='drop')



OLS3_results = OLS3_reg1.fit()
print(OLS3_results.summary())

This third regression produced the most expected results of all four. As was observed in the visualization relating to this relationship, there is no apparent relationship between CO2 emissions per capita and the change in a country's land temperature between 2000 and 2010. These results show that a 1 tonne decrease in a country's 2015 per capita CO2 emissions is associated with a 0.02 ˚C decrease in that temperature change. This relationship is not statistically significant and the adjusted r-squared of almost 0 translates into essentially no relationship. 

To reiterate, CO2 emissions per capita is not a reflective measure of a country's contribution to global warming. This adds to the report by explaining that countries that use this indicator to demonstrate their low pollution levels are often hiding behind their small populations. Also, this measure does not help answer the question of whether or not countries with high per capita CO2 emissions are warming at a higher rate than others.

### Regression #4: Average Global Land Temperature versus Average Global Land and Ocean Temperature

The final question I would like to assess is whether oceans around the world are warming faster than land. To do this, I created a visualization that plots the land temperature against the combined land and ocean temperature. In my visualization titled "Land Temp vs. Land and Ocean Temp", the relationship appears perfectly linear. If the relevant statistical analysis confirms this, then it can be inferred that land temperatures predict the combined land and ocean temperature very well, which would suggest that there is no difference in the rates of warming.

In [None]:
#Drop rows with missing values
land_ocean_df = land_ocean_df.dropna()

#Add a column of ones so the regression includes an intercept (constant) term
land_ocean_df['cons'] = 1

#Set up an OLS (linear) regression: the combined land and ocean temperature is the dependent (endogenous) variable,
#and the constant plus the average land temperature are the explanatory (exogenous) variables.
OLS4_reg1 = sm.OLS(endog=land_ocean_df['Land and Ocean Avg Temp (˚C)'],
                   exog=land_ocean_df[['cons', 'Land Average Temperature (˚C)']],
                   missing='drop')

#Fit the model and print the regression summary
OLS4_results = OLS4_reg1.fit()
print(OLS4_results.summary())

Similarly to the previous regression, the results of this regression are rather expected. After looking at the visualization related to this regression, the expected result was that the land temperature would predict the combined land and ocean temperature very well. Looking at the coefficients, a 1 ˚C increase in the average land temperature is associated with a 0.3 ˚C increase in the land and ocean temperature. This relationship is statistically significant, as demonstrated by a p-value that is almost zero. Also, the very high adjusted R-squared of 0.976 shows that 97.6% of the variation in the land and ocean temperature is explained by variation in the land temperature. The AIC and BIC are also low, taking large negative values.
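
As a rough sketch of why AIC and BIC can be large negative numbers, the example below computes both from a set of residuals, assuming the standard Gaussian log-likelihood formulas for a least-squares fit. The `gaussian_aic_bic` helper and the residuals are hypothetical and only meant to show the mechanics, not to reproduce the values in the summary.

```python
import math

def gaussian_aic_bic(residuals, n_params):
    """AIC and BIC for a least-squares fit, using the Gaussian log-likelihood."""
    n = len(residuals)
    ssr = sum(r ** 2 for r in residuals)
    # Maximised Gaussian log-likelihood with the error variance estimated as ssr / n.
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(ssr / n) + 1)
    aic = 2 * n_params - 2 * log_lik
    bic = n_params * math.log(n) - 2 * log_lik
    return log_lik, aic, bic

# Made-up residuals: a tight fit (small errors) versus a loose fit (large errors).
tight = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01, 0.0, -0.01]
loose = [1.0, -2.0, 1.5, -0.5, 2.0, -1.0, 0.0, -1.0]

for name, res in [("tight", tight), ("loose", loose)]:
    # n_params=2 counts the constant and one slope.
    log_lik, aic, bic = gaussian_aic_bic(res, n_params=2)
    print(f"{name}: logL={log_lik:.2f}, AIC={aic:.2f}, BIC={bic:.2f}")
```

Very small residuals give a large positive log-likelihood, which pushes both criteria below zero. The values are mainly useful for comparing alternative models fitted to the same data, where lower is better.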

This regression is important with respect to the general report since it suggests that the land on Earth is warming in step with the oceans. This could mean that the pollution that leads to higher land temperatures has a similar effect on the oceans. It shows that our destructive habits are negatively influencing not only land masses but also oceans, a part of the globe whose changes in temperature may not be immediately obvious.

## Analysis

Thus far, the different visual representations have been helpful in answering separate parts of the research question. This portion of the project aims to consolidate all the data collected and all the visual representations presented to provide a clear answer to the question.

Firstly, it is clear that data about the average land temperature throughout history does in fact support claims about global warming. In the histogram titled "Pct Change in Global Land Temp since 1750", we can observe that although there appear to be more instances where the global average temperature decreases, the skew in the plot shows that the extreme changes in temperature are more often in the positive direction than the negative. Also, the maps presented show a slight shift in the shades of the countries towards darker hues, and although the differences are subtle, they have substantial implications with regard to the future of our planet.
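
The point about the histogram can be illustrated with a small sketch: a series can have more decreases than increases and still be positively skewed if the increases are larger. The temperatures below are made up, and the `pct_changes` and `skewness` helpers are hypothetical names introduced only for this example.

```python
def pct_changes(values):
    """Percentage change from each value to the next."""
    return [(b - a) / a * 100 for a, b in zip(values, values[1:])]

def skewness(data):
    """Population skewness: the third standardised moment."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((d - mean) ** 2 for d in data) / n
    m3 = sum((d - mean) ** 3 for d in data) / n
    return m3 / m2 ** 1.5

# Made-up yearly temperatures (˚C): several small dips and a few large jumps.
temps = [8.0, 7.9, 7.8, 8.4, 8.3, 8.2, 8.1, 8.8, 8.7, 8.6]
changes = pct_changes(temps)

n_down = sum(c < 0 for c in changes)
n_up = sum(c > 0 for c in changes)
print(f"decreases={n_down}, increases={n_up}, skewness={skewness(changes):.2f}")
```

In this made-up series the decreases outnumber the increases, yet the skewness is positive and the temperature ends higher than it started, which is the same pattern the histogram suggests.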

Next, the relationship between increases in CO2 emissions and the temperature of the Earth is practically undeniable. The graph titled "Annual avg land temp vs. CO2 emissions" shows the positive relationship between the increasing average global land temperature and the increasing level of CO2 emissions. However, the scraped data, which shows the 2015 per capita CO2 emissions for each country, shows us that the per capita measure of emissions is not a very good indicator of a country's contribution to global warming. After finding the change in temperature for each country between 2000 and 2010 and then plotting that difference against the per capita emissions in 2015, there is no clear relationship. The visual representation corresponding to this claim is titled "Change in temp vs per capita CO2 emissions". This is likely due to large, populous countries that are heavily industrialized having relatively low per capita emissions despite large changes in their land temperature.
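
The arithmetic behind this last point can be sketched as follows. The two countries and their figures are entirely hypothetical, as is the `total_emissions_mt` helper; the sketch only shows how a large population can combine a low per capita figure with a very large total.

```python
def total_emissions_mt(population_millions, per_capita_tonnes):
    """Total emissions in million tonnes: (million people) x (tonnes per person)."""
    return population_millions * per_capita_tonnes

# Hypothetical countries with made-up figures, for illustration only.
# Each entry is (population in millions, per capita emissions in tonnes).
countries = {
    "Country A (large population)": (1400.0, 7.0),
    "Country B (small population)": (3.0, 30.0),
}

for name, (population_millions, per_capita_tonnes) in countries.items():
    total = total_emissions_mt(population_millions, per_capita_tonnes)
    print(f"{name}: {per_capita_tonnes} t per person, {total:.0f} Mt in total")
```

Country A looks cleaner on a per capita basis but emits far more in total, so ranking countries by per capita emissions alone can hide the largest contributors.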

Lastly, the final part of the question asks whether some parts of the world are warming faster than others. The data demonstrates that the oceans are not warming at a rate that is faster than land. The graph titled "Land temp vs. land and ocean temp" shows that there is an extremely strong linear relationship between the temperature of the land and the combined temperature of the land and oceans, which signals that the two measures grow linearly together. As for cities further away from the equator, the graph "Diff in city and global temp vs dist from equator" shows that their temperatures deviate more from the global average, with the relationship appearing roughly quadratic. That is, as cities get further from the equator, the difference between the local and global temperature grows faster than the distance does.

## Conclusion

To conclude, the question I am trying to answer is whether data about the temperature of the Earth supports the claim that there is global warming. Part of my question is also whether CO2 emissions are associated with an increasing global land temperature. My findings, which are demonstrated in the visualizations, support the claim that the Earth is steadily warming. The strongest evidence in favor of this hypothesis is the positive relationship between CO2 emissions and the temperature of the Earth. However, there is no evidence that distance from the equator or per capita CO2 emissions are strong indicators of global warming. Despite this, there is conclusive evidence to assert that global warming is in fact a reality that we are facing and that CO2 emissions are strongly related to this reality.


My maps also support this hypothesis. Although the differences in the shades of each country are slight, the fact that there is a difference at all is significant. It is widely accepted that even slight changes in the temperature of the Earth have catastrophic effects on the environment and warrant an overhaul in the way we live. Global warming is a serious problem and this data supports its existence. It is difficult to doubt the effect that pollution has on our environment after considering these visualizations.