## Presentation 1
#### Spring 2020
### Introduction
Climate change is one of the apparent effects of the rapid economic growth that has occurred in most countries
since the Industrial Revolution of the late 18th century. It is an important issue for economic
policymaking, since governments need to assess how serious the problem is and then decide how to
mitigate it. 

Suppose you are on a policy advisory team for a nation. The government would like to know more about the
extent of climate change and its possible causes, before they make decisions on how to mitigate its
probable economic and social impacts, and whether or not to actively participate in internationally
coordinated collective efforts to reduce global impacts.
I would like you to select a (real!) grateful nation to receive your policy advice. Please determine the
capital city of this country, and then find its geographic location (latitude and longitude).
 You are generally being asked to examine the following questions from the perspective of your specific
country:

1. How can we tell whether climate change will actually affect our nation?
2. If it is real, how can we measure the extent of climate change and determine what is causing it?

To answer the first question, you will analyze the behavior of environmental variables over time to see
whether there are general patterns in environmental conditions that could be indicative of climate
change. You will focus on temperature-related variables.
To answer the second question, you will examine the degree of association between temperature and
another variable, CO2 emissions, and consider whether there is a plausible relationship between the
two, or whether there are other explanations for what you observe.
You will do your analysis, plots, and write-up, then submit your results in the form of a Jupyter
notebook, making use of Python, Pandas, Matplotlib, as well as Numpy, statsmodels, or any other
Python ecosystem packages you think useful. 


In [2]:
# the usual suspects...
import numpy as np                  # pandas uses numpy, and sometimes want to use numpy within pandas
import pandas as pd                    # data analysis package (dataframes)
import matplotlib.pyplot as plt               # graphics package
import seaborn as sns               # makes matplotlib prettier without issuing a single command!
import datetime as dt                  # date and time module, often need to use
import sys 

# check versions (overkill, but why not?)
print('Python version:', sys.version)
print('Pandas version: ', pd.__version__)

print(plt.style.available)
plt.style.use('fivethirtyeight')
%matplotlib inline 

Python version: 3.7.4 (default, Aug 13 2019, 15:17:50) 
[Clang 4.0.1 (tags/RELEASE_401/final)]
Pandas version:  0.25.1
['seaborn-dark', 'seaborn-darkgrid', 'seaborn-ticks', 'fivethirtyeight', 'seaborn-whitegrid', 'classic', '_classic_test', 'fast', 'seaborn-talk', 'seaborn-dark-palette', 'seaborn-bright', 'seaborn-pastel', 'grayscale', 'seaborn-notebook', 'ggplot', 'seaborn-colorblind', 'seaborn-muted', 'seaborn', 'Solarize_Light2', 'seaborn-paper', 'bmh', 'tableau-colorblind10', 'seaborn-white', 'dark_background', 'seaborn-poster', 'seaborn-deep']


### Part 1
In the questions below, you will analyze data from NASA about land-ocean temperature anomalies in
the northern hemisphere from 1880 to 2016. Figure 1 is constructed using this data and shows
temperatures in three latitude bands over the period 1880–2016, expressed as differences from
the average temperature from 1951 to 1980. We start by creating a plot for the latitude band your country is situated in, similar to Figure 1 (but without the smoothed trend line), in order to visualize the data and spot patterns more easily.
### Figure 1
![Image](https://data.giss.nasa.gov/gistemp/graphs_v4/graph_data/Temperature_Change_for_Three_Latitude_Bands/graph.png)

Make sure you understand how temperature is measured. Go to NASA’s Goddard Institute for Space Studies website: https://data.giss.nasa.gov/gistemp/

Under the subheading ‘Combined Land-Surface Air and Sea-Surface Water Temperature Anomalies (Land-Ocean Temperature Index, LOTI)’, select the CSV version of ‘Zonal annual means, 1880-present, updated through most recent complete year’.

The default name of this file is ZonAnn.Ts+dSST.csv. The code below reads it directly from the NASA URL into a Pandas dataframe named `df_nasa`.

In this dataset, temperature is measured as ‘anomalies’ rather than absolute temperature.
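To make the idea of an anomaly concrete, here is a minimal sketch that converts a series of absolute temperatures into anomalies relative to a 1951–1980 baseline. The temperature values are synthetic and the names `abs_temp`, `baseline`, and `anomaly` are made up for illustration; NASA's actual procedure is more involved than a single subtraction.

```python
import numpy as np
import pandas as pd

# Synthetic absolute temperatures (deg C), for illustration only; NOT NASA data.
years = np.arange(1880, 2021)
rng = np.random.default_rng(0)
abs_temp = pd.Series(
    14.0 + 0.008 * (years - 1880) + rng.normal(0.0, 0.1, size=years.size),
    index=years,
    name="abs_temp_c",
)

# Baseline: mean over the 1951-1980 reference period
# (.loc slices by label, and both endpoints are included).
baseline = abs_temp.loc[1951:1980].mean()

# Anomaly: how far each year's temperature lies above or below the baseline.
anomaly = abs_temp - baseline

print(anomaly.tail())
```

By construction, the anomalies average to zero over the reference period, which is why the zero line on an anomaly chart corresponds to the 1951–1980 average.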

In [106]:
url='https://data.giss.nasa.gov/gistemp/tabledata_v4/ZonAnn.Ts+dSST.csv'
df_nasa=pd.read_csv(url)
df_nasa

Unnamed: 0,Year,Glob,NHem,SHem,24N-90N,24S-24N,90S-24S,64N-90N,44N-64N,24N-44N,EQU-24N,24S-EQU,44S-24S,64S-44S,90S-64S
0,1880,-0.16,-0.27,-0.04,-0.35,-0.12,-0.02,-0.83,-0.43,-0.26,-0.15,-0.09,-0.04,0.05,0.64
1,1881,-0.08,-0.16,0.00,-0.33,0.11,-0.07,-0.92,-0.41,-0.19,0.10,0.11,-0.06,-0.07,0.56
2,1882,-0.10,-0.20,-0.01,-0.29,-0.04,0.01,-1.42,-0.23,-0.13,-0.05,-0.04,0.01,0.03,0.59
3,1883,-0.17,-0.27,-0.06,-0.33,-0.16,-0.01,-0.20,-0.53,-0.25,-0.17,-0.15,-0.04,0.07,0.47
4,1884,-0.28,-0.42,-0.15,-0.60,-0.14,-0.14,-1.32,-0.63,-0.45,-0.13,-0.16,-0.19,-0.02,0.62
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
135,2015,0.90,1.18,0.63,1.31,0.96,0.40,1.99,1.47,1.00,0.98,0.94,0.75,0.18,-0.35
136,2016,1.02,1.31,0.73,1.55,1.01,0.49,3.24,1.43,1.08,0.96,1.07,0.67,0.25,0.36
137,2017,0.92,1.18,0.67,1.39,0.82,0.59,2.51,1.37,1.05,0.87,0.77,0.76,0.35,0.53
138,2018,0.85,1.04,0.66,1.25,0.68,0.68,2.15,1.09,1.06,0.73,0.63,0.80,0.37,0.93


##### Q 1.1. 
Using the source as a reference, explain in your own words what temperature ‘anomalies’
means. Why have researchers chosen this particular measure over other measures (such as absolute
temperature)?



Now create some line plots using this annual data, which help us look for general patterns over time. 
First, select the latitude band corresponding to the capital city for the country you chose. Viewing the
CSV file as a spreadsheet, the columns labelled E-O contain the average temperature anomaly for each
year, by latitude band.
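One way to pick the column is to map the capital's latitude onto the eight narrow bands in the file. The sketch below is only an illustration: `band_for_latitude` is a hypothetical helper, and assigning a latitude that falls exactly on a shared edge to the more northerly band is an arbitrary assumption.

```python
# Band edges in degrees latitude (south is negative), paired with the
# column names that appear in ZonAnn.Ts+dSST.csv.
BANDS = [
    (64.0, 90.0, "64N-90N"),
    (44.0, 64.0, "44N-64N"),
    (24.0, 44.0, "24N-44N"),
    (0.0, 24.0, "EQU-24N"),
    (-24.0, 0.0, "24S-EQU"),
    (-44.0, -24.0, "44S-24S"),
    (-64.0, -44.0, "64S-44S"),
    (-90.0, -64.0, "90S-64S"),
]


def band_for_latitude(lat):
    """Return the ZonAnn column name for a latitude in decimal degrees."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    # Bands are listed north to south, so a latitude on a shared edge
    # is assigned to the more northerly band (an arbitrary choice).
    for south, north, name in BANDS:
        if south <= lat <= north:
            return name


print(band_for_latitude(48.86))   # a capital at about 48.86 degrees north
print(band_for_latitude(-35.28))  # a capital at about 35.28 degrees south
```

The returned string can then be used to select the series, for example `df_nasa[band_for_latitude(48.86)]`.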

##### Q 1.2. 
Plot a line chart with annual average temperature anomaly on the vertical axis and time
(1880–2020) on the horizontal axis. Your chart should look like the above figure (minus the smoothed trend line). Create a horizontal line that intersects the vertical axis at 0, and label it ‘1951–1980 average’.
What does your chart suggest about the relationship between temperature and time? 
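As a starting point, the following sketch shows the mechanics of such a chart, including the labelled horizontal reference line. It uses only the handful of rows for the 44N-64N band that are printed in the table above, so the line jumps straight from 1884 to 2015; with the full `df_nasa` dataframe you would plot every year. The choice of band and the name `sample` are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # only needed when run as a script; notebooks use %matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

# Rows copied from the printed table (44N-64N column); not the full dataset.
sample = pd.DataFrame({
    "Year": [1880, 1881, 1882, 1883, 1884, 2015, 2016, 2017, 2018],
    "44N-64N": [-0.43, -0.41, -0.23, -0.53, -0.63, 1.47, 1.43, 1.37, 1.09],
})

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(sample["Year"], sample["44N-64N"], label="44N-64N anomaly")

# Horizontal reference line at zero, i.e. the baseline-period average.
ax.axhline(0, color="black", linewidth=1, label="1951–1980 average")

ax.set_xlabel("Year")
ax.set_ylabel("Temperature anomaly (°C)")
ax.set_title("Annual temperature anomaly, 44N-64N")
ax.legend()
```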

### Part 2
This exercise uses two data sets: [Scripps CO2 data](https://scrippsco2.ucsd.edu/data/atmospheric_co2/icecore_merged_products.html) and [ETH Zurich data](ftp://data.iac.ethz.ch/CMIP6/input4MIPs/UoM/GHGConc/CMIP/yr/atmos/UoM-CMIP-1-1-0/GHGConc/gr3-GMNHSH/v20160701/). The latter data set is maintained by the Institute for Atmospheric and Climate Science (IAC) at Eidgenössische Technische Hochschule (ETH) in Zürich, Switzerland. Please load both datasets from the above links into this Jupyter notebook.



In [107]:
#  location variables to point to URLs of these files 
loc1='https://scrippsco2.ucsd.edu/assets/data/atmospheric/merged_ice_core_mlo_spo/merged_ice_core_yearly.csv'
loc2='ftp://data.iac.ethz.ch/CMIP6/input4MIPs/UoM/GHGConc/CMIP/yr/atmos/UoM-CMIP-1-1-0/GHGConc/gr3-GMNHSH/v20160701/mole_fraction_of_carbon_dioxide_in_air_input4MIPs_GHGConcentrations_CMIP_UoM-CMIP-1-1-0_gr3-GMNHSH_0000-2014.csv'

In [108]:
malo_data=pd.read_csv(loc1,header=27)
malo_data.columns=['co2_ppm']
malo_data.index.name='yr'
malo_data['year']=malo_data.index.astype(int)

eth_data=pd.read_csv(loc2)

In [109]:
malo_data.tail(5)

yr,co2_ppm,year
2014.0,395.87,2014
2015.0,397.83,2015
2016.0,400.78,2016
2017.0,403.57,2017
2018.0,405.66,2018


In [110]:
malo_data.dtypes

co2_ppm    float64
year         int64
dtype: object

In [111]:
eth_data.tail(5)

Unnamed: 0,year,data_mean_global,data_mean_nh,data_mean_sh
2010,2010,388.717029,390.784658,386.649401
2011,2011,390.944015,393.041154,388.846876
2012,2012,393.015993,395.036206,390.995779
2013,2013,395.724979,397.714917,393.735042
2014,2014,397.546977,399.590917,395.503037


In [112]:
eth_data.dtypes

year                  int64
data_mean_global    float64
data_mean_nh        float64
data_mean_sh        float64
dtype: object



##### Question 2.1. 
As you have just seen, both the Scripps and the ETH data contain estimated annual mean atmospheric $CO_2$ levels going back to roughly 0 AD!

(The ancient estimates of $CO_2$ levels are based on analysis of air bubbles trapped in ice cores in the Arctic and Antarctic. More contemporary estimates of average atmospheric $CO_2$ levels come from instruments located in the northern hemisphere (data_mean_nh) and the southern hemisphere (data_mean_sh), together with a global average of these measurements (data_mean_global).)

The Scripps data is a single global average based on ice cores prior to 1958, and the mean of Mauna Loa (in the Northern Hemisphere) and the South Pole from 1958 on.

Please merge the ETH dataframe with the Scripps (Mauna Loa) dataframe, using year in both datasets as the match variable, then construct a plot containing the ETH global mean and the Scripps annual measurement over the years 1959-2019.

Repeat this plot, but with a log scale for $CO_2$ levels.
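Switching an existing Matplotlib axis to a log scale is a one-line change. The sketch below shows it using only the five ETH global means printed above (2010–2014); the variable names are placeholders, and your own plot would use the merged dataframe instead.

```python
import matplotlib
matplotlib.use("Agg")  # only needed when run as a script; notebooks use %matplotlib inline
import matplotlib.pyplot as plt

# Values copied from the eth_data.tail(5) output above.
years = [2010, 2011, 2012, 2013, 2014]
co2_global = [388.717029, 390.944015, 393.015993, 395.724979, 397.546977]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(years, co2_global, marker="o", label="ETH global mean")

# Log scale on the vertical axis: equal vertical distances now represent
# equal percentage changes rather than equal changes in ppm.
ax.set_yscale("log")

ax.set_xlabel("Year")
ax.set_ylabel("CO2 (ppm, log scale)")
ax.legend()
```

On a log scale, a series growing at a constant percentage rate appears as a straight line, which makes it easier to judge whether growth is speeding up.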


##### Question 2.2:   
Now, redo your plot for Question 2.1, but this time plot northern and southern hemisphere mean $CO_2$ levels, as well as the Scripps global measurement, over all available years for all three variables.

**Hint:** 
Instead of using the "inner join" that is the default for Pandas' merge method, use an "outer join" to merge together all data for all years in which you have an observation for any variable.
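The difference between the two join types is easiest to see on a small example. The sketch below rebuilds tiny dataframes from the last five rows of each dataset printed above, so the only overlapping year is 2014; the names `scripps`, `eth`, `inner`, and `outer` are placeholders.

```python
import pandas as pd

# Last five rows of each dataset, copied from the outputs above.
scripps = pd.DataFrame({
    "year": [2014, 2015, 2016, 2017, 2018],
    "co2_ppm": [395.87, 397.83, 400.78, 403.57, 405.66],
})
eth = pd.DataFrame({
    "year": [2010, 2011, 2012, 2013, 2014],
    "data_mean_nh": [390.784658, 393.041154, 395.036206, 397.714917, 399.590917],
    "data_mean_sh": [386.649401, 388.846876, 390.995779, 393.735042, 395.503037],
})

# Inner join (the default): keeps only years present in BOTH dataframes.
inner = pd.merge(eth, scripps, on="year")

# Outer join: keeps every year present in EITHER dataframe and fills the
# gaps with NaN. indicator=True adds a "_merge" column showing the source.
outer = (
    pd.merge(eth, scripps, on="year", how="outer", indicator=True)
    .sort_values("year")
    .reset_index(drop=True)
)

print(inner)
print(outer)
```

Matplotlib leaves gaps where values are NaN, so the outer-joined dataframe can be plotted directly over all years.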

##### Question 2.3
Calculate the correlation coefficients between the Scripps global, northern hemisphere, and southern hemisphere average $CO_2$ concentrations.

**Hint:** Use the Pandas `.corr()` method on your merged dataframe.

Is the correlation of the Scripps global average with the Northern Hemisphere greater or smaller than Scripps' correlation with the Southern Hemisphere?
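For reference, this is how `.corr()` behaves on a small dataframe. The sketch uses only the five ETH rows printed above, so it does not answer the question; it just shows the shape of the result you should expect from your merged dataframe.

```python
import pandas as pd

# Rows copied from the eth_data.tail(5) output above.
eth_tail = pd.DataFrame(
    {
        "data_mean_global": [388.717029, 390.944015, 393.015993, 395.724979, 397.546977],
        "data_mean_nh": [390.784658, 393.041154, 395.036206, 397.714917, 399.590917],
        "data_mean_sh": [386.649401, 388.846876, 390.995779, 393.735042, 395.503037],
    },
    index=pd.Index([2010, 2011, 2012, 2013, 2014], name="year"),
)

# Pairwise Pearson correlations between all numeric columns.
# The result is a symmetric matrix with ones on the diagonal.
corr_matrix = eth_tail.corr()
print(corr_matrix)

# A single pair can be read off by row and column label.
print(corr_matrix.loc["data_mean_nh", "data_mean_sh"])
```

When columns contain missing values, as they will after an outer join, `.corr()` computes each pairwise correlation using only the rows where both columns are present.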

### Part 3 
#### Carbon emissions and the environment
The government has heard that carbon emissions could be responsible for climate change. It has asked
your team to investigate whether this is the case. To do so, we are now going to look at carbon emissions over
time, and use another type of chart (scatter plots) to show their relationship with temperature
anomalies. One way to measure whether there is a relationship between two variables is a linear
regression. Another commonly used measure is the correlation coefficient. In a regression with a single
explanatory variable, the squared correlation coefficient equals the $R^2$ (explained variance share)
statistic calculated by linear regression software. $R^2$ measures how well the relationship between two variables is described by a straight line. For example, high values of one variable may tend to be observed along with high values of the other variable; this would be associated with a positive and statistically significant coefficient estimate on the explanatory variable in the
regression, and values close to 1 for $R^2$.

This coefficient estimate can be positive or negative. It is negative when high values of one variable are
associated with low values of the other. (Example: When the weather is hotter, purchases of ice cream
are higher. Temperature and ice cream sales have a positive association. On the other hand, if purchases
of hot beverages decrease when the weather is hotter, we say that temperature and hot beverage sales
have a negative association.) A statistically significant coefficient does not mean that there necessarily is
a causal relationship between the variables. The `statsmodels` package can run a linear regression
between dependent and explanatory variables in a Pandas dataframe, as we have previously seen. 
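The link between the correlation coefficient and $R^2$ can be checked numerically. The sketch below uses synthetic data and plain NumPy rather than `statsmodels`, purely to keep it self-contained; the variable names are made up, and the identity holds only for a regression with one explanatory variable and an intercept.

```python
import numpy as np

# Synthetic data: y depends linearly on x, plus noise.
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 1.5 * x + rng.normal(scale=0.8, size=200)

# Pearson correlation coefficient between x and y.
r = np.corrcoef(x, y)[0, 1]

# Ordinary least squares fit of y on x (a degree-1 polynomial).
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x

# R^2 = 1 - (residual sum of squares) / (total sum of squares).
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"r = {r:.4f}, r^2 = {r**2:.4f}, R^2 = {r_squared:.4f}")
```

The sign of `r` matches the sign of the fitted slope, while $R^2$ discards the sign and reports only the strength of the linear fit.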


##### Q 3.1. 
[Using this source](https://www.esrl.noaa.gov/gmd/ccgg/about/co2_measurements.html)
as reference, explain whether or not you think this data is a reliable representation of the
global atmosphere.

##### Q 3.2. 
Now, please use a line plot to look for general patterns over time.


Plot a line chart with mean $CO_2$ levels on the vertical axis and time on the horizontal axis,
first over the entire historical data set and then from 1958 on, when direct air sampling
begins. Label the axes and the chart legend, and give your plot an
appropriate title. What does this chart suggest about the relationship between $CO_2$ and time?

We will now combine the $CO_2$ data with the temperature data from Part 1, and then examine the
relationship between these two variables visually, using scatterplots, and statistically, using correlation coefficients, for your chosen country. Use the latitude band for the capital of your country for the temperature measurement, and the hemisphere of your country for the $CO_2$ measurement.

Add the $CO_2$ data to the temperature dataset from Part 1, making sure that the data corresponds
to the correct year.
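One way to line the two datasets up by year is a merge on the year columns, which have different names in the two dataframes (`Year` in the NASA data, `year` in the $CO_2$ data). The sketch below uses a few rows copied from the outputs above, with the 44N-64N band and the Scripps series chosen only for illustration; your own merge should use your country's latitude band and hemisphere.

```python
import pandas as pd

# Temperature anomalies for 2015-2018 (44N-64N column of the NASA table above).
temps = pd.DataFrame({
    "Year": [2015, 2016, 2017, 2018],
    "44N-64N": [1.47, 1.43, 1.37, 1.09],
})

# Scripps CO2 values for 2014-2018 (from the malo_data.tail(5) output above).
co2 = pd.DataFrame({
    "year": [2014, 2015, 2016, 2017, 2018],
    "co2_ppm": [395.87, 397.83, 400.78, 403.57, 405.66],
})

# Match rows on year. validate="one_to_one" raises an error if either
# dataframe contains a duplicated year, which guards against mismatches.
combined = temps.merge(
    co2, left_on="Year", right_on="year", how="inner", validate="one_to_one"
)

# The two year columns are now identical, so drop the redundant one.
combined = combined.drop(columns="year")
print(combined)
```

Printing a few rows and comparing them against the original tables is a quick check that each temperature is paired with the $CO_2$ value from the same year.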

##### Q 3.3. 
Make a scatterplot of $CO_2$ level on the vertical axis and temperature anomaly on the
horizontal axis, for all historical data, and from 1958 on. What can you say about the apparent relationship, if any?

##### Q3.4

Calculate the correlation coefficient between temperature anomaly and $CO_2$ for your country. Does this alter your view as expressed in the previous question?

##### Q3.5.
Based on data for your nation, does your team believe the data support the theory that increased carbon dioxide accumulation causes an increase in temperature? Explain why or why not.


##### Q3.6
Can your team think of any sorts of experiments ('natural' or unnatural) that might test whether the apparent association between $CO_2$ and temperature is a causal link between increased greenhouse gas levels and temperature increases?
