# Analysing Global Land Temperature Data from 1750 to 2015

This dataset has been sourced from Kaggle: Climate Change: Earth Surface Temperature Data
    
The original dataset can be found here: https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data


# Reading the Dataset

First, I import the libraries used throughout the notebook, then load `GlobalTemperatures.csv` and preview its first five rows:

```python
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.offline as py
py.init_notebook_mode(connected=True)  # render Plotly charts inside the notebook
import plotly.graph_objs as go
import plotly.tools as tls
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')  # note: this also hides pandas deprecation warnings
```

```python
df = pd.read_csv('/Users/lindyhua/Library/Mobile Documents/com~apple~CloudDocs/Documents/Learning with Hands/Data Analytics/Datasets/climate-change-earth-surface-temperature-data/GlobalTemperatures.csv')
df.head()
```

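|   | dt | LandAverageTemperature | LandAverageTemperatureUncertainty | LandMaxTemperature | LandMaxTemperatureUncertainty | LandMinTemperature | LandMinTemperatureUncertainty | LandAndOceanAverageTemperature | LandAndOceanAverageTemperatureUncertainty |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1750-01-01 | 3.034 | 3.574 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 1750-02-01 | 3.083 | 3.702 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 1750-03-01 | 5.626 | 3.076 | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 1750-04-01 | 8.49 | 2.451 | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 1750-05-01 | 11.573 | 2.072 | NaN | NaN | NaN | NaN | NaN | NaN |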
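The preview shows that in 1750 only the two land average columns are filled in. A quick way to see where each column's data actually starts is to look up the first non-missing date per column. This is a minimal sketch, assuming a DataFrame shaped like `GlobalTemperatures.csv` (a `dt` column plus measurement columns); the helper name `first_valid_dates` and the toy values are hypothetical, for illustration only.

```python
import numpy as np
import pandas as pd


def first_valid_dates(frame: pd.DataFrame, date_col: str = "dt") -> pd.Series:
    """Return, for each measurement column, the date of its first non-missing value."""
    indexed = frame.set_index(date_col)
    # Series.first_valid_index() gives the label of the first non-NaN entry (None if all NaN)
    return indexed.apply(lambda col: col.first_valid_index())


# Hypothetical toy data: the max column is only recorded from the third month
toy = pd.DataFrame({
    "dt": ["1849-11-01", "1849-12-01", "1850-01-01"],
    "LandAverageTemperature": [5.0, 2.0, 0.7],
    "LandMaxTemperature": [np.nan, np.nan, 8.2],
})
print(first_valid_dates(toy))
```

On the full file, `first_valid_dates(df)` is expected to show which columns begin later than 1750.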


# Creating 'month' and 'year' columns

Next, I extract the month and the year from the `dt` date string into two new columns, which makes it easy to group the data by year.

```python
# Parse the date strings once, then pull out the month and the year
dates = pd.DatetimeIndex(df['dt'])
df['month'] = dates.month
df['year'] = dates.year
df.head()
```

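|   | dt | LandAverageTemperature | LandAverageTemperatureUncertainty | LandMaxTemperature | LandMaxTemperatureUncertainty | LandMinTemperature | LandMinTemperatureUncertainty | LandAndOceanAverageTemperature | LandAndOceanAverageTemperatureUncertainty | month | year |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1750-01-01 | 3.034 | 3.574 | NaN | NaN | NaN | NaN | NaN | NaN | 1 | 1750 |
| 1 | 1750-02-01 | 3.083 | 3.702 | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 1750 |
| 2 | 1750-03-01 | 5.626 | 3.076 | NaN | NaN | NaN | NaN | NaN | NaN | 3 | 1750 |
| 3 | 1750-04-01 | 8.49 | 2.451 | NaN | NaN | NaN | NaN | NaN | NaN | 4 | 1750 |
| 4 | 1750-05-01 | 11.573 | 2.072 | NaN | NaN | NaN | NaN | NaN | NaN | 5 | 1750 |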
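An equivalent and common pandas idiom is to parse the column with `pd.to_datetime` and read the parts through the `.dt` accessor. This is a sketch on a small hypothetical frame standing in for the `dt` column, not the notebook's own code:

```python
import pandas as pd

# Hypothetical stand-in for the 'dt' column of GlobalTemperatures.csv
frame = pd.DataFrame({"dt": ["1750-01-01", "1750-02-01", "1751-12-01"]})

# Parse the strings into datetimes once, then read month and year via .dt
dates = pd.to_datetime(frame["dt"], format="%Y-%m-%d")
frame["month"] = dates.dt.month
frame["year"] = dates.dt.year
print(frame)
```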


# Aggregating the data by year

```python
global3 = pd.read_csv('/Users/lindyhua/Library/Mobile Documents/com~apple~CloudDocs/Documents/Learning with Hands/Data Analytics/Datasets/climate-change-earth-surface-temperature-data/GlobalTemperatures.csv')
# Keep only the date and the land average temperature, dropping months with no reading
global3 = global3[['dt', 'LandAverageTemperature']].dropna()
global3
```

|   | dt | LandAverageTemperature |
|---|---|---|
| 0 | 1750-01-01 | 3.034 |
| 1 | 1750-02-01 | 3.083 |
| 2 | 1750-03-01 | 5.626 |
| 3 | 1750-04-01 | 8.490 |
| 4 | 1750-05-01 | 11.573 |
| ... | ... | ... |
| 3187 | 2015-08-01 | 14.755 |
| 3188 | 2015-09-01 | 12.999 |
| 3189 | 2015-10-01 | 10.801 |
| 3190 | 2015-11-01 | 7.433 |


```python
# Replace each full date with just its year, so rows can be grouped by year
global3['dt'] = pd.DatetimeIndex(global3['dt']).year
global3
```

|   | dt | LandAverageTemperature |
|---|---|---|
| 0 | 1750 | 3.034 |
| 1 | 1750 | 3.083 |
| 2 | 1750 | 5.626 |
| 3 | 1750 | 8.490 |
| 4 | 1750 | 11.573 |
| ... | ... | ... |
| 3187 | 2015 | 14.755 |
| 3188 | 2015 | 12.999 |
| 3189 | 2015 | 10.801 |
| 3190 | 2015 | 7.433 |


```python
# Average the monthly readings into one mean temperature per year
global3 = global3.pivot_table(index='dt', values='LandAverageTemperature', aggfunc='mean').reset_index()
global3
```

|   | dt | LandAverageTemperature |
|---|---|---|
| 0 | 1750 | 8.719364 |
| 1 | 1751 | 7.976143 |
| 2 | 1752 | 5.779833 |
| 3 | 1753 | 8.388083 |
| 4 | 1754 | 8.469333 |
| ... | ... | ... |
| 261 | 2011 | 9.516000 |
| 262 | 2012 | 9.507333 |
| 263 | 2013 | 9.606500 |
| 264 | 2014 | 9.570667 |
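Averaging months into years assumes each year has all twelve monthly readings; a year with some months missing would have a skewed mean. A minimal check, assuming a frame with a year column and a temperature column, is to compute the number of readings alongside the mean with `groupby().agg()`. The function name `annual_summary` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def annual_summary(frame: pd.DataFrame, year_col: str, value_col: str) -> pd.DataFrame:
    """Mean of value_col per year, plus how many non-missing months fed into it."""
    summary = (
        frame.groupby(year_col)[value_col]
        .agg(mean="mean", months="count")  # count() ignores NaN, so it counts real readings
        .reset_index()
    )
    summary["complete"] = summary["months"] == 12
    return summary


# Hypothetical toy data: 1750 is missing one month, 1751 is complete
toy = pd.DataFrame({
    "year": [1750] * 12 + [1751] * 12,
    "temp": [np.nan] + [10.0] * 11 + [8.0] * 12,
})
print(annual_summary(toy, "year", "temp"))
```

Applied to `df` with `year` and `LandAverageTemperature`, any row where `complete` is `False` would be worth flagging before reading too much into a single-year dip or spike.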


# Chart: Average Land Temperature 1750 - 2015

```python
# Line chart of the annual mean land temperature
trace = go.Scatter(
    x=global3['dt'],
    y=global3['LandAverageTemperature'],
    mode='lines',
)
data = [trace]

py.iplot(data, filename='line-mode')
```

```python
# Rebuild the annual means from the raw file (the mean skips missing months)
global4 = pd.read_csv('/Users/lindyhua/Library/Mobile Documents/com~apple~CloudDocs/Documents/Learning with Hands/Data Analytics/Datasets/climate-change-earth-surface-temperature-data/GlobalTemperatures.csv')
global4['dt'] = pd.DatetimeIndex(global4['dt']).year
global4 = global4.pivot_table(index='dt', values='LandAverageTemperature', aggfunc='mean').reset_index()

# Same data as above, shown as individual points instead of a line
trace = go.Scatter(
    x=global4['dt'],
    y=global4['LandAverageTemperature'],
    mode='markers',
)
data = [trace]

py.iplot(data, filename='line-mode')
```
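The year-to-year points are noisy. A centred moving average is one common way to make the long-term trend easier to see. This sketch uses `Series.rolling` on hypothetical annual values; the 3-year window is an arbitrary choice for the toy data, and on the notebook's data one might instead try something like `global4['LandAverageTemperature'].rolling(window=10, center=True, min_periods=1).mean()` and plot it as an extra `go.Scatter` trace.

```python
import pandas as pd

# Hypothetical annual means standing in for global4['LandAverageTemperature']
annual = pd.Series([8.0, 9.0, 7.0, 8.0, 9.0, 10.0], index=range(2000, 2006))

# Centred moving average; min_periods=1 keeps the edge years instead of returning NaN
smoothed = annual.rolling(window=3, center=True, min_periods=1).mean()
print(smoothed)
```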

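# Chart: Land Average, Maximum and Minimum Temperatures 1750 - 2015

Maximum and minimum land temperatures are only recorded from 1850 onwards, so after dropping rows with missing values the max/min data runs from 1850 to 2015.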

```python
global5 = pd.read_csv('/Users/lindyhua/Library/Mobile Documents/com~apple~CloudDocs/Documents/Learning with Hands/Data Analytics/Datasets/climate-change-earth-surface-temperature-data/GlobalTemperatures.csv')
```

```python
# Drop every row that has a missing value in any column
global5.dropna(inplace=True)
```

```python
global5['dt'] = pd.DatetimeIndex(global5['dt']).year
global5
```

|   | dt | LandAverageTemperature | LandAverageTemperatureUncertainty | LandMaxTemperature | LandMaxTemperatureUncertainty | LandMinTemperature | LandMinTemperatureUncertainty | LandAndOceanAverageTemperature | LandAndOceanAverageTemperatureUncertainty |
|---|---|---|---|---|---|---|---|---|---|
| 1200 | 1850 | 0.749 | 1.105 | 8.242 | 1.738 | -3.206 | 2.822 | 12.833 | 0.367 |
| 1201 | 1850 | 3.071 | 1.275 | 9.970 | 3.007 | -2.291 | 1.623 | 13.588 | 0.414 |
| 1202 | 1850 | 4.954 | 0.955 | 10.347 | 2.401 | -1.905 | 1.410 | 14.043 | 0.341 |
| 1203 | 1850 | 7.217 | 0.665 | 12.934 | 1.004 | 1.018 | 1.329 | 14.667 | 0.267 |
| 1204 | 1850 | 10.004 | 0.617 | 15.655 | 2.406 | 3.811 | 1.347 | 15.507 | 0.249 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3187 | 2015 | 14.755 | 0.072 | 20.699 | 0.110 | 9.005 | 0.170 | 17.589 | 0.057 |
| 3188 | 2015 | 12.999 | 0.079 | 18.845 | 0.088 | 7.199 | 0.229 | 17.049 | 0.058 |
| 3189 | 2015 | 10.801 | 0.102 | 16.450 | 0.059 | 5.232 | 0.115 | 16.290 | 0.062 |
| 3190 | 2015 | 7.433 | 0.119 | 12.892 | 0.093 | 2.157 | 0.106 | 15.252 | 0.063 |


```python
global5df = global5[['dt', 'LandMaxTemperature', 'LandMinTemperature']]
```

```python
# Annual mean of the monthly maximum temperatures
global5df1 = global5df.pivot_table(index='dt', values='LandMaxTemperature', aggfunc='mean').reset_index()
global5df1
```

|   | dt | LandMaxTemperature |
|---|---|---|
| 0 | 1850 | 13.476667 |
| 1 | 1851 | 13.081000 |
| 2 | 1852 | 13.397333 |
| 3 | 1853 | 13.886583 |
| 4 | 1854 | 13.977417 |
| ... | ... | ... |
| 161 | 2011 | 15.284833 |
| 162 | 2012 | 15.332833 |
| 163 | 2013 | 15.373833 |
| 164 | 2014 | 15.313583 |


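```python
# Annual mean of the monthly minimum temperatures ('mean' is also pivot_table's default aggfunc)
global5df2 = global5df.pivot_table(index='dt', values='LandMinTemperature', aggfunc='mean').reset_index()
global5df2
```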

|   | dt | LandMinTemperature |
|---|---|---|
| 0 | 1850 | 1.964333 |
| 1 | 1851 | 2.203917 |
| 2 | 1852 | 2.337000 |
| 3 | 1853 | 1.892500 |
| 4 | 1854 | 1.762167 |
| ... | ... | ... |
| 161 | 2011 | 3.827667 |
| 162 | 2012 | 3.756167 |
| 163 | 2013 | 3.911333 |
| 164 | 2014 | 3.877750 |
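The two pivot tables above can also be produced in one step by grouping on the year and averaging both columns together, which keeps the maximum and minimum aligned in a single frame. A sketch on hypothetical monthly rows, assuming the same column names as the notebook:

```python
import pandas as pd

# Hypothetical monthly rows after the year has replaced the date in 'dt'
monthly = pd.DataFrame({
    "dt": [1850, 1850, 1851, 1851],
    "LandMaxTemperature": [8.0, 10.0, 9.0, 11.0],
    "LandMinTemperature": [-3.0, -1.0, -2.0, 0.0],
})

# One groupby replaces the two pivot_table calls; a list of columns keeps both
annual = (
    monthly.groupby("dt")[["LandMaxTemperature", "LandMinTemperature"]]
    .mean()
    .reset_index()
)
print(annual)
```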


```python
# Plot the annual average, maximum and minimum land temperatures together
trace0 = go.Scatter(
    x=global4['dt'],
    y=global4['LandAverageTemperature'],
    mode='lines',
    name='Average',
)

trace1 = go.Scatter(
    x=global5df1['dt'],
    y=global5df1['LandMaxTemperature'],
    mode='lines',
    name='Maximum',
)

trace2 = go.Scatter(
    x=global5df2['dt'],
    y=global5df2['LandMinTemperature'],
    mode='lines',
    name='Minimum',
)

data = [trace0, trace1, trace2]

py.iplot(data, filename='line-mode')
```

# Chart: Land Average Temperature 1750 - 2015, including Uncertainty

```python
global_temp = pd.read_csv('/Users/lindyhua/Library/Mobile Documents/com~apple~CloudDocs/Documents/Learning with Hands/Data Analytics/Datasets/climate-change-earth-surface-temperature-data/GlobalTemperatures.csv')  # not used below
```

```python
df1 = pd.read_csv('/Users/lindyhua/Library/Mobile Documents/com~apple~CloudDocs/Documents/Learning with Hands/Data Analytics/Datasets/climate-change-earth-surface-temperature-data/GlobalTemperatures.csv')

# Add a column with the year extracted from the date
df1['Year'] = pd.DatetimeIndex(df1['dt']).year

# New table with the mean LandAverageTemperature and LandAverageTemperatureUncertainty per year.
# The columns must be selected with a list (double brackets): the tuple form
# df1.groupby(...)['a', 'b'] raises a ValueError in pandas 2.x.
df1plot = df1.groupby('Year')[['LandAverageTemperature', 'LandAverageTemperatureUncertainty']].mean().reset_index()

df1plot
```

|   | Year | LandAverageTemperature | LandAverageTemperatureUncertainty |
|---|---|---|---|
| 0 | 1750 | 8.719364 | 2.637818 |
| 1 | 1751 | 7.976143 | 2.781143 |
| 2 | 1752 | 5.779833 | 2.977000 |
| 3 | 1753 | 8.388083 | 3.176000 |
| 4 | 1754 | 8.469333 | 3.494250 |
| ... | ... | ... | ... |
| 261 | 2011 | 9.516000 | 0.082000 |
| 262 | 2012 | 9.507333 | 0.083417 |
| 263 | 2013 | 9.606500 | 0.097667 |
| 264 | 2014 | 9.570667 | 0.090167 |


```python
# Create the Plotly traces: an uncertainty band around the annual mean temperature

# Upper edge of the band: mean + uncertainty (no fill of its own)
trace0 = go.Scatter(
    x=df1plot['Year'],
    y=df1plot['LandAverageTemperature'] + df1plot['LandAverageTemperatureUncertainty'],
    fill=None,
    mode='lines',
    name='Uncertainty higher',
    line=dict(
        color='rgb(255, 219, 204)',
    )
)
# Lower edge: mean - uncertainty; fill='tonexty' shades the area between this trace and trace0
trace1 = go.Scatter(
    x=df1plot['Year'],
    y=df1plot['LandAverageTemperature'] - df1plot['LandAverageTemperatureUncertainty'],
    fill='tonexty',
    mode='lines',
    name='Uncertainty lower',
    line=dict(
        color='rgb(255, 219, 204)',
    )
)

# The annual mean itself, drawn on top of the band
trace2 = go.Scatter(
    x=df1plot['Year'],
    y=df1plot['LandAverageTemperature'],
    name='Average Temperature',
    line=dict(
        color='rgb(255, 128, 0)',
    )
)
data = [trace0, trace1, trace2]

layout = go.Layout(
    xaxis=dict(title='Year'),
    yaxis=dict(title='Average Temperature, °C'),
    title='Average Global Land Temperature, 1750 - 2015',
    showlegend=False)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig)
```

# Conclusion

Firstly, temperature measurements between 1750 and 1850 carry a lot of uncertainty: the annual mean uncertainty is roughly 2.6 to 3.5 °C in the early 1750s. One hypothesis is that measuring instruments were less precise, or that temperatures were not recorded as systematically, which would result in smaller samples and less accurate readings. From 1850 onwards the uncertainty decreases significantly, and from 1950 onwards the Land Average Temperature readings are known with a high degree of certainty (below 0.1 °C in 2011 - 2014).
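The drop in uncertainty can be checked numerically by averaging the yearly uncertainty over the periods mentioned above. A minimal sketch, assuming a frame shaped like `df1plot` (`Year` and `LandAverageTemperatureUncertainty` columns); the function name, the period labels and the toy values are hypothetical, not results from the dataset:

```python
import pandas as pd


def mean_uncertainty_by_period(frame: pd.DataFrame) -> pd.Series:
    """Average the annual uncertainty within 1750-1849, 1850-1949 and 1950 onwards."""
    # right=False makes the bins [1750, 1850), [1850, 1950) and [1950, 2100)
    periods = pd.cut(
        frame["Year"],
        bins=[1750, 1850, 1950, 2100],
        right=False,
        labels=["1750-1849", "1850-1949", "1950+"],
    )
    return frame.groupby(periods, observed=False)["LandAverageTemperatureUncertainty"].mean()


# Hypothetical toy values, not taken from the dataset
toy = pd.DataFrame({
    "Year": [1750, 1800, 1850, 1900, 1950, 2000],
    "LandAverageTemperatureUncertainty": [3.0, 2.0, 0.5, 0.3, 0.1, 0.05],
})
print(mean_uncertainty_by_period(toy))
```

Running `mean_uncertainty_by_period(df1plot)` would give one average per period to set against the chart.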
      
Secondly, the global land average temperature has been on an increasing trend since 1750. In particular, global temperatures rise from 1900 to 1960, and the increase accelerates significantly between 1970 and 2015. One possible explanation is the huge growth in industry, leading to rising CO2 emissions and global warming.
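To put numbers on "increasing" and "accelerates", one option is to fit a straight line to the annual means within each period and compare the slopes. The sketch below uses `numpy.polyfit` on a frame shaped like `df1plot`; the helper name `warming_rate_per_century` and the synthetic test data are hypothetical, and a linear fit is only a rough summary of a trend.

```python
import numpy as np
import pandas as pd


def warming_rate_per_century(frame: pd.DataFrame, start: int, end: int) -> float:
    """Least-squares slope of LandAverageTemperature against Year over [start, end], in °C per 100 years."""
    window = frame[(frame["Year"] >= start) & (frame["Year"] <= end)]
    # polyfit with deg=1 returns [slope, intercept] of the best-fit straight line
    slope, _intercept = np.polyfit(window["Year"], window["LandAverageTemperature"], deg=1)
    return slope * 100


# Synthetic data: flat until 1969, then warming by 0.02 °C per year
years = np.arange(1900, 2016)
temps = np.where(years < 1970, 8.5, 8.5 + 0.02 * (years - 1969))
synthetic = pd.DataFrame({"Year": years, "LandAverageTemperature": temps})

for start, end in [(1900, 1960), (1970, 2015)]:
    print(start, end, warming_rate_per_century(synthetic, start, end))
```

On the notebook's data, comparing `warming_rate_per_century(df1plot, 1900, 1960)` with `warming_rate_per_century(df1plot, 1970, 2015)` would test whether the later slope is indeed steeper.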

# Next Steps

- Incorporate CO2 emissions data since 1750
- Analyse by city, to see whether any cities or regions are particularly affected
