## Time Series: Climate Data
![climate](https://images.pexels.com/photos/2969/climate-cold-glacier-iceberg.jpg?auto=compress&cs=tinysrgb&dpr=2&h=750&w=1260)
**Objectives:**
- List methods for adjusting time series data
- Define the vocabulary of lagging, moving averages, and differencing
- Replicate the process in Google Sheets and in Python

**Question:** What are some quantities we'd want to model over time?

**Problem:** Most of our tools aren't natively prepared to handle time series data, so we need to make a lot of adjustments to our data.

### Set up environment and tool set 

In [None]:
%matplotlib inline
import matplotlib
matplotlib.rcParams['figure.figsize'] = [8, 3]
import matplotlib.pyplot as plt

import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels

import scipy
from scipy.stats import pearsonr

from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

In [None]:
print(matplotlib.__version__)
print(pd.__version__)
print(np.__version__)
print(statsmodels.__version__)
print(scipy.__version__)

Data obtained from `https://datahub.io/core/global-temp#data`.<br>
The original source appears to be `https://www.ncdc.noaa.gov/cag/global/time-series`.

The data include the GISS Surface Temperature (GISTEMP) analysis and the global component of Climate at a Glance (GCAG).

### Obtain and visualize data

In [None]:
## data obtained from https://datahub.io/core/global-temp#data
df = pd.read_csv("https://pkgstore.datahub.io/core/global-temp/annual_csv/data/a26b154688b061cdd04f1df36e4408be/annual_csv.csv")
df.head()

In [None]:
df.Mean[:100].plot()

### Exercise: What is wrong with the data and plot above? How can we fix this?

In [None]:
df = df.pivot(index='Year', columns='Source', values='Mean')

In [None]:
df.head()
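
The pivot step above can be illustrated on a tiny table. This is a minimal sketch with made-up values; the column names `Source`, `Year`, and `Mean` are taken from the `pivot` call, and `long_df` / `wide_df` are hypothetical names used only here.

```python
import pandas as pd

# Hypothetical long-format data: one row per (Source, Year) pair.
# The values are invented for illustration only.
long_df = pd.DataFrame({
    "Source": ["GCAG", "GISTEMP", "GCAG", "GISTEMP"],
    "Year": [2016, 2016, 2015, 2015],
    "Mean": [0.94, 0.99, 0.90, 0.87],
})

# Reshape to wide format: one row per year, one column per source.
wide_df = long_df.pivot(index="Year", columns="Source", values="Mean")
print(wide_df)
```

In the long format the two sources are interleaved in a single `Mean` column, so plotting that column directly mixes two series. After the pivot, each source is its own column indexed by year.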

In [None]:
df.GCAG.plot()

In [None]:
type(df.index)

### Exercise: How can we make the index more time aware?

In [None]:
df.index = pd.to_datetime(df.index, format='%Y')

In [None]:
df.head()

In [None]:
type(df.index)
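
As a small sketch of what the conversion buys us, the example below builds a `DatetimeIndex` from integer years and then selects rows by year string. The series `s` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical integer years, similar to the index produced by the pivot.
years = pd.Index([1880, 1881, 1882], name="Year")

# Parse each year as a date; every entry becomes January 1 of that year.
dt_index = pd.to_datetime(years, format="%Y")

# Made-up values attached to the new index.
s = pd.Series([-0.10, -0.20, -0.05], index=dt_index)

# With a DatetimeIndex, .loc accepts partial date strings.
one_year = s.loc["1881"]           # all rows that fall in 1881
span = s.loc["1880":"1881"]        # string slices include both endpoints
print(one_year)
print(span)
```

Note that string slices on a `DatetimeIndex` include both endpoints, unlike integer position slices.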

In [None]:
df.GCAG.plot()

In [None]:
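# Select rows by partial date string with .loc; plain df['1880'] looks for a column
# named '1880' in recent pandas versions and raises a KeyError.
df.loc['1880']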

In [None]:
plt.plot(df['1880':'1950'][['GCAG', 'GISTEMP']])

In [None]:
plt.plot(df['1950':][['GISTEMP']])

## Logging

`np.log()`

In [None]:
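# np.log returns NaN for negative inputs (and -inf for zero), with a RuntimeWarning,
# so any negative entries in GCAG become NaN in the new column.
df['GCAG_log'] = np.log(df.GCAG)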
df.tail(10)
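
One common reason to log a series is that constant percentage growth becomes constant additive steps. The sketch below shows this on a hypothetical series, and also shows what happens when an input is negative. All values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical series that grows 10% per step.
growth = pd.Series(100 * 1.1 ** np.arange(5))

# After logging, each step is the same size: log(1.1).
log_growth = np.log(growth)
steps = log_growth.diff().dropna()
print(steps)

# Hypothetical series with a negative value.
# The log of a negative number is undefined, so NumPy returns NaN.
mixed = pd.Series([-0.2, 0.1])
with np.errstate(invalid="ignore"):  # silence the RuntimeWarning for this demo
    log_mixed = np.log(mixed)
print(log_mixed)
```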

## Lagging

`shift()` [shift documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shift.html)

In [None]:
df['GCAG_lag1'] = df.GCAG.shift()

In [None]:
df.head()
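
To see exactly what `shift()` does, here is a minimal sketch on a hypothetical four-value series. The index stays fixed while the values move, so the first lagged entry has nothing to borrow from and becomes `NaN`.

```python
import pandas as pd

# Hypothetical yearly series with made-up values.
idx = pd.to_datetime(["2000", "2001", "2002", "2003"], format="%Y")
s = pd.Series([10, 20, 30, 40], index=idx)

lag1 = s.shift()     # default periods=1: each row holds the previous year's value
lag2 = s.shift(2)    # each row holds the value from two years earlier
lead1 = s.shift(-1)  # negative periods look forward instead of back

print(pd.DataFrame({"value": s, "lag1": lag1, "lag2": lag2, "lead1": lead1}))
```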


## Differencing

`diff()` [diff documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.diff.html)

#### First order difference

In [None]:
# First-order difference: value at t minus value at t-1
df['GCAG_diff1'] = df.GCAG.diff()

In [None]:
df.head()

#### Second order difference

In [None]:
# Second-order difference: the difference of the first-order differences
df['GCAG_diff2'] = df.GCAG_diff1.diff()

In [None]:
df.head()

In [None]:
plt.plot(df.index, df.GCAG_diff1, label='GCAG first order difference', color='orange')
plt.plot(df.index, df.GCAG_diff2, label='GCAG second order difference', color='magenta')
plt.legend(loc='upper left')
plt.show()
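
The sketch below works through differencing on a hypothetical series of squares, where the arithmetic is easy to check by hand. It also contrasts the second-order difference with `diff(2)`, which is a different operation (the value minus the value two steps back).

```python
import pandas as pd

# Hypothetical series: the squares 1, 4, 9, 16, 25.
s = pd.Series([1.0, 4.0, 9.0, 16.0, 25.0])

d1 = s.diff()            # first order:  NaN, 3, 5, 7, 9
d2 = s.diff().diff()     # second order: NaN, NaN, 2, 2, 2
lag2_diff = s.diff(2)    # NOT second order: NaN, NaN, 8, 12, 16

# diff() is shorthand for subtracting the lagged series.
same_as_shift = d1.equals(s - s.shift())

print(pd.DataFrame({"s": s, "d1": d1, "d2": d2, "diff(2)": lag2_diff}))
```

Each round of differencing costs one observation at the start of the series, which is why the second-order column begins with two `NaN` values.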

## Moving Average

![img](img/MA.png)

`rolling()` [rolling documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html)

In [None]:
# Simple moving averages over trailing windows of 4 and 8 years
rolling_mean = df.GCAG.rolling(window=4).mean()
rolling_mean2 = df.GCAG.rolling(window=8).mean()
plt.plot(df.index, df.GCAG, label='GCAG')
plt.plot(df.index, rolling_mean, label='GCAG 4 year SMA', color='orange')
plt.plot(df.index, rolling_mean2, label='GCAG 8 year SMA', color='magenta')
plt.legend(loc='upper left')
plt.show()
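
As a minimal sketch of how the window behaves, the example below computes a 3-period simple moving average on a hypothetical series and shows two optional parameters of `rolling()`, `min_periods` and `center`.

```python
import pandas as pd

# Hypothetical series with easy-to-check values.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Trailing window: each value averages the current and two previous points.
sma3 = s.rolling(window=3).mean()                     # NaN, NaN, 2, 3, 4, 5

# min_periods=1 averages whatever is available at the start.
sma3_partial = s.rolling(window=3, min_periods=1).mean()   # 1, 1.5, 2, 3, 4, 5

# center=True places the average at the middle of the window.
sma3_centered = s.rolling(window=3, center=True).mean()    # NaN, 2, 3, 4, 5, NaN

print(pd.DataFrame({"s": s, "sma3": sma3,
                    "partial": sma3_partial, "centered": sma3_centered}))
```

A trailing window only uses past values, so it lags behind turning points in the data. A centered window tracks them more closely but uses future values, which matters if the average feeds a forecast.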

### Exercise: How strongly do these measurements correlate contemporaneously? What about with a time lag?

In [None]:
plt.scatter(df['1880':'1900'][['GCAG']], df['1880':'1900'][['GISTEMP']])

In [None]:
plt.scatter(df['1880':'1899'][['GCAG']], df['1881':'1900'][['GISTEMP']])

In [None]:
pearsonr(df['1880':'1899'].GCAG, df['1881':'1900'].GISTEMP)
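
An alternative to offsetting the two date slices by hand is to build the lag with `shift()` and drop the unmatched row. The sketch below uses synthetic data in which `y` is constructed to depend on the previous value of `x`, so the lagged correlation is 1 by design. The names `x`, `y`, `pair`, and `both` are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Synthetic data: y at time t is a linear function of x at time t-1.
rng = np.random.default_rng(0)
x = pd.Series(rng.normal(size=50))
y = 2 * x.shift(1) + 1          # first value is NaN because there is no x before t=0

# Contemporaneous correlation: pair x[t] with y[t].
both = pd.DataFrame({"x": x, "y": y}).dropna()
r_same, _ = pearsonr(both["x"], both["y"])

# Lagged correlation: pair x[t-1] with y[t].
pair = pd.DataFrame({"x_lag1": x.shift(1), "y": y}).dropna()
r_lag, _ = pearsonr(pair["x_lag1"], pair["y"])

print(f"same-time r = {r_same:.3f}, lag-1 r = {r_lag:.3f}")
```

Putting both columns in one DataFrame before calling `dropna()` keeps the rows aligned on the index, which avoids pairing the wrong years by accident.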

In [None]:
df['1880':'1899'][['GCAG']].head()

In [None]:
df['1881':'1900'][['GISTEMP']].head()

In [None]:
df.index.min()

In [None]:
df.index.max()

### References:

- [Duke resource on differencing](https://people.duke.edu/~rnau/411diff.htm)
- [SciPy talk on time series](https://www.youtube.com/watch?v=v5ijNXvlC5A)
- [Aileen Nielsen, *Practical Time Series Analysis*](https://www.oreilly.com/library/view/practical-time-series/9781492041641/)

### Check Objectives
