## Time Series Data Analysis and Plotting

This notebook introduces time series analysis by investigating climate data with Python and pandas. Time series data are repeated measurements of the same phenomenon, taken sequentially over time.

`import pandas as pd`

`from matplotlib import pyplot as plt`

`from zipfile import ZipFile`

`%matplotlib inline`

In [1]:
import pandas as pd
from matplotlib import pyplot as plt
from zipfile import ZipFile
%matplotlib inline

## Getting the Data

Download the following 2 datasets:

1. Estimate of global surface temperature change (NASA)

These data represent temperature anomalies (differences from the mean/expected value) per month and per season (DJF=Dec-Feb, MAM=Mar-May, etc).

`! curl -o ./data/GISTEMP.csv https://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.csv`

In [4]:
! curl -o ./data/GISTEMP.csv https://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12274  100 12274    0     0  11111      0  0:00:01  0:00:01 --:--:-- 11117


2. Estimate of CO2 emissions, in metric tons per capita (World Bank)

These data give us the average CO₂ emission (in metric tons) per person. The dataset is divided up by countries and other categories such as ‘World’ or ‘Upper middle income.’

Download and unzip the files.

`! curl -o ./data/CO2.zip http://api.worldbank.org/v2/en/indicator/EN.ATM.CO2E.PC?downloadformat=csv`

`with ZipFile("./data/CO2.zip", "r") as archive:
    archive.extractall("./data/")`
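The extracted data file has a version suffix in its name, so it can help to list the archive's members before hard-coding a path. The sketch below builds a small in-memory archive to stand in for the download; the member names are hypothetical, and the `API_` prefix filter is an assumption based on the file name used later in this notebook.

```python
import io
from zipfile import ZipFile

# Build a tiny in-memory archive that stands in for CO2.zip (hypothetical member names).
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("Metadata_Country_EXAMPLE.csv", "a,b\n")
    archive.writestr("API_EXAMPLE_DS2_en_csv_v2.csv", "x,y\n")

# List the members and keep the ones that look like the data file.
with ZipFile(buffer, "r") as archive:
    data_files = [name for name in archive.namelist() if name.startswith("API_")]

print(data_files)
```

With the real download, the same `namelist()` call on `./data/CO2.zip` shows which file to pass to `pd.read_csv`.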

In [24]:
! curl -o ./data/CO2.zip http://api.worldbank.org/v2/en/indicator/EN.ATM.CO2E.PC?downloadformat=csv

    

## Create DataFrames

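Read in the raw temperature and CO₂ emissions datasets. Both files start with title or metadata lines above the real header row, so use `skiprows` to skip those lines and let pandas read the header row as the column names.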

`raw_t = pd.read_csv("./data/GISTEMP.csv", skiprows=1)`

`raw_e = pd.read_csv("./data/API_EN.ATM.CO2E.PC_DS2_en_csv_v2_713061.csv", skiprows=3)`
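As a minimal sketch of what `skiprows` does, the example below parses a made-up CSV string with one title line above the header. The text and values are toy data, not taken from the NASA or World Bank files.

```python
import io
import pandas as pd

# Toy CSV text: one title line above the real header (all contents are made up).
csv_text = (
    "Toy title line that is not part of the table\n"
    "Year,Jan,Feb\n"
    "1880,-0.30,-0.21\n"
    "1881,-0.10,-0.14\n"
)

# skiprows=1 drops the title line, so "Year,Jan,Feb" is read as the header row.
demo = pd.read_csv(io.StringIO(csv_text), skiprows=1)
print(list(demo.columns))
```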

## Data Wrangling

Transforming data from one format to another to make them usable

### Wrangling Temperature Data

* Creating a DateTime index

* Handling missing values

* Resampling to a different frequency


### Creating a DateTime index

For the temperature data, create an empty dataframe with a DateTime index of monthly frequency - then use the raw data to populate the new dataframe.

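Use the `date_range` function to create a date range index. Because `freq="M"` generates month-end dates, the index runs from 31 January 1880 to 31 May 2019: the `end` date of 1 June 2019 falls before June's month-end, so June itself is not included.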

`date_rng = pd.date_range(start="1/1/1880", end="6/1/2019", freq="M")`
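To check where a month-end range starts and stops, inspect its first and last entries. This sketch passes a `pd.offsets.MonthEnd()` object instead of the `"M"` string; that substitution is an assumption made here so the example behaves the same across pandas versions, since recent releases spell the alias `"ME"`.

```python
import pandas as pd

# Month-end frequency given as an offset object rather than the "M" alias.
demo_rng = pd.date_range(start="1880-01-01", end="2019-06-01",
                         freq=pd.offsets.MonthEnd())

first, last = demo_rng[0], demo_rng[-1]
print(first.date(), last.date(), len(demo_rng))
```

The last entry is the month-end of May 2019, because 30 June 2019 lies after the `end` date.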

Create a dataframe with a single column named `date` containing the values from the date index

`t = pd.DataFrame(date_rng, columns=["date"])`

Create a column for the anomaly values

`t["Avg_Anomaly_deg_C"] = None`

Set the index to the date column (DateTime index)

`t.set_index("date", inplace=True)`

Extract the raw temperature data from the year and month columns.

`raw_t = raw_t.iloc[:,:13]` *Select all rows of columns 0 through 12*

### Apply 

Use the `apply` function to step through the raw data one row at a time (`axis=1` passes each row to the function, `axis=0` passes each column) and copy each row's values into the new dataframe.

Import the following libraries

`from datetime import datetime` *useful for parsing dates and times*

`import calendar` *used to get the last day of each month*

In [8]:
from datetime import datetime
import calendar

Define a function to populate the dataframe with row values from the raw temperature data:

`def populate_anom(row):`

&nbsp;&nbsp;&nbsp;&nbsp;`year = row["Year"]`

&nbsp;&nbsp;&nbsp;&nbsp;`monthly_anomolies = row.iloc[1:]`

&nbsp;&nbsp;&nbsp;&nbsp;`months = monthly_anomolies.index`

&nbsp;&nbsp;&nbsp;&nbsp;`for month in monthly_anomolies.index:`

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`last_day = calendar.monthrange(year,datetime.strptime(month, '%b').month)[1]`

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`date_index = datetime.strptime(f'{year} {month} {last_day}', '%Y %b %d')`

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`t.loc[date_index] = monthly_anomolies[month]`

`strptime` parses a string into a `datetime` object

`strftime` formats a `datetime` object as a string

For more on strptime and strftime see https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
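The date-building step inside `populate_anom` can be tried on a single month. This is a small sketch with an illustrative year and month; it assumes an English locale, since `%b` matches locale-dependent month abbreviations.

```python
import calendar
from datetime import datetime

year, month_abbr = 1880, "Feb"  # illustrative inputs

# Parse the abbreviation to get the month number, then look up the month's length.
month_number = datetime.strptime(month_abbr, "%b").month
last_day = calendar.monthrange(year, month_number)[1]

# Build the month-end date and format it back into a string.
date_index = datetime.strptime(f"{year} {month_abbr} {last_day}", "%Y %b %d")
print(date_index.strftime("%Y-%m-%d"))
```

`calendar.monthrange` returns a pair (weekday of the first day, number of days), which is why the code takes element `[1]`.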


        
### Lambda
 
 A lambda function is a small anonymous function that can take any number of arguments, but can only have one expression.
 
Example:
 
`x = lambda a : a + 10`

`print(x(5))` *prints 15*

Using lambda, apply the `populate_anom()` function to each row of raw data (`axis=1`)

`_ = raw_t.apply(lambda row: populate_anom(row), axis=1)`
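The lambda here only forwards its argument, so passing the function itself to `apply` should give the same result. The sketch below shows this on a toy frame; the values and the `row_mean` helper are made up for illustration.

```python
import pandas as pd

# Toy wide-format data: one row per year, one column per month (values are made up).
raw_demo = pd.DataFrame({
    "Year": [1880, 1881],
    "Jan": [-0.30, -0.10],
    "Feb": [-0.20, -0.14],
})

def row_mean(row):
    # Skip the "Year" entry and average the monthly values.
    return row.iloc[1:].mean()

# axis=1 passes each row to the function as a Series.
via_lambda = raw_demo.apply(lambda row: row_mean(row), axis=1)
direct = raw_demo.apply(row_mean, axis=1)

print(via_lambda.equals(direct))
```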

In [25]:
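def populate_anom(row):
    # Cast to int: calendar.monthrange and the "%Y" parse below need a whole-number year
    year = int(row["Year"])
    # Everything after the "Year" column is a monthly anomaly, labelled Jan..Dec
    monthly_anomolies = row.iloc[1:]
    for month in monthly_anomolies.index:
        # Build the month-end date that matches the month-end DateTime index of t
        last_day = calendar.monthrange(year,datetime.strptime(month, "%b").month)[1]
        date_index = datetime.strptime(f"{year} {month} {last_day}", "%Y %b %d")
        t.loc[date_index] = monthly_anomolies[month]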


## Formatting Values

Handling missing values and converting data types

Use the `fillna` function to populate `NaN` values

Define a function that converts a value to a float and returns `NaN` (Not a Number) when the conversion fails

`import numpy as np`

`def clean_anomaly_value(raw_value):`

&nbsp;&nbsp;&nbsp;&nbsp;`try:`

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`return float(raw_value)`

&nbsp;&nbsp;&nbsp;&nbsp;`except (TypeError, ValueError):`

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;`return np.nan`

`apply` the function to each raw_value in the `"Avg_Anomaly_deg_C"` column

`t["Avg_Anomaly_deg_C"] = t["Avg_Anomaly_deg_C"].apply(lambda raw_value: clean_anomaly_value(raw_value))`

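Fill NaN values using `method="ffill"` (forward fill), which copies the last valid observation into each gap. In pandas 2.1 and later this spelling is deprecated; `t.ffill(inplace=True)` is the equivalent call.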

`t.fillna(method="ffill", inplace=True)`
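As an alternative to the hand-written `clean_anomaly_value` function, pandas' `pd.to_numeric` with `errors="coerce"` performs the same convert-or-NaN step on a whole column. The sketch below is not the notebook's method, only a comparison on made-up values, where `"***"` stands in for a missing-value marker.

```python
import pandas as pd

# Toy column mixing numbers, numeric strings, a placeholder and a missing entry.
raw_values = pd.Series([-0.3, "***", "0.12", None], dtype="object")

# Anything that cannot be parsed as a number becomes NaN.
cleaned = pd.to_numeric(raw_values, errors="coerce")

# Forward fill: each NaN takes the last valid value before it.
filled = cleaned.ffill()
print(filled.tolist())
```

Forward filling cannot fill a gap at the very start of a series, because there is no earlier value to copy.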

### Create a Simple Plot

`plt.figure(figsize=(10,8))`

`plt.xlabel('Time')`

`plt.ylabel('Temperature Anomaly (°Celsius)')`

`plt.plot(t, color='#1C7C54', linewidth=1.0)`

### Resampling Data

Sometimes data are too granular to visualize nicely. The `resample` function can change the frequency from months to years.

Downsample the temperature data into years. The string `'A'` represents 'calendar year-end', so each yearly value is labelled with 31 December.

`t.resample('A').mean().head()`
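To see what downsampling does to the values, the sketch below resamples 24 months of toy data into two yearly means. It passes `pd.offsets.YearEnd()` and `pd.offsets.MonthEnd()` objects instead of alias strings; that is an assumption made here so the example does not depend on which aliases a given pandas version accepts.

```python
import pandas as pd

# Two years of toy monthly data on a month-end index (values 0..23).
idx = pd.date_range("2000-01-01", periods=24, freq=pd.offsets.MonthEnd())
monthly = pd.DataFrame({"value": range(24)}, index=idx)

# Group the months by calendar year and average each group.
yearly = monthly.resample(pd.offsets.YearEnd()).mean()
print(yearly)
```

Each output row is labelled with the year-end date and holds the mean of that year's twelve monthly values.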

#### Plot the Resampled Data

`plt.figure(figsize=(10,8))`

`plt.xlabel("Time")`

`plt.ylabel("Temperature Anomaly (°Celsius)")`

`plt.plot(t.resample("A").mean(), color="#1C7C54", linewidth=1.0)`


### Wrangling CO2 Data

* Slicing and Searching

* Useful functions

Select only the row representing the CO₂ emissions for the entire world. Create a new dataframe that uses a DateTime index, then use the raw data to populate it.

In [29]:
raw_e

Create a function to wrangle the raw CO2 data into a new dataframe

In [19]:
# Define a function that pulls a value from the raw data, using the year of the date in the new DataFrame row
def populate_df(row):
    index = str(row['date'].year)
    # Return the scalar in the 'value' column rather than a one-element row
    value = raw_e_world.loc[index, 'value']
    return value

# Select just the row with CO2 emissions for the 'World', and the columns for the years 1960-2018
raw_e_world = raw_e[raw_e['Country Name']=='World'].loc[:,'1960':'2018']
#print (raw_e_world)

# Transpose the resulting slice, so that the columns become rows and vice versa
raw_e_world = raw_e_world.T
#print (raw_e_world)
raw_e_world.columns = ['value']

# Create a new DataFrame with a yearly (year-end) date range for 1960-2018,
# the same frequency as the temperature data after resampling to years
date_rng = pd.date_range(start='1960-12-31', end='2018-12-31', freq='y')
e = pd.DataFrame(date_rng, columns=['date'])

# Populate the new DataFrame using the values from the raw data slice
v = e.apply(lambda row: populate_df(row), axis=1)
e['Global CO2 Emissions per Capita'] = v
e.set_index('date', inplace=True)
e.head()


| date | Global CO2 Emissions per Capita |
| --- | --- |
| 1960-12-31 | 3.099157 |
| 1961-12-31 | 3.070018 |
| 1962-12-31 | 3.140957 |
| 1963-12-31 | 3.245109 |
| 1964-12-31 | 3.36138 |
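The slice-and-transpose steps can also be followed on a toy table. The sketch below uses made-up values and two hypothetical rows. Instead of the `apply`-based population step, it builds the year-end index directly from the year labels, which is an alternative shown for comparison rather than the notebook's approach.

```python
import pandas as pd

# Toy wide-format table: one row per entity, one column per year (values are made up).
raw_demo = pd.DataFrame({
    "Country Name": ["Aruba", "World"],
    "1960": [1.0, 3.1],
    "1961": [1.5, 3.0],
    "1962": [2.0, 3.2],
})

# Keep the 'World' row and the year columns, then transpose so years become rows.
world = raw_demo[raw_demo["Country Name"] == "World"].loc[:, "1960":"1962"].T
world.columns = ["Global CO2 Emissions per Capita"]

# Turn the year labels ("1960", ...) into year-end timestamps.
world.index = pd.to_datetime(world.index + "-12-31")
world.index.name = "date"
print(world)
```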


Select all data after the year 2011:

`e[e.index.year>2011]`

Fill NaN values using forward fill

`e.fillna(method='ffill', inplace=True)`

`e[e.index.year>2011]`

Use the DateTime index to search on a range

`e['1984-01-04':'1990-01-06']`
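A date-string slice on a DateTime index includes every row whose timestamp falls between the two endpoints, even when the endpoints themselves are not in the index. The sketch below shows this on a toy year-end series; the frame and its values are made up, and `.loc` is used to make the label-based row selection explicit.

```python
import pandas as pd

# Toy yearly data on a year-end index, 1980 through 1991.
idx = pd.date_range("1980-12-31", periods=12, freq=pd.offsets.YearEnd())
e_demo = pd.DataFrame({"value": range(12)}, index=idx)

# Label-based slice: both endpoints are inclusive and need not exist in the index.
window = e_demo.loc["1984-01-04":"1990-01-06"]

# Boolean mask on a component of the index, as in the year filter above.
after_1989 = e_demo[e_demo.index.year > 1989]

print(list(window.index.year))
print(list(after_1989.index.year))
```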

#### Plotting the Temperature Data

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

t_resampled = t.resample('A').mean()

fig, ax = plt.subplots(figsize=(10,8))
ax.plot(t_resampled, color='#1C7C54', linewidth=2.5)
ax.set(xlabel="Time (years)", ylabel="Temperature Anomaly (deg. Celsius)", title="Global Temperature Anomalies")
ax.grid()

#### Plot the CO2 Data

Create figures and axes

`fig, ax = plt.subplots(figsize=(10,8))`

Plot CO₂ emissions data with a specific colour and line thickness

`ax.plot(e, color='#3393FF', linewidth=2.5)`

Set axis labels and graph title

`ax.set(xlabel="Time (years)", ylabel="Emissions (Metric Tons per Capita)", title="Global CO2 Emission over Time")`

Enable grid

`ax.grid()`

#### Creating Interactive Plots

In [443]:
import chart_studio.plotly as py
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode
import cufflinks
cufflinks.go_offline(connected=True)
init_notebook_mode(connected=True)

`t.resample('A').mean().iplot(kind='line', xTitle='Time (years)', color='#1C7C54', yTitle='Temperature Anomaly (deg. Celsius)', title='Global Temperature Anomalies')`

`e.iplot(kind='line', xTitle='Time (years)', color='#3393FF', yTitle='Emissions (Metric Tons per Capita)', title='Global CO2 Emission over Time')`