# Time Series E2E Exercise

### James Allen

You will use the Earth surface temperature data offered by Berkeley Earth through Kaggle.com. Select one location (a city, a state, or a region of similar size) and analyze the patterns in its temperature over time. Then model those patterns to forecast temperature into the future; how far ahead is up to you, but the horizon should be meaningful.

- Use the data from this Kaggle dataset: https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data

In [1]:
# imports
import pandas as pd
import numpy as np
import os

from datetime import datetime
from sklearn.metrics import mean_squared_error
from math import sqrt

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from pandas.plotting import register_matplotlib_converters

from matplotlib.ticker import StrMethodFormatter
from matplotlib.dates import DateFormatter

import statsmodels.api as sm
from statsmodels.tsa.api import Holt

import warnings
warnings.filterwarnings("ignore")

In [2]:
# getting the climate change data
df = pd.read_csv('GlobalLandTemperaturesByCity.csv')

In [3]:
df.head() # check_yo_head

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,City,Country,Latitude,Longitude
0,1743-11-01,6.068,1.737,Århus,Denmark,57.05N,10.33E
1,1743-12-01,,,Århus,Denmark,57.05N,10.33E
2,1744-01-01,,,Århus,Denmark,57.05N,10.33E
3,1744-02-01,,,Århus,Denmark,57.05N,10.33E
4,1744-03-01,,,Århus,Denmark,57.05N,10.33E


In [4]:
# selecting my hometown of "Sacramento", California
# .copy() gives an independent frame, so later edits do not act on a slice of df
smf_df = df[df['City'] == "Sacramento"].copy()

In [5]:
smf_df.head() # check_yo_head

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,City,Country,Latitude,Longitude
6506228,1849-01-01,8.092,2.192,Sacramento,United States,37.78N,122.03W
6506229,1849-02-01,9.508,1.85,Sacramento,United States,37.78N,122.03W
6506230,1849-03-01,11.701,2.129,Sacramento,United States,37.78N,122.03W
6506231,1849-04-01,13.102,2.559,Sacramento,United States,37.78N,122.03W
6506232,1849-05-01,14.045,1.909,Sacramento,United States,37.78N,122.03W


In [6]:
# taking a look at the shape of the "Sacramento" data
smf_df.shape

(1977, 7)

In [7]:
# taking a look at the info and Dtype of the "Sacramento" data
smf_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1977 entries, 6506228 to 6508204
Data columns (total 7 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   dt                             1977 non-null   object 
 1   AverageTemperature             1977 non-null   float64
 2   AverageTemperatureUncertainty  1977 non-null   float64
 3   City                           1977 non-null   object 
 4   Country                        1977 non-null   object 
 5   Latitude                       1977 non-null   object 
 6   Longitude                      1977 non-null   object 
dtypes: float64(2), object(5)
memory usage: 123.6+ KB


In [8]:
# renaming columns to short, lowercase names (reassigning instead of modifying a slice in place)
smf_df = smf_df.rename(columns={'AverageTemperature':'avg_temp', 'dt':'date', 'City': 'city', 'Country':'country'})

In [9]:
smf_df = smf_df.drop(['AverageTemperatureUncertainty', 'Latitude', 'Longitude'], axis = 1)

In [10]:
smf_df.tail()

Unnamed: 0,date,avg_temp,city,country
6508200,2013-05-01,17.434,Sacramento,United States
6508201,2013-06-01,19.759,Sacramento,United States
6508202,2013-07-01,20.657,Sacramento,United States
6508203,2013-08-01,19.731,Sacramento,United States
6508204,2013-09-01,20.471,Sacramento,United States
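The shape above (1977 rows) together with the first and last dates (1849-01-01 and 2013-09-01) suggests a monthly series with no missing months. A minimal sketch of how that could be checked follows; `missing_months` is a hypothetical helper, not part of the original notebook, and it assumes every observation is stamped on the first day of its month.

```python
import pandas as pd

# Month-start dates from the first to the last observation shown in the notebook
expected = pd.date_range(start="1849-01-01", end="2013-09-01", freq="MS")
print(len(expected))  # 1977, matching smf_df.shape[0]


def missing_months(dates):
    """Return the month-start dates absent from `dates` (hypothetical helper)."""
    observed = pd.DatetimeIndex(pd.to_datetime(dates))
    full = pd.date_range(observed.min(), observed.max(), freq="MS")
    return full.difference(observed)
```

Calling `missing_months(smf_df['date'])` should return an empty index if the series is complete.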


In [12]:
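# NOTE: this converts the integer row labels, not the 'date' column, so the
# resulting index holds 1970 epoch timestamps rather than the observation dates
smf_df.index = pd.to_datetime(smf_df.index)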

In [13]:
smf_df.head()

Unnamed: 0,date,avg_temp,city,country
1970-01-01 00:00:00.006506228,1849-01-01,8.092,Sacramento,United States
1970-01-01 00:00:00.006506229,1849-02-01,9.508,Sacramento,United States
1970-01-01 00:00:00.006506230,1849-03-01,11.701,Sacramento,United States
1970-01-01 00:00:00.006506231,1849-04-01,13.102,Sacramento,United States
1970-01-01 00:00:00.006506232,1849-05-01,14.045,Sacramento,United States
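The index in this output reads 1970-01-01 because `pd.to_datetime` interprets the integer row labels as nanoseconds since the Unix epoch, while the `date` column is left as strings. A minimal sketch of indexing on the parsed `date` column instead, using a toy frame built from the first rows shown earlier (the names `toy`, `epoch_index`, and `fixed` are illustrative only):

```python
import pandas as pd

# Toy frame mirroring the first Sacramento rows: integer row labels, string dates
toy = pd.DataFrame(
    {
        "date": ["1849-01-01", "1849-02-01", "1849-03-01"],
        "avg_temp": [8.092, 9.508, 11.701],
    },
    index=[6506228, 6506229, 6506230],
)

# What the cell above does: integer labels are read as nanoseconds since 1970-01-01
epoch_index = pd.to_datetime(toy.index)

# Alternative: parse the 'date' column and use it as a sorted DatetimeIndex
fixed = (
    toy.assign(date=pd.to_datetime(toy["date"]))
    .set_index("date")
    .sort_index()
)

print(epoch_index[0])
print(fixed.index.min(), fixed.index.max())
```

Applied to the notebook's frame, the same pattern would be `smf_df = smf_df.assign(date=pd.to_datetime(smf_df['date'])).set_index('date').sort_index()`, which gives a `DatetimeIndex` suitable for resampling and date-based slicing.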
