# Machine Learning Example
## Groundhog Day Forecasts and Temperatures
### How accurate is Punxsutawney Phil's winter weather forecast?

In case you don't know what Groundhog Day is, it's a tradition celebrated in the United States and Canada.

It derives from the Pennsylvania Dutch superstition that if a groundhog (Deitsch: Grundsau, Grunddax, Dax) emerging from its burrow on this day sees its shadow because the weather is clear, it will retreat to its den and winter will persist for six more weeks. If it sees no shadow because of cloudiness, spring will arrive early ([Wikipedia](https://en.wikipedia.org/wiki/Groundhog_Day)).

Groundhog Day this year is Friday, February 2 (in 7 days)! Let's see if we can predict the groundhog's fate and the fate of North America!

## Imports

In [1]:
# Imports
# Note that we're aliasing the package names
# These aliases are the most common way of using these modules
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.pylab as pylab
# Machine Learning modules
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import metrics

# Configure visualisations
%matplotlib inline
mpl.style.use( 'ggplot' )
pylab.rcParams[ 'figure.figsize' ] = 18.5, 10.5

## Load Data

In [2]:
# Load Data
base_path = './data/'
groundhog_df = pd.read_csv(base_path + 'groundhog.csv')

# Preview
groundhog_df.head(n=15)

Unnamed: 0,Year,Punxsutawney Phil,February Average Temperature,February Average Temperature (Northeast),February Average Temperature (Midwest),February Average Temperature (Pennsylvania),March Average Temperature,March Average Temperature (Northeast),March Average Temperature (Midwest),March Average Temperature (Pennsylvania)
0,1886,No Record,,,,,,,,
1,1887,Full Shadow,,,,,,,,
2,1888,Full Shadow,,,,,,,,
3,1889,No Record,,,,,,,,
4,1890,No Shadow,,,,,,,,
5,1891,No Record,,,,,,,,
6,1892,No Record,,,,,,,,
7,1893,No Record,,,,,,,,
8,1894,No Record,,,,,,,,
9,1895,No Record,26.6,15.6,21.9,17.0,39.97,27.6,40.2,31.3


It looks like quite a few rows have `NaN` values. Let's filter them out and map the `Punxsutawney Phil` values to numbers.

In [3]:
groundhog_df = groundhog_df.dropna()  # remove NaN rows

# Map the forecast labels to integer codes
mapper = {
    'No Record': 0,
    'No Shadow': 1,
    'Partial Shadow': 2,
    'Full Shadow': 3,
}

def map_phil_to_number(sample):
    return mapper[sample]

groundhog_df['Punxsutawney Phil'] = groundhog_df['Punxsutawney Phil'].apply(map_phil_to_number)

# Preview data
groundhog_df.head()

Unnamed: 0,Year,Punxsutawney Phil,February Average Temperature,February Average Temperature (Northeast),February Average Temperature (Midwest),February Average Temperature (Pennsylvania),March Average Temperature,March Average Temperature (Northeast),March Average Temperature (Midwest),March Average Temperature (Pennsylvania)
9,1895,0,26.6,15.6,21.9,17.0,39.97,27.6,40.2,31.3
10,1896,0,35.04,22.2,33.5,26.6,38.03,25.3,36.9,27.8
11,1897,0,33.39,23.6,34.7,27.9,38.79,32.0,44.0,36.9
12,1898,3,35.37,24.8,33.3,26.7,41.05,38.0,46.0,42.0
13,1899,0,25.5,18.1,22.2,20.0,37.63,29.3,38.4,34.0
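
As a rough sketch of the cleaning step above, the same drop-then-encode idea can be tried on a tiny made-up frame. The names `toy_df`, `clean_df`, and `missing_per_column` and all the values are invented for illustration; only the label codes are copied from the notebook. It uses `Series.map` with the dictionary directly, which is an alternative to the `apply` call above, not what the notebook itself does.

```python
import pandas as pd

# Toy data (invented for illustration, NOT taken from groundhog.csv)
toy_df = pd.DataFrame({
    'Year': [2001, 2002, 2003, 2004],
    'Punxsutawney Phil': ['Full Shadow', 'No Shadow', 'No Record', 'Full Shadow'],
    'February Average Temperature': [30.1, None, 28.4, 33.0],
})

# Same label encoding as the notebook's mapper
mapper = {'No Record': 0, 'No Shadow': 1, 'Partial Shadow': 2, 'Full Shadow': 3}

# Count missing values per column before dropping anything
missing_per_column = toy_df.isna().sum()

# Drop incomplete rows, then encode the labels by passing the dict to Series.map
clean_df = toy_df.dropna().copy()
clean_df['Punxsutawney Phil'] = clean_df['Punxsutawney Phil'].map(mapper)

print(missing_per_column)
print(clean_df)
```

One difference worth knowing: `Series.map` with a dictionary yields `NaN` for a label that is missing from the dictionary, whereas looking the label up with `mapper[sample]` raises a `KeyError`.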


### Let's set up the data

I want to predict this year's groundhog forecast, and maybe later years' too. We can use last year's temperature data to predict this year's forecast.

To do this, let's train a [Support Vector Machine](http://scikit-learn.org/stable/modules/svm.html) classifier on the temperature features.

For this I'll use [scikit-learn](http://scikit-learn.org/stable/auto_examples/svm/plot_svm_regression.html).

However, first let's set up the data appropriately.


In [4]:
# Get training features
training_features = groundhog_df.loc[:, 'February Average Temperature':'March Average Temperature (Pennsylvania)']
training_features.head()

Unnamed: 0,February Average Temperature,February Average Temperature (Northeast),February Average Temperature (Midwest),February Average Temperature (Pennsylvania),March Average Temperature,March Average Temperature (Northeast),March Average Temperature (Midwest),March Average Temperature (Pennsylvania)
9,26.6,15.6,21.9,17.0,39.97,27.6,40.2,31.3
10,35.04,22.2,33.5,26.6,38.03,25.3,36.9,27.8
11,33.39,23.6,34.7,27.9,38.79,32.0,44.0,36.9
12,35.37,24.8,33.3,26.7,41.05,38.0,46.0,42.0
13,25.5,18.1,22.2,20.0,37.63,29.3,38.4,34.0


In [5]:
# Shift the data down one row, so that last year's temp readings
# can be used for this year's prediction
shifted_training_features = training_features.shift()
shifted_training_features.head()

Unnamed: 0,February Average Temperature,February Average Temperature (Northeast),February Average Temperature (Midwest),February Average Temperature (Pennsylvania),March Average Temperature,March Average Temperature (Northeast),March Average Temperature (Midwest),March Average Temperature (Pennsylvania)
9,,,,,,,,
10,26.6,15.6,21.9,17.0,39.97,27.6,40.2,31.3
11,35.04,22.2,33.5,26.6,38.03,25.3,36.9,27.8
12,33.39,23.6,34.7,27.9,38.79,32.0,44.0,36.9
13,35.37,24.8,33.3,26.7,41.05,38.0,46.0,42.0


In [6]:
# Get the groundhog predictions
groundhog_shawdow = groundhog_df.loc[:, ['Year', 'Punxsutawney Phil']]
groundhog_shawdow.head()

Unnamed: 0,Year,Punxsutawney Phil
9,1895,0
10,1896,0
11,1897,0
12,1898,3
13,1899,0


In [7]:
training_data = pd.concat([groundhog_shawdow, shifted_training_features], axis=1)
training_data.head()

Unnamed: 0,Year,Punxsutawney Phil,February Average Temperature,February Average Temperature (Northeast),February Average Temperature (Midwest),February Average Temperature (Pennsylvania),March Average Temperature,March Average Temperature (Northeast),March Average Temperature (Midwest),March Average Temperature (Pennsylvania)
9,1895,0,,,,,,,,
10,1896,0,26.6,15.6,21.9,17.0,39.97,27.6,40.2,31.3
11,1897,0,35.04,22.2,33.5,26.6,38.03,25.3,36.9,27.8
12,1898,3,33.39,23.6,34.7,27.9,38.79,32.0,44.0,36.9
13,1899,0,35.37,24.8,33.3,26.7,41.05,38.0,46.0,42.0
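
The alignment built in the last few cells can be reproduced on a toy frame to see exactly what `shift()` and `pd.concat(..., axis=1)` do. This is only a sketch: the frame `toy`, its values, and the names `features`, `labels`, `shifted`, and `aligned` are made up for illustration.

```python
import pandas as pd

# Toy data (invented for illustration); the index mimics the notebook's row labels
toy = pd.DataFrame({
    'Year': [2001, 2002, 2003],
    'Punxsutawney Phil': [3, 1, 3],
    'February Average Temperature': [30.0, 35.0, 28.0],
}, index=[9, 10, 11])

features = toy[['February Average Temperature']]
labels = toy[['Year', 'Punxsutawney Phil']]

# shift() moves every value down one row, so the first row becomes NaN
shifted = features.shift()

# concat with axis=1 lines the two frames up by index label;
# dropna() then removes the first row, which has no "previous year"
aligned = pd.concat([labels, shifted], axis=1).dropna()

print(aligned)
```

Note that `shift()` works on row position, not on the `Year` value. If a year were missing from the frame, the shifted row would carry temperatures from two years earlier, so this approach quietly assumes consecutive years.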


Great! Now each row pairs the previous year's temperatures with the current year's `Punxsutawney Phil` forecast. All we have to do now is remove the `NaN` row that the shift introduced, create our model, and test!

In [8]:
# Remove the NaN row introduced by the shift
training_data = training_data.dropna()
X = training_data.loc[:, 'February Average Temperature':'March Average Temperature (Pennsylvania)']
y = training_data.loc[:, 'Punxsutawney Phil']

# Fit a support vector classifier (SVC) with an RBF kernel
groundhog_svc = SVC(kernel='rbf')
groundhog_model = groundhog_svc.fit(X, y)
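
The model above is fitted on every available row, and `train_test_split` is imported at the top of the notebook but never used. One possible way to estimate how such a classifier does on years it has not seen is to hold out part of the data. The sketch below shows the pattern on random stand-in data, not on the groundhog dataset; `X_demo`, `y_demo`, `demo_model`, the 25% test size, and the class proportions are all assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in data (random numbers, NOT the groundhog dataset):
# 120 rows, 8 temperature-like features, two imbalanced classes
rng = np.random.RandomState(0)
X_demo = rng.normal(loc=30.0, scale=5.0, size=(120, 8))
y_demo = rng.choice([1, 3], size=120, p=[0.2, 0.8])

# Hold out 25% of the rows; stratify keeps the class mix similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo)

# Fit on the training part only
demo_model = SVC(kernel='rbf').fit(X_train, y_train)

# Compare accuracy on seen rows with accuracy on held-out rows
train_accuracy = accuracy_score(y_train, demo_model.predict(X_train))
test_accuracy = accuracy_score(y_test, demo_model.predict(X_test))

print('train accuracy:', train_accuracy)
print('test accuracy: ', test_accuracy)
```

A large gap between the two numbers would suggest the model is memorising the training rows rather than learning something that carries over to new years.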


In [9]:
# Score the model on the same rows it was trained on
# (this is training accuracy, not accuracy on unseen years)

groundhog_predictions = groundhog_model.predict(X)
groundhog_ground_truth = y

print(accuracy_score(groundhog_ground_truth, groundhog_predictions))

print(metrics.classification_report(groundhog_ground_truth, groundhog_predictions))

0.9586776859504132
             precision    recall  f1-score   support

          0       1.00      0.80      0.89         5
          1       1.00      0.80      0.89        15
          2       0.00      0.00      0.00         1
          3       0.95      1.00      0.98       100

avg / total       0.95      0.96      0.95       121
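
To put the accuracy in context, it helps to compare it with a trivial baseline that always predicts the most common class. The arithmetic below uses only the support counts and the accuracy printed in the report above; the variable names are made up for this sketch.

```python
# Per-class support copied from the classification report above
support = {0: 5, 1: 15, 2: 1, 3: 100}

total = sum(support.values())

# A baseline that always predicts the most frequent class
majority_class = max(support, key=support.get)
baseline_accuracy = support[majority_class] / total

# Number of correct predictions implied by the reported accuracy
reported_accuracy = 0.9586776859504132
correct = round(reported_accuracy * total)

print(f"majority class: {majority_class}")
print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"model: {correct} of {total} correct")
```

Always answering `Full Shadow` (class 3) would already be right for 100 of the 121 rows, about 0.826, so the model's 116 of 121 is an improvement of 16 rows on the data it was trained on.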





Now let's predict this year's groundhog event, and consequently our weather future!
![title](http://theotherpress.ca/wp-content/uploads/2017/01/humour_groundhog-1024x683.jpg)

In [24]:
# Use the most recent row of temperatures as "last year's" features
last_years_weather = groundhog_df.tail(1).loc[
    :, 'February Average Temperature':'March Average Temperature (Pennsylvania)']
pred = groundhog_model.predict(last_years_weather)

inv_mapper = {v: k for k, v in mapper.items()}
inv_mapper[pred[0]]

'Full Shadow'

# Conclusion

The model predicts `Full Shadow`, so we will have 6 more weeks of winter.