# Machine Learning - Out-of-Sample Forecasts with ARIMA

In this notebook, we are going to explore a couple of different machine learning models to predict time-series data.

Here is a link to all articles/tutorials:
 - [Time Series Archive](http://machinelearningmastery.com/category/time-series/)
 
Here are links to specific articles:
 - [How to Make Out-of-Sample Forecasts with ARIMA in Python](http://machinelearningmastery.com/make-sample-forecasts-arima-python/)
 - [Sensitivity Analysis of History Size to Forecast Skill with ARIMA in Python](http://machinelearningmastery.com/sensitivity-analysis-history-size-forecast-skill-arima-python/)
 - [Feature Selection for Time Series Forecasting with Python](http://machinelearningmastery.com/feature-selection-time-series-forecasting-python/)
 - [Simple Time Series Forecasting Models to Test So That You Don’t Fool Yourself](http://machinelearningmastery.com/simple-time-series-forecasting-models/)
 - [Autoregression Models for Time Series Forecasting With Python](http://machinelearningmastery.com/autoregression-models-time-series-forecasting-python/)

## Out-of-Sample Forecasts with ARIMA

This machine learning technique is broken down into the following 5 steps:

1. Dataset Description
2. Split Dataset
3. Develop Model
4. One-Step Out-of-Sample Forecast
5. Multi-Step Out-of-Sample Forecast

### Dataset Description

In [13]:
import pandas as pd
import numpy as np

import warnings

from sklearn.metrics import mean_squared_error
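# legacy ARIMA API (removed in statsmodels 0.13; newer releases provide statsmodels.tsa.arima.model.ARIMA)
from statsmodels.tsa.arima_model import ARIMA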

import matplotlib.pyplot as plt
%matplotlib inline

In [14]:
# load dataset
data = pd.read_csv('data/slo_weather_history.csv', index_col=0)

# display first few rows
data.head()

| date | dew_point_f_avg | dew_point_f_high | dew_point_f_low | events | humidity_%_avg | humidity_%_high | humidity_%_low | precip_in_sum | sea_level_press_in_avg | sea_level_press_in_high | sea_level_press_in_low | temp_f_avg | temp_f_high | temp_f_low | visibility_mi_avg | visibility_mi_high | visibility_mi_low | wind_gust_mph_high | wind_mph_avg | wind_mph_high |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2012-1-1 | 44.0 | 50.0 | 34.0 | Fog | 80.0 | 100.0 | 25.0 | 0.0 | 30.15 | 30.23 | 30.08 | 56.0 | 73.0 | 39.0 | 6.0 | 10.0 | 0.0 | 0.0 | 1.0 | 8.0 |
| 2012-1-2 | 47.0 | 52.0 | 43.0 | Fog | 93.0 | 100.0 | 63.0 | 0.0 | 30.23 | 30.3 | 30.19 | 52.0 | 63.0 | 42.0 | 4.0 | 10.0 | 0.0 | 0.0 | 3.0 | 14.0 |
| 2012-1-3 | 43.0 | 50.0 | 37.0 | Fog | 85.0 | 100.0 | 32.0 | 0.01 | 30.24 | 30.28 | 30.17 | 58.0 | 77.0 | 39.0 | 6.0 | 10.0 | 0.0 | 0.0 | 2.0 | 10.0 |
| 2012-1-4 | 42.0 | 47.0 | 37.0 |  | 69.0 | 96.0 | 33.0 | 0.0 | 30.24 | 30.3 | 30.2 | 56.0 | 73.0 | 39.0 | 10.0 | 10.0 | 8.0 | 0.0 | 1.0 | 9.0 |
| 2012-1-5 | 42.0 | 51.0 | 36.0 |  | 66.0 | 93.0 | 23.0 | 0.0 | 30.15 | 30.22 | 30.09 | 60.0 | 78.0 | 42.0 | 10.0 | 10.0 | 7.0 | 22.0 | 4.0 | 18.0 |


### Split Dataset

In [15]:
split_point = len(data) - 7
data_train, data_test = data[0:split_point], data[split_point:]

print('Training data: %d, Test data: %d' % (len(data_train), len(data_test)))

Training data: 1971, Test data: 7


### Develop Model

The data doesn't have a strong seasonal component, but we still remove what seasonality there is, and make the series closer to stationary, by taking the seasonal difference. That is, we take the observation for a day and subtract the observation from the same day one year earlier.

We can invert this operation by adding the value of the observation one year ago. We will need to do this to any forecasts made by a model trained on the seasonally adjusted data.
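
The two operations described above can be written compactly. The notation here is an assumption introduced for illustration and is not part of the original notebook: $x_t$ is the observed value on day $t$, $s = 365$ is the seasonal lag, $d_t$ is the differenced series the model is trained on, and hats mark forecasts.

$$
d_t = x_t - x_{t-s}, \qquad \hat{x}_{n+k} = \hat{d}_{n+k} + x_{n+k-s}, \quad k = 1, \dots, 7
$$

Here $n$ is the index of the last observation. Because $k \le s$, the term $x_{n+k-s}$ is always an observed value, so each forecast is inverted against real history rather than against an earlier forecast.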

In [16]:
# create a differenced series
def difference(data, interval=1):
    diff = list()
    for i in range(interval, len(data)):
        value = data[i] - data[i - interval]
        diff.append(value)
    return np.array(diff)

In [17]:
# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]
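
As a quick sanity check, the round trip can be tried on a toy series. This is a minimal sketch: the values in `series` and the short `interval` of 3 are made up for illustration, and the two helpers are restated (with `difference` condensed into a list comprehension) so the block runs on its own.

```python
import numpy as np

def difference(data, interval=1):
    # same logic as the notebook's helper: value minus the value `interval` steps earlier
    return np.array([data[i] - data[i - interval] for i in range(interval, len(data))])

def inverse_difference(history, yhat, interval=1):
    # add back the value observed `interval` steps before the point being restored
    return yhat + history[-interval]

# hypothetical toy data, not taken from the weather dataset
series = np.array([50.0, 52.0, 51.0, 53.0, 56.0, 54.0, 55.0, 57.0])
interval = 3

diffs = difference(series, interval)

# rebuild the series: seed with the first `interval` values, then invert each difference
history = list(series[:interval])
for d in diffs:
    history.append(inverse_difference(history, d, interval))

recovered = np.array(history)
print(np.allclose(recovered, series))
```

Appending each restored value to `history` before the next step is what keeps `history[-interval]` pointing at the right reference observation; the multi-step forecast loop later in the notebook relies on the same pattern.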

In [19]:
# seasonal difference
# note: X covers the full dataset, including the 7 rows held out in data_test
X = data['temp_f_low'].values
days_in_year = 365
differenced = difference(X, days_in_year)

In [20]:
# fit model
model = ARIMA(differenced, order=(7,0,1))
model_fit = model.fit(disp=0)

# print summary of fit model
print(model_fit.summary())

                              ARMA Model Results                              
Dep. Variable:                      y   No. Observations:                 1613
Model:                     ARMA(7, 1)   Log Likelihood               -4913.777
Method:                       css-mle   S.D. of innovations              5.090
Date:                Sat, 10 Jun 2017   AIC                           9847.554
Time:                        20:26:06   BIC                           9901.412
Sample:                             0   HQIC                          9867.545
                                                                              
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.6724      0.418      1.608      0.108        -0.147     1.492
ar.L1.y       -0.0571      0.272     -0.210      0.834        -0.591     0.477
ar.L2.y        0.4525      0.164      2.753      0.0

### One-Step Out-of-Sample Forecast

In [21]:
# one-step out-of-sample forecast
forecast = model_fit.forecast()[0]

# invert the differenced forecast to something usable
forecast = inverse_difference(X, forecast, days_in_year)

print('Forecast: %f, Actual: %f' % (forecast, data_test.iloc[0]['temp_f_low']))

Forecast: 53.500663, Actual: 57.000000


### Multi-Step Out-of-Sample Forecast

In [23]:
# multi-step out-of-sample forecast
forecast = model_fit.forecast(steps=7)[0]

# invert the differenced forecast to something usable
history = [x for x in X]
day = 1

tests = []
predictions = []

for yhat in forecast:
    test = data_test.iloc[day - 1]['temp_f_low']
    predicted = inverse_difference(history, yhat, days_in_year)
    
    print('Day %d -- Forecast: %f, Actual: %f' % (day, predicted, test))
    history.append(predicted)
    day += 1
    
    tests.append(test)
    predictions.append(predicted)
    
test_score = np.sqrt(mean_squared_error(tests, predictions))
print('Test RMSE: %.3f' % test_score)

Day 1 -- Forecast: 53.500663, Actual: 57.000000
Day 2 -- Forecast: 53.421839, Actual: 54.000000
Day 3 -- Forecast: 52.347355, Actual: 48.000000
Day 4 -- Forecast: 53.944593, Actual: 53.000000
Day 5 -- Forecast: 53.859196, Actual: 53.000000
Day 6 -- Forecast: 55.152610, Actual: 52.000000
Day 7 -- Forecast: 57.190460, Actual: 51.000000
Test RMSE: 3.409
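
The reported score can be reproduced by hand from the seven forecast/actual pairs printed above. The sketch below copies those printed values into arrays and computes the root mean squared error directly with NumPy instead of `mean_squared_error`; the variable names are illustrative only.

```python
import numpy as np

# values copied from the printed forecast output above
actual = np.array([57.0, 54.0, 48.0, 53.0, 53.0, 52.0, 51.0])
predicted = np.array([53.500663, 53.421839, 52.347355, 53.944593,
                      53.859196, 55.152610, 57.190460])

errors = predicted - actual

# RMSE: square the errors, average them, take the square root
rmse = np.sqrt(np.mean(errors ** 2))

# day (1-based) with the largest absolute error
worst_day = int(np.argmax(np.abs(errors))) + 1

print('Test RMSE: %.3f' % rmse)
print('Largest absolute error on day %d' % worst_day)
```

Because the errors are squared before averaging, the day 7 miss of roughly 6 degrees contributes far more to the score than the sub-degree errors on days 2, 4 and 5.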
