# Presentation

In this notebook, we plot the weather data to understand it, then compare several simple forecasting models against it.

In [1]:
import pandas as pd
import sys

%matplotlib inline

sys.path.append('./src/')
from plots import *

In [2]:
data = pd.read_csv("data/slo_weather_history.csv")
data.head()

| Unnamed: 0 | date | dew_point_f_avg | dew_point_f_high | dew_point_f_low | events | humidity_%_avg | humidity_%_high | humidity_%_low | precip_in_sum | sea_level_press_in_avg | ... | sea_level_press_in_low | temp_f_avg | temp_f_high | temp_f_low | visibility_mi_avg | visibility_mi_high | visibility_mi_low | wind_gust_mph_high | wind_mph_avg | wind_mph_high |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2012-01-01 | 44.0 | 50.0 | 34.0 | Fog | 80.0 | 100.0 | 25.0 | 0.0 | 30.15 | ... | 30.08 | 56.0 | 73.0 | 39.0 | 6.0 | 10.0 | 0.0 | 0.0 | 1.0 | 8.0 |
| 1 | 2012-01-02 | 47.0 | 52.0 | 43.0 | Fog | 93.0 | 100.0 | 63.0 | 0.0 | 30.23 | ... | 30.19 | 52.0 | 63.0 | 42.0 | 4.0 | 10.0 | 0.0 | 0.0 | 3.0 | 14.0 |
| 2 | 2012-01-03 | 43.0 | 50.0 | 37.0 | Fog | 85.0 | 100.0 | 32.0 | 0.01 | 30.24 | ... | 30.17 | 58.0 | 77.0 | 39.0 | 6.0 | 10.0 | 0.0 | 0.0 | 2.0 | 10.0 |
| 3 | 2012-01-04 | 42.0 | 47.0 | 37.0 |  | 69.0 | 96.0 | 33.0 | 0.0 | 30.24 | ... | 30.2 | 56.0 | 73.0 | 39.0 | 10.0 | 10.0 | 8.0 | 0.0 | 1.0 | 9.0 |
| 4 | 2012-01-05 | 42.0 | 51.0 | 36.0 |  | 66.0 | 93.0 | 23.0 | 0.0 | 30.15 | ... | 30.09 | 60.0 | 78.0 | 42.0 | 10.0 | 10.0 | 7.0 | 22.0 | 4.0 | 18.0 |


In [3]:
plot_all_data(data)

## Simple Time Series Forecasting Models

### Persistence Forecast

The persistence forecast involves using the previous observation to predict the next time step.

For this reason, the approach is often called the naive forecast.

Instead of blindly using the previous observation, in this section, we will look at automating the persistence forecast and evaluate the use of any arbitrary prior time step to predict the next time step.

We will explore using each of the prior 730 days (2 years) of point observations in a persistence model. Each configuration will be evaluated using the test harness and RMSE scores collected. We will then display the scores and graph the relationship between the persisted time step and the model skill.
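The lag search described above can be sketched in a few lines. This is a minimal illustration, not the code behind `simple_time_series_forecast_persistence_rmse`: the helper names `rmse` and `persistence_rmse`, the toy series, and the size of the test split are all assumptions made for the example.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def persistence_rmse(series, lag, n_test):
    """Score a persistence model that predicts y[t] with y[t - lag].

    The last n_test values form the test set (hypothetical split).
    """
    start = len(series) - n_test
    if lag > start:
        raise ValueError("lag reaches back before the start of the series")
    actual = series[start:]
    predicted = [series[t - lag] for t in range(start, len(series))]
    return rmse(actual, predicted)


# Toy series with a period of 4 steps, standing in for a seasonal signal.
temps = [50, 54, 58, 54] * 5

# Evaluate every candidate lag and keep the one with the lowest RMSE.
scores = {lag: persistence_rmse(temps, lag, n_test=8) for lag in range(1, 9)}
best_lag = min(scores, key=scores.get)
print(best_lag, scores[best_lag])
```

On this toy series, the lag that matches the period scores best, which is the same effect that makes t-365 stand out among the longer lags on daily temperature data.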

In [4]:
simple_time_series_forecast_persistence_rmse(data)

The results show that no other lag beats the naive choice: the best result is achieved from t-1 with an RMSE of 2.970 °F. The second-best result is from t-365 with an RMSE of 6.568 °F.

### Expanding Window Forecast

An expanding window refers to a model that calculates a statistic on all available historic data and uses that to make a forecast. It is an expanding window because it grows as more real observations are collected.

Two good starting point statistics to calculate are the mean and the median historical observation.
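As a rough sketch of the idea, the function below forecasts each test point from a statistic of everything observed before it. The name `expanding_window_rmse`, the toy values, and the split size are assumptions for illustration and are not taken from `plots.py`.

```python
from math import sqrt
from statistics import mean, median


def expanding_window_rmse(series, statistic, n_test):
    """Walk forward over the last n_test points, forecasting each one
    with `statistic` applied to all earlier observations."""
    start = len(series) - n_test
    squared_errors = []
    for t in range(start, len(series)):
        history = series[:t]          # the window grows by one value per step
        forecast = statistic(history)
        squared_errors.append((series[t] - forecast) ** 2)
    return sqrt(sum(squared_errors) / len(squared_errors))


# Toy values with one outlier (70), chosen to separate mean from median.
temps = [50, 52, 51, 70, 53, 52, 54, 53]

mean_score = expanding_window_rmse(temps, mean, n_test=4)
median_score = expanding_window_rmse(temps, median, n_test=4)
print(f"RMSE for mean: {mean_score:.3f}")
print(f"RMSE for median: {median_score:.3f}")
```

The outlier pulls the mean forecast upward for every later step, while the median is barely affected, so the median scores better on this toy input.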

In [5]:
simple_time_series_forecast_expanding_window_rmse(data)

RMSE for mean: 7.260
RMSE for median: 7.004


The plot shows a poor forecast: apart from a slight rising trend, it does not follow the movements of the data at all.

### Rolling Window Forecast

A rolling window model involves calculating a statistic on a fixed contiguous block of prior observations and using it as a forecast. It is much like the expanding window, but the window size remains fixed and counts backwards from the most recent observation. It may be more useful on time series problems where recent lag values are more predictive than older lag values.

We will automatically check different rolling window sizes from 1 to 730 days (2 years) and start by calculating the mean observation and using that as a forecast.
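The window-size search can be sketched as follows. The helper `rolling_window_rmse`, the toy series, and the range of window sizes are hypothetical and only illustrate the technique; they are not the implementation in `plots.py`.

```python
from math import sqrt
from statistics import mean


def rolling_window_rmse(series, window, n_test):
    """Forecast each of the last n_test points with the mean of the
    `window` observations immediately before it."""
    start = len(series) - n_test
    if window > start:
        raise ValueError("window is longer than the available history")
    squared_errors = [
        (series[t] - mean(series[t - window:t])) ** 2
        for t in range(start, len(series))
    ]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Toy daily values (assumed for the example).
temps = [50, 52, 51, 55, 53, 52, 54, 53, 56, 55]

# Try every window size that fits in the history before the test split.
scores = {w: rolling_window_rmse(temps, w, n_test=4) for w in range(1, 7)}
best_window = min(scores, key=scores.get)
print(best_window, round(scores[best_window], 3))
```

With `window=1` the mean of the window is just the previous observation, so that configuration reduces to the t-1 persistence forecast.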

In [6]:
simple_time_series_forecast_rolling_window_rmse(data)

Both plots show that the best result was achieved with a window size of w=1, with an RMSE of 4.311342 °F. A window of one observation is essentially a t-1 persistence model.

We could imagine better results with a weighted combination of the window observations. This idea leads to linear models such as ARIMA and Autoregression (AR).

## Out-of-Sample Forecasts with ARIMA

### One-Step Out-of-Sample Forecast

ARIMA models are well suited to one-step forecasts.

A one-step forecast is a forecast of the very next time step in the sequence from the available data used to fit the model.

The statsmodels ARIMAResults object provides a forecast() method for making predictions.

By default, this method makes a single-step out-of-sample forecast, so we can call it directly. In the legacy `statsmodels.tsa.arima_model` implementation, forecast() returns the forecast values, the standard errors of the forecast, and the confidence intervals; the newer `statsmodels.tsa.arima.model` implementation returns only the forecast values. Here, we are only interested in the forecast value itself.

### Multi-Step Out-of-Sample Forecast

We can also make multi-step forecasts using the forecast() function.

It is common with weather data to make one week (7-day) forecasts, so in this section we will look at predicting the minimum daily temperature for the next 7 out-of-sample time steps.

The forecast() function has an argument called steps that allows you to specify the number of time steps to forecast. By default, this argument is set to 1 for a one-step out-of-sample forecast. We can set it to 7 to get a forecast for the next 7 days.

In [7]:
out_of_sample_forecast_arima(data)

Day 1 -- Forecast: 53.191107, Actual: 50.000000
Day 2 -- Forecast: 51.915044, Actual: 52.000000
Day 3 -- Forecast: 48.315959, Actual: 53.000000
Day 4 -- Forecast: 46.614705, Actual: 51.000000
Day 5 -- Forecast: 47.324334, Actual: 55.000000
Day 6 -- Forecast: 51.281892, Actual: 48.000000
Day 7 -- Forecast: 56.719232, Actual: 48.000000
Test RMSE: 5.306


In [8]:
fixed_autoregression_forecast(data)

Day 1 -- Forecast: 50.641760, Actual: 50.000000
Day 2 -- Forecast: 51.111098, Actual: 52.000000
Day 3 -- Forecast: 51.143308, Actual: 53.000000
Day 4 -- Forecast: 51.286968, Actual: 51.000000
Day 5 -- Forecast: 51.153076, Actual: 55.000000
Day 6 -- Forecast: 51.454193, Actual: 48.000000
Day 7 -- Forecast: 51.247733, Actual: 48.000000

Test RMSE: 2.450


In [9]:
rolling_autoregression_forecast(data)

Day 1 -- Forecast: 50.641760, Actual: 50.000000
Day 2 -- Forecast: 50.677101, Actual: 52.000000
Day 3 -- Forecast: 51.739833, Actual: 53.000000
Day 4 -- Forecast: 52.533108, Actual: 51.000000
Day 5 -- Forecast: 51.006448, Actual: 55.000000
Day 6 -- Forecast: 54.041265, Actual: 48.000000
Day 7 -- Forecast: 48.966444, Actual: 48.000000

Test RMSE: 2.915
