# Topic 38: Time Series Models

https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/

* AR model (autoregression)
* MA model (moving average)
* ARMA model (autoregression + moving average)
* Differencing model
* ARIMA model (autoregression + differencing + moving average)
* SARIMA model (seasonal ARIMA)
* ARIMAX model (ARIMA + exogenous variables)
* SARIMAX model (seasonal ARIMA + exogenous variables)

## Auto-regressive Time Series Model

An autoregressive model assumes that the observations at previous time steps are useful for predicting the value at the next time step. It is one of the simplest time series models: a linear model that predicts the value at the present time from the value at the previous time.

<p style='text-align:center; font-size: 30px;'>𝑌<sub>t</sub>=𝜙<sub>1</sub>𝑌<sub>𝑡−1</sub>+𝜖<sub>𝑡</sub></p>

The subscript one (1) denotes that the next value depends only on the immediately preceding value. 𝜙 (phi) is a coefficient that we estimate so as to minimize the error function.

The order of an AR model is the number of lag terms we use to predict the present value. AR(1) uses only 1 lag (the one value directly preceding the value you are trying to predict), and AR(2) uses the two values directly preceding the value you are trying to predict.
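To make the AR(1) equation concrete, here is a minimal sketch that simulates a series with a known coefficient and then recovers it by least squares. The coefficient value, series length, and variable names are assumptions chosen for illustration; this is not how statsmodels fits the model internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1) process: Y_t = phi * Y_{t-1} + eps_t
true_phi = 0.7   # assumed coefficient, chosen for illustration
n = 5000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = true_phi * y[t - 1] + eps[t]

# Least-squares estimate of phi: regress Y_t on Y_{t-1} (no intercept)
y_lag, y_now = y[:-1], y[1:]
phi_hat = (y_lag @ y_now) / (y_lag @ y_lag)
print(f"true phi = {true_phi}, estimated phi = {phi_hat:.3f}")
```

With a long enough series the estimate should land close to the true coefficient.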

#### How do we determine the order aka how many lag terms do we include? 

Using ACF and PACF! 


<img src='../resources/AR(1).png'>
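As a rough illustration of why the PACF is useful for choosing the AR order, the sketch below simulates an AR(2) series and computes the ACF and PACF by hand with NumPy. The helper names `sample_acf` and `sample_pacf` are hypothetical, and the PACF is computed with the regression definition (the last coefficient of a least-squares AR(k) fit), which may differ slightly from the default estimator in statsmodels.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = y @ y
    return np.array([1.0] + [(y[k:] @ y[:-k]) / denom for k in range(1, max_lag + 1)])

def sample_pacf(y, max_lag):
    """PACF at lag k = last coefficient of a least-squares AR(k) fit."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    out = [1.0]
    for k in range(1, max_lag + 1):
        # Columns are y_{t-1}, ..., y_{t-k}; the target is y_t
        X = np.column_stack([y[k - j:len(y) - j] for j in range(1, k + 1)])
        coefs, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        out.append(coefs[-1])
    return np.array(out)

# Simulate an AR(2) series with assumed coefficients 0.6 and -0.3
rng = np.random.default_rng(42)
n = 5000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + eps[t]

acf_vals = sample_acf(y, 5)
pacf_vals = sample_pacf(y, 5)
band = 1.96 / np.sqrt(n)  # approximate 95% band around zero

for lag in range(1, 6):
    print(f"lag {lag}: acf={acf_vals[lag]: .3f}  pacf={pacf_vals[lag]: .3f}  band=±{band:.3f}")
```

For an AR(2) series the PACF should drop inside the band after lag 2, while the ACF decays more gradually.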

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
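# matplotlib 3.6 renamed the seaborn styles to 'seaborn-v0_8-*'; use whichever name is available
plt.style.use('seaborn-v0_8-notebook' if 'seaborn-v0_8-notebook' in plt.style.available else 'seaborn-notebook')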
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA

# Load Data and Process Data

In [None]:
#read in csv
taxes = pd.read_csv('../resources/google-trends_taxes_us.csv', header=1).iloc[:-1]

#set index to datetime and set frequency to Month Start ('MS')
taxes['Month'] = pd.to_datetime(taxes['Month'], infer_datetime_format=True)
taxes = taxes.set_index('Month')
taxes.index.freq = 'MS'

#plot the data to visualize trends
taxes.plot()

Is the data stationary?  Nope.  Do you remember what we did in the last class to make it stationary?

We logged it to help remove the trend and the changing variance, took a 12-month difference to remove seasonality, and took another difference of 1 month.

## Making Data Stationary

In [None]:
adfuller(taxes)

In [None]:
# log-transform, then take two successive first differences
stationary_taxes = np.log(taxes).diff().diff()
stationary_taxes = stationary_taxes.dropna()
adfuller(stationary_taxes)

# Train-Test Split for Time Series

When dividing your time series data for training and testing, we don't split it randomly like for some other kinds of modeling.  

Why Not?


*YOUR ANSWER HERE*

In [None]:
# train-test-split for time series isn't random!
split = int(len(taxes) * .9)
full_train, holdout = stationary_taxes.iloc[:split], stationary_taxes.iloc[split:]

second_split = int(len(full_train) * .9)
train, test = full_train[:second_split], full_train[second_split:]
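The same chronological idea extends to more than one evaluation fold. The sketch below is a hypothetical helper (`expanding_window_splits` is not part of this notebook or of any library) that yields index pairs in which every test block comes strictly after its training block.

```python
import numpy as np

def expanding_window_splits(n_obs, n_splits, test_size):
    """Yield (train_idx, test_idx) pairs; each test block follows its training block in time."""
    for i in range(n_splits):
        test_end = n_obs - (n_splits - 1 - i) * test_size
        test_start = test_end - test_size
        yield np.arange(0, test_start), np.arange(test_start, test_end)

splits = list(expanding_window_splits(n_obs=20, n_splits=3, test_size=4))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}   test {test_idx[0]}..{test_idx[-1]}")
```

The positions can be passed to `.iloc` on a time-indexed DataFrame, in the same way as `split` and `second_split` above.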

### AR(1) Model

What's going on below?  We have ARIMA(order=(1,0,0)).fit()

First off, we are going to learn about MA, or moving average, models next.  The I denotes differencing that the model will perform (in case we didn't do it ourselves) to make the data stationary, and is called the Integrated component.

AR(auto-regressive)

I(Integrated)

MA(moving average)

ARIMA is a combination model, or an ARMA model with differencing.

The 'order' argument tells the ARIMA model which lags to include in the model and how many times to difference the data.
Order is often represented as (p,d,q)

p = lags used in the AR part

d = order of differencing to perform

q = lags to use in the MA part.

In the model below we will use (1,0,0), which is equivalent to a purely auto-regressive model using only 1 lag.

In [None]:
# define a function to evaluate our models for us.
def evaluate_model(model, data, train, test, train_predict_start=0):
    """
    Takes a fitted ARIMA model, the original data, the train split and the test split.
    optionally, can take a training starting point between 0 and len(train).  default = 0
    (in-sample predictions and the training RMSE start from that position)
    shows a chart of data, train prediction, and test prediction
    prints the RMSE of the training prediction and the testing prediction
    returns a summary object from model.summary()
    """
    # one-step-ahead in-sample predictions, and a forecast as long as the test set
    trainpreds = model.predict(start=train_predict_start)
    testpreds = model.forecast(len(test))

    ax = plt.subplot(111)
    trainpreds.name = 'training prediction'
    testpreds.name = 'testing prediction'
    trainpreds.plot(ax=ax)
    testpreds.plot(ax=ax)
    data.plot(ax=ax, color='Yellow')
    plt.legend()
    plt.show()

    # the square root of the mean squared error gives the RMSE
    training_rmse = mean_squared_error(train.iloc[train_predict_start:], trainpreds)**.5
    testing_rmse = mean_squared_error(test, testpreds)**.5
    print('Training RMSE = ', training_rmse)
    print('Testing RMSE = ', testing_rmse)
    
    return model.summary()

In [None]:
# order = (p, d, q)
# p - autoregressive
# d - differences
# q - moving average


ar1 = ARIMA(train, order=(1,0,0)).fit()
evaluate_model(ar1, stationary_taxes, train, test)

#### Interpretation of AR(1) model

How did our model do?  Let's interpret:

The training preds are created by the `ar1.predict()` call.  This makes a one-step-ahead prediction for each point in the training data, based on the actual previous data.  So the prediction for time period k is made from the observed values at time points 0:k-1 (for an AR(1) model, only the value at k-1 is used).

The testing preds are created by `ar1.forecast()`.  Forecast predicts values for time periods after the stopping point of the training set passed.  It returns a number of predictions equal to the value passed, so `ar1.forecast(len(test))` will make a forecast a number of months ahead equal to the size of the testing slice of the data.

What do you notice, visually comparing the actual data to the predictions?

*YOUR ANSWER HERE*

If we look at the coefficients:

const = constant term (in statsmodels' `ARIMA`, the estimated mean of the series)

ar.L1 = coefficient for the first lag

sigma2 = variance of the error term
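These coefficients are enough to reproduce an AR(1) forecast by hand. The sketch below assumes the model is written around its mean, with `const` playing the role of that mean; the helper `ar1_forecast` and the numeric values are hypothetical and are not taken from the fitted model above.

```python
import numpy as np

def ar1_forecast(last_value, const, phi, steps):
    """h-step-ahead forecasts for an AR(1) written around its mean:
    (Y_t - const) = phi * (Y_{t-1} - const) + eps_t
    """
    h = np.arange(1, steps + 1)
    return const + phi ** h * (last_value - const)

# Hypothetical values for the last observation and the coefficients
forecasts = ar1_forecast(last_value=2.0, const=0.5, phi=0.6, steps=5)
print(forecasts)
```

Because |phi| < 1, each further step ahead shrinks the distance to the mean by a factor of phi, so a multi-step forecast flattens out quickly.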

### AR(2) Model

What is the difference between an AR(2) and an AR(1) model?  Write your answer below.

*YOUR ANSWER HERE*

In [None]:
ar2 = ARIMA(train, order=(2,0,0)).fit()
evaluate_model(ar2, stationary_taxes, train, test)

#### Interpretation of AR(2) model

Did we do better?  The training predictions were a little better, but we still don't really have a great model yet.

## Moving Average Time Series Model

Sometimes, a past value is not a useful indicator of what value will come next. Consider a system that is subject to a lot of shocks/volatility. If a previous time period experiences a shock, it may cause an error in future predictions if we just use that value. A moving average model helps address this behavior.

A moving average term in a time series model is a past error (multiplied by a coefficient).

An MA model assumes the present value is related to errors in the past, so it includes a memory of past errors.


<p style='text-align: center; font-size:30px;'>𝑌<sub>t</sub>=μ + 𝜖<sub>𝑡</sub>+𝜃<sub>1</sub>𝜖<sub>𝑡−1</sub></p>

For more details on how this model is fit: https://stats.stackexchange.com/questions/26024/moving-average-model-error-terms/74826#74826 
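A quick simulation can show what "memory of past errors" looks like in data. The sketch below generates an MA(1) series from the equation above and compares its sample autocorrelation with the theoretical values: θ / (1 + θ²) at lag 1 and zero beyond it. The values of μ and θ are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

mu, theta = 10.0, 0.8   # assumed values, for illustration only
n = 20000
eps = rng.normal(size=n + 1)

# Y_t = mu + eps_t + theta * eps_{t-1}
y = mu + eps[1:] + theta * eps[:-1]

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    x = x - x.mean()
    return (x[lag:] @ x[:-lag]) / (x @ x)

rho1, rho2 = autocorr(y, 1), autocorr(y, 2)
print("lag 1:", round(rho1, 3), " theory:", round(theta / (1 + theta**2), 3))
print("lag 2:", round(rho2, 3), " theory: 0")
```

The sharp cutoff after lag 1 is the pattern to look for in an ACF plot when choosing the MA order.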

## Differencing Model aka Integrated Model

The differenced value is equal to the present value minus the value at the previous time step. A time series which needs to be differenced to be made stationary is said to be an "integrated" time series.

<p>If d=0:  y<sub>t</sub>  =  Y<sub>t</sub></p>

If d=1:  y<sub>t</sub> =  Y<sub>t</sub> - Y<sub>t-1</sub>

If d=2:  y<sub>t</sub> =  (Y<sub>t</sub> - Y<sub>t-1</sub>) - (Y<sub>t-1</sub> - Y<sub>t-2</sub>)  =  Y<sub>t</sub> - 2Y<sub>t-1</sub> + Y<sub>t-2</sub>
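The formulas above can be checked numerically. This small sketch uses a made-up array and compares `np.diff` with the expanded d=2 expression.

```python
import numpy as np

Y = np.array([3.0, 5.0, 10.0, 18.0, 29.0])  # made-up values

d1 = np.diff(Y, n=1)                      # Y_t - Y_{t-1}
d2 = np.diff(Y, n=2)                      # difference of the differences
by_hand = Y[2:] - 2 * Y[1:-1] + Y[:-2]    # Y_t - 2*Y_{t-1} + Y_{t-2}

print("d=1:", d1)
print("d=2:", d2)
print("expanded formula matches:", np.allclose(d2, by_hand))
```

Each round of differencing shortens the series by one observation, which is why the notebook calls `dropna()` after `diff()` on pandas objects.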

## ARIMA

Combines AR, Differencing (I), and MA

<p style ='text-align:center; font-size: 30px;'>𝑌<sub>t</sub>=𝜙<sub>1</sub>𝑌<sub>𝑡−1</sub>+𝜙<sub>2</sub>𝑌<sub>𝑡−2</sub>+...+𝜙<sub>𝑝</sub>𝑌<sub>𝑡−𝑝</sub>+𝜖<sub>𝑡</sub>+𝜃<sub>1</sub>𝜖<sub>𝑡−1</sub>+𝜃<sub>2</sub>𝜖<sub>𝑡−2</sub>+...+𝜃<sub>𝑞</sub>𝜖<sub>𝑡−𝑞</sub></p>



ARIMA has three main parameters we need to input, p, d, & q

<b>p:</b> The number of AR terms we are going to include<br/>
<b>d:</b> The number of times we are differencing our data<br/>
<b>q:</b> The number of MA terms we are going to include

The ACF helps us find the right lags to use with the MA component of our model, or our `q` component.

In [None]:
#ACF/PACF to determine which terms to include (MA or AR or both?)
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

#take the log of the original taxes data, since we both logged and differenced to make our data stationary before.  
# ARIMA does not log the data for us.
log_taxes = np.log(taxes)

#remember we set 'split' to be .9 * len(taxes)
log_full_train, log_holdout = log_taxes[:split], log_taxes[split:]

log_train, log_test = log_full_train[:second_split], log_full_train[second_split:]

#plot autocorrelation for each lag (alpha sets the confidence interval).  We use the differenced data here
# because we will difference it in the ARIMA model.
plot_acf(log_train.diff().diff().dropna(), alpha=.05)
plt.show()

We use the PACF to determine lags for the AR part of the model, the `p` component.

In [None]:
plot_pacf(log_train.diff().diff().dropna(), alpha=.05)
plt.show()

**Note**
The ARIMA model reverses the differencing in the predictions, so an ARIMA model with `d=2` will return predictions on the same scale as the data passed to it (here, the logged series).  This is one of the benefits of this model type.
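The model undoes the differencing, but any log transform applied beforehand still has to be reversed by hand. The sketch below uses made-up numbers to show both steps explicitly: a cumulative sum undoes a first difference, and `np.exp` undoes the log.

```python
import numpy as np

original = np.array([100.0, 110.0, 125.0, 130.0, 150.0])  # made-up values

# Forward transform: log, then first difference
logged = np.log(original)
diffed = np.diff(logged)

# Reverse: cumulative sum starting from the first logged value, then exponentiate
restored_log = np.concatenate([[logged[0]], logged[0] + np.cumsum(diffed)])
restored = np.exp(restored_log)

print(np.allclose(restored, original))
```

For forecasts made on the logged series, applying `np.exp` to the predictions gives values on the original scale.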

In [None]:
from statsmodels.tsa.arima.model import ARIMA

arima1 = ARIMA(log_train, order=(2,2,1)).fit()

evaluate_model(arima1, log_taxes, log_train, log_test)

Examining the p-values for the lags tells us that many of the lags we included are not statistically significant.  We can give an ARIMA model a list of specific lags for p or q, rather than just an integer.  An integer includes all lags up to the integer value.  A list includes only the lags it names.

Examining the coefficients above, which lags should we try including in our model next time?

*YOUR ANSWER HERE*

Even better, we could use a SARIMAX model to account for the seasonality in the data, rather than trying to figure out the right lags to fit on for them.  There is some information about SARIMAX models at the bottom of this notebook.

### Tuning P, D, and Q

In practice, we often need to tune p, d, and q using strategies similar to hyperparameter tuning in other models.  Often we try many different values and see which ones work best.  However, ACF and PACF graphs help us make informed guesses rather than random ones.
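As a simplified stand-in for that kind of search, the sketch below fits AR(p) models by least squares for several values of p on a simulated series and picks the one with the lowest AIC. This is not the notebook's ARIMA workflow: the helper `ar_aic` and the simulated data are assumptions for illustration. With statsmodels, a comparable loop would fit `ARIMA(train, order=(p, d, q))` for each candidate and compare the `aic` attribute of the fitted results.

```python
import numpy as np

def ar_aic(y, p, max_p):
    """AIC of a least-squares AR(p) fit, using the same rows for every p."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    target = y[max_p:]
    if p == 0:
        resid = target
    else:
        # Columns are y_{t-1}, ..., y_{t-p}
        X = np.column_stack([y[max_p - j:len(y) - j] for j in range(1, p + 1)])
        coefs, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ coefs
    n = len(target)
    # Gaussian AIC up to a constant: n * log(RSS / n) + 2 * (number of parameters)
    return n * np.log(resid @ resid / n) + 2 * (p + 1)

# Simulated AR(2) series with assumed coefficients 0.6 and -0.3
rng = np.random.default_rng(7)
n = 2000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + eps[t]

max_p = 5
aics = {p: ar_aic(y, p, max_p) for p in range(max_p + 1)}
best_p = min(aics, key=aics.get)
for p, value in aics.items():
    print(f"p={p}: AIC={value:.1f}")
print("lowest AIC at p =", best_p)
```

Dropping the first `max_p` rows for every candidate keeps the AIC values comparable, since they are all computed on the same observations.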

## ARIMAX

ARIMA with eXogenous variables - extends ARIMA to include additional variables that might have an impact on what we are trying to forecast.

Considerations: 

1) Does our exogenous variable actually impact our endogenous variable, and not the other way around? (Use a Granger causality test.)

2) Exogenous variables need to be differenced at the same order as the endogenous 

### Endogenous Variables

So far we've only been using endogenous variables.  Those are the variables that we are trying to forecast, in this case the volume of searches for taxes on Google Search in the USA.

### Exogenous Variables

Exogenous variables are extra variables outside of the data we are trying to predict.  These might be holidays or weekends, search volumes for other terms, or anything else that can be aligned with the datetime period and is not endogenous.
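As one possible illustration, the sketch below builds a calendar-based exogenous variable on a month-start index like the one used for the taxes data. The column name `is_april` and the date range are hypothetical; whether such an indicator actually helps the forecast would need to be tested.

```python
import pandas as pd

# Hypothetical monthly index with month-start ('MS') frequency
idx = pd.date_range("2018-01-01", periods=36, freq="MS")

# 1 in April (US tax filing month), 0 otherwise
exog = pd.DataFrame({"is_april": (idx.month == 4).astype(int)}, index=idx)

print(exog.loc["2018-03":"2018-05"])
```

A DataFrame like this, aligned row for row with the endogenous series, is the kind of object that can be passed through the `exog` argument of a SARIMAX model.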



<img src='../resources/seasonal_data.png'/>

### SARIMAX

SARIMAX extends the ARIMA model we've been using, and deals more gracefully with (S)easonal trends and e(X)ogenous variables.

You can read more about SARIMAX models below


https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html

1. Plot data, see if there are trends <br/>
2. If trends, remove them (differencing, log transform, etc) <br/>
3. If seasonal trends are there determine periodicity. <br/>
4. ACF and PACF of  data <br/>
5. Determine order of differencing, AR, or MA (or both) <br/>
6. Build model and evaluate 


seasonal_order - (P,D,Q,s)

P, D, Q = the AR order, differencing order, and MA order of the seasonal component.

s = periodicity of the seasonality, or how many periods there are in each seasonal cycle.

In our case, it's 12 periods, or 1 year.
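To see what a period of 12 means in practice, the sketch below builds a made-up monthly series from a linear trend plus a pattern that repeats every 12 months, then takes a 12-month (seasonal) difference. All values are invented for illustration.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2015-01-01", periods=48, freq="MS")
trend = 0.5 * np.arange(48)
# The same 12-month pattern repeated for 4 years
seasonal = np.tile([0, 0, 3, 8, 2, 0, 0, 0, 0, 0, 0, 1], 4)
y = pd.Series(trend + seasonal, index=idx)

# Subtract the value from the same month one year earlier
seasonal_diff = y.diff(12).dropna()
print(seasonal_diff.head())
```

The repeating pattern cancels out exactly, leaving only the constant yearly change from the trend (12 months × 0.5 per month).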

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

res = seasonal_decompose(log_full_train)
plot_acf(res.seasonal)
plt.show()

In [None]:
plot_pacf(res.seasonal)
plt.show()

Try the SARIMAX model below on your own.  What might be some exogenous variables you could use for this dataset?

In [None]:
from statsmodels.tsa.statespace.sarimax import SARIMAX

sarimax = SARIMAX(log_train, exog=None, order=(1, 2, 2), seasonal_order=(4, 2, 2, 12)).fit()
evaluate_model(sarimax, log_taxes, log_train, log_test)

In [None]:
from statsmodels.tsa.statespace.sarimax import SARIMAX

sarimax2 = SARIMAX(log_full_train, exog=None, order=(1, 2, 2), seasonal_order=(4, 2, 2, 12)).fit()
evaluate_model(sarimax2, log_taxes, log_full_train, log_holdout)

# Additional Resources

* Modeling cheat sheet: https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/
* AutoARIMA: https://towardsdatascience.com/time-series-forecasting-using-auto-arima-in-python-bb83e49210cd
* SARIMAX Project walkthrough: https://towardsdatascience.com/newyork-taxi-demand-forecasting-with-sarimax-using-weather-data-d46c041f3f9c