# Problem Session 7

In this problem session you will get some practice with:
* $\operatorname{AR}(p)$ models
* $\operatorname{MA}(q)$ models
* Time series cross validation

Questions 1 and 2 use simulation to investigate $\operatorname{AR}(p)$ and $\operatorname{MA}(q)$ models, respectively.

Question 3 applies a few different models to real data, using time series cross validation for model selection.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### 1) Autoregressive Models

Recall that the autoregressive model of order $p$, written $\operatorname{AR}(p)$, is

$$
y_t = \beta_1 y_{t-1} + \beta_2 y_{t-2} + \dots  + \beta_p y_{t - p} + \epsilon_t
$$

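where $\epsilon_t \sim \operatorname{NID}(0,\sigma^2)$, that is, the errors are normally and independently distributed with mean $0$ and variance $\sigma^2$.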

$p$ is a hyperparameter and the $\beta_i$ and $\sigma^2$ are parameters which need to be fit. 
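As a warm-up before the exercises, here is a minimal sketch of the recursion for the simplest case, $\operatorname{AR}(1)$. The function name `simulate_ar1`, the coefficient $0.7$, and the seed are illustrative choices and not part of the problem set.

```python
import numpy as np

def simulate_ar1(beta, n, sigma=1.0, seed=0):
    """Simulate n steps of y_t = beta * y_{t-1} + eps_t, with eps_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(loc=0.0, scale=sigma, size=n)
    y = np.zeros(n)
    y[0] = eps[0]  # no history yet, so start from pure noise
    for t in range(1, n):
        y[t] = beta * y[t - 1] + eps[t]
    return y

series = simulate_ar1(beta=0.7, n=5000)

# Sample correlation between the series and its own first lag
lag1_corr = np.corrcoef(series[1:], series[:-1])[0, 1]
```

For a stationary $\operatorname{AR}(1)$ process the lag 1 autocorrelation equals $\beta_1$, so `lag1_corr` should land near $0.7$ here.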

##### a)

In this first problem we will *simulate* some $\operatorname{AR}(2)$ data.  In particular we will simulate the following:

$$
\begin{cases}
y_0 &= \epsilon_0\\
y_1 &= \epsilon_1\\
y_t &= 0.5 y_{t-1} + 0.2 y_{t-2} + \epsilon_t
\end{cases}
$$

with $\epsilon_t \sim \mathcal{N}(0,1)$

Write Python code to simulate one realization of this process!

Hint:  You will need to use `np.random.normal` and a `for` loop.

In [2]:
sample_size = 1000

# Initialize y as a numpy array of zeros of size sample_size 
y = 

# Assign the first two values to be draws from the standard normal distribution.
y[0] = 
y[1] = 

# Implement the recursive definition of y[i]
for i in range(2,sample_size):
    y[i] = 

# Note:  I simulated 10000 times and these asserts always passed.  
# Then I tried 100000 and one of them failed.
# So if you fail once you can be almost certain your code is wrong.
assert(type(y) == np.ndarray)
assert(len(y) == sample_size)
assert(np.abs(y.mean()) < 1)
assert(np.abs((y[1:]*y[:-1]).mean()) > 0.5)
assert(np.abs((y[1:]*y[:-1]).mean()) < 3)
assert(np.abs((y[50:]*y[:-50])).mean())


Plot the series.

##### b)

Plot the ACF and PACF plots of this time series.  Before making the plots, discuss what you *expect* to see with your group based on the theory covered in lecture.

In [5]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

##### c)  

We will now attempt to estimate the parameters by simply regressing $y_t$ on $y_{t-1}$ and $y_{t-2}$.

Write a function called `X_y_for_lags` which works as follows:

$$
\operatorname{X\_y\_for\_lags}([1,2,3,4,5,6],2) =  \left(
\begin{bmatrix} 
2 & 1\\
3 & 2\\
4 & 3\\
5 & 4
\end{bmatrix},
\begin{bmatrix} 3, 4, 5, 6\end{bmatrix} 
\right)
$$

Note:  I got stuck on this for quite a while myself.  I had an indexing error which was hard to resolve.  If you spend more than 5 minutes on this, just copy/paste from the solutions.

In [8]:
def X_y_for_lags(ts, num_lags):
    '''
    Inputs
        ts: A numpy array of size (n,) representing a time series
        num_lags: The number of lags to include in the resulting design matrix

    Outputs
        X: A numpy array of size (n - num_lags, num_lags). 
            The first column is lag 1, second column is lag 2, etc 
        y: The time series starting at entry num_lags
    '''
    # Your code here
    return X, y

# Test cases
assert np.array_equal(
    X_y_for_lags(np.array([1., 2., 3., 4., 5., 6.]), 2)[0], 
    np.array([
        [2., 1.], 
        [3., 2.], 
        [4., 3.], 
        [5., 4.]])
)

assert np.array_equal(
    X_y_for_lags(np.array([1., 2., 3., 4., 5., 6.]), 2)[1], 
    np.array([3., 4., 5., 6.])
)

Now fit a linear regression model to estimate the parameters.  How close do you get to recovering the parameters (which were $0.5$ and $0.2$)?

In [9]:
from sklearn.linear_model import LinearRegression

In [None]:
# Instantiate the linear regression model
ar_model = 

# Fit the model

# Look at the coefficients


Instead of doing this manually, we can use a Python package which handles $\operatorname{AR}(p)$ model estimation.  One such package is `pmdarima`, which wraps the ARIMA implementation in `statsmodels`.

ARIMA stands for AutoRegressive Integrated Moving Average.  We can use ARIMA with just the AR part for our purposes.

In [11]:
import pmdarima as pm

`pm.ARIMA` has a kwarg called `order`.  If we set `order = (p,d,q)` then our ARIMA model:

* Has an $\operatorname{AR}(p)$ component.
* Is fit to the series after differencing it $d$ times (so forecasts need to be "Integrated" $d$ times to return to the original scale).
* Has an $\operatorname{MA}(q)$ component.

So we can fit an $\operatorname{AR}(2)$ model using `order = (2,0,0)`
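To make the role of $d$ concrete, the sketch below differences a toy random walk once and then "integrates" it back with a cumulative sum. It uses only NumPy and is an illustration of the idea, not of how `pmdarima` implements it internally. The variable names are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=200))  # a random walk, which is non-stationary

# d = 1: difference once, leaving the stationary increments
diffed = np.diff(walk)

# "Integrate" once: a cumulative sum anchored at the first value undoes the differencing
restored = np.concatenate([[walk[0]], walk[0] + np.cumsum(diffed)])
```

Note that differencing shortens the series by one observation per difference, which is why the first value has to be carried along to undo it.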

In [None]:
arima = pm.ARIMA(order=(2, 0, 0))
arima.fit(y)

In [None]:
arima.params()[[1,2]]

This should be close to what we got using linear regression.

In [14]:
# It might be interesting to "run all cells above" 
# a few times to see the variability in parameter estimates.
# Note that this is 1000 time points from a series we *know* follows an  
# AR(2) process.  You can imagine how hard it is to estimate these things 
# on real data.

### 2) Moving Average models

The $\operatorname{MA}(q)$ model is

$$
y_t = \epsilon_t + \alpha_1 \epsilon_{t-1} + \alpha_2 \epsilon_{t-2} + \dots + \alpha_q \epsilon_{t-q}
$$

where $\epsilon_t \sim \operatorname{NID}(0,\sigma^2)$

$q$ is a hyperparameter and the $\alpha_i$ and $\sigma^2$ are parameters which need to be fit. 
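Here is a minimal sketch of the simplest case, $\operatorname{MA}(1)$, built by adding a shifted copy of the noise to itself. The coefficient $0.6$, the seed, and the helper `sample_autocorr` are illustrative assumptions and not part of the problem set.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
alpha = 0.6

eps = rng.normal(size=n + 1)       # one extra draw so every y_t has an eps_{t-1}
ma1 = eps[1:] + alpha * eps[:-1]   # y_t = eps_t + alpha * eps_{t-1}

def sample_autocorr(x, lag):
    """Rough sample autocorrelation at a positive integer lag."""
    x = x - x.mean()
    return (x[lag:] * x[:-lag]).mean() / x.var()

theoretical_lag1 = alpha / (1 + alpha**2)
```

For an $\operatorname{MA}(1)$ process the lag 1 autocorrelation is $\alpha_1 / (1 + \alpha_1^2)$ and all higher lags are zero, so `sample_autocorr(ma1, 1)` should be near `theoretical_lag1` while `sample_autocorr(ma1, 2)` should be near $0$.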

##### a)

We will simulate some $\operatorname{MA}(2)$ data.  In particular we will simulate the following:

$$
y_t = \epsilon_t + 0.5\epsilon_{t-1} + 0.2\epsilon_{t-2}
$$

with $\epsilon_t \sim \mathcal{N}(0,1)$

Write Python code to simulate one realization of this process!

Hint:  You will need to use `np.random.normal` but you will not need a `for` loop!

In [15]:
sample_size = 1000

# It may be useful to first define your epsilons and then sum appropriate shifts.

ts = 

# Note:  These asserts all passed 100000 times.
assert(len(ts) == sample_size)
assert((ts[1:]*ts[:-1]).mean() > 0.1)
assert(np.abs((ts[3:]*ts[:-3]).mean()) < 0.3)

Plot the time series:

##### b)

Plot the ACF and PACF plots of this time series.  Before making the plots, discuss what you *expect* to see with your group based on the theory covered in lecture.

##### c)

Parameter estimation for $\operatorname{MA}(q)$ models is tougher than for $\operatorname{AR}(p)$ models so we will not attempt this "by hand".

Fit an $\operatorname{MA}(2)$ model to the data using `pm.ARIMA`.

In [None]:
arima = 
arima.fit(ts)

In [None]:
arima.params()

In [21]:
# Again it might be nice to "run all cells above" a few times.

### 3) Lynx data

The Lynx dataset records the number of lynx skins collected by the Hudson’s Bay Company from 1821 to 1934.

In [22]:
from pmdarima.datasets import load_lynx
# as_series=True returns a pandas Series rather than a numpy array
y = load_lynx(as_series=True)

In [23]:
# Use all but the last 20 years as our training set.
y_train = 

In [None]:
plt.plot(y_train.index, y_train.values)
plt.show()

##### a)  

Plot the ACF and PACF plots of the training data.  What models do they suggest we might try?  Discuss with your group.

Interestingly $\operatorname{PACF}(1)$ is positive and $\operatorname{PACF}(2)$ is negative. Think about what that means in terms of the regression coefficients!  This is what is driving the "boom/bust" cycle we observe.
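One way to see why a positive first coefficient paired with a negative second one can produce cycles: for an $\operatorname{AR}(2)$ model, the roots of $z^2 - \beta_1 z - \beta_2 = 0$ are complex whenever $\beta_1^2 + 4\beta_2 < 0$, and complex roots give oscillating behavior. The sketch below checks this for hypothetical coefficients. These values are made up for illustration and are not estimates from the lynx data.

```python
import numpy as np

beta1, beta2 = 1.1, -0.6  # hypothetical values, not fitted to the lynx series

# Roots of the characteristic polynomial z^2 - beta1 * z - beta2
roots = np.roots([1.0, -beta1, -beta2])

# Modulus below 1 means the oscillation is damped (the process is stationary)
modulus = np.abs(roots[0])

# The angle of a complex root sets the approximate length of one cycle
period = 2 * np.pi / np.abs(np.angle(roots[0]))
```

With these particular values the roots are complex with modulus below $1$, and `period` comes out at roughly 8 time steps per cycle.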

##### b)

`pm.auto_arima` searches over different values of $(p,d,q)$ and attempts to find the combination that minimizes the [Akaike information criterion (AIC)](https://en.wikipedia.org/wiki/Akaike_information_criterion).  This information-theoretic approach to model selection is an alternative to cross validation (it can be viewed as an approximation of leave-one-out cross validation error).
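To give a feel for what minimizing AIC means, here is a simplified sketch that compares pure $\operatorname{AR}(p)$ fits by least squares using the Gaussian form $\text{AIC} = m \ln(\text{RSS}/m) + 2k$ (dropping an additive constant). This is an assumption-laden illustration on toy data: `pmdarima` computes AIC from the full model likelihood and searches a larger model space, so its numbers will not match these.

```python
import numpy as np

rng = np.random.default_rng(0)
n, max_p = 2000, 4

# Toy AR(2) series with hypothetical coefficients 0.6 and -0.3
y = np.zeros(n)
eps = rng.normal(size=n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + eps[t]

target = y[max_p:]  # same targets for every p so the AIC values are comparable
m = len(target)

aic = {}
for p in range(1, max_p + 1):
    # Column k holds lag k of the series, aligned with `target`
    X = np.column_stack([y[max_p - k : n - k] for k in range(1, p + 1)])
    coef = np.linalg.lstsq(X, target, rcond=None)[0]
    rss = np.sum((target - X @ coef) ** 2)
    k_params = p + 1  # p coefficients plus the noise variance
    aic[p] = m * np.log(rss / m) + 2 * k_params

best_p = min(aic, key=aic.get)
```

The $2k$ term penalizes extra lags, so a larger $p$ only wins if it reduces the residual sum of squares by enough to pay for the added parameters.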

In [27]:
from pmdarima import auto_arima

In [None]:
arima_model = auto_arima(y_train, trace = True)

In [None]:
# This is how we predict the next four values beyond the training data:

arima_model.predict(4)

`auto_arima` has selected a model with both an $\operatorname{AR}(2)$ component and an $\operatorname{MA}(2)$ component.  In other words, the model is:

$$
y_{t} = \beta_1 y_{t-1} + \beta_2 y_{t-2} +  \epsilon_t + \alpha_1 \epsilon_{t-1} + \alpha_2 \epsilon_{t-2}
$$

where $\epsilon_t$ is Gaussian white noise.

##### c)

In this last section we will compare the following four models using time series cross validation:

* Model 0: A baseline "Naive" forecast
* Model 1: An $\operatorname{AR}(2)$ model which regresses $y_t$ on $y_{t-1}, y_{t-2}$
* Model 2: A model which regresses $y_t$ on $y_{t-1}, y_{t-2}$, and $y_{t-8}$.  This model is suggested by the significant value of $\operatorname{PACF}(8)$.
* Model 3: Whatever model is selected by `auto_arima` in each fold.  Note that we are allowing the order to change as we see new data!

Here is our cross validation scheme:

* We are focused on a forecasting horizon of one year.
* We will reserve the last 10 years as a testing set, so we will not look at them during cross validation.
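* In each fold we train on everything before the hold-out point and predict that single point (all indices refer to `y_train`):
    * Fold 1: Train on `[:-10]` and predict `[-10]`.
    * Fold 2: Train on `[:-9]` and predict `[-9]`.
    * $\vdots$
    * Fold 10: Train on `[:-1]` and predict `[-1]`.
* Store these predictions and compare them to `y_train[-10:]`.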
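The fold structure above can be sketched on toy data as follows. The array `series` stands in for `y_train`, and the "historical mean" forecaster is a hypothetical placeholder chosen only to show the slicing. It is not one of the four models being compared.

```python
import numpy as np

series = np.arange(30, dtype=float)  # toy data standing in for y_train

preds, actuals = [], []
for i in range(-10, 0):
    train = series[:i]    # everything strictly before the hold-out point
    holdout = series[i]   # the single observation being forecast

    # Placeholder forecaster: predict the mean of the training window
    preds.append(train.mean())
    actuals.append(holdout)

cv_mse = np.mean((np.array(actuals) - np.array(preds)) ** 2)
```

Because the training window grows by one observation per fold and never includes the hold-out point or anything after it, no future information leaks into any forecast.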


In [29]:
model1 = LinearRegression()
model2 = LinearRegression()

model0_preds = []
model1_preds = []
model2_preds = []
model3_preds = []

# Model 2 is "custom" so we will fit it using linear regression.
design_matrix, targets = X_y_for_lags(y_train.values, 8)

# Note:  you could also use TimeSeriesSplit here, but I find it more trouble than it is worth.
for i in range(-10, 0):
    X_tt, y_tt = 

    # The holdout data should have a single row!
    X_ho, y_ho = 

    # X_tt_1 is used for model 1 and should only use lags 1 and 2
    X_tt_1 = 

    # X_tt_2 is used for model 2 and should only use lags 1, 2 and 8
    X_tt_2 =

    # Again, these should have a single row.
    X_ho_1 = 
    X_ho_2 = 

    # Fit the models
    # Note:  you could alternatively define model1 using pm.ARIMA
    model1.fit(X_tt_1, y_tt)
    model2.fit(X_tt_2, y_tt)
    # model3 is auto_arima
    model3 = 

    # Model 0 is the naive forecast:  just predict the last training value.
    model0_preds.append()

    # Model 1 and 2 are linear regressions.  You should know how to predict on the
    # hold out set.
    model1_preds.append()
    model2_preds.append()

    # Model 3 is the auto_arima selected model.  We need one prediction beyond the training set.
    model3_preds.append()

In [None]:
plt.plot(y_train.index[-20:], y_train.iloc[-20:], label = 'data')
plt.plot(y_train.index[-10:], model0_preds, label = 'Naive')
plt.plot(y_train.index[-10:], model1_preds, label = 'AR(2)')
plt.plot(y_train.index[-10:], model2_preds, label = 'Custom AR')
plt.plot(y_train.index[-10:], model3_preds, label = 'auto_arima')
plt.legend()
plt.show()

None of these look especially great, but let's see if we at least found a model which outperforms our baseline.

In [31]:
from sklearn.metrics import mean_squared_error as mse

In [None]:
mse0 = mse(y_train[-10:],model0_preds)
mse1 = mse(y_train[-10:],model1_preds)
mse2 = mse(y_train[-10:],model2_preds)
mse3 = mse(y_train[-10:],model3_preds)

unordered_dict = {'Model 0':mse0, 'Model 1':mse1, 'Model 2':mse2, 'Model 3':mse3}
ordered_dict = dict(sorted(unordered_dict.items(), key=lambda item: item[1], reverse=True))
ordered_dict

##### d)

In order of decreasing MSE we have:

* Model 0: the naive baseline coming in last place
* Model 3: `auto_arima`
* Model 1: the $\operatorname{AR}(2)$
* Model 2: the "custom" $\operatorname{AR}$ model wins the prize!

See how model 2 does on the testing set! 

Let's take a look at how we did visually:

In [None]:
plt.plot(y.index[-20:], y.iloc[-20:], label = 'data')
plt.plot(y.index[-10:], model2_preds, label = 'final model')
plt.legend()  # needed for the labels above to be displayed
plt.show()