# Vector Auto Regression

- In our previous SARIMAX example, the forecast variable y_t was influenced by the exogenous predictor variable, but not vice versa
- That is, the occurrence of a holiday affected restaurant patronage, but not the other way around
- However, in some cases the variables affect each other
- What kind of model can we use in these situations?
    - We can attempt to use Vector Auto Regression!

- All variables in the VAR enter the model in the same way:
    - Each variable has an equation explaining its evolution based on its own lagged values, the lagged values of the other model variables, and an error term
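- VAR modelling does not require as much knowledge about the forces influencing a variable as structural models (such as simultaneous-equation models) do
- The only prior knowledge required is a list of variables that can be hypothesized to affect each other intertemporally
- A K-dimensional VAR model of order p, denoted VAR(p), models each variable y_k in the system as a linear function of a constant, the p most recent lags of itself, the p most recent lags of the other K-1 variables, and an error term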
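Written out in the usual textbook notation (the symbols below are ours, not the lecture's), a VAR(p) for the vector $\mathbf{y}_t = (y_{1,t}, \dots, y_{K,t})^\top$ stacks one regression per variable, where $\mathbf{c}$ is a $K \times 1$ intercept vector, each $A_i$ is a $K \times K$ coefficient matrix, and $\boldsymbol{\varepsilon}_t$ is a $K \times 1$ white-noise error:

$$
\mathbf{y}_t = \mathbf{c} + A_1\mathbf{y}_{t-1} + A_2\mathbf{y}_{t-2} + \cdots + A_p\mathbf{y}_{t-p} + \boldsymbol{\varepsilon}_t
$$

For the two-variable case used in this lecture with p = 1, this is just two equations:

$$
\begin{aligned}
y_{1,t} &= c_1 + a_{11}\,y_{1,t-1} + a_{12}\,y_{2,t-1} + \varepsilon_{1,t} \\
y_{2,t} &= c_2 + a_{21}\,y_{1,t-1} + a_{22}\,y_{2,t-1} + \varepsilon_{2,t}
\end{aligned}
$$

The cross terms $a_{12}$ and $a_{21}$ are what let each series influence the other. Each equation has $1 + Kp$ coefficients, so the whole system has $K(1 + Kp)$; with K = 2 and p = 5 that is already 22 coefficients, which is why the choice of p matters.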

The general steps involved in building a VAR model are:
- Examine the data
- Visualize the data
- Test for stationarity
- Select appropriate order p
- Instantiate the model and fit it to a training set
- If necessary, invert the earlier transformation
- Evaluate model predictions against a known test set
- Forecast the future

#### Data
- Personal Consumption Expenditures (Y1)
- M2 Money Stock (Y2): M1 plus
    - Savings deposits
    - Small-denomination time deposits
    - Balances in retail money market mutual funds
    
What is the best value of p?
- Pyramid ARIMA's auto_arima only searches univariate (S)ARIMA(X) models, so it won't do the grid search for us; instead we can loop over candidate p values and pick the model with the lowest AIC
- Recall that AIC penalizes model complexity, so a model with more parameters can score worse even if it fits the training data slightly better
- So we typically expect AIC to fall as p increases, reach a minimum at some lag order, and then rise again; that minimum gives our choice of p
- We'll also need to manually check for stationarity and difference the time series if they are not stationary
- In this lecture, the two time series require different orders of differencing
    - However, for simplicity we'll difference both the same number of times so that they keep the same number of rows

```python
import numpy as np
import pandas as pd
%matplotlib inline

# Load specific forecasting tools
from statsmodels.tsa.api import VAR
from statsmodels.tsa.stattools import adfuller
from statsmodels.tools.eval_measures import rmse

# Ignore harmless warnings
import warnings
warnings.filterwarnings("ignore")

# Load datasets
df = pd.read_csv('../UnusedData/M2SLMoneyStock.csv', index_col=0, parse_dates=True)
df.index.freq = 'MS'

sp = pd.read_csv('../UnusedData/PCEPersonalSpending.csv', index_col=0, parse_dates=True)
sp.index.freq = 'MS'
```

```
FileNotFoundError: [Errno 2] File b'../UnusedData/M2SLMoneyStock.csv' does not exist: b'../UnusedData/M2SLMoneyStock.csv'
```

Note: see the official course notes for this lecture; I don't have access to the required CSV files.
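The two series load into separate DataFrames, but VAR expects a single frame with one column per variable. A minimal sketch of aligning them on their shared monthly dates follows; because the CSVs aren't available, the frames are invented stand-ins, and the column names Money and Spending, the date ranges, and the values are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two CSVs (names, dates and values are invented).
money_idx = pd.date_range("1995-01-01", "2015-12-01", freq="MS")
spend_idx = pd.date_range("1995-06-01", "2016-12-01", freq="MS")
df = pd.DataFrame({"Money": np.arange(len(money_idx), dtype=float)}, index=money_idx)
sp = pd.DataFrame({"Spending": np.arange(len(spend_idx), dtype=float)}, index=spend_idx)

# An inner join keeps only the months present in both series,
# so every row has a value for both variables.
data = df.join(sp, how="inner")
data.index.freq = "MS"
print(data.shape, data.index.min().date(), data.index.max().date())
```

An inner join silently drops non-overlapping months; a left join followed by dropna() would give the same rows here but makes the loss of data more visible when inspecting intermediate results.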