# Vector Auto Regression (VAR) modelling

### Import of relevant modules and libraries

In [2]:
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

### Import and manipulation of the relevant datasets

We use two series provided by FRED (Federal Reserve Economic Data), US GDP and the US unemployment rate, to see how the two interact from 1948 to 2023.

In [3]:
gdp_data = pd.read_excel(r"C:\Users\pjhop\OneDrive\Documents\Programming & Coding\Python\Projects\Datasets\GDP.xls", header=10, index_col=0)
unemployment_rate = pd.read_excel(r"C:\Users\pjhop\OneDrive\Documents\Programming & Coding\Python\Projects\Datasets\UNRATE.xls", header=10, index_col=0)

We pass the `header` argument to `read_excel` to specify which row of the Excel sheet holds the column names. This skips the ten rows of irrelevant information stored above it. We also pass `index_col=0` so that the date column is used as the index.
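As a small illustration of what `header` and `index_col` do, the sketch below applies the same two arguments to `pd.read_csv` on an in-memory string, where they have the same meaning as in `read_excel`. The file contents are made up for the example and use two lines of preamble rather than ten; `raw` and `demo` are hypothetical names.

```python
import io
import pandas as pd

# Hypothetical file contents: two preamble lines, then the real header row.
raw = (
    "Some title line\n"
    "Some notes line\n"
    "observation_date,GDP\n"
    "1947-01-01,243.164\n"
    "1947-04-01,245.968\n"
)

# header=2 uses the third line (zero-indexed row 2) as the column names and
# discards everything above it; index_col=0 makes the first column the index.
demo = pd.read_csv(io.StringIO(raw), header=2, index_col=0, parse_dates=True)
print(demo)
```

The resulting frame has a single `GDP` column indexed by `observation_date`, which is the shape we want for the real datasets.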

In [4]:
gdp_data.head()

observation_date,GDP
1947-01-01,243.164
1947-04-01,245.968
1947-07-01,249.585
1947-10-01,259.745
1948-01-01,265.742


In [5]:
unemployment_rate.head()

observation_date,UNRATE
1948-01-01,3.4
1948-02-01,3.8
1948-03-01,4.0
1948-04-01,3.9
1948-05-01,3.5


Looking at the data, we can see that the dates in the two datasets do not align: GDP is recorded quarterly starting in 1947, whereas the unemployment rate is recorded monthly starting in 1948. We use the `merge` function with an inner join to combine the datasets and keep only the dates that are present in both.

In [6]:
df = pd.merge(gdp_data, unemployment_rate, how='inner', left_index=True, right_index=True)
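To make the effect of the inner join concrete, here is a minimal sketch on synthetic data. The values and the names `quarterly`, `monthly` and `merged` are made up for illustration; only the `pd.merge` call mirrors the cell above.

```python
import pandas as pd

# Synthetic stand-ins for the two FRED series (values are made up).
quarterly = pd.DataFrame(
    {"GDP": [100.0, 101.0, 102.0]},
    index=pd.to_datetime(["1947-10-01", "1948-01-01", "1948-04-01"]),
)
monthly = pd.DataFrame(
    {"UNRATE": [3.4, 3.8, 4.0, 3.9]},
    index=pd.to_datetime(
        ["1948-01-01", "1948-02-01", "1948-03-01", "1948-04-01"]
    ),
)

# An inner join on the index keeps only dates present in both frames.
merged = pd.merge(
    quarterly, monthly, how="inner", left_index=True, right_index=True
)
print(merged)
```

Only 1948-01-01 and 1948-04-01 survive: the 1947 quarter has no unemployment reading, and the February and March readings have no matching GDP observation.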

In [7]:
df.head()

observation_date,GDP,UNRATE
1948-01-01,265.742,3.4
1948-04-01,272.567,3.9
1948-07-01,279.196,3.6
1948-10-01,280.366,3.7
1949-01-01,275.034,4.3


## Model fit and results

The Vector Autoregression (VAR) model is used to forecast two or more time series jointly by modelling their interdependencies. Its main assumption is that each series depends on its own past values and on the past values of the other series in the system.

It can be used to model interactions between economic variables, such as GDP and the unemployment rate. This is what we will model in our example.  
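For our two variables, a VAR with two lags can be written out as follows. The symbols are our own notation rather than anything taken from the statsmodels output: $c_i$ are constants, $a_{ij}^{(k)}$ is the effect of variable $j$ at lag $k$ on variable $i$, and $\varepsilon_{i,t}$ are error terms.

$$
\begin{aligned}
\text{GDP}_t &= c_1 + a_{11}^{(1)}\,\text{GDP}_{t-1} + a_{12}^{(1)}\,\text{UNRATE}_{t-1} + a_{11}^{(2)}\,\text{GDP}_{t-2} + a_{12}^{(2)}\,\text{UNRATE}_{t-2} + \varepsilon_{1,t} \\
\text{UNRATE}_t &= c_2 + a_{21}^{(1)}\,\text{GDP}_{t-1} + a_{22}^{(1)}\,\text{UNRATE}_{t-1} + a_{21}^{(2)}\,\text{GDP}_{t-2} + a_{22}^{(2)}\,\text{UNRATE}_{t-2} + \varepsilon_{2,t}
\end{aligned}
$$

Each equation has the same regressors, which is why the model can be estimated equation by equation with OLS.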

In [10]:
model = sm.tsa.VAR(df)
results = model.fit(2)



In [11]:
results.summary()

  Summary of Regression Results   
Model:                         VAR
Method:                        OLS
Date:           Sun, 06, Aug, 2023
Time:                     20:29:22
--------------------------------------------------------------------
No. of Equations:         2.00000    BIC:                    8.82807
Nobs:                     300.000    HQIC:                   8.75402
Log likelihood:          -2147.05    FPE:                    6030.69
AIC:                      8.70461    Det(Omega_mle):         5834.59
--------------------------------------------------------------------
Results for equation GDP
               coefficient       std. error           t-stat            prob
----------------------------------------------------------------------------
const           -77.165421        33.445376           -2.307           0.021
L1.GDP            1.308303         0.087759           14.908           0.000
L1.UNRATE       117.252053        18.029142            6.503           0.000
L

The results show that the relationships between GDP, the unemployment rate and their lagged values are statistically significant: in the part of the GDP equation shown above, the constant, `L1.GDP` and `L1.UNRATE` all have p-values below 0.05.