## What is VAR (Vector Autoregression)

Vector Autoregression (VAR) is a multivariate forecasting algorithm that is used when two or more time series influence each other.

The basic requirements for using VAR are therefore:

  * You need at least two time series (variables).
  * The time series should influence each other.
  
VAR is considered an autoregressive model because each variable (time series) is modeled as a function of past values: the predictors are simply the lags (time-delayed values) of the series in the system.

The primary difference from other autoregressive models such as AR, ARMA, or ARIMA is that those models are uni-directional: the predictors influence the target Y, but not vice versa. Vector Autoregression (VAR), by contrast, is bi-directional: the variables influence each other.
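
To make the bi-directional structure concrete, the sketch below simulates a two-variable VAR(1) process with NumPy and recovers its coefficient matrix by ordinary least squares. The coefficient values, the seed, and the helper names `simulate_var1` and `fit_var1` are illustrative assumptions and are not part of this notebook's data pipeline.

```python
import numpy as np


def simulate_var1(coefs, n_obs, seed=0):
    """Simulate y_t = coefs @ y_{t-1} + e_t with standard normal noise."""
    rng = np.random.default_rng(seed)
    k = coefs.shape[0]
    y = np.zeros((n_obs, k))
    for t in range(1, n_obs):
        y[t] = coefs @ y[t - 1] + rng.standard_normal(k)
    return y


def fit_var1(y):
    """Estimate the VAR(1) intercept and coefficient matrix by least squares."""
    lagged = y[:-1]                      # predictors: every series at t-1
    target = y[1:]                       # responses: every series at t
    design = np.column_stack([np.ones(len(lagged)), lagged])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    intercept = beta[0]
    coef_matrix = beta[1:].T             # row i holds the equation for series i
    return intercept, coef_matrix


# Hypothetical coefficients: the off-diagonal terms link the two series.
true_coefs = np.array([[0.5, 0.2],
                       [0.1, 0.4]])
y = simulate_var1(true_coefs, n_obs=2000)
intercept, est_coefs = fit_var1(y)
print(np.round(est_coefs, 2))
```

The off-diagonal entries are what separate a VAR from a set of independent univariate autoregressions: a non-zero entry in row i, column j means the past of series j helps predict series i.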

In [1]:
import data_prep_helper
from statsmodels.tsa.vector_ar.var_model import VAR
import pandas as pd
from math import sqrt
from sklearn.metrics import mean_squared_error
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import numpy as np

In [2]:
do = data_prep_helper.ChartData()
do.apply_boll_bands("bitcoin_hist", append_chart=True)

In [3]:
corr_df = do.chart_df

In [4]:
cols = corr_df.columns

Because the test statistic exceeds the critical value at every significance level, we reject the null hypothesis that the series is stationary: the series is not stationary.

In summary, the ADF test has an alternative hypothesis of linear or difference stationarity, while the KPSS test identifies trend stationarity in a series.

In [12]:
# Chronological 80/20 split into training and validation sets
train = corr_df[:int(0.8*(len(corr_df)))]
valid = corr_df[int(0.8*(len(corr_df))):]
valid_index = valid.index

In [13]:
# Fit a VAR on the training set with the default settings
model = VAR(endog=train)
model_fit = model.fit()

# Forecast the validation period, seeded with the last k_ar training rows
lag_order = model_fit.k_ar
prediction = model_fit.forecast(train.values[-lag_order:], steps=len(valid))

# Convert the forecast array to a DataFrame aligned with the validation index
cols = train.columns
valid_pred = pd.DataFrame(prediction, columns=cols, index=valid_index)

# RMSE per column
for i in cols:
    print('rmse value for', i, 'is : ', sqrt(mean_squared_error(valid[i], valid_pred[i])))

rmse value for bitcoin_Price is :  3644.6994885634576
rmse value for sp500_Price is :  435.68586605910775
rmse value for dax_Price is :  949.7486705246323
rmse value for googl_Price is :  244.53926406285356
rmse value for gold_Price is :  4.255034559415077
rmse value for bitcoin_Google_Trends is :  528.3699709037252
rmse value for cryptocurrency_Google_Trends is :  48.62187664688103
rmse value for trading_Google_Trends is :  501.52006429502416
rmse value for bitcoin_pos_sents is :  0.021123142533208798
rmse value for bitcoin_neg_sents is :  0.02002391435487122
rmse value for bitcoin_quot_sents is :  0.7943693860719941
rmse value for economy_pos_sents is :  0.016653897519113545
rmse value for economy_neg_sents is :  0.02097883737214715
rmse value for economy_quot_sents is :  0.3319244307636539
rmse value for bitcoin_30_day_ma is :  3522.2119617684048
rmse value for bitcoin_30_day_std is :  290.84213689392436
rmse value for bitcoin_boll_upp is :  3826.4813375633594
rmse value for bitcoin


A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.



In [14]:
train = train.diff().dropna()
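
Differencing changes what the model predicts: a VAR fitted to `train.diff()` forecasts period-to-period changes, not levels, so its output has to be mapped back to levels before it can be compared with `valid`. A minimal sketch of that inversion follows, using toy data; the helper `invert_first_difference` and the column names `a` and `b` are hypothetical.

```python
import pandas as pd


def invert_first_difference(diff_forecast, last_observed):
    """Turn forecasts of first differences back into forecasts of levels.

    diff_forecast: DataFrame of predicted changes, one row per future step.
    last_observed: Series holding the last observed level of each column.
    """
    return diff_forecast.cumsum() + last_observed


# Toy data: two hypothetical series, not the notebook's columns.
levels = pd.DataFrame({"a": [10.0, 12.0, 15.0], "b": [1.0, 0.5, 0.75]})
last_observed = levels.iloc[-1]

# Pretend a model fitted on levels.diff() predicted these changes.
diff_forecast = pd.DataFrame({"a": [1.0, 2.0], "b": [0.25, -0.5]})

level_forecast = invert_first_difference(diff_forecast, last_observed)
print(level_forecast)
```

The last observed level has to be saved before the training set is overwritten by its differenced version, otherwise the starting point for the cumulative sum is lost.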

In [15]:
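# Refit on the differenced training set
model = VAR(endog=train)
model_fit = model.fit()

# Forecast the validation period; because train was differenced in In [14],
# these are forecasts of period-to-period changes, not of levels
lag_order = model_fit.k_ar
prediction = model_fit.forecast(train.values[-lag_order:], steps=len(valid))

# Convert the forecast array to a DataFrame aligned with the validation index
cols = train.columns
valid_pred = pd.DataFrame(prediction, columns=cols, index=valid_index)

# RMSE per column (note: valid still holds undifferenced levels)
for i in cols:
    print('rmse value for', i, 'is : ', sqrt(mean_squared_error(valid[i], valid_pred[i])))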





ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

In [16]:
valid_pred.tail(300)

Date,bitcoin_Price,sp500_Price,dax_Price,googl_Price,gold_Price,bitcoin_Google_Trends,cryptocurrency_Google_Trends,trading_Google_Trends,bitcoin_pos_sents,bitcoin_neg_sents,bitcoin_quot_sents,economy_pos_sents,economy_neg_sents,economy_quot_sents,bitcoin_30_day_ma,bitcoin_30_day_std,bitcoin_boll_upp,bitcoin_boll_low
2019-02-26,3.189210e+01,2.880048e+00,2.913123e+01,1.718318e+00,-5.194635e-03,2.345787e+01,1.672519e+00,3.463108e+01,3.879721e-03,9.076133e-03,-6.486665e-01,2.962165e-02,-3.631640e-03,4.258080e-01,2.310471e+01,3.692067e+00,3.161385e+01,1.447058e+01
2019-02-27,1.303886e+14,-6.370036e+12,-2.462473e+13,-4.090529e+12,-5.141943e+10,-2.120241e+14,-2.583426e+13,-1.518064e+14,-1.525049e+09,-2.970170e+09,2.621576e+11,3.103649e+09,7.217067e+09,-4.510835e+10,4.279229e+13,9.783571e+12,6.235943e+13,2.322515e+13
2019-02-28,-9.675702e+13,1.597728e+12,-2.473901e+13,1.924145e+12,3.838627e+10,9.510776e+13,1.508393e+13,1.649267e+13,-7.130317e+07,3.808428e+09,-2.786360e+11,-3.271557e+09,-2.382365e+09,-2.845416e+10,-1.195719e+13,3.161096e+13,4.975290e+13,-7.421703e+13
2019-03-01,-1.443059e+26,5.832187e+24,1.017764e+25,4.146636e+24,5.314452e+22,1.970745e+26,2.504084e+25,1.192081e+26,1.345304e+21,3.680181e+21,-2.647114e+23,-2.820784e+21,-4.695668e+21,1.393526e+22,-4.860071e+25,3.353620e+24,-4.189346e+25,-5.530795e+25
2019-03-04,2.771462e+26,-1.118256e+25,-1.843612e+25,-7.980799e+24,-1.094261e+23,-4.116392e+26,-5.326829e+25,-2.478298e+26,-2.056812e+21,-7.904430e+21,6.499157e+23,7.701516e+21,1.298651e+22,-4.191100e+22,6.724650e+25,-3.324546e+24,6.286414e+25,7.314001e+25
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-03-04,,,,,,,,,,,,,,,,,,
2020-03-05,,,,,,,,,,,,,,,,,,
2020-03-06,,,,,,,,,,,,,,,,,,
2020-03-09,,,,,,,,,,,,,,,,,,
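
Forecasts that jump from about 1e1 to 1e14 and 1e26 within a few steps and then turn into NaN look like the repeated application of an unstable system. For a VAR(1), stability can be checked from the eigenvalues of the coefficient matrix: the process is stable only if every eigenvalue has modulus below 1. The sketch below shows that check with NumPy on two hypothetical matrices; it is an illustration of the criterion, not a diagnosis of the model fitted above.

```python
import numpy as np


def var1_is_stable(coef_matrix):
    """Return (is_stable, moduli) for a VAR(1) coefficient matrix.

    A VAR(1) is stable when every eigenvalue of the coefficient matrix
    has modulus strictly below 1.
    """
    moduli = np.abs(np.linalg.eigvals(coef_matrix))
    return bool(np.all(moduli < 1.0)), moduli


# Hypothetical coefficient matrices chosen for illustration.
stable = np.array([[0.5, 0.2],
                   [0.1, 0.4]])
explosive = np.array([[1.2, 0.0],
                      [0.0, 0.5]])

print(var1_is_stable(stable))
print(var1_is_stable(explosive))
```

For a fitted statsmodels model, a comparable check should be available as `model_fit.is_stable()`; treat that call as an assumption to verify against the installed version.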


In [54]:
df_cols = ["bitcoin_Price", 
           "bitcoin_30_day_ma", 
           "bitcoin_30_day_std", 
           "bitcoin_boll_upp",
           "bitcoin_boll_low",
           "bitcoin_Google_Trends", 
           "googl_Price", 
           "cryptocurrency_Google_Trends"]

valid_01 = valid[df_cols]
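# Refit on the selected columns only (train is still the differenced set)
model = VAR(endog=train[df_cols])
model_fit = model.fit()

# Forecast the validation period, seeded with the last k_ar training rows
lag_order = model_fit.k_ar
prediction = model_fit.forecast(train[df_cols].values[-lag_order:], steps=len(valid_01))

# Convert the forecast array to a DataFrame aligned with the validation index
cols = df_cols
valid_pred = pd.DataFrame(prediction, columns=cols, index=valid_index)

# RMSE per column (note: valid_01 holds undifferenced levels)
for i in cols:
    print('rmse value for', i, 'is : ', sqrt(mean_squared_error(valid_01[i], valid_pred[i])))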





ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

In [16]:
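# Make final predictions with a model fitted on the full data set
num_forecast = 5

model = VAR(endog=corr_df)
model_fit = model.fit()
yhat = model_fit.forecast(corr_df.values[-model_fit.k_ar:], steps=num_forecast)

# Label the forecast with the columns the model was fitted on
# (cols may hold a different column list left over from another cell)
yhat_df = pd.DataFrame(yhat, columns=corr_df.columns)

# Forecast dates start the day after the last observation
forecast_dates = pd.date_range(start=corr_df.index[-1], periods=num_forecast+1)[1:]

yhat_df = yhat_df.set_index(forecast_dates)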





In [17]:
fig = make_subplots(
    rows=2, 
    cols=1, 
    shared_xaxes=True, 
    vertical_spacing=0.2,
    subplot_titles=(["Bitcoin Price Chart<br>with Validation<br>and Forecast",
                     "Bitcoin 30-day-Mean"])
)

fig.add_trace(go.Scatter(x=corr_df.index, 
                         y=corr_df['bitcoin_Price'],
                         name="BITCOIN Closing Price"), row=1, col=1)

fig.add_trace(go.Scatter(x=valid_pred.index, 
                         y=valid_pred['bitcoin_Price'],
                         name="BITCOIN Validation Prediction"), row=1, col=1)

fig.add_trace(go.Scatter(x=yhat_df.index, 
                         y=yhat_df['bitcoin_Price'],
                         name="BITCOIN Current Forecast"), row=1, col=1)

fig.add_trace(go.Scatter(x=corr_df.index, 
                         y=corr_df['bitcoin_30_day_ma'],
                         name="BITCOIN 30-Day Moving Average"), row=2, col=1)

fig.add_trace(go.Scatter(x=valid_pred.index, 
                         y=valid_pred['bitcoin_30_day_ma'],
                         name="Validation Prediction"), row=2, col=1)

fig.add_trace(go.Scatter(x=yhat_df.index, 
                         y=yhat_df['bitcoin_30_day_ma'],
                         name="Current Forecast"), row=2, col=1)

fig.update_layout(height=1000, width=1500, title_text="Bitcoin Prediction using VAR (Vector Autoregression)")


KeyError: 'bitcoin_30_day_ma'