## What is VAR (Vector Autoregression)

Vector Autoregression (VAR) is a multivariate forecasting algorithm that is used when two or more time series influence each other.

That means the basic requirements for using VAR are:

  * You need at least two time series (variables).
  * The time series should influence each other.
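
As a rough first screen for the second requirement, one can check whether the lagged values of one series correlate with the current values of another. The sketch below uses synthetic data and a hypothetical helper, `lagged_corr`; it is only an informal check, not a formal test such as Granger causality.

```python
import numpy as np
import pandas as pd

def lagged_corr(df, target, predictor, lag=1):
    """Correlation between `target` at time t and `predictor` at time t - lag."""
    return df[target].corr(df[predictor].shift(lag))

# Synthetic example (not the notebook's data): y follows x with a one-step delay.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + rng.normal(scale=0.2, size=499)
demo = pd.DataFrame({"x": x, "y": y})

print(lagged_corr(demo, "y", "x"))  # strong: past x helps explain current y
print(lagged_corr(demo, "x", "y"))  # weak: past y says little about current x
```

In this toy data the influence runs in one direction only; for VAR to be a natural fit, one would hope to see lagged relationships in both directions.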
  
It is considered an autoregressive model because each variable (time series) is modeled as a function of past values; that is, the predictors are simply the lags (time-delayed values) of the series in the system.
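
For illustration, a system of two series with a single lag can be written as follows. The symbols $\alpha$, $\beta$ and $\epsilon$ are notation chosen here for the intercepts, lag coefficients and error terms; they do not come from the notebook.

$$
\begin{aligned}
Y_{1,t} &= \alpha_1 + \beta_{11} Y_{1,t-1} + \beta_{12} Y_{2,t-1} + \epsilon_{1,t} \\
Y_{2,t} &= \alpha_2 + \beta_{21} Y_{1,t-1} + \beta_{22} Y_{2,t-1} + \epsilon_{2,t}
\end{aligned}
$$

Each equation contains the lag of its own series and the lag of the other series, so $\beta_{12}$ and $\beta_{21}$ carry the cross-influence between the two.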

The primary difference from other autoregressive models such as AR, ARMA or ARIMA is that those models are uni-directional: the predictors influence Y, but not vice versa. Vector Autoregression (VAR), in contrast, is bi-directional. That is, the variables influence each other.
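
To make the bi-directional idea concrete, here is a minimal sketch that simulates a two-variable process with one lag using plain NumPy. The function name `simulate_var1` and the coefficient values are made up for illustration.

```python
import numpy as np

def simulate_var1(A, c, n_steps, noise_scale=0.1, seed=0):
    """Simulate y_t = c + A @ y_{t-1} + e_t for a k-variable process with one lag."""
    rng = np.random.default_rng(seed)
    k = len(c)
    y = np.zeros((n_steps, k))
    for t in range(1, n_steps):
        y[t] = c + A @ y[t - 1] + rng.normal(scale=noise_scale, size=k)
    return y

# Off-diagonal entries are the cross-influences: A[0, 1] is the effect of
# series 2 on series 1, and A[1, 0] is the effect of series 1 on series 2.
A = np.array([[0.5, 0.2],
              [0.3, 0.4]])
c = np.array([0.1, 0.0])

series = simulate_var1(A, c, n_steps=200)
print(series.shape)  # (200, 2)
```

Setting both off-diagonal entries of `A` to zero would turn this into two independent univariate autoregressions, which is the uni-directional case.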

In [57]:
import data_prep_helper
from statsmodels.tsa.vector_ar.var_model import VAR
import pandas as pd
from math import sqrt
from sklearn.metrics import mean_squared_error
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import numpy as np

In [59]:
do = data_prep_helper.ChartData()
do.apply_boll_bands("bitcoin_hist", append_chart=True)

In [60]:
corr_df = do.chart_df

In [61]:
cols = corr_df.columns

In [62]:
# create the train and validation sets (chronological 80/20 split)
split_idx = int(0.8 * len(corr_df))
train = corr_df[:split_idx]
valid = corr_df[split_idx:]
valid_index = valid.index

In [63]:
# fit() without arguments estimates a model with a single lag
model = VAR(endog=train)
model_fit = model.fit()

# forecast one step per validation row from the last k_ar training observations
prediction = model_fit.forecast(train.values[-model_fit.k_ar:], steps=len(valid))

# convert the predictions to a DataFrame aligned with the validation index
cols = train.columns
valid_pred = pd.DataFrame(prediction, columns=cols, index=valid_index)

# check the RMSE per column
for i in cols:
    print('rmse value for', i, 'is : ', sqrt(mean_squared_error(valid[i], valid_pred[i])))

rmse value for bitcoin_Price is :  3644.6994885634576
rmse value for sp500_Price is :  435.68586605910775
rmse value for dax_Price is :  949.7486705246323
rmse value for googl_Price is :  244.53926406285356
rmse value for gold_Price is :  4.255034559415077
rmse value for bitcoin_Google_Trends is :  528.3699709037252
rmse value for cryptocurrency_Google_Trends is :  48.62187664688103
rmse value for trading_Google_Trends is :  501.52006429502416
rmse value for bitcoin_pos_sents is :  0.021123142533208798
rmse value for bitcoin_neg_sents is :  0.02002391435487122
rmse value for bitcoin_quot_sents is :  0.7943693860719941
rmse value for economy_pos_sents is :  0.016653897519113545
rmse value for economy_neg_sents is :  0.02097883737214715
rmse value for economy_quot_sents is :  0.3319244307636539
rmse value for bitcoin_30_day_ma is :  3522.2119617684048
rmse value for bitcoin_30_day_std is :  290.84213689392436
rmse value for bitcoin_boll_upp is :  3826.4813375633594
rmse value for bitcoin


A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
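
The warning above appears because the date index carries no frequency. One possible way to attach one is sketched below with pandas only. It assumes the data is business-daily, which the dates in this notebook suggest but do not guarantee; the helper name `with_business_day_freq` and the demo values are made up.

```python
import pandas as pd

def with_business_day_freq(df):
    """Return a copy of df whose DatetimeIndex has business-day frequency."""
    out = df.copy()
    out.index = pd.DatetimeIndex(out.index)
    # asfreq inserts rows for missing business days (e.g. holidays) as NaN;
    # forward-filling them is a modeling choice, not a requirement
    return out.sort_index().asfreq("B").ffill()

# Dummy values on a handful of business days
dates = pd.to_datetime(["2019-02-26", "2019-02-27", "2019-02-28",
                        "2019-03-01", "2019-03-04"])
demo = pd.DataFrame({"bitcoin_Price": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=dates)

demo = with_business_day_freq(demo)
print(demo.index.freqstr)  # B
```

A frame prepared this way should let statsmodels pick up the frequency from the index instead of ignoring the dates.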



In [None]:
# first-difference selected price-based columns of the training set
diff_cols = ["bitcoin_Price",
             "bitcoin_30_day_ma",
             "bitcoin_30_day_std",
             "bitcoin_boll_upp",
             "bitcoin_boll_low",
             "googl_Price"]

# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
train = train.assign(**{col: train[col].diff() for col in diff_cols})

# the first row has no previous value to difference against
train = train.dropna()

In [67]:
df_cols = ["bitcoin_Price", 
           "bitcoin_30_day_ma", 
           "bitcoin_30_day_std", 
           "bitcoin_boll_upp",
           "bitcoin_boll_low",
           "bitcoin_Google_Trends", 
           "googl_Price", 
           "cryptocurrency_Google_Trends"]

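valid_01 = valid[df_cols]
model = VAR(endog=train[df_cols])
model_fit = model.fit()

# forecast one step per validation row from the last k_ar training observations
prediction = model_fit.forecast(train[df_cols].values[-model_fit.k_ar:], steps=len(valid_01))

# convert the predictions to a DataFrame aligned with the validation index
cols = df_cols
valid_pred = pd.DataFrame(prediction, columns=cols, index=valid_index)

# check the RMSE per column
# caveat: columns differenced in train are forecast as day-to-day changes, while
# valid_01 still holds levels, so their RMSE compares changes against levels
for i in cols:
    print('rmse value for', i, 'is : ', sqrt(mean_squared_error(valid_01[i], valid_pred[i])))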

rmse value for bitcoin_Price is :  8513.391506571024
rmse value for bitcoin_30_day_ma is :  8329.18305454735
rmse value for bitcoin_30_day_std is :  696.2286402962807
rmse value for bitcoin_boll_upp is :  9623.45693438338
rmse value for bitcoin_boll_low is :  7068.889509747011
rmse value for bitcoin_Google_Trends is :  714.2096595953985
rmse value for googl_Price is :  1251.3813842495676
rmse value for cryptocurrency_Google_Trends is :  42.78587613516394



A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.



In [69]:
valid_pred["bitcoin_Price"].head()

Date
2019-02-26    130.080958
2019-02-27    -98.956305
2019-02-28     99.357698
2019-03-01    -75.285316
2019-03-04     50.283790
Name: bitcoin_Price, dtype: float64
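
The values above are small positive and negative numbers rather than prices, which is consistent with the model having been trained on differenced data: the forecasts are day-to-day changes. To compare them with the validation prices they would need to be converted back to levels. A minimal sketch, using a hypothetical helper `invert_first_difference` and illustrative numbers:

```python
import pandas as pd

def invert_first_difference(diff_forecast, last_level):
    """Turn forecast changes back into levels.

    diff_forecast: forecast first differences, in time order
    last_level:    last observed (undifferenced) value before the forecast starts
    """
    return last_level + diff_forecast.cumsum()

# Illustrative numbers only
changes = pd.Series([100.0, -50.0, 25.0])
levels = invert_first_difference(changes, last_level=4000.0)
print(levels.tolist())  # [4100.0, 4050.0, 4075.0]
```

In this notebook `train` has been overwritten with differences, so the last level would have to come from the undifferenced data, for example the final training row of `corr_df`.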

In [26]:
# make final predictions on the full data set
num_forecast = 5

model = VAR(endog=corr_df)
model_fit = model.fit()
yhat = model_fit.forecast(corr_df.values[-model_fit.k_ar:], steps=num_forecast)

# the data skips weekends (see the dates above), so the forecast dates
# use business-day frequency, starting after the last observed date
forecast_dates = pd.date_range(start=corr_df.index[-1] + pd.offsets.BDay(1),
                               periods=num_forecast, freq="B")

# use the columns of corr_df here: `cols` still holds the 8-column subset
# from the previous model and does not match the shape of yhat
yhat_df = pd.DataFrame(yhat, columns=corr_df.columns, index=forecast_dates)


A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.



In [25]:
# note: valid_pred comes from the most recent validation cell; for columns
# differenced in train it holds changes, not price levels
fig = make_subplots(
    rows=2,
    cols=1,
    shared_xaxes=True,
    vertical_spacing=0.2,
    subplot_titles=(["Bitcoin Price Chart<br>with Validation<br>and Forecast",
                     "Bitcoin 30-day-Mean"])
)

fig.add_trace(go.Scatter(x=corr_df.index,
                         y=corr_df['bitcoin_Price'],
                         name="BITCOIN Closing Price"), row=1, col=1)

fig.add_trace(go.Scatter(x=valid_pred.index,
                         y=valid_pred['bitcoin_Price'],
                         name="BITCOIN Validation Prediction"), row=1, col=1)

fig.add_trace(go.Scatter(x=yhat_df.index,
                         y=yhat_df['bitcoin_Price'],
                         name="BITCOIN Current Forecast"), row=1, col=1)

fig.add_trace(go.Scatter(x=corr_df.index,
                         y=corr_df['bitcoin_30_day_ma'],
                         name="BITCOIN 30-Day Moving Average"), row=2, col=1)

fig.add_trace(go.Scatter(x=valid_pred.index,
                         y=valid_pred['bitcoin_30_day_ma'],
                         name="Validation Prediction"), row=2, col=1)

fig.add_trace(go.Scatter(x=yhat_df.index,
                         y=yhat_df['bitcoin_30_day_ma'],
                         name="Current Forecast"), row=2, col=1)

fig.update_layout(height=1000, width=1500, title_text="Bitcoin Prediction using VAR (Vector Autoregression)")
fig.show()