# Autoregression

The autoregression (AR) method models the next step in the sequence as a linear function of the observations at prior time steps.

The notation for the model involves specifying the order of the model p as a parameter to the AR function, e.g. AR(p). For example, AR(1) is a first-order autoregression model.

The method is suitable for univariate time series without trend and seasonal components.
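To make the "linear function of prior observations" concrete, here is a minimal pure-Python sketch of an AR(1) model fitted by ordinary least squares. The helper names `fit_ar1` and `forecast_ar1` and the toy series are assumptions made for illustration; they are not part of statsmodels, and the library's estimator may differ in detail.

```python
def fit_ar1(series):
    """Estimate c and phi in y[t] = c + phi * y[t-1] by ordinary least squares."""
    x = series[:-1]  # lagged values y[t-1]
    y = series[1:]   # targets y[t]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi


def forecast_ar1(series, c, phi):
    """One-step-ahead forecast from the last observation."""
    return c + phi * series[-1]


# Toy series (hypothetical) generated exactly by y[t] = 2 + 0.5 * y[t-1]
toy = [10.0]
for _ in range(9):
    toy.append(2.0 + 0.5 * toy[-1])

c, phi = fit_ar1(toy)
next_value = forecast_ar1(toy, c, phi)
print(c, phi, next_value)
```

Because the toy series follows the AR(1) recursion exactly, the fit should recover a constant close to 2 and a coefficient close to 0.5. Real price data will not fit this cleanly.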

In [1]:
# AR example
from statsmodels.tsa.ar_model import AutoReg

In [4]:
# Let's use oil price ETF data from Yahoo Finance as an example
from yahoo_fin import stock_info as si
# Set the desired stock ticker symbol
TICKER = 'OIL'
# Set a start for historical analysis, end date is current day
startDate = '1/1/2020'
# populate a dataframe with historical data
df = si.get_data(TICKER, start_date = startDate)

In [5]:
df

date,open,high,low,close,adjclose,volume,ticker
2020-01-02,19.000000,19.059999,18.969999,19.059999,19.059999,500,OIL
2020-01-03,19.620001,19.620001,19.620001,19.620001,19.620001,100,OIL
2020-01-06,19.750000,19.750000,19.549999,19.549999,19.549999,400,OIL
2020-01-07,19.530001,19.530001,19.530001,19.530001,19.530001,100,OIL
2020-01-08,18.680000,18.830000,18.680000,18.830000,18.830000,900,OIL
...,...,...,...,...,...,...,...
2022-03-15,30.080000,33.250000,29.680000,29.740000,29.740000,1153300,OIL
2022-03-16,30.639999,30.799999,29.330000,29.580000,29.580000,349500,OIL
2022-03-17,31.510000,32.470001,31.459999,31.980000,31.980000,234000,OIL
2022-03-18,32.560001,32.590000,32.000000,32.330002,32.330002,116700,OIL


In [8]:
data = df['adjclose'].values

In [9]:
data

array([19.05999947, 19.62000084, 19.54999924, 19.53000069, 18.82999992,
       18.82999992, 18.54000092, 18.25      , 18.37999916, 18.37999916,
       18.36000061, 18.43000031, 18.34000015, 17.88999939, 17.55999947,
       17.17000008, 16.75      , 16.95000076, 16.86000061, 16.76000023,
       16.46999931, 16.03000069, 15.94999981, 16.36000061, 16.39999962,
       16.20999908, 15.94999981, 16.11000061, 16.55999947, 16.59000015,
       16.71999931, 16.73999977, 17.03000069, 17.10000038, 16.92000008,
       16.31999969, 15.81999969, 15.51000023, 14.97000027, 14.42000008,
       15.07999992, 15.03999996, 15.05000019, 14.67000008, 13.39999962,
       10.31999969, 11.31000042, 10.93999958, 10.43000031, 10.92000008,
        9.73999977,  9.35000038,  8.35999966,  9.26000023,  8.92000008,
        8.15999985,  9.11999989,  9.59000015,  9.40999985,  9.39999962,
        9.40999985,  9.35000038,  9.        ,  9.80000019, 10.22999954,
       10.38000011, 10.51000023, 10.40999985, 10.39999962, 10.80

In [19]:
# fit model
model = AutoReg(data, lags=1)
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)

[32.38161016]


In [20]:
# next day predicted change is
forecast_percent_change = ((yhat/data[-1])-1)*100.0
forecast_percent_change

array([0.15962983])
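The percent-change arithmetic used in the cell above can be wrapped in a small helper. This is only a sketch: the name `percent_change` is hypothetical, and the two inputs are copied from the printed AR forecast and from the last `adjclose` row of the table shown earlier.

```python
def percent_change(forecast, last_observation):
    """Relative change of the forecast versus the last observed value, in percent."""
    return (forecast / last_observation - 1.0) * 100.0


# Values copied from the notebook output: AR(1) forecast and last adjusted close
change = percent_change(32.38161016, 32.330002)
print(round(change, 4))  # roughly 0.16 percent
```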

# Autoregressive Integrated Moving Average (ARIMA)

The Autoregressive Integrated Moving Average (ARIMA) method models the next step in the sequence as a linear function of the differenced observations and residual errors at prior time steps.

It combines both Autoregression (AR) and Moving Average (MA) models as well as a differencing pre-processing step of the sequence to make the sequence stationary, called integration (I).

The notation for the model involves specifying the order for the AR(p), I(d), and MA(q) models as parameters to an ARIMA function, e.g. ARIMA(p, d, q). An ARIMA model can also be used to develop AR, MA, and ARMA models.

The method is suitable for univariate time series with trend and without seasonal components.
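The "integration" step can be illustrated without any library: difference the series once (d = 1), forecast the change, then add it back to the last level. In this sketch the forecast of the differenced series is simply its mean, which is a placeholder assumption; a fitted ARIMA model would use AR and MA terms instead. The function names and the toy data are hypothetical.

```python
def difference(series, lag=1):
    """First-order differencing: y[t] - y[t-lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def undifference(last_observation, predicted_change):
    """Invert the differencing for a one-step forecast."""
    return last_observation + predicted_change


# Toy trending series (hypothetical values)
trending = [10.0, 12.0, 14.5, 16.0, 18.5, 20.0]
diffs = difference(trending)  # [2.0, 2.5, 1.5, 2.5, 1.5]

# Placeholder forecast of the next change: the mean of past changes
predicted_change = sum(diffs) / len(diffs)
next_level = undifference(trending[-1], predicted_change)
print(diffs)
print(next_level)  # 22.0
```

The differenced series no longer trends upward, which is the property the AR and MA parts of the model rely on.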

In [21]:
# ARIMA example
from statsmodels.tsa.arima.model import ARIMA
# fit model
model = ARIMA(data, order=(1, 1, 1))
model_fit = model.fit()
# make prediction (this ARIMA class forecasts in the original levels, so no `typ` argument is needed)
yhat = model_fit.predict(len(data), len(data))
print(yhat)

[32.2853601]


In [22]:
# next day predicted change is
forecast_percent_change = ((yhat/data[-1])-1)*100.0
forecast_percent_change

array([-0.13808144])

# Seasonal Autoregressive Integrated Moving-Average (SARIMA)

The Seasonal Autoregressive Integrated Moving Average (SARIMA) method models the next step in the sequence as a linear function of the differenced observations, errors, differenced seasonal observations, and seasonal errors at prior time steps.

It combines the ARIMA model with the ability to perform the same autoregression, differencing, and moving average modeling at the seasonal level.

The notation for the model involves specifying the order for the AR(p), I(d), and MA(q) models as parameters to an ARIMA function and AR(P), I(D), MA(Q) and m parameters at the seasonal level, e.g. SARIMA(p, d, q)(P, D, Q)m where “m” is the number of time steps in each season (the seasonal period). A SARIMA model can be used to develop AR, MA, ARMA and ARIMA models.

The method is suitable for univariate time series with trend and/or seasonal components.
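Seasonal differencing, the D part of the seasonal order, subtracts the observation from one season earlier. The sketch below uses a hypothetical toy series with period m = 4; the helper name `seasonal_difference` is an assumption for illustration. Note that the notebook cell that follows passes `seasonal_order=(0, 0, 0, 0)`, so no seasonal terms are active there.

```python
def seasonal_difference(series, m):
    """Seasonal differencing: y[t] - y[t-m], where m is the seasonal period."""
    return [series[i] - series[i - m] for i in range(m, len(series))]


# Toy series: a repeating pattern of period 4 whose level rises by 1 each season
pattern = [5.0, 8.0, 6.0, 3.0]
seasonal = [pattern[i % 4] + (i // 4) for i in range(12)]

deseasonalized = seasonal_difference(seasonal, m=4)
print(seasonal)
print(deseasonalized)  # eight values, all equal to 1.0
```

After differencing at lag m, the repeating pattern cancels out and only the season-to-season change remains.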

In [24]:
# SARIMA example
from statsmodels.tsa.statespace.sarimax import SARIMAX
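# fit model
# note: seasonal_order=(0, 0, 0, 0) disables the seasonal terms, so this is equivalent to ARIMA(1, 1, 1)
model = SARIMAX(data, order=(1, 1, 1), seasonal_order=(0, 0, 0, 0))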
model_fit = model.fit(disp=False)
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)

[32.2853601]


In [25]:
# next day predicted change is
forecast_percent_change = ((yhat/data[-1])-1)*100.0
forecast_percent_change

array([-0.13808144])

# Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX)

The Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX) is an extension of the SARIMA model that also includes the modeling of exogenous variables.

Exogenous variables are also called covariates and can be thought of as parallel input sequences that have observations at the same time steps as the original series. The primary series may be referred to as endogenous data to contrast it with the exogenous sequence(s). The observations for exogenous variables are included in the model directly at each time step and are not modeled in the same way as the primary endogenous sequence (e.g. as an AR or MA process).

The SARIMAX method can also be used to model the subsumed models with exogenous variables, such as ARX, MAX, ARMAX, and ARIMAX.

The method is suitable for univariate time series with trend and/or seasonal components and exogenous variables.
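Since exogenous observations must line up with the endogenous series step for step, it can help to align the two series on their dates before fitting. The cells that follow take `.values` from separate downloads and appear to rely on both having the same trading days. Below is a minimal sketch of an explicit alignment; the helper name `align_on_dates` and all prices are hypothetical.

```python
def align_on_dates(endog_by_date, exog_by_date):
    """Keep only the dates present in both series, in chronological order."""
    common = sorted(set(endog_by_date) & set(exog_by_date))
    endog = [endog_by_date[d] for d in common]
    exog = [exog_by_date[d] for d in common]
    return common, endog, exog


# Hypothetical prices keyed by ISO date strings (these sort correctly as text)
oil = {'2020-01-02': 19.06, '2020-01-03': 19.62, '2020-01-06': 19.55}
spy = {'2020-01-02': 308.5, '2020-01-06': 307.9, '2020-01-07': 307.0}

dates, endog, exog = align_on_dates(oil, spy)
print(dates)
print(endog)
print(exog)
```

Only the two dates present in both dictionaries survive, so `endog` and `exog` end up with equal length and matching time steps.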

In [26]:
# Set the desired stock ticker symbol
TICKER = 'SPY'
# Set a start for historical analysis, end date is current day
startDate = '1/1/2020'
# populate a dataframe with historical data
df = si.get_data(TICKER, start_date = startDate)
data2 = df['adjclose'].values

In [28]:
# fit model
model = SARIMAX(data, exog=data2, order=(1, 1, 1), seasonal_order=(0, 0, 0, 0))
model_fit = model.fit(disp=False)
# make prediction
# assume tomorrow's SPY value equals the last observed one (is this a good assumption?)
exog2 = data2[-1]
yhat = model_fit.predict(len(data), len(data), exog=[exog2])
print(yhat)

[32.41327384]


In [29]:
# next day predicted change is
forecast_percent_change = ((yhat/data[-1])-1)*100.0
forecast_percent_change

array([0.25756882])

In [30]:
# what if we use SARIMA to predict data2?
model = SARIMAX(data2, order=(1, 1, 1), seasonal_order=(0, 0, 0, 0))
model_fit = model.fit(disp=False)
# make prediction
yhat2 = model_fit.predict(len(data2), len(data2))
print(yhat2)

[444.69947377]


In [31]:
# fit model with updated data2 prediction
model = SARIMAX(data, exog=data2, order=(1, 1, 1), seasonal_order=(0, 0, 0, 0))
model_fit = model.fit(disp=False)
# make prediction
# use the SARIMA forecast of data2 as the exogenous input (is this a good assumption?)
exog2 = yhat2
yhat = model_fit.predict(len(data), len(data), exog=[exog2])
print(yhat)

[32.41876383]


In [32]:
# next day predicted change is
forecast_percent_change = ((yhat/data[-1])-1)*100.0
forecast_percent_change

array([0.27454992])

# Vector Autoregression (VAR)

The Vector Autoregression (VAR) method models the next step in each time series as a linear function of the prior observations of all the series. It is the generalization of AR to multiple parallel time series, i.e. multivariate time series.

The notation for the model involves specifying the order for the AR(p) model as parameters to a VAR function, e.g. VAR(p).

The method is suitable for multivariate time series without trend and seasonal components.
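A one-step VAR(1) forecast is a matrix-vector product plus an intercept: each series is predicted from the last observation of every series. The sketch below uses made-up coefficients rather than fitted ones; `var1_forecast`, `intercept`, and `coef` are hypothetical names and values chosen for illustration.

```python
def var1_forecast(last_obs, intercept, coef):
    """One-step VAR(1) forecast: y[t] = intercept + coef @ y[t-1]."""
    return [
        intercept[i] + sum(coef[i][j] * last_obs[j] for j in range(len(last_obs)))
        for i in range(len(intercept))
    ]


# Hypothetical parameters for two series
intercept = [0.5, 1.0]
coef = [[0.9, 0.05],   # series 1 depends mostly on its own lag
        [0.1, 0.8]]    # series 2 depends on both lags
last_obs = [10.0, 20.0]

forecast = var1_forecast(last_obs, intercept, coef)
print(forecast)  # approximately [10.5, 18.0]
```

The off-diagonal entries of `coef` are what distinguish VAR from fitting a separate AR model to each series.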

In [33]:
# VAR example
from statsmodels.tsa.vector_ar.var_model import VAR

In [59]:
# for a second time series dataset, let's use USO
# Set the desired stock ticker symbol
TICKER = 'USO'
# Set a start for historical analysis, end date is current day
startDate = '1/1/2020'
# populate a dataframe with historical data
df = si.get_data(TICKER, start_date = startDate)
data3 = df['adjclose'].values

In [60]:
# pair up the two series row by row: [OIL adjclose, USO adjclose]
multivariate_data = [[v1, v2] for v1, v2 in zip(data, data3)]

In [61]:
# fit model
model = VAR(multivariate_data)
model_fit = model.fit()
# make prediction
yhat = model_fit.forecast(model_fit.endog, steps=1)
print(yhat)

[[32.38460865 74.89542445]]


In [62]:
# next day predicted change is
forecast_percent_change = ((yhat[0][0]/data[-1])-1)*100.0
forecast_percent_change

0.1689044865493372

# Vector Autoregression Moving-Average (VARMA)

The Vector Autoregression Moving-Average (VARMA) method models the next step in each time series using an ARMA model. It is the generalization of ARMA to multiple parallel time series, i.e. multivariate time series.

The notation for the model involves specifying the order for the AR(p) and MA(q) models as parameters to a VARMA function, e.g. VARMA(p, q). A VARMA model can also be used to develop VAR or VMA models.

The method is suitable for multivariate time series without trend and seasonal components.

In [63]:
from statsmodels.tsa.statespace.varmax import VARMAX
# fit model
model = VARMAX(multivariate_data, order=(1, 1))
model_fit = model.fit(disp=False)
# make prediction
yhat = model_fit.forecast()
print(yhat)

  warn('Estimation of VARMA(p,q) models is not generically robust,'


[[32.38076478 74.9642516 ]]


In [64]:
# next day predicted change is
forecast_percent_change = ((yhat[0][0]/data[-1])-1)*100.0
forecast_percent_change

0.15701499866234148

# Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX)

The Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX) is an extension of the VARMA model that also includes the modeling of exogenous variables. It is a multivariate version of the ARMAX method.

Exogenous variables are also called covariates and can be thought of as parallel input sequences that have observations at the same time steps as the original series. The primary series are referred to as endogenous data to contrast them with the exogenous sequence(s). The observations for exogenous variables are included in the model directly at each time step and are not modeled in the same way as the primary endogenous sequences (e.g. as an AR or MA process).

The VARMAX method can also be used to model the subsumed models with exogenous variables, such as VARX and VMAX.

The method is suitable for multivariate time series without trend and seasonal components, with exogenous variables.

In [65]:
data_exog = data2

In [66]:
# fit model
model = VARMAX(multivariate_data, exog=data_exog, order=(1, 1))
model_fit = model.fit(disp=False)
# make prediction
data_exog2 = yhat2
yhat = model_fit.forecast(exog=data_exog2)
print(yhat)

  warn('Estimation of VARMA(p,q) models is not generically robust,'


[[32.27984945 74.7611296 ]]


In [67]:
# next day predicted change is
forecast_percent_change = ((yhat[0][0]/data[-1])-1)*100.0
forecast_percent_change

-0.15512645241271272

# Simple Exponential Smoothing (SES)

The Simple Exponential Smoothing (SES) method models the next time step as an exponentially weighted linear function of observations at prior time steps.

The method is suitable for univariate time series without trend and seasonal components.
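The exponentially weighted average can be written as a short recursion on a single "level" value. This is a minimal sketch, assuming the level is initialized to the first observation and that the smoothing parameter `alpha` is supplied by hand; a library would normally estimate both. The function name `ses_forecast` and the toy prices are hypothetical.

```python
def ses_forecast(series, alpha):
    """Simple exponential smoothing: the forecast is the final smoothed level."""
    level = series[0]  # assumption: start the level at the first observation
    for value in series[1:]:
        level = alpha * value + (1.0 - alpha) * level
    return level


prices = [10.0, 12.0, 11.0, 13.0]  # hypothetical values
smooth = ses_forecast(prices, alpha=0.5)
naive = ses_forecast(prices, alpha=1.0)
print(smooth)  # 12.0
print(naive)   # 13.0, identical to the last observation
```

With `alpha` equal to 1 the forecast is just the last observation. A forecast that matches the last data point, as in the output below, is consistent with a fitted smoothing level close to 1, although the notebook does not print that parameter.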

In [69]:
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
# fit model
model = SimpleExpSmoothing(data)
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)

[32.33000183]


In [71]:
# next day predicted change is
forecast_percent_change = ((yhat[0]/data[-1])-1)*100.0
forecast_percent_change

0.0

# Holt-Winters Exponential Smoothing (HWES)

The Holt-Winters Exponential Smoothing (HWES) method, also called Triple Exponential Smoothing, models the next time step as an exponentially weighted linear function of observations at prior time steps, taking trends and seasonality into account.

The method is suitable for univariate time series with trend and/or seasonal components.
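To show how a trend enters the smoothing recursion, here is a sketch of Holt's linear trend method (double exponential smoothing), which corresponds to Holt-Winters without the seasonal component. The full method adds a third smoothing equation for the seasonal terms. The function name `holt_forecast`, the initialization, and the parameter values are assumptions chosen for illustration.

```python
def holt_forecast(series, alpha, beta, steps=1):
    """Holt's linear trend method: smooth a level and a trend, then extrapolate."""
    level = series[0]
    trend = series[1] - series[0]  # assumption: initial trend from the first two points
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1.0 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1.0 - beta) * trend
    return level + steps * trend


# Toy series with a perfectly linear trend of +2 per step (hypothetical)
line = [2.0, 4.0, 6.0, 8.0, 10.0]
next_point = holt_forecast(line, alpha=0.8, beta=0.2)
print(next_point)  # approximately 12.0, continuing the line
```

On a perfectly linear series the level and trend track the data exactly, so the forecast continues the line. Simple exponential smoothing would instead predict a flat value near the last observation.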

In [73]:
from statsmodels.tsa.holtwinters import ExponentialSmoothing
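# fit model
# note: with the default arguments (no trend or seasonal component) this reduces to
# simple exponential smoothing, which is why the forecast matches the SES result above
model = ExponentialSmoothing(data)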
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)

[32.33000183]


In [74]:
# next day predicted change is
forecast_percent_change = ((yhat[0]/data[-1])-1)*100.0
forecast_percent_change

-2.220446049250313e-14