# Practice #4. "Dynamic predictive models. Part 2"

This notebook is dedicated to:
* Predicting Time series: Moving Average Model

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, mean_absolute_error
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.ar_model import AutoReg
from scipy.stats import boxcox

import warnings
warnings.filterwarnings('ignore')

## 0. Data reading and visualization

Please specify the path to the data file.

In [None]:
path_to_datafile = "../data/airline-passengers.csv"

In [None]:
# data reading to pandas.DataFrame
df = pd.read_csv(path_to_datafile)

Please rename the time column to `ds` and the data column to `y` (you can use `df.rename`). If you use a dataset with multiple features, select only one and drop NaN values.

In [None]:
# your code here
df.rename(columns={"Month": "ds", "Passengers": "y"}, inplace=True)


Convert the date column to datetime format and set it as the index.

In [None]:
df["ds"] = pd.to_datetime(df["ds"])
df.set_index("ds", inplace=True)
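Setting an explicit frequency on the index is optional, but it lets statsmodels attach dates to out-of-sample forecasts; without it, statsmodels generally tries to infer the frequency and emits a warning. Below is a minimal sketch of the same steps on a tiny hypothetical frame. The `Month`/`Passengers` column names follow the rename above, and the three values are illustrative. It assumes monthly data stamped at the start of each month (pandas alias `"MS"`).

```python
import pandas as pd

# Tiny hypothetical frame in the same layout as the CSV (values are illustrative)
raw = pd.DataFrame({"Month": ["1949-01", "1949-02", "1949-03"],
                    "Passengers": [112, 118, 132]})

df_demo = raw.rename(columns={"Month": "ds", "Passengers": "y"})
df_demo["ds"] = pd.to_datetime(df_demo["ds"])    # "1949-01" -> 1949-01-01
df_demo = df_demo.set_index("ds").asfreq("MS")   # declare month-start frequency

print(df_demo.index.freqstr)  # MS
```

For the notebook's own `df`, the equivalent would be `df = df.asfreq("MS")` after `set_index`, assuming the file has exactly one row per month with no gaps (otherwise `asfreq` inserts NaN rows for the missing months).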

Number of data points:

In [None]:
df.shape[0]

Print the first rows of the time series:

In [None]:
df.head()

Let's plot the data

In [None]:
plt.figure(figsize=(20, 5))
plt.ylabel("y")
plt.xlabel("ds")
plt.plot(df);

## 1. Predicting Time series: Moving Average Model

The residual errors from forecasts on a time series provide another source of information that we can model. The difference between the ground truth and the prediction is called the residual error. Residual errors themselves form a time series that can have a temporal structure. A simple autoregression model of this structure can be used to predict the forecast error, which in turn can be used to correct forecasts. This type of model is called a moving average model; despite the shared name, it is very different from moving average smoothing.

Just like the input observations themselves, the residual errors from a time series can have a temporal structure such as trends, bias, and seasonality. Any temporal structure in the series of residual forecast errors is useful as a diagnostic, because it suggests information that could be incorporated into the predictive model. An ideal model would leave no structure in the residual error, just random fluctuations that cannot be modeled.

Structure in the residual error can also be modeled directly. There may be complex signals in the residual error that are difficult to incorporate directly into the model. Instead, you can create a model of the residual error time series and predict the expected error of your model. Because the residual error is defined as ground truth minus prediction, the predicted error is then added to the model prediction to correct it, which can provide an additional lift in performance.
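To see what "structure left in the residuals" looks like, here is a small sketch on a synthetic trending series (an assumption; it is not the airline data). A deliberately poor model that predicts a constant leaves the trend in its residuals, which shows up as a lag-1 autocorrelation near 1, while a model that captures the trend leaves residuals close to white noise. The helper `lag1_autocorr` is hypothetical and uses NumPy only.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1 (hypothetical helper)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

rng = np.random.default_rng(0)
t = np.arange(120)
y = 10 + 0.5 * t + rng.normal(0, 1, size=t.size)   # synthetic series with a linear trend

# Poor model: predicts the mean of the first half for every time step
poor_forecast = np.full(t.size, y[:60].mean())
poor_residual = y - poor_forecast                  # ground truth minus prediction

# Better model: knows the true trend, so only noise should remain
good_residual = y - (10 + 0.5 * t)

print(lag1_autocorr(poor_residual))  # near 1: strong temporal structure remains
print(lag1_autocorr(good_residual))  # near 0: little structure left to model
```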

Split the data into train and test sets.

In [None]:
# your code here
# df_train = ...
# df_test = ...
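A time series must be split chronologically: shuffling would let the model train on observations that come after the ones it is tested on. A minimal sketch follows, assuming the last 24 observations (two years of monthly data) are held out. The function name `split_train_test` and the 24-step horizon are illustrative choices, not requirements of the exercise.

```python
import numpy as np
import pandas as pd

def split_train_test(df, n_test):
    """Hypothetical helper: keep the last `n_test` rows as the test set, in time order."""
    return df.iloc[:-n_test], df.iloc[-n_test:]

# Synthetic monthly series standing in for the passenger data (an assumption)
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
df_demo = pd.DataFrame({"y": np.arange(144, dtype=float)}, index=idx)

train_demo, test_demo = split_train_test(df_demo, n_test=24)
print(len(train_demo), len(test_demo))  # 120 24
```

In the notebook this would become `df_train, df_test = split_train_test(df, n_test=24)`.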

Please calculate the residual error of an AR model fitted on the train set.

In [None]:
# your code here
# df_forecast = ...
# residual_error = ...

In [None]:
plt.figure(figsize=(20, 5))
plt.plot(df_train, label='train')
plt.plot(df_forecast, label='forecast')
plt.ylabel("y")
plt.xlabel("ds")
plt.legend();

Please train an AR model on the residual error.

In [None]:
# your code here
# re_forecast = ...

In [None]:
plt.figure(figsize=(20, 5))
plt.plot(residual_error, label='residual error')
plt.plot(re_forecast, label='modeled residual error')
plt.ylabel("y")
plt.xlabel("ds")
plt.legend();

Now correct the predictions with an AR model of the residuals. With a good estimate of the forecast error at a time step, we can make better predictions: adding the expected forecast error to a prediction corrects it and, in turn, can improve the skill of the model.
$$\textit{improved forecast = forecast + estimated error}$$
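The plus sign in this formula follows from how the residual error was defined (ground truth minus prediction). Writing $y_t$ for the observed value, $\hat{y}_t$ for the forecast and $\hat{e}_t$ for the error predicted by the residual model:

$$e_t = y_t - \hat{y}_t \quad\Longrightarrow\quad y_t = \hat{y}_t + e_t \approx \hat{y}_t + \hat{e}_t$$

With hypothetical numbers: if the AR model forecasts $\hat{y}_t = 400$ passengers and the residual model predicts $\hat{e}_t = 12$, the corrected forecast is $400 + 12 = 412$. Had the residual been defined the other way round (prediction minus ground truth), the estimated error would be subtracted instead.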

In [None]:
# your code here
# corrected_forecast = ...

In [None]:
plt.figure(figsize=(20, 5))
plt.plot(df_train, label='train')
plt.plot(df_test, label='test')
plt.plot(corrected_forecast, label='corrected forecast')
plt.ylabel("y")
plt.xlabel("ds")
plt.legend();

Moving average model RMSE:

In [None]:
np.sqrt(mean_squared_error(df_test['y'], corrected_forecast))
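The corrected RMSE is only meaningful next to the RMSE of the uncorrected AR forecast on the same test set; if the two are close, the residual model added little. Here is a small sketch of that comparison with illustrative, made-up arrays. The `rmse` helper is hypothetical and wraps the same `mean_squared_error` call used above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def rmse(y_true, y_pred):
    """Root mean squared error (hypothetical helper)."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))

# Illustrative values only, not results from the airline data
y_test = np.array([10.0, 12.0, 14.0])
base_forecast = np.array([9.0, 10.0, 11.0])            # errors 1, 2, 3
corrected_forecast_demo = np.array([9.5, 11.5, 13.0])  # errors 0.5, 0.5, 1

print(rmse(y_test, base_forecast))            # sqrt(14/3) ≈ 2.160
print(rmse(y_test, corrected_forecast_demo))  # sqrt(0.5) ≈ 0.707
```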