# Extrapolating with a TS Model

This notebook explores the long-run behavior of fitted time series models when they are extrapolated into the future ("forecasting"). We distinguish a fitted model from the underlying statistical process: when extrapolating a fitted model, no new noise is introduced, so only the deterministic component of the model is propagated forward. The limiting behavior of the extrapolation depends on whether the process is stationary.

This notebook adds mathematical clarity to the other notebook on forecasting with an ARX.

In [None]:
# Set up environment
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.arima_process import ArmaProcess
from ts_tools import sim_arma, plot_ts, plt_acf

## Pure MA Process

### Stationarity

A pure MA process with a constant mean is **always stationary**. An MA(q) process is defined as:

$$
z_t = \mu+ \sum_{i=0}^q\theta_i\epsilon_{t-i}, \quad\epsilon_t \overset{\text{i.i.d.}}{\sim}  N(0,\sigma^2)
$$

Here $\theta_0 = 1$; writing the process as a single sum makes the autocovariance derivation easier.

The expected value and variance are constant for the process:

$$
E[z_t] = \mu
$$

$$
\begin{align*}
Var[z_t] &= Var[\mu] + \sum_{i=0}^q Var[\theta_i\epsilon_{t-i}] \\
&= \sigma^2 \sum_{i=0}^q \theta_i^2 \\
&= \sigma^2(1+\sum_{i=1}^q \theta_i^2)
\end{align*}
$$

The autocovariance function is a bit tedious to derive, since it involves lining up the indices of the coefficients at each lag. It ends up being a function of the lag $k$ only, not of time:

$$
\begin{align*}
Cov[z_t, z_{t-k}] &= Cov\left[\mu+ \sum_{i=1}^q\theta_i\epsilon_{t-i} +\epsilon_t, \mu+ \sum_{i=1}^q\theta_i\epsilon_{t-i-k} +\epsilon_{t-k}\right] \\ 
&= 
\begin{cases} 
    \sigma^2\sum_{i=k}^q \theta_i\theta_{i-k} &\text{ for } 0\leq k \leq q \\
    0 &\text{ for } k > q
\end{cases}
\end{align*}
$$
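As a quick numerical check, the autocovariance can be evaluated directly by pairing each coefficient with the one $k$ positions away. The sketch below is illustrative only: the helper `ma_autocov` is a hypothetical name (it is not part of `ts_tools`), and the coefficients are the MA(3) values used in the simulation section of this notebook.

```python
import numpy as np

def ma_autocov(theta, k, sigma2=1.0):
    """Theoretical lag-k autocovariance of an MA(q) process.

    `theta` holds [theta_0, ..., theta_q] with theta_0 = 1.
    """
    theta = np.asarray(theta, dtype=float)
    q = len(theta) - 1
    k = abs(k)
    if k > q:
        return 0.0  # no shared noise terms beyond lag q
    # Multiply each coefficient by the coefficient k positions away
    return sigma2 * float(np.sum(theta[k:] * theta[:q + 1 - k]))

theta = [1, 0.8, 0.6, 0.4]
acov = [ma_autocov(theta, k) for k in range(6)]
print(np.round(acov, 4))
```

With unit noise variance this gives 2.16, 1.52, 0.92 and 0.4 for lags 0 through 3, and zero for every lag beyond $q=3$, which is the cutoff pattern expected in the ACF plot of an MA(3) series.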

Together, these properties show that the process is weakly stationary. It is also strictly stationary.

### Extrapolation with Fitted MA Model

We describe the behavior of an MA(1) model; the general MA(q) case extends these ideas straightforwardly. Consider an MA(1) model:

$$
y_t = \mu + \theta\epsilon_{t-1}+\epsilon_t
$$

We estimate the coefficients from a sample time series $y_1, \dots, y_N$ and wish to extrapolate the model forward without introducing new noise. The one-step-ahead forecast uses the residual from the last observed time step, and the unobserved noise term $\epsilon_{N+1}$ is set to its expected value of zero:

$$
\begin{align*}
\hat y_{N+1} &= \hat \mu + \hat \theta \hat \epsilon_{N} \\
&= \hat \mu + \hat\theta (y_N - \hat y_N)
\end{align*}
$$

The residual at time $N$ is the difference between the observed value and the model's fitted value. For the next forecasted value, we have:

$$
\begin{align*}
\hat y_{N+2} &= \hat \mu + \hat\theta \hat \epsilon_{N+1}\\
&= \hat \mu
\end{align*}
$$

Because $\hat y_{N+1}$ was generated with zero additional noise, the modeling error at $N+1$ is assumed to be zero, so the residual term drops out. Thus, after 1 time step into the future, all of the forecasts equal the estimated mean $\hat \mu$. For a general MA(q) model, the forecasts reach the estimated mean after $q$ steps.
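The forecasting recursion can be sketched in a few lines of NumPy. This is a minimal illustration rather than how `statsmodels` computes its forecasts: the helper `ma1_forecast` is hypothetical, the parameters are passed in instead of being estimated, and the pre-sample innovation is assumed to be zero, so the numbers may differ slightly from `ARIMA(...).forecast`.

```python
import numpy as np

def ma1_forecast(y, mu, theta, steps):
    """Noise-free forecasts from an MA(1) with given (estimated) parameters."""
    eps = 0.0  # assumption: the innovation before the first observation is zero
    for obs in y:
        # The one-step-ahead fitted value uses the previous residual
        fitted = mu + theta * eps
        eps = obs - fitted
    forecasts = np.full(steps, mu, dtype=float)
    if steps > 0:
        # Only the first forecast still sees an observed residual
        forecasts[0] = mu + theta * eps
    return forecasts

print(ma1_forecast([1.0, 2.0, 0.5], mu=1.0, theta=0.5, steps=4))
```

For this toy input the residuals are 0, 1 and -1, so the first forecast is $1 + 0.5 \cdot (-1) = 0.5$ and every later forecast equals the mean of 1.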

### Simulations

We simulate a mean-zero MA(3) process with known coefficients, then fit a model to estimate the coefficients, and finally extrapolate/forecast the model into the future. 

In [None]:
np.random.seed(42)
Nt = 100 
z = sim_arma(n=Nt, ma_coefs = [1, 0.8, 0.6, 0.4])
plot_ts(z, title="MA(3)")

In [None]:
plt_acf(z)

In [None]:
ma3 = ARIMA(z, order=(0, 0, 3)).fit()  # MA(3)
ma3.summary().tables[1]

The ACF plot shows 3 significant non-zero lags, as expected, and the fitted coefficients are relatively close to the target values. The estimated mean is the value that the extrapolated series converges to. When forecasting, the values reach the estimated mean, which is close to zero, after 3 time steps.

In [None]:
Nf = 30
fits = ma3.fittedvalues
preds = ma3.forecast(steps=Nf)

plt.title("Fitted MA(3) Long Run Behavior")
plt.plot(np.concatenate([fits, preds]))
plt.axvline(x = Nt, linestyle="dashed", color="k")
plt.text(Nt, 2, 'Forecast Start', rotation=90,
         verticalalignment='center', horizontalalignment='right',
         color='k')

In [None]:
Nf = 12
fits = ma3.fittedvalues
preds = ma3.forecast(steps=Nf)

fig, axs = plt.subplots(1, 2)


axs[0].plot(np.concatenate([fits, preds]))
axs[0].axvline(x = Nt, linestyle="dashed", color="k", alpha=.7)
axs[0].text(Nt, 2, 'Forecast Start', rotation=90,
         verticalalignment='center', horizontalalignment='right',
         color='k', fontsize=8, alpha=.7)

axs[1].plot(np.concatenate([preds]))
axs[1].set_ylim(-1, 1)
axs[1].axvline(x = 0, linestyle="dashed", color="k", alpha=.7)
axs[1].axvline(x = 3, linestyle="dashed", color="k", alpha=.7)
axs[1].text(3, 0.5, 'Time N+3', rotation=90,
         verticalalignment='center', horizontalalignment='right',
         color='k', fontsize=8, alpha=.7)
xticks = np.linspace(0, Nf, 5)
axs[1].set_xticks(xticks)
axs[1].set_xticklabels([str(int(tick + Nt)) for tick in xticks])

fig.suptitle("Fitted MA(3) Long Run Behavior")

## Pure AR Models

Autoregressive models can have 3 types of long-run behavior when forecasted into the future with no new sources of noise:

1. Converge to a finite value (weakly stationary)
2. Diverge to +/- infinity, possibly oscillating between those extremes
3. Behave as a Gaussian random walk

These behaviors can be seen by examining the mean of the process.

### Stationarity

We consider an AR(1) process, but the concept extends to general AR(p) processes. The AR(1) process is defined as:

$$
y_t = \mu + \gamma y_{t-1} + \epsilon_t, \quad\epsilon_t \overset{\text{i.i.d.}}{\sim}  N(0,\sigma^2)
$$

The mean satisfies a recursive relationship that, unrolled indefinitely, becomes a geometric series. Its convergence properties determine the behavior of the mean:

$$
\begin{align*}
    E[y_t] &= E[\mu + \gamma y_{t-1}+\epsilon_t] \\ 
    &= \mu + \gamma E[y_{t-1}] \\
    & = \mu + \gamma \mu + \gamma^2 E[y_{t-2}] \\
    & \vdots\\
    & = \mu \left(\sum_{j=0}^\infty \gamma^j\right)
\end{align*}
$$

For $|\gamma|<1$, the series converges, and the mean of the process is the constant value:

$$
E[y_t] = \frac{\mu}{1-\gamma}, \quad |\gamma|<1
$$
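The limit can be checked by iterating the mean recursion $m_t = \mu + \gamma m_{t-1}$ directly. The sketch below uses illustrative values for $\mu$ and $\gamma$, and the function name `ar1_mean_recursion` is hypothetical.

```python
def ar1_mean_recursion(mu, gamma, m0=0.0, n_iter=200):
    """Iterate m_t = mu + gamma * m_{t-1}, starting from m0."""
    m = m0
    for _ in range(n_iter):
        m = mu + gamma * m
    return m

mu, gamma = 2.0, 0.8
limit = mu / (1 - gamma)  # closed-form mean, valid only for |gamma| < 1
print(ar1_mean_recursion(mu, gamma), limit)
```

For $|\gamma|<1$ the starting value `m0` is forgotten geometrically fast, so any starting point leads to the same limit. Rerunning with $|\gamma|>1$ shows the iterates growing without bound instead.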

For $|\gamma|>1$, the series diverges to plus or minus infinity, possibly with oscillatory behavior if $\gamma$ is negative.

For $|\gamma|=1$ the series also does not converge, so the mean is undefined. 


For completeness, we show the remaining weak-stationarity conditions for the case $|\gamma|<1$. The variance is constant and is derived in the same way as the mean, using the fact that $\epsilon_t$ is independent of $y_{t-1}$:

$$
\begin{align*}
    V[y_t] &= V[\mu + \gamma y_{t-1}+\epsilon_t] \\ 
    &= \sigma^2 + \gamma^2 V[y_{t-1}] \\
    & = \sigma^2 + \gamma^2 \sigma^2 + \gamma^4 V[y_{t-2}] \\
    & \vdots\\
    & = \sigma^2 \left(\sum_{j=0}^\infty (\gamma^2)^j\right)
\end{align*}
$$

This series converges for $\gamma^2<1$, or equivalently $|\gamma|<1$ as before, and the variance is the constant value:

$$
V[y_t] = \frac{\sigma^2}{1-\gamma^2}, \quad |\gamma|<1
$$

Similarly, the autocovariance ends up being a function of the lag only. We show the derivation for the first two lags and then deduce the pattern:

$$
\begin{align*}
    Cov[y_t, y_{t-1}] &= Cov[\mu + \gamma y_{t-1}+\epsilon_t, y_{t-1}] \\
    & = \gamma Cov[y_{t-1}, y_{t-1}] \\
    & = \gamma Var[y_{t-1}]\\
    & = \gamma \frac{\sigma^2}{1-\gamma^2}, \quad |\gamma|<1
\end{align*}
$$

$$
\begin{align*}
    Cov[y_t, y_{t-2}] &= Cov[\mu + \gamma y_{t-1}+\epsilon_t, y_{t-2}] \\
    & = Cov[\mu(1+\gamma) + \gamma^2 y_{t-2} + \gamma\epsilon_{t-1} + \epsilon_t, y_{t-2}] \\
    & = \gamma^2 Cov[y_{t-2}, y_{t-2}] \\
    & = \gamma^2 Var[y_{t-2}]\\
    & = \gamma^2 \frac{\sigma^2}{1-\gamma^2}, \quad |\gamma|<1
\end{align*}
$$

In general, for lag $k \geq 0$:

$$
Cov[y_t, y_{t-k}]= \gamma^k \frac{\sigma^2}{1-\gamma^2}, \quad |\gamma|<1
$$
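A simulation offers a rough empirical check of the variance and autocovariance results. The sketch below simulates a long mean-zero AR(1) with plain NumPy rather than `ts_tools`; the values of $\gamma$, $\sigma$, the series length, and the helper `sample_autocov` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, sigma, n = 0.8, 1.0, 200_000

# Simulate a mean-zero AR(1), starting from the stationary distribution
y = np.empty(n)
y[0] = rng.normal(scale=sigma / np.sqrt(1 - gamma**2))
eps = rng.normal(scale=sigma, size=n)
for t in range(1, n):
    y[t] = gamma * y[t - 1] + eps[t]

def sample_autocov(x, k):
    """Sample autocovariance at lag k (divides by len(x))."""
    xc = x - x.mean()
    return float(np.dot(xc[k:], xc[:len(x) - k]) / len(x))

theory = [gamma**k * sigma**2 / (1 - gamma**2) for k in range(4)]
sample = [sample_autocov(y, k) for k in range(4)]
for k in range(4):
    print(k, round(theory[k], 3), round(sample[k], 3))
```

The sample values should land close to the theoretical ones and shrink by roughly a factor of $\gamma$ per lag; the agreement is only approximate because the estimates carry sampling error.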

Combined, these properties show that the AR(1) process is weakly stationary when $|\gamma|<1$.

### Case 1: Stationary Process Converges to Equilibrium

We simulate a mean-zero AR(1) process whose autoregressive coefficient is less than one in absolute value. The forecast decays to zero geometrically, at a rate set by the absolute value of the coefficient.
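The decay can be illustrated without fitting anything. The sketch below extrapolates an AR(1) with no noise and compares the result with the closed form $m + \gamma^h (y_N - m)$, where $m = \mu/(1-\gamma)$. The function `ar1_forecast` and the starting value are hypothetical, and the parameters are chosen by hand rather than estimated.

```python
import numpy as np

def ar1_forecast(y_last, mu, gamma, steps):
    """Noise-free AR(1) extrapolation: y_hat[h] = mu + gamma * y_hat[h-1]."""
    out = np.empty(steps)
    prev = y_last
    for h in range(steps):
        prev = mu + gamma * prev
        out[h] = prev
    return out

y_last, mu, gamma = 2.0, 0.0, -0.8
path = ar1_forecast(y_last, mu, gamma, steps=20)

# Closed form: the gap to the equilibrium shrinks by a factor gamma each step
m = mu / (1 - gamma)
closed = m + gamma ** np.arange(1, 21) * (y_last - m)
print(np.allclose(path, closed), abs(path[-1]))
```

With a negative coefficient the forecasts alternate in sign while their magnitude shrinks by a factor of $|\gamma|$ each step, so a coefficient closer to zero reaches the equilibrium faster.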

In [None]:
np.random.seed(42)
Nt = 100 
z = sim_arma(n=Nt, ar_coefs = [1, 0.8])
plot_ts(z, title="AR(1), $y_t = -0.8y_{t-1}$")

In [None]:
plt_acf(z)

In [None]:
ar1 = ARIMA(z, order=(1, 0, 0)).fit() # AR(1)
ar1.summary().tables[1]

In [None]:
Nf = 20
fits = ar1.fittedvalues
preds = ar1.forecast(steps=Nf)

fig, axs = plt.subplots(1, 2)


axs[0].plot(np.concatenate([fits, preds]))
axs[0].axvline(x = Nt, linestyle="dashed", color="k", alpha=.7)
axs[0].text(Nt, 2, 'Forecast Start', rotation=90,
         verticalalignment='center', horizontalalignment='right',
         color='k', fontsize=8, alpha=.7)

axs[1].plot(np.concatenate([preds]))
axs[1].set_ylim(-1, 1)
axs[1].axvline(x = 0, linestyle="dashed", color="k", alpha=.7)
xticks = np.linspace(0, Nf, 6)
axs[1].set_xticks(xticks)
axs[1].set_xticklabels([str(int(tick + Nt)) for tick in xticks])

fig.suptitle("Fitted AR(1) Long Run Behavior")

### Case 2: Diverges to +/- Infinity

We simulate a mean-zero AR(1) process whose autoregressive coefficient is greater than one in absolute value. The simulations explode to +/- infinity, oscillating when the coefficient is negative.
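The same behavior appears in the noise-free extrapolation, where a mean-zero AR(1) path is simply $y_h = \gamma^h y_0$. The sketch below is a hedged illustration with a hypothetical helper `extrapolate_ar1` and an arbitrary starting value of 1.

```python
import numpy as np

def extrapolate_ar1(y0, gamma, steps):
    """Noise-free, mean-zero AR(1) path: y_h = gamma**h * y0."""
    return y0 * gamma ** np.arange(steps + 1)

up = extrapolate_ar1(1.0, 1.5, 10)    # grows monotonically
osc = extrapolate_ar1(1.0, -1.5, 10)  # grows while flipping sign each step
print(up[-1], osc[-3:])
```

Both paths have the same magnitude at every step; only the sign pattern differs.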

In [None]:
np.random.seed(42)
Nt = 30
z1 = sim_arma(n=Nt, ar_coefs = [1, -1.5])
z2 = sim_arma(n=Nt, ar_coefs = [1, 1.5])

In [None]:
plot_ts(z1, title="AR(1), $y_t = 1.5y_{t-1}$")

In [None]:
plot_ts(z2, title="AR(1), $y_t = -1.5y_{t-1}$")

### Case 3: Gaussian Random Walk

With $\gamma=1$ and $\mu=0$, the process is:

$$
y_t = y_{t-1} + \epsilon_t, \quad \epsilon_t \sim N(0,\sigma^2)
$$

This is a type of statistical random walk, where at each time you step up or down by an amount determined by a sample from a Gaussian random variable. This can also be thought of as the discrete time approximation of Brownian Motion. 

We can analyze the mean and variance of the process starting from an initial value of $y_0=0$. (This was also possible in the other cases, but less informative.) The mean of the process is the initial value, and the variance grows linearly with time:

$$
\begin{align*}
    E[y_t] &= E[y_{t-1}+\epsilon_t]\\
    &= E[y_{t-2}+\epsilon_{t-1}] \\
    & \vdots \\
    &= E[y_0] = 0
\end{align*}
$$

$$
\begin{align*}
    V[y_t] &= V[y_{t-1}+\epsilon_t]\\
    &= V[y_{t-1}]+\sigma^2\\
    &= V[y_{t-2}]+2\sigma^2\\
    & \vdots \\
    &= t\sigma^2
\end{align*}
$$
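The linear growth of the variance can be checked by simulating many independent walks and measuring the spread across realizations at a few fixed times. This sketch uses NumPy's `cumsum` rather than `ts_tools`, and the number of simulations and steps are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_steps, sigma = 20_000, 100, 1.0

# Each row is one random walk started at y_0 = 0
steps = rng.normal(scale=sigma, size=(n_sims, n_steps))
walks = np.cumsum(steps, axis=1)

for t in (10, 50, 100):
    y_t = walks[:, t - 1]  # column t-1 holds y_t
    print(t, round(y_t.mean(), 3), round(y_t.var(), 2), t * sigma**2)
```

Across realizations the mean stays near zero while the variance tracks $t\sigma^2$, which is why a noise-free extrapolation of a fitted random walk stays flat at the last observed value even though individual realizations wander widely.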

We simulate several realizations with $\gamma=1$, all starting from the same initial state of zero. Gaussian random walks are strange: it can be shown that a one-dimensional Gaussian random walk is *recurrent*, in that it returns arbitrarily close to its starting location infinitely many times, yet the expected time between returns is infinite.

In [None]:
np.random.seed(42)
Nt = 100
nsims=10

for i in range(0, nsims):
    z = sim_arma(n=Nt, ar_coefs = [1, -1])
    plt.plot(z)

plt.grid()
plt.title("Gaussian Random Walk, $y_t = y_{t-1}+\\epsilon_t$")