# Forecasting: ARIMA

From [Investopedia](http://www.investopedia.com/terms/a/autoregressive-integrated-moving-average-arima.asp):
    
> A statistical analysis model that uses time series data to predict future trends. It is a form of regression analysis that seeks to predict future movements along the seemingly random walk taken by stocks and the financial market by examining the differences between values in the series instead of using the actual data values. Lags of the differenced series are referred to as "autoregressive" and lags within forecasted data are referred to as "moving average."
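
The phrase "examining the differences between values" is easy to see on a toy series. Here is a minimal plain-Python sketch. `difference` is a hypothetical helper written for this illustration, not a `statsmodels` function. Differencing a series with a steady linear trend leaves a constant, and this is the kind of trend removal the "integrated" part of ARIMA performs.

```python
def difference(series, lag=1):
    """Return the lag-`lag` differences y[t] - y[t - lag] (hypothetical helper)."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Toy series: a linear trend of 2 units per step on top of a constant offset
trend = [10 + 2 * t for t in range(6)]  # [10, 12, 14, 16, 18, 20]
print(difference(trend))                # [2, 2, 2, 2, 2]
```

Differencing twice removes a quadratic trend in the same way, which corresponds to a larger `d`.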

## Data Preparation

We will use historical weekly CO<sub>2</sub> readings from Mauna Loa Observatory, included in the `statsmodels` package.

In [1]:
import pandas as pd
import statsmodels.api as sm

# load_pandas() returns a DataFrame with a weekly (W-SAT) DatetimeIndex and a 'co2' column
dataset = sm.datasets.co2.load_pandas().data
dataset.head()


The weekly series has some missing values, so we first compute monthly averages (`'MS'` labels each month by its first day):

In [None]:
monthly_avg = dataset.co2.resample('MS').mean()
monthly_avg[:5]
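
The averaging step can be sketched without pandas. This plain-Python approximation of `resample('MS').mean()` groups readings by calendar month, labels each group by the first day of the month, and ignores missing values, as pandas' `mean()` does by default. The function `monthly_means` and the readings are made up for illustration. A month with no valid readings stays missing, which is why a fill step is still needed.

```python
from collections import defaultdict
from datetime import date


def monthly_means(observations):
    """Average (date, value) pairs per calendar month, skipping None values.

    Hypothetical helper. Unlike pandas, months with no rows at all are not
    emitted; months whose rows are all None map to None (pandas gives NaN).
    """
    groups = defaultdict(list)
    for day, value in observations:
        groups[date(day.year, day.month, 1)].append(value)  # 'MS': month-start label
    means = {}
    for month_start in sorted(groups):
        valid = [v for v in groups[month_start] if v is not None]
        means[month_start] = sum(valid) / len(valid) if valid else None
    return means


# Made-up weekly readings, with None standing in for missing values
weekly = [
    (date(1958, 3, 29), 316.1),
    (date(1958, 4, 5), 317.3),
    (date(1958, 4, 12), None),
    (date(1958, 4, 19), 317.5),
    (date(1958, 5, 3), None),  # the only reading in May is missing
]
print(monthly_means(weekly))
```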

In [None]:
# Fill months that are still empty with the next available monthly average (backfill)
data = monthly_avg.bfill()
data[:5]
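
The fill step is a backfill: every remaining gap takes the next available monthly average. Here is a plain-Python sketch of that rule with a hypothetical `backfill` helper, using `None` for missing values. A gap at the very end of a series stays empty because there is no later value to copy, which matches the behavior of pandas' `bfill()`.

```python
def backfill(values):
    """Replace each None with the next non-missing value (hypothetical helper)."""
    filled = list(values)
    next_valid = None
    # Walk backwards so the "next" valid value is always known
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = next_valid
        else:
            next_valid = filled[i]
    return filled


print(backfill([316.1, None, None, 317.4, None]))  # [316.1, 317.4, 317.4, 317.4, None]
```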

In [None]:
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib
# "SF Mono" ships with macOS; elsewhere matplotlib warns and falls back to its default font
matplotlib.rcParams['font.family'] = "SF Mono"

In [None]:
data.plot()

## Model parameters

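ARIMA is a model governed by 3 parameters:

- `p`, the order of the autoregressive (AR) part: how many past values of the series enter the model.
- `d`, the degree of differencing: how many times the series is differenced (each value minus the previous one) to remove trends before fitting.
- `q`, the order of the moving-average (MA) part: how many past forecast errors enter the model.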
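
To make the roles of `p`, `d` and `q` concrete, here is a hand-rolled one-step forecast for a non-seasonal ARIMA(1,1,1) without a constant. This is a simplified sketch and does not reproduce how `statsmodels` computes forecasts, which uses a state-space model and a Kalman filter. It assumes the pre-sample error is 0 and uses the `+theta` sign convention for the MA term. `phi` and `theta` would normally be estimated from the data; here they are simply passed in, and `arima_111_forecast` is a hypothetical name.

```python
def arima_111_forecast(y, phi, theta):
    """One-step-ahead forecast of ARIMA(1,1,1) without constant (needs len(y) >= 2).

    w[t] = y[t] - y[t-1]                          # d = 1: first difference
    w[t] = phi * w[t-1] + e[t] + theta * e[t-1]   # p = 1 AR term, q = 1 MA term
    Assumption: the pre-sample error e[0] is 0.
    """
    w = [y[t] - y[t - 1] for t in range(1, len(y))]
    e_prev = 0.0
    for t in range(1, len(w)):
        predicted = phi * w[t - 1] + theta * e_prev
        e_prev = w[t] - predicted  # in-sample one-step error
    w_next = phi * w[-1] + theta * e_prev  # forecast of the next difference
    return y[-1] + w_next                  # undo the differencing


print(arima_111_forecast([1.0, 2.0, 3.0, 4.0], phi=0.5, theta=0.0))  # 4.5
```

With `phi = theta = 0` the forecast is just the last value, that is, a random walk.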

It's easy to observe that CO<sub>2</sub> values follow a _seasonal_ pattern that repeats every year, probably different in summer and winter. To deal with _seasonal_ effects we add 3 more parameters `(P, D, Q)`, which play the same roles as `(p, d, q)` but apply only to the seasonal component of the time series. A further parameter `s` sets the periodicity of the series in time steps (12 for monthly data with a yearly cycle):

```
ARIMA(p,d,q)(P,D,Q)s
```
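
Seasonal differencing is what `D = 1` adds: each value is compared with the value one full period `s` earlier rather than with the previous one. The toy monthly series below uses made-up numbers, a trend of 0.1 per month plus a 12-month cycle, and `seasonal_difference` is a hypothetical helper. Differencing at lag 12 removes the cycle entirely and leaves only the trend's yearly increase.

```python
import math


def seasonal_difference(series, s):
    """Seasonal difference y[t] - y[t - s] (hypothetical helper)."""
    return [series[t] - series[t - s] for t in range(s, len(series))]


# Made-up monthly series: upward trend of 0.1 per month plus a repeating 12-month cycle
season = [3 * math.sin(2 * math.pi * m / 12) for m in range(12)]
series = [300 + 0.1 * t + season[t % 12] for t in range(36)]

diffs = seasonal_difference(series, s=12)
# Every difference is approximately 1.2 (12 months * 0.1 per month): the cycle is gone
print(min(diffs), max(diffs))
```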

The big question is: which value should we pick for each parameter? For this exercise we'll magically pick the parameters below. The R function [`auto.arima`](https://www.rdocumentation.org/packages/forecast/versions/7.3/topics/auto.arima) from the `forecast` package shows how to select them automatically:

```
ARIMA(1,1,1)(1,1,1)12
```
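
One common alternative to hand-picking, and to R's `auto.arima`, is a brute-force grid search: fit every combination of small orders and keep the one with the lowest AIC. The sketch below only enumerates the candidates and picks the minimum of a scoring function. `candidate_orders`, `best_by_score`, `aic_score` and `toy_score` are hypothetical names. The commented-out scorer shows how the search could be wired to `SARIMAX`, whose fitted results expose an `aic` attribute. Fitting all 64 models takes a while, so the runnable part uses a stand-in score.

```python
import itertools


def candidate_orders(max_order=1, s=12):
    """Yield every (order, seasonal_order) with p, d, q, P, D, Q in 0..max_order."""
    values = range(max_order + 1)
    for order in itertools.product(values, repeat=3):
        for P, D, Q in itertools.product(values, repeat=3):
            yield order, (P, D, Q, s)


def best_by_score(score, candidates):
    """Return the candidate with the lowest score (for example, the AIC)."""
    return min(candidates, key=lambda c: score(*c))


# With statsmodels, the score could be the AIC of a fitted model (not run here):
# def aic_score(order, seasonal_order):
#     model = sm.tsa.statespace.SARIMAX(data, order=order, seasonal_order=seasonal_order)
#     return model.fit(disp=False).aic

def toy_score(order, seasonal_order):
    """Stand-in score for demonstration only; it is lowest for (1,1,1)(1,1,1)12."""
    return abs(sum(order) - 3) + abs(sum(seasonal_order[:3]) - 3)


print(best_by_score(toy_score, candidate_orders()))  # ((1, 1, 1), (1, 1, 1, 12))
```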

The `ARIMA(1,1,1)(1,1,1)12` model can be fitted with the `SARIMAX` class from the `statsmodels` package.

In [None]:
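# order=(p, d, q), seasonal_order=(P, D, Q, s)
model = sm.tsa.statespace.SARIMAX(data,
    order=(1, 1, 1), seasonal_order=(1, 1, 1, 12),
    # Do not constrain the AR part to be stationary or the MA part to be invertible
    enforce_stationarity=False, enforce_invertibility=False)

result = model.fit()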

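The column to look at is `P>|z|`, the p-value of each coefficient. Small values (0.05 is a common threshold) suggest that the coefficient is significantly different from zero, so the corresponding term contributes to the model.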

In [None]:
result.summary().tables[1]
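
In this table, the `z` column is the coefficient divided by its standard error, and `P>|z|` is the two-sided p-value of that statistic under a standard normal distribution. `statsmodels` also exposes the p-values directly as `result.pvalues`. Here is a plain-Python sketch of the arithmetic, with a hypothetical helper and made-up numbers:

```python
from statistics import NormalDist


def two_sided_p_value(coef, std_err):
    """p-value for H0: coefficient == 0, using the normal approximation behind 'P>|z|'."""
    z = coef / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up coefficient and standard error: z = 2.0
print(round(two_sided_p_value(0.5, 0.25), 4))  # 0.0455
```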

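The residuals of the fit can also be checked visually. `plot_diagnostics()` shows the standardized residuals over time, their histogram with a density estimate, a normal Q-Q plot and a correlogram. For a good fit, the residuals should look like uncorrelated, normally distributed noise: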

In [None]:
import matplotlib.pyplot as plt
result.plot_diagnostics(figsize=(10, 8))
plt.show()
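
The correlogram panel plots the sample autocorrelation of the residuals at increasing lags. If the model has captured the structure of the series, these values should stay close to zero, roughly inside ±1.96/√n for n residuals. Here is a plain-Python sketch of that computation. `autocorrelation` is a hypothetical helper, and the residual series is made up and deliberately bad because it alternates in sign.

```python
import math


def autocorrelation(x, lag):
    """Sample autocorrelation at `lag`, the quantity a correlogram plots."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


# Made-up residuals that alternate in sign: strongly autocorrelated, not white noise
residuals = [1.0, -1.0] * 10
bound = 1.96 / math.sqrt(len(residuals))  # approximate 95% band for white noise
r1 = autocorrelation(residuals, 1)
print(r1, abs(r1) > bound)  # lag-1 autocorrelation falls far outside the band
```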

## Predicting Values

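Now for the fun part: can we use the fitted model to predict the future?

First, let's check whether it can _predict past values_ that we already know. With `dynamic=False`, each prediction from January 1998 onwards is a one-step-ahead forecast that uses the actual observations up to the previous month.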

In [None]:
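import pandas as pd
# One-step-ahead in-sample predictions from January 1998 onwards
past_prediction = result.get_prediction(start=pd.to_datetime('1998-01-01'), dynamic=False)

# Compare against the observations from 1990 onwards
data.loc['1990':].plot(label="observed")
past_prediction.predicted_mean.plot(label="forecast", linewidth=3, alpha=0.55)
plt.legend()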
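
Passing `dynamic=True`, or a date, to `get_prediction` switches to dynamic prediction. From that point on, the model's own forecasts replace the observed values, so errors can accumulate. The difference is easiest to see on a toy AR(1) model in plain Python. The helpers and numbers are hypothetical and are not taken from the fitted CO<sub>2</sub> model.

```python
def one_step_predictions(y, phi):
    """dynamic=False style: predict y[t] from the actual observation y[t-1]."""
    return [phi * y[t - 1] for t in range(1, len(y))]


def dynamic_predictions(y, phi, start):
    """dynamic=True style: from index `start` (>= 1) on, each prediction feeds the next."""
    preds = [phi * y[t - 1] for t in range(1, start)]  # before start: use actual data
    last = y[start - 1]  # last observation the dynamic run is allowed to see
    for _ in range(start, len(y)):
        last = phi * last
        preds.append(last)
    return preds


# Toy AR(1) with phi = 0.5 (made-up numbers); predictions are for y[1:], in order
y = [1.0, 2.0, 2.0, 2.0]
print(one_step_predictions(y, 0.5))          # [0.5, 1.0, 1.0]
print(dynamic_predictions(y, 0.5, start=2))  # [0.5, 1.0, 0.5]
```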

Values beyond the end of the data are produced with `get_forecast()`. `steps` is the number of periods to forecast, so 300 monthly steps cover 25 years.

In [None]:
prediction = result.get_forecast(steps=300)
prediction.predicted_mean[:3]

In [None]:
prediction.summary_frame()[:3]

We will add the confidence interval returned by `conf_int()` (95% by default) to the graph. Intuitively, the prediction gets _weaker_ the further ahead we forecast, so the interval keeps widening.

In [None]:
ci = prediction.conf_int()
ci[:3]
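
Under a normal approximation, each bound is the predicted mean minus or plus a normal quantile times the forecast's standard error. For 95% the quantile is about 1.96. The sketch below is a simplified illustration rather than the SARIMA model above. It assumes a random walk, whose h-step-ahead forecast error has standard deviation `sigma * sqrt(h)`, and this growth is why the band widens with the horizon. `normal_interval`, `sigma` and the mean of 370.0 are hypothetical.

```python
import math
from statistics import NormalDist


def normal_interval(mean, se, alpha=0.05):
    """Return (lower, upper) = mean -/+ z * se, with z the 1 - alpha/2 normal quantile."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return mean - z * se, mean + z * se


# Simplified assumption: a random walk whose monthly shocks have std sigma,
# so the h-step forecast error has std sigma * sqrt(h). Numbers are made up.
sigma = 0.5
for h in (1, 4, 16):
    lower, upper = normal_interval(370.0, sigma * math.sqrt(h))
    print(h, round(upper - lower, 2))  # interval width grows with the horizon h
```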

In [None]:
ax = data.plot(label='observed')
prediction.predicted_mean.plot(ax=ax, label='forecast')
ax.fill_between(ci.index, ci.iloc[:, 0], ci.iloc[:, 1], color='k', alpha=.25)

plt.legend()

**Conclusion**: leave your car at home and buy a bicycle.

---

_This notebook was adapted from the work by [Thomas Vincent](https://www.digitalocean.com/community/tutorials/a-guide-to-time-series-forecasting-with-arima-in-python-3) under the [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0) license and is therefore licensed under the same terms._