NOTE: Will need to run in Google Colab!

# Time Series: Seasonal ARIMA and AutoARIMA

(and maybe [SKTime](https://www.sktime.org/en/stable/index.html)!)

In [None]:
!pip install sktime
!pip install pmdarima

### Data Set Up

Airline Passenger Data: https://www.kaggle.com/rakannimer/air-passengers

(it's a pretty common dataset, available in several different places, but here's a source - we'll just load it up straight from SKTime!)

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

In [None]:
from sktime.datasets import load_airline
df = pd.DataFrame(load_airline())

In [None]:
df.head()

In [None]:
df.info()

In [None]:
# Decompose
from statsmodels.tsa.seasonal import seasonal_decompose

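# load_airline returns a monthly PeriodIndex - convert it to timestamps for statsmodels and matplotlib
df.index = df.index.to_timestamp()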

decomp = seasonal_decompose(df)
decomp.plot();

In [None]:
# Train test split - test set will be the last 3 years
train, test = train_test_split(df, test_size = 36, shuffle=False)

In [None]:
# Let's visualize our train and test sets
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(train, label='train')
ax.plot(test, label='test')
ax.set_title('Train-Test Split');
plt.legend();

### Where We Left Off

In [None]:
# Import ARIMA
from statsmodels.tsa.arima.model import ARIMA

In [None]:
# From our data exploration earlier, what do we want to use as params?
# Order will be (p,d,q)
arima_order = (1, 1, 1) # should be a tuple, like (0,0,0)

In [None]:
# Now let's fit our ARIMA model
result = ARIMA(train, order=arima_order).fit()

In [None]:
# Since it's statsmodels, we have a summary we can explore
result.summary()

In [None]:
# We can also make predictions using .forecast
# Steps will be the length of our test data
test_preds = result.forecast(steps=len(test))

In [None]:
# How can we visualize our predictions?
plt.plot(train, label = 'Train')
plt.plot(test, label = 'Test')
plt.plot(test_preds, label = 'Test Preds')
plt.legend();

In [None]:
# To plot with a confidence interval - use get_forecast instead
fcast = result.get_forecast(steps=len(test)).summary_frame()
fig, ax = plt.subplots()

ax.plot(train, label = 'Train')
ax.plot(test, label = 'Test')
ax.plot(fcast['mean'], label = 'Test Preds')
ax.fill_between(fcast.index, fcast['mean_ci_lower'], fcast['mean_ci_upper'], color='k', alpha=0.1);
ax.legend();

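Why do these predictions level out? Past the end of the training data the model has no new observations or errors to work from, so each forecast is built on the forecast before it. The MA term only contributes to the first step and the AR term's influence shrinks at every step, so the predictions settle onto a flat line. Nothing in an ARIMA(1, 1, 1) knows about the yearly cycle!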
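
Here's a minimal sketch of that flattening, using made-up numbers rather than our fitted coefficients. It assumes the differenced series behaves like an AR(1) with coefficient `phi`, that there's no drift term, and it leaves out the MA term (which would only shift the first step). The function name and all of the values are just for illustration.

```python
def forecast_levels(last_level, last_diff, phi, steps):
    """Forecast levels when the differenced series follows an AR(1) with coefficient phi."""
    levels = []
    level, diff = last_level, last_diff
    for _ in range(steps):
        diff = phi * diff     # future shocks are unknown, so their expected value is 0
        level = level + diff  # undo the differencing to get back to the original scale
        levels.append(level)
    return levels

# Illustrative values only - not taken from the airline model
preds = forecast_levels(last_level=400.0, last_diff=20.0, phi=0.5, steps=36)

print([round(p, 2) for p in preds[:5]])  # the steps get smaller and smaller
print(round(preds[-1], 2))               # by the end, the forecast has gone flat
```

Each predicted change is a fraction of the one before, so the changes shrink toward zero and the forecast stops moving.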

### Seasonal!

A reminder about order terms!

> ARIMA models are made up of three different terms:
> 
> p: The order of the auto-regressive (AR) model (i.e., the number of lag observations). A time series is considered AR when previous values in the time series are very predictive of later values. An AR process will show a very gradual decrease in the ACF plot.
> 
> d: The degree of differencing.
> 
> q: The order of the moving average (MA) model. This is essentially the size of the “window” function over your time series data. An MA process is a linear combination of past errors.

PLUS:

> Seasonal ARIMA models have three parameters that heavily resemble our p, d and q parameters:
> 
> P: The order of the seasonal component for the auto-regressive (AR) model.
> 
> D: The integration order of the seasonal process.
> 
> Q: The order of the seasonal component of the moving average (MA) model.
> 
> P and Q can be estimated similarly to p and q via auto_arima, and D can be estimated via a Canova-Hansen test; however, m generally requires subject matter knowledge of the data.

Source: https://alkaline-ml.com/pmdarima/tips_and_tricks.html#understand-p-d-and-q

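There's also a fourth term in the seasonal order, the periodicity: statsmodels calls it `s`, while pmdarima calls it `m` (as in the quote above). It's the number of observations in one seasonal cycle, so 12 for monthly data with a yearly pattern.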
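
To see what the differencing terms do, here's a small sketch on a synthetic series (a straight-line trend plus a pattern that repeats every 12 steps), not the airline data. The `difference` helper is written out by hand for illustration; on a pandas Series, `.diff(periods=12)` should do the same subtraction.

```python
import math

m = 12  # assumed periodicity: 12 observations per cycle

# Synthetic series: linear trend plus a pattern that repeats every m steps
y = [100 + 2 * t + 10 * math.sin(2 * math.pi * t / m) for t in range(48)]

def difference(series, lag=1):
    """Subtract the value `lag` steps back from each observation."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

seasonal_diff = difference(y, lag=m)     # like D=1: the repeating pattern cancels out
both = difference(seasonal_diff, lag=1)  # like d=1 on top: what's left of the trend cancels too

print(round(min(seasonal_diff), 6), round(max(seasonal_diff), 6))
print(round(max(abs(v) for v in both), 6))
```

After the seasonal difference only the trend's contribution per cycle is left, and one regular difference on top of that leaves essentially nothing, which is what "stationary enough to model" looks like in the ideal case.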

In [None]:
# Add seasonal order to our ARIMA
seas_arima = ARIMA(train,
                   order=(0,0,0),
                   seasonal_order=(1, 1, 1, 12))
res_sarima = seas_arima.fit()

# Print out summary information on the fit
print(res_sarima.summary())

In [None]:
# Visualize it!
plt.figure(figsize=(10,6))
plt.plot(train, color='blue', label='actual train')
plt.plot(test, color='orange', label='actual test')
plt.plot(res_sarima.forecast(steps = len(test)), color='green', label='predicted test')
plt.legend();

#### Discuss:

- 


### PMDArima - Using their Auto ARIMA! 

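Basically, an automated parameter search for ARIMA. By default it's a stepwise search that compares candidate models on an information criterion (AIC), rather than an exhaustive grid search.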
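
To get a feel for the size of the search space, here's a rough count of the (p, q, P, Q) combinations an exhaustive grid would have to fit. The upper bounds are assumptions picked for illustration, and d and D are treated as already decided.

```python
from itertools import product

# Assumed upper bounds on each order - for illustration only
max_p, max_q = 5, 5
max_P, max_Q = 2, 2

# Every (p, q, P, Q) combination between 0 and the bounds above
candidates = list(product(range(max_p + 1),
                          range(max_q + 1),
                          range(max_P + 1),
                          range(max_Q + 1)))

print(len(candidates))  # number of models an exhaustive grid would fit
```

Each of those is a full model fit, which is why it helps to cap the orders or search more selectively.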

In [None]:
# More imports
import pmdarima as pm
from pmdarima import model_selection
from pmdarima.utils import decomposed_plot
from pmdarima.arima import decompose, auto_arima

In [None]:
# Train test split - but now using PMDArima's function
train, test = model_selection.train_test_split(df, test_size=36)

In [None]:
train.tail()

In [None]:
# checking stationarity
from pmdarima.arima.stationarity import ADFTest

# pmdarima wraps the ADF test and adds a should_diff helper
adf_test = ADFTest(alpha=0.05)
p_val, should_diff = adf_test.should_diff(df)  # returns (p-value, whether to difference)

print(f"P-Value: {p_val}, so should you difference the data? {should_diff}")

Documentation: http://alkaline-ml.com/pmdarima/modules/generated/pmdarima.arima.AutoARIMA.html#pmdarima.arima.AutoARIMA

Let's decide our parameters!

Be sure to set `trace=True`, so the search prints each candidate model and its AIC as it goes

In [None]:
# time to model!
arima = auto_arima(train,
    start_p=1,
    d=None,
    start_q=1,
    trace=True,
    m=12,
    seasonal=True) 

In [None]:
# check the output summary
arima.summary()

In [None]:
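# Predict as many periods as the test set, then line the values up with the test index
# (np.asarray drops any index predict() attaches, so nothing gets misaligned)
test_preds = pd.Series(np.asarray(arima.predict(n_periods=len(test))), index=test.index)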

In [None]:
# Plot actual test vs. forecasts:
plt.plot(train, label = 'Train')
plt.plot(test, label = 'Test')
plt.plot(test_preds, label = 'Test Preds')
plt.legend();
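
A plot is a good first check, but a number makes it easier to compare models. Here's a minimal sketch of two common error metrics, written by hand so nothing is hidden; the helper names and the toy values are just for illustration.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: typical size of a miss, in the data's own units."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error (assumes no actual value is 0)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Toy numbers only, to show the call pattern
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(round(rmse(actual, predicted), 3))
print(round(mape(actual, predicted), 3))
```

In this notebook the call would be something like `rmse(test.iloc[:, 0], test_preds)`, since `test` is a one-column DataFrame and the helpers expect a plain sequence of values.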

Thoughts?

- 


In [None]:
train

In [None]:
# Refit the best model from the auto_arima search, SARIMAX(1, 1, 0)x(0, 1, 0, 12), in statsmodels
seas_arima = ARIMA(train,
                   order=(1, 1, 0),
                   seasonal_order=(0, 1, 0, 12))
res_sarima = seas_arima.fit()

# Print out summary information on the fit
print(res_sarima.summary())

More gems at the end for those digging back into this notebook:

`pmdarima` has a set of tips and tricks: https://alkaline-ml.com/pmdarima/tips_and_tricks.html

Also:

- https://towardsdatascience.com/time-series-forecasting-using-auto-arima-in-python-bb83e49210cd
- https://machinelearningmastery.com/develop-arch-and-garch-models-for-time-series-forecasting-in-python/

And, what I'm really looking into right now:
- https://towardsdatascience.com/sktime-a-unified-python-library-for-time-series-machine-learning-3c103c139a55