<a href="https://colab.research.google.com/github/ngupta23/medium_articles/blob/main/time_series/pycaret/pycaret_ts_differences.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
def what_is_installed():
    import pycaret
    from pycaret import show_versions
    show_versions()

try:
    what_is_installed()
except ImportError:
    # PyCaret is not installed yet: install the time series alpha release, then retry
    !pip install pycaret-ts-alpha
    what_is_installed()


System:
    python: 3.7.12 (default, Jan 15 2022, 18:48:18)  [GCC 7.5.0]
executable: /usr/bin/python3
   machine: Linux-5.4.144+-x86_64-with-Ubuntu-18.04-bionic

Python dependencies:
          pip: 21.1.3
   setuptools: 57.4.0
      pycaret: 3.0.0
      sklearn: 1.0.2
       sktime: 0.10.1
  statsmodels: 0.12.2
        numpy: 1.21.5
        scipy: 1.7.3
       pandas: 1.3.5
   matplotlib: 3.2.2
       plotly: 5.5.0
       joblib: 1.0.1
        numba: 0.55.1
       mlflow: 1.23.1
     lightgbm: 3.3.2
      xgboost: 0.90
     pmdarima: 1.8.5
        tbats: Installed but version unavailable
      prophet: Not installed
      tsfresh: Not installed


In [2]:
import numpy as np
import pandas as pd
from pycaret.datasets import get_data
from pycaret.time_series import TSForecastingExperiment


In [3]:
data = get_data("airline")

Period
1949-01    112.0
1949-02    118.0
1949-03    132.0
1949-04    129.0
1949-05    121.0
Freq: M, Name: Number of airline passengers, dtype: float64

In [4]:
#### Create Time Series Forecasting Experiment ----
exp = TSForecastingExperiment()
global_plot_settings = {"renderer": "colab"}
exp.setup(data=data, fh=12, fig_kwargs=global_plot_settings, session_id=42)

| | Description | Value |
|---|---|---|
| 0 | session_id | 42 |
| 1 | Target | Number of airline passengers |
| 2 | Original Data | (144, 1) |
| 3 | Missing Values | False |
| 4 | Approach | Univariate |
| 5 | Exogenous Variables | Not Present |
| 6 | Transformed Train Target | (132,) |
| 7 | Transformed Test Target | (12,) |
| 8 | Transformed Train Exogenous | (132, 0) |
| 9 | Transformed Test Exogenous | (12, 0) |


<pycaret.internal.pycaret_experiment.time_series_experiment.TSForecastingExperiment at 0x7f1d6ac2f1d0>

In [5]:
exp.plot_model()

In [6]:
exp.plot_model(
    plot="diff",
    data_kwargs={"lags_list":[1, 12, [1, 12]], "acf": True, "pacf": True, "periodogram": True},
    fig_kwargs={"height": 800, "width": 1600}
)

**Observations**

1. The original data (row 1) shows extended autocorrelations and a seasonal period of 12 (ACF peaking at lag = 12, 24, etc.).
2. Removing the trend by differencing at lag = 1 (row 2) still leaves the seasonal term (ACF peaking at lag = 12, 24, etc.).
3. Removing the seasonal term by differencing at lag = 12 (row 3) still leaves the extended autocorrelations in the ACF.
4. Applying both differences (lag = 1 and lag = 12, row 4) accounts for most of the structure in the series. There may still be a dependence at lag = 1 (as indicated by the PACF) and some other autoregressive properties, as indicated by high values in the periodogram, but the rest of the characteristics are captured very well by this process.
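
The effect of the two differencing operations in the observations above can be illustrated without any library. The following is a minimal sketch on a synthetic series (a linear trend plus a repeating 12-step pattern, which is an assumption made for illustration and is not the airline data). The helper `difference` is hypothetical and not part of PyCaret.

```python
def difference(series, lag):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Synthetic monthly series: linear trend + fixed seasonal pattern (illustrative only).
season = [0, 3, 8, 6, 2, 10, 15, 14, 7, 1, -4, 2]
y = [2 * t + season[t % 12] for t in range(48)]

d1 = difference(y, 1)                      # trend removed, 12-step pattern remains
d12 = difference(y, 12)                    # seasonal pattern removed, trend leaves a constant
d1_12 = difference(difference(y, 1), 12)   # both removed

print(d1[:12])
print(d12[:5])
print(d1_12[:5])
```

On this idealized series the doubly differenced values are exactly zero. On real data such as the airline series, noise and any remaining autoregressive structure are left behind, which is what the last row of the difference plot shows.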

## Now let's model this

In [7]:
# order=(0,1,0) for difference at lag = 1
# seasonal_order=(0,1,0,12) for difference at lag = 12
model1 = exp.create_model("arima", order=(0,1,0), seasonal_order=(0,1,0,12))
exp.plot_model(model1)


| | cutoff | MAE | RMSE | MAPE | SMAPE | MASE | RMSSE | R2 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1956-12 | 11.245 | 14.3538 | 0.028 | 0.0285 | 0.3851 | 0.4385 | 0.9329 |
| 1 | 1957-12 | 20.2184 | 22.5504 | 0.0561 | 0.0541 | 0.6613 | 0.6642 | 0.8668 |
| 2 | 1958-12 | 46.4548 | 48.4249 | 0.1085 | 0.1153 | 1.6258 | 1.4897 | 0.4754 |
| Mean | NaT | 25.9727 | 28.443 | 0.0642 | 0.066 | 0.8907 | 0.8642 | 0.7584 |
| SD | NaT | 14.9392 | 14.5202 | 0.0334 | 0.0364 | 0.5319 | 0.4518 | 0.2019 |
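
With `order=(0,1,0)` and `seasonal_order=(0,1,0,12)` there are no AR or MA terms, so (ignoring any intercept the estimator may add) the model reduces to the recursion y[t] = y[t-1] + y[t-12] - y[t-13]. The sketch below applies that recursion by hand to a synthetic series. `forecast_double_difference` is a hypothetical helper written for illustration, not a PyCaret function, and the data is made up.

```python
def forecast_double_difference(history, horizon, sp=12):
    """Forecast by assuming the lag-1 and lag-sp differenced series is zero.

    Recursion: y[t] = y[t-1] + y[t-sp] - y[t-sp-1]
    """
    if len(history) < sp + 1:
        raise ValueError("need at least sp + 1 observations")
    values = list(history)
    for _ in range(horizon):
        # Each new value is appended so later steps can reuse earlier forecasts.
        values.append(values[-1] + values[-sp] - values[-sp - 1])
    return values[len(history):]


# Synthetic series: linear trend + fixed seasonal pattern (illustrative only).
season = [0, 3, 8, 6, 2, 10, 15, 14, 7, 1, -4, 2]
series = [2 * t + season[t % 12] for t in range(60)]
train, actual = series[:48], series[48:]

forecast = forecast_double_difference(train, horizon=12)
print(forecast == actual)
```

The recursion reproduces this idealized series exactly. On the airline data the same structure leaves residuals behind, which the next cells inspect.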


In [8]:
# Let's look at the residuals ----
exp.plot_model(model1, plot="diagnostics", fig_kwargs={"height": 800, "width": 1200})
exp.check_stats(model1, test="white_noise")

| | Test | Test Name | Data | Property | Setting | Value |
|---|---|---|---|---|---|---|
| 0 | White Noise | Ljung-Box | Residual | Test Statictic | {'alpha': 0.05, 'K': 24} | 33.546954 |
| 1 | White Noise | Ljung-Box | Residual | Test Statictic | {'alpha': 0.05, 'K': 48} | 56.709833 |
| 2 | White Noise | Ljung-Box | Residual | p-value | {'alpha': 0.05, 'K': 24} | 0.093049 |
| 3 | White Noise | Ljung-Box | Residual | p-value | {'alpha': 0.05, 'K': 48} | 0.182105 |
| 4 | White Noise | Ljung-Box | Residual | White Noise | {'alpha': 0.05, 'K': 24} | True |
| 5 | White Noise | Ljung-Box | Residual | White Noise | {'alpha': 0.05, 'K': 48} | True |
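
For readers who want to see what the Ljung-Box statistic in the table above measures, here is a minimal dependency-free sketch of Q = n (n + 2) Σ r_k² / (n - k), summed over lags k = 1..K. The helper names are hypothetical, the toy residuals are made up, and the sketch stops at the statistic: the p-value needs a chi-square CDF, which is omitted. The degrees of freedom are assumed to equal K, with no adjustment for fitted parameters.

```python
def autocorrelation(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box_q(residuals, K):
    """Ljung-Box statistic Q = n (n + 2) * sum_{k=1..K} r_k^2 / (n - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, K + 1)
    )


# Strongly autocorrelated toy residuals (alternating sign) give a large Q.
toy = [(-1) ** t for t in range(100)]
q_toy = ljung_box_q(toy, 2)
print(q_toy)

# Decision rule for the K = 24 row of the table above: the 95% chi-square
# critical value with 24 degrees of freedom is about 36.415.
CHI2_95_DF24 = 36.415
q_reported = 33.546954
not_rejected = q_reported < CHI2_95_DF24  # white noise is not rejected at alpha = 0.05
print(not_rejected)
```

A statistic below the critical value is consistent with the reported p-value of 0.093 being above alpha = 0.05.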


This is a decent model (the residuals are white noise), but we have not yet incorporated the remaining autoregressive properties identified in the last row of the difference plots. Let's do that.

On closer examination of the residual ACF plot, the ACF at lag = 1 is still significant and negative, and overall the ACF oscillates (one lag positive, the next negative, and so on). The residual time series oscillates as well (when one time point is positive, the next tends to be negative). This can be characteristic of an AR term with a single negative phi value, so let's add this to the model.
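
To see why a negative phi produces this pattern: an AR(1) process x[t] = phi * x[t-1] + e[t] has theoretical autocorrelation phi**k at lag k, which alternates in sign when phi is negative. The sketch below simulates such a process and compares the sample ACF with phi**k. The value phi = -0.6 is an arbitrary illustration, not an estimate from the airline data, and both helpers are hypothetical.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Simulate x[t] = phi * x[t-1] + e[t] with standard normal noise e[t]."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def sample_acf(x, max_lag):
    """Sample autocorrelations of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


phi = -0.6  # illustrative value only
acf = sample_acf(simulate_ar1(phi, n=5000), max_lag=4)
for k, r in enumerate(acf):
    # Columns: lag, sample ACF, theoretical ACF phi**k
    print(k, round(r, 3), round(phi ** k, 3))
```

The sample ACF should be negative at odd lags and positive at even lags, shrinking in magnitude as the lag grows.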

In [9]:
# order=(1,1,0) for difference at lag = 1 plus one AR term (p = 1)
# seasonal_order=(0,1,0,12) for difference at lag = 12
model2 = exp.create_model("arima", order=(1,1,0), seasonal_order=(0,1,0,12))

| | cutoff | MAE | RMSE | MAPE | SMAPE | MASE | RMSSE | R2 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1956-12 | 10.3216 | 13.4315 | 0.0255 | 0.026 | 0.3535 | 0.4103 | 0.9413 |
| 1 | 1957-12 | 20.9235 | 23.2653 | 0.0581 | 0.056 | 0.6844 | 0.6853 | 0.8582 |
| 2 | 1958-12 | 45.685 | 47.6955 | 0.1066 | 0.1132 | 1.5988 | 1.4673 | 0.4911 |
| Mean | NaT | 25.6434 | 28.1308 | 0.0634 | 0.0651 | 0.8789 | 0.8543 | 0.7635 |
| SD | NaT | 14.8178 | 14.4051 | 0.0333 | 0.0362 | 0.5267 | 0.4477 | 0.1956 |
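
As a reminder of what the MASE column in the tables above measures: it is the forecast MAE divided by the in-sample MAE of a naive forecast, so values below 1 mean the model beats the naive benchmark. The sketch below is a minimal hand-rolled version on made-up numbers. Whether the scaling uses the one-step or the seasonal naive forecast is not stated in this notebook, so the period `sp` is left as a parameter, and the function `mase` is hypothetical rather than PyCaret's own implementation.

```python
def mase(y_true, y_pred, y_train, sp=1):
    """Mean absolute scaled error.

    Scale = in-sample MAE of the naive forecast that repeats the value sp steps back.
    """
    mae = sum(abs(a - f) for a, f in zip(y_true, y_pred)) / len(y_true)
    naive_errors = [abs(y_train[t] - y_train[t - sp]) for t in range(sp, len(y_train))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae / scale


# Made-up numbers, for illustration only.
y_train = [10, 12, 11, 13, 12, 14]
y_true = [13, 15]
y_pred = [14, 14]

mase_sp1 = mase(y_true, y_pred, y_train, sp=1)
mase_sp2 = mase(y_true, y_pred, y_train, sp=2)
print(mase_sp1, mase_sp2)
```

Read this way, the third fold in both tables (MASE above 1.5) is the one where the model does worse than the naive benchmark.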


In [10]:
# Let's look at the residuals ----
exp.plot_model(model2, plot="diagnostics", fig_kwargs={"height": 800, "width": 1200})
exp.check_stats(model2, test="white_noise")

| | Test | Test Name | Data | Property | Setting | Value |
|---|---|---|---|---|---|---|
| 0 | White Noise | Ljung-Box | Residual | Test Statictic | {'alpha': 0.05, 'K': 24} | 21.299935 |
| 1 | White Noise | Ljung-Box | Residual | Test Statictic | {'alpha': 0.05, 'K': 48} | 43.239497 |
| 2 | White Noise | Ljung-Box | Residual | p-value | {'alpha': 0.05, 'K': 24} | 0.620974 |
| 3 | White Noise | Ljung-Box | Residual | p-value | {'alpha': 0.05, 'K': 48} | 0.667944 |
| 4 | White Noise | Ljung-Box | Residual | White Noise | {'alpha': 0.05, 'K': 24} | True |
| 5 | White Noise | Ljung-Box | Residual | White Noise | {'alpha': 0.05, 'K': 48} | True |


**We seem to have taken care of the oscillatory behavior to some extent, and metrics such as MASE have also improved slightly (mean MASE from 0.8907 to 0.8789). The white noise characteristics of the residuals also look better than before (higher Ljung-Box p-values). We can decide to stop here in the modeling process or try to extract the remaining autoregressive terms as well (keeping in mind that a model with more autoregressive terms may begin to overfit).**