# Kats 205 Forecasting with Global Model

This tutorial introduces the global model in Kats. The global model is a new and powerful forecasting method that combines exponential smoothing models with recurrent neural networks. In forecasting competitions it has achieved higher accuracy than purely statistical or purely machine-learning approaches. The table of contents for Kats 205 is as follows:

1. Overview of global model for forecasting  
2. Building your own global model/global ensemble from scratch  
    2.1 Introduction to `GMParam`  
    2.2 Forecasting using a single global model with `GMModel`  
    2.3 Forecasting using a global model ensemble with `GMEnsemble`  
    2.4 Backtesting with `GMBackTester`  
3. Using pretrained global model/global ensemble  

**Note:** We provide two types of tutorial notebooks:
- **Kats 101**: basic data structures and functionality in Kats
- **Kats 20x**: advanced topics, including advanced forecasting techniques, advanced detection algorithms, `TsFeatures`, meta-learning, and the global model

## 1. Overview of global model for forecasting

The Global Model, henceforth abbreviated as GM, is a powerful forecasting model that combines exponential smoothing models with LSTM neural networks, yielding higher accuracy than methods that rely purely on statistics or purely on machine learning. The [original model](https://www.sciencedirect.com/science/article/pii/S0169207019301153) was first proposed by Slawek Smyl and won the Computational Intelligence in Forecasting International Time Series Competition (2016) and the M4 Forecasting Competition (2018).
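To give a feel for the exponential smoothing half of this combination, here is a minimal sketch of the multiplicative level/seasonality smoothing described for Smyl's ES-RNN, written in plain Python. It is not Kats' implementation: the function names `es_decompose` and `normalize_window`, the fixed smoothing coefficients `alpha` and `gamma`, and the initialization (all seasonal factors 1, first level equal to the first observation) are assumptions for illustration. In the original model the per-series smoothing coefficients are learned jointly with the shared neural network, and the log-normalized windows are what the network sees as input.

```python
import math


def es_decompose(y, alpha, gamma, m):
    """Sketch of multiplicative level/seasonality smoothing (ES-RNN style).

    y: strictly positive observations (needed for the division and log below).
    alpha, gamma: level and seasonality smoothing coefficients in (0, 1).
    m: seasonal period.
    Returns (levels, seasons): one level per step and len(y) + m seasonal factors.
    """
    seasons = [1.0] * m          # assumed initial seasonal factors s_0 .. s_{m-1}
    levels = []
    level = y[0] / seasons[0]    # assumed initial level
    for t, value in enumerate(y):
        s = seasons[t]
        if t > 0:
            # update the level from the deseasonalized observation
            level = alpha * value / s + (1 - alpha) * level
        levels.append(level)
        # seasonal factor that will be used m steps later
        seasons.append(gamma * value / level + (1 - gamma) * s)
    return levels, seasons


def normalize_window(y, levels, seasons, end, width):
    """Log-normalize y[end-width+1 .. end] by the level at `end` and each step's seasonal factor."""
    start = end - width + 1
    return [math.log(y[i] / (levels[end] * seasons[i])) for i in range(start, end + 1)]


series = [10.0] * 14
levels, seasons = es_decompose(series, alpha=0.3, gamma=0.2, m=7)
print(normalize_window(series, levels, seasons, end=13, width=7))
```

For a constant series every level equals the series value and every seasonal factor stays at 1, so the normalized window is (numerically) all zeros; any remaining pattern in real data is what the neural network is left to model.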

Unlike traditional forecasting models (e.g., ARIMA or Prophet), which are fit to one time series at a time, a GM is trained on a large collection of time series and can then forecast any new, unseen time series of the same time granularity. Its competition results demonstrate its accuracy. Moreover, a GM naturally supports batch processing (i.e., generating forecasts for many time series at once), which makes it highly efficient.

In Kats, we build upon the original model and provide two types of GMs: RNN-GM (for short-term forecasting) and S2S-GM (for mid-term/long-term forecasting).


## 2. Building your own global model/global ensemble from scratch

The `GMModel` and `GMEnsemble` classes allow you to build a single GM or an ensemble of several independent GMs (GME). The `GMParam` class encodes all the necessary configurations of a GM, including the NN structure, time series granularity, etc. In addition, we provide the class `GMBackTester` for parameter tuning and backtesting.

In this section, we only demonstrate the functionality of each class, so the models are not well trained and may not perform well.

### 2.1 Introduction to `GMParam`

A `GMParam` object carries all the necessary configurations of a GM (or a GME) and performs basic parameter checking when initialized. Here are several important arguments of `GMParam`:

* **freq**: String or pd.Timedelta; The time granularity of the model (and of the time series). For example, `freq='D'` indicates a daily model.
* **model_type**: String; The neural network type. Should be either 'rnn' or 's2s'. Default is 'rnn'.
* **input_window**: Integer; The length of the input time series at each step. It should be greater than `seasonality`.
* **fcst_window**: Integer; The length of each forecast step. When `model_type='s2s'`, the loss function is computed over the sub-time series of length `fcst_window*fcst_step_num`. Note that a GM/GME can generate forecasts of any length regardless of `fcst_window`.
* **seasonality**: Integer; The seasonality period. When `seasonality=1`, the global model is non-seasonal. Default is 1.
* **quantile**: List of floats; The forecast quantiles (the first element should be 0.5, representing the median). Default is [0.5, 0.05, 0.95, 0.99].
* **nn_structure**: List of lists of integers; The neural network structure. If None, the default [[1,3]] is used.
* **loss_function**: String; The name of the loss function, either 'pinball' or 'adjustedpinball'.
* **gmfeature**: List of strings or string; A single feature name or a list of feature names.
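The `quantile` and `loss_function` arguments are related: a pinball (quantile) loss scores a q-quantile forecast by penalizing under-forecasts with weight q and over-forecasts with weight 1 - q. Below is a minimal sketch of the standard pinball loss. It does not reproduce the Kats-specific `'adjustedpinball'` variant, and averaging uniformly over quantiles (the hypothetical helper `multi_quantile_loss`) is our assumption for illustration.

```python
def pinball_loss(y_true, y_pred, q):
    """Average standard pinball (quantile) loss for quantile level q in (0, 1)."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        diff = y - f
        # diff > 0 (under-forecast) costs q per unit; diff < 0 (over-forecast) costs 1 - q
        total += max(q * diff, (q - 1) * diff)
    return total / len(y_true)


def multi_quantile_loss(y_true, preds_by_quantile):
    """Hypothetical helper: average the pinball loss over a dict {q: forecasts}."""
    losses = [pinball_loss(y_true, preds, q) for q, preds in preds_by_quantile.items()]
    return sum(losses) / len(losses)


print(pinball_loss([10.0], [8.0], 0.95), pinball_loss([10.0], [12.0], 0.95))
```

With q = 0.95, under-forecasting by 2 costs 1.9 while over-forecasting by 2 costs only about 0.1, which pushes the 0.95-quantile forecast toward the upper tail of the distribution.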

For the definitions of the other parameters, please see our documentation.

In [1]:
import numpy as np
import pandas as pd
import sys
import warnings
import os

warnings.simplefilter(action='ignore')
sys.path.append("../")

from kats.models.globalmodel.utils import GMParam

In [2]:
# GMParam example: a daily model
gmparam = GMParam(
    input_window=35,
    fcst_window=31,
    seasonality=7,
    freq='D',
    loss_function='adjustedpinball',
    nn_structure=[[1, 3]],
    gmfeature=['last_date'],
    epoch_num=1,
    epoch_size=2,  # small value, for demonstration only
    gmname="daily_default",
)
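`GMParam` performs basic parameter checking when it is initialized. The hypothetical helper below (`check_gm_param_args` is our own name, not a Kats function) mirrors only the constraints stated in the argument list above, so you can see which settings would be rejected; the check that every quantile lies strictly between 0 and 1 is an extra assumption, and Kats' actual checks may be broader or stricter.

```python
def check_gm_param_args(*, model_type, input_window, seasonality, quantile, loss_function):
    """Hypothetical sketch: return a list of violated constraints (empty if none)."""
    errors = []
    if model_type not in ("rnn", "s2s"):
        errors.append(f"model_type must be 'rnn' or 's2s', got {model_type!r}")
    if not isinstance(seasonality, int) or seasonality < 1:
        errors.append("seasonality must be a positive integer")
    elif input_window <= seasonality:
        errors.append("input_window must be greater than seasonality")
    if not quantile or quantile[0] != 0.5:
        errors.append("the first quantile must be 0.5 (the median)")
    # Assumption: every quantile level lies strictly between 0 and 1.
    if any(not 0 < q < 1 for q in quantile):
        errors.append("quantiles must lie strictly between 0 and 1")
    if loss_function not in ("pinball", "adjustedpinball"):
        errors.append(f"loss_function must be 'pinball' or 'adjustedpinball', got {loss_function!r}")
    return errors


# The settings of `gmparam` above (with the default model_type and quantiles) pass every check.
print(check_gm_param_args(model_type="rnn", input_window=35, seasonality=7,
                          quantile=[0.5, 0.05, 0.95, 0.99], loss_function="adjustedpinball"))
```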

### 2.2 Forecasting using a single global model with `GMModel`

After creating a `GMParam` object, you are ready to build and train a single global model. To create a `GMModel` object, you only need to pass in the `GMParam` object.

In [3]:
from kats.models.globalmodel.model import GMModel, load_gmmodel_from_file
from kats.models.globalmodel.serialize import global_model_to_json, load_global_model_from_json
from kats.consts import TimeSeriesData

# build `GMModel` object

gm = GMModel(gmparam)

To train a `GMModel` object, we need a list or a dictionary of `TimeSeriesData` objects. We will simulate two lists of time series, one for training and one for testing, using the `get_ts` helper from the Kats test suite.

In [4]:
from kats.tests.test_globalmodel import get_ts

train_TSs = [get_ts(n*5, '2020-05-06') for n in range(20, 40)]
test_TSs = [get_ts(n*2, '2020-05-06') for n in range(40, 45)]
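The `input_window` and `fcst_window` settings determine how much history each training series needs. As a rough sketch, assuming training examples are cut from each series as consecutive (input, target) window pairs with a stride of 1 (an assumption for illustration; Kats' sampling scheme may differ), the number of examples per series can be counted as follows. Both helper names are hypothetical.

```python
def count_training_windows(series_length, input_window, fcst_window, stride=1):
    """Number of (input, target) pairs a series yields under plain sliding windows."""
    span = input_window + fcst_window
    if series_length < span:
        return 0  # too short for even one full input + target window
    return (series_length - span) // stride + 1


def sliding_windows(values, input_window, fcst_window, stride=1):
    """Yield (input, target) slices where each target immediately follows its input."""
    span = input_window + fcst_window
    for start in range(0, len(values) - span + 1, stride):
        yield values[start:start + input_window], values[start + input_window:start + span]


print(count_training_windows(100, input_window=35, fcst_window=31))
```

Assuming `get_ts(n, ...)` returns a series of `n` points, the shortest training series above has 100 points, which leaves room for 35 window pairs under this sketch, while a series shorter than `input_window + fcst_window = 66` points would yield none.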

In [5]:
# train the model
training_info = gm.train(train_TSs)

# training_info stores information about the training process
print(training_info)

{'train_loss_monitor': [0.100397155], 'valid_loss_monitor': [{'epoch': 0}], 'valid_fcst_monitor': [], 'train_loss_val': [0.36393964290618896]}


Now we can use the trained model to generate forecasts. The input can be a `TimeSeriesData` object or a list/dictionary of `TimeSeriesData` objects, and the returned value is a dictionary of `pd.DataFrame` objects.

In [6]:
# generate the forecasts of a batch of time series.
fcsts = gm.predict(test_TSs, steps = 3)
fcsts[3]

| | fcst_quantile_0.5 | fcst_quantile_0.05 | fcst_quantile_0.95 | fcst_quantile_0.99 | time |
|---|---|---|---|---|---|
| 0 | -0.186097 | -1.135893 | -0.514409 | -0.269798 | 2020-07-31 |
| 1 | 1.808438 | 1.561757 | 1.562111 | 1.113977 | 2020-08-01 |
| 2 | 0.159525 | 1.085651 | -0.064262 | 0.549102 | 2020-08-02 |
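The `time` column starts one period after the last observation of the input series. Assuming `get_ts(n, start)` returns `n` consecutive daily points beginning at `start` (an assumption about the test helper that is consistent with the output above), the forecast dates for `test_TSs[3]` can be reproduced with the standard library; `daily_forecast_dates` is a hypothetical helper, not part of Kats.

```python
from datetime import date, timedelta


def daily_forecast_dates(start, n_obs, steps):
    """Dates of the `steps` days following a daily series of n_obs points that begins at start."""
    last_obs = start + timedelta(days=n_obs - 1)
    return [last_obs + timedelta(days=k) for k in range(1, steps + 1)]


# test_TSs[3] comes from get_ts(43 * 2, '2020-05-06'), i.e. (by assumption) 86 daily points.
print(daily_forecast_dates(date(2020, 5, 6), 86, 3))
```

The last observation falls on 2020-07-30, so the three forecast dates are 2020-07-31, 2020-08-01, and 2020-08-02, matching the `time` column above.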


Next, let's show how to save and reload the model.

In [7]:
# save model
gm.save_model("gm_example_1.p")

# load model
gm2 = load_gmmodel_from_file("gm_example_1.p")

# remove the saved model
os.remove("gm_example_1.p")

In [8]:
fcsts2 = gm2.predict(test_TSs, steps = 3)
fcsts2[3]

| | fcst_quantile_0.5 | fcst_quantile_0.05 | fcst_quantile_0.95 | fcst_quantile_0.99 | time |
|---|---|---|---|---|---|
| 0 | -0.186097 | -1.135893 | -0.514409 | -0.269798 | 2020-07-31 |
| 1 | 1.808438 | 1.561757 | 1.562111 | 1.113977 | 2020-08-01 |
| 2 | 0.159525 | 1.085651 | -0.064262 | 0.549102 | 2020-08-02 |


We also provide methods for encoding a GM into a JSON string and for loading a model from a JSON string.

In [9]:
# encode model into json string
gm_str = global_model_to_json(gm)

# load model from json string
gm3 = load_global_model_from_json(gm_str)

In [10]:
fcsts3 = gm3.predict(test_TSs, steps = 3)
fcsts3[3]

| | fcst_quantile_0.5 | fcst_quantile_0.05 | fcst_quantile_0.95 | fcst_quantile_0.99 | time |
|---|---|---|---|---|---|
| 0 | -0.186097 | -1.135893 | -0.514409 | -0.269798 | 2020-07-31 |
| 1 | 1.808438 | 1.561757 | 1.562111 | 1.113977 | 2020-08-01 |
| 2 | 0.159525 | 1.085651 | -0.064262 | 0.549102 | 2020-08-02 |
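The forecasts from the original, reloaded, and JSON-restored models are identical, which is what we expect from a lossless round trip. A small helper like the one below checks this programmatically with `pandas.testing.assert_frame_equal`; the name `forecasts_match` is ours, not part of Kats.

```python
import pandas as pd


def forecasts_match(fcsts_a, fcsts_b):
    """Return True if two dicts of forecast DataFrames have the same keys and equal frames."""
    if fcsts_a.keys() != fcsts_b.keys():
        return False
    for key in fcsts_a:
        try:
            # raises AssertionError on any difference in values, columns, index, or dtypes
            pd.testing.assert_frame_equal(fcsts_a[key], fcsts_b[key])
        except AssertionError:
            return False
    return True
```

For the models above, `forecasts_match(fcsts, fcsts2)` and `forecasts_match(fcsts, fcsts3)` should both return `True`.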


### 2.3 Forecasting using a global model ensemble with `GMEnsemble`

You can also easily build an ensemble of several individual GMs with the `GMEnsemble` class. In addition to a `GMParam` object, you need to specify how the training data set should be split and how many independent `GMModel` objects should be trained on each split.

Here is the list of attributes:
* **gmparam**: A `GMParam` object; This is used for initializing each global model.
* **ensemble_type**: String; How forecasts are combined. Can be 'median' or 'mean'. Default is 'median'.
* **splits**: Integer; A positive integer representing the number of sub-datasets to be built. Default is 3.
* **overlap**: Boolean; Whether sub-datasets overlap with each other. Default is True. For example, when `splits=3` and `overlap=True`, each sub-dataset contains 2/3 of the training data.
* **replicate**: Integer; A positive integer representing the number of global models to be trained on each sub-dataset. Default is 1.
* **multi**: Boolean; Whether to use multiprocessing for training and prediction. Default is False.

Note that a `GMEnsemble` object builds `splits*replicate` independent `GMModel` objects, and the final forecasts are aggregated (by median or mean, according to `ensemble_type`) from the forecasts of each trained `GMModel` object.
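The `splits`/`overlap` description can be made concrete with a small sketch. Below, series indices are assigned round-robin to `splits` chunks; with `overlap=True` each sub-dataset leaves out one chunk (so with `splits=3` it holds about 2/3 of the data), and with `overlap=False` each sub-dataset is a single chunk. The round-robin assignment and the function name `make_subsets` are assumptions for illustration; Kats may assign series differently (for example, at random).

```python
def make_subsets(n_series, splits, overlap):
    """Hypothetical sketch: build one list of series indices per sub-dataset."""
    indices = list(range(n_series))
    if splits == 1:
        return [indices]
    # Round-robin assignment: chunk i gets indices i, i + splits, i + 2 * splits, ...
    chunks = [indices[i::splits] for i in range(splits)]
    if not overlap:
        return chunks
    # With overlap, sub-dataset i is the union of every chunk except chunk i.
    return [sorted(idx for j, chunk in enumerate(chunks) if j != i for idx in chunk)
            for i in range(splits)]


subsets = make_subsets(20, splits=3, overlap=True)
print([len(s) for s in subsets])
```

For 20 training series this prints `[13, 13, 14]`, roughly 2/3 of the data each. Each sub-dataset would then train `replicate` models, giving `splits*replicate` models in total.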

In [11]:
from kats.models.globalmodel.ensemble import GMEnsemble, load_gmensemble_from_file

# Initialize an ensemble of splits*replicate = 3 GMs; multi=True enables multiprocessing
gme = GMEnsemble(gmparam, splits=3, overlap=True, replicate=1, multi=True)


Now we can train the `GMEnsemble` object. Optionally, you can set aside part of the training data as a test set (here via `test_size=0.1`) to measure the performance of each `GMModel` object during training.

In [12]:
gme.train(train_TSs, test_size = 0.1)

In [13]:
# gm_info stores the training information and the evaluation results on the held-out test set.
gme.gm_info

[{'train_loss_monitor': [0.09908965],
  'valid_loss_monitor': [{'epoch': 0}],
  'valid_fcst_monitor': [],
  'train_loss_val': [0.3220413774251938],
  'test_info': [      smape     sbias  exceed_0.05  exceed_0.95  exceed_0.99  step  idx  epoch
   0  1.106313 -0.095148     0.322581     0.483871     0.322581     0   14      0
   1  1.395352  0.233617     0.483871     0.322581     0.258065     0    5      0]},
 {'train_loss_monitor': [0.10728503],
  'valid_loss_monitor': [{'epoch': 0}],
  'valid_fcst_monitor': [],
  'train_loss_val': [0.20115943998098373],
  'test_info': [      smape     sbias  exceed_0.05  exceed_0.95  exceed_0.99  step  idx  epoch
   0  1.282441 -0.136447     0.387097     0.290323     0.290323     0   14      0
   1  1.451053  0.225978     0.451613     0.258065     0.354839     0    5      0]},
 {'train_loss_monitor': [0.12492487],
  'valid_loss_monitor': [{'epoch': 0}],
  'valid_fcst_monitor': [],
  'train_loss_val': [0.24984973669052124],
  'test_info': [      smape   

After training the `GMEnsemble` object, you can use it to generate forecasts. As with the `GMModel` object, the input can be a `TimeSeriesData` object or a list/dictionary of `TimeSeriesData` objects, and the returned value is a dictionary of `pd.DataFrame` objects.
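Conceptually, the ensemble forecast combines the individual models' forecasts with the statistic chosen by `ensemble_type`. The sketch below does this step by step for one forecast column (e.g. `fcst_quantile_0.5`); applying it column by column is our assumption about how the quantile columns are handled, and `combine_forecasts` is a hypothetical name.

```python
import statistics


def combine_forecasts(model_forecasts, ensemble_type="median"):
    """Hypothetical sketch: combine equal-length per-model forecast lists, step by step."""
    if ensemble_type == "median":
        aggregate = statistics.median
    elif ensemble_type == "mean":
        aggregate = statistics.mean
    else:
        raise ValueError("ensemble_type must be 'median' or 'mean'")
    # zip(*...) groups the values of all models for the same forecast step.
    return [aggregate(step_values) for step_values in zip(*model_forecasts)]


# Three models' forecasts for one column over a 3-step horizon (made-up numbers).
per_model = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [10.0, 0.0, 3.0]]
print(combine_forecasts(per_model, "median"))
print(combine_forecasts(per_model, "mean"))
```

The median is less sensitive than the mean to a single model producing an outlying forecast: at the first step the median is 2.0, while the mean is pulled up to about 4.33 by the third model.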

In [14]:
# generate forecasts
fcsts=gme.predict(test_TSs, steps = 3)
print(f"The generated forecasts is of type {type(fcsts)}, and it is {fcsts}.")

The generated forecasts is of type <class 'dict'>, and it is {0:    fcst_quantile_0.5  fcst_quantile_0.05  fcst_quantile_0.95  \
0           0.301588            1.079990            0.784806   
1           0.095462           -0.148101           -0.513890   
2          -1.093136           -1.584313           -1.484397   

   fcst_quantile_0.99       time  
0            0.397259 2020-07-25  
1            0.284109 2020-07-26  
2           -1.385541 2020-07-27  , 1:    fcst_quantile_0.5  fcst_quantile_0.05  fcst_quantile_0.95  \
0          -0.121881            0.656311            0.333295   
1           0.252593            0.028812           -0.301240   
2           1.575222            0.693038            0.950418   

   fcst_quantile_0.99       time  
0           -0.039196 2020-07-27  
1            0.477079 2020-07-28  
2            1.142173 2020-07-29  , 2:    fcst_quantile_0.5  fcst_quantile_0.05  fcst_quantile_0.95  \
0           0.298223            1.093175            0.885630   
1    

As with the `GMModel` object, you can also easily save, load, and serialize the `GMEnsemble` object.

In [15]:
# save model
gme.save_model("gme_example_1.p")

# load model
gme2 = load_gmensemble_from_file("gme_example_1.p")

# remove the saved model
os.remove("gme_example_1.p")


# encode model into json string
gme.gm_info = None  # gm_info holds pd.DataFrame objects, which are not JSON-serializable, so clear it first
gme_str = global_model_to_json(gme)

# load model from json string
gme3 = load_global_model_from_json(gme_str)

### 2.4 Backtesting with `GMBackTester`

A `GMBackTester` object helps evaluate a hyper-parameter setting (i.e., a `GMParam` object). Here are some of its attributes:
* **data**: A list or a dictionary of `kats.consts.TimeSeriesData` objects for training and validation.
* **gmparam**: A `GMParam` object.
* **backtest_timestamp**: A list of strings or `pandas.Timestamp` objects representing the backtest timestamps. A backtest timestamp is used to split the time series into the training and testing sets.
* **splits**: Integer; A positive integer representing the number of sub-datasets to be built. Default is 3.
* **overlap**: Boolean; Whether sub-datasets overlap with each other. Default is True. For example, when `splits=3` and `overlap=True`, each sub-dataset contains 2/3 of the training data.
* **replicate**: Integer; A positive integer representing the number of global models to be trained on each sub-dataset. Default is 1.

For the full list of attributes, please see our documentation.
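To illustrate what a backtest timestamp does, here is a sketch that splits a single series at `backtest_ts`. We assume observations strictly before the timestamp go to training and the rest to testing; Kats' boundary convention and its handling of the forecast horizon may differ, and `split_at` is a hypothetical helper.

```python
from datetime import date, timedelta


def split_at(timestamps, values, backtest_ts):
    """Hypothetical sketch: (train, test) lists of (timestamp, value) pairs around backtest_ts."""
    pairs = list(zip(timestamps, values))
    train = [(t, v) for t, v in pairs if t < backtest_ts]   # assumed: strictly before
    test = [(t, v) for t, v in pairs if t >= backtest_ts]   # assumed: on or after
    return train, test


# Five daily points from 2020-08-08 to 2020-08-12, split at 2020-08-10.
days = [date(2020, 8, 8) + timedelta(days=i) for i in range(5)]
train, test = split_at(days, [1.0, 2.0, 3.0, 4.0, 5.0], date(2020, 8, 10))
print(len(train), len(test))
```

Under this convention the example keeps two points for training and three for testing.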

In [16]:
from kats.models.globalmodel.backtester import GMBackTester

# initiate backtester
gbm = GMBackTester(train_TSs, gmparam, backtest_timestamp = ['2020-08-10'])

Now we can run the backtest.

In [17]:
gbm.run_backtest()

| | smape | sbias | exceed_0.05 | exceed_0.95 | exceed_0.99 | model_num | step | idx | type | backtest_ts |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.615404 | -0.467151 | 0.258065 | 0.612903 | 0.548387 | 0.0 | 0 | 15.0 | single | 2020-08-10 |
| 1 | 1.531443 | -0.40704 | 0.354839 | 0.548387 | 0.580645 | 1.0 | 0 | 15.0 | single | 2020-08-10 |
| 2 | 1.551499 | -0.301163 | 0.322581 | 0.516129 | 0.548387 | 2.0 | 0 | 15.0 | single | 2020-08-10 |
| 3 | 1.604865 | -0.377048 | 0.322581 | 0.548387 | 0.548387 | | 0 | | ensemble | 2020-08-10 |
| 4 | 1.368509 | 0.052446 | 0.645161 | 0.387097 | 0.032258 | 0.0 | 1 | 15.0 | single | 2020-08-10 |
| 5 | 1.379891 | 0.056302 | 0.709677 | 0.16129 | 0.16129 | 1.0 | 1 | 15.0 | single | 2020-08-10 |
| 6 | 1.421035 | 0.426862 | 0.354839 | 0.096774 | 0.096774 | 2.0 | 1 | 15.0 | single | 2020-08-10 |
| 7 | 1.492669 | 0.210611 | 0.645161 | 0.16129 | 0.096774 | | 1 | | ensemble | 2020-08-10 |
| 8 | 0.977653 | -0.710323 | 0.419355 | 0.354839 | 0.0 | 0.0 | 2 | 15.0 | single | 2020-08-10 |
| 9 | 1.164694 | -0.383823 | 0.419355 | 0.096774 | 0.064516 | 1.0 | 2 | 15.0 | single | 2020-08-10 |
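The backtest reports sMAPE, a bias measure (`sbias`), and quantile exceedance rates (`exceed_*`) for each model and forecast step. Their exact definitions in Kats are not spelled out in this tutorial. As a hedged sketch, the functions below implement the common ratio form of sMAPE (which ranges from 0 to 2, consistent with the values above) and one plausible reading of an exceedance rate, namely the share of actual values above a quantile forecast. Both are illustrative, not Kats' code.

```python
def smape(actual, forecast):
    """Symmetric MAPE in ratio form: mean of 2|y - f| / (|y| + |f|), ranging over [0, 2]."""
    terms = []
    for y, f in zip(actual, forecast):
        denom = abs(y) + abs(f)
        # define the term as 0 when both actual and forecast are 0
        terms.append(0.0 if denom == 0 else 2 * abs(y - f) / denom)
    return sum(terms) / len(terms)


def exceed_rate(actual, quantile_forecast):
    """Assumed definition: fraction of actual values strictly above a quantile forecast."""
    above = sum(1 for y, f in zip(actual, quantile_forecast) if y > f)
    return above / len(actual)


print(smape([1.0, 2.0], [3.0, 2.0]), exceed_rate([1, 2, 3, 4], [2, 2, 2, 2]))
```

Under this reading, a well-calibrated 0.95-quantile forecast should be exceeded by roughly 5% of the actual values.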


## 3. Using pretrained global model/global ensemble

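In Kats, we provide two pre-trained daily `GMEnsemble` objects (an S2S-GME and an RNN-GME), both trained on the M4 dataset. You can use them for exploratory forecasting or as benchmarks. The cell below loads the RNN-GME from a path relative to the notebook's location (`../kats/models/globalmodel/`); adjust the path if the pretrained file is stored elsewhere in your setup.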

In [18]:
gme_rnn = load_gmensemble_from_file("../kats/models/globalmodel/pretrained_daily_rnn.p")
gme_rnn

ERROR:root:Fail to load GMEnsemble from ../kats/models/globalmodel/pretrained_daily_rnn.p with Exception [Errno 2] No such file or directory: '../kats/models/globalmodel/pretrained_daily_rnn.p'.


ValueError: Fail to load GMEnsemble from ../kats/models/globalmodel/pretrained_daily_rnn.p with Exception [Errno 2] No such file or directory: '../kats/models/globalmodel/pretrained_daily_rnn.p'.

You can use this loaded pre-trained model to generate forecasts.

In [None]:
fcsts = gme_rnn.predict(test_TSs, steps = 3)
fcsts