# Introduction

### Purposes

This tutorial:
* provides a theoretical description of adaptive selection;
* demonstrates how to use `OnTheFlySelector` to forecast a large number of time series.

### Background

To avoid copying and pasting sentences from docstrings, the necessary information is extracted and formatted programmatically. This is done with a helper class that hides overly specialized code behind easy-to-read names.

In [1]:
from utils_for_demo_of_on_the_fly_selector import Helper

In [2]:
Helper().why_adaptive_selection()

    Time series forecasting has a property that all observations are
    ordered. Depending on position, behavior of series can vary and so
    one method can yield better results at some moments while
    another method can outperform it at some other moments. This is
    the reason why adaptive selection is useful for many series.


Next, read what the `OnTheFlySelector` class is.

In [3]:
Helper().what_is_on_the_fly_selector()


    This class provides functionality for adaptive short-term
    forecasting based on selection from a pool of models.

    The class is designed for a case of many time series and many
    simple forecasters - if so, it is too expensive to store all
    forecasts in any place other than operating memory and it
    is better to compute them on-the-fly and then store only selected
    values.

    What about terminology, simple forecaster means a forecaster that
    has no fitting. By default, the class uses moving average,
    moving median, and exponential moving average, but you can pass
    your own simple forecaster to initialization of a new instance.

    Selection is preferred over stacking, because base forecasters are
    quite similar to each other and so they have many common mistakes.

    Advantages of adaptive on-the-fly selection are as follows:
    * It always produces sane results, abnormal over-forecasts or
      under-forecasts are impossible;
    * Each time serie
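
The idea described above can be illustrated with a minimal sketch. It is not the implementation of `OnTheFlySelector`: the function names, the pool, the use of one-step-ahead forecasts, and the mean-absolute-error criterion are all assumptions made for illustration. The sketch only shows that, with a pool of forecasters that need no fitting, different series can end up with different selected forecasters.

```python
from statistics import mean, median
from typing import Callable, Dict, Sequence

# A "simple forecaster" here is any function that maps history to the next value
# without fitting (hypothetical type alias, not part of the library).
Forecaster = Callable[[Sequence[float]], float]


def moving_average(window: int) -> Forecaster:
    """Forecast the next value as the mean of the last `window` values."""
    return lambda history: mean(history[-window:])


def moving_median(window: int) -> Forecaster:
    """Forecast the next value as the median of the last `window` values."""
    return lambda history: median(history[-window:])


def select_forecaster(
        series: Sequence[float],
        pool: Dict[str, Forecaster],
        n_rounds: int
) -> str:
    """
    Return the name of the forecaster with the lowest total absolute error
    of one-step-ahead forecasts over the last `n_rounds` observations.
    """
    errors = {name: 0.0 for name in pool}
    for i in range(len(series) - n_rounds, len(series)):
        history, actual = series[:i], series[i]
        # Forecasts are computed on the fly and only the errors are kept.
        for name, forecaster in pool.items():
            errors[name] += abs(forecaster(history) - actual)
    return min(errors, key=errors.get)


pool = {
    'moving_average_3': moving_average(3),
    'moving_median_3': moving_median(3),
    'last_value': moving_average(1),
}
all_series = {
    'trending': [1, 2, 3, 4, 5, 6, 7, 8],
    'spiky': [5, 5, 5, 50, 5, 5, 5, 50, 5, 5, 5],
}
selected = {
    key: select_forecaster(series, pool, n_rounds=3)
    for key, series in all_series.items()
}
print(selected)
```

In this toy setup the trending series is served best by the last observed value, whereas the series with rare spikes is served best by the moving median, which ignores the outliers.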

# Application

### Import Statements

In [4]:
import os
import datetime
from typing import List

import pandas as pd
from sklearn.metrics import r2_score

from forecastonishing.selection.on_the_fly_selector import OnTheFlySelector
from forecastonishing.selection.paralleling import (
    fit_selector_in_parallel,
    predict_with_selector_in_parallel
)

### Preparations

The dataset used here is a set of synthetic time series drawn from a generative model trained on many real-world time series, so the problem under consideration is quite realistic.

First of all, download the dataset if it has not been downloaded before.

In [5]:
path_to_dataset = 'time_series_dataset.csv'
if os.path.isfile(path_to_dataset):
    df = pd.read_csv(path_to_dataset, parse_dates=[2])
else:
    df = pd.read_csv(
        "https://docs.google.com/spreadsheets/" +
        "d/1TF0bAf9wOpIXIvIsazMCLEoHQ1y6dTkYYdYRRleC5lM/export?format=csv",
        parse_dates=[2]
    )
    df.to_csv(path_to_dataset, index=False)
df.head()

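|   | unit | item | date       | value |
|---|------|------|------------|-------|
| 0 | 1    | 1    | 2017-11-01 | 7.0   |
| 1 | 1    | 1    | 2017-11-02 | 12.0  |
| 2 | 1    | 1    | 2017-11-03 | 8.0   |
| 3 | 1    | 1    | 2017-11-04 | 14.0  |
| 4 | 1    | 1    | 2017-11-05 | 6.0   |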


How many time series are there?

In [6]:
len(df.groupby(['unit', 'item']))

7949

Each time series includes two months of observations.

Now let us define some metrics. An interesting combination is to use both the $R^2$ coefficient computed in a batch over all time series and the $R^2$ coefficient computed for each time series separately and then averaged over all of them. The former metric reports how well differences in level between time series are captured, whereas the latter reports how well individual dynamics and deviations from the corresponding mean are predicted.
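For reference, the two metrics can be written out as follows. The notation is an assumption introduced here: $y_{k,t}$ is the actual value of series $k$ at step $t$, $\hat{y}_{k,t}$ is its prediction, $\bar{y}$ is the mean of all actual values, $\bar{y}_k$ is the mean of actual values of series $k$, and $K$ is the number of series.

$$R^2_{\mathrm{batch}} = 1 - \frac{\sum_{k}\sum_{t} \left(y_{k,t} - \hat{y}_{k,t}\right)^2}{\sum_{k}\sum_{t} \left(y_{k,t} - \bar{y}\right)^2}$$

$$R^2_{\mathrm{averaged}} = \frac{1}{K} \sum_{k=1}^{K} \left(1 - \frac{\sum_{t} \left(y_{k,t} - \hat{y}_{k,t}\right)^2}{\sum_{t} \left(y_{k,t} - \bar{y}_k\right)^2}\right)$$

The only difference is the baseline in the denominator: the batch metric compares predictions with the global mean $\bar{y}$, while the averaged metric compares them with each series' own mean $\bar{y}_k$.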

In [7]:
def batch_r_squared(df: pd.DataFrame) -> float:
    """
    Compute coefficient of determination ignoring keys of time series.
    
    :param df:
        DataFrame with columns 'actual_value' and 'prediction'
    :return:
        R^2 coefficient of determination
    """
    return r2_score(df['actual_value'], df['prediction'])


def averaged_r_squared(df: pd.DataFrame, series_keys: List[str]) -> float:
    """
    Compute coefficient of determination for each of time series
    and then average results.
    
    :param df:
        DataFrame with columns 'actual_value', 'prediction', and all
        columns from `series_keys`
    :param series_keys:
        identifiers of individual time series
    :return:
        mean of per-series R^2 coefficients of determination
    """
    return df.groupby(series_keys).apply(batch_r_squared).mean()

To see why these two metrics differ, look at the example below.

In [8]:
example_df = pd.DataFrame(
    [[1, 2, 3],
     [1, 4, 5],
     [2, 10, 9],
     [2, 9, 10]],
    columns=['key', 'actual_value', 'prediction']
)
batch_r_squared(example_df), averaged_r_squared(example_df, ['key'])

(0.91061452513966479, -1.5)

The first metric is high because the two series in `example_df` have different levels and the predictions are close to the corresponding levels, which means that variation across levels is reflected in the predictions. The second metric is negative because variation around the individual means is not reflected at all: for the second series, the predictions even move in the opposite direction to the actual values.
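The arithmetic behind both numbers can be reproduced by hand. The sketch below uses only the standard library and a hypothetical helper `r_squared` that follows the textbook definition $R^2 = 1 - SS_{res} / SS_{tot}$; it is meant as a check of the example, not as a replacement for `r2_score`.

```python
from statistics import mean
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Compute R^2 = 1 - SS_res / SS_tot directly from the definition."""
    level = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - level) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Batch metric: all four rows together, keys ignored.
# SS_res = 4, SS_tot = 44.75 (the global mean is 6.25).
batch = r_squared([2, 4, 10, 9], [3, 5, 9, 10])

# Per-series metrics: each series is compared with its own mean.
first = r_squared([2, 4], [3, 5])      # SS_res = 2, SS_tot = 2
second = r_squared([10, 9], [9, 10])   # SS_res = 2, SS_tot = 0.5
averaged = (first + second) / 2

print(batch, first, second, averaged)
```

The residual sum of squares is the same in both views (every prediction is off by exactly 1), but the total sum of squares shrinks from 44.75 to 2 and 0.5 once the difference between levels is removed, so the same errors look much worse relative to the per-series baseline.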

### The Launch Itself

In [9]:
horizon = 3
train_test_frontier = df['date'].max() - datetime.timedelta(days=horizon-1)
train_df = df[df['date'] < train_test_frontier]
test_df = df[df['date'] >= train_test_frontier]

In [10]:
selector = OnTheFlySelector(
    horizon=horizon,
    n_evaluational_rounds=10,
    verbose=1
)

In [11]:
%%time
selector = fit_selector_in_parallel(
    selector,
    train_df,
    name_of_target='value',
    series_keys=['unit', 'item'],
    n_processes=4
)

100%|██████████| 28/28 [21:58<00:00, 48.57s/it]
100%|██████████| 28/28 [21:58<00:00, 48.14s/it]
100%|██████████| 28/28 [22:00<00:00, 48.51s/it]
100%|██████████| 28/28 [22:00<00:00, 47.89s/it]


CPU times: user 672 ms, sys: 260 ms, total: 932 ms
Wall time: 22min 1s


In [12]:
%%time
predictions_df = predict_with_selector_in_parallel(
    selector,
    train_df,
    n_processes=4
)

CPU times: user 404 ms, sys: 100 ms, total: 504 ms
Wall time: 1min 47s


In [13]:
# Sorting is necessary after parallel execution.
evaluation_df = predictions_df.reset_index().sort_values(by=['unit', 'item', 'index'])
evaluation_df['actual_value'] = test_df.sort_values(by=['unit', 'item', 'date'])['value'].values

In [14]:
batch_r_squared(evaluation_df)

0.90524827491432336

In [15]:
averaged_r_squared(evaluation_df, ['unit', 'item'])

-7.0785745955648123

# Conclusion

As can be seen, simple forecasters cannot predict the future dynamics of the time series under consideration, especially multiple steps ahead: the averaged $R^2$ is strongly negative. However, cross-sectional variance is captured well, with a batch $R^2$ of about 0.91.

If you need more examples of how to use the `OnTheFlySelector` class, see the `tests/on_the_fly_selector_tests.py` file.