In [None]:
import numpy as np
import pandas as pd
from sktime.utils.plotting import plot_series

## Basic deployment workflow in a nutshell

### step 1 - preparation of the data
* https://www.sktime.org/en/stable/examples/01_forecasting.html


In [None]:
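# load the profile growth export
df = pd.read_csv("../../data/later/profile_growth.csv")

#df.columns

# keep the date and follower count; copy so that the assignment below does not
# write to a view of df (avoids pandas' SettingWithCopyWarning)
followers = df[['Date', 'Followers']].copy()

# convert the dates to a daily PeriodIndex
followers['Date'] = pd.PeriodIndex(pd.DatetimeIndex(followers['Date']), freq='D')

# index by date, in chronological order
y = followers.set_index('Date').sort_index()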

plot_series(y)
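
For readers without access to the CSV file, the same preparation can be tried on a tiny in-memory table. This is only a sketch: the three rows are made up for illustration, and only the column names `Date` and `Followers` are taken from the cell above.

```python
import pandas as pd

# hypothetical rows standing in for profile_growth.csv, deliberately unsorted
raw = pd.DataFrame(
    {
        "Date": ["2021-01-03", "2021-01-01", "2021-01-02"],
        "Followers": [105, 100, 102],
    }
)

followers_demo = raw[["Date", "Followers"]].copy()

# parse the strings as timestamps, then convert them to daily periods
followers_demo["Date"] = pd.PeriodIndex(
    pd.DatetimeIndex(followers_demo["Date"]), freq="D"
)

# index by date and sort chronologically
y_demo = followers_demo.set_index("Date").sort_index()
print(y_demo)
```

The result is a one-column frame indexed by a daily `PeriodIndex`, which is the shape used for `y` in the rest of this notebook.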

### step 2 - specifying the forecasting horizon

In [None]:
# using a numpy forecasting horizon
fh = np.arange(1, 8)
fh

Because the series has a daily frequency, this horizon requests a forecast for each of the next seven days.
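
To make the meaning of a relative horizon concrete, the sketch below maps the steps 1 to 7 onto calendar days using plain pandas. The cutoff date is a made-up assumption, not the last date of the followers series.

```python
import numpy as np
import pandas as pd

# hypothetical last observed day of a daily series
cutoff = pd.Period("2021-03-31", freq="D")

# relative horizon: 1 to 7 steps after the cutoff
fh_steps = np.arange(1, 8)

# the periods that those steps refer to
future_index = pd.period_range(start=cutoff + 1, periods=len(fh_steps), freq="D")
print(future_index)
```

Step 1 is the day after the cutoff and step 7 is one week after it.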

### step 3 - specifying the forecasting algorithm

To make forecasts, a forecasting algorithm needs to be specified. This is done using a scikit-learn-like interface. Most importantly, all sktime forecasters follow the same interface, so the preceding and remaining steps are the same no matter which forecaster is chosen.

In [None]:
from sktime.forecasting.naive import NaiveForecaster
forecaster = NaiveForecaster(strategy="last")
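
The shared interface boils down to a `fit` method that stores what was learned from the data and a `predict` method that takes a forecasting horizon. The toy class below is a hypothetical stand-in written in plain pandas to illustrate that contract. It is not an sktime class and it ignores everything a real forecaster handles, such as input checks and different index types.

```python
import numpy as np
import pandas as pd


class LastValueSketch:
    """Hypothetical toy forecaster (not part of sktime) with fit/predict."""

    def fit(self, y):
        # remember the last observation and the period it belongs to
        self.last_value_ = y.iloc[-1]
        self.cutoff_ = y.index[-1]
        return self

    def predict(self, fh):
        # shift the cutoff by each relative step to get the future periods
        future = pd.PeriodIndex([self.cutoff_ + int(h) for h in fh])
        # repeat the last observed value for every requested period
        return pd.Series(self.last_value_, index=future)


y_toy = pd.Series(
    [10.0, 12.0, 15.0],
    index=pd.period_range("2021-01-01", periods=3, freq="D"),
)
toy_pred = LastValueSketch().fit(y_toy).predict(np.arange(1, 4))
print(toy_pred)
```

Any object that follows the same two-method pattern can be swapped in without changing the surrounding steps, which is the point of the common interface.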

### step 4 - fitting the forecaster to the seen data
Now the forecaster needs to be fitted to the seen data:

In [None]:
forecaster.fit(y)

### step 5 - requesting forecasts
Finally, we request forecasts for the specified forecasting horizon. This needs to be done after fitting the forecaster:

In [None]:
y_pred = forecaster.predict(fh)

In [None]:
# plotting predictions and past data
plot_series(y, y_pred, labels=["y", "y_pred"])

In [None]:
from sktime.forecasting.theta import ThetaForecaster
fh = np.arange(1, 8)

forecaster = ThetaForecaster(sp=7)
forecaster.fit(y)

forecaster.predict(fh)

In [None]:
# setting return_pred_int argument to True; alpha determines percentiles
#  intervals are lower = alpha/2-percentile, upper = (1-alpha/2)-percentile
#alpha = 0.05  # 2.5%/97.5% prediction intervals
#forecaster.predict(fh, return_pred_int=True, alpha=alpha)
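
The relation between alpha and the interval bounds described in the comments above can be checked with a few lines of numpy. This is only an illustration of the quantile arithmetic on made-up forecast errors; it is not how sktime computes prediction intervals.

```python
import numpy as np


def interval_levels(alpha):
    """Quantile levels of a central (1 - alpha) interval."""
    return alpha / 2, 1 - alpha / 2


alpha = 0.05
lower_level, upper_level = interval_levels(alpha)  # 2.5% and 97.5%

# hypothetical forecast errors, evenly spread between -10 and 10
errors = np.linspace(-10.0, 10.0, 201)

# empirical quantiles of the errors at the two levels
lower, upper = np.quantile(errors, [lower_level, upper_level])
print(lower_level, upper_level, lower, upper)
```

A smaller alpha moves both levels outwards and therefore widens the interval.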

In this example, we use the same series y as in the basic workflow above. Instead of forecasting beyond the end of the data, we hold out the last 36 observations (below: y_test) and check how the forecaster would have performed if it had been asked to forecast them (below: y_pred) from the earlier observations only (below: y_train). “How well” is measured by a quantitative performance metric (below: mean_absolute_percentage_error). The result is then taken as an indication of how well the forecaster will perform on future data. This may or may not be a stretch, depending on statistical assumptions and data properties (caution: it often is a stretch, since past performance is in general not indicative of future performance).

### step 1 - splitting a historical data set into a temporal train and test batch

In [None]:
from sktime.forecasting.model_selection import temporal_train_test_split
y_train, y_test = temporal_train_test_split(y, test_size=36)
# we will try to forecast y_test from y_train
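
Conceptually, a temporal split with an integer test size keeps the chronological order and cuts the series once, so that every test observation comes after every training observation. A minimal pandas sketch of that idea, on a made-up series of 100 daily values:

```python
import numpy as np
import pandas as pd

# hypothetical daily series with 100 observations
y_demo = pd.Series(
    np.arange(100.0),
    index=pd.period_range("2021-01-01", periods=100, freq="D"),
)

test_size = 36

# no shuffling: the last 36 observations form the test batch
y_train_demo = y_demo.iloc[:-test_size]
y_test_demo = y_demo.iloc[-test_size:]

print(len(y_train_demo), len(y_test_demo))
```

Unlike a random split, this never lets the forecaster see observations from the period it is asked to predict.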

In [None]:
# plotting for illustration
plot_series(y_train, y_test, labels=["y_train", "y_test"])
print(y_train.shape[0], y_test.shape[0])

### step 2 - making forecasts for y_test from y_train
This is almost verbatim the basic workflow above, using y_train to predict the indices of y_test.

In [None]:
from sktime.forecasting.base import ForecastingHorizon
# we can simply take the indices from `y_test` where they already are stored
fh = ForecastingHorizon(y_test.index, is_relative=False)

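# seasonal naive forecast: repeats the last value observed at the same position
# in a cycle of sp=12 steps (for daily data with a weekly pattern, sp=7 would be
# the more usual choice)
forecaster = NaiveForecaster(strategy="last", sp=12)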

forecaster.fit(y_train)

# y_pred will contain the predictions
y_pred = forecaster.predict(fh)
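
An absolute horizon names the time points to forecast, whereas a relative horizon counts steps after the end of the training data. The sketch below shows the correspondence for daily periods using plain pandas. Both indices are made up for illustration, and the conversion is written by hand rather than taken from sktime.

```python
import pandas as pd

# hypothetical training and test dates
train_index = pd.period_range("2021-01-01", periods=5, freq="D")
test_index = pd.period_range("2021-01-06", periods=3, freq="D")

# the cutoff is the last time point seen during fitting
cutoff = train_index[-1]

# number of daily steps between the cutoff and each test period
relative_steps = [p.ordinal - cutoff.ordinal for p in test_index]
print(relative_steps)
```

Here the three test days correspond to the relative steps 1, 2 and 3.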

In [None]:
# plotting for illustration
plot_series(y_train, y_test, y_pred, labels=["y_train", "y_test", "y_pred"])

### steps 3 and 4 - specifying a forecasting metric, evaluating on the test set
The next step is to specify a forecasting metric. These are functions that take a series of actual values and a series of predictions and return a number. They differ from sklearn metrics in that they accept series with indices rather than plain numpy arrays. Forecasting metrics can be invoked in two ways:

* using the lean function interface, e.g., mean_absolute_percentage_error, which is a python function (y_true : pd.Series, y_pred : pd.Series) -> float
* using the composable class interface, e.g., MeanAbsolutePercentageError, which is a python class, callable with the same signature

In [None]:
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error
# option 1: using the lean function interface
mean_absolute_percentage_error(y_test, y_pred)
# note: the FIRST argument is the ground truth, the SECOND argument is the forecasts
#       the order matters for most metrics in general
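
To see why the argument order matters, it helps to write the non-symmetric MAPE out by hand: the absolute error is divided by the absolute actual value, so swapping the arguments changes the denominator. The function below is a simplified numpy sketch under that textbook definition. It is not sktime's implementation and does not guard against actual values of zero.

```python
import numpy as np


def mape_sketch(y_true, y_pred):
    """Mean of |y_true - y_pred| / |y_true| (no protection against zeros)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred) / np.abs(y_true)))


actual = [100.0, 200.0]      # made-up ground truth
forecast = [110.0, 180.0]    # made-up predictions

correct_order = mape_sketch(actual, forecast)   # both errors are 10% of the actuals
swapped_order = mape_sketch(forecast, actual)   # errors divided by the forecasts instead
print(correct_order, swapped_order)
```

The two calls return different numbers even though the same pair of series is compared.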

To properly interpret numbers like this, it is useful to understand properties of the metric in question (e.g., lower is better), and to compare against suitable baselines and contender algorithms (see step 5).

In [None]:
from sktime.performance_metrics.forecasting import MeanAbsolutePercentageError
# option 2: using the composable class interface
mape = MeanAbsolutePercentageError(symmetric=False)
# the class interface makes it easy to construct variants of the MAPE
#  e.g., the non-symmetric version
# it also allows for inspection of metric properties
#  e.g., are higher values better (answer: no)?
mape.greater_is_better

In [None]:
# evaluation works exactly like in option 1, but with the instantiated object
mape(y_test, y_pred)
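
The idea behind the class interface can be sketched with a small callable class: the constructor fixes the variant, an attribute exposes a property of the metric, and calling the object evaluates it. The class below is a hypothetical illustration, not sktime code. Its symmetric variant uses the common definition 2|y - ŷ| / (|y| + |ŷ|), which is assumed here rather than taken from the notebook.

```python
import numpy as np


class MapeSketch:
    """Hypothetical callable metric object (not part of sktime)."""

    # lower values indicate better forecasts
    greater_is_better = False

    def __init__(self, symmetric=False):
        self.symmetric = symmetric

    def __call__(self, y_true, y_pred):
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        abs_error = np.abs(y_true - y_pred)
        if self.symmetric:
            # symmetric variant: scale by the sum of both magnitudes
            ratios = 2 * abs_error / (np.abs(y_true) + np.abs(y_pred))
        else:
            # non-symmetric variant: scale by the actual values only
            ratios = abs_error / np.abs(y_true)
        return float(np.mean(ratios))


plain = MapeSketch(symmetric=False)([100.0, 200.0], [110.0, 180.0])
symmetric = MapeSketch(symmetric=True)([100.0, 200.0], [110.0, 180.0])
print(plain, symmetric, MapeSketch.greater_is_better)
```

Because the configuration lives in the object, the same instance can be passed around and reused on several forecasts.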

### step 5 - testing performance against benchmarks
In general, forecast performance should be tested quantitatively against the performance of benchmarks.
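
A benchmark comparison can be as simple as scoring several forecasts of the same test window with the same metric. The sketch below does this in plain numpy for two naive baselines on a made-up series with an exact weekly pattern. The data, the helper `mape_sketch` and the baseline names are all assumptions for illustration.

```python
import numpy as np


def mape_sketch(y_true, y_pred):
    """Mean of |y_true - y_pred| / |y_true| (no protection against zeros)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred) / np.abs(y_true)))


# made-up series: the same weekly pattern repeated for six weeks
weekly_pattern = np.array([10.0, 12.0, 14.0, 13.0, 11.0, 20.0, 22.0])
series = np.tile(weekly_pattern, 6)

# hold out the final week
train, test = series[:-7], series[-7:]

forecasts = {
    # repeat the last training value for all seven days
    "last_value": np.full(7, train[-1]),
    # repeat the last full week of the training data
    "seasonal_naive": train[-7:],
}

scores = {name: mape_sketch(test, pred) for name, pred in forecasts.items()}
print(scores)
```

On this artificial series the seasonal baseline is exact, so any candidate forecaster would have to match it to be worth using. On real data, a candidate is usually judged by whether it beats such baselines by a meaningful margin.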