# Time-series Forecasting

## 0. Wikipedia Traffic Dataset

https://www.kaggle.com/competitions/web-traffic-time-series-forecasting

In [None]:
import pandas as pd
import numpy as np

train = pd.read_pickle("train.pkl").asfreq('D')
test = pd.read_pickle("test.pkl").asfreq('D')

train

In [None]:
test

For simplicity, let us consider only pages whose median is at least 20,000 visits per day.

In [None]:
median_visits = train.median(axis = 0)
over20000 = median_visits >= 20_000
sum(over20000)

In [None]:
train_filtered = train.loc[:, over20000].apply(np.log1p).copy()
test_filtered = test.loc[:, over20000].apply(np.log1p).copy()

In [None]:
train_filtered.shape, test_filtered.shape

In [None]:
train_filtered.columns

## 1. Plotting

Select one column, either by hand or at random, and plot its time series (the training part). You can use `plot_series` from the `sktime` package.

In [None]:
import random

selected_column = random.choice(train_filtered.columns)

selected_column

In [None]:
from sktime.utils.plotting import plot_series

# YOUR CODE HERE

Now plot the training series together with the target (the testing part).

In [None]:
# YOUR CODE HERE

## 2. Naive Forecasting

Let us start with a naive forecaster that predicts the mean of the training series for every step of the horizon.

In [None]:
from sktime.forecasting.naive import NaiveForecaster

fh = np.arange(3, 65)  # forecasting horizon: days 3 through 64 after the end of training (62 steps)

forecaster = NaiveForecaster(strategy="mean")

In [None]:
forecaster.fit(train_filtered[selected_column])

In [None]:
prediction = forecaster.predict(fh)

In [None]:
plot_series(train_filtered[selected_column], test_filtered[selected_column], prediction);

Now, instead of the average, repeat the last observed week into the future (set `strategy` to `last` and the seasonal period `sp` to 7).
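To make the "repeat the last week" idea concrete, here is a minimal sketch in plain `numpy`/`pandas`, independent of `sktime`. The helper name `seasonal_naive_sketch` and the toy series are hypothetical and only for illustration; the sketch assumes a daily `DatetimeIndex` and a relative horizon like `fh` above, where step 1 is the first day after training.

```python
import numpy as np
import pandas as pd


def seasonal_naive_sketch(y, fh, sp=7):
    """Repeat the last `sp` observations of `y` over the relative horizon `fh`.

    Hypothetical helper: assumes `y` has a daily DatetimeIndex and that
    `fh` holds positive integer steps counted from the last training day.
    """
    y = pd.Series(y)
    fh = np.asarray(fh)
    last_season = y.to_numpy()[-sp:]
    # Step h reuses the value observed at the same position of the last season.
    values = last_season[(fh - 1) % sp]
    index = y.index[-1] + pd.to_timedelta(fh, unit="D")
    return pd.Series(values, index=index)


# Toy data: two weeks of values 0..13, forecast days 3 through 10.
toy = pd.Series(
    np.arange(14.0),
    index=pd.date_range("2020-01-01", periods=14, freq="D"),
)
toy_pred = seasonal_naive_sketch(toy, np.arange(3, 11))
```

Because the horizon starts at step 3, the first forecast reuses the third day of the last observed week rather than the first one.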

In [None]:
# YOUR CODE HERE

## 3. SMAPE

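The Kaggle competition was evaluated with the [symmetric mean absolute percentage error](https://en.wikipedia.org/wiki/Symmetric_mean_absolute_percentage_error) (SMAPE). Implement `smape` and calculate it between your forecast and the actual values. Note that both series here are `log1p`-transformed, so the score is not directly comparable to values computed on raw visit counts.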
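For reference, one possible shape of the computation is sketched below. It uses the variant of SMAPE that ranges from 0 to 200 percent and, as an assumption, counts a term as zero error when both the actual and the predicted value are zero. The name `smape_sketch` is hypothetical, and inputs are compared by position, so the two series are assumed to cover the same dates in the same order.

```python
import numpy as np


def smape_sketch(y_true, y_pred):
    """SMAPE in percent (0 to 200), comparing values position by position."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = np.abs(y_pred - y_true)
    denom = np.abs(y_true) + np.abs(y_pred)
    # Assumption: a 0/0 term (both values zero) contributes zero error.
    ratio = np.divide(2.0 * diff, denom, out=np.zeros_like(denom), where=denom != 0)
    return 100.0 * ratio.mean()
```

To score on the original visit scale instead of the log scale, both inputs could first be passed through `np.expm1`, which inverts the `log1p` transform applied earlier.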

In [None]:
def smape(y_true, y_pred):
    ## YOUR CODE HERE
    raise NotImplementedError

In [None]:
smape(test_filtered[selected_column], prediction)

## 4. Forecasting Methods

Look through the `sktime` [list of forecasting methods](https://www.sktime.net/en/stable/examples/01_forecasting.html#2.-Forecasters-in-sktime---lookup,-properties,-main-families).

Try a few of them; the docs contain usage examples (see, for example, the [TBATS](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.tbats.TBATS.html) code below). Which forecaster gives the best prediction, i.e. the lowest SMAPE?
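When comparing several forecasters, it can help to collect the predictions in a dictionary and score them in one pass. The sketch below is one way to do that; `rank_forecasts`, `smape_compact`, and the toy numbers are hypothetical and not part of `sktime`. In the notebook, the dictionary values would be the `prediction` series produced by each forecaster.

```python
import numpy as np
import pandas as pd


def rank_forecasts(y_true, predictions, metric):
    """Score each named prediction with `metric` and sort, best (lowest) first."""
    scores = {
        name: float(metric(y_true, y_pred))
        for name, y_pred in predictions.items()
    }
    return pd.Series(scores, name="score").sort_values()


def smape_compact(y_true, y_pred):
    """Compact SMAPE in percent; assumes no term has both values equal to zero."""
    a = np.asarray(y_true, dtype=float)
    f = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(2.0 * np.abs(f - a) / (np.abs(a) + np.abs(f)))


# Toy example: two constant forecasts scored against four actual values.
y_actual = np.array([10.0, 12.0, 11.0, 13.0])
candidates = {
    "constant_9": np.full(4, 9.0),
    "constant_11": np.full(4, 11.0),
}
ranking = rank_forecasts(y_actual, candidates, smape_compact)
best_name = ranking.index[0]
```

The first entry of `ranking` is the candidate with the lowest score, so the same pattern extends to any number of forecasters.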

In [None]:
from sktime.forecasting.tbats import TBATS

forecaster = TBATS(  
    use_box_cox=False,
    use_trend=False,
    use_damped_trend=False,
    sp=7,
    use_arma_errors=False,
    n_jobs=1)

In [None]:
## YOUR CODE HERE

In [None]:
forecaster.fit(train_filtered[selected_column])
prediction = forecaster.predict(fh)

In [None]:
plot_series(train_filtered[selected_column], test_filtered[selected_column], prediction);

In [None]:
smape(test_filtered[selected_column], prediction)

## 5. Prophet

[Prophet](https://facebook.github.io/prophet/) is (was?) Meta's tool for time series forecasting, with built-in handling of anomalies, common seasonalities, and holidays.

In [None]:
from sktime.forecasting.fbprophet import Prophet

## YOUR CODE HERE

In [None]:
forecaster.fit(train_filtered[selected_column])
prediction = forecaster.predict(fh)

In [None]:
plot_series(train_filtered[selected_column], test_filtered[selected_column], prediction);

In [None]:
smape(test_filtered[selected_column], prediction)

## 6. Time Series Foundation Models

In [None]:
from sktime.forecasting.moirai_forecaster import MOIRAIForecaster

## YOUR CODE HERE

In [None]:
forecaster.fit(train_filtered[selected_column])
prediction = forecaster.predict(fh)

In [None]:
plot_series(train_filtered[selected_column], test_filtered[selected_column], prediction);