# Long-horizon Forecasting with FTN

FTN (Forecasted Trajectory Neighbors) is an instance-based (good old KNN) approach for improving multi-step forecasts, especially over long horizons. It is primarily designed to correct i) error propagation along the horizon in recursive approaches, and ii) the implicit independence assumption of direct approaches (one model per horizon step). It is not suitable for MIMO approaches (e.g. neural networks), except when the horizon is quite large.
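The KNN idea above can be illustrated with a minimal NumPy sketch. It makes two assumptions: the distance between a forecasted trajectory and every historical window of the same length is Euclidean, and the forecast is replaced by the unweighted mean of the `k` closest windows. The helper names are hypothetical. This is not metaforecast's implementation, which may scale, weight, or smooth trajectories differently.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def historical_trajectories(y_train, h):
    """Every contiguous window of length h in the training series (one per row)."""
    return sliding_window_view(np.asarray(y_train, dtype=float), h)


def knn_trajectory_correction(forecast, y_train, k=5):
    """Hypothetical helper: replace a multi-step forecast with the mean of
    its k nearest historical trajectories (Euclidean distance)."""
    forecast = np.asarray(forecast, dtype=float)
    windows = historical_trajectories(y_train, len(forecast))
    dists = np.linalg.norm(windows - forecast, axis=1)
    nearest = np.argsort(dists)[:k]
    return windows[nearest].mean(axis=0)


# Toy example: a noisy sine series and a shifted 12-step forecast
rng = np.random.default_rng(0)
y = np.sin(np.arange(300) / 5) + rng.normal(scale=0.05, size=300)
raw_forecast = np.sin(np.arange(300, 312) / 5) + 0.3
corrected = knn_trajectory_correction(raw_forecast, y, k=10)
print(corrected.shape)  # (12,)
```

The output is an average of trajectories that were actually observed, so its consecutive steps stay jointly consistent. That is one intuition for why a neighbor-based correction can address both issues listed above.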

This notebook explores how to couple FTN with NHITS for long-horizon forecasting:

1. Loading LongHorizon's ETTm2 dataset
2. Fitting an NHITS model
3. Fitting FTN
4. Getting forecasts from NHITS and from NHITS corrected with FTN
5. Evaluating all models

```python
import warnings

warnings.filterwarnings("ignore")
```

If necessary, install the package using pip:

```python
# !pip install metaforecast -U
```

# 1. Data preparation

Let's start by loading the dataset.
This tutorial uses the ETTm2 dataset available in the datasetsforecast package.

We also set the forecasting horizon and input size (number of lags) to 196 and 92, respectively.
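To make the two settings concrete, here is a sketch of how a single series could be cut into training pairs. Each input holds `n_lags` values and each target holds the next `horizon` values. The helper name is hypothetical; neuralforecast builds its training windows internally and may sample them differently.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(y, n_lags, horizon):
    """Split a 1-D series into (inputs, targets) pairs:
    inputs[i] holds n_lags consecutive values, targets[i] the next horizon values."""
    y = np.asarray(y, dtype=float)
    windows = sliding_window_view(y, n_lags + horizon)
    return windows[:, :n_lags], windows[:, n_lags:]


X, Y = make_windows(np.arange(10), n_lags=3, horizon=2)
print(X.shape, Y.shape)  # (6, 3) (6, 2)
print(X[0], Y[0])  # [0. 1. 2.] [3. 4.]
```

A series of length `n` yields `n - n_lags - horizon + 1` such pairs, so a long horizon directly reduces the number of training examples.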

```python
import pandas as pd

from datasetsforecast.long_horizon import LongHorizon

# forecasting horizon and number of lags (model input size)
horizon = 196
n_lags = 92

# download (or load from cache) ETTm2 into a local directory
df, *_ = LongHorizon.load('./data', group='ETTm2')

df['ds'] = pd.to_datetime(df['ds'])
```

Split the dataset into training and testing sets, holding out the last `horizon` observations of each series for testing:

```python
df_by_unq = df.groupby('unique_id')

train_l, test_l = [], []
for g, df_ in df_by_unq:
    df_ = df_.sort_values('ds')

    # the last `horizon` observations go to the test set, the rest to training
    train_df_g = df_.head(-horizon)
    test_df_g = df_.tail(horizon)

    train_l.append(train_df_g)
    test_l.append(test_df_g)

train_df = pd.concat(train_l).reset_index(drop=True)
test_df = pd.concat(test_l).reset_index(drop=True)

train_df.query('unique_id=="HUFL"').tail()
```

| | unique_id | ds | y |
|---|---|---|---|
| 57399 | HUFL | 2018-02-18 21:45:00 | -1.201683 |
| 57400 | HUFL | 2018-02-18 22:00:00 | -1.201683 |
| 57401 | HUFL | 2018-02-18 22:15:00 | -1.377739 |
| 57402 | HUFL | 2018-02-18 22:30:00 | -1.617799 |
| 57403 | HUFL | 2018-02-18 22:45:00 | -1.441742 |


```python
test_df.query('unique_id=="HUFL"').head()
```

| | unique_id | ds | y |
|---|---|---|---|
| 0 | HUFL | 2018-02-18 23:00:00 | -1.545771 |
| 1 | HUFL | 2018-02-18 23:15:00 | -1.545771 |
| 2 | HUFL | 2018-02-18 23:30:00 | -1.569844 |
| 3 | HUFL | 2018-02-18 23:45:00 | -1.761853 |
| 4 | HUFL | 2018-02-19 00:00:00 | -1.705875 |
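The split loop above can also be written with a grouped `tail`, which avoids building per-series lists. The sketch below assumes the `unique_id`/`ds`/`y` long format used throughout. The function name is hypothetical.

```python
import pandas as pd


def train_test_split_by_series(df, horizon):
    """Hold out the last `horizon` rows of each series (ordered by ds) as the test set."""
    df = df.sort_values(["unique_id", "ds"]).reset_index(drop=True)
    test = df.groupby("unique_id").tail(horizon)
    train = df.drop(test.index)
    return train.reset_index(drop=True), test.reset_index(drop=True)


# Toy data: two series with 10 observations each, 15 minutes apart
toy = pd.DataFrame({
    "unique_id": ["A"] * 10 + ["B"] * 10,
    "ds": list(pd.date_range("2018-01-01", periods=10, freq="15min")) * 2,
    "y": range(20),
})
tr, te = train_test_split_by_series(toy, horizon=3)
print(len(tr), len(te))  # 14 6
```

With the notebook's data this would be called as `train_test_split_by_series(df, horizon)`.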


# 2. Model setup and fitting

We focus on NHITS, a neural model designed specifically for long-horizon forecasting.

```python
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

CONFIG = {
    'max_steps': 1000,             # number of training steps
    'input_size': n_lags,          # lags used as model input
    'h': horizon,                  # forecasting horizon
    'enable_checkpointing': True,
    'accelerator': 'cpu'}

models = [NHITS(start_padding_enabled=True, **CONFIG),]

# ETTm2 is sampled every 15 minutes
nf = NeuralForecast(models=models, freq='15min')
```

INFO:lightning_fabric.utilities.seed:Seed set to 1
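`freq='15min'` tells NeuralForecast how far apart consecutive `ds` values are. It uses this to build the future timestamps returned by `predict`. The sketch below, using a hypothetical helper, checks that every series is regularly spaced at that frequency before fitting (e.g. `has_regular_spacing(train_df)`).

```python
import pandas as pd


def has_regular_spacing(df, freq="15min"):
    """True if consecutive timestamps of every series are exactly `freq` apart."""
    step = pd.Timedelta(freq)
    gaps = (df.sort_values(["unique_id", "ds"])
              .groupby("unique_id")["ds"].diff()
              .dropna())
    return bool((gaps == step).all())


toy = pd.DataFrame({
    "unique_id": ["A"] * 4,
    "ds": pd.date_range("2018-02-18 22:00", periods=4, freq="15min"),
    "y": [0.1, 0.2, 0.3, 0.4],
})
print(has_regular_spacing(toy))          # True
print(has_regular_spacing(toy.drop(1)))  # False (a 30-minute gap)
```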


```python
%%capture

nf.fit(df=train_df)
```

```text
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (mps), used: False
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.callbacks.model_summary:
  | Name         | Type          | Params | Mode 
-------------------------------------------------------
0 | loss         | MAE           | 0      | train
1 | padder_train | ConstantPad1d | 0      | train
2 | scaler       | TemporalNorm  | 0      | train
3 | blocks       | ModuleList    | 2.8 M  | train
-------------------------------------------------------
2.8 M     Trainable params
0         Non-trainable params
2.8 M     Total params
11.109    Total estimated model params size (MB)
INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_steps=1000` reached.
```


# 3. Fitting FTN

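Now, we can fit FTN on the training data. `n_neighbors` sets how many historical trajectories are retrieved for each forecasted trajectory: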

```python
from metaforecast.longhorizon.ftn import MLForecastFTN as FTN

ftn = FTN(horizon=horizon,
          n_neighbors=150,
          apply_ewm=True)
```

```python
ftn.fit(train_df)
```

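```python
# test forecasts from the fitted NHITS model
fcst_nf = nf.predict()

# FTN correction of the NHITS forecasts
fcst_ftn = ftn.predict(fcst_nf)
fcst_ftn = fcst_ftn.rename(columns={'NHITS': 'NHITS(FTN)'})
# keep the raw NHITS forecasts for comparison (assumes both frames share the same row order)
fcst_ftn['NHITS'] = fcst_nf['NHITS'].values

fcst_ftn.head()
```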


| unique_id | ds | NHITS(FTN) | NHITS |
|---|---|---|---|
| HUFL | 2018-02-18 23:00:00 | -1.362009 | -1.407756 |
| HUFL | 2018-02-18 23:15:00 | -1.374306 | -1.432245 |
| HUFL | 2018-02-18 23:30:00 | -1.381242 | -1.465016 |
| HUFL | 2018-02-18 23:45:00 | -1.389424 | -1.471828 |
| HUFL | 2018-02-19 00:00:00 | -1.396442 | -1.471012 |
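The prediction cell copies the raw NHITS forecasts with `.values`, which assumes `fcst_nf` and `fcst_ftn` list their rows in the same order. Below is a sketch of a key-based alternative. It assumes both frames expose `unique_id` and `ds` as columns; call `reset_index()` first if `unique_id` is the index, as in the output above. The helper name is hypothetical.

```python
import pandas as pd


def combine_forecasts(base, corrected, model="NHITS"):
    """Align raw and corrected forecasts on (unique_id, ds) instead of row position."""
    corrected = corrected.rename(columns={model: f"{model}(FTN)"})
    return corrected.merge(base[["unique_id", "ds", model]],
                           on=["unique_id", "ds"], how="left",
                           validate="one_to_one")


ds = pd.date_range("2018-02-18 23:00", periods=2, freq="15min")
base = pd.DataFrame({"unique_id": ["HUFL", "HUFL"], "ds": ds, "NHITS": [1.0, 2.0]})
# Same timestamps in reverse order: positional copying would mismatch them
corrected = pd.DataFrame({"unique_id": ["HUFL", "HUFL"], "ds": ds[::-1], "NHITS": [1.8, 0.9]})
out = combine_forecasts(base, corrected)
print(out[["NHITS(FTN)", "NHITS"]].values.tolist())  # [[1.8, 2.0], [0.9, 1.0]]
```

In the notebook this might look like `combine_forecasts(fcst_nf.reset_index(), ftn.predict(fcst_nf).reset_index())`.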


The table shows the first raw NHITS forecasts for the HUFL series next to their FTN-corrected version, NHITS(FTN).

# 4. Evaluation

Finally, we compare both approaches on the test set using SMAPE:

```python
# attach the forecasts to the actual test values
test_df = test_df.merge(fcst_ftn, on=['unique_id', 'ds'], how="left")
```

```python
from neuralforecast.losses.numpy import smape
from datasetsforecast.evaluation import accuracy

# one SMAPE score per series and model
evaluation_df = accuracy(test_df, [smape], agg_by=['unique_id'])
```
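For reference, the SMAPE used here is assumed to follow neuralforecast's NumPy definition, `2 * mean(|y - y_hat| / (|y| + |y_hat|))` with 0/0 terms counted as 0. Under that definition, scores range from 0 to 2 rather than 0 to 200%. The sketch below recomputes per-series scores directly with pandas. It assumes a frame with `unique_id`, `y` and one column per model, like `test_df` after the merge. The helper name is hypothetical.

```python
import pandas as pd


def smape_by_series(df, models):
    """SMAPE (range 0 to 2) per unique_id for each model column; 0/0 terms count as 0."""
    out = {}
    for m in models:
        num = (df["y"] - df[m]).abs()
        den = df["y"].abs() + df[m].abs()
        ratio = (num / den).where(den != 0, 0.0)
        out[m] = 2 * ratio.groupby(df["unique_id"]).mean()
    return pd.DataFrame(out)


toy = pd.DataFrame({"unique_id": ["A", "A", "B", "B"],
                    "y": [1.0, 2.0, 1.0, 0.0],
                    "NHITS": [1.0, 2.0, -1.0, 0.0]})
print(smape_by_series(toy, ["NHITS"]))  # A: 0.0, B: 1.0
```

With the notebook's data, `smape_by_series(test_df, ['NHITS(FTN)', 'NHITS'])` should give comparable per-series scores, if the assumed definition holds.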

```python
# keep only the model columns (one row per series)
eval_df = evaluation_df.drop(columns=['metric', 'unique_id'])

eval_df
```

| | NHITS(FTN) | NHITS |
|---|---|---|
| 0 | 0.222831 | 0.215678 |
| 1 | 0.227395 | 0.236218 |
| 2 | 0.049777 | 0.042658 |
| 3 | 0.3091 | 0.262439 |
| 4 | 0.302872 | 0.315809 |
| 5 | 0.238063 | 0.199672 |
| 6 | 0.118353 | 0.121053 |
6,0.118353,0.121053


```python
# average SMAPE across series (lower is better)
eval_df.mean().sort_values()
```

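| Model | Mean SMAPE |
|---|---|
| NHITS | 0.199075 |
| NHITS(FTN) | 0.209770 |

On average, NHITS without FTN achieves the lower (better) SMAPE on this dataset.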
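Besides the mean, it can help to count how many series each model wins. The short sketch below uses the per-series SMAPE values reported in the evaluation table, copied by hand into a DataFrame so the block runs on its own. With the notebook's `eval_df` you would call `win_counts(eval_df)` directly. The helper name is hypothetical.

```python
import pandas as pd


def win_counts(scores):
    """Number of rows (series) on which each column has the lowest score."""
    return scores.idxmin(axis=1).value_counts().reindex(scores.columns, fill_value=0)


# Per-series SMAPE values from the evaluation table
scores = pd.DataFrame({
    "NHITS(FTN)": [0.222831, 0.227395, 0.049777, 0.3091, 0.302872, 0.238063, 0.118353],
    "NHITS": [0.215678, 0.236218, 0.042658, 0.262439, 0.315809, 0.199672, 0.121053],
})
print(win_counts(scores))  # NHITS(FTN) wins on 3 series, NHITS on 4
```

The counts show a mixed picture: FTN helps on some series and hurts on others, even though its mean SMAPE is higher.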