[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nepslor/B5203E-TSAF/blob/main/W9/foundational_models.ipynb)

# Foundation models
One of the most influential developments in deep learning has been the transformer architecture, introduced in *Attention Is All You Need* (Vaswani et al. 2017). Unlike traditional recurrent neural networks (RNNs) or long short-term memory (LSTM) networks, transformers process entire input sequences in parallel rather than sequentially. This eliminates the need for step-by-step recurrence, which makes training more scalable and efficient on large datasets.
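
To make the "parallel rather than sequential" point concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention. It is a simplification under stated assumptions: the projection matrices are random rather than learned, and there is no masking, positional encoding or multi-head logic. The function name and dimensions are illustrative only.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projection matrices.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 6, 8, 4
x = rng.normal(size=(seq_len, d_model))              # toy input sequence
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

All time steps are handled by the same matrix products, so no loop over positions is needed, in contrast to the step-by-step update of an RNN.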

In this exercise we will run two foundation models for:
* transfer learning
* predicting with covariates



In [None]:
%%capture
!pip install --upgrade pip setuptools wheel
!pip install numpy==1.26.4 pandas==2.1.4
!pip install matplotlib ipython python-dotenv
!pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cpu
!pip install nixtla
!pip install fpppy uni2ts
!pip install gluonts==0.14.4

❗ Due to incompatibility issues, you must restart the session now! `Runtime --> Restart Session`


In [None]:
%%capture
!pip install datasetsforecast neuralforecast
!pip install transformers==4.31.0

In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import Image
from nixtla import NixtlaClient
from dotenv import load_dotenv
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae
from datasetsforecast.m4 import M4
from neuralforecast.core import NeuralForecast
from neuralforecast.models import NHITS
from neuralforecast.utils import AirPassengersDF
from gluonts.dataset.pandas import PandasDataset
from gluonts.dataset.split import split
from uni2ts.model.moirai import MoiraiForecast, MoiraiModule

plt.style.use("ggplot")
import matplotlib as mpl
from cycler import cycler
mpl.rcParams['axes.prop_cycle'] = cycler(color=["#000000", "#000000"])

os.environ["NIXTLA_ID_AS_COL"] = "true"
pd.set_option("display.precision", 3)

from fpppy.utils import plot_series

# Transfer learning
Transfer learning refers to pre-training a model on a (usually large) source dataset to improve its performance on a new forecasting task with a target dataset.

The core idea of a foundation model is to leverage this principle by training on a large time series dataset, exploiting scaling laws in both dataset and model size. A dataset that is diverse in both breadth and depth is necessary for the model to perform well across a wide variety of domains.
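
The principle can be sketched with a toy example that has nothing neural in it: a linear model is fitted once on windows pooled from several synthetic source series, and then applied to an unseen target series without refitting. Everything here is an assumption made for illustration (the synthetic series, the mean scaling, the helper `make_windows`); it is not how a foundation model is trained.

```python
import numpy as np

def make_windows(series, input_size, horizon):
    """Slice a series into (input, target) pairs, each divided by the input mean."""
    X, Y = [], []
    for start in range(len(series) - input_size - horizon + 1):
        window = series[start:start + input_size]
        target = series[start + input_size:start + input_size + horizon]
        scale = window.mean()
        X.append(window / scale)
        Y.append(target / scale)
    return np.array(X), np.array(Y)

rng = np.random.default_rng(0)
t = np.arange(120)
input_size, horizon = 24, 12

# Source dataset: synthetic seasonal series with different levels, trends and amplitudes.
source = [
    level + slope * t + amp * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.1, t.size)
    for level, slope, amp in [(50, 0.1, 5), (200, 0.5, 20), (20, 0.0, 3)]
]
pairs = [make_windows(s, input_size, horizon) for s in source]
X = np.vstack([x for x, _ in pairs])
Y = np.vstack([y for _, y in pairs])

# "Pre-training": a single least-squares fit shared by all source series.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# "Transfer": forecast an unseen series with the frozen weights W.
target = 100 + 0.3 * t + 10 * np.sin(2 * np.pi * t / 12)
history, actual = target[:-horizon], target[-horizon:]
last_window = history[-input_size:]
scale = last_window.mean()
forecast = (last_window / scale) @ W * scale

naive = np.full(horizon, history[-1])                # last-value baseline
mae_transfer = np.abs(forecast - actual).mean()
mae_naive = np.abs(naive - actual).mean()
print(f"MAE transfer: {mae_transfer:.2f}, MAE naive: {mae_naive:.2f}")
```

Scaling each window by its own mean is what lets one set of weights serve series with very different levels, which is the same role the scaler plays in the neural model used below.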


In the following simple example we illustrate this process by pre-training NHITS (Challu et al. 2023) on the M4 monthly dataset. The M4 dataset is a collection of 100,000 time series, of which 48,000 are monthly.

Key techniques implemented in NHITS (Neural Hierarchical Interpolation for Time Series Forecasting):

* Multi-Rate Data Sampling
* Neural Basis Expansion Analysis
* Doubly Residual Stacking
* Hierarchical Interpolation
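
Two of the techniques listed above, multi-rate sampling and hierarchical interpolation, can be sketched in a few lines of NumPy. This is a rough illustration, not the library's implementation: the pooled values stand in for the output of each block's MLP, the kernel sizes are made up, and `np.interp` only approximates the interpolation used in practice. The downsampling factors `[12, 4, 1]` match the configuration used in this notebook.

```python
import numpy as np

def max_pool_1d(x, kernel_size):
    """Multi-rate sampling: non-overlapping max pooling over the input window."""
    n = len(x) // kernel_size * kernel_size          # drop the remainder, if any
    return x[:n].reshape(-1, kernel_size).max(axis=1)

def interpolate_forecast(knots, horizon):
    """Hierarchical interpolation: stretch a few coefficients to the full horizon."""
    knot_pos = np.linspace(0, horizon - 1, num=len(knots))
    return np.interp(np.arange(horizon), knot_pos, knots)

horizon = 12
window = np.sin(np.arange(5 * horizon) * 2 * np.pi / 12)   # toy input window

partials = []
# Stacks go from coarse to fine; kernel sizes are illustrative assumptions.
for kernel_size, freq_downsample in [(4, 12), (2, 4), (1, 1)]:
    pooled = max_pool_1d(window, kernel_size)
    n_knots = max(horizon // freq_downsample, 1)     # coefficients predicted by the stack
    knots = pooled[-n_knots:]                        # stand-in for the MLP output
    partials.append(interpolate_forecast(knots, horizon))
    print(f"kernel={kernel_size}, pooled length={len(pooled)}, knots={n_knots}")

# The stack outputs are summed to form the final forecast.
forecast = np.sum(partials, axis=0)
print(forecast.shape)
```

The coarse stack predicts a single coefficient for the whole horizon, while the finest one predicts one value per step, so each stack specializes in a different frequency band.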

<img src="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd520fbc5-2ad3-4751-88e4-67280423de22_700x454.png" width="500"/>



In [None]:
Y_df, _, _ = M4.load(directory="./", group="Monthly")
Y_df["ds"] = Y_df.groupby("unique_id")["ds"].transform(
    lambda x: pd.date_range(start="1970-01-01", periods=len(x), freq="MS")
)
Y_df.head()

### Fitting on M4

In [None]:
horizon = 12
stacks = 3
models = [
    NHITS(
        input_size=5 * horizon,
        h=horizon,
        max_steps=2_000,
        stack_types=stacks * ["identity"],
        n_blocks=stacks * [1],
        mlp_units=[[256, 256] for _ in range(stacks)],
        n_pool_kernel_size=stacks * [1],
        batch_size=32,
        scaler_type="standard",
        n_freq_downsample=[12, 4, 1],
    )
]

nf = NeuralForecast(models=models, freq="M")
nf.fit(df=Y_df)


### Predicting AirPassengers

In [None]:
air_passengers_df = AirPassengersDF.copy()
transfer_preds = nf.predict(df=air_passengers_df)


plot_series(air_passengers_df, transfer_preds,
            xlabel="Month", ylabel="Number of passengers",
            title="International airline passengers",
            rm_legend=False)

# Probabilistic forecasts with exogenous variables with Moirai
Moirai is an encoder-only transformer model developed by Salesforce. It is a probabilistic model that splits the input series into patches. It supports both historical and future exogenous variables. A distinctive feature of Moirai is that its output is a mixture of four different distributions, which yields more flexible prediction intervals. Its release also came with LOTSA, an open archive of time series data with 27B data points.
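
To see why a mixture gives more flexible intervals than a single parametric family, the sketch below samples from a four-component mixture for a single time step and reads off quantiles. The component families follow the ones described in the Moirai paper (Student's t, negative binomial, log-normal, low-variance normal), but the weights and parameters are invented here for illustration; in the real model they are predicted by the network.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical mixture weights and component parameters for one time step.
weights = np.array([0.4, 0.1, 0.2, 0.3])
samplers = [
    lambda size: 50 + 5 * rng.standard_t(df=4, size=size),              # Student's t
    lambda size: rng.negative_binomial(n=20, p=0.3, size=size),         # negative binomial
    lambda size: rng.lognormal(mean=np.log(50), sigma=0.2, size=size),  # log-normal
    lambda size: rng.normal(loc=50, scale=0.5, size=size),              # low-variance normal
]

# Draw a component index per sample, then sample from the chosen component.
component = rng.choice(len(weights), size=n, p=weights)
samples = np.empty(n)
for k, sampler in enumerate(samplers):
    mask = component == k
    samples[mask] = sampler(int(mask.sum()))

lo, med, hi = np.quantile(samples, [0.1, 0.5, 0.9])
print(f"80% interval: [{lo:.1f}, {hi:.1f}], median: {med:.1f}")
```

Because the components differ in tail weight and skewness, the resulting interval does not have to be symmetric around the median.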


<img src="https://otexts.com/fpppy/nbs/15-foundation-models_files/figure-html/fig-moirai-output-1.png" width="800"/>

First, we load the data and convert it to a GluonTS dataset.



In [None]:
df = pd.read_csv("https://otexts.com/fpppy/data/electricity_short.csv", parse_dates=["ds"])
df.head()

In [None]:
future_ex_vars_df = pd.read_csv(
    "https://otexts.com/fpppy/data/electricity_future_vars.csv", parse_dates=["ds"]
)
future_ex_vars_df.head()

In [None]:
full_df = pd.concat([df, future_ex_vars_df], axis=0)
full_df = full_df.set_index("ds")

ds = PandasDataset.from_long_dataframe(
    full_df,
    target="y",
    item_id="unique_id",
    feat_dynamic_real=[
        "Exogenous1",
        "Exogenous2",
        "day_0",
        "day_1",
        "day_2",
        "day_3",
        "day_4",
        "day_5",
        "day_6",
    ],
)

We set the forecast horizon to 24 time steps into the future.



In [None]:
train, test_template = split(ds, offset=-24)
test_data = test_template.generate_instances(
    prediction_length=24,
    windows=1,
    distance=24,
)
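
Conceptually, the split holds out the last 24 target values while the covariates over that window remain available to the model. A plain pandas sketch of the same idea, using a made-up hourly frame with a hypothetical covariate column `temp`, could look like this (it does not reproduce the GluonTS objects):

```python
import numpy as np
import pandas as pd

horizon = 24
idx = pd.date_range("2024-01-01", periods=96, freq="h")
toy = pd.DataFrame({"y": np.arange(96.0), "temp": np.linspace(0, 1, 96)}, index=idx)

context = toy.iloc[:-horizon]                     # target and covariates seen by the model
future_covariates = toy["temp"].iloc[-horizon:]   # covariates known over the horizon
held_out = toy["y"].iloc[-horizon:]               # target values to be forecast

print(len(context), len(future_covariates), len(held_out))
```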

Then, we initialize the Moirai model. We can choose between small, base and large. Here, let’s use the large model.



In [None]:
model = MoiraiForecast(
    module=MoiraiModule.from_pretrained("Salesforce/moirai-1.0-R-large"),
    prediction_length=24,
    context_length=240,
    patch_size="auto",
    num_samples=50,
    target_dim=1,
    feat_dynamic_real_dim=ds.num_feat_dynamic_real,
    past_feat_dynamic_real_dim=ds.num_past_feat_dynamic_real,
)

Once the model is initialized, we can use it for zero-shot forecasting with the available exogenous features.

In [None]:
predictor = model.create_predictor(batch_size=32)
forecasts = predictor.predict(test_data.input)
forecasts = list(forecasts)

In [None]:
forecasts[0]

Since Moirai is a probabilistic model, it returns a distribution of future values. Thus, let’s take the median as the point forecast.

In [None]:
moirai_preds = np.median(forecasts[0].samples, axis=0)
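
The same sample array can also be summarized as a prediction interval. The sketch below uses a random array as a stand-in for `forecasts[0].samples`, assuming the shape `(num_samples, prediction_length)`, which is `(50, 24)` with the settings above; the 80% level is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for forecasts[0].samples: 50 sample paths over a 24-step horizon.
samples = rng.normal(loc=40.0, scale=5.0, size=(50, 24))

point = np.median(samples, axis=0)                        # one value per time step
lower, upper = np.quantile(samples, [0.1, 0.9], axis=0)   # 80% prediction interval

print(point.shape, lower.shape, upper.shape)
```

Reducing over `axis=0` collapses the sample dimension, so each summary has one entry per forecast step.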

In [None]:
# Forecast timestamps start one step after the last observed timestamp
forecast_index = pd.date_range(df['ds'].iloc[-1] + pd.Timedelta(hours=1), periods=24, freq='1h')
fcst_df = pd.DataFrame({'unique_id': np.tile('NP', 24), 'ds': forecast_index, 'moirai': moirai_preds})
plot_series(df, fcst_df, max_insample_length=100,
            xlabel="Hour", ylabel="Price",
            title="Electricity price of the Nord Pool electricity market",
            rm_legend=False)

We can also plot the different samples:

In [None]:
# One column per sample path, indexed by the forecast timestamps
samples_df = pd.DataFrame(forecasts[0].samples.T,
                          index=forecast_index,
                          columns=[f'sample_{i}' for i in range(forecasts[0].samples.shape[0])])

# Join the samples with the existing forecast DataFrame
samples_df = pd.merge(fcst_df, samples_df, left_on='ds', right_index=True, how='left')

plot_series(df, samples_df, max_insample_length=100,
            xlabel="Hour", ylabel="Price",
            title="Electricity price of the Nord Pool electricity market",
            rm_legend=True)
