<a href="https://colab.research.google.com/github/paranoia0121/PredictionComparison/blob/main/patch_tsmixer_getting_started.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Getting started with `PatchTSMixer`
## Direct forecasting example

This notebook demonstrates the use of a `PatchTSMixer` model for a multivariate time series forecasting task. It depends on the HuggingFace [transformers](https://github.com/huggingface/transformers) library. For details of the model architecture, refer to the [TSMixer paper](https://arxiv.org/abs/2306.09364).

In [2]:
# Clone the IBM/tsfm repository
! git clone https://github.com/IBM/tsfm.git

Cloning into 'tsfm'...
remote: Enumerating objects: 2286, done.
remote: Counting objects: 100% (828/828), done.
remote: Compressing objects: 100% (298/298), done.
remote: Total 2286 (delta 595), reused 548 (delta 530), pack-reused 1458
Receiving objects: 100% (2286/2286), 20.17 MiB | 20.53 MiB/s, done.
Resolving deltas: 100% (1343/1343), done.


In [5]:
# Change directory. Move inside the tsfm repo.
%cd tsfm

[Errno 2] No such file or directory: 'tsfm'
/content/tsfm


In [7]:
# Install the tsfm library
! pip install ".[notebooks]"

Processing /content/tsfm
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting pandas>=2.2.0 (from tsfm_public==0.2.7.dev29+gc9bbacf)
  Downloading pandas-2.2.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (19 kB)
Collecting datasets (from tsfm_public==0.2.7.dev29+gc9bbacf)
  Downloading datasets-2.20.0-py3-none-any.whl.metadata (19 kB)
Collecting deprecated (from tsfm_public==0.2.7.dev29+gc9bbacf)
  Downloading Deprecated-1.2.14-py2.py3-none-any.whl.metadata (5.4 kB)
Collecting urllib3<2,>=1.26.19 (from tsfm_public==0.2.7.dev29+gc9bbacf)
  Downloading urllib3-1.26.19-py2.py3-none-any.whl.metadata (49 kB)
Collecting jupyter (from tsfm_public==0.2.7.dev29+gc9bbacf)
  Downloading jupyter-1.0.0-py2.py3-none-any.whl.metadat

In [1]:
# Standard
import os
import random

# Third Party
from transformers import (
    EarlyStoppingCallback,
    PatchTSMixerConfig,
    PatchTSMixerForPrediction,
    Trainer,
    TrainingArguments,
)
import numpy as np
import pandas as pd
import torch

# First Party
from tsfm_public.toolkit.dataset import ForecastDFDataset
from tsfm_public.toolkit.time_series_preprocessor import TimeSeriesPreprocessor
from tsfm_public.toolkit.util import select_by_index

In [2]:
# Set seed for reproducibility
SEED = 42
torch.manual_seed(SEED)
random.seed(SEED)
np.random.seed(SEED)

## Load and prepare datasets

In the next cell, please adjust the following parameters to suit your application:
- `dataset_path`: path to a local .csv file, or the web address of a CSV file, containing the data of interest. Data is loaded with pandas, so anything supported by
[`pd.read_csv`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) can be used.
- `timestamp_column`: column name containing timestamp information, use None if there is no such column
- `id_columns`: List of column names specifying the IDs of different time series. If no ID column exists, use []
- `forecast_columns`: List of columns to be modeled
- `context_length`: The amount of historical data used as input to the model. Windows of the input time series data with length equal to
context_length will be extracted from the input dataframe. In the case of a multi-time series dataset, the context windows will be created
so that they are contained within a single time series (i.e., a single ID).
- `forecast_horizon`: Number of timestamps to forecast into the future.
- `train_start_index`, `train_end_index`: the start and end indices in the loaded data which delineate the training data.
- `valid_start_index`, `valid_end_index`: the start and end indices in the loaded data which delineate the validation data.
- `test_start_index`, `test_end_index`: the start and end indices in the loaded data which delineate the test data.
- `patch_length`: The patch length for the `PatchTSMixer` model. Recommended to have a value so that `context_length` is divisible by it.
- `num_workers`: Number of dataloader workers in the PyTorch dataloader.
- `batch_size`: Batch size.

The data is first loaded into a pandas dataframe and split into training, validation, and test parts. Then the pandas dataframes are converted
to the appropriate torch dataset needed for training.
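
To make `context_length` and `forecast_horizon` concrete, here is a minimal sketch of sliding-window extraction over a single series in plain Python. It is an illustration only and not the `ForecastDFDataset` implementation; the helper name `make_windows` and the stride of one step are assumptions.

```python
def make_windows(series, context_length, forecast_horizon):
    """Return (past, future) pairs cut from one series with a stride of 1."""
    total = context_length + forecast_horizon
    windows = []
    # range() is empty when the series is shorter than one full window
    for start in range(len(series) - total + 1):
        past = series[start : start + context_length]
        future = series[start + context_length : start + total]
        windows.append((past, future))
    return windows


series = list(range(10))
windows = make_windows(series, context_length=4, forecast_horizon=2)
print(len(windows))  # 10 - (4 + 2) + 1 = 5
print(windows[0])    # ([0, 1, 2, 3], [4, 5])
```

Under this scheme, a series shorter than `context_length + forecast_horizon` yields no complete window, which is worth checking when a split is small.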

In [13]:
# dataset = "ETTh1"
num_workers = 8  # Reduce this if you have a low number of CPU cores
batch_size = 32  # Reduce if not enough GPU memory available
context_length = 512
forecast_horizon = 96
patch_length = 8

In [14]:
# print(f"Loading target dataset: {dataset}")
# dataset_path = f"https://raw.githubusercontent.com/zhouhaoyi/ETDataset/main/ETT-small/{dataset}.csv"
dataset_path = "/content/sample_data/MES_total_renewables_CN.csv"
timestamp_column = "Time"
id_columns = []
forecast_columns = ["Value(GWh)"]
train_start_index = None  # None indicates beginning of dataset
# train_end_index = 12 * 30 * 24
train_end_index = 79

# we shift the start of the validation/test period back by context length so that
# the first validation/test timestamp is immediately following the training data
# valid_start_index = 12 * 30 * 24 - context_length
# valid_end_index = 12 * 30 * 24 + 4 * 30 * 24
valid_start_index = 63
valid_end_index = 79 + 32

test_start_index = 95
test_end_index = 79 + 32
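
The index arithmetic described in the comment above can be sketched as a small helper. The function name `shifted_split` is hypothetical; it only reproduces the stated rule (move the start of a later split back by the context length) and uses the commented-out ETTh1 values as the example.

```python
def shifted_split(prev_end, split_end, context_length):
    """Return (start, end) for a split that follows `prev_end`.

    The start is moved back by `context_length` so that the first
    forecasted timestamp comes right after the previous split.
    """
    start = max(prev_end - context_length, 0)  # never go before the first row
    return start, split_end


context_length = 512
train_end = 12 * 30 * 24             # 8640, from the commented-out ETTh1 setup
valid_end = train_end + 4 * 30 * 24  # 11520

valid_start, valid_end = shifted_split(train_end, valid_end, context_length)
print(valid_start, valid_end)  # 8128 11520
```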

In [15]:
data = pd.read_csv(
    dataset_path,
    parse_dates=[timestamp_column],
)

train_data = select_by_index(
    data,
    id_columns=id_columns,
    start_index=train_start_index,
    end_index=train_end_index,
)
valid_data = select_by_index(
    data,
    id_columns=id_columns,
    start_index=valid_start_index,
    end_index=valid_end_index,
)
test_data = select_by_index(
    data,
    id_columns=id_columns,
    start_index=test_start_index,
    end_index=test_end_index,
)

tsp = TimeSeriesPreprocessor(
    timestamp_column=timestamp_column,
    id_columns=id_columns,
    target_columns=forecast_columns,
    scaling=True,
)
tsp.train(train_data)

TimeSeriesPreprocessor {
  "categorical_encoder": null,
  "conditional_columns": [],
  "context_length": 64,
  "control_columns": [],
  "encode_categorical": true,
  "feature_extractor_type": "TimeSeriesPreprocessor",
  "freq": "30 days 00:00:00",
  "frequency_mapping": {
    "10min": 4,
    "15min": 5,
    "2min": 2,
    "30min": 6,
    "5min": 3,
    "W": 9,
    "d": 8,
    "h": 7,
    "min": 1,
    "oov": 0
  },
  "id_columns": [],
  "observable_columns": [],
  "prediction_length": null,
  "processor_class": "TimeSeriesPreprocessor",
  "scaler_dict": {},
  "scaler_type": "standard",
  "scaling": true,
  "scaling_id_columns": [],
  "static_categorical_columns": [],
  "target_columns": [
    "Value(GWh)"
  ],
  "target_scaler_dict": {
    "0": {
      "copy": true,
      "feature_names_in_": [
        "Value(GWh)"
      ],
      "mean_": [
        149521.40934177212
      ],
      "n_features_in_": 1,
      "n_samples_seen_": 79,
      "scale_": [
        35831.741047819116
      ],
 

In [16]:
train_dataset = ForecastDFDataset(
    tsp.preprocess(train_data),
    id_columns=id_columns,
    target_columns=forecast_columns,
    context_length=context_length,
    prediction_length=forecast_horizon,
)
valid_dataset = ForecastDFDataset(
    tsp.preprocess(valid_data),
    id_columns=id_columns,
    target_columns=forecast_columns,
    context_length=context_length,
    prediction_length=forecast_horizon,
)
test_dataset = ForecastDFDataset(
    tsp.preprocess(test_data),
    id_columns=id_columns,
    target_columns=forecast_columns,
    context_length=context_length,
    prediction_length=forecast_horizon,
)

## Testing with a `PatchTSMixer` model that was trained on the training part of the `ETTh1` data

A pre-trained model (on `ETTh1` data) is available at [ibm-granite/granite-timeseries-patchtsmixer](https://huggingface.co/ibm-granite/granite-timeseries-patchtsmixer).

In [17]:
print("Loading pretrained model")
inference_forecast_model = PatchTSMixerForPrediction.from_pretrained(
    "ibm-granite/granite-timeseries-patchtsmixer"
)
print("Done")

Loading pretrained model
Done


In [18]:
inference_forecast_trainer = Trainer(
    model=inference_forecast_model,
)

print("\n\nDoing testing on MEA/test data")
result = inference_forecast_trainer.evaluate(test_dataset)
print(result)



Doing testing on MEA/test data


{'eval_loss': 1.0162259340286255, 'eval_runtime': 0.1241, 'eval_samples_per_second': 8.058, 'eval_steps_per_second': 8.058}
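
The reported `eval_loss` is measured on scaled targets, so it is not directly in GWh. As a rough sketch, assuming the loss is a mean squared error on standard-scaled data and that there is a single target column, an RMSE in original units can be recovered by multiplying the root of the loss by the scaler's `scale_`. The helper name `rmse_in_original_units` is hypothetical.

```python
import math


def rmse_in_original_units(mse_scaled, scale):
    """Convert an MSE measured on standard-scaled data to an RMSE in raw units."""
    # (y - mean) / scale divides every error by `scale`, so the MSE shrinks by scale**2
    return math.sqrt(mse_scaled) * scale


eval_loss = 1.0162259340286255  # from the evaluation output above
scale = 35831.741047819116      # `scale_` reported by the preprocessor
print(rmse_in_original_units(eval_loss, scale))
```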


## If we want to train from scratch

Adjust the following model parameters according to need.
- `d_model` (`int`, *optional*, defaults to 8):
    Hidden dimension of the model. Recommended to set it as a multiple of `patch_length` (i.e., 2-8x
    `patch_length`). A larger value indicates a more complex model.
- `expansion_factor` (`int`, *optional*, defaults to 2):
    Expansion factor to use inside the MLP. Recommended range is 2-5. A larger value indicates a more complex model.
- `num_layers` (`int`, *optional*, defaults to 3):
    Number of layers to use. Recommended range is 3-15. A larger value indicates a more complex model.
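
A quick way to sanity-check these choices is to compute how many patches the context window produces and which `d_model` values the 2-8x guideline implies. The helpers below are hypothetical and assume that only patches fitting entirely inside the context window are counted.

```python
def num_patches(context_length, patch_length, patch_stride):
    """Count patches that fit entirely inside the context window."""
    return (context_length - patch_length) // patch_stride + 1


def d_model_range(patch_length, low=2, high=8):
    """Range of d_model values implied by the 2-8x patch_length guideline."""
    return low * patch_length, high * patch_length


print(num_patches(512, 8, 8))  # 64
print(d_model_range(8))        # (16, 64)
```

With `patch_length = 8`, the `d_model=48` used in this notebook is six times the patch length and falls inside that range.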

In [None]:
config = PatchTSMixerConfig(
    context_length=context_length,
    prediction_length=forecast_horizon,
    patch_length=patch_length,
    num_input_channels=len(forecast_columns),
    patch_stride=patch_length,
    d_model=48,
    num_layers=3,
    expansion_factor=3,
    dropout=0.5,
    head_dropout=0.7,
    mode="common_channel",  # change it to `mix_channel` if we need to explicitly model channel correlations
    scaling="std",
)
model = PatchTSMixerForPrediction(config=config)

In [None]:
train_args = TrainingArguments(
    output_dir="./checkpoint/patchtsmixer/direct/train/output/",
    overwrite_output_dir=True,
    learning_rate=0.0001,
    num_train_epochs=100,
    do_eval=True,
    evaluation_strategy="epoch",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    dataloader_num_workers=num_workers,
    report_to="tensorboard",
    save_strategy="epoch",
    logging_strategy="epoch",
    save_total_limit=3,
    logging_dir="./checkpoint/patchtsmixer/direct/train/logs/",  # Make sure to specify a logging directory
    load_best_model_at_end=True,  # Load the best model when training ends
    metric_for_best_model="eval_loss",  # Metric to monitor for early stopping
    greater_is_better=False,  # For loss
    label_names=["future_values"],
)

# Stop training early once the validation loss stops improving
early_stopping_callback = EarlyStoppingCallback(
    early_stopping_patience=5,  # Number of epochs with no improvement after which to stop
    early_stopping_threshold=0.001,  # Minimum improvement required to consider as improvement
)

trainer = Trainer(
    model=model,
    args=train_args,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,
    callbacks=[early_stopping_callback],
)

print("\n\nDoing forecasting training on Etth1/train")
trainer.train()



Doing forecasting training on Etth1/train


| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 0.4991 | 0.705115 |
| 2 | 0.4064 | 0.68981 |
| 3 | 0.3821 | 0.681682 |
| 4 | 0.3666 | 0.681303 |
| 5 | 0.3588 | 0.684668 |
| 6 | 0.3519 | 0.690913 |
| 7 | 0.3479 | 0.690822 |
| 8 | 0.3476 | 0.701287 |


TrainOutput(global_step=2016, training_loss=0.3825622891622876, metrics={'train_runtime': 28.0282, 'train_samples_per_second': 28660.44, 'train_steps_per_second': 899.095, 'total_flos': 597127745765376.0, 'train_loss': 0.3825622891622876, 'epoch': 8.0})
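
To see why this run ended after epoch 8, the patience and threshold rule can be replayed on the logged validation losses. This is a simplified re-implementation for illustration, not the `transformers` code; the function name `epoch_of_early_stop` is hypothetical, and it assumes the best loss is updated on any improvement while the patience counter resets only when the improvement exceeds the threshold.

```python
def epoch_of_early_stop(val_losses, patience, threshold):
    """Return the 1-based epoch at which training would stop, or None."""
    best = None
    stale = 0  # consecutive evaluations without a large-enough improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if best is None or (loss < best and abs(loss - best) > threshold):
            stale = 0
        else:
            stale += 1
        if best is None or loss < best:
            best = loss  # track the best loss even for tiny improvements
        if stale >= patience:
            return epoch
    return None


val_losses = [0.705115, 0.68981, 0.681682, 0.681303,
              0.684668, 0.690913, 0.690822, 0.701287]
print(epoch_of_early_stop(val_losses, patience=5, threshold=0.001))  # 8
```

The improvement at epoch 4 (about 0.0004) is below the 0.001 threshold, so the counter starts there and reaches the patience of 5 at epoch 8.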

In [None]:
trainer.evaluate(test_dataset)

{'eval_loss': 0.367517352104187,
 'eval_runtime': 0.6978,
 'eval_samples_per_second': 3991.024,
 'eval_steps_per_second': 126.108,
 'epoch': 8.0}

## If we want to train from scratch for a few specific forecast channels

In [None]:
forecast_channel_indices = [
    -4,
    -1,
]  # add the channel indices (i.e., the column number) for which the model should forecast
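
Negative entries count from the last channel, as in ordinary Python indexing. The hypothetical helper below shows how such indices resolve to column positions; the seven-channel input is an assumption made for the example.

```python
def resolve_channel_indices(indices, num_channels):
    """Map possibly negative channel indices to non-negative positions."""
    resolved = []
    for idx in indices:
        pos = idx + num_channels if idx < 0 else idx
        if not 0 <= pos < num_channels:
            raise IndexError(
                f"channel index {idx} is out of range for {num_channels} channels"
            )
        resolved.append(pos)
    return resolved


print(resolve_channel_indices([-4, -1], num_channels=7))  # [3, 6]
```

Every index has to be valid for the number of input channels; with a single input channel, `-4` would not resolve.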

In [None]:
config = PatchTSMixerConfig(
    context_length=context_length,
    prediction_length=forecast_horizon,
    patch_length=patch_length,
    num_input_channels=len(forecast_columns),
    patch_stride=patch_length,
    d_model=48,
    num_layers=3,
    expansion_factor=3,
    dropout=0.5,
    head_dropout=0.7,
    mode="common_channel",
    scaling="std",
    prediction_channel_indices=forecast_channel_indices,
)
model = PatchTSMixerForPrediction(config=config)

In [None]:
trainer = Trainer(
    model=model,
    args=train_args,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,
    callbacks=[early_stopping_callback],
)

print("\n\nDoing forecasting training on Etth1/train")
trainer.train()



Doing forecasting training on Etth1/train


| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 0.2753 | 0.496316 |
| 2 | 0.2312 | 0.485542 |
| 3 | 0.2182 | 0.478069 |
| 4 | 0.2099 | 0.470516 |
| 5 | 0.2064 | 0.47701 |
| 6 | 0.2026 | 0.474555 |
| 7 | 0.2006 | 0.474283 |
| 8 | 0.1983 | 0.472296 |
| 9 | 0.196 | 0.464579 |
| 10 | 0.1948 | 0.467563 |


TrainOutput(global_step=4284, training_loss=0.20359806520264356, metrics={'train_runtime': 60.2954, 'train_samples_per_second': 13322.735, 'train_steps_per_second': 417.942, 'total_flos': 1268896459751424.0, 'train_loss': 0.20359806520264356, 'epoch': 17.0})

In [None]:
trainer.evaluate(test_dataset)

{'eval_loss': 0.1160622164607048,
 'eval_runtime': 0.7379,
 'eval_samples_per_second': 3774.245,
 'eval_steps_per_second': 119.258,
 'epoch': 17.0}

#### Sanity check: Compute number of forecasting channels

In [None]:
output = trainer.predict(test_dataset)

In [None]:
output.predictions[0].shape

(2785, 96, 2)
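
The shape can be checked against the configuration programmatically. The sketch below assumes the axes are ordered as (windows, forecast horizon, channels); the helper name `describe_predictions` is hypothetical.

```python
def describe_predictions(shape, forecast_horizon, channel_indices):
    """Check a (windows, horizon, channels) shape against the configuration."""
    num_windows, horizon, num_channels = shape
    assert horizon == forecast_horizon, "horizon mismatch"
    assert num_channels == len(channel_indices), "channel count mismatch"
    return {"windows": num_windows, "horizon": horizon, "channels": num_channels}


print(describe_predictions((2785, 96, 2), 96, [-4, -1]))
# {'windows': 2785, 'horizon': 96, 'channels': 2}
```

The last axis has size 2, matching the two entries in `forecast_channel_indices`.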