# Getting started with TinyTimeMixer (TTM)

This notebook demonstrates the usage of a pre-trained `TinyTimeMixer` model for several multivariate time series forecasting tasks. For details on the model architecture, refer to the [TTM paper](https://arxiv.org/pdf/2401.03955.pdf).

In this example, we will use a pre-trained TTM-512-96 model. That means the TTM model can take an input of 512 time points (`context_length`) and can forecast up to 96 time points (`forecast_length`) into the future. We will use the pre-trained TTM in two settings:
1. **Zero-shot**: The pre-trained TTM will be directly used to evaluate on the `test` split of the target data. Note that the TTM was NOT pre-trained on the target data.
2. **Few-shot**: The pre-trained TTM will be quickly fine-tuned on only 5% of the `train` split of the target data, and then evaluated on the `test` split of the target data.

Note: Alternatively, this notebook can be modified to try any other TTM model from a suite of TTM models. For details, visit the [Hugging Face TTM Model Repository](https://huggingface.co/ibm-granite/granite-timeseries-ttm-r2).

1. IBM Granite TTM-R1 pre-trained models can be found here: [Granite-TTM-R1 Model Card](https://huggingface.co/ibm-granite/granite-timeseries-ttm-r1)
2. IBM Granite TTM-R2 pre-trained models can be found here: [Granite-TTM-R2 Model Card](https://huggingface.co/ibm-granite/granite-timeseries-ttm-r2)
3. Research-use (non-commercial use only) TTM-R2 pre-trained models can be found here: [Research-Use-TTM-R2](https://huggingface.co/ibm-research/ttm-research-r2)

### The get_model() utility
The TTM model card offers a suite of models with varying `context_length` and `prediction_length` combinations.
In this notebook, we use the TSFM `get_model()` utility, which automatically selects the right model based on the given `context_length` and `prediction_length` (and some other optional arguments), abstracting away the internal complexity. See the usage examples below in the `zeroshot_eval()` and `fewshot_finetune_eval()` functions. For more details, see the [docstring](https://github.com/ibm-granite/granite-tsfm/blob/main/tsfm_public/toolkit/get_model.py) of the function definition.
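
To make the selection idea concrete, here is a minimal sketch of how a selector could map a requested pair of lengths to an available checkpoint. It is not the `get_model()` implementation: `pick_model_lengths` is a hypothetical helper, and the supported lengths listed are illustrative assumptions that should be checked against the model card. The rule it uses (longest context that fits, shortest horizon that covers the request) is consistent with the log later in this notebook, where a request for 1536/168 resolves to revision `1536-192-r2`.

```python
# Hypothetical sketch of a (context_length, prediction_length) selector.
# The supported values below are illustrative assumptions, not read from the model card.
SUPPORTED_CONTEXT_LENGTHS = (512, 1024, 1536)
SUPPORTED_PREDICTION_LENGTHS = (96, 192, 336, 720)


def pick_model_lengths(context_length, prediction_length):
    """Return the (context, prediction) pair of the checkpoint to load."""
    # Longest supported context that fits inside the requested history.
    contexts = [c for c in SUPPORTED_CONTEXT_LENGTHS if c <= context_length]
    # Shortest supported horizon that covers the requested forecast.
    horizons = [p for p in SUPPORTED_PREDICTION_LENGTHS if p >= prediction_length]
    if not contexts or not horizons:
        raise ValueError("no checkpoint covers the requested lengths")
    return max(contexts), min(horizons)


print(pick_model_lengths(1536, 168))  # (1536, 192)
```

Under this rule, the loaded model forecasts more steps than requested, and only the first `prediction_length` steps are kept.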

## Install `tsfm`
**[Optional for Local Run / Mandatory for Google Colab]**  
Run the cell below to install `tsfm`. Skip this step if it is already installed.

In [3]:
# Install the tsfm library
! pip install "granite-tsfm[notebooks] @ git+https://github.com/ibm-granite/granite-tsfm.git@v0.3.1"

Collecting granite-tsfm@ git+https://github.com/ibm-granite/granite-tsfm.git@v0.3.1 (from granite-tsfm[notebooks]@ git+https://github.com/ibm-granite/granite-tsfm.git@v0.3.1)
  Cloning https://github.com/ibm-granite/granite-tsfm.git (to revision v0.3.1) to /tmp/pip-install-4k9f6m7d/granite-tsfm_bd71648cafeb48a68bf27ad280ef0ced
  Running command git clone --filter=blob:none --quiet https://github.com/ibm-granite/granite-tsfm.git /tmp/pip-install-4k9f6m7d/granite-tsfm_bd71648cafeb48a68bf27ad280ef0ced
  Running command git checkout -q 16106d70d1fb3244eecd48c8fbbf3a0009fb8751
  Resolved https://github.com/ibm-granite/granite-tsfm.git to commit 16106d70d1fb3244eecd48c8fbbf3a0009fb8751
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting deprecated (from granite-tsfm@ git+https://github.com/ibm-granite/granite-tsfm.git@v0.3.1->granite-tsfm[notebooks]@ git+https://g

## Imports

In [1]:
import math
import os
import tempfile

import pandas as pd
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments, set_seed
from transformers.integrations import INTEGRATION_TO_CALLBACK

from tsfm_public.toolkit import TimeSeriesPreprocessor, TrackingCallback, count_parameters, get_datasets
from tsfm_public.toolkit.get_model import get_model
from tsfm_public.toolkit.lr_finder import optimal_lr_finder
from tsfm_public.toolkit.visualization import plot_predictions

In [2]:
import warnings


# Suppress all warnings
warnings.filterwarnings("ignore")

### Important arguments

In [3]:
# Set seed for reproducibility
SEED = 42
set_seed(SEED)

# TTM Model path. The default model path is Granite-R2. Below, you can choose other TTM releases.
TTM_MODEL_PATH = "ibm-granite/granite-timeseries-ttm-r2"
# TTM_MODEL_PATH = "ibm-granite/granite-timeseries-ttm-r1"
# TTM_MODEL_PATH = "ibm-research/ttm-research-r2"

# Context length, i.e. the length of the history.
# Currently supported values are: 512/1024/1536 for Granite-TTM-R2 and Research-Use-TTM-R2, and 512/1024 for Granite-TTM-R1
CONTEXT_LENGTH = 1536

# Granite-TTM-R2 supports forecast lengths up to 720, and Granite-TTM-R1 supports forecast lengths up to 96
PREDICTION_LENGTH = 168

TARGET_DATASET = "etth1"
dataset_path = "https://raw.githubusercontent.com/zhouhaoyi/ETDataset/main/ETT-small/ETTh1.csv"


# Results dir
OUT_DIR = "ttm_finetuned_models/"

## Data processing

In [6]:
# Dataset
TARGET_DATASET = "train"
dataset_path = "./train_test.csv"
timestamp_column = "일시"
id_columns = ['건물번호']  # mention the ids that uniquely identify a time-series.

control_columns = ["기온(°C)","강수량(mm)","풍속(m/s)","습도(%)"]
target_columns = ["전력소비량(kWh)"]
split_config = {
    "train": [0, 1536],          # indices 0 through 1535 (1536 rows)
    "valid": [1536, 2040],       # indices 1536 through 2039 (504 rows)
    "test": [2040, 2040 + 168],  # indices 2040 through 2207 (168 rows)
}
# Understanding the split config -- slides

data = pd.read_csv(
    dataset_path,
    parse_dates=[timestamp_column],
)

# --- Debug: check how many rows building 1 has ---
building1_data = data[data['건물번호'] == 1]
print(f"Number of rows for building 1: {len(building1_data)}")
# This value should be 2208.

column_specifiers = {
    "timestamp_column": timestamp_column,
    "id_columns": id_columns,
    "target_columns": target_columns,
    "control_columns": control_columns,
}

Number of rows for building 1: 2208
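
The split boundaries above can be sanity-checked with a little arithmetic. The sketch below recomputes the split sizes and counts how many full (context + forecast) windows fit into the test range. `split_sizes` and `n_windows` are hypothetical helpers, and the window count assumes the test split may borrow the `CONTEXT_LENGTH` rows that precede it. That assumption is consistent with the 100 test samples (one per building) reported later in this notebook, but it is not taken from the library's documentation.

```python
# Split boundaries copied from split_config above; lengths from the notebook's settings.
split_config = {"train": [0, 1536], "valid": [1536, 2040], "test": [2040, 2208]}
CONTEXT_LENGTH = 1536
PREDICTION_LENGTH = 168
ROWS_PER_BUILDING = 2208


def split_sizes(config):
    """Number of rows in each split, given [start, end) boundaries."""
    return {name: end - start for name, (start, end) in config.items()}


def n_windows(n_rows, context_length, prediction_length):
    """Number of full (context + forecast) windows in n_rows consecutive rows."""
    return max(0, n_rows - context_length - prediction_length + 1)


sizes = split_sizes(split_config)
print(sizes)  # {'train': 1536, 'valid': 504, 'test': 168}
print(sum(sizes.values()) == ROWS_PER_BUILDING)  # True

# Assumption: the test split can borrow the CONTEXT_LENGTH rows that precede it.
test_rows_with_history = sizes["test"] + CONTEXT_LENGTH
print(n_windows(test_rows_with_history, CONTEXT_LENGTH, PREDICTION_LENGTH))  # 1
```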


## Zero-shot evaluation method

In [7]:
column_specifiers

{'timestamp_column': '일시',
 'id_columns': ['건물번호'],
 'target_columns': ['전력소비량(kWh)'],
 'control_columns': ['기온(°C)', '강수량(mm)', '풍속(m/s)', '습도(%)']}

In [8]:
def zeroshot_eval(dataset_name, batch_size, context_length=512, forecast_length=96):
    # Get data
    tsp = TimeSeriesPreprocessor(
        **column_specifiers,
        context_length=context_length,
        prediction_length=forecast_length,
        scaling=True,
        encode_categorical=False,
        scaler_type="standard",
    )
    # Load model
    zeroshot_model = get_model(
        TTM_MODEL_PATH,
        context_length=context_length,
        prediction_length=forecast_length,
        freq_prefix_tuning=False,
        freq=None,
        prefer_l1_loss=False,
        prefer_longer_context=True,
    )
    dset_train, dset_valid, dset_test = get_datasets(
        tsp, data, split_config, use_frequency_token=zeroshot_model.config.resolution_prefix_tuning
    )
    print("dset_train=",dset_train)
    print("dset_valid=",dset_valid)
    print("dset_test=",dset_test)
    temp_dir = tempfile.mkdtemp()
    # zeroshot_trainer
    zeroshot_trainer = Trainer(
        model=zeroshot_model,
        args=TrainingArguments(
            output_dir=temp_dir,
            per_device_eval_batch_size=batch_size,
            seed=SEED,
            report_to="none",
        ),
    )
    # evaluate = zero-shot performance
    print("+" * 20, "Test MSE zero-shot", "+" * 20)
    zeroshot_output = zeroshot_trainer.evaluate(dset_test)
    print("zeroshot_output=", zeroshot_output)
    # get predictions
    predictions_dict = zeroshot_trainer.predict(dset_test)
    predictions_np = predictions_dict.predictions[0]
    print("predictions_np.shape=",predictions_np.shape)
    # get backbone embeddings (if needed for further analysis)
    backbone_embedding = predictions_dict.predictions[1]
    print("backbone_embedding.shape=",backbone_embedding.shape)
    # plot
    """
    plot_predictions(
        model=zeroshot_trainer.model,
        dset=dset_test,
        plot_dir=os.path.join(OUT_DIR, dataset_name),
        plot_prefix="test_zeroshot",
        indices=[685, 118, 902, 1984, 894, 967, 304, 57, 265, 1015],
        channel=0,
    )
    """
    # Return zeroshot_trainer and tsp as well
    return dset_train, dset_valid, dset_test, predictions_dict, zeroshot_trainer, tsp

## Zero-shot

In [9]:
# 1. Call the modified zeroshot_eval function
dset_train, dset_valid, dset_test, predictions_dict, zeroshot_trainer, tsp = zeroshot_eval(
    dataset_name=TARGET_DATASET, context_length=CONTEXT_LENGTH, forecast_length=PREDICTION_LENGTH, batch_size=64
)

INFO:/usr/local/lib/python3.12/dist-packages/tsfm_public/toolkit/get_model.py:Loading model from: ibm-granite/granite-timeseries-ttm-r2


config.json: 0.00B [00:00, ?B/s]

model.safetensors:   0%|          | 0.00/13.5M [00:00<?, ?B/s]

INFO:/usr/local/lib/python3.12/dist-packages/tsfm_public/toolkit/get_model.py:Model loaded successfully from ibm-granite/granite-timeseries-ttm-r2, revision = 1536-192-r2.
INFO:/usr/local/lib/python3.12/dist-packages/tsfm_public/toolkit/get_model.py:[TTM] context_length = 1536, prediction_length = 192


dset_train= <tsfm_public.toolkit.dataset.ForecastDFDataset object at 0x7e2be9d9e8a0>
dset_valid= <tsfm_public.toolkit.dataset.ForecastDFDataset object at 0x7e2a62703290>
dset_test= <tsfm_public.toolkit.dataset.ForecastDFDataset object at 0x7e2a61fb4f50>
++++++++++++++++++++ Test MSE zero-shot ++++++++++++++++++++


zeroshot_output= {'eval_loss': 0.3994915783405304, 'eval_model_preparation_time': 0.0034, 'eval_runtime': 1.4257, 'eval_samples_per_second': 70.14, 'eval_steps_per_second': 1.403}
predictions_np.shape= (100, 168, 5)
backbone_embedding.shape= (100, 5, 12, 384)
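
The printed shape `(100, 168, 5)` can be read as (series, forecast steps, channels): 100 buildings, 168 hourly steps, and 5 channels (one target plus four control columns). The sketch below uses random arrays of the same shape to show how the target channel could be sliced out and scored per series. The arrays are synthetic stand-ins, and treating channel 0 as the target is an assumption that matches how the rest of this notebook indexes the predictions.

```python
import numpy as np

# Synthetic stand-ins with the shapes printed above; the values are random, not model output.
rng = np.random.default_rng(0)
n_series, horizon, n_channels = 100, 168, 5
predictions = rng.normal(size=(n_series, horizon, n_channels))
actuals = rng.normal(size=(n_series, horizon, n_channels))

# Assumption: channel 0 is the single target column, followed by the four control columns.
TARGET_CHANNEL = 0
target_pred = predictions[:, :, TARGET_CHANNEL]
target_true = actuals[:, :, TARGET_CHANNEL]

# Mean squared error per series (averaged over the forecast horizon), in scaled units.
mse_per_series = ((target_pred - target_true) ** 2).mean(axis=1)
print(target_pred.shape)     # (100, 168)
print(mse_per_series.shape)  # (100,)
```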


In [10]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

def plot_predictions_from_eval(
    data, dset_test, predictions_dict, split_config, building_id, forecast_length, tsp
):
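    """
    Plot the forecast for one building from a DataFrame that contains many buildings.
    Args:
        data (pd.DataFrame): Original DataFrame with all buildings concatenated.
        dset_test (Dataset): Test dataset returned by zeroshot_eval.
        predictions_dict (PredictionOutput): Prediction output returned by zeroshot_eval.
        split_config (dict): Data split configuration.
        building_id (int): ID of the building to plot.
        forecast_length (int): Forecast length.
        tsp (TimeSeriesPreprocessor): Data preprocessor object (holds the per-building scalers).
    """
    # Compute each building's start and end row from the total number of rows.
    # Example: building IDs 1-100, 204000 rows in total.
    rows_per_building = data.shape[0] // 100  # assumes 100 buildings with equal row counts, ordered by building ID
    start_index_in_combined = (building_id - 1) * rows_per_building
    end_index_in_combined = building_id * rows_per_building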

    # 1. Extract the history (context) that was fed to the model
    start_test_idx_in_building = split_config['test'][0]
    # Train split length (1536); with this split_config it equals CONTEXT_LENGTH
    context_length = split_config['valid'][0] - split_config['train'][0]

    historical_df = data.iloc[start_index_in_combined + start_test_idx_in_building - context_length :
                              start_index_in_combined + start_test_idx_in_building].copy()

    # 2. Extract the ground-truth values for the forecast period
    end_test_idx_in_building = split_config['test'][1]
    ground_truth_df = data.iloc[start_index_in_combined + start_test_idx_in_building :
                                start_index_in_combined + end_test_idx_in_building].copy()

    # 3. Extract the forecast from predictions_dict
    # The first dimension of the predictions corresponds to the building (index building_id - 1).
    predictions_for_plot = predictions_dict.predictions[0][building_id - 1, :, 0]

    # --- Inverse scaling ---
    # The keys of `target_scaler_dict` are building IDs.
    target_scaler = tsp.target_scaler_dict[building_id]
    predictions_for_plot = predictions_for_plot.reshape(-1, 1)
    unscaled_predictions = target_scaler.inverse_transform(predictions_for_plot).flatten()

    # 4. Plot the data
    plt.figure(figsize=(15, 6))

    # Concatenate the history and the future ground truth and plot them as one 'Actual' line
    full_ground_truth_df = pd.concat([historical_df, ground_truth_df])
    plt.plot(full_ground_truth_df['일시'], full_ground_truth_df['전력소비량(kWh)'], label='Actual', color='blue')

    # Plot the forecast (hourly steps; a Timedelta avoids the deprecated 'H' frequency alias)
    forecast_timestamps = pd.date_range(start=historical_df['일시'].iloc[-1], periods=forecast_length + 1, freq=pd.Timedelta(hours=1))[1:]
    plt.plot(forecast_timestamps, unscaled_predictions, label='Forecast', color='red', linestyle='--')

    # Title and axis labels
    plt.title(f'Power consumption forecast for building {building_id}', fontsize=16)
    plt.xlabel('Timestamp', fontsize=12)
    plt.ylabel('Power consumption (kWh)', fontsize=12)
    plt.grid(True)
    plt.legend()
    plt.tight_layout()
    plt.show()
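
The function above locates a building's rows by position, which only works if every building has the same number of rows and the rows are ordered by building ID. A minimal sketch of an alternative that selects rows through the ID column instead; the toy DataFrame, its column names, and the `rows_for_building` helper are all hypothetical.

```python
import pandas as pd

# Toy frame with two buildings of unequal length; column names here are hypothetical.
toy = pd.DataFrame(
    {
        "building": [1, 1, 1, 1, 2, 2, 2],
        "hour": [0, 1, 2, 3, 0, 1, 2],
        "load": [10.0, 11.0, 12.0, 13.0, 20.0, 21.0, 22.0],
    }
)


def rows_for_building(df, id_column, building_id, start, end):
    """Return rows [start, end) of one building, counted within that building."""
    one_building = df[df[id_column] == building_id].reset_index(drop=True)
    return one_building.iloc[start:end]


print(rows_for_building(toy, "building", 2, 1, 3)["load"].tolist())  # [21.0, 22.0]
```

With the notebook's data, the same idea would use `'건물번호'` as the ID column and the `split_config` boundaries as `start` and `end`.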

In [11]:
for i in range(1,101,1):
    plot_predictions_from_eval(
        data=data,
        dset_test=dset_test,
        predictions_dict=predictions_dict,
        split_config=split_config,
        building_id=i,
        forecast_length=PREDICTION_LENGTH,
        tsp=tsp
    )


In [13]:
import numpy as np

# Create an empty list to hold each building's descaled predictions.
all_descaled_predictions = []

# Extract each building's predictions from predictions_dict and
# descale them with that building's scaler from the tsp object.
for building_id in range(1, 101):
    # The predictions are indexed from 0, so use building_id - 1.
    predictions_for_building = predictions_dict.predictions[0][building_id - 1, :, 0]

    # Get this building's scaler for descaling.
    target_scaler = tsp.target_scaler_dict[building_id]

    # Reshape the prediction array to 2D so it can be passed to the scaler.
    predictions_for_plot = predictions_for_building.reshape(-1, 1)

    # Perform the descaling.
    descaled_predictions = target_scaler.inverse_transform(predictions_for_plot).flatten()

    # Append the descaled predictions to the list.
    all_descaled_predictions.append(descaled_predictions)

# Concatenate all descaled predictions in the list into a single NumPy array.
combined_descaled_predictions = np.concatenate(all_descaled_predictions)

# Check the result
print(f"Size of the combined descaled prediction array: {len(combined_descaled_predictions)}")
print("First 10 values of the combined descaled prediction array:", combined_descaled_predictions[:10])

Size of the combined descaled prediction array: 16800
First 10 values of the combined descaled prediction array: [4522.0815 4263.15   4012.0562 3789.3796 3689.147  3880.9846 4122.3584
 4662.274  5330.029  5959.803 ]
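
For reference, descaling with a standard scaler amounts to multiplying by the standard deviation and adding back the mean that were computed when the scaler was fitted. The sketch below shows that round trip with plain NumPy. The history values are made up, and it illustrates the arithmetic only, not the scaler class used by `tsp`.

```python
import numpy as np

# Hypothetical history for one building, in kWh.
history = np.array([4000.0, 4500.0, 5000.0, 5500.0, 6000.0])
mean, std = history.mean(), history.std()

# Forward transform (what a standard scaler does) and its inverse.
scaled = (history - mean) / std
restored = scaled * std + mean

print(np.allclose(restored, history))  # True
```

Because each building has its own mean and standard deviation, predictions must be descaled with the scaler of the same building, as the loop above does.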


In [14]:
sample = pd.read_csv('./sample_submission.csv')

In [15]:
sample.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16800 entries, 0 to 16799
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   num_date_time  16800 non-null  object
 1   answer         16800 non-null  int64 
dtypes: int64(1), object(1)
memory usage: 262.6+ KB


In [16]:
# The sample submission DataFrame is stored in the variable `sample`.
# The `combined_descaled_predictions` array has already been prepared.
sample['answer'] = combined_descaled_predictions

# Check the change.
print(sample.head())
print(sample.tail())

   num_date_time       answer
0  1_20240825 00  4522.081543
1  1_20240825 01  4263.149902
2  1_20240825 02  4012.056152
3  1_20240825 03  3789.379639
4  1_20240825 04  3689.146973
         num_date_time       answer
16795  100_20240831 19  2525.792725
16796  100_20240831 20  2423.616943
16797  100_20240831 21  2302.714355
16798  100_20240831 22  2546.547119
16799  100_20240831 23  2793.781250
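
The assignment above relies on the prediction array being in the same order as the rows of the submission file. A rough way to check this is to rebuild the expected keys in building-major, hour-minor order. The key format `{building}_{YYYYMMDD HH}` and the start time are inferred from the head and tail printed above, so treat them as assumptions.

```python
import pandas as pd

N_BUILDINGS = 100
hours = pd.date_range("2024-08-25 00:00", periods=168, freq=pd.Timedelta(hours=1))

# Building-major, hour-minor order: the same order in which the predictions were concatenated.
expected_keys = [
    f"{building}_{ts.strftime('%Y%m%d %H')}"
    for building in range(1, N_BUILDINGS + 1)
    for ts in hours
]

print(len(expected_keys))  # 16800
print(expected_keys[0])    # 1_20240825 00
print(expected_keys[-1])   # 100_20240831 23
```

Comparing `expected_keys` with `sample['num_date_time'].tolist()` before writing the CSV would catch a misaligned submission.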


In [18]:
sample.to_csv('./baseline_submission_20250820_01.csv', index=False)