# Model Stacking in PyTorch Tabular

This page demonstrates how to use the model stacking functionality in PyTorch Tabular to combine multiple models into a single model with better predictions.



## Setup and Imports

In [1]:
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from pytorch_tabular import TabularModel
from pytorch_tabular.models import (
    CategoryEmbeddingModelConfig,
    FTTransformerConfig,
    TabNetModelConfig,
)
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig
from pytorch_tabular.models.stacking import StackingModelConfig
from pytorch_tabular.utils import make_mixed_dataset

## Create a synthetic classification dataset and split it into train, validation, and test sets

In [2]:
data, cat_col_names, num_col_names = make_mixed_dataset(
    task="classification", n_samples=3000, n_features=7, n_categories=4
)

train, test = train_test_split(data, random_state=42)
train, valid = train_test_split(train, random_state=42)
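
When no `test_size` is given, `train_test_split` holds out 25% of the rows and rounds the held-out count up. The sketch below reproduces that arithmetic in plain Python to show how many rows end up in each split. `split_sizes` is a hypothetical helper written for this illustration, not part of scikit-learn.

```python
import math


def split_sizes(n_samples, test_fraction=0.25):
    """Return (n_train, n_test) for a fractional hold-out, rounding the test count up."""
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test


n_rest, n_test = split_sizes(3000)      # first split: train + valid vs. test
n_train, n_valid = split_sizes(n_rest)  # second split: train vs. valid
print(n_train, n_valid, n_test)  # prints "1687 563 750"
```

With 750 test rows, every test accuracy reported later on this page is a multiple of 1/750.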

## Common configurations

In [14]:
data_config = DataConfig(
    target=["target"],
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)
trainer_config = TrainerConfig(
    batch_size=1024,
    max_epochs=20,
    early_stopping="valid_accuracy",
    early_stopping_mode="max",
    early_stopping_patience=3,
    checkpoints="valid_accuracy",
    load_best=True,
)
optimizer_config = OptimizerConfig()
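
The trainer settings above stop training once `valid_accuracy` has failed to improve for three consecutive validation checks. The following is a simplified sketch of that patience rule in plain Python. `stopping_epoch` is a hypothetical helper and the accuracy values are made up; the real early-stopping callback in PyTorch Lightning has further options that are not modelled here.

```python
def stopping_epoch(scores, patience=3, mode="max"):
    """Return the epoch index at which training would stop, or None if it never stops."""
    best = None
    waited = 0
    for epoch, score in enumerate(scores):
        if mode == "max":
            improved = best is None or score > best
        else:
            improved = best is None or score < best
        if improved:
            best, waited = score, 0  # new best score: reset the patience counter
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None


# Made-up validation accuracies, for illustration only.
history = [0.40, 0.45, 0.50, 0.49, 0.50, 0.48, 0.47]
print(stopping_epoch(history))  # prints 5
```

Because `load_best=True`, the weights from the best checkpoint are restored after stopping, so the extra epochs spent waiting do not affect the final model.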

## Configure individual models

In [15]:
model_config_1 = CategoryEmbeddingModelConfig(
    task="classification",
    layers="128-64-32",
    activation="ReLU",
    learning_rate=1e-3
)
model_config_2 = FTTransformerConfig(
    task="classification",
    input_embed_dim=32,
    num_attn_blocks=2,
    num_heads=4,
    learning_rate=1e-3
)
model_config_3 = TabNetModelConfig(
    task="classification",
    n_d=8,
    n_a=8,
    n_steps=3,
    learning_rate=1e-3
)


## Configure Stacking Model

Now let's set up the stacking configuration that will combine these models:

In [28]:
stacking_config = StackingModelConfig(
    task="classification",
    model_configs=[
        model_config_1,
        model_config_2,
        model_config_3
    ],
    head="LinearHead",
    head_config={
        "layers": "64",
        "activation": "ReLU",
        "dropout": 0.1
    },
    learning_rate=1e-3
)


## Train Stacking Model

In [29]:
stacking_model = TabularModel(
    data_config=data_config,
    model_config=stacking_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)
stacking_model.fit(
    train=train,
    validation=valid
)

Seed set to 42


GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Output()

<pytorch_lightning.trainer.trainer.Trainer at 0x7fb1a508d420>

## Evaluate Results

In [36]:
predictions = stacking_model.predict(test)
stacking_metrics = stacking_model.evaluate(test)[0]
stacking_acc = stacking_metrics["test_accuracy"]


## Compare with individual models

In [31]:
def train_and_evaluate_model(model_config, name):
    """Train one base model on its own and return its test-set metrics."""
    model = TabularModel(
        data_config=data_config,
        model_config=model_config,
        optimizer_config=optimizer_config,
        trainer_config=trainer_config,
    )
    model.fit(train=train, validation=valid)
    metrics = model.evaluate(test)
    print(f"\n{name} Metrics:")
    print(metrics)
    return metrics

In [35]:
ce_metrics = train_and_evaluate_model(model_config_1, "Category Embedding")[0]
ft_metrics = train_and_evaluate_model(model_config_2, "FT Transformer")[0]
tab_metrics = train_and_evaluate_model(model_config_3, "TabNet")[0]
ce_acc = ce_metrics["test_accuracy"]
ft_acc = ft_metrics["test_accuracy"]
tab_acc = tab_metrics["test_accuracy"]


`Trainer.fit` stopped: `max_epochs=20` reached.




Category Embedding Metrics:
[{'test_loss_0': 0.8828091025352478, 'test_loss': 0.8828091025352478, 'test_accuracy': 0.4586666524410248}]




FT Transformer Metrics:
[{'test_loss_0': 0.6846821904182434, 'test_loss': 0.6846821904182434, 'test_accuracy': 0.5546666383743286}]




TabNet Metrics:
[{'test_loss_0': 1.1570961475372314, 'test_loss': 1.1570961475372314, 'test_accuracy': 0.4346666634082794}]


In [37]:
print("Stacking Model Test Accuracy: {}".format(stacking_acc))
print("Category Embedding Model Test Accuracy: {}".format(ce_acc))
print("FT Transformer Model Test Accuracy: {}".format(ft_acc))
print("TabNet Model Test Accuracy: {}".format(tab_acc))

Stacking Model Test Accuracy: 0.5960000157356262
Category Embedding Model Test Accuracy: 0.4586666524410248
FT Transformer Model Test Accuracy: 0.5546666383743286
TabNet Model Test Accuracy: 0.4346666634082794
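
To put the numbers above in perspective, the short sketch below computes the gain of the stacking model over the best individual model and converts it into a count of test rows. The accuracies are the values printed above, rounded to four decimals. The test-set size of 750 is an assumption derived from the default 25% hold-out on 3000 rows. These figures come from a single run with one seed on synthetic data.

```python
# Test accuracies reported above, rounded to four decimals.
accuracies = {
    "Stacking": 0.5960,
    "Category Embedding": 0.4587,
    "FT Transformer": 0.5547,
    "TabNet": 0.4347,
}
n_test = 750  # assumption: 25% of the 3000 rows were held out for testing

# Pick the strongest individual model, excluding the stacking model itself.
single_models = [name for name in accuracies if name != "Stacking"]
best_single = max(single_models, key=accuracies.get)

gain = accuracies["Stacking"] - accuracies[best_single]
extra_correct = round(gain * n_test)

print(f"Best single model: {best_single}")
print(f"Stacking gain: {gain:.4f} ({extra_correct} more correct out of {n_test})")
```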


## Save and load the stacking model

In [22]:
stacking_model.save_model("stacking_model")

In [23]:
loaded_model = TabularModel.load_model("stacking_model")

Trainer already configured with model summary callbacks: [<class 'pytorch_lightning.callbacks.rich_model_summary.RichModelSummary'>]. Skipping setting a default `ModelSummary` callback.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs



## Key Points About Stacking



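1. The stacking model combines predictions from multiple base models into a single final prediction.
2. Each base model can have its own architecture and hyperparameters.
3. The head layer combines the outputs from all base models.
4. The base models are trained simultaneously, in a single `fit` call, rather than one after another.
5. The stacking model can often achieve better performance than the individual models. In this run it reached a test accuracy of 0.596, compared with 0.555 for the best individual model (FT Transformer).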
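
The idea behind points 1 and 3 can be illustrated with a toy example in plain Python. This is a conceptual sketch only: it assumes the base-model outputs are concatenated and passed through a single linear layer, and it is not PyTorch Tabular's implementation. All function names and numbers are made up for illustration.

```python
def concatenate(outputs):
    """Join the per-model output vectors for one sample into a single feature vector."""
    return [value for vector in outputs for value in vector]


def linear_head(features, weights, bias):
    """Apply one linear layer: one row of weights and one bias term per output class."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


# Made-up outputs of three base models for a single sample.
# Each model may produce a vector of a different length.
base_outputs = [[0.5, -1.0], [0.25], [1.5, 0.0, -0.5]]
stacked = concatenate(base_outputs)  # six features in total

# Made-up head parameters for a two-class problem.
weights = [
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
]
bias = [0.0, 0.5]

logits = linear_head(stacked, weights, bias)
print(len(stacked), logits)  # prints "6 [0.75, 0.5]"
```

In the real model the head parameters are learned during training, together with the base models.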

## Tips for Better Stacking Results

1. Use diverse base models that capture different aspects of the data
2. Experiment with different head architectures
3. Consider using cross-validation for more robust stacking
4. Balance model complexity with training time
5. Monitor individual model performances to ensure they contribute meaningfully
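
For tip 3, the sketch below shows one way to generate k-fold train and validation indices in plain Python. `k_fold_indices` is a hypothetical helper. It assumes the rows have already been shuffled, and it does not stratify by class. Each pair of index lists could then be used to select rows (for example with `DataFrame.iloc`) before fitting a fresh model on that fold.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, valid_indices) pairs for contiguous, near-equal folds."""
    base, remainder = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `remainder` folds each take one extra row.
        size = base + (1 if fold < remainder else 0)
        stop = start + size
        valid = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, valid
        start = stop


for train_idx, valid_idx in k_fold_indices(10, 3):
    print(len(train_idx), valid_idx)
# prints:
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```

Averaging the validation metric across folds gives a more stable estimate than the single train/validation split used on this page.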

This example demonstrates basic stacking functionality. For production use cases, you may want to:
- Use cross-validation
- Implement more sophisticated ensemble techniques
- Add custom metrics
- Tune hyperparameters for both the base models and the stacking head