# Fine-tuning TabPFN on the Covertype Dataset

This notebook shows how to fine-tune a `TabPFNClassifier` on the Covertype dataset. The process involves preparing the data, setting up the model and optimizer, and running a loop that alternates between fine-tuning the model on batches of data and evaluating its performance on a held-out test set.

The original script can also be encapsulated in a scikit-learn compatible classifier that provides a clear fit/predict interface; this notebook walks through the individual steps instead.


### 1\. Setup

In [None]:
# Install Baselines for model comparison
!uv pip install catboost xgboost

# Install the datasets library for loading example data
!uv pip install datasets

# Install rich for better and more readable printing
!uv pip install rich

# Install the TabPFN Client and library
!uv pip install tabpfn-client
!git clone https://github.com/PriorLabs/tabpfn
!uv pip install -e tabpfn

# Install TabPFN extensions for additional functionalities
!git clone https://github.com/PriorLabs/tabpfn-extensions
!uv pip install -e tabpfn-extensions[all]

In [1]:
from functools import partial

import numpy as np
import sklearn.datasets
import torch
from sklearn.metrics import log_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from torch.optim import Adam, Optimizer
from torch.utils.data import DataLoader
from tqdm import tqdm

from tabpfn import TabPFNClassifier
from tabpfn.finetune_utils import clone_model_for_evaluation
from tabpfn.utils import meta_dataset_collator

### 2\. Data Preparation

We'll start by defining a function to load the Covertype dataset from `sklearn.datasets`. We will then subset it to a manageable size and split it into training and testing sets.
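
The split below passes `stratify=y`, which keeps the class proportions roughly equal in the training and test sets. As a rough illustration of that idea, here is a minimal pure-Python sketch. It is not scikit-learn's implementation; the helper `stratified_split` and the toy labels are hypothetical and exist only for this example.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_ratio, seed):
    """Return (train_idx, test_idx) with per-class proportions roughly preserved."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Send the same fraction of every class to the test set.
        n_test = round(len(idxs) * test_ratio)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (hypothetical, not Covertype): 70% class 1, 20% class 2, 10% class 3.
toy_labels = [1] * 70 + [2] * 20 + [3] * 10
train_idx, test_idx = stratified_split(toy_labels, test_ratio=0.3, seed=42)

print("test class counts: ", Counter(toy_labels[i] for i in test_idx))
print("train class counts:", Counter(toy_labels[i] for i in train_idx))
```

With an imbalanced target, a plain random split can leave a rare class under-represented in one of the two sets; stratifying avoids that.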

In [2]:
def prepare_data(config: dict) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    """Loads, subsets, and splits the Covertype dataset."""
    print("--- 1. Data Preparation ---")
    X_all, y_all = sklearn.datasets.fetch_covtype(return_X_y=True, shuffle=True)

    rng = np.random.default_rng(config["random_seed"])
    num_samples_to_use = min(config["num_samples_to_use"], len(y_all))
    indices = rng.choice(np.arange(len(y_all)), size=num_samples_to_use, replace=False)
    X, y = X_all[indices], y_all[indices]

    splitter = partial(
        train_test_split,
        test_size=config["test_set_ratio"],
        random_state=config["random_seed"],
    )
    X_train, X_test, y_train, y_test = splitter(X, y, stratify=y)

    print(
        f"Loaded and split data: {X_train.shape[0]} train, {X_test.shape[0]} test samples."
    )
    print("---------------------------\n")
    return X_train, X_test, y_train, y_test

### 3\. Model and Optimizer Setup

Next, we'll set up the `TabPFNClassifier`. We initialize it with a configuration suitable for fine-tuning, including ignoring pre-training size limits. The optimizer is an `Adam` optimizer, configured with a specific learning rate for the fine-tuning process.
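
To get a feel for the scale of the fine-tuning learning rate, the sketch below applies one Adam update to a single scalar parameter in plain Python. It follows the standard Adam update rule with the default hyperparameters of `torch.optim.Adam` (`betas=(0.9, 0.999)`, `eps=1e-8`); the helper `adam_step` is hypothetical and only meant for illustration.

```python
import math


def adam_step(param, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter and return the new value."""
    state["t"] += 1
    # Exponential moving averages of the gradient and the squared gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


lr = 1e-5  # the fine-tuning learning rate used in this notebook's configuration

# First step with a small gradient and with a large gradient.
small_step = 0.0 - adam_step(0.0, 0.01, {"t": 0, "m": 0.0, "v": 0.0}, lr)
large_step = 0.0 - adam_step(0.0, 100.0, {"t": 0, "m": 0.0, "v": 0.0}, lr)

print(f"step for grad=0.01: {small_step:.3e}")
print(f"step for grad=100:  {large_step:.3e}")
```

On the first update the step size is close to `lr` whatever the gradient magnitude, so a learning rate of `1e-5` moves each weight only slightly per step, which suits adapting a pretrained model.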

In [3]:
def setup_model_and_optimizer(config: dict) -> tuple[TabPFNClassifier, Optimizer, dict]:
    """Initializes the TabPFN classifier, optimizer, and training configs."""
    print("--- 2. Model and Optimizer Setup ---")
    classifier_config = {
        "ignore_pretraining_limits": True,
        "device": config["device"],
        "n_estimators": 2,
        "random_state": config["random_seed"],
        "inference_precision": torch.float32,
    }
    classifier = TabPFNClassifier(
        **classifier_config, fit_mode="batched", differentiable_input=False
    )
    classifier._initialize_model_variables()
    # Optimizer uses finetuning-specific learning rate
    optimizer = Adam(
        classifier.model_.parameters(), lr=config["finetuning"]["learning_rate"]
    )

    print(f"Using device: {config['device']}")
    print(f"Optimizer: Adam, Finetuning LR: {config['finetuning']['learning_rate']}")
    print("----------------------------------\n")
    return classifier, optimizer, classifier_config

### 4\. Evaluation Function

To monitor our progress, we need a function to evaluate the model. This function clones the current state of the fine-tuning classifier, fits it on the training data, and evaluates its performance (ROC AUC and Log Loss) on the held-out test set. Because the test set never enters the fine-tuning loss, these metrics reflect generalization rather than fit to the training data.
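
For intuition about the Log Loss metric, here is a minimal pure-Python sketch of multiclass log loss: the mean negative log-probability that the model assigns to the true class. The helper `multiclass_log_loss` and the toy predictions are hypothetical; the notebook itself relies on `sklearn.metrics.log_loss`, and library implementations may additionally guard against probabilities of exactly zero.

```python
import math


def multiclass_log_loss(y_true, probabilities, classes):
    """Mean negative log-probability assigned to the true class of each sample."""
    column = {c: j for j, c in enumerate(classes)}
    total = 0.0
    for label, row in zip(y_true, probabilities):
        total += -math.log(row[column[label]])
    return total / len(y_true)


classes = [1, 2, 3]
y_true = [1, 2, 3, 1]

# Toy predicted probabilities, one row per sample, one column per class.
probs = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
]

loss = multiclass_log_loss(y_true, probs, classes)
# A model that always predicts the uniform distribution scores ln(number of classes).
uniform = multiclass_log_loss(y_true, [[1 / 3] * 3] * 4, classes)

print(f"toy model log loss: {loss:.4f}")
print(f"uniform baseline:   {uniform:.4f}")
```

Lower is better. For the seven Covertype classes, a uniform guess would score ln(7) ≈ 1.946, which gives a reference point for the values logged during fine-tuning.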

In [4]:
def evaluate_model(
    classifier: TabPFNClassifier,
    eval_config: dict,
    X_train: np.ndarray,
    y_train: np.ndarray,
    X_test: np.ndarray,
    y_test: np.ndarray,
) -> tuple[float, float]:
    """Evaluates the model's performance on the test set."""
    eval_classifier = clone_model_for_evaluation(
        classifier, eval_config, TabPFNClassifier
    )
    eval_classifier.fit(X_train, y_train)

    try:
        probabilities = eval_classifier.predict_proba(X_test)
        roc_auc = roc_auc_score(
            y_test, probabilities, multi_class="ovr", average="weighted"
        )
        log_loss_score = log_loss(y_test, probabilities)
    except Exception as e:
        print(f"An error occurred during evaluation: {e}")
        roc_auc, log_loss_score = np.nan, np.nan

    return roc_auc, log_loss_score

### 5\. Main Fine-tuning Workflow

Now we bring everything together.

1. **Configuration:** We define a master `config` dictionary that holds all hyperparameters and settings for the data, model, and fine-tuning process.
2. **Initialization:** We call our helper functions to prepare the data and initialize the model and optimizer.
3. **Data Loader:** We create preprocessed datasets and a `DataLoader` to efficiently feed batches of data to the model during the fine-tuning loop.
4. **Training Loop:** We loop for a specified number of epochs. In each epoch, we train the model on meta-batches from our dataloader. We evaluate the model's performance on the test set before fine-tuning begins (Epoch 0) and after each subsequent epoch.
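
The arithmetic behind this workflow can be sketched in plain Python, without TabPFN. The sketch assumes that the 70,000 training rows reported in the logged output are divided into chunks of `batch_size` rows; that chunking rule is an assumption inferred from the `7/7` progress bars, not a documented guarantee of `get_preprocessed_datasets`.

```python
import math

# Values taken from this notebook's configuration.
num_samples_to_use = 100_000
test_set_ratio = 0.3
n_inference_context_samples = 10_000
epochs = 10

# Training rows after the outer train/test split, as reported in the logged output.
n_train = 70_000

# Same expression as in the configuration cell.
batch_size = int(
    min(n_inference_context_samples, num_samples_to_use * (1 - test_set_ratio))
)

# Assumption: the training data is chunked into pieces of `batch_size` rows.
steps_per_epoch = math.ceil(n_train / batch_size)

# Epoch 0 only evaluates; every later epoch trains first and then evaluates.
schedule = []
for epoch in range(epochs + 1):
    if epoch > 0:
        schedule.append(f"train epoch {epoch} ({steps_per_epoch} steps)")
    schedule.append("evaluate (initial)" if epoch == 0 else f"evaluate after epoch {epoch}")

print(f"batch_size={batch_size}, steps_per_epoch={steps_per_epoch}")
print("\n".join(schedule[:5]))
```

Looping over `range(epochs + 1)` and skipping training at epoch 0 yields a baseline measurement of the pretrained model with the same evaluation code that is used after every fine-tuning epoch.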



In [None]:
# --- Master Configuration ---
config = {
    "device": "cuda" if torch.cuda.is_available() else "cpu",
    "num_samples_to_use": 100_000,
    "random_seed": 42,
    "test_set_ratio": 0.3,
    "n_inference_context_samples": 10000,
}
config["finetuning"] = {
    "epochs": 10,
    "learning_rate": 1e-5,
    "meta_batch_size": 1,
    "batch_size": int(
        min(
            config["n_inference_context_samples"],
            config["num_samples_to_use"] * (1 - config["test_set_ratio"]),
        )
    ),
}

# --- Setup Data, Model, and Dataloader ---
X_train, X_test, y_train, y_test = prepare_data(config)
classifier, optimizer, classifier_config = setup_model_and_optimizer(config)

splitter = partial(train_test_split, test_size=config["test_set_ratio"])
training_datasets = classifier.get_preprocessed_datasets(
    X_train, y_train, splitter, config["finetuning"]["batch_size"]
)
finetuning_dataloader = DataLoader(
    training_datasets,
    batch_size=config["finetuning"]["meta_batch_size"],
    collate_fn=meta_dataset_collator,
)
loss_function = torch.nn.NLLLoss()

eval_config = {
    **classifier_config,
    "inference_config": {"SUBSAMPLE_SAMPLES": config["n_inference_context_samples"]},
}

# --- Finetuning and Evaluation Loop ---
print("--- 3. Starting Finetuning & Evaluation ---")
for epoch in range(config["finetuning"]["epochs"] + 1):
    if epoch > 0:
        # Finetuning Step
        progress_bar = tqdm(finetuning_dataloader, desc=f"Finetuning Epoch {epoch}")
        for (
            X_train_batch,
            X_test_batch,
            y_train_batch,
            y_test_batch,
            cat_ixs,
            confs,
        ) in progress_bar:
            # Skip the batch if its train and test splits contain a different number of classes
            if len(np.unique(y_train_batch)) != len(np.unique(y_test_batch)):
                continue

            optimizer.zero_grad()
            classifier.fit_from_preprocessed(
                X_train_batch, y_train_batch, cat_ixs, confs
            )
            predictions = classifier.forward(X_test_batch)
            loss = loss_function(
                torch.log(predictions), y_test_batch.to(config["device"])
            )
            loss.backward()
            optimizer.step()

            # Set the postfix of the progress bar to show the current loss
            progress_bar.set_postfix(loss=f"{loss.item():.4f}")

    # Evaluation Step (runs before finetuning and after each epoch)
    epoch_roc, epoch_log_loss = evaluate_model(
        classifier, eval_config, X_train, y_train, X_test, y_test
    )

    status = "Initial" if epoch == 0 else f"Epoch {epoch}"
    print(
        f"📊 {status} Evaluation | Test ROC: {epoch_roc:.4f}, Test Log Loss: {epoch_log_loss:.4f}\n"
    )

print("--- ✅ Finetuning Finished ---")

--- 1. Data Preparation ---
Loaded and split data: 70000 train, 30000 test samples.
---------------------------

--- 2. Model and Optimizer Setup ---
Using device: cuda
Optimizer: Adam, Finetuning LR: 1e-05
----------------------------------

--- 3. Starting Finetuning & Evaluation ---
📊 Initial Evaluation | Test ROC: 0.9620, Test Log Loss: 0.3656



Finetuning Epoch 1: 100%|██████████| 7/7 [02:25<00:00, 20.82s/it, loss=0.3926]


📊 Epoch 1 Evaluation | Test ROC: 0.9686, Test Log Loss: 0.3288



Finetuning Epoch 2: 100%|██████████| 7/7 [02:24<00:00, 20.67s/it, loss=0.4172]


📊 Epoch 2 Evaluation | Test ROC: 0.9684, Test Log Loss: 0.3338



Finetuning Epoch 3: 100%|██████████| 7/7 [02:24<00:00, 20.68s/it, loss=0.3823]


📊 Epoch 3 Evaluation | Test ROC: 0.9693, Test Log Loss: 0.3273



Finetuning Epoch 4: 100%|██████████| 7/7 [02:24<00:00, 20.62s/it, loss=0.3708]


📊 Epoch 4 Evaluation | Test ROC: 0.9705, Test Log Loss: 0.3182



Finetuning Epoch 5: 100%|██████████| 7/7 [02:24<00:00, 20.68s/it, loss=0.3953]


📊 Epoch 5 Evaluation | Test ROC: 0.9703, Test Log Loss: 0.3189



Finetuning Epoch 6:  29%|██▊       | 2/7 [00:41<01:43, 20.67s/it, loss=0.3535]