# Introduction to **TabPFN** and **TabICL**

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/x-datascience-datacamp/datacamp-master/blob/main/11_tabular_foundational_models/01-tabpfn-tabicl.ipynb)

Authors: [Pedro L. C. Rodrigues](https://plcrodrigues.github.io) and [Thomas Moreau](https://tommoral.github.io)

- **TabPFN** : Hollmann et al. "Accurate predictions on small data with a tabular foundation model" (2025) [[link](https://www.nature.com/articles/s41586-024-08328-6)]
- **TabICL** : Qu et al. "TabICL: A Tabular Foundation Model for In-Context Learning on Large Data" (2025) [[link](https://arxiv.org/abs/2502.05564)]

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm

%matplotlib inline

The Python implementation of TabPFN is developed by [Prior Labs](https://docs.priorlabs.ai/overview) and follows the same API as `scikit-learn`.

Note, however, that using the latest version of the TabPFN foundation model requires authenticating with Hugging Face, which can be a bit messy. We will therefore focus on TabPFN-V2, which should be more than enough.

⚡ GPU recommended: for optimal performance, use a GPU (even older ones with ~8 GB VRAM work well; 16 GB is needed for some large datasets). On CPU, only small datasets (≲1000 samples) are feasible. Note that **this notebook can be run on Colab using the button at the top**.
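
A quick way to check which device is available before running the heavier cells is sketched below. It assumes PyTorch (which TabPFN builds on) is installed and falls back to CPU otherwise; `pick_device` is a hypothetical helper name, not part of TabPFN.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is visible to PyTorch, else "cpu"."""
    try:
        import torch  # assumption: PyTorch is installed alongside tabpfn
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```

Recent TabPFN releases accept a `device` argument on their estimators, so the value could be passed along explicitly; check the signature of the version you installed before relying on it.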

First of all, you need to install the package, as shown below:

In [2]:
!pip install -U tabpfn

Collecting tabpfn
  Downloading tabpfn-6.1.0-py3-none-any.whl.metadata (36 kB)
Collecting eval-type-backport>=0.2.2 (from tabpfn)
  Downloading eval_type_backport-0.3.1-py3-none-any.whl.metadata (2.4 kB)
Collecting tabpfn-common-utils>=0.2.8 (from tabpfn-common-utils[telemetry-interactive]>=0.2.8->tabpfn)
  Downloading tabpfn_common_utils-0.2.12-py3-none-any.whl.metadata (5.4 kB)
Collecting posthog~=6.7 (from tabpfn-common-utils>=0.2.8->tabpfn-common-utils[telemetry-interactive]>=0.2.8->tabpfn)
  Downloading posthog-6.9.3-py3-none-any.whl.metadata (6.0 kB)
Collecting requests (from huggingface-hub<2,>=0.19.0->tabpfn)
  Downloading requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Collecting backoff>=1.10.0 (from posthog~=6.7->tabpfn-common-utils>=0.2.8->tabpfn-common-utils[telemetry-interactive]>=0.2.8->tabpfn)
  Downloading backoff-2.2.1-py3-none-any.whl.metadata (14 kB)
Downloading tabpfn-6.1.0-py3-none-any.whl (553 kB)

# Regression with TabPFN

We investigate how TabPFN can be used for regression and compare its performance with that of other classic regressors.

In [3]:
from sklearn.utils import shuffle
from sklearn.model_selection import KFold, cross_val_score
from sklearn.datasets import fetch_california_housing, load_diabetes

from tabpfn import TabPFNRegressor
from tabpfn.constants import ModelVersion

# Load the Boston housing dataset from the UCI repository
cols = ["CRIM","ZN","INDUS","CHAS","NOX","RM","AGE","DIS","RAD","TAX","PTRATIO","B","LSTAT","MEDV"]
df_boston = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data",
                        sep=r'\s+', header=None, names=cols)

print(df_boston.shape)
df_boston.head()

(506, 14)


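| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |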


In [6]:
from huggingface_hub import login
login()


In [7]:
# Convert to pure numpy arrays
X, y = df_boston.drop(columns=["MEDV"]).values, df_boston["MEDV"].values

# Choose cross-validation strategy
cv = KFold(shuffle=True, n_splits=5)

# Instantiate TabPFN for regression
regressor_tabpfn = TabPFNRegressor()
#regressor_tabpfn = TabPFNRegressor.create_default_for_version(ModelVersion.V2)
regressor_tabpfn.n_estimators = 1

scores = []
for idx_train, idx_test in tqdm(cv.split(X, y)):
    X_train, y_train = X[idx_train], y[idx_train]
    X_test, y_test = X[idx_test], y[idx_test]
    regressor_tabpfn.fit(X_train, y_train)
    scores.append(regressor_tabpfn.score(X_test, y_test))
print(np.mean(scores))

0it [00:00, ?it/s]

tabpfn-v2.5-regressor-v2.5_default.ckpt:   0%|          | 0.00/40.8M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/35.0 [00:00<?, ?B/s]

5it [00:03,  1.40it/s]

0.9074041393406524
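
The number printed above is what `score` returns. For scikit-learn style regressors this is the coefficient of determination $R^2$, and the sketch below assumes `TabPFNRegressor` keeps that default. It recomputes $R^2$ by hand with NumPy on made-up values; `r2_score_manual` is an illustrative helper, not a library function.

```python
import numpy as np


def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # spread around the mean
    return 1.0 - ss_res / ss_tot


y_true = np.array([3.0, 5.0, 7.0, 9.0])  # made-up targets
r2_perfect = r2_score_manual(y_true, y_true)                     # exact predictions
r2_mean = r2_score_manual(y_true, np.full(4, y_true.mean()))     # always predict the mean
print(r2_perfect, r2_mean)  # → 1.0 0.0
```

A score of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the test targets.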





<div class="alert alert-success">
    <b>EXERCISE</b>:
     <ul>
         <li>What is happening in the <span><code>fit</code></span> method?</li>
     </ul>
</div>

Let's now see how a `RandomForestRegressor` and a `HistGradientBoostingRegressor` behave:

In [8]:
from sklearn.ensemble import RandomForestRegressor, HistGradientBoostingRegressor
regressor_rf = RandomForestRegressor()
regressor_hgbr = HistGradientBoostingRegressor()
est_dict = {'rf':regressor_rf, 'hgbr':regressor_hgbr}
for key, value in est_dict.items():
    scores = cross_val_score(value, X, y, cv=cv)
    print(key, np.mean(scores))

rf 0.8542893363591315
hgbr 0.8589642198354154


We see that TabPFN beats both baselines by quite a margin. However, it took much more time...

Let's now consider a different dataset.

In [9]:
from sklearn.datasets import fetch_california_housing
df_california, targets = fetch_california_housing(return_X_y=True, as_frame=True)
print(df_california.shape)
df_california.head()

(20640, 8)


| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 |


The dataset is bigger than the previous one, so let's see how TabPFN behaves.

In [10]:
from sklearn.model_selection import train_test_split # let's avoid cross-val for time sake
X, y = df_california.values, targets.values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
regressor_tabpfn.fit(X_train, y_train)
print(regressor_tabpfn.score(X_test, y_test))

0.874954307823786


<div class="alert alert-success">
    <b>EXERCISE</b>:
     <ul>
         <li>Why do you think we ran out of memory?</li>
     </ul>
</div>

One possible trick is to subsample the dataset and use an ensemble of TabPFN regressors, as below.

In [None]:
regressor_tabpfn.ignore_pretraining_limits = True
regressor_tabpfn.n_estimators = 1
regressor_tabpfn.inference_config = {"SUBSAMPLE_SAMPLES": 500}
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
regressor_tabpfn.fit(X_train, y_train)
print(regressor_tabpfn.score(X_test, y_test))
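
The idea of subsampling plus ensembling can also be written out by hand. The sketch below is not how TabPFN implements `SUBSAMPLE_SAMPLES` internally; it only illustrates the principle: fit several models on random row subsets and average their predictions. `Ridge` is a lightweight stand-in for `TabPFNRegressor`, the data is synthetic, and `subsample_ensemble_predict` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split


def subsample_ensemble_predict(make_model, X_train, y_train, X_test,
                               n_members=4, n_sub=500, seed=0):
    """Average the predictions of models fit on random row subsets."""
    rng = np.random.default_rng(seed)
    n_sub = min(n_sub, len(X_train))
    preds = []
    for _ in range(n_members):
        # draw n_sub distinct training rows for this ensemble member
        idx = rng.choice(len(X_train), size=n_sub, replace=False)
        model = make_model().fit(X_train[idx], y_train[idx])
        preds.append(model.predict(X_test))
    return np.mean(preds, axis=0)


X_demo, y_demo = make_regression(n_samples=3000, n_features=8, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.33, random_state=42)
y_hat = subsample_ensemble_predict(Ridge, X_tr, y_tr, X_te)  # Ridge: stand-in model
print(y_hat.shape)
```

Each member only ever sees `n_sub` rows, so memory per fit stays bounded, at the price of discarding part of the training set for every individual model.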

Or even better, use a GPU :-)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/x-datascience-datacamp/datacamp-master/blob/main/11_tabular_foundational_models/01-tabpfn-tabicl.ipynb)


In [11]:
regressor_rf = RandomForestRegressor()
regressor_hgbr = HistGradientBoostingRegressor()
est_dict = {'rf':regressor_rf, 'hgbr':regressor_hgbr}
for key, est in est_dict.items():
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
    est.fit(X_train, y_train)
    score = est.score(X_test, y_test)
    print(key, score)

rf 0.8050664968727209
hgbr 0.8374422154968123


In the slides, we saw that **TabICL** can, in principle, scale to any number of samples, due to the way that rows and columns are embedded in its architecture. So should we try to use it?

In [12]:
!pip install -U tabicl # watch out for the python version!

Collecting tabicl
  Downloading tabicl-0.1.4-py3-none-any.whl.metadata (13 kB)
Downloading tabicl-0.1.4-py3-none-any.whl (107 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 107.8/107.8 kB 5.0 MB/s eta 0:00:00
Installing collected packages: tabicl
Successfully installed tabicl-0.1.4


Checking the **TabICL** [documentation](https://github.com/soda-inria/tabicl) we notice that it currently does not work for regression... :'(

At least for now... ;)

<div class="alert alert-success">
    <b>EXERCISE</b>:
     <ul>
         <li>Why can TabPFN do regression out of the box whereas TabICL cannot?</li>
     </ul>
</div>

# Classification with TabPFN and TabICL

Let's switch to a classification problem, first with a small dataset and then with a big one.

In [13]:
from sklearn.datasets import fetch_openml
df, target = fetch_openml("titanic", version=1, as_frame=True, return_X_y=True)
X, y = df.values, target.values
df.head()

| | pclass | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0 | 0 | 0 | 24160 | 211.3375 | B5 | S | 2.0 | | St Louis, MO |
| 1 | 1 | Allison, Master. Hudson Trevor | male | 0.9167 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | 11.0 | | Montreal, PQ / Chesterville, ON |
| 2 | 1 | Allison, Miss. Helen Loraine | female | 2.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | | | Montreal, PQ / Chesterville, ON |
| 3 | 1 | Allison, Mr. Hudson Joshua Creighton | male | 30.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | | 135.0 | Montreal, PQ / Chesterville, ON |
| 4 | 1 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | | | Montreal, PQ / Chesterville, ON |


We saw in previous classes that we cannot simply plug the Titanic dataset into standard scikit-learn estimators: the data must first be pre-processed, categorical features encoded, etc. But what happens with TabPFN?
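
As a reminder of what that manual route involves, here is a minimal sketch with a `ColumnTransformer`. It runs on a tiny hand-made table (the values are invented and only mimic a few Titanic columns), and the choice of imputers, encoder, and `LogisticRegression` is an assumption made for illustration rather than the pipeline used in the previous classes. The TabPFN cell that follows, by contrast, receives the raw array directly.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny hand-made table that mimics a few Titanic columns (values are made up).
toy = pd.DataFrame({
    "sex": ["female", "male", "male", "female", "male", "female"],
    "embarked": ["S", "C", np.nan, "S", "Q", "C"],
    "age": [29.0, np.nan, 40.0, 2.0, 35.0, 58.0],
    "fare": [211.3, 7.9, 8.1, 151.6, 26.0, 30.5],
})
survived = np.array([1, 0, 0, 1, 0, 1])

categorical = ["sex", "embarked"]
numerical = ["age", "fare"]

preprocess = ColumnTransformer([
    # categorical columns: fill missing values, then one-hot encode
    ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                          OneHotEncoder(handle_unknown="ignore")), categorical),
    # numerical columns: fill missing values with the median, then standardize
    ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), numerical),
])

clf = make_pipeline(preprocess, LogisticRegression())
clf.fit(toy, survived)
toy_pred = clf.predict(toy)
print(toy_pred)
```

Every column type needs an explicit decision here (how to impute, how to encode), which is exactly the bookkeeping that tabular foundation models aim to reduce.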

In [14]:
from sklearn.model_selection import StratifiedKFold
cv = StratifiedKFold(shuffle=True, n_splits=5)

# Instantiate TabPFN for classification
from tabpfn import TabPFNClassifier
clf_tabpfn = TabPFNClassifier.create_default_for_version(ModelVersion.V2)
clf_tabpfn.n_estimators = 1

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
clf_tabpfn.fit(X_train, y_train)
print(clf_tabpfn.score(X_test, y_test))

tabpfn-v2-classifier-finetuned-zk73skhh.(…):   0%|          | 0.00/29.0M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/37.0 [00:00<?, ?B/s]

0.9699074074074074


What about TabICL?

In [15]:
from tabicl import TabICLClassifier
clf_icl = TabICLClassifier(n_estimators=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
clf_icl.fit(X_train, y_train)
print(clf_icl.score(X_test, y_test))

INFO: You are downloading 'tabicl-classifier-v1.1-0506.ckpt', the latest best-performing version of TabICL.
To reproduce results from the original paper, please use 'tabicl-classifier-v1-0208.ckpt'.

Checkpoint 'tabicl-classifier-v1.1-0506.ckpt' not cached.
 Downloading from Hugging Face Hub (jingang/TabICL-clf).



tabicl-classifier-v1.1-0506.ckpt:   0%|          | 0.00/108M [00:00<?, ?B/s]

ValueError: could not convert string to float: 'Collett, Mr. Sidney C Stuart'

The documentation can help us: https://github.com/soda-inria/tabicl?tab=readme-ov-file#basic-integration

In [17]:
!pip install -U skrub

Collecting skrub
  Downloading skrub-0.7.0-py3-none-any.whl.metadata (4.4 kB)
Downloading skrub-0.7.0-py3-none-any.whl (498 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 498.3/498.3 kB 14.0 MB/s eta 0:00:00
Installing collected packages: skrub
Successfully installed skrub-0.7.0


In [30]:
from skrub import TableVectorizer
from tabicl import TabICLClassifier
from sklearn.pipeline import make_pipeline

pipeline = make_pipeline(
    TableVectorizer(),  # automatically handles various data types
    TabICLClassifier(n_estimators=8)  # beware of the default parameters!
)

X_train, X_test, y_train, y_test = train_test_split(df, target, test_size=0.33, random_state=42) # note that we pass the dataframe!
pipeline.fit(X_train, y_train)
print(pipeline.score(X_test, y_test))

1.0


<div class="alert alert-success">
    <b>EXERCISE</b>:
     <ul>
         <li>Why can TabPFN handle categorical features directly, whereas TabICL needs a pipeline?</li>
     </ul>
</div>

In [31]:
from skrub import tabular_pipeline
est = tabular_pipeline('classifier')
X_train, X_test, y_train, y_test = train_test_split(df, target, test_size=0.33, random_state=42) # note that we pass the dataframe!
est.fit(X_train, y_train)
print(est.score(X_test, y_test))

1.0


Let's now consider a larger dataset and see how **TabICL** behaves.

In [20]:
import pandas as pd
from pathlib import Path
from urllib.request import urlretrieve

DATA_DIR = Path().parent / "data"

url = ("https://archive.ics.uci.edu/ml/machine-learning-databases"
       "/adult/adult.data")

local_filename =  DATA_DIR/ Path(url).name
if not local_filename.exists():
    print("Downloading Adult Census datasets from UCI")
    DATA_DIR.mkdir(exist_ok=True)
    urlretrieve(url, local_filename)

names = ("age, workclass, fnlwgt, education, education-num, "
         "marital-status, occupation, relationship, race, sex, "
         "capital-gain, capital-loss, hours-per-week, "
         "native-country, income").split(', ')
df = pd.read_csv(local_filename, names=names)
df = df.rename(columns={'income': 'class'})

columns_to_plot = [
    "age",
    "education-num",
    "capital-loss",
    "capital-gain",
    "hours-per-week",
    "class",
]
df = df[columns_to_plot]
print(df.shape)
df.head()

Downloading Adult Census datasets from UCI
(32561, 6)


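| | age | education-num | capital-loss | capital-gain | hours-per-week | class |
|---|---|---|---|---|---|---|
| 0 | 39 | 13 | 0 | 2174 | 40 | <=50K |
| 1 | 50 | 13 | 0 | 0 | 13 | <=50K |
| 2 | 38 | 9 | 0 | 0 | 40 | <=50K |
| 3 | 53 | 7 | 0 | 0 | 40 | <=50K |
| 4 | 28 | 13 | 0 | 0 | 40 | <=50K |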


In [21]:
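target_name = "class"
target = df[target_name]
# All columns except the last one ("class") are used as features
X, y = df.iloc[:, :-1].values, target.values
# Labels in adult.data keep a leading space because fields are separated by ", "
y = (y == ' <=50K').astype(np.int8)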

In [22]:
clf_icl = TabICLClassifier(n_estimators=4)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
clf_icl.fit(X_train, y_train)
print(clf_icl.score(X_test, y_test))

0.8214219244369998


In [23]:
from sklearn.ensemble import HistGradientBoostingClassifier

clf_hgbr = HistGradientBoostingClassifier()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
clf_hgbr.fit(X_train, y_train)
print(clf_hgbr.score(X_test, y_test))

0.8397543271915131


In [24]:
!pip install uv
!uv pip install autogluon.tabular[mitra]


Collecting uv
  Downloading uv-0.9.18-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Downloading uv-0.9.18-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (22.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 22.2/22.2 MB 68.1 MB/s eta 0:00:00
Installing collected packages: uv
Successfully installed uv-0.9.18
Using Python 3.12.12 environment at: /usr
Resolved 71 packages in 601ms
Prepared 15 packages in 30.30s
Uninstalled 5 packages in 1.20s
Installed 15 packages in 545ms
 + autogluon-common==1.4.0
 + autogluon-core==1.4.0
 + autogluon-features==1.4.0
 + autogluon-tabular==1.4.0
 + boto3==1.42.12
 + botocore==1.42.12
 + einx

In [29]:
import pandas as pd
from autogluon.tabular import TabularDataset, TabularPredictor
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing

# Load datasets
california_data = fetch_california_housing(as_frame=True)
california_df = california_data.frame
california_df['MedHouseVal'] = california_data.target

print("Dataset shapes:")
print(f"California Housing: {california_df.shape}")

# Create train/test splits (80/20)
california_train, california_test = train_test_split(california_df, test_size=0.2, random_state=42)

print("Training set sizes:")
print(f"California Housing: {len(california_train)} samples")

# Convert to TabularDataset
california_train_data = TabularDataset(california_train)
california_test_data = TabularDataset(california_test)

# Create predictor with Mitra
print("Training Mitra regressor on California housing dataset...")
mitra_predictor = TabularPredictor(label='MedHouseVal', problem_type='regression')
mitra_predictor.fit(
    california_train_data,
    hyperparameters={
        'MITRA': {'fine_tune': False}
    },
   )

print("\nMitra training completed!")

# Make predictions
mitra_predictions = mitra_predictor.predict(california_test_data)
print("Sample Mitra predictions:")
print(mitra_predictions.head(10))

# Show model leaderboard
print("\nMitra Model Leaderboard:")
mitra_predictor.leaderboard(california_test_data)

No path specified. Models will be saved in: "AutogluonModels/ag-20251218_090244"
Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.4.0
Python Version:     3.12.12
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Thu Oct  2 10:42:05 UTC 2025
CPU Count:          2
Memory Avail:       8.94 GB / 12.67 GB (70.5%)
Disk Space Avail:   69.27 GB / 112.64 GB (61.5%)
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='extreme' : New in v1.4: Massively better than 'best' on datasets <30000 samples by using new models meta-learned on https://tabarena.ai: TabPFNv2, TabICL, Mitra, and TabM. Absolute best accuracy. Requires a GPU. Recommended 64 GB CPU memory and 32+ GB GPU memory.
	presets='best'    : Maximize accuracy. Recommended for most us

Dataset shapes:
California Housing: (20640, 9)
Training set sizes:
California Housing: 16512 samples
Training Mitra regressor on California housing dataset...


Beginning AutoGluon training ...
AutoGluon will save models to "/content/AutogluonModels/ag-20251218_090244"
Train Data Rows:    16512
Train Data Columns: 8
Label Column:       MedHouseVal
Problem Type:       regression
Preprocessing data ...
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    9159.24 MB
	Train Data (Original)  Memory Usage: 1.01 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Stage 5 Generators:
		Fitting DropDuplicatesFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('float', []) : 8 | ['MedInc', 'HouseA

RuntimeError: No models were trained successfully during fit(). Inspect the log output or increase verbosity to determine why no models were fit. Alternatively, set `raise_on_no_models_fitted` to False during the fit call.