Experiment tracking lets data scientists and researchers manage and reproduce their machine learning experiments. Recording the hyperparameters, model architecture, and training data of each run makes the results easier to interpret and compare. A centralized repository of runs and their metadata also helps team members collaborate and share what they learn. Tracking helps with debugging as well, because it shows which settings or conditions led to a successful or unsuccessful outcome. In short, experiment tracking supports transparency, reproducibility, and continuous improvement in machine learning workflows.

Now let's see how we can get all these benefits with almost no extra code in PyTorch Tabular using Weights & Biases.

In [1]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
import random
from pytorch_tabular.utils import load_covertype_dataset, print_metrics
import pandas as pd
import wandb

# %load_ext autoreload
# %autoreload 2

In [2]:
wandb.login()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: manujosephv. Use `wandb login --relogin` to force relogin


True

In [3]:
data, cat_col_names, num_col_names, target_col = load_covertype_dataset()
train, test = train_test_split(data, random_state=42)
train, val = train_test_split(train, random_state=42)

## Importing the Library

In [4]:
from pytorch_tabular import TabularModel
from pytorch_tabular.models import (
    CategoryEmbeddingModelConfig,
    FTTransformerConfig,
    TabNetModelConfig,
    GANDALFConfig,
)
from pytorch_tabular.config import (
    DataConfig,
    OptimizerConfig,
    TrainerConfig,
    ExperimentConfig,
)
from pytorch_tabular.models.common.heads import LinearHeadConfig

## Common Configs

In [5]:
data_config = DataConfig(
    # The target should always be a list. Multiple targets are supported only for regression;
    # multi-task classification is not implemented.
    target=[target_col],
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)
trainer_config = TrainerConfig(
    auto_lr_find=True,  # Runs the LRFinder to automatically derive a learning rate
    batch_size=1024,
    max_epochs=100,
    early_stopping="valid_loss",  # Monitor valid_loss for early stopping
    early_stopping_mode="min",  # Use "min" because a lower valid_loss is better
    early_stopping_patience=5,  # Number of epochs without improvement to wait before stopping
    checkpoints="valid_loss",  # Save the best checkpoint, as measured by valid_loss
    load_best=True,  # After training, load the best checkpoint
)
optimizer_config = OptimizerConfig()

head_config = LinearHeadConfig(
    layers="",  # No additional layer in head, just a mapping layer to output_dim
    dropout=0.1,
    initialization="kaiming",
).__dict__  # Convert to dict to pass to the model config (OmegaConf doesn't accept objects)

EXP_PROJECT_NAME = "pytorch-tabular-covertype"
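To make the `early_stopping_mode="min"` and `early_stopping_patience=5` settings concrete, here is a minimal pure-Python sketch of a patience rule. It is an illustration only, not the callback that PyTorch Tabular or PyTorch Lightning actually runs; the function name `early_stopping_epoch` and the `min_delta` parameter are hypothetical names chosen for this example.

```python
def early_stopping_epoch(valid_losses, patience=5, min_delta=0.0):
    """Return the 0-based epoch index at which training would stop, or None.

    Illustrative patience rule for a metric where lower is better:
    stop once `patience` consecutive epochs fail to beat the best loss so far.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(valid_losses):
        if loss < best - min_delta:
            # New best value: remember it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None  # Ran through every epoch without triggering the rule.


# Hypothetical validation losses, made up for this example.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.70, 0.73, 0.74, 0.9]
print(early_stopping_epoch(losses, patience=5))
```

With `load_best=True`, the weights restored after training come from the best checkpoint rather than from the epoch at which the rule fired.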


## Category Embedding Model

In [6]:
model_config = CategoryEmbeddingModelConfig(
    task="classification",
    layers="1024-512-512",  # Number of nodes in each layer
    activation="LeakyReLU",  # Activation between layers
    learning_rate=1e-3,
    head="LinearHead",  # Linear Head
    head_config=head_config,  # Linear Head Config
)

experiment_config = ExperimentConfig(
    project_name=EXP_PROJECT_NAME,
    run_name="CategoryEmbeddingModel",
    exp_watch="gradients",
    log_target="wandb",
    log_logits=True,
)

tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
    experiment_config=experiment_config,
    verbose=False,
    suppress_lightning_logger=True,
)
tabular_model.fit(train=train, validation=val)



wandb: logging graph, to disable use `wandb.watch(log_graph=False)`
/home/manujosephv/miniconda3/envs/lightning_upgrade/lib/python3.11/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:639: Checkpoint directory saved_models exists and is not empty.
/home/manujosephv/miniconda3/envs/lightning_upgrade/lib/python3.11/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=19` in the `DataLoader` to improve performance.
/home/manujosephv/miniconda3/envs/lightning_upgrade/lib/python3.11/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=19` in the `DataLoader` to improve performance.

In [7]:
result = tabular_model.evaluate(test)

/home/manujosephv/miniconda3/envs/lightning_upgrade/lib/python3.11/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=19` in the `DataLoader` to improve performance.


In [8]:
# Although the run should finish automatically, it is safer to call wandb.finish() explicitly before starting a new experiment
wandb.finish()



Run history: sparklines for epoch, test_accuracy, test_loss, train_accuracy, train_loss, trainer/global_step, valid_accuracy, and valid_loss (not reproduced here).

Run summary:

| Metric | Value |
|---|---|
| epoch | 52.0 |
| test_accuracy | 0.91597 |
| test_loss | 0.2139 |
| train_accuracy | 0.91938 |
| train_loss | 0.20502 |
| trainer/global_step | 16640.0 |
| valid_accuracy | 0.89782 |
| valid_loss | 0.24614 |
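The `trainer/global_step` value in the summary can be sanity-checked against `batch_size=1024` and the two `train_test_split` calls. The sketch below assumes that `load_covertype_dataset()` returns the full 581,012-row Covertype dataset, that both splits use scikit-learn's default test fraction of 0.25 (rounded up), and that the final partial batch of each epoch is kept. None of these is stated in the notebook, so treat the numbers as a consistency check rather than a specification.

```python
import math

N_ROWS = 581_012      # Assumption: size of the full Covertype dataset
BATCH_SIZE = 1024     # From TrainerConfig above
TEST_FRACTION = 0.25  # Assumption: scikit-learn's default test_size


def split_sizes(n_rows, test_fraction=TEST_FRACTION):
    """Return (n_train, n_test), rounding the test part up."""
    n_test = math.ceil(n_rows * test_fraction)
    return n_rows - n_test, n_test


n_train_and_val, n_test = split_sizes(N_ROWS)   # first split: train+val vs. test
n_train, n_val = split_sizes(n_train_and_val)   # second split: train vs. val

# One optimizer step per batch; the last partial batch is assumed to be kept.
steps_per_epoch = math.ceil(n_train / BATCH_SIZE)

print(f"train={n_train}, val={n_val}, test={n_test}")
print(f"steps per epoch: {steps_per_epoch}")
print(f"epochs implied by global_step=16640: {16640 / steps_per_epoch}")
```

Under these assumptions the implied epoch count matches the `epoch` value logged in the run summary.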


## FT Transformer

In [9]:
model_config = FTTransformerConfig(
    task="classification",
    num_attn_blocks=3,
    num_heads=4,
    learning_rate=1e-3,
    head="LinearHead",  # Linear Head
    head_config=head_config,  # Linear Head Config
)

experiment_config = ExperimentConfig(
    project_name=EXP_PROJECT_NAME,
    run_name="FTTransformer",
    exp_watch="gradients",
    log_target="wandb",
    log_logits=True,
)
tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
    experiment_config=experiment_config,
    verbose=False,
    suppress_lightning_logger=True,
)
tabular_model.fit(train=train, validation=val)




In [10]:
result = tabular_model.evaluate(test)



In [11]:
wandb.finish()

Run history: sparklines for epoch, test_accuracy, test_loss, train_accuracy, train_loss, trainer/global_step, valid_accuracy, and valid_loss (not reproduced here).

Run summary:

| Metric | Value |
|---|---|
| epoch | 47.0 |
| test_accuracy | 0.91207 |
| test_loss | 0.21334 |
| train_accuracy | 0.88804 |
| train_loss | 0.23555 |
| trainer/global_step | 15040.0 |
| valid_accuracy | 0.91161 |
| valid_loss | 0.21692 |


## GANDALF

In [12]:
model_config = GANDALFConfig(
    task="classification",
    gflu_stages=10,
    learning_rate=1e-3,
    head="LinearHead",  # Linear Head
    head_config=head_config,  # Linear Head Config
)

experiment_config = ExperimentConfig(
    project_name=EXP_PROJECT_NAME,
    run_name="GANDALF",
    exp_watch="gradients",
    log_target="wandb",
    log_logits=True,
)
tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
    experiment_config=experiment_config,
    verbose=False,
    suppress_lightning_logger=True,
)
tabular_model.fit(train=train, validation=val)




In [13]:
result = tabular_model.evaluate(test)



In [14]:
wandb.finish()



Run history: sparklines for epoch, test_accuracy, test_loss, train_accuracy, train_loss, trainer/global_step, valid_accuracy, and valid_loss (not reproduced here).

Run summary:

| Metric | Value |
|---|---|
| epoch | 10.0 |
| test_accuracy | 0.86695 |
| test_loss | 0.32519 |
| train_accuracy | 0.86358 |
| train_loss | 0.42013 |
| trainer/global_step | 3200.0 |
| valid_accuracy | 0.86707 |
| valid_loss | 0.32778 |


## TabNet Model

In [15]:
model_config = TabNetModelConfig(
    task="classification",
    learning_rate=1e-5,
    n_d=16,
    n_a=16,
    n_steps=4,
    head="LinearHead",  # Linear Head
    head_config=head_config,  # Linear Head Config
)

experiment_config = ExperimentConfig(
    project_name=EXP_PROJECT_NAME,
    run_name="TabNet",
    exp_watch="gradients",
    log_target="wandb",
    log_logits=True,
)
tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
    experiment_config=experiment_config,
    verbose=False,
    suppress_lightning_logger=True,
)
tabular_model.fit(train=train, validation=val)




In [16]:
result = tabular_model.evaluate(test)



In [17]:
wandb.finish()



Run history: sparklines for epoch, test_accuracy, test_loss, train_accuracy, train_loss, trainer/global_step, valid_accuracy, and valid_loss (not reproduced here).

Run summary:

| Metric | Value |
|---|---|
| epoch | 6.0 |
| test_accuracy | 0.71867 |
| test_loss | 0.67711 |
| train_accuracy | 0.70094 |
| train_loss | 0.7142 |
| trainer/global_step | 1920.0 |
| valid_accuracy | 0.65582 |
| valid_loss | 0.95996 |
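With all four runs finished, the summaries can be compared side by side. The sketch below simply copies the test metrics reported in the run summaries of this notebook into a list and ranks them by test accuracy. The variable names are illustrative and nothing here queries Weights & Biases.

```python
# Test metrics copied from the four run summaries in this notebook.
results = [
    {"model": "CategoryEmbeddingModel", "epoch": 52, "test_accuracy": 0.91597, "test_loss": 0.2139},
    {"model": "FTTransformer", "epoch": 47, "test_accuracy": 0.91207, "test_loss": 0.21334},
    {"model": "GANDALF", "epoch": 10, "test_accuracy": 0.86695, "test_loss": 0.32519},
    {"model": "TabNet", "epoch": 6, "test_accuracy": 0.71867, "test_loss": 0.67711},
]

# Rank the runs from highest to lowest test accuracy.
ranking = sorted(results, key=lambda r: r["test_accuracy"], reverse=True)

for rank, r in enumerate(ranking, start=1):
    print(
        f"{rank}. {r['model']:<24} "
        f"test_accuracy={r['test_accuracy']:.4f} "
        f"test_loss={r['test_loss']:.4f} "
        f"epoch={r['epoch']}"
    )
```

Each model was trained once with the settings shown above, so the ranking reflects these particular runs rather than a tuned comparison.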


## Accessing the Experiments

The runs are available at https://wandb.ai/manujosephv/pytorch-tabular-covertype/
![](imgs/wandb_preview_1.png)

We can also inspect gradient flows in each component of the model for debugging purposes.
![](imgs/wandb_preview_2.png)