# Hyperparameter Tuning
*(Note: This notebook runs significantly faster if you have access to a GPU. Use the GPUHub, Google Colab, or your own GPU.)*

In this project, you will optimize the hyperparameters of a model in 3 stages.
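The Week 2 `run_dict` in the training cell below is a full grid: three learning rates × three warmup lengths × two weight-decay values, with the schedule, batch sizes and clipping held fixed, giving 18 runs numbered 21 to 38. As a sketch, such a grid can be generated instead of typed out. This assumes the same run-ID order as the notebook (learning rate outermost, weight decay innermost); `build_grid` is a hypothetical helper, not part of the notebook:

```python
from itertools import product


def build_grid(learning_rates, warmup_steps, weight_decays, fixed, start_id):
    """Hypothetical helper: one run per combination, learning rate varying slowest."""
    grid = {}
    combos = product(learning_rates, warmup_steps, weight_decays)
    for run_id, (lr, ws, wd) in enumerate(combos, start=start_id):
        # Merge the swept values into the settings that stay constant across the stage
        grid[run_id] = {**fixed, "learning_rate": lr, "warmup_steps": ws, "weight_decay": wd}
    return grid


# Values read off the Week 2 run_dict; the other hyperparameters stay fixed.
fixed = {"lr_schedule_type": "cosine", "train_batch_size": 16, "eval_batch_size": 32, "gradient_clip_val": 0.5}
week2 = build_grid([1e-5, 2e-5, 3e-5], [10, 30, 50], [0.005, 0.01], fixed, start_id=21)
print(len(week2), week2[21])
```

The resulting dictionary has the same shape as `run_dict`, so it could be fed to the same training loop.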

## Paraphrase Detection
We finetune [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on [MRPC](https://huggingface.co/datasets/glue/viewer/mrpc/train), a paraphrase detection dataset. This notebook is adapted from a [PyTorch Lightning example](https://lightning.ai/docs/pytorch/1.9.5/notebooks/lightning_examples/text-transformers.html).

In [1]:
%pip install -q torch transformers lightning datasets wandb evaluate ipywidgets

Note: you may need to restart the kernel to use updated packages.


The next five cells are:
* Imports.
* The `wandb.login()` call, which authenticates with Weights & Biases so that runs can be logged.
* The `GLUEDataModule`, which loads the task's dataset and creates the dataloaders for the train and validation sets.
* The `GLUETransformer`, which implements the model forward pass and the training/validation steps. Its `self.log` calls show what is logged.
* The last cell, which runs training once for every hyperparameter configuration in `run_dict`.
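The validation metrics logged in `on_validation_epoch_end` come from `evaluate.load("glue", "mrpc")`, which reports accuracy and F1. For intuition, here is a minimal stdlib sketch of those two numbers. `accuracy_and_f1` is a hypothetical helper, not part of `evaluate`, and it assumes binary labels where 1 marks a paraphrase and is treated as the positive class:

```python
def accuracy_and_f1(preds, labels, positive=1):
    """Accuracy and binary F1 over two equal-length label sequences."""
    if len(preds) != len(labels) or not preds:
        raise ValueError("preds and labels must be non-empty and of equal length")
    correct = sum(p == y for p, y in zip(preds, labels))
    tp = sum(p == positive and y == positive for p, y in zip(preds, labels))
    fp = sum(p == positive and y != positive for p, y in zip(preds, labels))
    fn = sum(p != positive and y == positive for p, y in zip(preds, labels))
    # F1 = 2TP / (2TP + FP + FN); this sketch returns 0.0 when there are no positives at all
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"accuracy": correct / len(labels), "f1": f1}


print(accuracy_and_f1([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))  # accuracy 0.6, f1 = 2/3
```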

In [2]:
from datetime import datetime
from typing import Optional

import wandb
import datasets
import evaluate
import lightning as L
import torch
from lightning.pytorch.loggers import WandbLogger
from torch.utils.data import DataLoader
from transformers import (
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    get_linear_schedule_with_warmup,
    get_constant_schedule_with_warmup,
    get_cosine_schedule_with_warmup
)
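The three `transformers` schedule helpers imported here are the options for the `lr_schedule_type` hyperparameter. To see what `warmup_steps` and the schedule type do without training a model, the function below re-derives their learning-rate multipliers in plain Python. It follows the formulas of `get_linear_schedule_with_warmup`, `get_cosine_schedule_with_warmup` (default half cosine wave) and `get_constant_schedule_with_warmup`, but it is a reimplementation for inspection, not the library code, and `lr_multiplier` is a hypothetical name:

```python
import math


def lr_multiplier(step, schedule, warmup_steps, total_steps):
    """Factor applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # All three schedules ramp linearly from 0 to 1 during warmup.
        return step / max(1, warmup_steps)
    if schedule == "constant":
        return 1.0
    if schedule == "linear":
        # Linear decay from 1 right after warmup to 0 at total_steps.
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    if schedule == "cosine":
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        # Half a cosine wave: 1 right after warmup, 0 at total_steps.
        return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
    raise ValueError(f"unknown schedule: {schedule!r}")


# MRPC train split: 3668 examples at batch size 16 -> ceil(3668 / 16) = 230 steps per epoch.
total_steps = math.ceil(3668 / 16) * 3  # 3 epochs -> 690 optimizer steps
base_lr = 2e-5
for step in (0, 25, 50, 370, 690):
    lrs = {s: base_lr * lr_multiplier(step, s, 50, total_steps) for s in ("linear", "cosine", "constant")}
    print(step, lrs)
```

Here `total_steps` stands in for `trainer.estimated_stepping_batches`. With roughly 690 optimizer steps per run, the warmup lengths swept in Week 2 (10, 30, 50) cover at most about 7% of training.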

In [None]:
wandb.login()

[34m[1mwandb[0m: Currently logged in as: [33mjan-wahli[0m ([33mjan-wahli-hochschule-luzern[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


True

In [4]:
class GLUEDataModule(L.LightningDataModule):
    task_text_field_map = {
        "cola": ["sentence"],
        "sst2": ["sentence"],
        "mrpc": ["sentence1", "sentence2"],
        "qqp": ["question1", "question2"],
        "stsb": ["sentence1", "sentence2"],
        "mnli": ["premise", "hypothesis"],
        "qnli": ["question", "sentence"],
        "rte": ["sentence1", "sentence2"],
        "wnli": ["sentence1", "sentence2"],
        "ax": ["premise", "hypothesis"],
    }

    glue_task_num_labels = {
        "cola": 2,
        "sst2": 2,
        "mrpc": 2,
        "qqp": 2,
        "stsb": 1,
        "mnli": 3,
        "qnli": 2,
        "rte": 2,
        "wnli": 2,
        "ax": 3,
    }

    loader_columns = [
        "datasets_idx",
        "input_ids",
        "token_type_ids",
        "attention_mask",
        "start_positions",
        "end_positions",
        "labels",
    ]

    def __init__(
        self,
        model_name_or_path: str,
        task_name: str = "mrpc",
        max_seq_length: int = 128,
        train_batch_size: int = 32,
        eval_batch_size: int = 32,
        **kwargs,
    ):
        super().__init__()
        self.model_name_or_path = model_name_or_path
        self.task_name = task_name
        self.max_seq_length = max_seq_length
        self.train_batch_size = train_batch_size
        self.eval_batch_size = eval_batch_size

        self.text_fields = self.task_text_field_map[task_name]
        self.num_labels = self.glue_task_num_labels[task_name]
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name_or_path, use_fast=True)

    def setup(self, stage: str):
        self.dataset = datasets.load_dataset("glue", self.task_name)

        for split in self.dataset.keys():
            self.dataset[split] = self.dataset[split].map(
                self.convert_to_features,
                batched=True,
                remove_columns=["label"],
            )
            self.columns = [c for c in self.dataset[split].column_names if c in self.loader_columns]
            self.dataset[split].set_format(type="torch", columns=self.columns)

        self.eval_splits = [x for x in self.dataset.keys() if "validation" in x]

    def prepare_data(self):
        datasets.load_dataset("glue", self.task_name)
        AutoTokenizer.from_pretrained(self.model_name_or_path, use_fast=True)

    def train_dataloader(self):
        return DataLoader(self.dataset["train"], batch_size=self.train_batch_size, shuffle=True)

    def val_dataloader(self):
        if len(self.eval_splits) == 1:
            return DataLoader(self.dataset["validation"], batch_size=self.eval_batch_size)
        elif len(self.eval_splits) > 1:
            return [DataLoader(self.dataset[x], batch_size=self.eval_batch_size) for x in self.eval_splits]

    def test_dataloader(self):
        if len(self.eval_splits) == 1:
            return DataLoader(self.dataset["test"], batch_size=self.eval_batch_size)
        elif len(self.eval_splits) > 1:
            return [DataLoader(self.dataset[x], batch_size=self.eval_batch_size) for x in self.eval_splits]

    def convert_to_features(self, example_batch, indices=None):
        # Either encode single sentence or sentence pairs
        if len(self.text_fields) > 1:
            texts_or_text_pairs = list(zip(example_batch[self.text_fields[0]], example_batch[self.text_fields[1]]))
        else:
            texts_or_text_pairs = example_batch[self.text_fields[0]]

        # Tokenize the text/text pairs
        features = self.tokenizer.batch_encode_plus(
            texts_or_text_pairs, max_length=self.max_seq_length, padding="max_length", truncation=True
        )

        # Rename label to labels to make it easier to pass to model forward
        features["labels"] = example_batch["label"]

        return features
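`convert_to_features` passes `(sentence1, sentence2)` tuples to the tokenizer with `padding="max_length"` and `truncation=True`. The sketch below mimics, in simplified form, what that produces for a single pair. It works on already-tokenized ids and assumes BERT-style special tokens (`[CLS]` = 101, `[SEP]` = 102, `[PAD]` = 0, the ids in the uncased BERT/DistilBERT vocabulary). Truncation is a longest-first variant that trims the currently longer segment one token at a time; the tie-breaking rule is this sketch's own choice. `encode_pair` is a hypothetical helper:

```python
def encode_pair(ids_a, ids_b, max_length=128, cls_id=101, sep_id=102, pad_id=0):
    """Simplified pair encoding: [CLS] a [SEP] b [SEP], longest-first truncation, right padding."""
    a, b = list(ids_a), list(ids_b)
    budget = max_length - 3  # room left after [CLS] and two [SEP] tokens
    if budget < 0:
        raise ValueError("max_length too small for the special tokens")
    # Longest-first: drop the last token of whichever segment is longer (ties trim segment a).
    while len(a) + len(b) > budget:
        if len(a) >= len(b):
            a.pop()
        else:
            b.pop()
    input_ids = [cls_id] + a + [sep_id] + b + [sep_id]
    attention_mask = [1] * len(input_ids)  # 1 = real token
    padding = max_length - len(input_ids)
    input_ids += [pad_id] * padding
    attention_mask += [0] * padding  # 0 = padding, ignored by attention
    return {"input_ids": input_ids, "attention_mask": attention_mask}


print(encode_pair([7, 8, 9, 10, 11], [20, 21], max_length=8))
```

DistilBERT does not use segment embeddings, which is why its tokenizer output has no `token_type_ids` and the `loader_columns` filter simply skips that column.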

In [5]:
class GLUETransformer(L.LightningModule):
    def __init__(
        self,
        model_name_or_path: str,
        num_labels: int,
        task_name: str,
        learning_rate: float = 2e-5,
        lr_schedule_type: str = "linear",
        warmup_steps: int = 0,
        weight_decay: float = 0.0,
        train_batch_size: int = 32,
        eval_batch_size: int = 32,
        gradient_clip_val: float = 1.0,
        eval_splits: Optional[list] = None,
        **kwargs,
    ):
        super().__init__()

        self.save_hyperparameters()

        self.config = AutoConfig.from_pretrained(model_name_or_path, num_labels=num_labels)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name_or_path, config=self.config)
        self.metric = evaluate.load(
            "glue", self.hparams.task_name, experiment_id=datetime.now().strftime("%d-%m-%Y_%H-%M-%S")
        )

        self.validation_step_outputs = []

    def forward(self, **inputs):
        return self.model(**inputs)

    def training_step(self, batch, batch_idx):
        outputs = self(**batch)
        loss = outputs[0]
        return loss

    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        outputs = self(**batch)
        val_loss, logits = outputs[:2]

        if self.hparams.num_labels > 1:
            preds = torch.argmax(logits, axis=1)
        elif self.hparams.num_labels == 1:
            preds = logits.squeeze()

        labels = batch["labels"]
        # Tag each batch with its dataloader index so multi-split tasks (MNLI) can be grouped per split
        self.validation_step_outputs.append(
            {"loss": val_loss, "preds": preds, "labels": labels, "dataloader_idx": dataloader_idx}
        )
        return val_loss

    def on_validation_epoch_end(self):
        if self.hparams.task_name == "mnli":
            for i, eval_split in enumerate(self.hparams.eval_splits):
                # matched or mismatched
                split = eval_split.split("_")[-1]
                output = [x for x in self.validation_step_outputs if x["dataloader_idx"] == i]
                preds = torch.cat([x["preds"] for x in output]).detach().cpu().numpy()
                labels = torch.cat([x["labels"] for x in output]).detach().cpu().numpy()
                loss = torch.stack([x["loss"] for x in output]).mean()
                self.log(f"val_loss_{split}", loss, prog_bar=True)
                split_metrics = {
                    f"{k}_{split}": v for k, v in self.metric.compute(predictions=preds, references=labels).items()
                }
                self.log_dict(split_metrics, prog_bar=True)
            self.validation_step_outputs.clear()
            return

        preds = torch.cat([x["preds"] for x in self.validation_step_outputs]).detach().cpu().numpy()
        labels = torch.cat([x["labels"] for x in self.validation_step_outputs]).detach().cpu().numpy()
        loss = torch.stack([x["loss"] for x in self.validation_step_outputs]).mean()
        self.log("val_loss", loss, prog_bar=True)
        self.log_dict(self.metric.compute(predictions=preds, references=labels), prog_bar=True)
        self.validation_step_outputs.clear()

    def configure_optimizers(self):
        """Prepare AdamW (no weight decay on biases and LayerNorm weights) and the warmup LR schedule chosen by `lr_schedule_type`."""
        model = self.model
        no_decay = ["bias", "LayerNorm.weight"]
        optimizer_grouped_parameters = [
            {
                "params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
                "weight_decay": self.hparams.weight_decay,
            },
            {
                "params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)],
                "weight_decay": 0.0,
            },
        ]
        optimizer = torch.optim.AdamW(optimizer_grouped_parameters, lr=self.hparams.learning_rate)

        if self.hparams.lr_schedule_type == "linear":
            scheduler = get_linear_schedule_with_warmup(
                optimizer,
                num_warmup_steps=self.hparams.warmup_steps,
                num_training_steps=self.trainer.estimated_stepping_batches,
            )
        elif self.hparams.lr_schedule_type == "cosine":
            scheduler = get_cosine_schedule_with_warmup(
                optimizer,
                num_warmup_steps=self.hparams.warmup_steps,
                num_training_steps=self.trainer.estimated_stepping_batches,
            )
        elif self.hparams.lr_schedule_type == "constant":
            scheduler = get_constant_schedule_with_warmup(
                optimizer,
                num_warmup_steps=self.hparams.warmup_steps,
            )
        else:
            raise ValueError(f"Unknown lr_schedule_type: {self.hparams.lr_schedule_type!r}")

        scheduler = {"scheduler": scheduler, "interval": "step", "frequency": 1}

        return [optimizer], [scheduler]
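`configure_optimizers` exempts parameters from weight decay purely by name: anything whose name contains `bias` or `LayerNorm.weight`. The sketch below isolates that rule (`split_decay_groups` is a hypothetical helper, and plain strings stand in for tensors) and applies it to a few DistilBERT-style parameter names. Assuming DistilBERT's usual naming, where only the embedding norm is called `LayerNorm` and the per-layer norms are `sa_layer_norm` and `output_layer_norm`, the per-layer LayerNorm weights do not match the pattern and are therefore decayed, which is worth keeping in mind when tuning `weight_decay`:

```python
NO_DECAY = ("bias", "LayerNorm.weight")


def split_decay_groups(named_params, weight_decay, no_decay=NO_DECAY):
    """Hypothetical helper mirroring the name-based grouping in configure_optimizers."""
    decay, exempt = [], []
    for name, param in named_params:
        # A parameter is exempt if any no_decay substring occurs in its name
        (exempt if any(nd in name for nd in no_decay) else decay).append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": exempt, "weight_decay": 0.0},
    ]


names = [
    "distilbert.embeddings.LayerNorm.weight",
    "distilbert.embeddings.LayerNorm.bias",
    "distilbert.transformer.layer.0.attention.q_lin.weight",
    "distilbert.transformer.layer.0.attention.q_lin.bias",
    "distilbert.transformer.layer.0.sa_layer_norm.weight",
    "classifier.weight",
]
groups = split_decay_groups([(n, n) for n in names], weight_decay=0.01)
for group in groups:
    print(group["weight_decay"], group["params"])
```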


In [6]:
#Week 1
# run_dict = {
#     1:  {'learning_rate': 2e-6, 'lr_schedule_type': 'linear', 'warmup_steps': 50,  'weight_decay': 0.01, 'train_batch_size': 32, 'eval_batch_size': 32, 'gradient_clip_val': 1.0},
#     2:  {'learning_rate': 2e-5, 'lr_schedule_type': 'linear', 'warmup_steps': 50,  'weight_decay': 0.01, 'train_batch_size': 32, 'eval_batch_size': 32, 'gradient_clip_val': 1.0},
#     3:  {'learning_rate': 2e-4, 'lr_schedule_type': 'linear', 'warmup_steps': 50,  'weight_decay': 0.01, 'train_batch_size': 32, 'eval_batch_size': 32, 'gradient_clip_val': 1.0},
#     4:  {'learning_rate': 2e-3, 'lr_schedule_type': 'linear', 'warmup_steps': 50,  'weight_decay': 0.01, 'train_batch_size': 32, 'eval_batch_size': 32, 'gradient_clip_val': 1.0},
    
#     5:  {'learning_rate': 2e-5, 'lr_schedule_type': 'linear', 'warmup_steps': 50,  'weight_decay': 0.01, 'train_batch_size': 16,  'eval_batch_size': 32, 'gradient_clip_val': 1.0},
#     6:  {'learning_rate': 2e-5, 'lr_schedule_type': 'linear', 'warmup_steps': 50,  'weight_decay': 0.01, 'train_batch_size': 32, 'eval_batch_size': 64, 'gradient_clip_val': 1.0},
#     7:  {'learning_rate': 2e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 50,  'weight_decay': 0.01, 'train_batch_size': 16,  'eval_batch_size': 32, 'gradient_clip_val': 1.0},
#     8:  {'learning_rate': 2e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 50,  'weight_decay': 0.01, 'train_batch_size': 32, 'eval_batch_size': 64, 'gradient_clip_val': 1.0},
    
#     9:  {'learning_rate': 2e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 0,  'weight_decay': 0.01, 'train_batch_size': 32,  'eval_batch_size': 32, 'gradient_clip_val': 1.0},
#     10: {'learning_rate': 2e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 25,  'weight_decay': 0.01, 'train_batch_size': 32, 'eval_batch_size': 32, 'gradient_clip_val': 1.0},
#     11: {'learning_rate': 2e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 50,  'weight_decay': 0.01, 'train_batch_size': 32,  'eval_batch_size': 32, 'gradient_clip_val': 1.0},
#     12: {'learning_rate': 2e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 100,  'weight_decay': 0.01, 'train_batch_size': 32, 'eval_batch_size': 32, 'gradient_clip_val': 1.0},
    
#     13: {'learning_rate': 2e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 50,  'weight_decay': 0.01, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.0},
#     14: {'learning_rate': 2e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 50,  'weight_decay': 0.01, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
#     15: {'learning_rate': 2e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 50,  'weight_decay': 0.01, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 1.0},
#     16: {'learning_rate': 2e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 50,  'weight_decay': 0.01, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 2.0},
    
#     17: {'learning_rate': 2e-5, 'lr_schedule_type': 'linear', 'warmup_steps': 50, 'weight_decay': 0.01, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
#     18: {'learning_rate': 2e-5, 'lr_schedule_type': 'linear', 'warmup_steps': 50, 'weight_decay': 0.05, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
#     19: {'learning_rate': 2e-5, 'lr_schedule_type': 'linear', 'warmup_steps': 50, 'weight_decay': 0.1, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
#     20: {'learning_rate': 2e-5, 'lr_schedule_type': 'linear', 'warmup_steps': 50, 'weight_decay': 0.15, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
# }

#Week 2
run_dict = {
    21:  {'learning_rate': 1e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 10, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
    22:  {'learning_rate': 1e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 10, 'weight_decay': 0.01,  'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
    23:  {'learning_rate': 1e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 30, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
    24:  {'learning_rate': 1e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 30, 'weight_decay': 0.01,  'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
    25:  {'learning_rate': 1e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 50, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
    26:  {'learning_rate': 1e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 50, 'weight_decay': 0.01,  'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},

    27:  {'learning_rate': 2e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 10, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
    28:  {'learning_rate': 2e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 10, 'weight_decay': 0.01,  'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
    29:  {'learning_rate': 2e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 30, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
    30: {'learning_rate': 2e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 30, 'weight_decay': 0.01,  'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
    31: {'learning_rate': 2e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 50, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
    32: {'learning_rate': 2e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 50, 'weight_decay': 0.01,  'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},

    33: {'learning_rate': 3e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 10, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
    34: {'learning_rate': 3e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 10, 'weight_decay': 0.01,  'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
    35: {'learning_rate': 3e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 30, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
    36: {'learning_rate': 3e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 30, 'weight_decay': 0.01,  'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
    37: {'learning_rate': 3e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 50, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
    38: {'learning_rate': 3e-5, 'lr_schedule_type': 'cosine', 'warmup_steps': 50, 'weight_decay': 0.01,  'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5},
}

for run_id, run_params in run_dict.items():

    name = (
        f"R{run_id}_"
        f"lr{run_params['learning_rate']:.0e}_"
        f"ls-{run_params['lr_schedule_type'][:3]}_"
        f"ws{run_params['warmup_steps']}_"
        f"wd{run_params['weight_decay']}_"
        f"tb{run_params['train_batch_size']}_"
        f"eb{run_params['eval_batch_size']}_"
        f"gc{run_params['gradient_clip_val']}"
    )

    print(f"Starting run {run_id} with params: {run_params}")
    epochs = 3  # do not change this
    logger = WandbLogger(project="mrpc-distilbert", name=name, resume="never")

    L.seed_everything(42)

    dm = GLUEDataModule(
        model_name_or_path="distilbert-base-uncased",
        task_name="mrpc",
        train_batch_size=run_params['train_batch_size'],
        eval_batch_size=run_params['eval_batch_size'],
    )

    dm.setup("fit")
    
    model = GLUETransformer(
        model_name_or_path="distilbert-base-uncased",
        num_labels=dm.num_labels,
        eval_splits=dm.eval_splits,
        task_name=dm.task_name,
        learning_rate=run_params['learning_rate'],
        lr_schedule_type=run_params['lr_schedule_type'],
        warmup_steps=run_params['warmup_steps'],
        weight_decay=run_params['weight_decay'],
        train_batch_size=run_params['train_batch_size'],
        eval_batch_size=run_params['eval_batch_size'],
        gradient_clip_val=run_params['gradient_clip_val'],
    )


    logger.experiment.config.update({
        "model_name": model.hparams.model_name_or_path,
        "task": dm.task_name,
        "learning_rate": model.hparams.learning_rate,
        "lr_schedule_type": model.hparams.lr_schedule_type,
        "warmup_steps": model.hparams.warmup_steps,
        "weight_decay": model.hparams.weight_decay,
        "train_batch_size": model.hparams.train_batch_size,
        "eval_batch_size": model.hparams.eval_batch_size,
        "gradient_clip_val": model.hparams.gradient_clip_val,
        "max_seq_length": dm.max_seq_length,
        "epochs": epochs,
        "seed": 42,
    })

    trainer = L.Trainer(
        max_epochs=epochs,
        accelerator="auto",
        devices=1,
        logger=logger,
        gradient_clip_val=model.hparams.gradient_clip_val,
    )
    trainer.fit(model, datamodule=dm)

    wandb.finish()

Seed set to 42


Starting run 21 with params: {'learning_rate': 1e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 10, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


Map:   0%|          | 0/3668 [00:00<?, ? examples/s]



Map:   0%|          | 0/408 [00:00<?, ? examples/s]



Map:   0%|          | 0/1725 [00:00<?, ? examples/s]

Loading `train_dataloader` to estimate number of stepping batches.
/Users/janwahli/Projects/AI/mlops/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.

  | Name  | Type                                | Params | Mode
---------------------------------------------------------------------
0 | model | DistilBertForSequenceClassification | 67.0 M | eval
---------------------------------------------------------------------
67.0 M    Trainable params
0         Non-trainable params
67.0 M    Total params
267.820   Total estimated model params size (MB)
0         Modules in train mode
96        Modules in eval mode


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

/Users/janwahli/Projects/AI/mlops/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.


Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


0,1
accuracy,‚ñÅ‚ñà‚ñà
epoch,‚ñÅ‚ñÖ‚ñà
f1,‚ñÅ‚ñà‚ñá
trainer/global_step,‚ñÅ‚ñÖ‚ñà
val_loss,‚ñà‚ñÅ‚ñÅ

0,1
accuracy,0.83578
epoch,2.0
f1,0.88428
trainer/global_step,689.0
val_loss,0.39422


Seed set to 42


Starting run 22 with params: {'learning_rate': 1e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 10, 'weight_decay': 0.01, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


Map:   0%|          | 0/3668 [00:00<?, ? examples/s]



Map:   0%|          | 0/408 [00:00<?, ? examples/s]



Map:   0%|          | 0/1725 [00:00<?, ? examples/s]

Loading `train_dataloader` to estimate number of stepping batches.
/Users/janwahli/Projects/AI/mlops/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.

  | Name  | Type                                | Params | Mode
---------------------------------------------------------------------
0 | model | DistilBertForSequenceClassification | 67.0 M | eval
---------------------------------------------------------------------
67.0 M    Trainable params
0         Non-trainable params
67.0 M    Total params
267.820   Total estimated model params size (MB)
0         Modules in train mode
96        Modules in eval mode


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

/Users/janwahli/Projects/AI/mlops/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.


Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


0,1
accuracy,‚ñÅ‚ñà‚ñà
epoch,‚ñÅ‚ñÖ‚ñà
f1,‚ñÅ‚ñà‚ñá
trainer/global_step,‚ñÅ‚ñÖ‚ñà
val_loss,‚ñà‚ñÅ‚ñÅ

0,1
accuracy,0.83578
epoch,2.0
f1,0.88428
trainer/global_step,689.0
val_loss,0.39433


Seed set to 42


Starting run 23 with params: {'learning_rate': 1e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 30, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


Map:   0%|          | 0/3668 [00:00<?, ? examples/s]



Map:   0%|          | 0/408 [00:00<?, ? examples/s]



Map:   0%|          | 0/1725 [00:00<?, ? examples/s]

Loading `train_dataloader` to estimate number of stepping batches.
/Users/janwahli/Projects/AI/mlops/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.

  | Name  | Type                                | Params | Mode
---------------------------------------------------------------------
0 | model | DistilBertForSequenceClassification | 67.0 M | eval
---------------------------------------------------------------------
67.0 M    Trainable params
0         Non-trainable params
67.0 M    Total params
267.820   Total estimated model params size (MB)
0         Modules in train mode
96        Modules in eval mode


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

/Users/janwahli/Projects/AI/mlops/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.


Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


0,1
accuracy,‚ñÅ‚ñÜ‚ñà
epoch,‚ñÅ‚ñÖ‚ñà
f1,‚ñÅ‚ñÜ‚ñà
trainer/global_step,‚ñÅ‚ñÖ‚ñà
val_loss,‚ñà‚ñÇ‚ñÅ

0,1
accuracy,0.84314
epoch,2.0
f1,0.89003
trainer/global_step,689.0
val_loss,0.39179


Seed set to 42


Starting run 24 with params: {'learning_rate': 1e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 30, 'weight_decay': 0.01, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


Map:   0%|          | 0/3668 [00:00<?, ? examples/s]



Map:   0%|          | 0/408 [00:00<?, ? examples/s]



Map:   0%|          | 0/1725 [00:00<?, ? examples/s]

Loading `train_dataloader` to estimate number of stepping batches.
/Users/janwahli/Projects/AI/mlops/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.

  | Name  | Type                                | Params | Mode
---------------------------------------------------------------------
0 | model | DistilBertForSequenceClassification | 67.0 M | eval
---------------------------------------------------------------------
67.0 M    Trainable params
0         Non-trainable params
67.0 M    Total params
267.820   Total estimated model params size (MB)
0         Modules in train mode
96        Modules in eval mode


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

/Users/janwahli/Projects/AI/mlops/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.


Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


0,1
accuracy,‚ñÅ‚ñá‚ñà
epoch,‚ñÅ‚ñÖ‚ñà
f1,‚ñÅ‚ñá‚ñà
trainer/global_step,‚ñÅ‚ñÖ‚ñà
val_loss,‚ñà‚ñÇ‚ñÅ

0,1
accuracy,0.84314
epoch,2.0
f1,0.89003
trainer/global_step,689.0
val_loss,0.3919


Seed set to 42


Starting run 25 with params: {'learning_rate': 1e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 50, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


Map:   0%|          | 0/3668 [00:00<?, ? examples/s]



Map:   0%|          | 0/408 [00:00<?, ? examples/s]



Map:   0%|          | 0/1725 [00:00<?, ? examples/s]

Loading `train_dataloader` to estimate number of stepping batches.
/Users/janwahli/Projects/AI/mlops/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.

  | Name  | Type                                | Params | Mode
---------------------------------------------------------------------
0 | model | DistilBertForSequenceClassification | 67.0 M | eval
---------------------------------------------------------------------
67.0 M    Trainable params
0         Non-trainable params
67.0 M    Total params
267.820   Total estimated model params size (MB)
0         Modules in train mode
96        Modules in eval mode


/Users/janwahli/Projects/AI/mlops/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.


`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


| Metric | Run history | Run summary |
|---|---|---|
| accuracy | ▁▆█ | 0.84069 |
| epoch | ▁▅█ | 2.0 |
| f1 | ▁▆█ | 0.88851 |
| trainer/global_step | ▁▅█ | 689.0 |
| val_loss | █▂▁ | 0.39418 |


Seed set to 42


Starting run 26 with params: {'learning_rate': 1e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 50, 'weight_decay': 0.01, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


| Metric | Run history | Run summary |
|---|---|---|
| accuracy | ▁▆█ | 0.84069 |
| epoch | ▁▅█ | 2.0 |
| f1 | ▁▆█ | 0.88851 |
| trainer/global_step | ▁▅█ | 689.0 |
| val_loss | █▂▁ | 0.39428 |


Seed set to 42


Starting run 27 with params: {'learning_rate': 2e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 10, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


| Metric | Run history | Run summary |
|---|---|---|
| accuracy | ▁▇█ | 0.84314 |
| epoch | ▁▅█ | 2.0 |
| f1 | ▁██ | 0.89003 |
| trainer/global_step | ▁▅█ | 689.0 |
| val_loss | ▁██ | 0.40018 |


Seed set to 42


Starting run 28 with params: {'learning_rate': 2e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 10, 'weight_decay': 0.01, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


| Metric | Run history | Run summary |
|---|---|---|
| accuracy | ▁██ | 0.84314 |
| epoch | ▁▅█ | 2.0 |
| f1 | ▁█▇ | 0.89003 |
| trainer/global_step | ▁▅█ | 689.0 |
| val_loss | ▁██ | 0.40041 |


Seed set to 42


Starting run 29 with params: {'learning_rate': 2e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 30, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


| Metric | Run history | Run summary |
|---|---|---|
| accuracy | ▁▆█ | 0.84069 |
| epoch | ▁▅█ | 2.0 |
| f1 | ▁██ | 0.88774 |
| trainer/global_step | ▁▅█ | 689.0 |
| val_loss | ▁▅█ | 0.40332 |


Seed set to 42


Starting run 30 with params: {'learning_rate': 2e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 30, 'weight_decay': 0.01, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


| Metric | Run history | Run summary |
|---|---|---|
| accuracy | ▁▅█ | 0.84069 |
| epoch | ▁▅█ | 2.0 |
| f1 | ▁██ | 0.88774 |
| trainer/global_step | ▁▅█ | 689.0 |
| val_loss | ▁▆█ | 0.40252 |


Seed set to 42


Starting run 31 with params: {'learning_rate': 2e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 50, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


| Metric | Run history | Run summary |
|---|---|---|
| accuracy | ▂▁█ | 0.86275 |
| epoch | ▁▅█ | 2.0 |
| f1 | ▁▂█ | 0.90244 |
| trainer/global_step | ▁▅█ | 689.0 |
| val_loss | █▂▁ | 0.37771 |


Seed set to 42


Starting run 32 with params: {'learning_rate': 2e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 50, 'weight_decay': 0.01, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


| Metric | Run history | Run summary |
|---|---|---|
| accuracy | ▂▁█ | 0.86275 |
| epoch | ▁▅█ | 2.0 |
| f1 | ▁▂█ | 0.90244 |
| trainer/global_step | ▁▅█ | 689.0 |
| val_loss | █▁▁ | 0.37774 |


Seed set to 42


Starting run 33 with params: {'learning_rate': 3e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 10, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


| Metric | Run history | Run summary |
|---|---|---|
| accuracy | █▁▆ | 0.84314 |
| epoch | ▁▅█ | 2.0 |
| f1 | █▁▅ | 0.89003 |
| trainer/global_step | ▁▅█ | 689.0 |
| val_loss | ▁▅█ | 0.45272 |


Seed set to 42


Starting run 34 with params: {'learning_rate': 3e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 10, 'weight_decay': 0.01, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


| Metric | Run history | Run summary |
|---|---|---|
| accuracy | █▁▆ | 0.84314 |
| epoch | ▁▅█ | 2.0 |
| f1 | █▁▅ | 0.88966 |
| trainer/global_step | ▁▅█ | 689.0 |
| val_loss | ▁▅█ | 0.45624 |


Seed set to 42


Starting run 35 with params: {'learning_rate': 3e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 30, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


| Metric | Run history | Run summary |
|---|---|---|
| accuracy | █▁▆ | 0.85784 |
| epoch | ▁▅█ | 2.0 |
| f1 | █▁▆ | 0.9 |
| trainer/global_step | ▁▅█ | 689.0 |
| val_loss | ▁▅█ | 0.45651 |


Seed set to 42


Starting run 36 with params: {'learning_rate': 3e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 30, 'weight_decay': 0.01, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


üí° Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


Map:   0%|          | 0/3668 [00:00<?, ? examples/s]



Map:   0%|          | 0/408 [00:00<?, ? examples/s]



Map:   0%|          | 0/1725 [00:00<?, ? examples/s]

Loading `train_dataloader` to estimate number of stepping batches.
/Users/janwahli/Projects/AI/mlops/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.

  | Name  | Type                                | Params | Mode
---------------------------------------------------------------------
0 | model | DistilBertForSequenceClassification | 67.0 M | eval
---------------------------------------------------------------------
67.0 M    Trainable params
0         Non-trainable params
67.0 M    Total params
267.820   Total estimated model params size (MB)
0         Modules in train mode
96        Modules in eval mode


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

/Users/janwahli/Projects/AI/mlops/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.


`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


| Metric | History |
|---|---|
| accuracy | █▁▆ |
| epoch | ▁▅█ |
| f1 | █▁▅ |
| trainer/global_step | ▁▅█ |
| val_loss | ▁▅█ |

| Metric | Final value |
|---|---|
| accuracy | 0.85784 |
| epoch | 2.0 |
| f1 | 0.9 |
| trainer/global_step | 689.0 |
| val_loss | 0.45729 |


Seed set to 42


Starting run 37 with params: {'learning_rate': 3e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 50, 'weight_decay': 0.005, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


Loading `train_dataloader` to estimate number of stepping batches.
/Users/janwahli/Projects/AI/mlops/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.

  | Name  | Type                                | Params | Mode
---------------------------------------------------------------------
0 | model | DistilBertForSequenceClassification | 67.0 M | eval
---------------------------------------------------------------------
67.0 M    Trainable params
0         Non-trainable params
67.0 M    Total params
267.820   Total estimated model params size (MB)
0         Modules in train mode
96        Modules in eval mode


/Users/janwahli/Projects/AI/mlops/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.


`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


| Metric | History |
|---|---|
| accuracy | ▁▄█ |
| epoch | ▁▅█ |
| f1 | ▁▅█ |
| trainer/global_step | ▁▅█ |
| val_loss | ▁▃█ |

| Metric | Final value |
|---|---|
| accuracy | 0.85539 |
| epoch | 2.0 |
| f1 | 0.89739 |
| trainer/global_step | 689.0 |
| val_loss | 0.447 |


Seed set to 42


Starting run 38 with params: {'learning_rate': 3e-05, 'lr_schedule_type': 'cosine', 'warmup_steps': 50, 'weight_decay': 0.01, 'train_batch_size': 16, 'eval_batch_size': 32, 'gradient_clip_val': 0.5}


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


Loading `train_dataloader` to estimate number of stepping batches.
/Users/janwahli/Projects/AI/mlops/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.

  | Name  | Type                                | Params | Mode
---------------------------------------------------------------------
0 | model | DistilBertForSequenceClassification | 67.0 M | eval
---------------------------------------------------------------------
67.0 M    Trainable params
0         Non-trainable params
67.0 M    Total params
267.820   Total estimated model params size (MB)
0         Modules in train mode
96        Modules in eval mode


/Users/janwahli/Projects/AI/mlops/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=13` in the `DataLoader` to improve performance.


`Trainer.fit` stopped: `max_epochs=3` reached.
[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


| Metric | History |
|---|---|
| accuracy | ▁▅█ |
| epoch | ▁▅█ |
| f1 | ▁▆█ |
| trainer/global_step | ▁▅█ |
| val_loss | ▁▃█ |

| Metric | Final value |
|---|---|
| accuracy | 0.85049 |
| epoch | 2.0 |
| f1 | 0.89391 |
| trainer/global_step | 689.0 |
| val_loss | 0.45133 |
