# Finetune Hugging Face BERT with PyTorch Lightning

Running the cells below fine-tunes the model with the settings shown in each cell.

In [3]:
import torch

import lightning.pytorch as pl
from lightning.pytorch.callbacks import EarlyStopping, ModelCheckpoint
from lightning.pytorch.loggers import CSVLogger
# from lightning.pytorch.profilers import PyTorchProfiler

from toxy_bot.ml.datamodule import AutoTokenizerDataModule
from toxy_bot.ml.module import SequenceClassificationModule
from toxy_bot.ml.utils import create_dirs
from toxy_bot.ml.config import Config, DataModuleConfig, ModuleConfig


First, let's configure some basic settings.

In [5]:
# model and dataset
model_name = ModuleConfig.model_name
lr = ModuleConfig.learning_rate
dataset_name = DataModuleConfig.dataset_name
batch_size = DataModuleConfig.batch_size

print(f"Model: {model_name}")
print(f"Learning rate: {lr}")
print(f"Dataset: {dataset_name}")
print(f"Batch size: {batch_size}")

# paths
cache_dir = Config.cache_dir
log_dir = Config.log_dir
ckpt_dir = Config.ckpt_dir
# prof_dir = Config.prof_dir
perf_dir = Config.perf_dir
# creates dirs to avoid failure if empty dir has been deleted
create_dirs([cache_dir, log_dir, ckpt_dir, perf_dir])

# set matmul precision
# see https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html
torch.set_float32_matmul_precision("medium")

Model: google/bert_uncased_L-4_H-512_A-8
Learning rate: 3e-05
Dataset: anitamaxvim/jigsaw-toxic-comments
Batch size: 16
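
The `create_dirs` helper is imported from `toxy_bot.ml.utils`, and its implementation is not shown in this notebook. As a rough sketch of what such a helper might do, assuming it only needs to create each directory when it is missing, something like the following would work. The name `ensure_dirs` is hypothetical and is used here to avoid confusion with the project's own function.

```python
from pathlib import Path
from typing import Iterable, Union


def ensure_dirs(dirs: Iterable[Union[str, Path]]) -> list[Path]:
    """Create each directory (and any missing parents) if it does not exist.

    Hypothetical stand-in for the project's create_dirs helper.
    """
    created = []
    for d in dirs:
        path = Path(d)
        # exist_ok=True makes the call safe to repeat on every notebook run
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Because `mkdir` is called with `exist_ok=True`, re-running the settings cell after a directory has been deleted simply recreates it.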


Now we can define our LightningDataModule, which the Trainer uses to build its DataLoaders.

In [6]:
lit_datamodule = AutoTokenizerDataModule(
    model_name=model_name,
    dataset_name=dataset_name,
    cache_dir=cache_dir,
    batch_size=batch_size,
)

Next comes our custom LightningModule, which wraps BertForSequenceClassification.

In [7]:
lit_model = SequenceClassificationModule(learning_rate=lr)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google/bert_uncased_L-4_H-512_A-8 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Next, we define some common callbacks and our most basic logger, CSVLogger.

The EarlyStopping callback ends training early if a convergence criterion is met before the maximum number of epochs (max_epochs) is reached.

ModelCheckpoint saves the model periodically. After training finishes, its best_model_path attribute gives the path to the best checkpoint file, and best_model_score gives that checkpoint's score.

In [8]:
callbacks = [
    # EarlyStopping(monitor="val_acc", mode="max"),  # accuracy should increase, so use "max"
    ModelCheckpoint(
        dirpath=ckpt_dir,
        filename="model",
    ),
]
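
To make the early-stopping idea concrete, here is a simplified, pure-Python illustration of a patience rule: training stops once the monitored value has failed to improve for `patience` consecutive checks. This is a sketch of the general technique, not Lightning's implementation. The function name and the sample losses are made up for illustration, and the rule assumes a metric that should decrease (for a metric such as accuracy the comparison would be reversed).

```python
def early_stop_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which training would stop, or None.

    Simplified patience rule for a metric that should decrease.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement: remember the new best value and reset the counter
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None


# Made-up validation losses, for illustration only
losses = [0.70, 0.55, 0.50, 0.51, 0.52, 0.50, 0.49]
print(early_stop_epoch(losses, patience=3))  # prints 5
```

In this example the loss last improves at epoch 2, so three non-improving epochs (3, 4, and 5) trigger the stop before the final value of 0.49 is ever seen.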

In [9]:
logger = CSVLogger(
    save_dir=log_dir,
    name="csv_logs",
)
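
CSVLogger writes the logged metrics to a `metrics.csv` file inside a versioned subdirectory of `save_dir/name`. The sketch below shows one way to read a single metric back using only the standard library. The helper name `read_metric` is hypothetical, and which columns exist depends on what the LightningModule actually logs.

```python
import csv
from pathlib import Path


def read_metric(metrics_csv, column):
    """Return the non-empty values of one column in a metrics CSV as floats."""
    values = []
    with Path(metrics_csv).open(newline="") as f:
        for row in csv.DictReader(f):
            cell = row.get(column, "")
            # Metrics logged at different steps leave the other columns blank
            if cell not in ("", None):
                values.append(float(cell))
    return values
```

Assuming the first run lands in `version_0` and that the module logs a `train_loss` column, a call might look like `read_metric(Path(log_dir) / "csv_logs" / "version_0" / "metrics.csv", "train_loss")`.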

Finally, we create our Trainer and pass in our flags (settings), the callbacks, and the logger. Then we call fit!

In [10]:
lit_trainer = pl.Trainer(
    accelerator="auto",
    devices="auto",
    strategy="auto",
    precision="16-mixed",
    max_epochs=5,
    deterministic=True,
    logger=logger,
    callbacks=callbacks,
)

/Users/dbozbay/Dev/toxy-bot/.venv/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/accelerator_connector.py:513: You passed `Trainer(accelerator='cpu', precision='16-mixed')` but AMP with fp16 is not supported on CPU. Using `precision='bf16-mixed'` instead.
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
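
The warning above shows Lightning replacing `16-mixed` with `bf16-mixed` because no GPU is available. One way to avoid the warning is to choose the precision string up front. The helper below is a minimal sketch under that assumption. The function name is hypothetical, and it only distinguishes between CUDA and CPU.

```python
def choose_precision(cuda_available: bool) -> str:
    """Return a Lightning precision string suited to the available hardware."""
    # fp16 AMP needs a GPU; on CPU the log above shows Lightning using bf16
    return "16-mixed" if cuda_available else "bf16-mixed"


# In the notebook this would be called with torch.cuda.is_available()
precision = choose_precision(cuda_available=False)
print(precision)  # prints bf16-mixed
```

The result could then be passed to the Trainer as `precision=precision` in place of the hard-coded string.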


In [11]:
lit_trainer.fit(model=lit_model, datamodule=lit_datamodule)

Seed set to 42
[2025-03-27 16:06:17.600019] Data cache exists. Loading from cache.
Map: 100%|██████████| 135635/135635 [01:35<00:00, 1423.70 examples/s]
Map: 100%|██████████| 23936/23936 [00:20<00:00, 1143.47 examples/s]

  | Name      | Type                          | Params | Mode 
--------------------------------------------------------------------
0 | model     | BertForSequenceClassification | 28.8 M | eval 
1 | accuracy  | MultilabelAccuracy            | 0      | train
2 | f1_score  | MultilabelF1Score             | 0      | train
3 | precision | MultilabelPrecision           | 0      | train
4 | recall    | MultilabelRecall              | 0      | train
--------------------------------------------------------------------
28.8 M    Trainable params
0         Non-trainable params
28.8 M    Total params
115.067   Total estimated model params size (MB)
4         Modules in train mode
87        Modules in eval mode


Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]

/Users/dbozbay/Dev/toxy-bot/.venv/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.


                                                                           

/Users/dbozbay/Dev/toxy-bot/.venv/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.


Epoch 0:   0%|          | 2/8478 [06:09<435:13:28,  0.01it/s, v_num=0, train_loss=0.674]


Detected KeyboardInterrupt, attempting graceful shutdown ...


NameError: name 'exit' is not defined
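
The 8478 steps per epoch shown in the progress bar follow from the dataset size and batch size reported earlier. The short calculation below reproduces that number and gives a back-of-the-envelope estimate of the epoch time on this CPU-only machine. It assumes the 135635-example `Map` pass is the training split and that training runs on a single device. The estimate uses the raw elapsed time, so it only roughly matches the smoothed figure in the progress bar.

```python
import math

train_examples = 135_635  # size of the first Map pass in the log
batch_size = 16           # printed by the settings cell

# The last, partially filled batch still counts as a step
steps_per_epoch = math.ceil(train_examples / batch_size)
print(steps_per_epoch)  # prints 8478

# Progress bar: 2 steps completed after 06:09
elapsed_s = 6 * 60 + 9
steps_done = 2
sec_per_step = elapsed_s / steps_done

remaining_h = (steps_per_epoch - steps_done) * sec_per_step / 3600
print(round(remaining_h))  # prints 434
```

At roughly 434 hours for a single epoch, the run is impractical on CPU, which is consistent with it being interrupted after two steps.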