# Tutorial: Fine-Tuning a BERT Model for a Classification Task

In this notebook, we demonstrate how to fine-tune a BERT model on a text classification task using the Hugging Face Transformers library. We use the IMDB dataset as an example and `bert-base-uncased` as our base model.

Below you will find explanations, code snippets, and installation tips to help you follow along.

## Installation and Environment Setup

Before starting, please ensure that you have the following libraries installed:

- transformers
- datasets
- torch

You can install them using pip:

```bash
pip install "transformers[torch]" datasets torch
```

> Note: This notebook runs on an ordinary machine, on either CPU or GPU, although training is much faster on a GPU. If you run into memory issues, consider a smaller model, a smaller batch size, or a managed cloud notebook environment.

In [25]:
! pip install transformers[torch] datasets torch



In [26]:
# Import the necessary libraries.
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
from datasets import load_dataset
import numpy as np
import torch

## Dataset Preparation

We will use the IMDB dataset, a standard benchmark for binary sentiment classification. Each review is tokenized with the tokenizer that matches our base model and truncated to at most 512 tokens, the longest sequence `bert-base-uncased` accepts.
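
To make the effect of truncation concrete, here is a minimal toy sketch. It assumes whitespace-separated tokens instead of BERT's WordPiece vocabulary, and the helper name `truncate_with_special_tokens` is hypothetical. It only illustrates that the 512-token budget includes the `[CLS]` and `[SEP]` markers and that anything past the limit is dropped.

```python
# Toy illustration only: real BERT tokenization uses WordPiece, not whitespace splitting.
CLS, SEP = "[CLS]", "[SEP]"

def truncate_with_special_tokens(tokens, max_length=512):
    """Keep at most max_length tokens, counting [CLS] and [SEP]."""
    budget = max_length - 2  # two positions are reserved for [CLS] and [SEP]
    return [CLS] + tokens[:budget] + [SEP]

long_review = ["word"] * 1000       # hypothetical review longer than the limit
short_review = ["great", "movie"]   # hypothetical short review

truncated = truncate_with_special_tokens(long_review)
untouched = truncate_with_special_tokens(short_review)

print(len(truncated))  # 512
print(untouched)       # ['[CLS]', 'great', 'movie', '[SEP]']
```

One consequence worth keeping in mind: for very long reviews, the model never sees the final sentences, which is often where a reviewer states the verdict.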

In [27]:
# Load the IMDB dataset from the Hugging Face datasets library.
dataset = load_dataset("stanfordnlp/imdb")

# Load the tokenizer that matches our base model.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tokenize a batch of reviews, truncating each one to at most 512 tokens.
def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True, max_length=512)

# Tokenize the dataset.
tokenized_datasets = dataset.map(preprocess_function, batched=True)



## Model Setup

We load the pre-trained BERT model and add a classification head. We set the number of output labels to 2 because IMDB is a binary classification task (positive/negative). The head is newly initialized, which is why the library warns that the model still needs to be trained.

In [28]:
# Initialize the model with a classification head.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
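
The warning above refers to the classification head, which has no pre-trained weights. As a rough sketch of what such a head computes, the pure-Python example below applies a randomly initialized linear layer and a softmax to a made-up sentence vector. The hidden size of 768 matches bert-base models; the random input, the seed, and the initialization scale are illustrative assumptions, and the real head is a PyTorch layer inside the model.

```python
import math
import random

HIDDEN_SIZE = 768   # hidden size of bert-base models
NUM_LABELS = 2      # negative / positive

random.seed(0)

# Freshly initialized head: small random weights and zero bias (illustrative values).
weights = [[random.gauss(0.0, 0.02) for _ in range(HIDDEN_SIZE)] for _ in range(NUM_LABELS)]
bias = [0.0] * NUM_LABELS

def classify(pooled):
    """Apply the linear head and a softmax to one pooled sentence vector."""
    logits = [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weights, bias)]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical pooled output standing in for BERT's sentence representation.
pooled_output = [random.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]
probs = classify(pooled_output)
print(probs)  # two probabilities that sum to 1, unrelated to any review's sentiment
```

Because the weights are random, the output carries no information about sentiment until fine-tuning adjusts them.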


## Training and Evaluation Configuration

Next, we define the training arguments. Here we set up evaluation strategy, batch sizes, number of epochs, and learning rate. We also define a helper function to compute accuracy during evaluation.

In [29]:
# Configure the training arguments.
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",  # Newer Transformers releases call this eval_strategy.
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=1,  # For demonstration; increase for better performance.
    weight_decay=0.01,
    push_to_hub=False,
    report_to="none"
)

# Define a metrics function to compute accuracy.
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = (predictions == labels).mean()
    return {"accuracy": accuracy}
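
To check the metric function in isolation, the sketch below calls an equivalent `compute_metrics` on a handful of made-up logits. The array values are hypothetical and chosen so that three of the four predictions match their labels.

```python
import numpy as np

def compute_metrics(eval_pred):
    # Same logic as the tutorial's helper: argmax over the class axis, then mean accuracy.
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": (predictions == labels).mean()}

# Hypothetical logits for four reviews (columns: negative, positive).
toy_logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]])
toy_labels = np.array([0, 1, 0, 0])

result = compute_metrics((toy_logits, toy_labels))
print(float(result["accuracy"]))  # 0.75 (3 of 4 predictions are correct)
```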



## Setting Up the Trainer

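We now create a `Trainer` object, which manages the training loop, evaluation, and logging. To keep the demonstration fast, we train on 2,000 shuffled training examples and evaluate on 500 shuffled test examples.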

In [30]:
# Prepare a subset of the dataset for quick demonstration (use full dataset in practice).
train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(2000))
eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(500))

# Initialize the Trainer.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,  # Newer Transformers releases prefer processing_class=tokenizer.
    compute_metrics=compute_metrics,
)



In [31]:
# Evaluate the model before training to establish a baseline.
metrics_before = trainer.evaluate(eval_dataset)
print("Metrics before training:", metrics_before)

Metrics before training: {'eval_loss': 0.7443788051605225, 'eval_model_preparation_time': 0.0059, 'eval_accuracy': 0.492, 'eval_runtime': 11.2058, 'eval_samples_per_second': 44.62, 'eval_steps_per_second': 11.155}
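
These baseline numbers are what one would expect from an untrained head. As a quick sanity check, the sketch below compares the reported loss with the cross-entropy of a classifier that always predicts 50/50, and recomputes the throughput from the runtime. The values are copied from the output above; the subset size of 500 comes from the `eval_dataset` definition.

```python
import math

# Cross-entropy of a classifier that always outputs 50/50 on a two-class problem.
chance_loss = -math.log(0.5)
print(round(chance_loss, 4))  # 0.6931

# Values copied from the metrics printed above.
eval_loss_before = 0.7443788051605225
eval_runtime = 11.2058
num_eval_examples = 500  # size of the evaluation subset

samples_per_second = round(num_eval_examples / eval_runtime, 2)

print(eval_loss_before > chance_loss)  # True
print(samples_per_second)              # 44.62
```

A loss slightly above ln 2 together with an accuracy near 0.5 indicates that the model is guessing, sometimes with misplaced confidence.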


In [32]:
import pandas as pd
from IPython.display import display, HTML

# Pick a few evaluation examples to inspect by hand.
samples = eval_dataset.select([0, 20, 40, 80, 100])
label_names = eval_dataset.features['label'].names
pred_out = trainer.predict(samples)
pred_labels = np.argmax(pred_out.predictions, axis=1)
true_labels = pred_out.label_ids

pred_names = [label_names[i] for i in pred_labels]
true_names = [label_names[i] for i in true_labels]
inputs = [sample["text"] for sample in samples]

df_preds = pd.DataFrame({
    "Input Text": inputs,
    "Prediction": pred_names,
    "True Label": true_names
})

display(HTML(df_preds.to_html(escape=False)))

Index,Input Text,Prediction,True Label
0,"When I unsuspectedly rented A Thousand Acres, I thought I was in for an entertaining King Lear story and of course Michelle Pfeiffer was in it, so what could go wrong? Very quickly, however, I realized that this story was about A Thousand Other Things besides just Acres. I started crying and couldn't stop until long after the movie ended. Thank you Jane, Laura and Jocelyn, for bringing us such a wonderfully subtle and compassionate movie! Thank you cast, for being involved and portraying the characters with such depth and gentleness! I recognized the Angry sister; the Runaway sister and the sister in Denial. I recognized the Abusive Husband and why he was there and then the Father, oh oh the Father... all superbly played. I also recognized myself and this movie was an eye-opener, a relief, a chance to face my OWN truth and finally doing something about it. I truly hope A Thousand Acres has had the same effect on some others out there. Since I didn't understand why the cover said the film was about sisters fighting over land -they weren't fighting each other at all- I watched it a second time. Then I was able to see that if one hadn't lived a similar story, one would easily miss the overwhelming undercurrent of dread and fear and the deep bond between the sisters that runs through it all. That is exactly the reason why people in general often overlook the truth about their neighbors for instance. But yet another reason why this movie is so perfect! I don't give a rat's ass (pardon my French) about to what extend the King Lear story is followed. All I know is that I can honestly say: this movie has changed my life. Keep up the good work guys, you CAN and DO make a difference.",pos,pos
1,"I was expecting a lot more of this film than what I actually got. The acting was just awful from everyone and the story was far from impressive. It took a lot of something I don't to even follow what was going because it was so jumpy. An example of the acting is when Paxton's character, Vann, is upset the South Vietnamese colonel for so he throws some of the sand from the ""sand map"". It was impossible to get any idea of what he was feeling and his actions were robotic. To make things worse, I have no idea how I'm supposed to feel about Vann. He's obviously presented as the protagonist but as soon as he gets to Vietnam he starts an affair with an Vietnamese English teacher. The only thing the movie had going for it was that it wasn't particularly boring. I give it 4 stars out of 10.",pos,neg
2,"This film is bad. It's filled with glaring plot holes, characters who are ruled by stupidity, bad acting and above all, a poor script which has been done before in many, many films, only better. I feel sorry for Donald Sutherland, I just hope he had to do this film rather than wanted to! Miss it.",pos,neg
3,"Now and again, a film comes around purely by accident that makes you doubt your sanity. We just finished studying the novel, ""Northanger Abbey"", at school and decided to refresh our memory of this unexciting piece of humourless garbage with the BBC adaptation. The funny thing about Northanger Abbey is that it actually makes you want to kill yourself. The film is NOTHING like the book, for example, the subtly evil characters seem to have been turned into transparent stereotypes. John Thorpe looks like a leprechaun on acid while Isabella plays the role of slut. Catherine, the main character, is the most depressingly stupid and irritating actress on god's earth (she looks like a coffee addict, her eyes are like basketballs) whilst Mr Tilney looks and acts like a retired porno stunt double. The plot goes completely off the rails at certain points of the film, I don't know what the hell the director was thinking when for no reason at all, a 7 year old black kid who we've never met before takes the main character out of the abbey and starts cartwheeling in front of her. Yes, that's right, cartwheeling. Nonsense of this kind is occasionally interrupted by Catherines ""fantasies"" in which she is being carried around a cathedral by an ogre. Northanger Abbey is basically visual euthanasia so if you want to murder your boss or something like that, BBC have basically discovered a new way to kill someone. Northanger is a barely laughably bad film. Don't watch it unless you're in a padded cell.",pos,neg
4,"I wish I would have read more reviews and more opinions about this movie before I rented it. A waste of money. A waste of time. Very little dialog. The dialog was hard to understand in every way. The storyline and plot were both weak. The only thing that was nice at all was the cinematography. The characters were interesting. At the same time you will spend so much time trying to figure things out, because of the lack of dialog, that you will be rewinding the movie a lot. Do not watch this movie. It was a mess and will leave you feeling like a mess. You will say, what the heck was that, when the movie ends?",pos,neg


## Fine-Tuning the Model

Now we start the fine-tuning process. `bert-base-uncased` has roughly 110 million parameters, so even this short run can be resource intensive, especially on a CPU. For more thorough training, consider increasing the number of epochs and using gradient accumulation if memory limits the batch size.

In [33]:
# Begin training
trainer.train()

| Epoch | Training Loss | Validation Loss | Model Preparation Time | Accuracy |
|---|---|---|---|---|
| 1 | 0.4285 | 0.335327 | 0.0059 | 0.908 |


TrainOutput(global_step=500, training_loss=0.42849285888671873, metrics={'train_runtime': 201.2436, 'train_samples_per_second': 9.938, 'train_steps_per_second': 2.485, 'total_flos': 431818686278880.0, 'train_loss': 0.42849285888671873, 'epoch': 1.0})
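
The `global_step=500` in the output follows from the configuration: 2,000 training examples with a per-device batch size of 4 for one epoch. The sketch below reproduces that arithmetic and shows how gradient accumulation would change it. The helper `optimizer_steps` is hypothetical and is a simplified estimate assuming a single device, not the exact bookkeeping `Trainer` performs.

```python
import math

def optimizer_steps(num_examples, per_device_batch_size, num_epochs,
                    gradient_accumulation_steps=1, num_devices=1):
    """Estimate the number of optimizer steps for a run (simplified)."""
    effective_batch = per_device_batch_size * gradient_accumulation_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * num_epochs

steps_tutorial = optimizer_steps(2000, 4, 1)
steps_accumulated = optimizer_steps(2000, 4, 1, gradient_accumulation_steps=4)

print(steps_tutorial)     # 500, matching global_step in the output above
print(steps_accumulated)  # 125, with an effective batch size of 16
```

Gradient accumulation keeps memory use at the level of a batch of 4 while each weight update averages over 16 examples.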

## Model Evaluation

After training, evaluate the model on the evaluation dataset and print the performance metrics. Also check the predictions of the samples shown above.

In [34]:
# Re-run predictions on the same examples inspected before training.
samples = eval_dataset.select([0, 20, 40, 80, 100])
label_names = eval_dataset.features['label'].names
pred_out = trainer.predict(samples)
pred_labels = np.argmax(pred_out.predictions, axis=1)
true_labels = pred_out.label_ids

pred_names = [label_names[i] for i in pred_labels]
true_names = [label_names[i] for i in true_labels]
inputs = [sample["text"] for sample in samples]

df_preds = pd.DataFrame({
    "Input Text": inputs,
    "Prediction": pred_names,
    "True Label": true_names
})

display(HTML(df_preds.to_html(escape=False)))

Index,Input Text,Prediction,True Label
0,"When I unsuspectedly rented A Thousand Acres, I thought I was in for an entertaining King Lear story and of course Michelle Pfeiffer was in it, so what could go wrong? Very quickly, however, I realized that this story was about A Thousand Other Things besides just Acres. I started crying and couldn't stop until long after the movie ended. Thank you Jane, Laura and Jocelyn, for bringing us such a wonderfully subtle and compassionate movie! Thank you cast, for being involved and portraying the characters with such depth and gentleness! I recognized the Angry sister; the Runaway sister and the sister in Denial. I recognized the Abusive Husband and why he was there and then the Father, oh oh the Father... all superbly played. I also recognized myself and this movie was an eye-opener, a relief, a chance to face my OWN truth and finally doing something about it. I truly hope A Thousand Acres has had the same effect on some others out there. Since I didn't understand why the cover said the film was about sisters fighting over land -they weren't fighting each other at all- I watched it a second time. Then I was able to see that if one hadn't lived a similar story, one would easily miss the overwhelming undercurrent of dread and fear and the deep bond between the sisters that runs through it all. That is exactly the reason why people in general often overlook the truth about their neighbors for instance. But yet another reason why this movie is so perfect! I don't give a rat's ass (pardon my French) about to what extend the King Lear story is followed. All I know is that I can honestly say: this movie has changed my life. Keep up the good work guys, you CAN and DO make a difference.",pos,pos
1,"I was expecting a lot more of this film than what I actually got. The acting was just awful from everyone and the story was far from impressive. It took a lot of something I don't to even follow what was going because it was so jumpy. An example of the acting is when Paxton's character, Vann, is upset the South Vietnamese colonel for so he throws some of the sand from the ""sand map"". It was impossible to get any idea of what he was feeling and his actions were robotic. To make things worse, I have no idea how I'm supposed to feel about Vann. He's obviously presented as the protagonist but as soon as he gets to Vietnam he starts an affair with an Vietnamese English teacher. The only thing the movie had going for it was that it wasn't particularly boring. I give it 4 stars out of 10.",neg,neg
2,"This film is bad. It's filled with glaring plot holes, characters who are ruled by stupidity, bad acting and above all, a poor script which has been done before in many, many films, only better. I feel sorry for Donald Sutherland, I just hope he had to do this film rather than wanted to! Miss it.",neg,neg
3,"Now and again, a film comes around purely by accident that makes you doubt your sanity. We just finished studying the novel, ""Northanger Abbey"", at school and decided to refresh our memory of this unexciting piece of humourless garbage with the BBC adaptation. The funny thing about Northanger Abbey is that it actually makes you want to kill yourself. The film is NOTHING like the book, for example, the subtly evil characters seem to have been turned into transparent stereotypes. John Thorpe looks like a leprechaun on acid while Isabella plays the role of slut. Catherine, the main character, is the most depressingly stupid and irritating actress on god's earth (she looks like a coffee addict, her eyes are like basketballs) whilst Mr Tilney looks and acts like a retired porno stunt double. The plot goes completely off the rails at certain points of the film, I don't know what the hell the director was thinking when for no reason at all, a 7 year old black kid who we've never met before takes the main character out of the abbey and starts cartwheeling in front of her. Yes, that's right, cartwheeling. Nonsense of this kind is occasionally interrupted by Catherines ""fantasies"" in which she is being carried around a cathedral by an ogre. Northanger Abbey is basically visual euthanasia so if you want to murder your boss or something like that, BBC have basically discovered a new way to kill someone. Northanger is a barely laughably bad film. Don't watch it unless you're in a padded cell.",neg,neg
4,"I wish I would have read more reviews and more opinions about this movie before I rented it. A waste of money. A waste of time. Very little dialog. The dialog was hard to understand in every way. The storyline and plot were both weak. The only thing that was nice at all was the cinematography. The characters were interesting. At the same time you will spend so much time trying to figure things out, because of the lack of dialog, that you will be rewinding the movie a lot. Do not watch this movie. It was a mess and will leave you feeling like a mess. You will say, what the heck was that, when the movie ends?",neg,neg


In [35]:
# Evaluate the fine-tuned model.
metrics_after = trainer.evaluate()
metrics_table = pd.DataFrame([metrics_before, metrics_after],
                               index=["Before Training", "After Training"])
metrics_table

| | eval_loss | eval_model_preparation_time | eval_accuracy | eval_runtime | eval_samples_per_second | eval_steps_per_second | epoch |
|---|---|---|---|---|---|---|---|
| Before Training | 0.744379 | 0.0059 | 0.492 | 11.2058 | 44.62 | 11.155 | |
| After Training | 0.335327 | 0.0059 | 0.908 | 12.8943 | 38.777 | 9.694 | 1.0 |
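
Because the evaluation subset has only 500 examples, the accuracy after training carries some sampling uncertainty. The sketch below estimates it with the usual binomial standard error. This is an added back-of-the-envelope check, not something the notebook computes, and it assumes the 500 examples are an independent random sample.

```python
import math

def accuracy_standard_error(accuracy, num_examples):
    """Binomial standard error of an accuracy estimate."""
    return math.sqrt(accuracy * (1.0 - accuracy) / num_examples)

accuracy_after = 0.908   # from the table above
num_eval_examples = 500  # size of the evaluation subset

correct = round(accuracy_after * num_eval_examples)
se = accuracy_standard_error(accuracy_after, num_eval_examples)
margin = round(1.96 * se, 3)  # approximate 95% margin of error

print(correct)  # 454
print(margin)   # 0.025
```

In other words, the measured 0.908 is compatible with a true accuracy a couple of points higher or lower; evaluating on the full test split would narrow that range.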


## Final Notes

- For production-level fine-tuning, consider using distributed training and mixed precision (fp16).
- Experiment with hyperparameters such as learning rate and batch size to optimize performance.
- For further improvements, consider using the Hugging Face Hub for version control and model sharing.
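
As a starting point for the hyperparameter experiments suggested above, a small grid can be enumerated as follows. The candidate values are illustrative assumptions rather than recommendations; the keys mirror the `TrainingArguments` parameters used earlier in this tutorial, and each configuration would be passed to a fresh `TrainingArguments` and `Trainer`.

```python
from itertools import product

# Illustrative candidates; 2e-5 and 4 are the values used in this tutorial.
learning_rates = [1e-5, 2e-5, 5e-5]
batch_sizes = [4, 8, 16]

# One dictionary of overrides per combination.
grid = [
    {"learning_rate": lr, "per_device_train_batch_size": bs}
    for lr, bs in product(learning_rates, batch_sizes)
]

print(len(grid))  # 9
print(grid[0])    # {'learning_rate': 1e-05, 'per_device_train_batch_size': 4}
```

When comparing runs, reload the pre-trained model for each configuration so that every run starts from the same weights.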