# Fine-tuning BERT for Sentiment Analysis with IMDb Data

* Inspired by [Fine-Tuning vs Continued Pretraining](https://medium.com/@heyamit10/fine-tuning-vs-continued-pretraining-c8058e5040cf) (by Hey Amit)

* Tested in Colab on 2025/11/10 (training takes about 5 minutes on a GPU)

In [2]:
# FMMB: datasets 3.6.0 seems to be the only version that works here
!pip install datasets==3.6.0


Collecting datasets==3.6.0
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Downloading datasets-3.6.0-py3-none-any.whl (491 kB)
Installing collected packages: datasets
  Attempting uninstall: datasets
    Found existing installation: datasets 4.0.0
    Uninstalling datasets-4.0.0:
      Successfully uninstalled datasets-4.0.0
Successfully installed datasets-3.6.0


In [26]:
import torch
if torch.cuda.is_available():
    print("GPU is available!")
else:
    print("Please switch to a GPU-enabled environment.")

GPU is available!


In [34]:
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments, AutoTokenizer
from transformers import pipeline

In [56]:
from datasets import load_dataset

# Load dataset
dataset = load_dataset("imdb")  # Example: IMDb reviews dataset

In [61]:
random_samples = dataset["train"].shuffle(seed=42).select(range(100))
print(random_samples["label"])

[0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]


In [60]:
# FMMB: subsample every split to save time
dataset["train"] = dataset["train"].shuffle(seed=42).select(range(1000))
dataset["test"] = dataset["test"].shuffle(seed=42).select(range(100))
dataset["unsupervised"] = dataset["unsupervised"].shuffle(seed=42).select(range(2000))

In [62]:
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 1000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 100
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
})

In [63]:
len(dataset['train'])

1000

In [64]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Tokenize data
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Load pre-trained model
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

Map:   0%|          | 0/100 [00:00<?, ? examples/s]

Map:   0%|          | 0/2000 [00:00<?, ? examples/s]

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
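
The call `tokenizer(..., padding="max_length", truncation=True)` pads or cuts every review to the model's maximum length (512 tokens for `bert-base-uncased`). As a rough illustration, and not the tokenizer's actual implementation, the toy function below shows what padding and truncation do to a list of token ids. The helper name `pad_or_truncate`, the tiny `max_length` and the token ids are made up for illustration, and special-token handling is ignored.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Toy version of padding="max_length" + truncation=True (hypothetical helper)."""
    # Truncation: keep at most max_length tokens.
    ids = list(token_ids[:max_length])
    # The attention mask marks real tokens with 1 and padding with 0.
    mask = [1] * len(ids)
    # Padding: fill up to max_length with the pad id.
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad


# A short and a long sequence of made-up token ids.
short_ids, short_mask = pad_or_truncate([101, 2023, 102], max_length=5)
long_ids, long_mask = pad_or_truncate([101, 2023, 2003, 1037, 2919, 3185, 102], max_length=5)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Because every example is padded to the full length, short reviews cost as much compute as long ones; this keeps the notebook simple at the price of some speed.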


In [65]:
# Checking the results before fine-tuning

pipe = pipeline(task="sentiment-analysis", model=model.to("cpu"), tokenizer=tokenizer)
pipe(["this is a bad and ugly and horrible movie.", "I hate it all.", "i don't like it", "perfect movie , really nice , good , beautiful"])

Device set to use cuda:0


[{'label': 'LABEL_1', 'score': 0.585413932800293},
 {'label': 'LABEL_1', 'score': 0.5388413071632385},
 {'label': 'LABEL_0', 'score': 0.5525482296943665},
 {'label': 'LABEL_1', 'score': 0.6080735325813293}]
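
The scores above sit close to 0.5 because the classification head is freshly initialized (see the warning printed when the model was loaded), so the two logits are nearly equal. As a sketch of where such a score comes from, assuming the pipeline applies a softmax over the two logits and reports the larger probability, with made-up logit values:

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits: nearly equal before training, well separated after.
untrained = softmax([0.10, 0.45])
trained = softmax([-2.5, 2.6])
print(untrained)
print(trained)
```

With nearly equal logits the winning probability stays just above 0.5, which is why the labels before training are essentially arbitrary.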

In [66]:
# Note: recent transformers versions renamed "evaluation_strategy" to "eval_strategy"
# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    report_to='none',
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
)



In [67]:
# Fine-tune the model
trainer.train()

Epoch,Training Loss,Validation Loss
1,No log,0.418794
2,No log,0.416077
3,No log,0.360758


TrainOutput(global_step=375, training_loss=0.2586337687174479, metrics={'train_runtime': 308.4413, 'train_samples_per_second': 9.726, 'train_steps_per_second': 1.216, 'total_flos': 789333166080000.0, 'train_loss': 0.2586337687174479, 'epoch': 3.0})
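
The numbers in `TrainOutput` can be checked by hand. With 1000 training examples, a batch size of 8 and 3 epochs, the trainer takes 125 steps per epoch and 375 in total. The Training Loss column most likely shows `No log` because `logging_steps` was left at its default, assumed here to be 500, which is never reached in 375 steps. The sketch below assumes a single GPU, so the per-device batch size equals the total batch size.

```python
import math

num_examples = 1000        # size of the subsampled training split
batch_size = 8             # per_device_train_batch_size, assuming a single device
num_epochs = 3
train_runtime = 308.4413   # seconds, taken from the TrainOutput above
logging_steps = 500        # assumed default of TrainingArguments

steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
steps_per_second = round(total_steps / train_runtime, 3)
samples_per_second = round(num_examples * num_epochs / train_runtime, 3)

print(total_steps)                    # 375
print(steps_per_second)               # 1.216
print(samples_per_second)             # 9.726
print(total_steps >= logging_steps)   # False: the training loss is never logged
```

Passing a smaller value, for example `logging_steps=50`, to `TrainingArguments` should make the training loss appear in the table.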

In [68]:
trainer.evaluate()

{'eval_loss': 0.3607577383518219,
 'eval_runtime': 2.7472,
 'eval_samples_per_second': 36.4,
 'eval_steps_per_second': 4.732,
 'epoch': 3.0}
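
`eval_loss` is, roughly, the mean cross-entropy over the 100 test reviews, so it can be turned into a more intuitive number: `exp(-eval_loss)` is the geometric mean of the probability the model assigns to the correct label. A small sketch of that conversion (the helper name is made up for illustration):

```python
import math


def mean_correct_probability(cross_entropy_loss):
    """Geometric-mean probability of the true label implied by a mean cross-entropy loss."""
    return math.exp(-cross_entropy_loss)


after_training = mean_correct_probability(0.3607577383518219)  # eval_loss from above
chance_level = mean_correct_probability(math.log(2))           # loss of a 50/50 guess
print(after_training)
print(chance_level)
```

A loss of ln 2 (about 0.693) corresponds to guessing, so an evaluation loss well below that indicates the model has learned something useful.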

In [69]:
# Checking the results after fine-tuning

pipe = pipeline(task="sentiment-analysis", model=model.to("cpu"), tokenizer=tokenizer)
pipe(["this is a bad and ugly and horrible movie.", "I hate it all.", "i don't like it", "perfect movie , really nice , good , beautiful"])

Device set to use cuda:0


[{'label': 'LABEL_0', 'score': 0.977741539478302},
 {'label': 'LABEL_0', 'score': 0.8679994344711304},
 {'label': 'LABEL_0', 'score': 0.9481554627418518},
 {'label': 'LABEL_1', 'score': 0.9935944676399231}]
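
`LABEL_0` and `LABEL_1` are the default names for class ids 0 and 1. In the IMDb dataset 0 means negative and 1 means positive, so the output can be made readable by renaming the labels. The sketch below post-processes pipeline-style output in plain Python; the helper name `rename_labels` is hypothetical. Setting `model.config.id2label` before building the pipeline should have a similar effect.

```python
ID2LABEL = {0: "negative", 1: "positive"}  # IMDb convention: 0 = negative, 1 = positive


def rename_labels(results, id2label=ID2LABEL):
    """Replace 'LABEL_<id>' with a readable class name in pipeline-style results."""
    renamed = []
    for item in results:
        class_id = int(item["label"].split("_")[-1])  # 'LABEL_1' -> 1
        renamed.append({"label": id2label[class_id], "score": item["score"]})
    return renamed


# Two entries copied from the output above.
sample = [
    {"label": "LABEL_0", "score": 0.977741539478302},
    {"label": "LABEL_1", "score": 0.9935944676399231},
]
readable = rename_labels(sample)
print(readable)
```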

In [None]:
from sklearn.metrics import classification_report
import numpy as np

# Make predictions on the test set
predictions = trainer.predict(tokenized_datasets["test"])

# Get predicted labels
predicted_labels = np.argmax(predictions.predictions, axis=1)

# Get true labels
true_labels = predictions.label_ids

# Print classification report
print(classification_report(true_labels, predicted_labels, target_names=["negative", "positive"]))
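
For reference, the per-class numbers that `classification_report` prints can be reproduced by hand. The following is a minimal pure-Python sketch with made-up labels, not scikit-learn's implementation; `precision_recall_f1` is a hypothetical helper.

```python
def precision_recall_f1(true_labels, predicted_labels, positive=1):
    """Precision, recall and F1 for one class, computed from paired label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up example: 4 positive and 4 negative reviews.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Calling the helper with `positive=0` gives the corresponding numbers for the negative class.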



# Extra stuff

In [51]:
import collections

# Count how many training examples carry each label (class-balance check)
x = tokenized_datasets["train"]["label"]
collections.Counter(x)


Counter({0: 2000})

In [70]:
print(tokenized_datasets["test"].select(range(1))['text'])

["<br /><br />When I unsuspectedly rented A Thousand Acres, I thought I was in for an entertaining King Lear story and of course Michelle Pfeiffer was in it, so what could go wrong?<br /><br />Very quickly, however, I realized that this story was about A Thousand Other Things besides just Acres. I started crying and couldn't stop until long after the movie ended. Thank you Jane, Laura and Jocelyn, for bringing us such a wonderfully subtle and compassionate movie! Thank you cast, for being involved and portraying the characters with such depth and gentleness!<br /><br />I recognized the Angry sister; the Runaway sister and the sister in Denial. I recognized the Abusive Husband and why he was there and then the Father, oh oh the Father... all superbly played. I also recognized myself and this movie was an eye-opener, a relief, a chance to face my OWN truth and finally doing something about it. I truly hope A Thousand Acres has had the same effect on some others out there.<br /><br />Sinc