# **Fine-Tuning BERT with Hugging Face**
This notebook fine-tunes a pre-trained BERT model for sentiment analysis on the IMDb movie review dataset.
It includes training, debugging, evaluation, and a real-world application.
---
**Enable GPU before running the notebook!**

In [6]:
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
# model.to(device)


Using device: cuda


In [7]:
!pip install transformers datasets accelerate huggingface_hub



## **1️⃣ Load Dataset**
We use the IMDb dataset for sentiment analysis.

In [8]:

from datasets import load_dataset

# Load IMDb dataset
dataset = load_dataset("imdb")
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})

## **2️⃣ Tokenization**
We tokenize the text using BERT's tokenizer.

In [9]:

from transformers import AutoTokenizer

# Load BERT tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

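# Tokenization function
# Only truncate here; padding is left to the data collator, so each batch is
# padded to its own longest sequence instead of always to the 512-token maximum.
def tokenize_function(example):
    return tokenizer(example["text"], truncation=True)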

# Tokenize dataset
tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets

Map:   0%|          | 0/25000 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 50000
    })
})

## **3️⃣ Prepare Data for Training**
Convert datasets into PyTorch format and use a data collator.

In [10]:

import torch
from transformers import DataCollatorWithPadding

# Remove unnecessary columns
tokenized_datasets = tokenized_datasets.remove_columns(["text"])
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets.set_format("torch")

# Data collator for dynamic padding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
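
Dynamic padding means each batch is padded only to the length of its longest sequence, which only pays off when the sequences reaching the collator differ in length. The sketch below illustrates the idea in plain Python. It is not the implementation of `DataCollatorWithPadding`; the helper name `pad_batch` and the toy token IDs are made up for this example.

```python
def pad_batch(sequences, pad_id=0):
    """Pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding that the model should ignore
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Toy batch: the token IDs are arbitrary placeholders, not real tokenizer output
batch = pad_batch([[101, 7, 8, 102], [101, 7, 8, 9, 10, 11, 102]])
print(batch["input_ids"][0])       # [101, 7, 8, 102, 0, 0, 0]
print(batch["attention_mask"][0])  # [1, 1, 1, 1, 0, 0, 0]
```

The short sequence grows to 7 tokens here, not to a fixed maximum, so shorter batches cost less compute.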


## **4️⃣ Load Pre-Trained BERT Model**
We use `bert-base-uncased` for sentiment classification.

In [11]:

from transformers import AutoModelForSequenceClassification

# Load pre-trained BERT model for sentiment analysis (2 classes: positive/negative)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)


model.safetensors:   0%|          | 0.00/440M [00:00<?, ?B/s]

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
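
The warning is expected: only the classification head is new, and everything else comes from the checkpoint. As a rough check, assuming the head is a single linear layer from the 768-dimensional hidden state to the 2 labels (which matches the two tensor names in the warning), the arithmetic below compares its size with the trainable parameter count reported in the training log further down.

```python
hidden_size = 768  # hidden size of BERT-base models
num_labels = 2

# A linear layer has one weight per (input, output) pair plus one bias per output
classifier_weight = hidden_size * num_labels
classifier_bias = num_labels
head_params = classifier_weight + classifier_bias

total_trainable = 109_483_778  # "Number of trainable parameters" from the training log
pretrained_params = total_trainable - head_params

print(head_params)        # 1538
print(pretrained_params)  # 109482240
print(f"{head_params / total_trainable:.6%} of the parameters start untrained")
```

Only a tiny fraction of the weights starts from random values, which is why a few epochs of fine-tuning are usually enough.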


## **5️⃣ Define Training Arguments**
Set hyperparameters for training.

In [18]:
print(f"Training samples: {len(tokenized_datasets['train'])}")
print(f"Evaluation samples: {len(tokenized_datasets['test'])}")


Training samples: 25000
Evaluation samples: 25000


In [23]:
import logging
from transformers import TrainingArguments

# ✅ Force logs to print in real-time
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

logger.info("✅ Logging to console is now enabled.")

training_args = TrainingArguments(
    output_dir="./results",
    overwrite_output_dir=True,  # 🚀 Allow writing into an existing output directory
    evaluation_strategy="epoch",  # Named `eval_strategy` in newer transformers releases
    save_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=50,  # Log every 50 steps
    log_level="info",  # ✅ Increase verbosity
    load_best_model_at_end=True,  # Needs matching evaluation and save strategies
    disable_tqdm=False,  # ✅ Keep the progress bar visible
    logging_first_step=True,  # ✅ Also log the first training step
    logging_strategy="steps",  # ✅ Log every `logging_steps` steps
)

logger.info("✅ Training arguments set. Ready to start training!")


PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
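
The notice above appears because no reporting integration was chosen explicitly. The training log later in this notebook states that Weights & Biases logging is enabled automatically and names an environment variable to turn it off. A minimal sketch, assuming that hint applies to the installed `transformers` version:

```python
import os

# Taken from the hint printed in the training log.
# Set this before the Trainer is created so the W&B integration is skipped.
os.environ["WANDB_DISABLED"] = "true"
print(os.environ["WANDB_DISABLED"])
```

Passing `report_to="none"` to `TrainingArguments` is the explicit alternative that the notice refers to.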


## **6️⃣ Train BERT Using Hugging Face Trainer**

In [20]:
import torch

print(f"Number of GPUs Available: {torch.cuda.device_count()}")
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")


Number of GPUs Available: 2
GPU 0: Tesla T4
GPU 1: Tesla T4


In [24]:
from transformers import Trainer, TrainingArguments
import time
import sys

# Track time
start_time = time.time()

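class CustomTrainer(Trainer):
    # Accept extra arguments so the override keeps working on transformers
    # releases that pass an additional `start_time` argument to `log`.
    def log(self, logs, *args, **kwargs):
        super().log(logs, *args, **kwargs)
        # The step counter lives in the trainer state, not in the logs dict
        print(f"Epoch {logs.get('epoch', '?')} - Loss: {logs.get('loss', '?')} - Step: {self.state.global_step}")

        # Print additional information if available
        if "eval_loss" in logs:
            print(f"Validation Loss: {logs['eval_loss']}")
        if "eval_accuracy" in logs:
            print(f"Validation Accuracy: {logs['eval_accuracy']}")

        sys.stdout.flush()  # 🔥 Forces immediate printing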

trainer = CustomTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)

# Train the model
print("🚀 Starting Training Now...")
trainer.train()
print("✅ Training Completed!")

# Print total time taken
end_time = time.time()
print(f"\nTotal Training Time: {round((end_time - start_time) / 60, 2)} minutes")


***** Running training *****
  Num examples = 25,000
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Training with DataParallel so batch size has been adjusted to: 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 4,689
  Number of trainable parameters = 109,483,778
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"


🚀 Starting Training Now...

KeyboardInterrupt: 
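
The figures in the log above follow directly from the dataset size and the batch settings. The arithmetic below reproduces them, assuming the two T4 GPUs are combined through `DataParallel` as the log states and that the last partial batch of each epoch still counts as a step.

```python
import math

num_examples = 25_000
per_device_batch_size = 8
num_gpus = 2  # two Tesla T4s, as listed earlier in the notebook
num_epochs = 3

# With gradient accumulation of 1, one optimization step consumes one total batch
total_batch_size = per_device_batch_size * num_gpus
steps_per_epoch = math.ceil(num_examples / total_batch_size)
total_steps = steps_per_epoch * num_epochs

print(total_batch_size)  # 16
print(steps_per_epoch)   # 1563
print(total_steps)       # 4689
```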

## **7️⃣ Save the Fine-Tuned Model**

In [None]:

model.save_pretrained("./bert-sentiment-analysis")
tokenizer.save_pretrained("./bert-sentiment-analysis")


## **8️⃣ Debugging Common Issues**

In [None]:

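from transformers import TrainingArguments

# Example: reducing the batch size to fix out-of-memory errors
training_args = TrainingArguments(
    output_dir="./results",  # Same output directory as in the main run
    per_device_train_batch_size=4,  # Halve the per-device batch size
    gradient_accumulation_steps=2,  # Accumulate 2 steps to keep the effective batch size at 8
    num_train_epochs=3,
)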
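
Halving the batch size while accumulating over two steps keeps the effective batch size at 4 × 2 = 8 per device, so the update stays the same while peak memory drops. The toy example below checks this for a one-parameter least-squares model in plain Python. It sketches the principle only, not how `Trainer` implements accumulation, and the helper `mse_grad` and the data are made up.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]

# One gradient computed on the full batch of 8
full_grad = mse_grad(w, xs, ys)

# Two micro-batches of 4; each gradient is scaled by 1 / accum_steps before summing
accum_steps = 2
micro = len(xs) // accum_steps
accum_grad = 0.0
for i in range(accum_steps):
    lo, hi = i * micro, (i + 1) * micro
    accum_grad += mse_grad(w, xs[lo:hi], ys[lo:hi]) / accum_steps

print(full_grad, accum_grad)
print(abs(full_grad - accum_grad) < 1e-9)  # True
```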


## **9️⃣ Evaluate the Model**

In [None]:

import numpy as np

# Compute accuracy directly with NumPy
# (`datasets.load_metric` is no longer available in recent `datasets` releases)
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}

trainer.compute_metrics = compute_metrics

# Evaluate Model
results = trainer.evaluate()
print(results)
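
Accuracy here is the share of reviews whose highest-scoring class matches the label. A small plain-Python check on made-up logits (the helper name `accuracy_from_logits` is hypothetical) makes that calculation concrete:

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = [row.index(max(row)) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up scores in the order [negative, positive]
toy_logits = [[2.1, -1.0], [0.3, 0.9], [-0.5, 1.5], [1.2, 0.4]]
toy_labels = [0, 1, 0, 0]
toy_accuracy = accuracy_from_logits(toy_logits, toy_labels)
print(toy_accuracy)  # 0.75
```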


## **🔟 Real-World Sentiment Analysis**

In [None]:

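def predict_sentiment(text):
    model.eval()  # Disable dropout for inference
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # Move the inputs to the same device as the model (the GPU after training)
    inputs = {name: tensor.to(model.device) for name, tensor in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=-1).item()
    return "Positive" if prediction == 1 else "Negative"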

# Example Predictions
print(predict_sentiment("This movie was absolutely fantastic!"))
print(predict_sentiment("I hated this movie, it was the worst."))
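
The function above returns only a label. To report a confidence as well, the two logits can be turned into probabilities with a softmax. The sketch below does this in plain Python on made-up logits; the values are illustrative and not output from the fine-tuned model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [-1.2, 2.3]  # made-up [negative, positive] scores
probs = softmax(toy_logits)
label = "Positive" if probs[1] > probs[0] else "Negative"
print(label, round(probs[1], 3))  # Positive 0.971
```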
