# **Fine-Tuning BERT with Hugging Face**
### **Kaggle Notebook**
Author: *Rafael Hidalgo*  
Date: *03/02/2025*  

## **1. Introduction**
This notebook demonstrates how to fine-tune a BERT model for sentiment analysis using the IMDb dataset. We will use Hugging Face's `transformers` and `datasets` libraries to:
- Preprocess and tokenize the dataset
- Train a BERT model for text classification
- Debug and optimize training performance
- Evaluate the fine-tuned model using key metrics
- Explore potential real-world applications


## **2. Install and Import Libraries**

In [None]:
!pip install transformers datasets torch scikit-learn accelerate

In [None]:
# Import necessary libraries
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset
import torch
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

## **3. Load and Prepare the IMDb Dataset**

In [None]:
# Load dataset
dataset = load_dataset('imdb')

# Load tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenize the data
def tokenize_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True, max_length=128)

# Apply tokenization
tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Rename the label column
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")

# Convert dataset to PyTorch format
tokenized_datasets.set_format("torch", columns=["input_ids", "attention_mask", "labels"])

# Subset the dataset for quick training
train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(2000))
test_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(500))
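
To make the effect of `padding="max_length"` and `truncation=True` concrete, here is a minimal pure-Python sketch of what happens to a single sequence of token IDs. The helper `pad_or_truncate` and the toy IDs are hypothetical and exist only for illustration. The real tokenizer also inserts the `[CLS]` and `[SEP]` special tokens and accounts for them when truncating, which this sketch ignores. The padding ID of 0 matches the `[PAD]` token of `bert-base-uncased`.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    # Truncation: keep only the first max_length tokens
    kept = token_ids[:max_length]
    # Padding: fill the remainder with pad_id
    n_pad = max_length - len(kept)
    input_ids = kept + [pad_id] * n_pad
    # The mask is 1 for real tokens and 0 for padding
    attention_mask = [1] * len(kept) + [0] * n_pad
    return input_ids, attention_mask


# A short sequence is padded, a long one is cut off
short_ids, short_mask = pad_or_truncate([7, 8, 9], max_length=5)
long_ids, long_mask = pad_or_truncate([1, 2, 3, 4, 5, 6, 7], max_length=5)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

With `max_length=128`, any review longer than 128 tokens loses its tail, so sentiment expressed only near the end of a long review never reaches the model.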


## **4. Load Pre-Trained BERT Model**

In [None]:
# Load pre-trained BERT model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

## **5. Define Training Arguments**

In [None]:
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",  # named evaluation_strategy in transformers releases before 4.41
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
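    save_strategy="epoch",  # checkpoint once per epoch; saving every 10 steps would fill the disk with large checkpoints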
)
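
As a rough sanity check on these settings, the sketch below estimates how many optimizer steps the run takes and how often an every-10-steps interval (as used for `logging_steps`) fires. It assumes a single device, no gradient accumulation, and that the last partial batch is kept. The helper `training_schedule` is hypothetical and not part of `transformers`.

```python
import math


def training_schedule(num_examples, batch_size, num_epochs, interval):
    """Estimate steps per epoch, total steps, and how often `interval` fires."""
    # One optimizer step per batch (assumes no gradient accumulation)
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * num_epochs
    # Number of times a "every `interval` steps" event is triggered
    events = total_steps // interval
    return steps_per_epoch, total_steps, events


# 2,000 training examples, batch size 16, 3 epochs, interval of 10 steps
steps_per_epoch, total_steps, log_events = training_schedule(2000, 16, 3, 10)
print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps: {total_steps}")
print(f"events at a 10-step interval: {log_events}")
```

The same arithmetic applies to any step-based interval, which is why a step-based checkpoint interval as small as 10 is expensive for a model of this size.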

## **6. Define Trainer and Train the Model**

In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

# Train the model
trainer.train()

## **7. Debugging Issues During Training**

### **Possible Issues & Solutions**
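- **Overfitting** (training loss keeps falling while evaluation loss rises): Reduce the number of epochs or increase dropout.
- **Underfitting** (training and evaluation loss both stay high): Train on more than the 2,000-example subset or adjust the learning rate.
- **Long training time**: Use `distilbert-base-uncased` instead of `bert-base-uncased` for a smaller, faster model. This also means loading the matching DistilBERT tokenizer and model classes (or the `AutoTokenizer` and `AutoModelForSequenceClassification` classes) instead of the BERT-specific ones.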

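To experiment, change the arguments below, then re-create the `Trainer` (and reload the model if you want to start again from the pre-trained weights) before calling `trainer.train()`:
```python
training_args.num_train_epochs = 5  # Increase epochs if underfitting
training_args.per_device_train_batch_size = 8  # Reduce batch size if you run out of GPU memory
```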
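
One way to tell overfitting from underfitting is to compare the per-epoch training and evaluation losses. The sketch below is a crude heuristic, not a standard API: the function `diagnose_fit`, its `high_loss` and `tolerance` thresholds, and the loss values are all hypothetical, and the numbers are illustrative rather than results from this notebook.

```python
def diagnose_fit(train_losses, eval_losses, high_loss=0.6, tolerance=0.01):
    """Classify a run as 'overfitting', 'underfitting', or 'ok' from per-epoch losses."""
    if len(train_losses) != len(eval_losses) or len(train_losses) < 2:
        raise ValueError("need at least two epochs of matching loss histories")

    # Overfitting: training loss went down, but evaluation loss
    # has climbed back above its best value
    train_falling = train_losses[-1] < train_losses[0]
    eval_rising = eval_losses[-1] > min(eval_losses) + tolerance
    if train_falling and eval_rising:
        return "overfitting"

    # Underfitting: both losses are still high at the end of training
    if train_losses[-1] > high_loss and eval_losses[-1] > high_loss:
        return "underfitting"

    return "ok"


# Illustrative loss histories (made-up numbers)
overfit = diagnose_fit([0.50, 0.25, 0.10], [0.40, 0.35, 0.45])
underfit = diagnose_fit([0.69, 0.68, 0.67], [0.69, 0.69, 0.68])
healthy = diagnose_fit([0.50, 0.30, 0.20], [0.45, 0.32, 0.28])
print(overfit, underfit, healthy)
```

The default `high_loss` of 0.6 is chosen only because it sits a little below ln 2 ≈ 0.693, the cross-entropy of a two-class model that guesses at random. Any real threshold depends on the task.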


## **8. Evaluate Model Performance**

In [None]:
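# Define compute metrics function
def compute_metrics(pred):
    # pred unpacks into the raw logits and the true labels
    logits, labels = pred
    predictions = np.argmax(logits, axis=1)
    acc = accuracy_score(labels, predictions)
    f1 = f1_score(labels, predictions)
    return {"accuracy": acc, "f1_score": f1}

# The Trainer above was created without compute_metrics, so attach it before
# evaluating; otherwise evaluate() reports only the loss and runtime statistics
trainer.compute_metrics = compute_metrics

# Evaluate the model
eval_result = trainer.evaluate()
print(f"Evaluation results: {eval_result}")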
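
To show what the two metrics measure, here is a minimal pure-Python sketch that computes accuracy and binary F1 by hand on a toy example. The function `accuracy_and_f1` and the toy labels are hypothetical. The F1 here treats label 1 (positive) as the class of interest, which matches the default behavior of `f1_score` for binary labels.

```python
def accuracy_and_f1(labels, predictions, positive=1):
    """Compute accuracy and binary F1 for the `positive` class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    correct = sum(1 for y, p in pairs if y == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy data: 3 positive and 5 negative reviews
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_predictions = [1, 1, 0, 0, 0, 0, 0, 1]
toy_accuracy, toy_f1 = accuracy_and_f1(toy_labels, toy_predictions)
print(f"accuracy={toy_accuracy:.3f}, f1={toy_f1:.3f}")
```

On imbalanced data the two numbers can diverge, as they do here, which is the reason to report both.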

## **9. Apply Model to Real-World Task**

In [None]:
# Example text inputs
texts = ["This movie was fantastic! I loved every moment.",
         "The film was terrible. I regret watching it."]

# Tokenize inputs and move them to the same device as the model
inputs = tokenizer(texts, padding="max_length", truncation=True, max_length=128, return_tensors="pt")
inputs = inputs.to(model.device)

# Make predictions
model.eval()  # Disable dropout for inference
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)

# Print results (in IMDb, label 1 is positive and label 0 is negative)
for text, pred in zip(texts, predictions):
    label = "Positive" if pred.item() == 1 else "Negative"
    print(f"Review: {text}\nPredicted Sentiment: {label}\n")

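Taking the `argmax` of the logits gives a label but no sense of how confident the model is. A softmax turns the two logits into probabilities that sum to 1. The sketch below does this in pure Python on made-up logits; the `softmax` helper and the values in `toy_logits` are hypothetical, not outputs of the fine-tuned model. On real outputs, `torch.softmax(outputs.logits, dim=-1)` performs the same computation.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    # Subtract the maximum for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits in the order [negative, positive]
toy_logits = [[-1.0, 2.0], [1.5, -0.5]]
toy_probs = [softmax(row) for row in toy_logits]

for probs in toy_probs:
    label = "Positive" if probs[1] > probs[0] else "Negative"
    print(f"{label} (confidence {max(probs):.3f})")
```
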
## **10. Conclusion**

In this notebook, we:
- Fine-tuned `bert-base-uncased` on the IMDb dataset
- Addressed common debugging issues
- Evaluated the model using accuracy and F1-score
- Applied the model to classify unseen text

### **Next Steps:**
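- Try different datasets and tasks (e.g., SQuAD for question answering, which needs a question-answering head such as `BertForQuestionAnswering` instead of the classification head used here)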
- Experiment with hyperparameters for better accuracy
- Deploy the model as an API for real-world applications

**Thank you for exploring BERT with me! 🚀**
