# 04 - BERT Fine-Tuning

This notebook covers two ways to use BERT for emotion classification:
- Running a BERT model that has already been fine-tuned for emotion classification
- Fine-tuning `bert-base-cased` on our dataset

In [None]:
import sys
sys.path.insert(0, '..')

import torch
from transformers import pipeline

from src.config import EMOTION_LABELS, MODEL_NAME
from src.data.dataset import load_emotion_data, get_tokenizer
from src.models.bert_classifier import BertClassifier
from src.training.trainer import Trainer
from src.training.utils import get_device

## Using Pre-trained Emotion BERT

First, let's try a model already fine-tuned for emotion classification.

In [None]:
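# Load pre-trained emotion classifier from HuggingFace
# Note: return_all_scores is deprecated in newer transformers releases,
# which suggest top_k=None instead; the output nesting can differ by version.
emotion_classifier = pipeline(
    'text-classification',
    model='bhadresh-savani/bert-base-uncased-emotion',
    return_all_scores=True
)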

# Test on sample texts
test_texts = [
    "I am so happy today!",
    "This makes me really angry.",
    "I feel sad and lonely.",
    "What a wonderful surprise!"
]

for text in test_texts:
    result = emotion_classifier(text)
    top_emotion = max(result[0], key=lambda x: x['score'])
    print(f"Text: {text}")
    print(f"  Emotion: {top_emotion['label']} ({top_emotion['score']:.2%})\n")
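
The loop above picks the label with the highest score from the per-label list the pipeline returns. A minimal sketch of where such scores can come from, assuming they are a softmax over the model's raw logits; the label names and logit values below are made up for illustration and are not output from the model:

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical label set and logits, for illustration only
labels = ["sadness", "joy", "love", "anger", "fear", "surprise"]
logits = [-1.2, 3.4, 0.5, -0.7, -2.0, 0.1]

# Same shape as the per-text result used in the loop: a list of label/score dicts
scores = [{"label": l, "score": s} for l, s in zip(labels, softmax(logits))]
top = max(scores, key=lambda x: x["score"])
print(f"Emotion: {top['label']} ({top['score']:.2%})")
```

The `max(..., key=...)` call is the same selection used in the cell above; only the source of the scores is simulated here.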

## Fine-Tuning Our Own BERT Model

Now let's fine-tune bert-base-cased on our specific dataset.

In [None]:
# Check device
device = get_device()
print(f"Using device: {device}")

# Load data with resampling
train_df, val_df, test_df = load_emotion_data(resample=True)
print("\nDataset sizes:")
print(f"  Train: {len(train_df)} (resampled)")
print(f"  Val: {len(val_df)}")
print(f"  Test: {len(test_df)}")
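
What `resample=True` does inside `load_emotion_data` is not shown in this notebook. One common approach to class imbalance is random oversampling of minority classes in the training split only; the sketch below illustrates that idea under this assumption. The `oversample` helper and the toy rows are hypothetical and are not the project's implementation:

```python
import random
from collections import Counter


def oversample(rows, label_key="label", seed=0):
    """Duplicate random minority-class rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)

    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the target size
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Toy, imbalanced training rows (hypothetical data)
rows = (
    [{"text": f"joy {i}", "label": "joy"} for i in range(5)]
    + [{"text": "fear 0", "label": "fear"}]
    + [{"text": f"anger {i}", "label": "anger"} for i in range(2)]
)

balanced = oversample(rows)
counts = Counter(r["label"] for r in balanced)
print(counts)
```

Only the training split is balanced in this sketch; validation and test data keep their original class distribution so that reported metrics reflect the real data.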

In [None]:
# Initialize model
model = BertClassifier()
print(f"Model architecture:\n{model}")

In [None]:
# Initialize trainer
# Note: For notebook demo, using fewer epochs. Use scripts/train.py for full training.
trainer = Trainer(
    model=model,
    learning_rate=5e-7,
    batch_size=4,
    epochs=2,  # Reduced for demo - use 20 for full training
    use_data_parallel=False
)

print("Trainer configured for a 2-epoch demo; training starts in the next cell.")
print("For full training, run: python scripts/train.py --epochs 20")

In [None]:
# Train the model (uncomment to run - takes a while)
# trainer.train(train_df, val_df)

## Model Architecture Details

Our BertClassifier architecture:

```
BERT Encoder (bert-base-cased)
    ↓
Pooled Output (768 dimensions)
    ↓
Dropout (p=0.5)
    ↓
Linear Layer (768 → 6)
    ↓
ReLU Activation
    ↓
Output Logits (6 classes)
```
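
The head in the diagram can be sketched in PyTorch as follows. This is an illustration of the layer order only, assuming a random tensor in place of BERT's pooled output; the `ClassifierHead` name is hypothetical, and the actual `BertClassifier` in `src.models.bert_classifier` may differ in detail:

```python
import torch
from torch import nn


class ClassifierHead(nn.Module):
    """Head only: dropout -> linear -> ReLU, applied to a pooled BERT vector."""

    def __init__(self, hidden_size: int = 768, num_classes: int = 6, dropout: float = 0.5):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.linear = nn.Linear(hidden_size, num_classes)
        self.relu = nn.ReLU()

    def forward(self, pooled_output: torch.Tensor) -> torch.Tensor:
        x = self.dropout(pooled_output)
        x = self.linear(x)
        return self.relu(x)


torch.manual_seed(0)
head = ClassifierHead()
head.eval()  # disable dropout for a deterministic forward pass

pooled = torch.randn(4, 768)  # stand-in for BERT's pooled output, batch of 4
logits = head(pooled)
labels = torch.tensor([0, 1, 2, 3])  # arbitrary class ids for illustration
loss = nn.CrossEntropyLoss()(logits, labels)
print(logits.shape, loss.item())
```

Because ReLU is the final step, every value passed to the loss is non-negative, so the head cannot assign a class a negative logit.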

### Training Configuration
- **Batch Size**: 4
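- **Learning Rate**: 5e-7 (well below the 2e-5 to 5e-5 range commonly used for BERT fine-tuning, so expect slow convergence)
- **Epochs**: 20 for full training (the demo cell above uses 2)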
- **Optimizer**: Adam
- **Loss**: CrossEntropyLoss
- **Best Model Selection**: Based on validation F1 score
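
Selecting the best model by validation F1 can be sketched as below. The notebook does not say which F1 averaging the `Trainer` uses, so macro averaging is an assumption here, and the labels and per-epoch predictions are toy values for illustration:

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / num_classes


# Toy validation labels and per-epoch predictions (hypothetical)
y_true = [0, 0, 1, 1, 2, 2]
epoch_preds = {
    1: [0, 1, 1, 1, 2, 0],
    2: [0, 0, 1, 1, 2, 0],
}

f1_by_epoch = {e: macro_f1(y_true, preds, num_classes=3) for e, preds in epoch_preds.items()}
best_epoch = max(f1_by_epoch, key=f1_by_epoch.get)
print(f"Best epoch: {best_epoch} (val F1 = {f1_by_epoch[best_epoch]:.4f})")
```

In a real training loop, the checkpoint from `best_epoch` would be the one saved and later evaluated on the test set.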