# 🎯 Lesson 2.1: Your First Fine-Tuning!

**Duration:** 2 hours  
**Difficulty:** Beginner  
**Prerequisites:** Module 1 completed

---

## 🎯 Learning Objectives

By the end of this lesson, you will:
1. **Actually fine-tune a model** (for real!)
2. Prepare a dataset for training
3. Configure training parameters
4. Watch your model learn in real-time
5. Evaluate your trained model
6. Save and use your model

**This is the BIG one - you're about to become a fine-tuner! 🚀**

---

## 📋 What We'll Build

**Project:** Movie Review Sentiment Classifier

**Task:** Classify movie reviews as positive or negative

**Model:** DistilBERT (small, fast, beginner-friendly)

**Dataset:** IMDB movie reviews (small subset for speed)

**Why this project?**
- Simple binary classification (2 classes)
- Fast to train (about 5-10 minutes; the exact time depends on whether you have a GPU)
- Easy to understand results
- Real-world applicable

---

## 🛠️ Setup

In [None]:
# Install required libraries (uncomment if needed)
# Recent transformers releases also need `accelerate` for the PyTorch Trainer
# !pip install transformers datasets torch accelerate evaluate scikit-learn

# Imports
import torch
import numpy as np
from datasets import load_dataset, Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer
)
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import warnings
warnings.filterwarnings('ignore')

print("✅ All libraries imported!")
print(f"Using device: {'GPU' if torch.cuda.is_available() else 'CPU'}")

## 📊 Step 1: Load and Explore Data

Let's load a small subset of IMDB reviews for quick training.

In [None]:
# Load dataset
print("Loading IMDB dataset...")
dataset = load_dataset("imdb")

# Let's use a smaller subset for faster training
# In real projects, use more data!
train_dataset = dataset['train'].shuffle(seed=42).select(range(1000))
test_dataset = dataset['test'].shuffle(seed=42).select(range(200))

print("\n✅ Dataset loaded!")
print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"\nDataset structure: {train_dataset.features}")
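
The cells above create one training subset and one test subset. If you later tune hyperparameters, it can help to hold back a validation slice of the training data so the test set stays untouched until the end. Here is a minimal sketch of that idea using plain Python indices. The helper `split_indices` and its parameters are hypothetical names used only for illustration.

```python
import random

def split_indices(n_examples, val_fraction=0.1, seed=42):
    """Shuffle 0..n_examples-1 and split into (train, validation) index lists."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)        # seeded, so the split is reproducible
    n_val = round(n_examples * val_fraction)    # size of the validation slice
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = split_indices(1000, val_fraction=0.1)
print(len(train_idx), len(val_idx))  # 900 100
```

With the `datasets` library, index lists like these could be passed to `train_dataset.select(...)`; the built-in `train_test_split` method is an alternative.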

In [None]:
# Let's look at some examples
print("Sample reviews:\n")
for i in range(3):
    review = train_dataset[i]
    label = "POSITIVE" if review['label'] == 1 else "NEGATIVE"
    text = review['text'][:200] + "..."  # First 200 characters
    
    print(f"Example {i+1}:")
    print(f"Label: {label}")
    print(f"Text: {text}\n")

In [None]:
# Check class balance
train_labels = [example['label'] for example in train_dataset]
positive_count = sum(train_labels)
negative_count = len(train_labels) - positive_count

print("Class distribution:")
print(f"Positive: {positive_count} ({positive_count/len(train_labels)*100:.1f}%)")
print(f"Negative: {negative_count} ({negative_count/len(train_labels)*100:.1f}%)")

if abs(positive_count - negative_count) < len(train_labels) * 0.2:
    print("\n✅ Dataset is balanced! Good for training.")
else:
    print("\n⚠️ Dataset is imbalanced. Consider balancing techniques.")
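
If the check above ever reports an imbalance, one simple balancing technique is to downsample every class to the size of the rarest one. The sketch below shows that on made-up labels. `balanced_indices` is a hypothetical helper, and downsampling is only one of several possible options.

```python
import random
from collections import defaultdict

def balanced_indices(labels, seed=42):
    """Return sorted indices that keep the same number of examples per class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    smallest = min(len(members) for members in by_class.values())
    rng = random.Random(seed)
    kept = []
    for members in by_class.values():
        kept.extend(rng.sample(members, smallest))  # downsample to the rarest class
    return sorted(kept)

toy_labels = [1] * 7 + [0] * 3               # made-up labels: 7 positive, 3 negative
kept = balanced_indices(toy_labels)
print([toy_labels[i] for i in kept])         # [1, 1, 1, 0, 0, 0]
```

The returned indices could then be used with `train_dataset.select(kept)` to build the balanced subset.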

## 🔤 Step 2: Tokenize the Data

Convert text to numbers that the model can understand.

In [None]:
# Load tokenizer
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

print(f"✅ Loaded tokenizer: {model_name}")
print(f"Vocabulary size: {tokenizer.vocab_size}")

In [None]:
# Create tokenization function
def tokenize_function(examples):
    """Tokenize text data."""
    return tokenizer(
        examples['text'],
        padding='max_length',
        truncation=True,
        max_length=256  # Limit sequence length for speed
    )

# Apply tokenization to entire dataset
print("Tokenizing datasets...")
tokenized_train = train_dataset.map(tokenize_function, batched=True)
tokenized_test = test_dataset.map(tokenize_function, batched=True)

print("✅ Tokenization complete!")
print(f"\nTokenized features: {tokenized_train.features.keys()}")

In [None]:
# Let's see what tokenized data looks like
example = tokenized_train[0]

print("Example tokenized data:\n")
print(f"Original text: {train_dataset[0]['text'][:100]}...\n")
print(f"Input IDs (first 20): {example['input_ids'][:20]}")
print(f"Attention mask (first 20): {example['attention_mask'][:20]}")
print(f"Label: {example['label']}")
# With padding='max_length', every example has the same length (256), padding included
print(f"\nSequence length (padding included): {len(example['input_ids'])}")
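
To make truncation, padding, and the attention mask concrete, here is a toy sketch with a made-up three-word vocabulary. It is not the DistilBERT tokenizer: the real one splits text into subword pieces and adds special tokens, which this example skips. `toy_encode`, `toy_vocab`, and the id values are all hypothetical.

```python
def toy_encode(text, vocab, max_length, pad_id=0, unk_id=1):
    """Map whitespace-separated words to ids, then truncate or pad to max_length."""
    ids = [vocab.get(word, unk_id) for word in text.lower().split()]
    ids = ids[:max_length]                          # truncation
    mask = [1] * len(ids)                           # 1 = real token
    padding = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * padding,      # pad up to max_length
        "attention_mask": mask + [0] * padding,     # 0 = padding position
    }

toy_vocab = {"great": 2, "movie": 3, "boring": 4}   # made-up vocabulary
encoded = toy_encode("Great movie overall", toy_vocab, max_length=5)
print(encoded["input_ids"])       # [2, 3, 1, 0, 0]
print(encoded["attention_mask"])  # [1, 1, 1, 0, 0]
```

The unknown word "overall" maps to `unk_id`, and the two trailing zeros in the mask mark the padded positions.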

## 🧠 Step 3: Load the Model

Load a pre-trained model and prepare it for our task (binary classification).

In [None]:
# Load pre-trained model
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,  # Binary classification (positive/negative)
)

print(f"✅ Model loaded: {model_name}")
print(f"\nModel configuration:")
print(f"  Number of labels: {model.num_labels}")
print(f"  Hidden size: {model.config.hidden_size}")
print(f"  Number of layers: {model.config.num_hidden_layers}")
print(f"  Number of attention heads: {model.config.num_attention_heads}")

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")

## 📊 Step 4: Define Evaluation Metrics

How will we measure if our model is learning?

In [None]:
def compute_metrics(eval_pred):
    """Calculate accuracy, precision, recall, and F1 score."""
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    
    # Calculate metrics
    accuracy = accuracy_score(labels, predictions)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, predictions, average='binary'
    )
    
    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1': f1
    }

print("✅ Metrics function defined")
print("\nMetrics we'll track:")
print("  - Accuracy: Overall correctness")
print("  - Precision: Of positive predictions, how many are correct?")
print("  - Recall: Of actual positives, how many did we find?")
print("  - F1: Harmonic mean of precision and recall")
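
To see how the four metrics relate, it can help to compute them by hand on a tiny example. The sketch below uses ten made-up labels and predictions (not output from this lesson's model) and counts true and false positives and negatives directly. `binary_metrics` is a hypothetical helper that follows the same definitions as the messages printed above.

```python
def binary_metrics(labels, predictions):
    """Accuracy, precision, recall, and F1 with class 1 treated as positive."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)   # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)   # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)   # false negatives
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)   # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

example_labels      = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # made-up ground truth
example_predictions = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]   # made-up model output
metrics = binary_metrics(example_labels, example_predictions)
print(metrics)  # {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

Here 3 of the 4 predicted positives are correct (precision 0.75) and 3 of the 4 actual positives are found (recall 0.75), while accuracy also credits the 5 correct negatives.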

## 🚀 Step 5: Configure Training

Set up how we want to train the model.

### Understanding Training Arguments:

- **epochs**: How many times the model sees the entire dataset (3 is good for small datasets)
- **batch_size**: How many examples to process at once (8 or 16 for beginners)
- **learning_rate**: How fast to learn (2e-5 is a safe default)
- **weight_decay**: Regularization to prevent overfitting (0.01 is standard)
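
The arguments above interact: epochs and batch size set how many optimizer steps the run takes, and the learning rate usually shrinks over those steps. The sketch below works through the arithmetic for this lesson's numbers. It assumes a single device, no gradient accumulation, and a linear decay to zero with no warmup, which is our understanding of the Trainer's default schedule. Both function names are hypothetical.

```python
import math

def total_training_steps(num_examples, batch_size, epochs):
    """Optimizer steps on one device without gradient accumulation."""
    return math.ceil(num_examples / batch_size) * epochs   # last partial batch counts

def linear_decay_lr(step, total_steps, peak_lr=2e-5):
    """Learning rate after `step` updates when decaying linearly to zero."""
    return peak_lr * max(0.0, 1.0 - step / total_steps)

steps = total_training_steps(num_examples=1000, batch_size=8, epochs=3)
print(steps)                          # 375
print(linear_decay_lr(0, steps))      # 2e-05
print(linear_decay_lr(steps, steps))  # 0.0
```

Halving the batch size doubles the number of steps, so batch size and learning rate are often tuned together.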

In [None]:
# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",              # Where to save checkpoints
    num_train_epochs=3,                  # Number of training epochs
    per_device_train_batch_size=8,      # Batch size for training
    per_device_eval_batch_size=16,      # Batch size for evaluation
    learning_rate=2e-5,                  # Learning rate
    weight_decay=0.01,                   # Weight decay for regularization
    
    # Evaluation and logging
    eval_strategy="epoch",              # Evaluate after each epoch
    save_strategy="epoch",               # Save after each epoch
    logging_steps=50,                    # Log every 50 steps
    
    # Performance
    load_best_model_at_end=True,        # Load best model when done
    metric_for_best_model="accuracy",   # Use accuracy to determine best model
    
    # Optional: Reduce output
    report_to="none",                    # Don't report to wandb/tensorboard
)

print("✅ Training configuration set!")
print(f"\nTraining parameters:")
print(f"  Epochs: {training_args.num_train_epochs}")
print(f"  Batch size: {training_args.per_device_train_batch_size}")
print(f"  Learning rate: {training_args.learning_rate}")
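# Ceiling division counts the final partial batch (single device, no gradient accumulation)
steps_per_epoch = -(-len(tokenized_train) // training_args.per_device_train_batch_size)
print(f"  Total training steps: {steps_per_epoch * training_args.num_train_epochs}")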

## 🎯 Step 6: Train the Model!

**This is it! We're about to fine-tune your first model!**

Watch the loss decrease - that means your model is learning! 📉

In [None]:
# Create Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    compute_metrics=compute_metrics,
)

print("✅ Trainer created and ready!")
print("\n🚀 Starting training...\n")
print("This will take 5-10 minutes. Watch the loss go down!")
print("="*60)

In [None]:
# TRAIN THE MODEL!
training_result = trainer.train()

print("\n" + "="*60)
print("🎉 TRAINING COMPLETE!")
print("="*60)
# training_loss is the loss averaged over the whole run, not the value at the last step
print(f"\nAverage training loss: {training_result.training_loss:.4f}")
print(f"Training time: {training_result.metrics['train_runtime']:.2f} seconds")

## 📊 Step 7: Evaluate the Model

How well did we do?

In [None]:
# Evaluate on test set
print("Evaluating on test set...\n")
eval_results = trainer.evaluate()

print("="*60)
print("📊 EVALUATION RESULTS")
print("="*60)
print(f"Accuracy:  {eval_results['eval_accuracy']:.2%}")
print(f"Precision: {eval_results['eval_precision']:.2%}")
print(f"Recall:    {eval_results['eval_recall']:.2%}")
print(f"F1 Score:  {eval_results['eval_f1']:.2%}")
print("="*60)

# Interpretation
if eval_results['eval_accuracy'] >= 0.85:
    print("\n🌟 EXCELLENT! Your model is performing very well!")
elif eval_results['eval_accuracy'] >= 0.75:
    print("\n✅ GOOD! Your model is working well. Consider more training data or epochs for improvement.")
else:
    print("\n⚠️ The model is learning but could improve. Try more data or different hyperparameters.")
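
With only 200 test reviews, any accuracy figure carries some sampling noise, so a score near one of the thresholds above could fall on either side on a different sample. The sketch below estimates a rough 95% interval using the normal approximation for a proportion. The 0.85 input is a hypothetical score, not a result from this run, and `accuracy_interval` is an illustrative helper.

```python
import math

def accuracy_interval(accuracy, n_samples, z=1.96):
    """Approximate 95% interval for an accuracy measured on n_samples examples."""
    margin = z * math.sqrt(accuracy * (1 - accuracy) / n_samples)
    return max(0.0, accuracy - margin), min(1.0, accuracy + margin)

low, high = accuracy_interval(0.85, 200)   # hypothetical accuracy on 200 reviews
print(f"{low:.3f} to {high:.3f}")          # 0.801 to 0.899
```

An interval about ten points wide is one reason a larger test set gives a more trustworthy comparison between runs.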

## 🧪 Step 8: Test Your Model!

Let's try it on real reviews (including your own!).

In [None]:
def predict_sentiment(text):
    """Predict sentiment of a given text."""
    # Tokenize
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=256)
    
    # Move to same device as model
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    
    # Predict (eval mode turns off dropout so repeated calls give the same answer)
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
    
    # Get probabilities
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    prediction = torch.argmax(probabilities, dim=-1).item()
    confidence = probabilities[0][prediction].item()
    
    label = "POSITIVE" if prediction == 1 else "NEGATIVE"
    
    return label, confidence

# Test examples
test_reviews = [
    "This movie was absolutely amazing! Best film I've seen this year!",
    "Terrible movie. Complete waste of time and money.",
    "It was okay, nothing special but not terrible either.",
    "I loved every minute of it! The acting was superb and the plot was gripping.",
    "Boring and predictable. I fell asleep halfway through."
]

print("Testing your fine-tuned model:\n")
print("="*80)

for review in test_reviews:
    label, confidence = predict_sentiment(review)
    print(f"\nReview: {review}")
    print(f"→ Prediction: {label} (confidence: {confidence:.2%})")
    print("-"*80)
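
The "confidence" printed above comes from applying softmax to the model's two raw scores (logits). The sketch below reproduces that step in plain Python on made-up logits, so you can see how a score gap turns into a probability. It mirrors the logic of `predict_sentiment` but does not call the model.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)                                   # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]

toy_logits = [-1.0, 2.0]                  # made-up scores for [NEGATIVE, POSITIVE]
probs = softmax(toy_logits)
prediction = probs.index(max(probs))      # argmax picks the most likely class
print(prediction, round(probs[prediction], 4))  # 1 0.9526
```

A gap of 3 between the logits already yields about 95% confidence, which is why high confidence values do not by themselves mean a prediction is correct.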

In [None]:
# 🎯 YOUR TURN: Test with your own reviews!

your_reviews = [
    "Write your own movie review here!",
    # Add more reviews...
]

print("Your custom reviews:\n")
for review in your_reviews:
    label, confidence = predict_sentiment(review)
    print(f"Review: {review}")
    print(f"→ {label} ({confidence:.2%})\n")

## 💾 Step 9: Save Your Model

Save your fine-tuned model so you can use it later!

In [None]:
# Save model and tokenizer
save_directory = "./my_first_finetuned_model"

model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)

print(f"✅ Model and tokenizer saved to: {save_directory}")
print("\nYou can now load this model anytime with:")
print(f'model = AutoModelForSequenceClassification.from_pretrained("{save_directory}")')
print(f'tokenizer = AutoTokenizer.from_pretrained("{save_directory}")')

In [None]:
# Test loading the saved model
print("Testing if we can load the saved model...\n")

loaded_model = AutoModelForSequenceClassification.from_pretrained(save_directory)
loaded_tokenizer = AutoTokenizer.from_pretrained(save_directory)

print("✅ Model loaded successfully!")

# Quick test
test_text = "This is a great movie!"
inputs = loaded_tokenizer(test_text, return_tensors="pt")
with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = loaded_model(**inputs)
prediction = torch.argmax(outputs.logits, dim=-1).item()

print(f"\nQuick test: '{test_text}'")
print(f"Prediction: {'POSITIVE' if prediction == 1 else 'NEGATIVE'}")

## 🎉 CONGRATULATIONS!

You just fine-tuned your first transformer model!

### What You Accomplished:

✅ Loaded and explored a dataset  
✅ Tokenized text data  
✅ Configured training parameters  
✅ **Fine-tuned a pre-trained model end to end**  
✅ Evaluated model performance  
✅ Made predictions on new data  
✅ Saved your model for future use  

---

## 🧠 What You Learned

### Key Concepts:

1. **Data Preparation**: Load → Tokenize → Batch
2. **Training Loop**: Model sees data multiple times (epochs)
3. **Loss Decreasing = Learning**: A falling training loss means the model is fitting the training data; confirm the gain on held-out data
4. **Evaluation Metrics**: Accuracy, precision, recall, F1
5. **Overfitting Prevention**: Evaluate on held-out data every epoch and keep the best checkpoint; early stopping builds on the same idea
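
Early stopping, mentioned in the last point, was not switched on in this lesson, so here is a minimal sketch of the rule it applies: stop once the evaluation score has failed to improve for a set number of evaluations in a row. The scores are made up, and `should_stop` and `patience` are illustrative names. For real runs, `transformers` ships an `EarlyStoppingCallback` that can be passed to the `Trainer`.

```python
def should_stop(eval_scores, patience=2):
    """True when the last `patience` evaluations all failed to beat the earlier best."""
    if len(eval_scores) <= patience:
        return False                                   # not enough history yet
    best_before = max(eval_scores[:-patience])
    return all(score <= best_before for score in eval_scores[-patience:])

print(should_stop([0.80, 0.84, 0.83, 0.84]))  # True
print(should_stop([0.80, 0.84, 0.83, 0.86]))  # False
```

In the first call the last two epochs never exceed the earlier best of 0.84, so training would stop; in the second, the final epoch sets a new best and training continues.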

### The Fine-Tuning Pipeline:

```
Data → Tokenize → Model → Training Args → Trainer → Train → Evaluate → Save
```

---

## 🚀 Next Steps

### Challenge Yourself:

1. **Experiment with hyperparameters:**
   - Try different learning rates (1e-5, 3e-5, 5e-5)
   - Adjust batch sizes (4, 8, 16)
   - More epochs (5, 10)

2. **Use more data:**
   - Increase from 1000 to 5000 training examples
   - See how accuracy improves!

3. **Try different models:**
   - `bert-base-uncased`
   - `roberta-base`
   - `albert-base-v2`

4. **Upload to the Hugging Face Hub:**
   - Share your model with the world!
   - `model.push_to_hub("my-awesome-sentiment-model")`
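
For the hyperparameter experiments in the first challenge, it helps to list every combination up front so no run is forgotten. The sketch below builds that grid from the values suggested above. The dictionary keys mirror the `TrainingArguments` names used in Step 5, and each entry would still need its own training run.

```python
from itertools import product

learning_rates = [1e-5, 3e-5, 5e-5]
batch_sizes = [4, 8, 16]
epoch_counts = [5, 10]

# One dict per combination of learning rate, batch size, and epoch count
grid = [
    {"learning_rate": lr, "per_device_train_batch_size": bs, "num_train_epochs": ep}
    for lr, bs, ep in product(learning_rates, batch_sizes, epoch_counts)
]
print(len(grid))  # 18
print(grid[0])    # {'learning_rate': 1e-05, 'per_device_train_batch_size': 4, 'num_train_epochs': 5}
```

Eighteen full runs add up quickly, so a common shortcut is to vary one setting at a time around the defaults from this lesson.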

---

## 📝 Reflection Questions

Can you answer these?

1. What does the loss value tell you about training?
2. Why do we need a separate test set?
3. What's the difference between accuracy and F1 score?
4. How would you improve the model's performance?

---

## ➡️ Next Lesson

**Lesson 2.2: Evaluating Your Model**
- Deep dive into metrics
- Confusion matrices
- Error analysis
- When is your model good enough?

---

**Progress:** 🟢🟢🟢🟢🔘 (Lesson 4 of 15)

---

## 🎯 Achievement Unlocked!

**🏆 First Fine-Tuning Complete!**

You're no longer a beginner - you're a fine-tuner! Keep going! 💪