# Lab 2.3: First Fine-Tuning

**Objective**: Fine-tune DistilBERT (`distilbert-base-uncased`) for binary sentiment classification

**Duration**: 35 minutes

## Learning Outcomes
- Set up a `Trainer` with `TrainingArguments`
- Fine-tune on the IMDB movie-review dataset
- Monitor training progress

In [None]:
import sys
sys.path.insert(0, "../../../src")
from hf_ecosystem import __version__
print(f"hf-ecosystem version: {__version__}")

In [None]:
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments, Trainer
from hf_ecosystem.training import create_trainer, compute_metrics
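`compute_metrics` comes from the project's `hf_ecosystem.training` module, and its implementation isn't shown in this lab. For reference, `Trainer` calls such a function with an `EvalPrediction` that unpacks into `(predictions, label_ids)` and expects a dict of floats back. The sketch below is a hypothetical accuracy-only stand-in (`accuracy_metrics` is an illustrative name, not the project's function):

```python
import numpy as np


def accuracy_metrics(eval_pred):
    """Hypothetical stand-in for hf_ecosystem.training.compute_metrics."""
    # EvalPrediction unpacks like a (predictions, label_ids) pair.
    logits, labels = eval_pred
    # The predicted class is the index of the largest logit per example.
    preds = np.argmax(np.asarray(logits), axis=-1)
    return {"accuracy": float((preds == np.asarray(labels)).mean())}
```

Returning plain Python floats keeps the metrics easy to log and serialize.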

## 1. Prepare Data

In [None]:
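# Load a small, shuffled subset for the demo. IMDB's splits are sorted by label,
# so a plain slice such as "train[:100]" would contain only negative reviews.
dataset = load_dataset("imdb", split="train").shuffle(seed=42).select(range(100))
eval_dataset = load_dataset("imdb", split="test").shuffle(seed=42).select(range(50))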

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Tokenize
def tokenize(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=128)

train_ds = dataset.map(tokenize, batched=True)
eval_ds = eval_dataset.map(tokenize, batched=True)
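To see what `truncation=True, padding="max_length", max_length=128` does to each review, here is a simplified pure-Python sketch operating on a list of token IDs. `pad_or_truncate` is a hypothetical helper; the real tokenizer also inserts `[CLS]`/`[SEP]` and truncates the text before adding them, which this sketch ignores. `pad_id=0` assumes the `[PAD]` ID of the uncased BERT vocabulary.

```python
def pad_or_truncate(token_ids, max_length=128, pad_id=0):
    """Hypothetical helper: mimic fixed-length truncation plus right padding."""
    # Truncate: keep at most max_length tokens.
    ids = list(token_ids[:max_length])
    # Real tokens get attention 1; padding positions get 0 so the model ignores them.
    attention_mask = [1] * len(ids)
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, attention_mask + [0] * n_pad


ids, mask = pad_or_truncate([101, 2307, 3185, 102], max_length=6)
print(ids)   # [101, 2307, 3185, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```

Padding every example to 128 keeps batches uniform at the cost of wasted computation on short reviews; `transformers.DataCollatorWithPadding` is the usual alternative that pads each batch only to its longest example.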

## 2. Load Model

In [None]:
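# num_labels=2 attaches a freshly initialized two-way classification head
# (negative/positive) on top of the pretrained DistilBERT encoder.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2
)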
print(f"Model parameters: {model.num_parameters():,}")
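With `num_labels=2`, the model outputs one logit per class, so a batch of reviews yields logits of shape `(batch_size, 2)`. The NumPy sketch below shows how those logits turn into probabilities and label names. `logits_to_predictions` is a hypothetical helper, and `ID2LABEL` assumes IMDB's label convention (0 = `neg`, 1 = `pos`).

```python
import numpy as np

# Assumed mapping, following IMDB's ClassLabel names.
ID2LABEL = {0: "neg", 1: "pos"}


def logits_to_predictions(logits):
    """Hypothetical helper: softmax over the class axis, then argmax."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    pred_ids = probs.argmax(axis=-1)
    return probs, [ID2LABEL[int(i)] for i in pred_ids]
```

With a real model, the input would be `model(**inputs).logits` computed under `torch.no_grad()` and converted with `.cpu().numpy()`.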

## 3. Create Trainer

In [None]:
trainer = create_trainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    output_dir="./results",
    num_epochs=1,
    batch_size=8,
    compute_metrics=compute_metrics,
)
print("Trainer created")

In [None]:
# Training is commented out to keep CI fast; uncomment to run it locally.
# trainer.train()
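After `trainer.train()` finishes, progress can be inspected through `trainer.state.log_history`, a list of dicts where training-loss entries carry a `"loss"` key and evaluation entries carry `"eval_loss"`, each with the `"step"` at which they were logged. The hypothetical helper below splits that list into two loss curves.

```python
def split_log_history(log_history):
    """Hypothetical helper: extract (step, loss) pairs from Trainer logs."""
    # Training-loss entries are logged every `logging_steps` steps.
    train = [(e["step"], e["loss"]) for e in log_history if "loss" in e]
    # Evaluation entries appear only when an evaluation actually runs.
    evals = [(e["step"], e["eval_loss"]) for e in log_history if "eval_loss" in e]
    return train, evals
```

Usage after training: `train_curve, eval_curve = split_log_history(trainer.state.log_history)`. The final summary entry (with keys such as `train_loss` and `train_runtime`) has neither key, so it is skipped.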

## Verification

In [None]:
def verify_lab():
    assert trainer is not None
    assert len(train_ds) == 100
    print("✅ Lab completed successfully!")

verify_lab()
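`verify_lab` checks only the subset size. Because IMDB's splits are ordered by label, a subset that contains a single class would pass that check yet teach the model nothing useful. The hypothetical helpers below count labels so that balance can be verified as well.

```python
from collections import Counter


def label_counts(labels):
    """Hypothetical helper: map each label ID to its number of examples."""
    return dict(sorted(Counter(int(label) for label in labels).items()))


def has_all_classes(labels, num_labels=2):
    """Hypothetical helper: True if every class 0..num_labels-1 appears."""
    counts = label_counts(labels)
    return all(counts.get(i, 0) > 0 for i in range(num_labels))
```

Usage: `print(label_counts(train_ds["label"]))` shows the class distribution, and `has_all_classes(train_ds["label"])` could be added as an extra assertion inside `verify_lab`.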