
# Hands-On Tutorial: Supervised Fine-Tuning with SFTTrainer

This tutorial demonstrates how to perform **supervised fine-tuning (SFT)** on the `SmolLM2-135M` model using Hugging Face's `trl` library.

You will learn how to:
1. Load a pretrained language model (`SmolLM2-135M`).
2. Format inputs as **chat conversations** using `setup_chat_format`.
3. Run inference with the base model (before training).
4. Load and stream a dataset from the Hugging Face Hub.
5. Configure and run `SFTTrainer` for supervised fine-tuning.
6. Evaluate the fine-tuned model vs. the base model.

---


## 1) Setup & Authentication

In [None]:
# If on Colab or fresh env, uncomment:
# !pip install -q transformers datasets trl huggingface_hub

from huggingface_hub import login

# login()  # Uncomment to authenticate with your Hugging Face token

## 2) Imports

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, setup_chat_format
import torch

## 3) Device, Model, Tokenizer Setup

In [None]:
device = (
    "cuda"
    if torch.cuda.is_available()
    else (
        "mps"
        if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available()
        else "cpu"
    )
)
print("Using device:", device)

model_name = "HuggingFaceTB/SmolLM2-135M"
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Setup chat format (important for consistency during training + inference)
model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)

finetune_name = "SmolLM2-FT-MyDataset"
finetune_tags = ["smol-course", "module_1"]
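
To make the effect of `setup_chat_format` concrete, the sketch below approximates in plain Python the ChatML layout it configures. `render_chatml` is a hypothetical stand-in for `tokenizer.apply_chat_template`, and the template string is an assumption based on TRL's default ChatML format; inspect `tokenizer.chat_template` for the authoritative version.

```python
def render_chatml(messages, add_generation_prompt=False):
    # Assumed ChatML layout: each turn is wrapped in <|im_start|> ... <|im_end|>
    text = ""
    for message in messages:
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model knows it should answer next
        text += "<|im_start|>assistant\n"
    return text


example = render_chatml(
    [{"role": "user", "content": "Write a haiku about programming"}],
    add_generation_prompt=True,
)
print(example)
```

The special tokens `<|im_start|>` and `<|im_end|>` are new to the base tokenizer, which is why `setup_chat_format` returns both a modified tokenizer and a model with resized embeddings.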

## 4) Baseline Generation (Before Training)

In [None]:
prompt = "Write a haiku about programming"

messages = [{"role": "user", "content": prompt}]
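# add_generation_prompt=True opens the assistant turn so the model answers
# instead of continuing the user message
formatted_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)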

inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=60)

print("=== Base Model Output ===")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
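
For a causal language model, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded text above repeats the prompt. A minimal sketch of separating the two is shown below, using plain lists with made-up token ids in place of tensors; `split_generated` is a hypothetical helper name.

```python
def split_generated(output_ids, prompt_length):
    """Return only the tokens produced after the prompt."""
    if prompt_length > len(output_ids):
        raise ValueError("prompt_length exceeds the number of output tokens")
    return output_ids[prompt_length:]


prompt_ids = [101, 7, 42]             # made-up token ids for the prompt
output_ids = prompt_ids + [9, 13, 2]  # the prompt is echoed, then new tokens follow
new_ids = split_generated(output_ids, len(prompt_ids))
print(new_ids)
```

With the real tensors from the cell above, the equivalent slice would be `outputs[0][inputs["input_ids"].shape[-1]:]`, which can then be passed to `tokenizer.decode`.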

## 5) Load & Stream a Dataset

In [None]:
# Example: Dolly-15k dataset for instruction tuning
streamed = load_dataset(
    "databricks/databricks-dolly-15k", split="train", streaming=True
)

# Inspect first few rows
from itertools import islice

for i, row in enumerate(islice(streamed, 3)):
    print(f"Row {i}:")
    print("Instruction:", row.get("instruction"))
    print("Context:", row.get("context"))
    print("Response:", (row.get("response") or "")[:120], "...")
    print("-" * 50)
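
Streaming yields rows on demand, and `islice` stops asking for more once it has taken what it needs, so inspecting three rows does not pull the whole dataset. The sketch below illustrates that behavior with `fake_stream`, a hypothetical generator standing in for the Hub-backed iterable.

```python
from itertools import islice


def fake_stream(total_rows, log):
    """Hypothetical stand-in for a streamed dataset: yields rows lazily."""
    for i in range(total_rows):
        log.append(i)  # record which rows were actually produced
        yield {"instruction": f"question {i}", "response": f"answer {i}"}


produced = []
first_three = list(islice(fake_stream(15_000, produced), 3))
print(len(first_three), "rows taken;", len(produced), "rows produced")
```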

### Helper: Convert Dataset Rows into Chat-Formatted Text

In [None]:

def row_to_chat_text(row):
    system_msg = {"role": "system", "content": "You are a helpful assistant."}
    instr = row.get("instruction") or ""
    ctx = row.get("context") or ""
    user_msg = instr if not ctx else f"{instr}\n\nContext:\n{ctx}"
    assistant_msg = row.get("response") or ""

    messages = [system_msg, {"role": "user", "content": user_msg}]
    if assistant_msg.strip():
        messages.append({"role": "assistant", "content": assistant_msg})

    return tokenizer.apply_chat_template(messages, tokenize=False)

# Test conversion
streamed = load_dataset("databricks/databricks-dolly-15k", split="train", streaming=True)
for i, row in enumerate(islice(streamed, 2)):
    print("Formatted chat prompt:\n", row_to_chat_text(row)[:400], "...\n")

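
The message-building half of `row_to_chat_text` can be checked without a tokenizer. The sketch below isolates it in a hypothetical helper, `row_to_messages`, and runs it on two made-up rows: one with context and a response, one with neither.

```python
def row_to_messages(row):
    """Build the chat message list for one Dolly-style row (no tokenizer needed)."""
    instr = row.get("instruction") or ""
    ctx = row.get("context") or ""
    # Fold the optional context into the user turn
    user_content = instr if not ctx else f"{instr}\n\nContext:\n{ctx}"
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_content},
    ]
    response = row.get("response") or ""
    if response.strip():
        # Only add an assistant turn when there is a non-empty response
        messages.append({"role": "assistant", "content": response})
    return messages


with_context = row_to_messages(
    {"instruction": "Summarize.", "context": "Cats sleep a lot.", "response": "Cats nap often."}
)
no_response = row_to_messages({"instruction": "Say hi.", "context": "", "response": None})
print([m["role"] for m in with_context])
print([m["role"] for m in no_response])
```

Rows without a response produce no assistant turn, so they contribute nothing for the model to learn to generate; filtering them out before training is a reasonable option.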

## 6) Prepare Dataset for Training

In [None]:
# For fine-tuning we load a regular (non-streaming) dataset; Dolly-15k ships only a train split
dataset = load_dataset(
    "databricks/databricks-dolly-15k", split="train[:2%]"
)  # subset for speed


# Map into text field for SFTTrainer
def preprocess(example):
    return {"text": row_to_chat_text(example)}


dataset = dataset.map(preprocess, remove_columns=dataset.column_names)
print(dataset[0]["text"][:300])

## 7) Configure SFTTrainer

In [None]:
sft_config = SFTConfig(
    output_dir=finetune_name,
    num_train_epochs=1,
    per_device_train_batch_size=2,
    logging_steps=10,
    save_strategy="epoch",
    push_to_hub=False,
)

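# Note: argument names vary across TRL releases. Newer versions expect
# `processing_class=tokenizer` instead of `tokenizer=tokenizer`, and take
# `dataset_text_field` in SFTConfig rather than in SFTTrainer.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    args=sft_config,
)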
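
It helps to estimate how long the run will be before starting it. The sketch below is a rough calculation, assuming one training example per row (no packing) and that the final partial batch is kept; `optimizer_steps` is a hypothetical helper, and the subset size of 300 is an illustrative assumption, so check `len(dataset)` for the real number.

```python
import math


def optimizer_steps(num_examples, per_device_batch_size, num_epochs=1,
                    grad_accum_steps=1, num_devices=1):
    """Estimate the number of optimizer steps for a training run."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * num_epochs


# 300 is an assumed size for the 2% subset, used only for illustration
total = optimizer_steps(num_examples=300, per_device_batch_size=2, num_epochs=1)
print("optimizer steps:", total)
print("log lines at logging_steps=10:", total // 10)
```

The exact count reported by the trainer can differ slightly when gradient accumulation or multiple devices are involved, so treat this as an estimate.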

## 8) Run Fine-Tuning

In [None]:
trainer.train()

## 9) Compare Model Outputs (After Training)

In [None]:
prompt = "Write a haiku about programming"

messages = [{"role": "user", "content": prompt}]
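# Use the same generation prompt as in the baseline so the two outputs are comparable
formatted_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)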
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=60)

print("=== Fine-Tuned Model Output ===")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
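
Because training updates `model` in place, a before-and-after comparison needs the baseline text saved as a string before `trainer.train()` runs. The sketch below shows one way to present the two side by side; `compare_outputs` is a hypothetical helper, and the strings passed to it are placeholders rather than real model outputs.

```python
def compare_outputs(prompt, base_text, tuned_text, width=60):
    """Format two generations for the same prompt as one readable report."""
    rule = "-" * width
    return "\n".join([
        f"Prompt: {prompt}",
        rule,
        "Base model:",
        base_text.strip(),
        rule,
        "Fine-tuned model:",
        tuned_text.strip(),
    ])


report = compare_outputs(
    "Write a haiku about programming",
    base_text="<placeholder: text saved before training>",
    tuned_text="<placeholder: text generated after training>",
)
print(report)
```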


---

## ✅ Summary

In this tutorial you:

1. Ran inference with the **base model**.
2. Loaded & **streamed a dataset** from the Hugging Face Hub.
3. Converted rows into **chat-formatted training examples**.
4. Configured and ran **supervised fine-tuning with SFTTrainer**.
5. Compared outputs before & after training.

**Next steps:**
- Try bigger subsets (`train[:10%]`) or the full dataset for better results.
- Experiment with different datasets (e.g., `HuggingFaceTB/smoltalk`, `OpenAssistant/oasst1`).
- Push your fine-tuned model to the Hugging Face Hub for reuse!

Happy fine-tuning 🚀
