```{contents}
```

## Supervised Fine-Tuning (SFT)

**Supervised Fine-Tuning (SFT)** is the process of training a *pretrained* Large Language Model (LLM) on **labeled instruction–response pairs** so that it can learn to **follow instructions**, behave like an **assistant**, and perform **specific tasks** reliably.

SFT transforms a raw pretrained model (which only predicts the next token) into an **instruction-following model**.

---

### Why SFT Is Needed

A pretrained model:

* learns language patterns, grammar, facts
* predicts the next token in text
* does **not** understand instructions
* does **not** know how to respond like an assistant

Example of a base model:

```
User: Summarize this paragraph.
Model: Summarize this paragraph by saying that...
```

→ It *continues* the user text instead of answering.

SFT fixes this.

---

### What SFT Actually Does

During SFT, the model is trained on examples like:

```
Instruction: Translate to French
Input: How are you?
Output: Comment allez-vous ?
```

The model learns:

* how to interpret a user request
* how to generate the correct style of answer
* how to provide structured outputs
* how to follow the expected conversational format

---

### How SFT Works (Process)

#### **1. Prepare an instruction dataset**

Examples of:

* summarization
* classification
* translation
* question answering
* reasoning
* coding
* safe refusal examples

Each sample has:

```
instruction
input (optional)
response (target label)
```

#### **2. Convert to a chat template**

Such as:

```
### Instruction:
Summarize this text.
### Input:
Cats are mammals...
### Response:
Cats are mammals that...
```
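
The conversion step can be sketched as a small helper. This is a minimal illustration rather than a fixed standard: the function names `build_prompt` and `build_training_text` are hypothetical, and the `"</s>"` end-of-sequence string is an assumed placeholder (in practice it would come from the tokenizer). It omits the `### Input:` section when a sample has no input, matching the "input (optional)" field above.

```python
from typing import Optional


def build_prompt(instruction: str, input_text: Optional[str] = None) -> str:
    """Render one sample in the '### Instruction / ### Input / ### Response' layout."""
    parts = [f"### Instruction:\n{instruction.strip()}"]
    # The input field is optional, so its section is emitted only when present.
    if input_text and input_text.strip():
        parts.append(f"### Input:\n{input_text.strip()}")
    # The prompt ends with an open response header; the target text follows it.
    parts.append("### Response:\n")
    return "\n".join(parts)


def build_training_text(
    instruction: str,
    response: str,
    input_text: Optional[str] = None,
    eos: str = "</s>",  # assumed placeholder; use the tokenizer's EOS token in practice
) -> str:
    """Full training string: prompt, then target response, then an end marker."""
    return build_prompt(instruction, input_text) + response.strip() + eos


print(build_training_text("Summarize this text.", "Cats are mammals that...", "Cats are mammals..."))
```

At inference time only `build_prompt` is used, so the model sees exactly the prefix it was trained to complete.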

#### **3. Fine-tune the model**

Use **supervised learning** with cross-entropy loss:

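$$
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(y_t \mid x,\, y_{<t}\right)
$$

Here $x$ is the formatted prompt, $y_1, \dots, y_T$ are the target response tokens, and $\theta$ denotes the model parameters. Minimizing $\mathcal{L}$ trains the model to predict the correct response tokens.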

This adjusts the model’s behavior to match the dataset examples.
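
To make the cross-entropy objective concrete, here is a toy calculation in plain Python. The probabilities are made up for illustration, and the helper name `sft_loss` is hypothetical. The sketch assumes the loss is averaged over response tokens only, which is a common choice but not the only one.

```python
import math


def sft_loss(correct_token_probs, is_response_token):
    """Mean negative log-likelihood over the positions marked as response tokens."""
    losses = [
        -math.log(p)
        for p, keep in zip(correct_token_probs, is_response_token)
        if keep
    ]
    if not losses:
        raise ValueError("at least one response token is required")
    return sum(losses) / len(losses)


# Probability the model assigns to the correct token at each position (made-up values).
probs = [0.10, 0.20, 0.50, 0.25]
# The first two positions belong to the prompt, the last two to the response.
mask = [False, False, True, True]

print(round(sft_loss(probs, mask), 4))  # 1.0397
```

Raising the probability of the correct response tokens lowers this value, which is what gradient descent does during fine-tuning.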

#### **4. (Optional) Apply RLHF / DPO**

After SFT, preference optimization further improves:

* helpfulness
* safety
* correctness
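
For orientation, the DPO objective for a single preference pair can be sketched in a few lines. This is only an illustration: the function name `dpo_loss`, the log-probability values, and `beta=0.1` are assumptions chosen for the example, and a real implementation would work on batched tensors.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of response log-probabilities."""
    # How much more (or less) likely each response is under the tuned policy
    # than under the frozen reference model (usually the SFT model).
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    return math.log1p(math.exp(-logits))


# Policy identical to the reference: the loss equals log(2).
print(round(dpo_loss(-5.0, -7.0, -5.0, -7.0), 4))  # 0.6931
# Policy shifted toward the chosen response: the loss drops below log(2).
print(dpo_loss(-4.0, -8.0, -5.0, -7.0) < math.log(2))  # True
```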

---

### Techniques Used for SFT

| Method               | Description                      | Usage                            |
| -------------------- | -------------------------------- | -------------------------------- |
| **Full Fine-Tuning** | Update *all* model weights       | Highest quality, expensive       |
| **LoRA**             | Train only small adapter modules | Efficient, widely used           |
| **QLoRA**            | LoRA + 4-bit quantization        | Train large models on small GPUs |
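
A rough back-of-the-envelope calculation shows why 4-bit quantization matters for the last row. The 7-billion-parameter model size is a hypothetical example, and the estimate covers weights only: it ignores quantization constants, activations, gradients, and optimizer state.

```python
def weight_memory_gb(n_params: int, bits_per_param: float) -> float:
    """Approximate memory needed to store the weights alone, in gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 7_000_000_000  # hypothetical 7B-parameter model

for name, bits in [("fp32", 32), ("fp16/bf16", 16), ("4-bit", 4)]:
    print(f"{name:>9}: {weight_memory_gb(n_params, bits):.1f} GB")
# fp32: 28.0 GB, fp16/bf16: 14.0 GB, 4-bit: 3.5 GB
```

Full fine-tuning additionally needs gradients and optimizer state for every weight, whereas LoRA and QLoRA keep those only for the small adapter matrices.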

---

### What SFT Achieves

SFT creates a model that:

* follows human instructions
* responds in a helpful, conversational manner
* performs domain-specific tasks
* produces structured outputs (JSON, code)
* generalizes across tasks

Without SFT, LLMs would *not* behave like chat assistants.

---

**One-Sentence Summary**

**Supervised Fine-Tuning (SFT) trains a pretrained LLM on labeled instruction–response examples so it learns to follow instructions, solve tasks, and act like an assistant rather than a text-completion model.**



### **Intuition Behind SFT (Supervised Fine-Tuning)**

#### **Intuition 1 — A pretrained LLM is a “completion engine,” not a “helper.”**

A pretrained model learns **only one skill**:

> Predict the next token.

So if the user says:

```
Summarize this text:
```

the model thinks:

> “What token usually comes after ‘Summarize this text:’ in the training data?”

Often the continuation is **another instruction** or **irrelevant text**.

It has knowledge, but **no idea how to behave**.
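
The "completion engine" behavior can be illustrated with a toy next-token predictor. The sketch below is a bigram counter, not a neural LLM, and its three-line corpus is invented for illustration. It still shows the key point: given an instruction, it continues the text with whatever usually follows, instead of carrying the instruction out.

```python
from collections import Counter, defaultdict

# Tiny made-up "pretraining corpus".
corpus = [
    "summarize this text : and then translate it into french",
    "summarize this text : and then list the key points",
    "translate this text : and then translate the title",
]

# Count which token follows which (a bigram model).
counts = defaultdict(Counter)
for line in corpus:
    tokens = line.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1


def complete(prompt: str, max_new_tokens: int = 3) -> str:
    """Greedily append the most frequent follower of the last token."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break
        tokens.append(followers.most_common(1)[0][0])
    return " ".join(tokens)


print(complete("summarize this text :"))
# summarize this text : and then translate
```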

---

#### **Intuition 2 — SFT acts like showing the model “examples of good behavior.”**

During SFT, the model sees pairs like:

```
User: Explain how airplanes fly.
Assistant: Airplanes fly because...
```

After thousands of these examples, the model learns:

* how to respond
* how long to respond
* what tone to use
* what role it should play
* how instructions map to answers

It's similar to teaching a student by **showing solved examples**.

No guessing — just follow patterns.

---

#### **Intuition 3 — SFT unlocks abilities the model already has.**

Pretraining gives the model:

* facts
* reasoning ability
* language fluency

But it doesn’t know **when** to use each skill.

SFT tells the model:

> “When the user asks for a summary, use your knowledge of summarizing.
> When the user asks for translation, use your multilingual knowledge.”

SFT is like adding a **user manual** to a machine that already has powerful internal capabilities.

---

#### **Intuition 4 — The model learns the *shape* of a helpful response.**

SFT examples teach:

* Start directly answering the question
* Be concise when asked
* Use step-by-step reasoning when appropriate
* Wrap answers in JSON or code when required
* Follow conversational structure
* Stop at the right point

Before SFT, the model produces free-form continuation.
After SFT, it produces **task-specific, structured behavior**.

---

#### **Intuition 5 — SFT teaches the model to generalize across tasks.**

If the SFT dataset contains many different tasks:

* summarization
* reasoning
* translation
* coding
* chatting
* multi-turn conversations

The model learns a **single universal rule**:

> “Read the instruction → understand user intent → produce the kind of answer shown in examples.”

Even if a user asks something **never seen before**, the model generalizes the pattern.

---

#### **Intuition 6 — SFT changes the model’s *behavior*, not its *knowledge*.**

Pretraining = “What do I know?”
SFT = “How should I act when users talk to me?”

This is why SFT is comparatively cheap (typically a small fraction of the compute spent on pretraining) yet hugely impactful.

---

#### **Intuition 7 — SFT aligns the model with *human expectations*.**

Without SFT, the model:

* rambles
* ignores instructions
* continues text endlessly
* does not format answers
* does not stop

With SFT, the model:

* listens
* responds
* formats
* follows rules
* engages in a conversation

This makes the model feel like an **assistant**, not a text generator.

---

#### Simple Analogy

##### **Pretraining:**

You read thousands of books and learn everything about the world.

##### **SFT:**

Someone finally shows you **how to answer questions** based on what you already know.

---

**One-Sentence Intuition**

**SFT teaches an LLM how to behave like a helpful assistant by showing it thousands of examples of how instructions should be answered, turning raw knowledge into usable behavior.**



### Supervised Fine-Tuning (SFT) Demo

We will:

1. Create a small instruction dataset
2. Convert it into prompt–response format
3. Load a pretrained LLM in **4-bit**
4. Apply **QLoRA** adapters
5. Train with `SFTTrainer`
6. Generate text with the tuned model

This is a scaled-down version of a **real SFT pipeline**: the same steps apply to production models, with far larger datasets and models.

---

#### 1. Install dependencies

```bash
pip install transformers datasets peft accelerate bitsandbytes trl
```

---

#### 2. Create a tiny instruction dataset

```python
from datasets import Dataset

data = [
    {
        "instruction": "Summarize the text.",
        "input": "Large language models are trained using unsupervised learning.",
        "output": "Large language models learn patterns from unlabeled data."
    },
    {
        "instruction": "Translate to French.",
        "input": "How are you?",
        "output": "Comment allez-vous ?"
    },
    {
        "instruction": "Explain like I'm five.",
        "input": "What is gravity?",
        "output": "Gravity is a force that pulls things down, like when you drop a ball."
    }
]

dataset = Dataset.from_list(data)
dataset
```

---

#### 3. Convert dataset to training format (prompt + response)

```python
def format_example(example):
    prompt = f"""### Instruction:
{example['instruction']}

### Input:
{example['input']}

### Response:
"""
    return {
        "prompt": prompt,
        "response": example["output"]
    }

dataset = dataset.map(format_example)
dataset
```

---

#### 4. Load tokenizer + base LLM in 4-bit (QLoRA)

Here we use a small model, `facebook/opt-350m`, so the demo fits on a modest GPU. Note that 4-bit loading through `bitsandbytes` requires a CUDA-capable GPU.

```python
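import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_NAME = "facebook/opt-350m"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Fall back to EOS for padding only if the tokenizer defines no pad token.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# 4-bit NF4 quantization via bitsandbytes (requires a CUDA GPU).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
)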
```

---

#### 5. Apply LoRA adapters (Parameter-efficient SFT)

```python
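from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare the quantized model for training (for example, enables input
# gradients and keeps normalization layers in higher precision).
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                 # rank of the adapter matrices
    lora_alpha=32,                        # scaling factor for the adapter update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()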
```
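
The number reported by `print_trainable_parameters()` can be sanity-checked by hand. The sketch below assumes that OPT-350m has a hidden size of 1024 and 24 decoder layers; these values are not stated in this document, so verify them against `model.config`. Each adapted weight matrix gets two low-rank factors, `A` (`r × d_in`) and `B` (`d_out × r`).

```python
def lora_trainable_params(d_in: int, d_out: int, r: int, n_modules: int) -> int:
    """Parameters added by LoRA: r * (d_in + d_out) per adapted weight matrix."""
    return n_modules * r * (d_in + d_out)


hidden_size = 1024      # assumed OPT-350m hidden size
num_layers = 24         # assumed OPT-350m decoder layer count
targets_per_layer = 2   # q_proj and v_proj
rank = 16               # r from the LoraConfig above

n_modules = num_layers * targets_per_layer
trainable = lora_trainable_params(hidden_size, hidden_size, rank, n_modules)
frozen_in_targets = n_modules * hidden_size * hidden_size

print(f"LoRA parameters:        {trainable:,}")          # 1,572,864
print(f"Weights they adapt:     {frozen_in_targets:,}")  # 50,331,648
print(f"Adapter / adapted size: {trainable / frozen_in_targets:.3%}")  # 3.125%
```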

---

#### 6. Tokenize prompts for supervised training

```python
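def tokenize(example):
    # Append EOS so the model learns where a response ends.
    full_text = example["prompt"] + example["response"] + tokenizer.eos_token
    return tokenizer(full_text, truncation=True, max_length=512)

# Keep only the tokenized fields; the raw text columns are no longer needed.
tokenized_dataset = dataset.map(tokenize, remove_columns=dataset.column_names)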
```
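
The tokenization above leaves label construction to the trainer, which by default typically computes the loss over every token, prompt included. A common refinement is to compute the loss on response tokens only by masking prompt positions in the labels. The sketch below shows the idea with a toy whitespace "tokenizer". The names `toy_encode` and `build_labels` are hypothetical; `-100` is the index that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss

vocab = {}


def toy_encode(text):
    """Toy tokenizer: one id per whitespace-separated token (illustration only)."""
    return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]


def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response; mask the prompt part of the labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return {"input_ids": input_ids, "labels": labels}


prompt_ids = toy_encode("### Instruction: Translate to French. ### Response:")
response_ids = toy_encode("Comment allez-vous ? </s>")
example = build_labels(prompt_ids, response_ids)

print(example["input_ids"])
print(example["labels"])
```

With labels built this way, the model is never penalized for failing to "predict" the user's prompt, only for errors in the response.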

---

#### 7. Train using TRL’s SFTTrainer (best practice)

```python
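from trl import SFTTrainer
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./sft_model",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    max_steps=30,  # a positive max_steps overrides num_train_epochs
    fp16=True,
    logging_steps=1,
)

# Argument names follow older TRL releases. Check your installed version:
# newer releases rename `tokenizer` to `processing_class` and move
# `max_seq_length` into `SFTConfig`.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=tokenized_dataset,  # already tokenized, so no text field is needed
    max_seq_length=512,
    args=training_args,
)

trainer.train()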
```

This updates **only LoRA parameters**, making tuning extremely fast and memory-efficient.
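
It helps to work out what these training arguments mean for a three-example dataset. The calculation below is approximate: it assumes a single GPU and that every micro-batch is full, which is not quite true when the dataset size is not a multiple of the batch size. The helper names are hypothetical.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, n_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * n_devices


def approx_epochs(max_steps: int, effective_batch: int, n_examples: int) -> float:
    """Rough number of passes over the dataset, assuming full batches."""
    return max_steps * effective_batch / n_examples


eff = effective_batch_size(per_device_batch=2, grad_accum_steps=2)
print(eff)                          # 4
print(approx_epochs(30, eff, 3))    # 40.0
```

Seeing each example dozens of times means the demo model will largely memorize the three samples. That is fine for checking that the pipeline runs, but a real SFT run uses thousands of examples and far fewer passes.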

---

#### 8. Save the tuned model

```python
trainer.model.save_pretrained("sft_lora_model")
tokenizer.save_pretrained("sft_lora_model")
```

---

#### 9. Test the Instruction-Tuned Model

```python
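from peft import AutoPeftModelForCausalLM
from transformers import pipeline

# "sft_lora_model" holds only the LoRA adapter, so load it on top of the base
# model and merge the adapter weights for plain inference.
tuned_model = AutoPeftModelForCausalLM.from_pretrained("sft_lora_model")
tuned_model = tuned_model.merge_and_unload()

pipe = pipeline(
    "text-generation",
    model=tuned_model,
    tokenizer=tokenizer,
    max_new_tokens=100
)

# Use the same template as in training so the model sees a familiar prefix.
prompt = """### Instruction:
Explain like I'm five.

### Input:
Why is the sky blue?

### Response:
"""

print(pipe(prompt)[0]["generated_text"])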
```

The model should now respond **in the format of the SFT examples**. With only three training samples and a 350M-parameter model, expect the answer format to improve more than the answer quality.

---

#### What This Demo Shows

* How to **create an instruction dataset**
* How to **prepare prompts**
* How to **load a pretrained model in 4-bit**
* How to apply **QLoRA adapters**
* How to run **SFT using TRL’s SFTTrainer**
* How to **test** the instruction-following ability

The **same basic recipe**, at far larger scale and often followed by preference optimization, underlies instruction-tuned models such as:

* LLaMA-2-Chat
* Mistral-Instruct
* Falcon-Instruct
* InstructGPT