```{contents}
```

## Supervised Fine-Tuning (SFT)

**Supervised Fine-Tuning (SFT)** is the process of training a *pretrained* Large Language Model (LLM) on **labeled instruction–response pairs** so that it can learn to **follow instructions**, behave like an **assistant**, and perform **specific tasks** reliably.

SFT transforms a raw pretrained model (which only predicts the next token) into an **instruction-following model**.

---

### Why SFT Is Needed

A pretrained model:

* learns language patterns, grammar, facts
* predicts the next token in text
* is **not** trained to treat a prompt as an instruction to carry out
* does **not** reliably respond in the format of an assistant

Example of how a base model may respond:

```
User: Summarize this paragraph.
Model: Summarize this paragraph by saying that...
```

→ It *continues* the user text instead of answering.

SFT fixes this.

---

### What SFT Actually Does

During SFT, the model is trained on examples like:

```
Instruction: Translate to French
Input: How are you?
Output: Comment allez-vous ?
```

The model learns:

* how to interpret a user request
* how to generate the correct style of answer
* how to provide structured outputs
* how to follow the expected conversational format

---

### How SFT Works (Process)

#### **1. Prepare an instruction dataset**

Examples of:

* summarization
* classification
* translation
* question answering
* reasoning
* coding
* safe refusal examples

Each sample has:

```
instruction
input (optional)
response (target label)
```
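As a rough illustration, the three fields above could be held in a small Python record. The class name `SFTExample` and the last sample are hypothetical; the first two samples reuse the examples from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SFTExample:
    """One supervised sample; field names mirror the list above."""
    instruction: str
    response: str                # target text the model is trained to produce
    input: Optional[str] = None  # optional extra context for the instruction


examples = [
    SFTExample(
        instruction="Translate to French",
        input="How are you?",
        response="Comment allez-vous ?",
    ),
    SFTExample(
        instruction="Summarize this text.",
        input="Cats are mammals...",
        response="Cats are mammals that...",
    ),
]

# Hypothetical sample with no extra context: `input` is simply left unset.
no_input = SFTExample(instruction="Name a primary color.", response="Red.")
```

Keeping `input` optional lets the same record type cover both instruction-only and instruction-plus-context samples.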

#### **2. Convert to a prompt template**

Each sample is rendered into one fixed text layout, such as:

```
### Instruction:
Summarize this text.
### Input:
Cats are mammals...
### Response:
Cats are mammals that...
```
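The rendering step can be sketched as a plain string-building function. This is a minimal sketch that assumes the `### Instruction / ### Input / ### Response` layout shown above; the function name `format_example` is hypothetical, and real projects often use the template that ships with their tokenizer instead.

```python
def format_example(instruction, response, input_text=None):
    """Render one sample in the '### Instruction / Input / Response' layout."""
    parts = ["### Instruction:", instruction]
    if input_text:
        # The input section is only emitted when the sample has extra context.
        parts += ["### Input:", input_text]
    parts += ["### Response:", response]
    return "\n".join(parts)


text = format_example(
    instruction="Summarize this text.",
    response="Cats are mammals that...",
    input_text="Cats are mammals...",
)
print(text)
```

Whatever layout is chosen, the same one should be used at inference time, since the model learns to answer after the `### Response:` marker.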

#### **3. Fine-tune the model**

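Use **supervised learning** with cross-entropy loss over the response tokens:

$$
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(y_t \mid x, y_{<t}\right)
$$

Here $x$ is the formatted prompt, $y_1, \dots, y_T$ are the response tokens, and $p_\theta$ is the model's next-token distribution. Minimizing $\mathcal{L}$ trains the model to predict the correct response tokens.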

This adjusts the model’s behavior to match the dataset examples.
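To make the loss concrete, here is a small pure-Python sketch of token-level cross-entropy. It assumes, as is common but not stated above, that prompt tokens are masked out so only response tokens contribute, and that the loss is averaged over those tokens. The function names and the toy numbers are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sft_loss(logits_per_step, target_ids, loss_mask):
    """Mean negative log-likelihood over positions where loss_mask is 1.

    logits_per_step[t]: scores over the vocabulary for predicting target_ids[t]
    target_ids[t]:      the token that actually comes next
    loss_mask[t]:       1 for response tokens, 0 for prompt tokens (assumption)
    """
    total, count = 0.0, 0
    for logits, target, keep in zip(logits_per_step, target_ids, loss_mask):
        if keep:
            total += -math.log(softmax(logits)[target])
            count += 1
    return total / count if count else 0.0


# Toy vocabulary of 3 tokens; uniform logits give probability 1/3 per token.
logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
targets = [2, 0, 1]
mask = [0, 1, 1]  # the first position is a prompt token, so it is ignored
loss = sft_loss(logits, targets, mask)
print(loss)
```

With uniform predictions the loss equals `math.log(3)`; training lowers it by shifting probability toward the target response tokens.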

#### **4. (Optional) Apply RLHF / DPO**

After SFT, preference optimization can further improve:

* helpfulness
* safety
* correctness

---

### Techniques Used for SFT

| Method               | Description                                                        | Usage                                                  |
| -------------------- | ------------------------------------------------------------------ | ------------------------------------------------------ |
| **Full Fine-Tuning** | Update *all* model weights                                         | Often the highest quality, but the most expensive      |
| **LoRA**             | Freeze the base weights and train only small low-rank adapters     | Efficient, widely used                                 |
| **QLoRA**            | LoRA on top of a base model quantized to 4 bits                    | Train large models on small GPUs                       |
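A quick back-of-the-envelope calculation shows why adapters are cheap to train. The sketch assumes the standard LoRA formulation, where the update to a `d_out x d_in` weight matrix is the product of two small matrices `B` (`d_out x rank`) and `A` (`rank x d_in`); the layer size and rank below are hypothetical.

```python
def full_params(d_in, d_out):
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Trainable weights for one adapter: A (rank x d_in) plus B (d_out x rank)."""
    return rank * d_in + d_out * rank


d_in = d_out = 4096  # hypothetical layer size
rank = 8             # hypothetical adapter rank

full = full_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
print(full, lora, lora / full)
```

For this layer the adapter trains 65,536 weights instead of 16,777,216, which is about 0.4% of the full matrix. QLoRA keeps the same adapter count and additionally stores the frozen base weights in 4 bits to reduce memory.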

---

### What SFT Achieves

SFT creates a model that:

* follows human instructions
* responds in a helpful, conversational manner
* performs domain-specific tasks
* produces structured outputs (JSON, code)
* generalizes across tasks

Without SFT or a comparable instruction-tuning stage, a pretrained LLM generally does *not* behave like a chat assistant.

---

**One-Sentence Summary**

**Supervised Fine-Tuning (SFT) trains a pretrained LLM on labeled instruction–response examples so it learns to follow instructions, solve tasks, and act like an assistant rather than a text-completion model.**