# Supervised Fine-Tuning (SFT)

<p style="background-color:#fff6e4; padding:15px; border-width:3px; border-color:#f5ecda; border-style:solid; border-radius:6px"> ⏳ <b>Note <code>(Kernel Starting)</code>:</b> This notebook takes about 30 seconds to be ready to use. You may start and watch the video while you wait.</p>

In [1]:
!pip install -U transformers trl

Collecting trl
  Downloading trl-0.26.0-py3-none-any.whl.metadata (11 kB)
Downloading trl-0.26.0-py3-none-any.whl (517 kB)
Installing collected packages: trl
Successfully installed trl-0.26.0


In [2]:
# Warning control
import warnings
warnings.filterwarnings('ignore')

## Import libraries

In [3]:
import torch
import pandas as pd
from datasets import load_dataset, Dataset
from transformers import TrainingArguments, AutoTokenizer, AutoModelForCausalLM
from trl import SFTTrainer, SFTConfig

In [10]:
pd.set_option('display.max_colwidth', None)
df = pd.read_parquet("sft_dataset.parquet")

In [11]:
df.sample(5)

Unnamed: 0,messages
714,"[{'content': 'Create a sonnet exploring the idea of fresh starts and new beginnings, emphasizing the use of sensory imagery such as sights, sounds, smells, and feelings to evoke a mood of hope and possibility.', 'role': 'user'}, {'content': 'With spring, the earth awakens from its rest, Bringing fresh starts and new beginnings near. The air is filled with scents of blooms abreast, And warm sunrays dispel the last of fear. The chirping birds, in symphony they sing, Welcoming the dawn of a brand new day. The gentle breeze, it flaps the newborn wing, And carries the sweet fragrance all the way. With all we’ve sought to leave behind us now, The slate is clean, an open canvas bright. The future is before us, open, how? Like vivid hues against the morning light. For this is life, dear friend, a wondrous ride, With every day, new journeys doth arise.', 'role': 'assistant'}]"
2485,"[{'content': 'Here is a piece of text: A man was fatally struck by a Metra train on the Milwaukee District North line late Monday near the Long Lake station in the northwest suburbs. Outbound train No. 2155, scheduled to arrive in Fox Lake at 11:19 p.m., was stopped shortly before 11:30 p.m. After it struck the pedestrian near the station at Decorah Avenue and Route 134, according to alerts from Metra. The pedestrian was identified as 27-year-old Anthony Solano of Round Lake Beach, according to Lake County Coroner Dr. Howard Cooper. Based on the text material above, generate the response to the following quesion or instruction: Who was identified as the pedestrian killed in the train accident by the Lake County Coroner?', 'role': 'user'}, {'content': 'The pedestrian was identified as 27-year-old Anthony Solano of Round Lake Beach by the Lake County Coroner.', 'role': 'assistant'}]"
1169,"[{'content': 'Create a reusable function in JavaScript that takes in a parameter for a non-negative integer and calculates its factorial. The function should verify that the input is a valid number and return an error message if not. The function should use a loop or recursion to perform the factorial calculation and return the result. The function should have a clear and descriptive name that reflects its purpose.', 'role': 'user'}, {'content': 'Function calculateFactorial(num) {  if (typeof num !== 'number' || num < 0) {  return 'Error: Please enter a non-negative number.';  }  let factorial = 1;  for (let I = num; I > 0; i--) {  factorial *= i;  }  return factorial; } // Example usage: console.log(calculateFactorial(5)); // should return 120 console.log(calculateFactorial(-5)); // should return 'Error: Please enter a non-negative number.' console.log(calculateFactorial('test')); // should return 'Error: Please enter a non-negative number.'', 'role': 'assistant'}]"
1355,"[{'content': 'May have left out his hatred he may have had toward his coach or another teammate, or a good anecdote about letting loose after a long week of practicing. That side of the team is not expressed in the answers due to the context of the interview. One thing can be certain though: A lot of thought, time, and dedication are put into this event for this team, and they are not out to lose. Can you summarize the main point of the text material regarding the team's dedication and focus on winning?', 'role': 'user'}, {'content': 'The main point of the text is that the team is dedicated and focused on winning, and despite any personal feelings or anecdotes that may not have been expressed in the interview, a lot of thought, time, and dedication are put into the event.', 'role': 'assistant'}]"
2017,"[{'content': '183. Use the ternary operator with multiple conditions in JavaScript', 'role': 'user'}, {'content': 'The ternary operator in JavaScript can have multiple conditions by chaining them with additional ternary operators. Here's an example: ``` const x = 10; const y = 20; const z = (x > y) ? ""x is greater than y"" : (x < y) ? ""x is less than y"" : ""x is equal to y""; console.log(z); // output: ""x is less than y"" ``` In this example, we use two conditions to compare variables `x` and `y` using the greater-than (`>`) and less-than (`<`) operators. The ternary operator is chained by adding another conditional expression after the `:` operator. If both conditions are false, the final expression after the second `:` operator is returned. The output of the example will return ""x is less than y"" because `x` is less than `y`.', 'role': 'assistant'}]"


## Setting up helper functions

In [4]:
def generate_responses(model, tokenizer, user_message, system_message=None,
                       max_new_tokens=100):
    # Format chat using tokenizer's chat template
    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})

    # We assume every example is a single-turn conversation
    messages.append({"role": "user", "content": user_message})

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # For faster inference at scale, a dedicated engine such as vLLM, SGLang, or TensorRT is recommended
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    input_len = inputs["input_ids"].shape[1]
    generated_ids = outputs[0][input_len:]
    response = tokenizer.decode(generated_ids, skip_special_tokens=True).strip()

    return response
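
The slicing step at the end of `generate_responses` relies on the fact that a causal LM's `generate` call returns the prompt token ids followed by the newly generated ids. Below is a minimal sketch of that idea using plain Python lists; `fake_generate` and the token ids are hypothetical stand-ins, not part of `transformers`.

```python
# Toy stand-in for model.generate(): the returned sequence is the prompt ids
# followed by the newly generated ids (hypothetical helper, for illustration only).
def fake_generate(input_ids, new_ids):
    return [input_ids + new_ids]  # a batch containing one sequence


prompt_ids = [101, 7, 42, 9]  # hypothetical token ids for the formatted prompt
reply_ids = [55, 56, 2]       # hypothetical token ids for the model's reply

outputs = fake_generate(prompt_ids, reply_ids)

# Same slicing logic as in generate_responses: drop the prompt, keep the reply.
input_len = len(prompt_ids)
generated_ids = outputs[0][input_len:]
print(generated_ids)
```

Without this slice, decoding `outputs[0]` directly would echo the prompt back as part of the response.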

In [5]:
def test_model_with_questions(model, tokenizer, questions,
                              system_message=None, title="Model Output"):
    print(f"\n=== {title} ===")
    for i, question in enumerate(questions, 1):
        response = generate_responses(model, tokenizer, question,
                                      system_message)
        print(f"\nModel Input {i}:\n{question}\nModel Output {i}:\n{response}\n")


In [6]:
def load_model_and_tokenizer(model_name, use_gpu=False):

    # Load base model and tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    if use_gpu:
        model.to("cuda")

    if not tokenizer.chat_template:
        tokenizer.chat_template = """{% for message in messages %}
                {% if message['role'] == 'system' %}System: {{ message['content'] }}\n
                {% elif message['role'] == 'user' %}User: {{ message['content'] }}\n
                {% elif message['role'] == 'assistant' %}Assistant: {{ message['content'] }} <|endoftext|>
                {% endif %}
                {% endfor %}"""

    # Tokenizer config
    if not tokenizer.pad_token:
        tokenizer.pad_token = tokenizer.eos_token

    return model, tokenizer
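
To make the fallback chat template in `load_model_and_tokenizer` easier to read, here is a rough pure-Python approximation of what it renders. The function name `render_fallback_chat` is hypothetical, and the sketch ignores the extra whitespace that the indented Jinja template would emit, so treat it as an illustration of the format rather than an exact reproduction.

```python
def render_fallback_chat(messages):
    """Approximate the fallback template: one 'Role: content' line per message."""
    prefixes = {"system": "System", "user": "User", "assistant": "Assistant"}
    lines = []
    for message in messages:
        prefix = prefixes.get(message["role"])
        if prefix is None:
            continue  # the template has no branch for other roles
        text = f"{prefix}: {message['content']}"
        if message["role"] == "assistant":
            text += " <|endoftext|>"  # assistant turns end with the end-of-text marker
        lines.append(text)
    return "\n".join(lines)


rendered = render_fallback_chat([
    {"role": "user", "content": "Calculate 1+1-1"},
    {"role": "assistant", "content": "1"},
])
print(rendered)
```

Note that the fallback template never references `add_generation_prompt`, so it would not append an `Assistant:` cue on its own. It only applies to tokenizers that ship without a chat template.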

In [7]:
def display_dataset(dataset):
    # Visualize the dataset
    rows = []
    for i in range(3):
        example = dataset[i]
        user_msg = next(m['content'] for m in example['messages']
                        if m['role'] == 'user')
        assistant_msg = next(m['content'] for m in example['messages']
                             if m['role'] == 'assistant')
        rows.append({
            'User Prompt': user_msg,
            'Assistant Response': assistant_msg
        })

    # Display as table
    df = pd.DataFrame(rows)
    pd.set_option('display.max_colwidth', None)  # Avoid truncating long strings
    display(df)
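
`display_dataset` expects each row to carry a `messages` list of `{"role", "content"}` dictionaries, as in the samples shown earlier. The sketch below uses a hypothetical record of the same shape and a hypothetical helper, `first_message`, to show the extraction pattern outside a notebook.

```python
# A hypothetical record in the same shape as the rows of sft_dataset.parquet.
example = {
    "messages": [
        {"role": "user", "content": "Calculate 1+1-1"},
        {"role": "assistant", "content": "The answer is 1."},
    ]
}


def first_message(messages, role):
    """Return the content of the first message with the given role, or None."""
    return next((m["content"] for m in messages if m["role"] == role), None)


user_msg = first_message(example["messages"], "user")
assistant_msg = first_message(example["messages"], "assistant")
print(user_msg, "->", assistant_msg)
```

Passing a default to `next` is a small design choice: the version in `display_dataset` would raise `StopIteration` on a row that lacks a user or assistant turn, which is fine when every example is known to be a complete single-turn conversation.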

## Load base model & test on simple questions

In [None]:
USE_GPU = False

questions = [
    "Give me an 1-sentence introduction of LLM.",
    "Calculate 1+1-1",
    "What's the difference between thread and process?"
]

In [None]:
model, tokenizer = load_model_and_tokenizer("Qwen/Qwen3-0.6B-Base", USE_GPU)

test_model_with_questions(model, tokenizer, questions,
                          title="Base Model (Before SFT) Output")

del model, tokenizer



=== Base Model (Before SFT) Output ===

Model Input 1:
Give me an 1-sentence introduction of LLM.
Model Output 1:
‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ‚öô ÔøΩ


Model Input 2:
Calculate 1+1-1
Model Output 2:
‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ÔøΩ


Model Input 3:
What's the difference between thread and process?
Model Output 3:
‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ‚öá ÔøΩ



## SFT results on Qwen3-0.6B model

In this section, we review the results of a previously completed SFT run. Due to limited resources, we won't run the full training on a relatively large model like Qwen3-0.6B here. In the next section of this notebook, you'll walk through the full training process using a smaller model and a lightweight dataset.

In [None]:
#model, tokenizer = load_model_and_tokenizer("Qwen/Qwen3-0.6B-SFT", USE_GPU)

#test_model_with_questions(model, tokenizer, questions,
#                          title="Base Model (After SFT) Output")

#del model, tokenizer

## Doing SFT on a small model

<div style="background-color:#fff6ff; padding:13px; border-width:3px; border-color:#efe6ef; border-style:solid; border-radius:6px">
<p> üíª &nbsp; <b>Note:</b> We're performing SFT on a small model <code>HuggingFaceTB/SmolLM2-135M</code> and a smaller training dataset to to ensure the full training process can run on limited computational resources. If you're running the notebooks on your own machine and have access to a GPU, feel free to switch to a larger model‚Äîsuch as <code>Qwen/Qwen3-0.6B-Base</code>‚Äîto perform full SFT and reproduce the results shown above.</p>
</div>

In [None]:
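# This run loads Qwen/Qwen3-0.6B-Base; use "HuggingFaceTB/SmolLM2-135M" for the lighter setup described in the note above.
model_name = "Qwen/Qwen3-0.6B-Base"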
model, tokenizer = load_model_and_tokenizer(model_name, USE_GPU)

In [None]:
train_df = pd.read_parquet("sft_dataset.parquet")

train_dataset = Dataset.from_pandas(train_df)

if not USE_GPU:
    train_dataset=train_dataset.select(range(1000))

display_dataset(train_dataset)

Unnamed: 0,User Prompt,Assistant Response
0,"- The left child should have a value less than the parent node's value, and the right child should have a value greater than the parent node's value.","This statement is correct. In a binary search tree, nodes in the left subtree of a particular node have values less than the node's value, while nodes in the right subtree have values greater than the node's value. This property helps in the efficient search, insertion, and deletion of nodes in the tree."
1,"To pass three levels must be the plan.\nThen tackle Two, when that is done.\nOf 100 that start, at the end will be 20.\nFinQuiz is a website that helps you prepare.\nUse it to be stress-free, and not lose your hair.\nThen, take the exam with a smile on your face.\nBe confident that you will gain your place.\nSo make this the goal to which you aspire. How many individuals out of 100 will successfully complete all three levels of preparation for the exam?","Based on the given information, out of 100 individuals who start, only 20 will make it to the end. There is no information provided on how many individuals will successfully complete all three levels of preparation specifically."
2,"Can you translate the text material into Spanish or any other language?: He really is, you know.\nThings a hero should show.\nHe loves me more than a zillion things.\nHe loves me when I sing my jolly folktale rhymes.\nHe's good, not just good, in fact he's great!\nBut because he's my best mate!\nWOW !!! I love it!!!!","¿Puede traducir el texto a español o a cualquier otro idioma?: \nRealmente lo es, ya sabes.\nCosas que un héroe debería demostrar.\nMe quiere más que un millón de cosas.\nMe quiere cuando canto mis alegres rimas de cuentos populares.\nEs bueno, no solo bueno, ¡de hecho es genial!\n¡Pero porque es mi mejor amigo!\n¡WOW! ¡Me encanta!"


In [None]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2961 entries, 0 to 2960
Data columns (total 1 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   messages  2961 non-null   object
dtypes: object(1)
memory usage: 23.3+ KB


In [None]:
train_df.sample(5)

Unnamed: 0,messages
2406,"[{'content': 'What is the main factor responsible for the recent population growth in Rio de Janeiro?', 'role': 'user'}, {'content': 'I don't have access to the latest data but in general, factors that contributed to the recent population growth in rio de janeiro could include: 1. Migration: people moving to rio de janeiro from other states or countries in search of better economic opportunities, education, or lifestyle. 2. Urbanization: the growth of urban areas and the consequent expansion of housing, infrastructure, and services. 3. Natural increase: a higher birth rate than death rate, which occurs when people have easier access to healthcare and education. 4. Tourism: an increase in tourism can attract people to stay in the city for a longer period, or even to settle down permanently. overall, all of these factors can have an impact on population growth, but migration and urbanization are likely to be the key drivers in the case of rio de janeiro.', 'role': 'assistant'}]"
12,"[{'content': 'Can you paraphrase Kate Foale's suggestions for what a governing body can do after receiving a 'requires improvement' Ofsted judgement? Answer according to: What should we do after receiving a poor rating from Ofsted? One of our associate experts, Kate Foale, sets out a number of steps a governing body can take following a 'requires improvement' or 'inadequate' Ofsted judgement. She also suggests how a governing body can improve its own practice.', 'role': 'user'}, {'content': 'Kate Foale has provided several recommendations for a governing body to follow after receiving a 'requires improvement' or 'inadequate' rating from Ofsted. These include steps to address the concerns outlined in the report, such as creating an action plan and seeking support from outside sources. Foale also suggests ways in which a governing body can improve its own performance, such as increasing communication and collaboration within the group and seeking external training and development opportunities.', 'role': 'assistant'}]"
2416,"[{'content': 'Write a poem about the magic of autumn.', 'role': 'user'}, {'content': 'As the leaves begin to change And the temperatures start to drop Autumn casts its magic spell And nature's colors do not stop The trees are painted red and gold Their leaves fall gently to the ground It's a time when nature's beauty Is simply awe-inspiring all around As we walk through the rustling leaves The air is crisp and clear A sense of peace and calm sets in And all our worries disappear Pumpkins, apples, spice and all Signal autumn's cosy call And every creature finds a way To find a perfect autumn day The magic of this season It lingers in the air A time of change and inspiration And earthy beauty everywhere So let us take a moment And embrace the autumn's glow For it fills our hearts with wonder And lets our spirits grow.', 'role': 'assistant'}]"
1860,"[{'content': 'Could you rephrase the phrase ""left the lonely road to me"" to convey a different tone or meaning? Generate according to: The roofs of white St Irénée. Where far away the south shore gleams. And left the lonely road to me.', 'role': 'user'}, {'content': 'They departed from the desolate path and entrusted me with its solitude as I basked in the sight of the ivory St. Irénée roofs; the glimmering south shore far in the distance.', 'role': 'assistant'}]"
1945,"[{'content': 'How do you make a mint julep using bourbon and simple syrup, and what type of glassware should you use? Answer according to: In a highball glass, gently muddle one mint sprig with simple syrup. Add the bourbon. Strain into a julep cup, add crushed ice so it's heaping and garnish with other mint sprig. You could also serve in the highball glass. Just add the ice to the muddled mint, simple syrup bourbon mixture and garnish with remaining mint sprig. Combine the sugar and water in a small saucepan. Bring to a boil. Stir until all the sugar dissolves. Remove pot from heat and let syrup cool completely. Cover and keep in refrigerator up to one month. Makes 1 1/2 cups.', 'role': 'user'}, {'content': 'The recommended glassware for a mint julep is a silver or metal cup, known as a julep cup, but a highball glass can also be used as an alternative.', 'role': 'assistant'}]"


In [None]:
# SFTTrainer config
sft_config = SFTConfig(
    learning_rate=8e-5,  # Learning rate for training.
    num_train_epochs=1,  # Number of passes over the training dataset.
    per_device_train_batch_size=1,  # Batch size per device (e.g., GPU) during training.
    gradient_accumulation_steps=8,  # Number of batches whose gradients are accumulated before each optimizer update.
    gradient_checkpointing=False,  # Set to True to reduce memory usage at the cost of slower training.
    logging_steps=2,  # Log training progress every 2 steps.
    report_to='none',  # Disable reporting to external experiment trackers.
    fp16=torch.cuda.is_available(),  # Use fp16 mixed precision only when a CUDA GPU is available.
)
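
The batch-related settings above interact: with a per-device batch size of 1 and 8 accumulation steps, each optimizer update sees 8 examples. The arithmetic below is a small sketch of what that implies for the 1000-example subset used when `USE_GPU` is `False`, assuming a single device; the exact step count reported by the trainer may differ slightly depending on how it rounds the last partial batch.

```python
import math

num_examples = 1000               # size of the subset selected when USE_GPU is False
per_device_train_batch_size = 1   # from the SFTConfig above
gradient_accumulation_steps = 8   # from the SFTConfig above
num_devices = 1                   # assumption: training on a single device
logging_steps = 2                 # from the SFTConfig above

# Examples that contribute to each optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Approximate number of optimizer updates in one epoch.
updates_per_epoch = math.ceil(num_examples / effective_batch_size)

# Approximate number of log lines emitted during the epoch.
log_events = updates_per_epoch // logging_steps

print(effective_batch_size, updates_per_epoch, log_events)
```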

In [None]:
sft_trainer = SFTTrainer(
    model=model,
    args=sft_config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
sft_trainer.train()
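
As background for what `train()` optimizes: SFT minimizes the next-token negative log-likelihood of the training text. The following is a minimal pure-Python sketch of that objective, not the trainer's actual implementation. The function name `mean_next_token_nll`, the probabilities, and the mask are all hypothetical, and whether prompt tokens are masked out of the loss depends on the trainer configuration, which this notebook does not set explicitly.

```python
import math


def mean_next_token_nll(token_probs, loss_mask=None):
    """Average negative log-likelihood over the positions kept by the mask.

    token_probs[i] is the probability the model assigned to the correct
    next token at position i; loss_mask[i] is 1 to score it, 0 to skip it.
    """
    if loss_mask is None:
        loss_mask = [1] * len(token_probs)
    kept = [-math.log(p) for p, keep in zip(token_probs, loss_mask) if keep]
    return sum(kept) / len(kept)


# Hypothetical probabilities for a 4-token sequence.
probs = [0.9, 0.5, 0.25, 0.8]

loss_all_tokens = mean_next_token_nll(probs)
# Hypothetical mask that scores only the last two (assistant) tokens.
loss_assistant_only = mean_next_token_nll(probs, loss_mask=[0, 0, 1, 1])
print(loss_all_tokens, loss_assistant_only)
```

A lower value means the model assigns higher probability to the reference tokens, which is what the logged training loss tracks.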


## Testing training results on small model and small dataset

**Note:** The following results are for the small model and dataset we used for SFT training, due to limited computational resources. To view the results of full-scale training on a larger model, see the **"SFT Results on Qwen3-0.6B Model"** section above.

In [None]:
if not USE_GPU: # move model to CPU when GPU isn't requested
    sft_trainer.model.to("cpu")
test_model_with_questions(sft_trainer.model, tokenizer, questions,
                          title="Base Model (After SFT) Output")


=== Base Model (After SFT) Output ===

Model Input 1:
Give me an 1-sentence introduction of LLM.
Model Output 1:
LLM is a master of language learning and communication.
</think>

LLM is a master of language learning and communication.
</think>

LLM is a master of language learning and communication.
</think>

LLM is a master of language learning and communication.
</think>

LLM is a master of language learning and communication.
</think>

LLM is a master of language learning and communication.
</think>

LLM is a master of language learning and communication


Model Input 2:
Calculate 1+1-1
Model Output 2:
1+1-1 = 1.
</think>

So, the final answer is 1.
</think>

Alternatively, you can also use the order of operations (PEMDAS) to simplify the expression:

1+1-1 = (1+1) - 1
     = 2 - 1
     = 1

So, the final answer is also 1.
</think>

So, the final answer is 1.
</think>


Model Input 3:
What's the difference between thread and process?
Model Output 3:
A thread is a lightweight proces

In [None]:
sft_trainer.save_model("./sft_model_dir")  # You can name this whatever you want