# Hands-On Exercises: Fine-Tuning SmolLM3

Welcome to the practical section! Here you'll apply everything you've learned about chat templates and supervised fine-tuning using SmolLM3. These exercises progress from basic concepts to advanced techniques, giving you real-world experience with instruction tuning.


## Learning Objectives

By completing these exercises, you will:
- Master SmolLM3's chat template system
- Fine-tune SmolLM3 on real datasets using both Python APIs and CLI tools
- Work with the SmolTalk2 dataset that was used to train the original model
- Compare base model vs fine-tuned model performance
- Deploy your models to Hugging Face Hub
- Understand production workflows for scaling fine-tuning

---

## Exercise 1: Exploring SmolLM3's Chat Templates

**Objective**: Understand how SmolLM3 handles different conversation formats and reasoning modes.

SmolLM3 is a hybrid reasoning model which can follow instructions or generated tokens that 'reason' on a complex problem. When post-trained effectively, the model will reason on hard problems and generate direct responses on easy problems.

### Environment Setup

Let's start by setting up our environment.


In [2]:
# Install required packages (run in Colab or your environment)
!pip install -qqq "transformers>=4.55.0" "trl>=0.22.1" "datasets" "torch"
!pip install -qqq "accelerate" "peft" "trackio" "huggingface_hub"

In [3]:
# Import necessary libraries
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

if torch.cuda.is_available():
    device = "cuda"
    print(f"Using CUDA GPU: {torch.cuda.get_device_name()}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f}GB")
elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
    device = "mps"
    print("Using Apple MPS")
else:
    device = "cpu"
    print("Using CPU - you will need to use a GPU to train models")

# Authenticate with Hugging Face (optional, for private models)
from huggingface_hub import login
# login()  # Uncomment if you need to access private models


Using CUDA GPU: Tesla T4
GPU memory: 15.8GB


### Load SmolLM3 Models

Now let's load the base and instruct models for comparison.


In [4]:
# Load both base and instruct models for comparison
base_model_name = "HuggingFaceTB/SmolLM3-3B-Base"
instruct_model_name = "HuggingFaceTB/SmolLM3-3B"

# Load tokenizers
base_tokenizer = AutoTokenizer.from_pretrained(base_model_name)
instruct_tokenizer = AutoTokenizer.from_pretrained(instruct_model_name)

# Load models (use smaller precision for memory efficiency)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name, dtype=torch.float16, device_map="auto"
)

instruct_model = AutoModelForCausalLM.from_pretrained(
    instruct_model_name, dtype=torch.float16, device_map="auto"
)

print("Models loaded successfully!")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/151 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/289 [00:00<?, ?B/s]

chat_template.jinja: 0.00B [00:00, ?B/s]

config.json:   0%|          | 0.00/943 [00:00<?, ?B/s]

model.safetensors.index.json: 0.00B [00:00, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/1.18G [00:00<?, ?B/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/126 [00:00<?, ?B/s]

config.json: 0.00B [00:00, ?B/s]

model.safetensors.index.json: 0.00B [00:00, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/1.18G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/182 [00:00<?, ?B/s]

Models loaded successfully!


### Explore Chat Template Formatting

Now let's explore the chat template formatting. We will create different types of conversations to test.


In [10]:
# Create different types of conversations to test
conversations = {
    "simple_qa": [
        {"role": "system", "content": "/no_think"},
        {"role": "user", "content": "What is machine learning?"},
    ],
    "with_system": [
        {
            "role": "system",
            "content": "You are a helpful AI assistant specialized in explaining technical concepts clearly. /no_think",
        },
        {"role": "user", "content": "What is machine learning?"},
    ],
    "multi_turn": [
        {"role": "system", "content": "You are a math tutor. /no_think"},
        {"role": "user", "content": "What is calculus?"},
        {
            "role": "assistant",
            "content": "Calculus is a branch of mathematics that deals with rates of change and accumulation of quantities.",
        },
        {"role": "user", "content": "Can you give me a simple example?"},
    ],
    "reasoning_task": [
        {"role": "system", "content": "/think"},
        {
            "role": "user",
            "content": "Solve step by step: If a train travels 120 miles in 2 hours, what is its average speed?",
        },
    ],
}

for conv_type, messages in conversations.items():
    print(f"--- {conv_type.upper()} ---")

    # Format without generation prompt (for completed conversations)
    formatted_complete = instruct_tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False
    )

    # Format with generation prompt (for inference)
    formatted_prompt = instruct_tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

    print("Complete conversation format:")
    print(formatted_complete)
    print("\nWith generation prompt:")
    print(formatted_prompt)
    print("\n" + "=" * 50 + "\n")


--- SIMPLE_QA ---
Complete conversation format:
<|im_start|>system
## Metadata

Knowledge Cutoff Date: June 2025
Today Date: 26 November 2025
Reasoning Mode: /no_think

## Custom Instructions

You are a helpful AI assistant named SmolLM, trained by Hugging Face.

<|im_start|>user
What is machine learning?<|im_end|>


With generation prompt:
<|im_start|>system
## Metadata

Knowledge Cutoff Date: June 2025
Today Date: 26 November 2025
Reasoning Mode: /no_think

## Custom Instructions

You are a helpful AI assistant named SmolLM, trained by Hugging Face.

<|im_start|>user
What is machine learning?<|im_end|>
<|im_start|>assistant
<think>

</think>



--- WITH_SYSTEM ---
Complete conversation format:
<|im_start|>system
## Metadata

Knowledge Cutoff Date: June 2025
Today Date: 26 November 2025
Reasoning Mode: /no_think

## Custom Instructions

You are a helpful AI assistant specialized in explaining technical concepts clearly.

<|im_start|>user
What is machine learning?<|im_end|>


With gene

**Step 4: Compare Base vs Instruct Model Responses**


In [11]:
# Test the same prompt on both models
test_prompt = "Explain quantum computing in simple terms."

# Prepare the prompt for base model (no chat template)
base_inputs = base_tokenizer(test_prompt, return_tensors="pt").to(device)

# Prepare the prompt for instruct model (with chat template)
instruct_messages = [
    {"role": "system", "content": "/no_think"},
    {"role": "user", "content": test_prompt}
]
instruct_formatted = instruct_tokenizer.apply_chat_template(
    instruct_messages, tokenize=False, add_generation_prompt=True
)
instruct_inputs = instruct_tokenizer(instruct_formatted, return_tensors="pt").to(device)

# Generate responses
print("=== Model comparison ===\n")

print("ðŸ¤– BASE MODEL RESPONSE:")
with torch.no_grad():
    base_outputs = base_model.generate(
        **base_inputs,
        max_new_tokens=150,
        temperature=0.7,
        do_sample=True,
        pad_token_id=base_tokenizer.eos_token_id,
    )
    base_response = base_tokenizer.decode(base_outputs[0], skip_special_tokens=True)
    print(base_response[len(test_prompt) :])  # Show only the generated part

print("\n" + "=" * 50)
print("Instruct model response:")
with torch.no_grad():
    instruct_outputs = instruct_model.generate(
        **instruct_inputs,
        max_new_tokens=150,
        temperature=0.7,
        do_sample=True,
        pad_token_id=instruct_tokenizer.eos_token_id,
    )
    instruct_response = instruct_tokenizer.decode(
        instruct_outputs[0], skip_special_tokens=True
    )
    # Extract only the assistant's response
    assistant_start = instruct_response.find("<|im_start|>assistant\n") + len(
        "<|im_start|>assistant\n"
    )
    assistant_response = instruct_response[assistant_start:]
    print(assistant_response)


=== Model comparison ===

ðŸ¤– BASE MODEL RESPONSE:
 How is it different from classical computing?
Quantum computing is a rapidly emerging field that leverages the unique properties of quantum mechanics to perform calculations that are beyond the capabilities of classical computers. In classical computing, information is represented in bits that can be either 0 or 1, which is binary. Quantum computing, on the other hand, uses quantum bits or qubits, which can exist in a superposition of both 0 and 1 at the same time. This superposition allows quantum computers to explore multiple possibilities simultaneously, vastly increasing their computational power.
Quantum computing is fundamentally different from classical computing. While classical computers rely on classical logic, quantum computers utilize the principles of quantum mechanics, such as superposition and entanglement. This difference

Instruct model response:
nowledge Cutoff Date: June 2025
Today Date: 26 November 2025
Reasoning 

**Step 5: Test Dual-Mode Reasoning**


In [12]:
# Test SmolLM3's reasoning capabilities
reasoning_prompts = [
    "What is 15 Ã— 24? Show your work.",
    "A recipe calls for 2 cups of flour for 12 cookies. How much flour is needed for 30 cookies?",
    "If I have $50 and spend $18.75 on lunch and $12.30 on a book, how much money do I have left?",
]

thinking_prompts = [
    "/no_think",
    "/think"
]

print("=== TESTING REASONING CAPABILITIES ===\n")

for thinking_prompt in thinking_prompts:
    print(f"Thinking prompt: {thinking_prompt}")
    for i, prompt in enumerate(reasoning_prompts, 1):
        print(f"Problem {i}: {prompt}")

        messages = [
            {"role":"system", "content": thinking_prompt},
            {"role": "user", "content": prompt}
        ]
        formatted_prompt = instruct_tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        inputs = instruct_tokenizer(formatted_prompt, return_tensors="pt").to(device)

        with torch.no_grad():
            outputs = instruct_model.generate(
                **inputs,
                max_new_tokens=200,
                temperature=0.3,  # Lower temperature for more consistent reasoning
                do_sample=True,
                pad_token_id=instruct_tokenizer.eos_token_id,
            )
            response = instruct_tokenizer.decode(outputs[0], skip_special_tokens=True)
            assistant_start = response.find("<|im_start|>assistant\n") + len(
                "<|im_start|>assistant\n"
            )
            assistant_response = response[assistant_start:].split("<|im_end|>")[0]
            print(f"Answer: {assistant_response}")

        print("\n" + "-" * 50 + "\n")


=== TESTING REASONING CAPABILITIES ===

Thinking prompt: /no_think
Problem 1: What is 15 Ã— 24? Show your work.
Answer: nowledge Cutoff Date: June 2025
Today Date: 26 November 2025
Reasoning Mode: /no_think

## Custom Instructions

You are a helpful AI assistant named SmolLM, trained by Hugging Face.

user
What is 15 Ã— 24? Show your work.
assistant
<think>

</think>
To solve 15 Ã— 24, we can use the standard multiplication algorithm. Here's how to do it step by step:

1. **Write down the numbers:**
   ```
   15
   Ã—24
   ```

2. **Multiply 15 by 4:**
   ```
   15
   Ã—24
   ----
     60  (15 Ã— 4)
   ```

3. **Multiply 15 by 20:**
   ```
   15
   Ã—24
   ----
     60  (15 Ã— 4)
   300  (15 Ã— 20)
   ----
   ```

4. **Add the two partial products:**
   ```
   15
   Ã—24
   ----
     60  (15 Ã— 4)
   300  (15 Ã— 20)
   ----
     360  (60 + 300)
   ```

So, the result of 

--------------------------------------------------

Problem 2: A recipe calls for 2 cups of flour for 12 cookies. H

### Validation

Run the code above and verify that you can see:
1. Different chat template formats for various conversation types
2. Clear differences between base model and instruct model responses
3. SmolLM3's reasoning capabilities in action

### Extension challenges

1. **Multilingual Testing**: Test SmolLM3's multilingual capabilities by asking questions in French, Spanish, or German
2. **Long Context**: Create a very long conversation and test the extended context capabilities
3. **Custom System Prompts**: Experiment with different system messages to change the model's behavior

---

## Exercise 2: Dataset Processing for SFT

**Objective**: Learn to process and prepare datasets for supervised fine-tuning using SmolTalk2 and other datasets.

**Prerequisites**: Completed Exercise 1, understanding of Python data processing.

### Implementation

**Step 1: Explore the SmolTalk2 Dataset**


In [13]:
# Load and explore the SmolTalk2 dataset
print("=== EXPLORING SMOLTALK2 DATASET ===\n")

# Load the SFT subset
dataset_dict = load_dataset("HuggingFaceTB/smoltalk2", "SFT")
print(f"Total splits: {len(dataset_dict)}")
print(f"Available splits: {list(dataset_dict.keys())}")
print(f"Number of total rows: {sum([dataset_dict[d].num_rows for d in dataset_dict])}")
print(f"Dataset structure: {dataset_dict}")



=== EXPLORING SMOLTALK2 DATASET ===



README.md: 0.00B [00:00, ?B/s]

Resolving data files:   0%|          | 0/124 [00:00<?, ?it/s]

Resolving data files:   0%|          | 0/113 [00:00<?, ?it/s]

Resolving data files:   0%|          | 0/113 [00:00<?, ?it/s]

SFT/LongAlign_64k_Qwen3_32B_yarn_131k_th(â€¦):   0%|          | 0.00/135M [00:00<?, ?B/s]

SFT/LongAlign_64k_Qwen3_32B_yarn_131k_th(â€¦):   0%|          | 0.00/134M [00:00<?, ?B/s]

Downloading data:   0%|          | 0/113 [00:00<?, ?files/s]

SFT/OpenThoughts3_1.2M_think-00000-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00001-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00002-of-00(â€¦):   0%|          | 0.00/288M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00003-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00004-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00005-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00006-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00007-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00008-of-00(â€¦):   0%|          | 0.00/288M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00009-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00010-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00011-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00012-of-00(â€¦):   0%|          | 0.00/286M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00013-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00014-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00015-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00016-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00017-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00018-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00019-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00020-of-00(â€¦):   0%|          | 0.00/287M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00021-of-00(â€¦):   0%|          | 0.00/288M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00022-of-00(â€¦):   0%|          | 0.00/282M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00023-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00024-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00025-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00026-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00027-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00028-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00029-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00030-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00031-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00032-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00033-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00034-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00035-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00036-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00037-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00038-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00039-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00040-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00041-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00042-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00043-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00044-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00045-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00046-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00047-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00048-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00049-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00050-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00051-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00052-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00053-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00054-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00055-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00056-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00057-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00058-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00059-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00060-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00061-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00062-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00063-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00064-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00065-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00066-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00067-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00068-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00069-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00070-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00071-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00072-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00073-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00074-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00075-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00076-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00077-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00078-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00079-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00080-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00081-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00082-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00083-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00084-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00085-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00086-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00087-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00088-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00089-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00090-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00091-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00092-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00093-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00094-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00095-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00096-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00097-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00098-of-00(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00099-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00100-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00101-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00102-of-00(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00103-of-00(â€¦):   0%|          | 0.00/158M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00104-of-00(â€¦):   0%|          | 0.00/150M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00105-of-00(â€¦):   0%|          | 0.00/149M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00106-of-00(â€¦):   0%|          | 0.00/150M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00107-of-00(â€¦):   0%|          | 0.00/150M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00108-of-00(â€¦):   0%|          | 0.00/150M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00109-of-00(â€¦):   0%|          | 0.00/152M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00110-of-00(â€¦):   0%|          | 0.00/150M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00111-of-00(â€¦):   0%|          | 0.00/151M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_think-00112-of-00(â€¦):   0%|          | 0.00/150M [00:00<?, ?B/s]

SFT/aya_dataset_Qwen3_32B_think-00000-of(â€¦):   0%|          | 0.00/32.6M [00:00<?, ?B/s]

SFT/multi_turn_reasoning_if_think-00000-(â€¦):   0%|          | 0.00/178M [00:00<?, ?B/s]

SFT/s1k_1.1_think-00000-of-00001.parquet:   0%|          | 0.00/12.3M [00:00<?, ?B/s]

SFT/smolagents_toolcalling_traces_think-(â€¦):   0%|          | 0.00/81.8M [00:00<?, ?B/s]

SFT/smoltalk_everyday_convs_reasoning_Qw(â€¦):   0%|          | 0.00/6.33M [00:00<?, ?B/s]

SFT/smoltalk_multilingual8_Qwen3_32B_thi(â€¦):   0%|          | 0.00/264M [00:00<?, ?B/s]

SFT/smoltalk_multilingual8_Qwen3_32B_thi(â€¦):   0%|          | 0.00/265M [00:00<?, ?B/s]

SFT/smoltalk_multilingual8_Qwen3_32B_thi(â€¦):   0%|          | 0.00/265M [00:00<?, ?B/s]

SFT/smoltalk_multilingual8_Qwen3_32B_thi(â€¦):   0%|          | 0.00/264M [00:00<?, ?B/s]

SFT/smoltalk_systemchats_Qwen3_32B_think(â€¦):   0%|          | 0.00/64.9M [00:00<?, ?B/s]

SFT/table_gpt_Qwen3_32B_think-00000-of-0(â€¦):   0%|          | 0.00/32.9M [00:00<?, ?B/s]

SFT/LongAlign_64k_context_lang_annotated(â€¦):   0%|          | 0.00/199M [00:00<?, ?B/s]

SFT/Mixture_of_Thoughts_science_no_think(â€¦):   0%|          | 0.00/63.5M [00:00<?, ?B/s]

SFT/OpenHermes_2.5_no_think-00000-of-000(â€¦):   0%|          | 0.00/164M [00:00<?, ?B/s]

SFT/OpenHermes_2.5_no_think-00001-of-000(â€¦):   0%|          | 0.00/159M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_no_think_no_think(â€¦):   0%|          | 0.00/245M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_no_think_no_think(â€¦):   0%|          | 0.00/121M [00:00<?, ?B/s]

SFT/OpenThoughts3_1.2M_no_think_no_think(â€¦):   0%|          | 0.00/218M [00:00<?, ?B/s]

SFT/hermes_function_calling_v1_no_think-(â€¦):   0%|          | 0.00/10.8M [00:00<?, ?B/s]

SFT/smoltalk_multilingual_8languages_lan(â€¦):   0%|          | 0.00/158M [00:00<?, ?B/s]

SFT/smoltalk_multilingual_8languages_lan(â€¦):   0%|          | 0.00/159M [00:00<?, ?B/s]

SFT/smoltalk_smollm3_everyday_conversati(â€¦):   0%|          | 0.00/899k [00:00<?, ?B/s]

SFT/smoltalk_smollm3_explore_instruct_re(â€¦):   0%|          | 0.00/5.34M [00:00<?, ?B/s]

SFT/smoltalk_smollm3_smol_magpie_ultra_n(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/smoltalk_smollm3_smol_magpie_ultra_n(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/smoltalk_smollm3_smol_magpie_ultra_n(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/smoltalk_smollm3_smol_magpie_ultra_n(â€¦):   0%|          | 0.00/230M [00:00<?, ?B/s]

SFT/smoltalk_smollm3_smol_magpie_ultra_n(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/smoltalk_smollm3_smol_magpie_ultra_n(â€¦):   0%|          | 0.00/231M [00:00<?, ?B/s]

SFT/smoltalk_smollm3_smol_rewrite_no_thi(â€¦):   0%|          | 0.00/38.5M [00:00<?, ?B/s]

SFT/smoltalk_smollm3_smol_summarize_no_t(â€¦):   0%|          | 0.00/117M [00:00<?, ?B/s]

SFT/smoltalk_smollm3_systemchats_30k_no_(â€¦):   0%|          | 0.00/47.2M [00:00<?, ?B/s]

SFT/table_gpt_no_think-00000-of-00001.pa(â€¦):   0%|          | 0.00/12.6M [00:00<?, ?B/s]

SFT/tulu_3_sft_personas_instruction_foll(â€¦):   0%|          | 0.00/33.2M [00:00<?, ?B/s]

SFT/xlam_traces_no_think-00000-of-00001.(â€¦):   0%|          | 0.00/30.6M [00:00<?, ?B/s]

Generating LongAlign_64k_Qwen3_32B_yarn_131k_think split:   0%|          | 0/7526 [00:00<?, ? examples/s]

Generating OpenThoughts3_1.2M_think split:   0%|          | 0/1133524 [00:00<?, ? examples/s]

DatasetGenerationError: An error occurred while generating the dataset

In [14]:
# Function to process different dataset formats
def process_qa_dataset(examples, question_col, answer_col):
    """Process Q&A datasets into chat format"""
    processed = []

    for question, answer in zip(examples[question_col], examples[answer_col]):
        messages = [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
        processed.append(messages)

    return {"messages": processed}


def process_instruction_dataset(examples):
    """Process instruction-following datasets"""
    processed = []

    for instruction, response in zip(examples["instruction"], examples["response"]):
        messages = [
            {"role": "user", "content": instruction},
            {"role": "assistant", "content": response},
        ]
        processed.append(messages)

    return {"messages": processed}


# Example: Process GSM8K math dataset
print("=== PROCESSING GSM8K DATASET ===\n")

gsm8k = load_dataset(
    "openai/gsm8k", "main", split="train[:100]"
)  # Small subset for demo
print(f"Original GSM8K example: {gsm8k[0]}")


# Convert to chat format
def process_gsm8k(examples):
    processed = []
    for question, answer in zip(examples["question"], examples["answer"]):
        messages = [
            {
                "role": "system",
                "content": "You are a math tutor. Solve problems step by step.",
            },
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
        processed.append(messages)
    return {"messages": processed}


gsm8k_processed = gsm8k.map(
    process_gsm8k, batched=True, remove_columns=gsm8k.column_names
)
print(f"Processed example: {gsm8k_processed[0]}")


=== PROCESSING GSM8K DATASET ===



README.md: 0.00B [00:00, ?B/s]

main/train-00000-of-00001.parquet:   0%|          | 0.00/2.31M [00:00<?, ?B/s]

main/test-00000-of-00001.parquet:   0%|          | 0.00/419k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/7473 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1319 [00:00<?, ? examples/s]

Original GSM8K example: {'question': 'Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?', 'answer': 'Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72'}


Map:   0%|          | 0/100 [00:00<?, ? examples/s]

Processed example: {'messages': [{'content': 'You are a math tutor. Solve problems step by step.', 'role': 'system'}, {'content': 'Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?', 'role': 'user'}, {'content': 'Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72', 'role': 'assistant'}]}


In [16]:
# Function to apply chat templates to processed datasets
def apply_chat_template_to_dataset(dataset, tokenizer):
    """Apply chat template to dataset for training"""

    def format_messages(examples):
        formatted_texts = []

        for messages in examples["messages"]:
            # Apply chat template
            formatted_text = tokenizer.apply_chat_template(
                messages,
                tokenize=False,
                add_generation_prompt=False,  # We want the complete conversation
            )
            formatted_texts.append(formatted_text)

        return {"text": formatted_texts}

    return dataset.map(format_messages, batched=True)


# Apply to our processed GSM8K dataset
gsm8k_formatted = apply_chat_template_to_dataset(gsm8k_processed, instruct_tokenizer)
print("=== FORMATTED TRAINING DATA ===")
print(gsm8k_formatted[0]["text"])


=== FORMATTED TRAINING DATA ===
<|im_start|>system
## Metadata

Knowledge Cutoff Date: June 2025
Today Date: 26 November 2025
Reasoning Mode: /think

## Custom Instructions

You are a math tutor. Solve problems step by step.

<|im_start|>user
Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?<|im_end|>
<|im_start|>assistant
Natalia sold 48/2 = <<48/2=24>>24 clips in May.
Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.
#### 72<|im_end|>



---

## Exercise 3: Fine-Tuning SmolLM3 with SFTTrainer

**Objective**: Perform supervised fine-tuning on SmolLM3 using TRL's SFTTrainer with real datasets.

**Prerequisites**: Completed Exercise 2, GPU with at least 8GB VRAM (or Google Colab Pro).

### Implementation

**Step 1: Setup and Model Loading**


In [17]:
# Import required libraries for fine-tuning
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset
import torch

# Load SmolLM3 base model for fine-tuning
model_name = "HuggingFaceTB/SmolLM3-3B"
new_model_name = "SmolLM3-Custom-SFT"

print(f"Loading {model_name}...")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype=torch.float16,  # Use float16 for memory efficiency
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Set padding token
tokenizer.padding_side = "right"  # Padding on the right for generation

print(f"Model loaded! Parameters: {model.num_parameters():,}")


Loading HuggingFaceTB/SmolLM3-3B...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]



Model loaded! Parameters: 3,075,098,624


**Step 2: Dataset Preparation**


In [33]:
# Load and prepare training dataset
print("=== PREPARING DATASET ===\n")

from datasets import load_dataset
import itertools # Neu: Wird fÃ¼r die ÃœberprÃ¼fung des Streams benÃ¶tigt

print("=== STARTE SAUBERE STREAMING-LADUNG (KEIN LOKALER CACHE) ===\n")

training_split_name = "smoltalk_everyday_convs_reasoning_Qwen3_32B_think"

# Schritt 1: Das Dataset als reinen Stream (IterableDataset) laden.
streaming_dataset = load_dataset(
    "HuggingFaceTB/smoltalk2",
    "SFT",
    split=training_split_name, # Kein Slicing hier
    streaming=True              # Aktiviert den Stream
)

# Schritt 2: Den Stream auf die ersten 1000 Elemente begrenzen.
# Der Lese- und Download-Prozess stoppt nach 1000 EintrÃ¤gen.
train_dataset = streaming_dataset.take(1000)

print(f"\n[ERFOLG] Das Training-Dataset (1000 Elemente) ist als Stream bereit.")

# --- NEUE SEKTION: DATENBESTÃ„TIGUNG ---
print("\n=== DATENBESTÃ„TIGUNG (Proof of Load) ===")

# Wir erstellen einen *neuen* Stream fÃ¼r die ÃœberprÃ¼fung,
# da der vorherige train_dataset-Stream fÃ¼r das Training benÃ¶tigt wird.
# ACHTUNG: Streaming-Objekte kÃ¶nnen nur einmal initialisiert werden.
# Wir laden das Dataset daher *noch einmal* im Streaming-Modus,
# um die ersten drei Elemente zu prÃ¼fen, ohne den Haupt-Stream zu beeinflussen.

check_stream = load_dataset(
    "HuggingFaceTB/smoltalk2",
    "SFT",
    split=training_split_name,
    streaming=True
).take(3) # Nur die ersten 3 Elemente zur schnellen ÃœberprÃ¼fung

# Konvertiere die ersten 3 gestreamten Elemente in eine Liste zur Anzeige
first_three = list(check_stream)

# Anzeige der DatensÃ¤tze
for i, item in enumerate(first_three):
    print(f"\n--- Beispiel {i+1} ---")
    # Das SFT-Format enthÃ¤lt den gesamten Text in einem 'text'-Feld
    if 'text' in item:
        # Zeigt nur die ersten 400 Zeichen und die GesamtlÃ¤nge des Textes
        print(f"LÃ¤nge: {len(item['text'])} Zeichen")
        print(item['text'][:400].replace('\n', ' ') + '...')
    else:
        # Fallback, falls die Struktur anders ist
        print(item)

print("\n=== BESTÃ„TIGUNG ABGESCHLOSSEN ===")
# WICHTIG: Die Variable train_dataset (aus Schritt 2) ist weiterhin
# der volle 1000-Elemente-Stream und kann jetzt im SFTTrainer verwendet werden.


# Option 1: Use SmolTalk2 (recommended for beginners)
#dataset = load_dataset("HuggingFaceTB/smoltalk2", "SFT")

# load_dataset mit Split-Slicing
#training_split_name = "smoltalk_everyday_convs_reasoning_Qwen3_32B_think"
# WICHTIG: Die Angabe des Slices erfolgt direkt im 'split'-Parameter.
# Dies lÃ¤dt nur die ersten 1000 Samples und den Rest nicht.
#train_dataset = load_dataset(
#    "HuggingFaceTB/smoltalk2",
#    "SFT",
#    split=f"{training_split_name}[:1000]",
#    streaming=True # Der entscheidende Parameter
#)
#training_split = "smoltalk_everyday_convs_reasoning_Qwen3_32B_think"
#train_dataset = dataset[training_split].select(range(1000))  # Use subset for faster training


=== PREPARING DATASET ===

=== STARTE SAUBERE STREAMING-LADUNG (KEIN LOKALER CACHE) ===



Resolving data files:   0%|          | 0/124 [00:00<?, ?it/s]

Resolving data files:   0%|          | 0/113 [00:00<?, ?it/s]

Resolving data files:   0%|          | 0/113 [00:00<?, ?it/s]


[ERFOLG] Das Training-Dataset (1000 Elemente) ist als Stream bereit.

=== DATENBESTÃ„TIGUNG (Proof of Load) ===


Resolving data files:   0%|          | 0/124 [00:00<?, ?it/s]

Resolving data files:   0%|          | 0/113 [00:00<?, ?it/s]

Resolving data files:   0%|          | 0/113 [00:00<?, ?it/s]


--- Beispiel 1 ---
{'messages': [{'content': 'Hi there', 'role': 'user'}, {'content': '<think>\nOkay, the user sent "Hi there". That\'s a friendly greeting. I should respond in a welcoming way. Let me check the guidelines. I need to be helpful, keep the conversation going, and maybe ask how I can assist them. Let me make sure the tone is warm and approachable. Alright, something like "Hello! How can I assist you today?" That should work. Let me confirm there\'s no typo and it\'s in a natural, conversational style.\n</think>\n\nHello! How can I assist you today?', 'role': 'assistant'}, {'content': "I'm looking for a healthy breakfast idea. What's a good option?", 'role': 'user'}, {'content': "<think>\nOkay, the user is asking for a healthy breakfast idea. Let me think about what makes a breakfast healthy. It should be balanced, providing a mix of nutrients like protein, fiber, healthy fats, and some carbs. Let me brainstorm some options.\n\nMaybe start with a classic like avocado toast

In [35]:
# Configure training parameters
training_config = SFTConfig(
    # Model and data
    output_dir=f"./{new_model_name}",
    dataset_text_field="text",
    max_length=2048,

    # Training hyperparameters
    per_device_train_batch_size=2,  # Adjust based on your GPU memory
    gradient_accumulation_steps=2,
    learning_rate=5e-5,
    num_train_epochs=1,  # Start with 1 epoch
    max_steps=500,  # Limit steps for demo

    # Optimization
    warmup_steps=50,
    weight_decay=0.01,
    optim="adamw_torch",

    # Logging and saving
    logging_steps=10,
    save_steps=100,
    eval_steps=100,
    save_total_limit=2,

    # Memory optimization
    dataloader_num_workers=0,
    group_by_length=True,  # Group similar length sequences

    # Hugging Face Hub integration
    push_to_hub=False,  # Set to True to upload to Hub
    hub_model_id=f"your-username/{new_model_name}",

    # Experiment tracking
    report_to=["trackio"],  # Use trackio for experiment tracking
    run_name=f"{new_model_name}-training",
)

print("Training configuration set!")
print(f"Effective batch size: {training_config.per_device_train_batch_size * training_config.gradient_accumulation_steps}")

Training configuration set!
Effective batch size: 4


In [32]:
from datasets import Dataset

# HINWEIS: train_dataset ist immer noch der Stream (IterableDataset)

print("\n=== MATERIALISIERUNG: STREAM ZU CACHED DATASET ===\n")

# 1. Den Stream in eine Python-Liste von Dictionaries konvertieren.
# ACHTUNG: Der Stream wird HIERbei vollstÃ¤ndig ausgelesen (aber nur 1000 Elemente).
train_list = list(train_dataset)
print(f"Schritt 1: {len(train_list)} DatensÃ¤tze in den RAM geladen.")

# 2. Die Liste zurÃ¼ck in ein Dataset-Objekt konvertieren.
# HIER schreib die 'datasets'-Bibliothek die Daten in den lokalen Festplatten-Cache.
train_dataset_cached = Dataset.from_list(train_list)
print("Schritt 2: Dataset wurde in den lokalen Cache geschrieben.")


# 3. Den SFTTrainer mit dem neuen, gecachten Dataset und der Optimierung initialisieren
# (Vorausgesetzt, Sie haben zuvor den TrainingArguments-Parameter korrigiert!)
training_config.group_by_length = True # <-- DIES IST NUN MÃ–GLICH!

trainer = SFTTrainer(
    model=model,
    args=training_config,
    train_dataset=train_dataset_cached, # <-- WICHTIG: Verwenden Sie das gecachte Objekt
    tokenizer=tokenizer,
    dataset_text_field="text",
    packing=False,
)

print("\n[ERFOLG] SFTTrainer mit GROUP_BY_LENGTH=True erfolgreich initialisiert.")


# Initialize the SFTTrainer
#trainer = SFTTrainer(
#    model=model,
#    args=training_config,
#    train_dataset=train_dataset,
#)



The model is already on multiple devices. Skipping the move to device specified in `args`.


ValueError: the `--group_by_length` option is only available for `Dataset`, not `IterableDataset

In [None]:
# Start training!
print("\n=== STARTING TRAINING ===")
trainer.train()

# Save the model
trainer.save_model()
print(f"Model saved to {training_config.output_dir}")

# LoRA SFT with TRL + SmolLM3

This short notebook shows how to fine-tune a small model with LoRA adapters using TRL's SFTTrainer. It uses a tiny model (SmolLM2-135M) and a small public chat dataset for a quick demonstration.



In [None]:
from peft import LoraConfig

In [30]:
# LoRA config
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# SFT config (short run)
sft_config = SFTConfig(
    output_dir="./smollm2-lora-demo",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    packing=True,
    learning_rate=2e-4,
    logging_steps=10,
    save_strategy="no",
    report_to="none",
)


NameError: name 'LoraConfig' is not defined

In [21]:
# Trainer
trainer = SFTTrainer(
    model=model,
    args=sft_config,
    train_dataset=train_dataset,
    peft_config=peft_config,
    dataset_kwargs={
        "add_special_tokens": False,
        "append_concat_token": False,
    },
)

# Short demo train
trainer.train()


NameError: name 'sft_config' is not defined