# Exploring Chat Templates with SmolLM2

This notebook demonstrates how to use chat templates with the `SmolLM2` model. A chat template turns a list of role-tagged messages into the single formatted string (and the token ids) that the model expects, so every conversation reaches the model in the same layout.

In [1]:
# Install the requirements in Google Colab
!pip install transformers datasets trl huggingface_hub

# Authenticate to Hugging Face
from huggingface_hub import login

# The CLI command and login() both store your token; running one of them is enough
!huggingface-cli login

login()

# For convenience, you can store your Hub token in an environment variable named HF_TOKEN



    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) y
Token is valid (permission: fineGrained).
The token `smol-course` has been saved to /root/.cache/huggingface/stored_tokens
Cannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-


In [2]:
# Import necessary libraries
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import setup_chat_format
import torch

## SmolLM2 Chat Template

Let's explore how to use a chat template with the `SmolLM2` model. We'll define a simple conversation and apply the chat template.

In [3]:
# Dynamically set the device
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available() else "cpu"
)

model_name = "HuggingFaceTB/SmolLM2-135M"
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=model_name
).to(device)
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name)
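# Add the ChatML special tokens and chat template to the tokenizer,
# and resize the model's embeddings to match
model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)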

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [4]:
# Define messages for SmolLM2
messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {
        "role": "assistant",
        "content": "I'm doing well, thank you! How can I assist you today?",
    },
]

# Apply chat template without tokenization

With `tokenize=False`, `apply_chat_template` returns the conversation as a plain string in which the special tokens `<|im_start|>` and `<|im_end|>` delimit each turn, and the role name follows the opening token.


In [5]:
input_text = tokenizer.apply_chat_template(messages, tokenize=False)

print("Conversation with template:", input_text)

Conversation with template: <|im_start|>user
Hello, how are you?<|im_end|>
<|im_start|>assistant
I'm doing well, thank you! How can I assist you today?<|im_end|>
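
To make the structure of the templated string concrete, here is a minimal pure-Python sketch that parses it back into role/content messages. It assumes the exact layout printed above (`<|im_start|>{role}`, a newline, the content, then `<|im_end|>`); `parse_chatml` and `TURN_PATTERN` are hypothetical names used for illustration and are not part of `transformers` or `trl`.

```python
import re

# Matches one turn laid out as: <|im_start|>{role}\n{content}<|im_end|>
TURN_PATTERN = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)


def parse_chatml(text):
    """Split a ChatML-style string back into a list of role/content messages."""
    return [
        {"role": role, "content": content}
        for role, content in TURN_PATTERN.findall(text)
    ]


# The string printed by the cell above, copied by hand
templated = (
    "<|im_start|>user\n"
    "Hello, how are you?<|im_end|>\n"
    "<|im_start|>assistant\n"
    "I'm doing well, thank you! How can I assist you today?<|im_end|>\n"
)

for message in parse_chatml(templated):
    print(message["role"], "->", message["content"])
```

The round trip recovers the original `messages` list, which shows that the special tokens carry all of the turn and role information.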



# Decode the conversation

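With `add_generation_prompt=True`, the output is the same conversation as above followed by the opening of a new assistant turn (`<|im_start|>assistant`), which cues the model to generate the next reply. Here we tokenize the conversation and decode the ids back to text to confirm this.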
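
As a rough illustration of what `add_generation_prompt=True` changes, the sketch below rebuilds the ChatML layout by hand in pure Python. It is an approximation based on the outputs shown in this notebook, not the template code that `transformers` actually runs; `render_chatml` is a hypothetical helper name.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Hand-rolled approximation of the ChatML layout shown in this notebook."""
    text = ""
    for message in messages:
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open a new assistant turn and leave it unterminated for the model to complete
        text += "<|im_start|>assistant\n"
    return text


messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {
        "role": "assistant",
        "content": "I'm doing well, thank you! How can I assist you today?",
    },
]

without_prompt = render_chatml(messages)
with_prompt = render_chatml(messages, add_generation_prompt=True)

# Print only the extra text that the generation prompt adds
print(repr(with_prompt[len(without_prompt):]))
```

The two strings differ only in the trailing assistant header, which is why the flag is typically used at inference time and left off when formatting complete conversations.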


In [6]:
# Tokenize with the generation prompt appended, then decode the ids back to text
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True
)

print("Conversation decoded:", tokenizer.decode(token_ids=input_ids))

Conversation decoded: <|im_start|>user
Hello, how are you?<|im_end|>
<|im_start|>assistant
I'm doing well, thank you! How can I assist you today?<|im_end|>
<|im_start|>assistant



# Tokenize the conversation

The tokenizer also converts the conversation, including the special tokens, into ids from the model's vocabulary.



In [7]:
# tokenize=True is the default, so this returns a list of token ids
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

print("Conversation tokenized:", input_ids)

Conversation tokenized: [1, 4093, 198, 19556, 28, 638, 359, 346, 47, 2, 198, 1, 520, 9531, 198, 57, 5248, 2567, 876, 28, 9984, 346, 17, 1073, 416, 339, 4237, 346, 1834, 47, 2, 198, 1, 520, 9531, 198]
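
The id list above can be read turn by turn. The sketch below groups the ids into turns, assuming that id `1` is `<|im_start|>` and id `2` is `<|im_end|>`; those two values are inferred from the position of the ids in the output above, not read from the tokenizer. `split_turns` is a hypothetical helper name.

```python
IM_START_ID = 1  # assumed id of <|im_start|>, inferred from the output above
IM_END_ID = 2    # assumed id of <|im_end|>, inferred from the output above

# Token ids copied from the output above
token_ids = [
    1, 4093, 198, 19556, 28, 638, 359, 346, 47, 2, 198,
    1, 520, 9531, 198, 57, 5248, 2567, 876, 28, 9984, 346, 17,
    1073, 416, 339, 4237, 346, 1834, 47, 2, 198,
    1, 520, 9531, 198,
]


def split_turns(ids, start_id=IM_START_ID, end_id=IM_END_ID):
    """Group token ids into turns and record whether each turn was closed."""
    turns = []
    current = None
    for token_id in ids:
        if token_id == start_id:
            current = {"ids": [], "closed": False}
            turns.append(current)
        elif token_id == end_id and current is not None:
            current["closed"] = True
            current = None
        elif current is not None:
            # Ids between the start and end markers belong to the current turn
            current["ids"].append(token_id)
    return turns


for index, turn in enumerate(split_turns(token_ids)):
    status = "closed" if turn["closed"] else "open (generation prompt)"
    print(f"turn {index}: {len(turn['ids'])} ids, {status}")
```

The last turn is opened but never closed, which is the tokenized form of the generation prompt.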


In [8]:
from IPython.display import HTML, display

display(
    HTML(
        """<iframe
  src="https://huggingface.co/datasets/HuggingFaceTB/smoltalk/embed/viewer/all/train?row=0"
  frameborder="0"
  width="100%"
  height="360px"
></iframe>
"""
    )
)

In [9]:
from datasets import load_dataset
from transformers import AutoTokenizer

# Load the dataset
ds = load_dataset("HuggingFaceTB/smoltalk", "everyday-conversations")

# Initialize your tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Add a padding token to the tokenizer
tokenizer.pad_token = tokenizer.eos_token

# Define a function to process each sample in the dataset
def process_dataset(sample):
    # Extract the existing messages
    original_messages = sample["messages"]

    # Define a system message to set the context
    messages = [
        {
            "role": "system",
            "content": "You are a friendly and knowledgeable assistant. Engage in a natural, helpful conversation."
        }
    ] + original_messages

    # Convert the messages to a string suitable for the model.
    chat_str = ""
    for msg in messages:
        chat_str += f"{msg['role'].capitalize()}: {msg['content']}\n"
    chat_str = chat_str.strip()

    # Tokenize using the tokenizer
    tokenized = tokenizer(
        chat_str,
        return_tensors="pt",  # Return PyTorch tensors
        padding="longest",
        truncation=True,
        max_length=1024
    )

    # Return the processed sample
    return {
        "messages": messages,
        "input_ids": tokenized["input_ids"][0],
        "attention_mask": tokenized["attention_mask"][0]
    }

# Apply the mapping
# Remove the original columns if desired, leaving only processed fields
ds = ds.map(process_dataset, remove_columns=ds["train"].column_names)

# Now ds["train"][0] might look like:
# {
#   "messages": [
#       {"role":"system","content":"You are a friendly..."},
#       {"role":"user","content":"Hi there! How are you?"},
#       {"role":"assistant","content":"I'm doing well, thank you!"},
#       ...
#   ],
#   "input_ids": [...],
#   "attention_mask": [...]
# }
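
For comparison with the manual `Role: content` string built above, the following sketch lets the tokenizer's own chat template do the formatting. It assumes a tokenizer that defines a chat template, such as the SmolLM2 tokenizer prepared with `setup_chat_format` earlier in this notebook; the `gpt2` tokenizer loaded in this cell does not ship one. `process_with_template` is a hypothetical name, and the function returns only the formatted text to keep the example small.

```python
SYSTEM_PROMPT = (
    "You are a friendly and knowledgeable assistant. "
    "Engage in a natural, helpful conversation."
)


def process_with_template(sample, tokenizer):
    """Prepend a system message and format the sample with the chat template."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}] + sample["messages"]

    # Assumption: `tokenizer` defines a chat template; otherwise this call fails
    text = tokenizer.apply_chat_template(messages, tokenize=False)

    return {"messages": messages, "text": text}


# Usage sketch (assumes `ds` from load_dataset and a chat-template-enabled `tokenizer`):
# ds = ds.map(lambda sample: process_with_template(sample, tokenizer))
```

Passing the tokenizer as an argument keeps the function independent of any global state, so the same mapping can be reused with a different model's template.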


In [10]:
display(
    HTML(
        """<iframe
  src="https://huggingface.co/datasets/openai/gsm8k/embed/viewer/main/train"
  frameborder="0"
  width="100%"
  height="360px"
></iframe>
"""
    )
)

In [11]:
from datasets import load_dataset
from IPython.display import display
from transformers import AutoTokenizer

# Load the dataset
ds = load_dataset("openai/gsm8k", "main")

# Initialize your tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Add a padding token to the tokenizer
tokenizer.pad_token = tokenizer.eos_token

# Define a function to process each sample in the dataset
def process_dataset(sample):
    # Extract the question and answer
    question = sample["question"]
    answer = sample["answer"]

    # Create a message structure:
    # System message to provide context/instructions
    # User message with the question
    # Assistant message with the answer
    messages = [
        {
            "role": "system",
            "content": "You are a helpful math tutor. Provide clear reasoning steps and the final answer at the end."
        },
        {
            "role": "user",
            "content": question
        },
        {
            "role": "assistant",
            "content": answer
        }
    ]

    # Convert the messages to a string suitable for the model.
    chat_str = ""
    for msg in messages:
        chat_str += f"{msg['role'].capitalize()}: {msg['content']}\n"
    chat_str = chat_str.strip()

    # Tokenize using the tokenizer
    tokenized = tokenizer(
        chat_str,
        return_tensors="pt",  # Return PyTorch tensors
        padding="longest",
        truncation=True,
        max_length=1024
    )

    # Return the processed sample
    return {
        "messages": messages,
        "input_ids": tokenized["input_ids"][0],
        "attention_mask": tokenized["attention_mask"][0]
    }

# Apply the mapping
# Remove the original columns if desired, leaving only processed fields
ds = ds.map(process_dataset, remove_columns=ds["train"].column_names)

# Now each sample in ds["train"] and ds["test"] will have:
# {
#   "messages": [
#       {"role": "system", "content": "You are a helpful math tutor. ..."},
#       {"role": "user", "content": "<question text>"},
#       {"role": "assistant", "content": "<answer text>"}
#   ],
#   "input_ids": [...],
#   "attention_mask": [...]
# }


## Conclusion

This notebook demonstrated how to apply a chat template to the `SmolLM2` model. Structuring conversations with a chat template gives the model its input in a consistent format, with each turn and role clearly marked.

In the exercise, you converted a dataset into ChatML format by hand. TRL can do this for you, but it is useful to understand what happens under the hood.