# Preference Alignment with Odds Ratio Preference Optimization (ORPO)

This notebook guides you through fine-tuning a language model with Odds Ratio Preference Optimization (ORPO). We will use the SmolLM2-135M model, which has **not** been through SFT training, so it is not compatible with DPO. This also means you cannot use the model you trained in [1_instruction_tuning](../../1_instruction_tuning/notebooks/sft_finetuning_example.ipynb).

<div style='background-color: lightblue; padding: 10px; border-radius: 5px; margin-bottom: 20px; color:black'>
     <h2 style='margin: 0;color:blue'>Exercise: Aligning SmolLM2 with ORPOTrainer</h2>
     <p>Take a dataset from the Hugging Face Hub and align a model on it.</p>
     <p><b>Difficulty Levels</b></p>
     <p>🐢 Use the `trl-lib/ultrafeedback_binarized` dataset</p>
     <p>🐕 Try out the `argilla/ultrafeedback-binarized-preferences` dataset</p>
     <p>🦁 Try on a subset of mlabonne's `orpo-dpo-mix-40k` dataset</p>
</div>



In [1]:
!pip install -U datasets trl



In [2]:
!pip install -U bitsandbytes



## Import libraries


In [3]:

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)
from trl import ORPOConfig, ORPOTrainer, setup_chat_format
import os
from datasets import load_dataset

# Authenticate to Hugging Face
from huggingface_hub import login

login()


# Testing with Data

## Format dataset

In [7]:
dataset = load_dataset(path="argilla/ultrafeedback-binarized-preferences")
dataset


DatasetDict({
    train: Dataset({
        features: ['source', 'instruction', 'chosen_response', 'rejected_response', 'chosen_avg_rating', 'rejected_avg_rating', 'chosen_model'],
        num_rows: 63619
    })
})

In [10]:
dataset = load_dataset(path="mlabonne/orpo-dpo-mix-40k", split = 'train[:10]')
dataset

Dataset({
    features: ['source', 'chosen', 'rejected', 'prompt', 'question'],
    num_rows: 10
})

In [35]:
# Load dataset
dataset_1 = load_dataset("Intel/orca_dpo_pairs")['train']


In [36]:
dataset_1

Dataset({
    features: ['system', 'question', 'chosen', 'rejected'],
    num_rows: 12859
})

In [39]:
dataset_1['question'][0]

"You will be given a definition of a task first, then some input of the task.\nThis task is about using the specified sentence and converting the sentence to Resource Description Framework (RDF) triplets of the form (subject, predicate object). The RDF triplets generated must be such that the triplets accurately capture the structure and semantics of the input sentence. The input is a sentence and the output is a list of triplets of the form [subject, predicate, object] that capture the relationships present in the sentence. When a sentence has more than 1 RDF triplet possible, the output must contain all of them.\n\nAFC Ajax (amateurs)'s ground is Sportpark De Toekomst where Ajax Youth Academy also play.\nOutput:"

In [40]:
dataset_1['chosen']

['[\n  ["AFC Ajax (amateurs)", "has ground", "Sportpark De Toekomst"],\n  ["Ajax Youth Academy", "plays at", "Sportpark De Toekomst"]\n]',
 'Midsummer House is a moderately priced Chinese restaurant with a 3/5 customer rating, located near All Bar One.',
 'C. She then dips the needle in ink and using the pencil to draw a design on her leg, rubbing it off with a rag in the end. In this option, she is continuing the process of using the needle, pencil, and thread, which is most related to what she was doing in the previous sentence.',
 'Based on the passage, discuss the primary motivations and outcomes of the 1901 Federation of Australia, including the roles and responsibilities of the federal government, as well as the continued governmental structures of the individual states involved.',
 'James pays the minor characters $15,000 each episode. Since there are 4 minor characters, he pays them a total of 4 * $15,000 = $60,000 per episode.\n\nThe major characters are paid three times as mu

In [6]:
# Load dataset

# TODO: 🦁🐕 change the dataset to one of your choosing
# The output below (train/test splits with `score_chosen` and `score_rejected`) comes from this dataset
dataset = load_dataset(path="trl-lib/ultrafeedback_binarized")
dataset


DatasetDict({
    train: Dataset({
        features: ['chosen', 'rejected', 'score_chosen', 'score_rejected'],
        num_rows: 62135
    })
    test: Dataset({
        features: ['chosen', 'rejected', 'score_chosen', 'score_rejected'],
        num_rows: 1000
    })
})

In [52]:
dataset = load_dataset(path="mlabonne/orpo-dpo-mix-40k", split = 'train[:2]')
dataset

Dataset({
    features: ['source', 'chosen', 'rejected', 'prompt', 'question'],
    num_rows: 2
})

In [None]:
# TODO: 🐕 If your dataset is not represented as conversation lists, you can use the `process_dataset` function to convert it.
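
For a dataset with flat string columns, such as the `instruction`, `chosen_response`, and `rejected_response` columns of `argilla/ultrafeedback-binarized-preferences` listed earlier, the conversion could look roughly like the sketch below. `to_preference_row` is a hypothetical helper (it is not part of `datasets` or `trl`), and the sample row is made up purely to show the shape of the data.

```python
def to_preference_row(example):
    """Rename flat preference columns to the prompt/chosen/rejected layout.

    Hypothetical helper: the input column names are taken from the
    `argilla/ultrafeedback-binarized-preferences` feature list.
    """
    return {
        "prompt": example["instruction"],
        "chosen": example["chosen_response"],
        "rejected": example["rejected_response"],
    }


# Made-up row with the same columns, for illustration only
row = {
    "instruction": "Name a primary color.",
    "chosen_response": "Red is a primary color.",
    "rejected_response": "I cannot answer that.",
}

converted = to_preference_row(row)
print(converted)
```

With a loaded split, the same function could be passed to `split.map(to_preference_row, remove_columns=split.column_names)`, assuming `split` is a single `Dataset` (for example `dataset["train"]`) rather than a `DatasetDict`.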

In [46]:
dataset['chosen'][0]

[{'content': 'The setting is an otherworldly, yet eerily familiar, metropolis known as "Zephyria." It\'s a city suspended in the ether, floating amidst nebulous clouds of cosmic dust. The architecture here is surreal and alien, with buildings that twist and spiral like strands of DNA, reaching towards the cosmos. Streets are paved with luminescent cobblestones, casting soft hues of blues and purples, reflecting off iridescent structures. Strange vegetation, vibrant and bioluminescent, thrives everywhere, creating a stark contrast against the deep indigo sky.\n\nNow, immerse me in this ethereal journey through Zephyria.',
  'role': 'user'},
 {'content': "As you step onto the teleportation platform, there's a momentary sense of disorientation before your surroundings change abruptly. You find yourself standing on the outskirts of Zephyria, gazing at the sprawling metropolis that glows softly under the starlit canvas above. A gentle breeze, carrying hints of exotic fragrances from unknown

In [47]:
dataset['rejected'][0]

[{'content': 'The setting is an otherworldly, yet eerily familiar, metropolis known as "Zephyria." It\'s a city suspended in the ether, floating amidst nebulous clouds of cosmic dust. The architecture here is surreal and alien, with buildings that twist and spiral like strands of DNA, reaching towards the cosmos. Streets are paved with luminescent cobblestones, casting soft hues of blues and purples, reflecting off iridescent structures. Strange vegetation, vibrant and bioluminescent, thrives everywhere, creating a stark contrast against the deep indigo sky.\n\nNow, immerse me in this ethereal journey through Zephyria.',
  'role': 'user'},
 {'content': "As you step onto the teleportation platform, there's a momentary sense of disorientation before your surroundings change abruptly. You find yourself standing on the outskirts of Zephyria, gazing at the sprawling metropolis that glows softly under the starlit canvas above. A gentle breeze, carrying hints of exotic fragrances from unknown

In [48]:
dataset['prompt']

['The setting is an otherworldly, yet eerily familiar, metropolis known as "Zephyria." It\'s a city suspended in the ether, floating amidst nebulous clouds of cosmic dust. The architecture here is surreal and alien, with buildings that twist and spiral like strands of DNA, reaching towards the cosmos. Streets are paved with luminescent cobblestones, casting soft hues of blues and purples, reflecting off iridescent structures. Strange vegetation, vibrant and bioluminescent, thrives everywhere, creating a stark contrast against the deep indigo sky.\n\nNow, immerse me in this ethereal journey through Zephyria.',
 'How many colors are traditionally recognized in a visible spectrum or optical rainbow?']

In [42]:
dataset['question'][1]

'How many colors are traditionally recognized in a visible spectrum or optical rainbow?'
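
In this dataset the `chosen` and `rejected` columns are lists of messages, as the outputs above show, so the text of the final assistant turn has to be pulled out of the list rather than passing the whole list through `str(...)`. The sketch below shows one way to do that. `split_conversation` is a hypothetical helper, and the two reply strings are placeholders, not real dataset content.

```python
def split_conversation(messages):
    """Split a message list into (prompt_messages, final_assistant_text).

    Hypothetical helper: assumes the conversation ends with an assistant turn.
    """
    if not messages or messages[-1]["role"] != "assistant":
        raise ValueError("expected the conversation to end with an assistant message")
    return messages[:-1], messages[-1]["content"]


question = "How many colors are traditionally recognized in a visible spectrum or optical rainbow?"

# Placeholder replies; the real dataset rows contain full model answers
example = {
    "chosen": [
        {"content": question, "role": "user"},
        {"content": "<preferred reply>", "role": "assistant"},
    ],
    "rejected": [
        {"content": question, "role": "user"},
        {"content": "<dispreferred reply>", "role": "assistant"},
    ],
}

prompt_messages, chosen_text = split_conversation(example["chosen"])
_, rejected_text = split_conversation(example["rejected"])

print(prompt_messages)
print(chosen_text)
print(rejected_text)
```

The extracted strings can then be wrapped in single `assistant` messages before applying the chat template, so that each formatted answer contains only the reply text.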

In [55]:
from transformers import AutoTokenizer
from datasets import Dataset

# Define the chat template
#chat_template = "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>' + '\\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n' }}{% endif %}"

def process_dataset(example):
    # NOTE: `tokenizer` must already exist; it is created in the "Define the model" cell.
    # Format instruction using the chat template
    message = {"role": "user", "content": example['prompt']}
    prompt = tokenizer.apply_chat_template([message], tokenize=False, add_generation_prompt=False)

    # Format chosen answer using the chat template.
    # Caveat: in `mlabonne/orpo-dpo-mix-40k`, `chosen` and `rejected` are lists of
    # messages, so `str(...)` embeds the repr of the whole list in the assistant turn.
    chosen_message = {"role": "assistant", "content": str(example['chosen'])}
    chosen = tokenizer.apply_chat_template([chosen_message], tokenize=False, add_generation_prompt=False)

    # Format rejected answer using the chat template (same caveat as above)
    rejected_message = {"role": "assistant", "content": str(example['rejected'])}
    rejected = tokenizer.apply_chat_template([rejected_message], tokenize=False, add_generation_prompt=False)

    return {
        "prompt": prompt,
        "chosen": chosen,
        "rejected": rejected,
    }


# Correct the tokenizer settings
#tokenizer.pad_token = tokenizer.eos_token # Set pad_token to eos_token as suggested
#tokenizer.padding_side = "left" # Set padding side to left

# Set eos and bos token correctly
#tokenizer.bos_token = "<|begin_of_text|>"
#tokenizer.eos_token = "<|end_of_text|>"

# No need to add special tokens because the tokenizer already has them

#tokenizer.chat_template = chat_template
#tokenizer.add_eos_token = True

# save columns
original_columns = dataset.column_names

#format dataset
train_dataset = dataset.map(
    function= process_dataset,
    remove_columns=original_columns
)

# checking only one example
print(train_dataset[0])


In [56]:
train_dataset

Dataset({
    features: ['chosen', 'rejected', 'prompt'],
    num_rows: 2
})

In [59]:
train_dataset['prompt']

['<|im_start|>user\nThe setting is an otherworldly, yet eerily familiar, metropolis known as "Zephyria." It\'s a city suspended in the ether, floating amidst nebulous clouds of cosmic dust. The architecture here is surreal and alien, with buildings that twist and spiral like strands of DNA, reaching towards the cosmos. Streets are paved with luminescent cobblestones, casting soft hues of blues and purples, reflecting off iridescent structures. Strange vegetation, vibrant and bioluminescent, thrives everywhere, creating a stark contrast against the deep indigo sky.\n\nNow, immerse me in this ethereal journey through Zephyria.<|im_end|>\n<|im_start|>assistant\n',
 '<|im_start|>user\nHow many colors are traditionally recognized in a visible spectrum or optical rainbow?<|im_end|>\n<|im_start|>assistant\n']

In [57]:
train_dataset['rejected'][0]

'<|im_start|>assistant\n[{\'content\': \'The setting is an otherworldly, yet eerily familiar, metropolis known as "Zephyria." It\\\'s a city suspended in the ether, floating amidst nebulous clouds of cosmic dust. The architecture here is surreal and alien, with buildings that twist and spiral like strands of DNA, reaching towards the cosmos. Streets are paved with luminescent cobblestones, casting soft hues of blues and purples, reflecting off iridescent structures. Strange vegetation, vibrant and bioluminescent, thrives everywhere, creating a stark contrast against the deep indigo sky.\\n\\nNow, immerse me in this ethereal journey through Zephyria.\', \'role\': \'user\'}, {\'content\': "As you step onto the teleportation platform, there\'s a momentary sense of disorientation before your surroundings change abruptly. You find yourself standing on the outskirts of Zephyria, gazing at the sprawling metropolis that glows softly under the starlit canvas above. A gentle breeze, carrying hin

In [61]:
train_dataset['chosen'][0]

'<|im_start|>assistant\n[{\'content\': \'The setting is an otherworldly, yet eerily familiar, metropolis known as "Zephyria." It\\\'s a city suspended in the ether, floating amidst nebulous clouds of cosmic dust. The architecture here is surreal and alien, with buildings that twist and spiral like strands of DNA, reaching towards the cosmos. Streets are paved with luminescent cobblestones, casting soft hues of blues and purples, reflecting off iridescent structures. Strange vegetation, vibrant and bioluminescent, thrives everywhere, creating a stark contrast against the deep indigo sky.\\n\\nNow, immerse me in this ethereal journey through Zephyria.\', \'role\': \'user\'}, {\'content\': "As you step onto the teleportation platform, there\'s a momentary sense of disorientation before your surroundings change abruptly. You find yourself standing on the outskirts of Zephyria, gazing at the sprawling metropolis that glows softly under the starlit canvas above. A gentle breeze, carrying hin

In [32]:
train_dataset['chosen'][0]

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 23 Jan 2025\n\n<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n[{\'content\': \'The setting is an otherworldly, yet eerily familiar, metropolis known as "Zephyria." It\\\'s a city suspended in the ether, floating amidst nebulous clouds of cosmic dust. The architecture here is surreal and alien, with buildings that twist and spiral like strands of DNA, reaching towards the cosmos. Streets are paved with luminescent cobblestones, casting soft hues of blues and purples, reflecting off iridescent structures. Strange vegetation, vibrant and bioluminescent, thrives everywhere, creating a stark contrast against the deep indigo sky.\\n\\nNow, immerse me in this ethereal journey through Zephyria.\', \'role\': \'user\'}, {\'content\': "As you step onto the teleportation platform, there\'s a momentary sense of disorientation before your surroundings change abruptly. You fi


In [6]:
print(tokenizer.chat_template)


# Testing with data end

# Data Preparation

In [4]:
from datasets import load_dataset

ds = load_dataset("HumanLLMs/Human-Like-DPO-Dataset", split = 'train[:10]')
ds



Dataset({
    features: ['prompt', 'chosen', 'rejected'],
    num_rows: 10
})

In [76]:
ds['prompt'][0]

'Oh, I just saw the best meme - have you seen it?'

In [75]:
ds['chosen'][0]

"😂 Ah, no I haven't! I'm dying to know, what's the meme about? Is it a funny cat or a ridiculous situation? Spill the beans! 🤣"

In [6]:
from transformers import AutoTokenizer
from datasets import Dataset

# Define the chat template
#chat_template = "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>' + '\\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n' }}{% endif %}"

def process_dataset(example):
    # NOTE: `tokenizer` must already exist; it is created in the "Define the model" cell.
    # Format instruction using the chat template
    message = {"role": "user", "content": str(example['prompt'])}
    prompt = tokenizer.apply_chat_template([message], tokenize=False, add_generation_prompt=False)

    # Format chosen answer using the chat template
    # (the columns of this dataset are plain strings, so `str(...)` leaves them unchanged)
    chosen_message = {"role": "assistant", "content": str(example['chosen'])}
    chosen = tokenizer.apply_chat_template([chosen_message], tokenize=False, add_generation_prompt=False)

    # Format rejected answer using the chat template
    rejected_message = {"role": "assistant", "content": str(example['rejected'])}
    rejected = tokenizer.apply_chat_template([rejected_message], tokenize=False, add_generation_prompt=False)

    return {
        "prompt": prompt,
        "chosen": chosen,
        "rejected": rejected,
    }


# Correct the tokenizer settings
#tokenizer.pad_token = tokenizer.eos_token # Set pad_token to eos_token as suggested
#tokenizer.padding_side = "left" # Set padding side to left

# Set eos and bos token correctly
#tokenizer.bos_token = "<|begin_of_text|>"
#tokenizer.eos_token = "<|end_of_text|>"

# No need to add special tokens because the tokenizer already has them

#tokenizer.chat_template = chat_template
#tokenizer.add_eos_token = True

# save columns
#original_columns = dataset.column_names

#format dataset
train_dataset = ds.map(
    function= process_dataset,
    #remove_columns=original_columns
)



In [7]:
train_dataset

Dataset({
    features: ['prompt', 'chosen', 'rejected'],
    num_rows: 10
})

In [87]:
train_dataset['chosen'][0]

"<|im_start|>assistant\n😂 Ah, no I haven't! I'm dying to know, what's the meme about? Is it a funny cat or a ridiculous situation? Spill the beans! 🤣<|im_end|>\n"

In [88]:
train_dataset['prompt'][0]

'<|im_start|>user\nOh, I just saw the best meme - have you seen it?<|im_end|>\n'

In [89]:
train_dataset['rejected'][0]

"<|im_start|>assistant\nI'm an artificial intelligence language model, I don't have personal experiences or opinions. However, I can provide you with information on highly-rated and critically acclaimed films, as well as recommendations based on specific genres or themes. Would you like me to suggest some notable movies or discuss a particular genre of interest?<|im_end|>\n"

In [71]:
tokenizer.chat_template

"{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"

In [70]:
tokenizer

PreTrainedTokenizerFast(name_or_path='meta-llama/Llama-3.2-1B-Instruct', vocab_size=128000, model_max_length=131072, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|im_start|>', 'eos_token': '<|im_end|>', 'pad_token': '<|im_end|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>']}, clean_up_tokenization_spaces=True, added_tokens_decoder={
	128000: AddedToken("<|begin_of_text|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	128001: AddedToken("<|end_of_text|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	128002: AddedToken("<|reserved_special_token_0|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	128003: AddedToken("<|reserved_special_token_1|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	128004: AddedToken("<|finetune_right_pad_id|>", rstrip=False, lstrip=False, single_word=False, nor
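
The `tokenizer.chat_template` string printed earlier in this section is a Jinja template in the ChatML layout. To make its behavior easier to follow, here is a plain-Python sketch of what that template renders. `render_chatml` is a hypothetical stand-in written for illustration; it is not a `transformers` function, and in the notebook the rendering is done by `tokenizer.apply_chat_template`.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Plain-Python equivalent of the ChatML template shown above (illustrative only)."""
    text = ""
    for message in messages:
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n
        text += "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>" + "\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        text += "<|im_start|>assistant\n"
    return text


user_turn = [{"role": "user", "content": "Oh, I just saw the best meme - have you seen it?"}]

without_prompt = render_chatml(user_turn)
with_prompt = render_chatml(user_turn, add_generation_prompt=True)

print(repr(without_prompt))
print(repr(with_prompt))
```

The first string matches the formatted `prompt` shown in this section. The second adds the open `assistant` header, which is what `add_generation_prompt=True` produces.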

## Define the model

In [5]:
# NOTE: this run loads Llama-3.2-1B-Instruct, while the surrounding text refers to SmolLM2-135M
model_name = "meta-llama/Llama-3.2-1B-Instruct"

device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available() else "cpu"
)

# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=model_name,
    torch_dtype=torch.float32,
).to(device)
model.config.use_cache = False
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Set the chat_template attribute to None before calling setup_chat_format
tokenizer.chat_template = None
model, tokenizer = setup_chat_format(model, tokenizer)

# Name under which the fine-tuned model is saved and/or uploaded
finetune_name = "SmolLM2-FT-ORPO"
finetune_tags = ["smol-course", "module_1"]

The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`



## Train model with ORPO

In [13]:
orpo_args = ORPOConfig(
    # Small learning rate to prevent catastrophic forgetting
    learning_rate=8e-6,
    # Linear learning rate decay over training
    lr_scheduler_type="linear",
    # Maximum combined length of prompt + completion
    max_length=1024,
    # Maximum length for input prompts
    max_prompt_length=512,
    # Controls weight of the odds ratio loss (λ in paper)
    beta=0.1,

    # Batch size for training
    per_device_train_batch_size=1,
    #per_device_eval_batch_size=2,
    # Helps with training stability by accumulating gradients before updating
    gradient_accumulation_steps=20,
    # Memory-efficient optimizer for CUDA, falls back to adamw_torch for CPU/MPS
    #optim="paged_adamw_8bit" if device == "cuda" else "adamw_torch",
    # Number of training epochs
    num_train_epochs=1,
    # fp16 mixed-precision training (assumes a CUDA GPU is available)
    fp16=True,
    # When to run evaluation
    #evaluation_strategy="steps",
    # Evaluate every 20% of training
    #eval_steps=0.2,
    # Log metrics every step
    logging_steps=1,
    # Gradual learning rate warmup
    warmup_steps=10,
    # Disable external logging
    report_to="none",
    # Where to save model/checkpoints
    output_dir="./results/",
    #orpo_alpha=1.0,  # Controls strength of preference optimization
    #orpo_beta=0.1,   # Temperature parameter for odds ratio
    # Enable MPS (Metal Performance Shaders) if available
    use_mps_device=device == "mps",
    hub_model_id=finetune_name,
)
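
To make the role of `beta` more concrete, the sketch below computes the ORPO objective for a single preference pair in plain Python: the usual negative log-likelihood on the chosen answer plus `beta` times an odds-ratio penalty. It assumes each answer is summarized by its average per-token log-probability. The function names and input values are illustrative assumptions, not TRL internals.

```python
import math


def log_odds(avg_logp):
    """Return log(p / (1 - p)) for p = exp(avg_logp), where avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(nll_chosen, avg_logp_chosen, avg_logp_rejected, beta=0.1):
    """Sketch of the ORPO objective for one pair: NLL + beta * odds-ratio term."""
    log_odds_ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return nll_chosen + beta * odds_ratio_term


# Made-up numbers: the chosen answer is more likely than the rejected one
preferred = orpo_loss(nll_chosen=1.0, avg_logp_chosen=-0.2, avg_logp_rejected=-1.5)
# Same NLL, but the model favors the rejected answer instead
dispreferred = orpo_loss(nll_chosen=1.0, avg_logp_chosen=-1.5, avg_logp_rejected=-0.2)

print(preferred, dispreferred)
```

The penalty shrinks as the model assigns higher odds to the chosen answer than to the rejected one, and a larger `beta` makes that preference signal weigh more heavily against the plain language-modeling term.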

In [14]:
trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=train_dataset,
    #eval_dataset=dataset["test"],
    processing_class=tokenizer,
)



In [None]:
import torch
torch.cuda.empty_cache()  # Clear GPU cache
trainer.train()

# Save the model
trainer.save_model(f"./{finetune_name}")



| Step | Training Loss |
|------|---------------|
| 1    | 1.0232        |
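
Only one step is logged because the 10 training examples, with `per_device_train_batch_size=1` and `gradient_accumulation_steps=20`, fit inside a single optimizer update. The arithmetic can be sketched as follows. Both helpers are hypothetical: the rounding rule is an assumption (it can differ between library versions), and the warmup formula is the common linear warmup and decay rule, not code taken from the trainer.

```python
import math


def optimizer_steps(num_examples, per_device_batch_size, grad_accum_steps, epochs=1, num_devices=1):
    """Estimate optimizer updates; ceiling rounding is an assumption."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    steps_per_epoch = max(math.ceil(batches_per_epoch / grad_accum_steps), 1)
    return steps_per_epoch * epochs


def linear_schedule_factor(step, warmup_steps, total_steps):
    """Multiplier on the base learning rate under linear warmup then linear decay."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


total = optimizer_steps(num_examples=10, per_device_batch_size=1, grad_accum_steps=20)
first_step_factor = linear_schedule_factor(step=0, warmup_steps=10, total_steps=total)

print("optimizer steps:", total)
print("learning-rate factor at the first step:", first_step_factor)
```

Under these assumptions the whole run sits inside the 10-step warmup window, so a full-size run would need either more data or a smaller `warmup_steps` for the learning rate to reach its configured value.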


## 💐 You're done!

This notebook provided a step-by-step guide to fine-tuning the `HuggingFaceTB/SmolLM2-135M` model using the `ORPOTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out:

- Try this notebook on a harder difficulty
- Review a colleague's PR
- Improve the course material via an Issue or PR.