# Fine-tune `GPT-OSS 20B` on `Unsloth` using Supervised Fine-Tuning (SFT)
---

We will use Unsloth to fine-tune the gpt-oss model. Unsloth is an open-source library that accelerates and optimizes the fine-tuning of large language models by rewriting compute-heavy operations as custom Triton kernels. It combines these kernels with techniques such as QLoRA, FlashAttention, and 4-bit quantization to reduce memory usage and increase training speed by up to 2x.
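
To give a rough sense of why 4-bit quantization matters for memory, the sketch below estimates the space taken by the model weights alone at different precisions. It assumes a nominal 20 billion parameters and ignores activations, optimizer state, LoRA adapters, and quantization metadata, so real usage will differ. The helper name `weight_memory_gib` is hypothetical and not part of Unsloth.

```python
# Back-of-the-envelope estimate of weight memory at different precisions.
# Assumption: a nominal 20e9 parameters; all overheads are ignored.
NUM_PARAMS = 20e9


def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Return the memory needed to store the weights, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 2**30


for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Going from 16-bit to 4-bit weights cuts this part of the footprint by a factor of four, which is what makes a 20B model plausible on a single modest GPU.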

In [1]:
import torch
import numpy
import logging
# Import FastLanguageModel from Unsloth.
# FastLanguageModel is a core component of the Unsloth library,
# designed to facilitate fast, memory-efficient fine-tuning and inference of LLMs.
# It aims to reduce VRAM consumption during LLM operations, enabling the fine-tuning
# of larger models on modest hardware.
# It is also designed to be fully compatible with HF libraries such as transformers, PEFT, and TRL.
from unsloth import FastLanguageModel

NotImplementedError: Unsloth currently only works on NVIDIA GPUs and Intel GPUs.
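
The error above is raised when Unsloth is imported on a machine without a supported GPU. A minimal pre-check can be sketched as follows, assuming PyTorch's `torch.cuda.is_available()` and, on builds that ship it, `torch.xpu.is_available()`. The function `describe_accelerator` is a hypothetical helper for illustration and is not how Unsloth itself detects devices.

```python
def describe_accelerator() -> str:
    """Return a short description of the accelerator PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"

    # CUDA-capable GPU visible to PyTorch.
    if torch.cuda.is_available():
        return f"CUDA GPU: {torch.cuda.get_device_name(0)}"

    # Intel GPUs are exposed through torch.xpu on builds that include it.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "Intel XPU device available"

    return "no supported GPU detected"


print(describe_accelerator())
```

If this reports that no supported GPU was detected, the `from unsloth import FastLanguageModel` line will most likely fail in the same way as shown above.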

In [None]:
# Create a logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Remove existing handlers
logger.handlers.clear()

# Add a simple handler
handler = logging.StreamHandler()
formatter = logging.Formatter('[%(asctime)s] p%(process)s {%(filename)s:%(lineno)d} %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
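
To see what a line produced by this format string looks like, the sketch below renders one record into an in-memory buffer. It uses a separate logger (the name `format_demo` is hypothetical) so that the root logger configured above is left untouched.

```python
import io
import logging

# Same format string as the handler configured in the notebook cell.
LOG_FORMAT = "[%(asctime)s] p%(process)s {%(filename)s:%(lineno)d} %(levelname)s - %(message)s"


def render_sample(message: str) -> str:
    """Log one INFO record with LOG_FORMAT and return the rendered line."""
    buffer = io.StringIO()
    demo_logger = logging.getLogger("format_demo")  # isolated from the root logger
    demo_logger.setLevel(logging.INFO)
    demo_logger.propagate = False  # keep the sample out of the root handlers
    demo_logger.handlers.clear()

    handler = logging.StreamHandler(buffer)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    demo_logger.addHandler(handler)

    demo_logger.info(message)
    return buffer.getvalue().strip()


sample_line = render_sample("hello")
print(sample_line)
```

Each line carries the timestamp, process ID, source file and line number, and log level, followed by the message.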

In [None]:
# Pre-quantized models that Unsloth supports, for 4x faster downloading and no OOMs.
# All of these models are available on the Hugging Face Hub.
fourbit_modes = [
    "unsloth/gpt-oss-20b-unsloth-bnb-4bit", # 20B model using bitsandbytes 4bit quantization
    "unsloth/gpt-oss-120b-unsloth-bnb-4bit",
    "unsloth/gpt-oss-20b", # 20B model using MXFP4 format
    "unsloth/gpt-oss-120b",
]

# Now, we will load the pre-trained model. A pre-trained model has billions of parameters and has
# already been trained on a large corpus of text data, from which it has learned to generate
# human-like text. We will use this pre-trained model as the starting point for our fine-tuning process.
hf_model_name = "openai/gpt-oss-20b"
logger.info(f"Going to fine-tune the model {hf_model_name} using Unsloth")

In [None]:
logger.info(f"Going to load the model {hf_model_name} using unsloth fast language model")
# load the pre-trained model using unsloth fast language model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=hf_model_name,
    # Precision of the model weights and computations; None lets Unsloth pick one automatically
    dtype=None,
    # Maximum context length the model will process in one forward pass (number of tokens)
    max_seq_length=4096,
    load_in_4bit=False,  # set to True to load the model with 4-bit quantization
    full_finetuning=False,
    # token = ...
)
logger.info(f"Loaded the model {model} using unsloth fast language model")

# Reasoning effort for GPT OSS models
"""
The gpt-oss models are designed to handle complex, multi-step reasoning tasks at varying levels of effort,
which lets users adjust how long the model spends reasoning based on the complexity of the task at hand.

The gpt-oss models offer three distinct levels of reasoning effort you can choose from:

- Low: Optimized for tasks that need very fast responses and don't require complex, multi-step reasoning.
- Medium: A balance between performance and speed.
- High: Provides the strongest reasoning performance for tasks that require it, though this results in higher latency.
"""
from transformers import TextStreamer

# define the messages to experiment with
