<a href="https://colab.research.google.com/github/lrobsky/oss-gpt-20b-leetcode-finetune/blob/main/gpt_oss_(20B)_Fine_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Fine-tuning gpt-oss-20b with [Unsloth](https://unsloth.ai/) on the [newfacade/LeetCodeDataset](https://huggingface.co/datasets/newfacade/LeetCodeDataset) dataset


Install dependencies


In [None]:
!pip install --upgrade -qqq uv
try: import numpy; get_numpy = f"numpy=={numpy.__version__}"  # keep the preinstalled numpy version
except ImportError: get_numpy = "numpy"
!uv pip install -qqq \
    "torch>=2.8.0" "triton>=3.4.0" {get_numpy} torchvision bitsandbytes "transformers>=4.55.3" \
    "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo" \
    "unsloth[base] @ git+https://github.com/unslothai/unsloth" \
    git+https://github.com/triton-lang/triton.git@05b2c186c1b6c9a08375389d5efe9cb4c401c075#subdirectory=python/triton_kernels
!uv pip install transformers==4.55.4


from unsloth import FastLanguageModel

from unsloth.chat_templates import standardize_sharegpt
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset
from google.colab import userdata # reads the HF_TOKEN Colab secret used to push to the Hub
import torch

Using Python 3.12.11 environment at: /usr
Audited 1 package in 365ms
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
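The install cell relies on minimum versions (`torch>=2.8.0`, `triton>=3.4.0`, `transformers>=4.55.3`) and then pins `transformers==4.55.4`. To confirm what actually ended up in the runtime, one option is to read the installed versions with the standard-library `importlib.metadata`. The sketch below is a hypothetical helper, not part of Unsloth, and its version comparison is deliberately simplified (numeric release parts only); `packaging.version` is the safer choice for anything beyond a quick sanity check.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the install cell; this helper is illustrative, not part of Unsloth.
MINIMUMS = {"torch": "2.8.0", "triton": "3.4.0", "transformers": "4.55.3"}


def release_tuple(v: str) -> tuple[int, ...]:
    """'2.8.0+cu126' -> (2, 8, 0): drop the local '+...' tag, keep leading numeric parts."""
    parts = []
    for piece in v.split("+", 1)[0].split("."):
        if not piece.isdigit():
            break  # simplification: stop at pre-release/dev segments such as 'dev1' or '0rc1'
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.8' compares equal to '2.8.0'.
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_environment(minimums: dict[str, str]) -> dict[str, str]:
    report = {}
    for pkg, minimum in minimums.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            report[pkg] = "missing"
            continue
        ok = meets_minimum(installed, minimum)
        report[pkg] = f"{installed} ({'ok' if ok else 'older than ' + minimum})"
    return report


for pkg, status in check_environment(MINIMUMS).items():
    print(f"{pkg}: {status}")
```

Run it in a fresh cell after the installs; a `missing` or `older than` entry usually means the runtime needs a restart so the new packages are picked up.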


Load the base model in 4-bit and attach LoRA adapters

In [None]:
max_seq_length = 1024  # maximum number of tokens per training example
dtype = None  # None lets Unsloth choose the dtype automatically

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gpt-oss-20b",
    dtype = dtype,
    max_seq_length = max_seq_length,
    load_in_4bit = True,  # 4-bit quantization so the 20B model fits on a Colab GPU
    full_finetuning = False)


# Add LoRA adapter layers (only these small matrices are trained)
model = FastLanguageModel.get_peft_model(
    model,
    r = 8,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None)

==((====))==  Unsloth 2025.8.10: Fast Gpt_Oss patching. Transformers: 4.55.4.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using float16 precision for gpt_oss won't work! Using float32.
Unsloth: Gpt_Oss does not support SDPA - switching to fast eager.



Unsloth: Making `model.base_model.model.model` require gradients
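With `r = 8`, each adapted linear layer of shape `d_in × d_out` gains two small matrices, `A` (`r × d_in`) and `B` (`d_out × r`), i.e. `r · (d_in + d_out)` trainable parameters, and with `use_rslora = False` the update is scaled by `lora_alpha / r = 16 / 8 = 2`. The arithmetic below reproduces the trainer banner's count of 3,981,312 trainable parameters, assuming gpt-oss-20b attention shapes of hidden size 2880, 24 layers, and 64 query heads plus 8 key/value heads of size 64. The hidden size of 2880 also appears in the GGUF log; the other dimensions are assumptions about the model config. Under these assumptions the count matches with the four attention projections alone, which suggests the MoE expert weights (`gate_proj`, `up_proj`, `down_proj`) were not wrapped with LoRA in this run.

```python
# Assumed gpt-oss-20b dimensions (hidden size 2880 also appears in the GGUF log).
HIDDEN = 2880
LAYERS = 24
Q_HEADS, KV_HEADS, HEAD_DIM = 64, 8, 64

R = 8            # LoRA rank passed to get_peft_model
LORA_ALPHA = 16  # lora_alpha passed to get_peft_model


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters of one LoRA adapter: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


# (d_in, d_out) of each attention projection, assuming grouped-query attention.
attention_shapes = {
    "q_proj": (HIDDEN, Q_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "o_proj": (Q_HEADS * HEAD_DIM, HIDDEN),
}

per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in attention_shapes.values())
total = per_layer * LAYERS
scaling = LORA_ALPHA / R  # use_rslora=False -> alpha / r (rsLoRA would use alpha / sqrt(r))

print(f"per layer: {per_layer:,}")  # per layer: 165,888
print(f"total:     {total:,}")      # total:     3,981,312
print(f"scaling:   {scaling}")      # scaling:   2.0
```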


<a name="Data"></a>
### **Data Prep**

### Format dataset queries and responses into chat prompts for fine-tuning

In [None]:
def formatting_prompts_func(examples):
    # Construct the messages list from 'query' and 'response' columns
    convos = []
    for i in range(len(examples["query"])):
        messages = [
            {"role": "user", "content": examples["query"][i]},
            {"role": "assistant", "content": examples["response"][i]}]

        convos.append(messages)

    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return {"text" : texts}



dataset = load_dataset("newfacade/LeetCodeDataset", split="train")

# standardize_sharegpt normalizes ShareGPT-style conversation columns;
# formatting_prompts_func then renders each query/response pair with the gpt-oss chat template
dataset = standardize_sharegpt(dataset)
dataset = dataset.map(formatting_prompts_func, batched = True)

# Print one formatted training example
print(dataset[0]['text'])

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-31

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>user<|message|>You are an expert Python programmer. You will be given a question (problem specification) and will generate a correct Python program that matches the specification and passes all tests.

### Question:
Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target.
You may assume that each input would have exactly one solution, and you may not use the same element twice.
You can return the answer in any order.
 
Example 1:

Input: nums = [2,7,11,15], target = 9
Output: [0,1]
Explanation: Because nums[0] + nums[1] == 9, we return [0, 1].

Example 2:

Input: nums = [3,2,4], target = 6
Output:
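With `batched = True`, `dataset.map` passes `formatting_prompts_func` a dict of columns (each a list), not a single row, and expects a dict of equal-length lists back. The sketch below shows that contract in plain Python with a `zip`-based loop. `render_chat` is a hypothetical stand-in for `tokenizer.apply_chat_template(..., tokenize = False)`, and its tags only loosely imitate the gpt-oss format printed above (the real template also adds the system preamble and channel markers), so treat it purely as an illustration of the data flow. The sketch uses a different function name so it does not overwrite the real `formatting_prompts_func` if run in the same notebook.

```python
def render_chat(messages: list[dict[str, str]]) -> str:
    """Hypothetical stand-in for tokenizer.apply_chat_template(..., tokenize=False).

    Only loosely mimics the gpt-oss tags; the real template adds a system
    preamble, channels, and different end-of-turn tokens.
    """
    return "".join(f"<|start|>{m['role']}<|message|>{m['content']}<|end|>" for m in messages)


def formatting_prompts_sketch(examples: dict[str, list[str]]) -> dict[str, list[str]]:
    # In batched mode each column arrives as a list; all lists have the same length.
    texts = []
    for query, response in zip(examples["query"], examples["response"]):
        convo = [
            {"role": "user", "content": query},
            {"role": "assistant", "content": response},
        ]
        texts.append(render_chat(convo))
    return {"text": texts}


# A batch of two made-up rows, shaped the way Dataset.map(batched=True) passes them.
batch = {"query": ["Add 1 and 2.", "Reverse 'ab'."], "response": ["3", "'ba'"]}
out = formatting_prompts_sketch(batch)
print(len(out["text"]))  # 2
print(out["text"][0])    # <|start|>user<|message|>Add 1 and 2.<|end|><|start|>assistant<|message|>3<|end|>
```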

<a name="Train"></a>
### **Training**


### Set up trainer with model, tokenizer, dataset, and training configuration

In [None]:
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = SFTConfig(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 1, # ignored while max_steps > 0
        max_steps = 200, # max_steps > 0 overrides num_train_epochs; set max_steps = -1 to train by epochs
        learning_rate = 1e-4,
        logging_steps = 1,
        max_grad_norm = 1.0, # clip gradients
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # set to e.g. "wandb" to enable experiment tracking
    ),
)

Unsloth: Switching to float32 training since model cannot work with float16
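With `lr_scheduler_type = "linear"`, `warmup_steps = 5` and `max_steps = 200`, the learning rate ramps from 0 to `1e-4` over the first 5 optimizer steps and then decays linearly back to 0 at step 200. The function below is a standalone sketch of that schedule, written to follow the formula used by `transformers.get_linear_schedule_with_warmup`; it is not imported from the library and only serves to make the shape of the schedule concrete.

```python
PEAK_LR = 1e-4  # learning_rate
WARMUP = 5      # warmup_steps
TOTAL = 200     # max_steps


def linear_lr(step: int, peak: float = PEAK_LR, warmup: int = WARMUP, total: int = TOTAL) -> float:
    """Learning rate after `step` scheduler steps: linear warmup, then linear decay to 0."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


for s in (0, 1, 5, 100, 199, 200):
    print(s, f"{linear_lr(s):.3e}")
```

The short warmup keeps the first few updates small while the freshly initialized adapters produce large losses (the first logged steps are above 5), and the linear decay reaches 0 exactly at the final step.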


### Begin training


In [None]:
trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 2,641 | Num Epochs = 1 | Total steps = 200
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4
 "-____-"     Trainable parameters = 3,981,312 of 20,918,738,496 (0.02% trained)


Unsloth: Will smartly offload gradients to save VRAM!


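| Step | Training Loss | Entropy |
|---:|---:|:---|
| 1 | 5.2447 | 0 |
| 2 | 6.1938 | No Log |
| 3 | 5.3277 | No Log |
| 4 | 5.6946 | No Log |
| 5 | 5.3447 | No Log |
| 6 | 4.0401 | No Log |
| 7 | 3.2906 | No Log |
| 8 | 3.1117 | No Log |
| 9 | 2.5492 | No Log |
| 10 | 2.2613 | No Log |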


TrainOutput(global_step=200, training_loss=0.7081063675880432, metrics={'train_runtime': 5607.8948, 'train_samples_per_second': 0.143, 'train_steps_per_second': 0.036, 'total_flos': 8.496162193888051e+16, 'train_loss': 0.7081063675880432, 'epoch': 0.3029155622870125})
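The figures in the training banner and in `TrainOutput` follow directly from the config: each optimizer step consumes `per_device_train_batch_size × gradient_accumulation_steps × GPUs = 1 × 4 × 1 = 4` examples, so 200 steps see 800 examples, roughly 30% of the 2,641 training rows. The arithmetic below, using only values copied from the config and the logs, recomputes the reported `epoch` and throughput numbers.

```python
# Values copied from the training config and the logs above.
per_device_batch = 1
grad_accum = 4
num_gpus = 1
max_steps = 200
num_examples = 2_641
train_runtime_s = 5607.8948

effective_batch = per_device_batch * grad_accum * num_gpus
examples_seen = effective_batch * max_steps
epoch_fraction = examples_seen / num_examples

print(effective_batch)                            # 4
print(examples_seen)                              # 800
print(round(epoch_fraction, 4))                   # 0.3029
print(round(examples_seen / train_runtime_s, 3))  # 0.143 samples per second
print(round(max_steps / train_runtime_s, 3))      # 0.036 steps per second
```

So this run covers less than a third of the dataset; raising `max_steps` (or setting `max_steps = -1` and `num_train_epochs = 1`) would be needed to see every example once.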

<a name="Save"></a>
### Save the fine-tuned model locally and to Hugging Face



In [None]:


# save locally
model.save_pretrained("finetuned_model")
tokenizer.save_pretrained("finetuned_model")

# push model to HF
hf_name = "lrobsky/gpt-oss-20b-finetuned-leetcode"

model.push_to_hub(hf_name, token = userdata.get('HF_TOKEN')) # Save to HF - ONLY adapter weights
# model.push_to_hub(hf_name,save_method = "merged_16bit", token = userdata.get('HF_TOKEN')) # Save to HF - MERGED (16bit) with base model


# push tokenizer to HF
tokenizer.push_to_hub(hf_name,token = userdata.get('HF_TOKEN'))
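As the comment on `push_to_hub` notes, this save contains only the adapter weights, so `finetuned_model` holds PEFT adapter files (such as `adapter_config.json`) and tokenizer files, but no `config.json`; the GGUF export attempted in this notebook fails on exactly that missing `config.json`. Before handing a directory to a converter, a small check like the hypothetical `checkpoint_kind` helper below can tell the two layouts apart. It only looks for the standard Transformers/PEFT file names, which is an assumption about how the directory was written.

```python
from pathlib import Path


def checkpoint_kind(path: str) -> str:
    """Classify a saved directory as 'full', 'adapter', or 'unknown'.

    Hypothetical helper: it only checks for the standard file names
    (config.json for a full Transformers model, adapter_config.json for PEFT/LoRA).
    """
    d = Path(path)
    if (d / "config.json").is_file():
        return "full"
    if (d / "adapter_config.json").is_file():
        return "adapter"
    return "unknown"


for folder in ("finetuned_model", "finetuned_model_merged"):
    print(folder, "->", checkpoint_kind(folder))
```

A directory reported as `adapter` has to be merged with the base model (or loaded on top of it) before a full-model converter can use it.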

In [None]:
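# Attempted GGUF export: merge the LoRA weights, save the merged model, then convert.
# This currently fails (see the error output below).
model = FastLanguageModel.for_inference(model)  # switch to inference mode; this does NOT merge the LoRA weights

merged_model = model.merge_and_unload()  # fold the LoRA updates into the base weights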
merged_model.save_pretrained("finetuned_model_merged")
tokenizer.save_pretrained("finetuned_model_merged")


# model.save_pretrained(
#     "finetuned_model_merged",
#     tokenizer,
#     save_method="merged_16bit")



# Convert to GGUF
model.save_pretrained_gguf("finetuned_model_merged", tokenizer, quantization_method="f16")
model.push_to_hub_gguf("lrobsky/gpt-oss-20b-finetuned-leetcode-GGUF", tokenizer, quantization_method="f16", token=userdata.get('HF_TOKEN'))

Unsloth: Updating system package directories
Unsloth: Install GGUF and other packages
Unsloth GGUF:hf-to-gguf:Loading model: finetuned_model_merged
Unsloth GGUF:hf-to-gguf:Model architecture: GptOssForCausalLM
Unsloth GGUF:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
Unsloth GGUF:hf-to-gguf:Exporting model...
Unsloth GGUF:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
Unsloth GGUF:hf-to-gguf:gguf: loading model part 'model-00001-of-00003.safetensors'
Unsloth GGUF:hf-to-gguf:gguf: loading model part 'model-00002-of-00003.safetensors'
Unsloth GGUF:hf-to-gguf:gguf: loading model part 'model-00003-of-00003.safetensors'
Unsloth GGUF:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
Unsloth GGUF:hf-to-gguf:gguf: loading model part 'model-00001-of-00003.safetensors'
Unsloth GGUF:hf-to-gguf:token_embd.weight,                 torch.float16 --> F16, shape = {2880, 201088}
Traceback (most recent call last):
  File "/con

RuntimeError: Unsloth: Failed to convert llama.cpp/unsloth_convert_hf_to_gguf.py to GGUF.
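`merge_and_unload()` folds each adapter back into its base weight, `W' = W + (lora_alpha / r) · B A`, after which the separate adapter matrices are no longer needed. The pure-Python sketch below applies that update to a tiny made-up matrix using this notebook's scaling of `16 / 8 = 2`; real merges operate on the full attention matrices, and with a 4-bit base the weights have to be dequantized for the addition.

```python
Matrix = list[list[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain nested-list matrix product x @ y."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float, r: int) -> Matrix:
    """Return W + (alpha / r) * B @ A.

    Shapes: W is d_out x d_in, A is r x d_in, B is d_out x r.
    """
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example with made-up values: d_out = 2, d_in = 3, rank 1.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]  # r x d_in
B = [[0.5],
     [-1.0]]           # d_out x r

merged = merge_lora(W, A, B, alpha=16, r=8)
print(merged)  # [[2.0, 2.0, 3.0], [-2.0, -3.0, -6.0]]
```

Because the merged matrix has the same shape as `W`, the result is a plain model with no extra adapter layers, which is the form a GGUF converter expects.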

In [None]:
# TEST: save as GGUF directly from the adapter-only `finetuned_model` directory
model.save_pretrained_gguf("finetuned_model", tokenizer, quantization_method = "f16")
model.push_to_hub_gguf("lrobsky/gpt-oss-20b-finetuned-leetcode-GGUF", tokenizer, quantization_method = "f16", token = userdata.get('HF_TOKEN'))

Unsloth: Updating system package directories
Unsloth: Install GGUF and other packages


RuntimeError: Unsloth: `config.json` does not exist inside `finetuned_model`.

In [None]:
# # Save the model
# model.save_pretrained("finetuned_model")
# tokenizer.save_pretrained("finetuned_model")

('finetuned_model/tokenizer_config.json',
 'finetuned_model/special_tokens_map.json',
 'finetuned_model/chat_template.jinja',
 'finetuned_model/tokenizer.json')

In [None]:
# #TEST : save as GGUF
# model.save_pretrained_gguf("finetuned_model", tokenizer, quantization_method = "f16")
# model.push_to_hub_gguf("lrobsky/gpt-oss-20b-finetuned-leetcode-GGUF", tokenizer, quantization_method = "f16", token = userdata.get('HF_TOKEN'))

Unsloth: Updating system package directories
Unsloth: Install GGUF and other packages


RuntimeError: Unsloth: `config.json` does not exist inside `finetuned_model`.