<a href="https://colab.research.google.com/github/khalilhimura/oumi-explore/blob/main/notebooks/Oumi%20-%20Finetuning%20Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div class="align-center">
<a href="https://oumi.ai/"><img src="https://oumi.ai/docs/en/latest/_static/logo/header_logo.png" height="200"></a>

[![Documentation](https://img.shields.io/badge/Documentation-latest-blue.svg)](https://oumi.ai/docs/en/latest/index.html)
[![Discord](https://img.shields.io/discord/1286348126797430814?label=Discord)](https://discord.gg/oumi)
[![GitHub Repo stars](https://img.shields.io/github/stars/oumi-ai/oumi)](https://github.com/oumi-ai/oumi)
</div>

👋 Welcome to Open Universal Machine Intelligence (Oumi)!

🚀 Oumi is a fully open-source platform that streamlines the entire lifecycle of foundation models - from [data preparation](https://oumi.ai/docs/en/latest/resources/datasets/datasets.html) and [training](https://oumi.ai/docs/en/latest/user_guides/train/train.html) to [evaluation](https://oumi.ai/docs/en/latest/user_guides/evaluate/evaluate.html) and [deployment](https://oumi.ai/docs/en/latest/user_guides/launch/launch.html). Whether you're developing on a laptop, launching large scale experiments on a cluster, or deploying models in production, Oumi provides the tools and workflows you need.

🤝 Make sure to join our [Discord community](https://discord.gg/oumi) to get help, share your experiences, and contribute to the project! If you are interested in joining one of the community's open-science efforts, check out our [open collaboration](https://oumi.ai/community) page.

⭐ If you like Oumi and you would like to support it, please give it a star on [GitHub](https://github.com/oumi-ai/oumi).

### Khalil's Comment
*   To run on a Colab T4, I used SmolLM2 360M instead
*   Changed `max_steps` to 5
*   As expected, the code generated by the finetuned model is not good
*   Gave it to Gemini to fix

# Finetuning Overview

In this tutorial, we'll LoRA tune a large language model to produce "thoughts" before producing its output.

We'll use the Oumi framework to streamline the process and achieve high-quality results.

We'll cover the following topics:
1. Prerequisites
2. Data Preparation & Sanity Checks
3. Training Config Preparation
4. Launching Training
5. Monitoring Progress
6. Evaluation
7. Analyzing Results
8. Inference


# Prerequisites
## Oumi Installation

First, let's install Oumi. You can find more detailed instructions [here](https://oumi.ai/docs/en/latest/get_started/installation.html).

If you have a GPU, you can run the following commands to install Oumi:

In [1]:
%pip install uv -q
!uv pip install oumi[gpu] vllm --no-progress --system

Using Python 3.11.11 environment at: /usr
Resolved 170 packages in 3.71s
Prepared 76 packages in 8m 41s
Uninstalled 21 packages in 1.26s
Installed 76 packages in 779ms
 + aiofiles==24.1.0
 + aioresponses==0.7.8
 + antlr4-python3-runtime==4.9.3
 + bitsandbytes==0.45.1
 + colorama==0.4.6
 + dataproperty==1.1.0
 + datasets==3.2.0
 + dill==0.3.8
 + evaluate==0.4.3
 + fastapi==0.115.7
 + fschat==0.2.36
 - fsspec==2024.10.0
 + fsspec==2024

## Creating our working directory
For our experiments, we'll use the following folder to save the model, training artifacts, and our working configs.

In [1]:
from pathlib import Path

tutorial_dir = "finetuning_tutorial"

Path(tutorial_dir).mkdir(parents=True, exist_ok=True)

## Set up the environment

We'll need to set the following environment variables:
- [Optional] HF_TOKEN: Your [HuggingFace](https://huggingface.co/docs/hub/en/security-tokens) token, in case you want to access a private model.
- [Optional] WANDB_API_KEY: Your [wandb](https://wandb.ai) token, in case you want to log your experiments to wandb.
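
One way to set these from inside the notebook is sketched below. The helper name `set_optional_env` is hypothetical (it is not part of Oumi); it simply skips a variable when no value is supplied, so both tokens stay optional.

```python
import os
from typing import Optional


def set_optional_env(name: str, value: Optional[str]) -> bool:
    """Set an environment variable only when a non-empty value is given.

    Returns True if the variable was set, False if it was skipped.
    """
    if value:
        os.environ[name] = value
        return True
    return False


# Replace None with your own tokens, or read them interactively
# with getpass.getpass() so they never appear in the notebook.
set_optional_env("HF_TOKEN", None)
set_optional_env("WANDB_API_KEY", None)
```

Run this before any cell that downloads a model or starts training, so the variables are already in place.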

# Getting Started


## Data Preparation
Let's start by loading our dataset and looking at a few examples. The OpenO1-SFT dataset covers a variety of tasks, including code generation and explanation, and most examples contain a "thought" that precedes the output.

In [12]:
from oumi.builders import build_tokenizer
from oumi.core.configs import ModelParams
from oumi.datasets import PromptResponseDataset

# Initialize the dataset
tokenizer = build_tokenizer(
    ModelParams(model_name="HuggingFaceTB/SmolLM2-360M-Instruct")
)
dataset = PromptResponseDataset(
    tokenizer=tokenizer,
    hf_dataset_path="O1-OPEN/OpenO1-SFT",
    prompt_column="instruction",
    response_column="output",
)

# Print a few examples
for i in range(3):
    conversation = dataset.conversation(i)
    print(f"Example {i + 1}:")
    for message in conversation.messages:
        print(f"{message.role}: {message.content[:100]}...")  # Truncate for brevity
    print("\n")


[2025-01-30 15:16:31,667][oumi][rank0][pid:16046][MainThread][INFO]][base_map_dataset.py:68] Creating map dataset (type: PromptResponseDataset) dataset_name: 'O1-OPEN/OpenO1-SFT', dataset_path: 'None'...
[2025-01-30 15:16:32,982][oumi][rank0][pid:16046][MainThread][INFO]][base_map_dataset.py:472] Dataset Info:
	Split: train
	Version: 0.0.0
	Dataset size: 372897013
	Download size: 383545217
	Size: 756442230 bytes
	Rows: 77685
	Columns: ['instruction', 'output']
[2025-01-30 15:16:34,145][oumi][rank0][pid:16046][MainThread][INFO]][base_map_dataset.py:411] Loaded DataFrame with shape: (77685, 2). Columns:
instruction    object
output         object
dtype: object
Example 1:
user: Consider a regular octagon. How many different triangles can be formed if the octagon is placed insi...
assistant: <Thought>
Alright, I need to figure out how many different triangles can be formed in a regular octa...


Example 2:
user: Create a Python class that encodes a given number using the Full Kociołek Encr
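
To inspect the "thought" separately from the rest of a response, a small helper like the one below can be used. This is a sketch under an assumption: the opening `<Thought>` tag appears in the output above, but the closing `</Thought>` tag is assumed and not shown in this notebook. The function name `split_thought` and the sample string are made up for illustration.

```python
import re

# Assumption: thoughts are wrapped as <Thought>...</Thought>.
THOUGHT_RE = re.compile(r"<Thought>(.*?)</Thought>", re.DOTALL)


def split_thought(response: str) -> tuple[str, str]:
    """Return (thought, remainder); thought is "" when no tagged block is found."""
    match = THOUGHT_RE.search(response)
    if match is None:
        return "", response.strip()
    thought = match.group(1).strip()
    # Everything outside the tagged block is treated as the answer.
    remainder = (response[: match.start()] + response[match.end():]).strip()
    return thought, remainder


# Made-up sample, not taken from the dataset.
sample = "<Thought>\nSort the list first.\n</Thought>\nUse sorted(xs)."
thought, answer = split_thought(sample)
print("thought:", thought)
print("answer:", answer)
```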

## Model Preparation

For code generation, we want a model with strong general language understanding and coding capabilities.

We also want a model that is small enough to train and run on a single GPU.

Some good options include:
- ["microsoft/Phi-3-mini-128k-instruct"](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)
- ["google/gemma-2b"](https://huggingface.co/google/gemma-2b)
- ["Qwen/Qwen2-1.5B-Instruct"](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct)
- ["meta-llama/Llama-3.2-3B-Instruct"](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
- ["HuggingFaceTB/SmolLM2-1.7B-Instruct"](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct)


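For this tutorial, we'll use "HuggingFaceTB/SmolLM2-360M-Instruct" as our base model. It is a smaller sibling of SmolLM2-1.7B-Instruct, chosen so that the notebook runs on a Colab T4 GPU.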
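
As a rough guide to what "small enough" means, the sketch below estimates the size of the weights alone from a nominal parameter count. The counts are read off the model names and are approximations; training needs additional memory for activations, gradients, and optimizer state, which this estimate ignores.

```python
def weights_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    bytes_per_param=2 corresponds to bfloat16 or float16 storage.
    """
    return num_params * bytes_per_param / 1e9


# Nominal parameter counts taken from the model names (assumed, not exact).
for name, params in [("SmolLM2-360M", 360e6), ("SmolLM2-1.7B", 1.7e9)]:
    print(f"{name}: ~{weights_size_gb(params):.2f} GB of weights in bfloat16")
```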



## Initial Model Responses

Let's see how our model performs on an example prompt.

In [13]:
%%writefile $tutorial_dir/infer.yaml

model:
  model_name: "HuggingFaceTB/SmolLM2-360M-Instruct"
  trust_remote_code: true
  torch_dtype_str: "bfloat16"

generation:
  max_new_tokens: 256
  batch_size: 1

Overwriting finetuning_tutorial/infer.yaml


In [14]:
from oumi.core.configs import InferenceConfig
from oumi.infer import infer

config = InferenceConfig.from_yaml(str(Path(tutorial_dir) / "infer.yaml"))

input_text = (
    "Write a Python function to implement the quicksort algorithm. "
    "Please include comments explaining each step."
)

results = infer(config=config, inputs=[input_text])

print(results[0])

[2025-01-30 15:17:15,187][oumi][rank0][pid:16046][MainThread][INFO]][models.py:185] Building model using device_map: auto (DeviceRankInfo(world_size=1, rank=0, local_world_size=1, local_rank=0))...
[2025-01-30 15:17:15,282][oumi][rank0][pid:16046][MainThread][INFO]][models.py:255] Using model class: <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'> to instantiate model.



[2025-01-30 15:17:34,433][oumi][rank0][pid:16046][MainThread][INFO]][native_text_inference_engine.py:111] Setting EOS token id to `2`
conversation_id=None messages=[USER: Write a Python function to implement the quicksort algorithm. Please include comments explaining each step., ASSISTANT: ```python
def quicksort(arr):
    # Base case: If the array is empty or contains only one element, return it as is
    if len(arr) <= 1:
        return arr

    # Choose a pivot element
    pivot = arr[len(arr) // 2]

    # Partition the array around the pivot
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]

    # Recursively sort the left and right partitions
    quicksort(left)
    quicksort(right)

    # Combine the sorted partitions
    return middle
```

This quicksort function implements the standard quicksort algorithm. It first checks if the input array is empty or has only one element, in which case it returns the

In [12]:
!pip install --upgrade vllm



In [10]:
!uv pip install --upgrade --force-reinstall --system vllm


## Preparing our training experiment



Let's create a YAML file for our training config:

In [15]:
%%writefile $tutorial_dir/train.yaml

model:
  model_name: "HuggingFaceTB/SmolLM2-360M-Instruct"
  trust_remote_code: true
  torch_dtype_str: "bfloat16"
  tokenizer_pad_token: "<|endoftext|>"
  device_map: "auto"

data:
  train:
    datasets:
      - dataset_name: "PromptResponseDataset"
        split: "train"
        sample_count: 8000
        dataset_kwargs: {
          "hf_dataset_path": "O1-OPEN/OpenO1-SFT",
          "prompt_column": "instruction",
          "response_column": "output",
          "assistant_only": true,
          "instruction_template": "<|im_start|>user\n",
          "response_template": "<|im_start|>assistant\n",
        }
        shuffle: True
        seed: 42
    collator_name: "text_with_padding"
    seed: 42

training:
  output_dir: "finetuning_tutorial/output"

  # For a single GPU, the following gives us a batch size of 16
  # If training with multiple GPUs, feel free to reduce gradient_accumulation_steps
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 8

  # ***NOTE***
  # We set it to 5 steps to first verify that it works
  # Swap to 1500 steps to get more meaningful results.
  # Note: 1500 steps will take 2-3 hours on a single A100-40GB GPU.
  max_steps: 5
  # max_steps: 1500

  learning_rate: 1e-3
  warmup_ratio: 0.1
  logging_steps: 10
  save_steps: 0
  max_grad_norm: 1
  weight_decay: 0.01


  trainer_type: "TRL_SFT"
  optimizer: "adamw_torch_fused"
  enable_gradient_checkpointing: True
  gradient_checkpointing_kwargs:
    use_reentrant: False
  ddp_find_unused_parameters: False
  dataloader_num_workers: "auto"
  dataloader_prefetch_factor: 32
  empty_device_cache_steps: 1
  use_peft: true

peft:
  lora_r: 16
  lora_alpha: 32
  lora_dropout: 0.00
  lora_target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"

Overwriting finetuning_tutorial/train.yaml
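
To get a feel for the `peft` settings, the sketch below counts the parameters one LoRA adapter adds to a single linear layer and computes the standard LoRA scaling factor. The layer size used here is hypothetical; the real projection shapes depend on the model, and the helper names are illustrative only.

```python
def lora_added_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def lora_scaling(lora_alpha: int, r: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return lora_alpha / r


# Hypothetical square projection; not the actual SmolLM2 layer shape.
d_in = d_out = 1024
full = d_in * d_out
added = lora_added_params(d_in, d_out, r=16)

print(f"full weight: {full:,} params, LoRA adapter: {added:,} params")
print(f"trainable fraction for this layer: {added / full:.2%}")
print(f"scaling (lora_alpha / lora_r): {lora_scaling(32, 16)}")
```

With `lora_r: 16` and `lora_alpha: 32`, the update is scaled by 2, and only the small adapter matrices are trained for each of the listed target modules.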


## Fine-tuning the model

This starts the fine-tuning process using the Oumi framework. Because we set `max_steps: 5`, the run should finish quickly. A full run with `max_steps: 1500` may take a few hours, depending on your GPU.
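
The arithmetic below shows why five steps is only a smoke test. It uses the values from `train.yaml` (`per_device_train_batch_size: 2`, `gradient_accumulation_steps: 8`, `sample_count: 8000`) and assumes a single GPU; the function name is illustrative.

```python
def examples_seen(
    max_steps: int, per_device_batch: int, grad_accum: int, num_gpus: int = 1
) -> int:
    """Training examples consumed after max_steps optimizer steps."""
    return max_steps * per_device_batch * grad_accum * num_gpus


SAMPLE_COUNT = 8000  # sample_count from train.yaml

for steps in (5, 1500):
    seen = examples_seen(steps, per_device_batch=2, grad_accum=8)
    passes = seen / SAMPLE_COUNT
    print(f"max_steps={steps}: {seen} examples, {passes:.2f} passes over the sampled data")
```

Five steps cover 80 examples, about 1% of the 8000 sampled rows, while 1500 steps cover 24000 examples, or three passes.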

### SINGLE GPU

In [16]:
!oumi train -c "$tutorial_dir/train.yaml"


@@@@@@@@@@@@@@@@@@@
@                 @
@   @@@@@  @  @   @
@   @   @  @  @   @
@   @@@@@  @@@@   @
@                 @
@   @@@@@@@   @   @
@   @  @  @   @   @
@   @  @  @   @   @
@                 @
@@@@@@@@@@@@@@@@@@@

2025-01-30 15:19:04.947161: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1738250344.976368   28188 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738250344.985819   28188 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[2025-01-30 15:19:08,644][oumi][rank0][pid:28188][MainThread][INFO]][distributed.py:546] Setting random seed to 42 on rank 0.
[2025-01-30 15:19:10,005][oumi][rank0][pid:28188][MainThread][INFO]][torch_utils.py:66] T

In [18]:
%pip install datasets --upgrade

Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting fsspec<=2024.9.0,>=2023.1.0 (from fsspec[http]<=2024.9.0,>=2023.1.0->datasets)
  Downloading fsspec-2024.9.0-py3-none-any.whl.metadata (11 kB)
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Downloading fsspec-2024.9.0-py3-none-any.whl (179 kB)
Installing collected packages: fsspec, dill
  Attempting uninstall: fsspec
    Found existing installation: fsspec 2024.12.0
    Uninstalling fsspec-2024.12.0:
      Successfully uninstalled fsspec-2024.12.0
  Attempting uninstall: dill
    Found existing installation: dill 0.3.9
    Uninstalling dill-0.3.9:
      Successfully uninstalled dill-0.3.9
ERROR: pip's dependency reso

### MULTI-GPU

In [None]:
!oumi distributed torchrun -m oumi train -c "$tutorial_dir/train.yaml"

## Evaluation


As an example, let's create an evaluation configuration file!

**Note:** Since we've finetuned our model to produce thoughts before answering, it's very likely to do worse on most evals out-of-the-box.

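Many evals, including multiple-choice benchmarks such as MMLU, score a fixed set of answer choices instead of letting the model generate text freely, so they cannot take advantage of things like inference-time reasoning.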
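
The toy sketch below illustrates the idea behind likelihood-based multiple-choice scoring: the benchmark asks how likely the model finds each answer choice and picks the highest, so no text (and no thought) is ever generated. The numbers are made up for illustration and the function is not part of LM Harness or Oumi.

```python
def pick_choice(choice_logprobs: dict[str, float]) -> str:
    """Return the answer choice with the highest log-likelihood."""
    return max(choice_logprobs, key=choice_logprobs.get)


# Made-up log-likelihoods for illustration only; not taken from any model.
toy_scores = {"A": -2.3, "B": -0.9, "C": -3.1, "D": -1.7}
print("predicted choice:", pick_choice(toy_scores))
```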

In [17]:
%%writefile $tutorial_dir/eval.yaml

model:
  model_name: "finetuning_tutorial/output"
  torch_dtype_str: "bfloat16"

tasks:
  - evaluation_platform: lm_harness
    task_name: mmlu_college_computer_science

output_dir: "finetuning_tutorial/output/evaluation"
generation:
  batch_size: null # This will let LM HARNESS find the maximum possible batch size.

Writing finetuning_tutorial/eval.yaml


In [18]:
!oumi evaluate -c "$tutorial_dir/eval.yaml"


@@@@@@@@@@@@@@@@@@@
@                 @
@   @@@@@  @  @   @
@   @   @  @  @   @
@   @@@@@  @@@@   @
@                 @
@   @@@@@@@   @   @
@   @  @  @   @   @
@   @  @  @   @   @
@                 @
@@@@@@@@@@@@@@@@@@@

2025-01-30 15:45:42.324934: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1738251942.344887   34929 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738251942.350958   34929 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[2025-01-30 15:45:46,398][oumi][rank0][pid:34929][MainThread][INFO]][model_params.py:225] Found LoRA adapter at finetuning_tutorial/output, setting `adapter_model` to `model_name`.
[2025-01-30 15:45:46,398][oumi][ra

## Use the Fine-tuned Model

Once we're happy with the results, we can load the fine-tuned adapter on top of the base model and run inference:

In [20]:
%%writefile $tutorial_dir/trained_infer.yaml

model:
  model_name: "HuggingFaceTB/SmolLM2-360M-Instruct"
  adapter_model: "finetuning_tutorial/output"
  trust_remote_code: true
  torch_dtype_str: "bfloat16"

generation:
  max_new_tokens: 2048
  batch_size: 1

Overwriting finetuning_tutorial/trained_infer.yaml


In [21]:
from oumi.core.configs import InferenceConfig
from oumi.infer import infer

config = InferenceConfig.from_yaml(str(Path(tutorial_dir) / "trained_infer.yaml"))

input_text = (
    "Write a Python function to implement the quicksort algorithm. "
    "Please include comments explaining each step."
)

results = infer(config=config, inputs=[input_text])

print(results[0])

[2025-01-30 16:06:55,494][oumi][rank0][pid:16046][MainThread][INFO]][models.py:185] Building model using device_map: auto (DeviceRankInfo(world_size=1, rank=0, local_world_size=1, local_rank=0))...
[2025-01-30 16:06:55,496][oumi][rank0][pid:16046][MainThread][INFO]][models.py:255] Using model class: <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'> to instantiate model.
[2025-01-30 16:06:56,992][oumi][rank0][pid:16046][MainThread][INFO]][models.py:236] Loading PEFT adapter from: finetuning_tutorial/output ...
[2025-01-30 16:06:58,005][oumi][rank0][pid:16046][MainThread][INFO]][native_text_inference_engine.py:111] Setting EOS token id to `2`
conversation_id=None messages=[USER: Write a Python function to implement the quicksort algorithm. Please include comments explaining each step., ASSISTANT: ```python
# Import the quicksort algorithm from the huggingface/huggingface-python-utils package
from huggingface.huggingface_utils import quicksort

# Define the quicksort f

In [22]:
def quicksort(arr):
    # Base case: if the array is empty or contains only one element, return it
    if len(arr) <= 1:
        return arr

    # Choose a pivot element
    pivot_index = quicksort(arr[0:len(arr)//2])
    pivot = arr[len(arr)//2]

    # Partition the array around the pivot
    for i in range(len(arr)):
        if arr[i] < pivot:
            arr[i], arr[len(arr)//2] = quicksort(arr[i:len(arr)//2])
            arr[len(arr)//2], arr[i] = arr[i], arr[len(arr)//2]

    # Recursively sort the subarrays
    return arr[0:len(arr)//2] + quicksort(arr[len(arr)//2:])

# Example usage:
arr = [5, 2, 8, 1, 9, 3, 6, 4]
print("Original array:", arr)
print("Sorted array:", quicksort(arr))

Original array: [5, 2, 8, 1, 9, 3, 6, 4]


ValueError: not enough values to unpack (expected 2, got 1)

The generated code raised an error, so I asked Gemini to suggest a fix. The corrected code is below.

In [23]:
def quicksort(arr):
    if len(arr) < 2:  # Base case: array with 0 or 1 element is already sorted
        return arr
    else:
        pivot = arr[0]  # Choose the first element as pivot
        less = [i for i in arr[1:] if i <= pivot]  # Elements less than or equal to pivot
        greater = [i for i in arr[1:] if i > pivot]  # Elements greater than pivot
        return quicksort(less) + [pivot] + quicksort(greater)  # Recursive calls and combining

# Example usage:
arr = [5, 2, 8, 1, 9, 3, 6, 4]
print("Original array:", arr)
print("Sorted array:", quicksort(arr))

Original array: [5, 2, 8, 1, 9, 3, 6, 4]
Sorted array: [1, 2, 3, 4, 5, 6, 8, 9]
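
A single example can hide bugs, so it helps to compare generated code against Python's built-in `sorted()` on many random inputs. The sketch below repeats the corrected `quicksort` so that it is self-contained; `check_sort` is an illustrative helper, not part of Oumi.

```python
import random


def quicksort(arr):
    # Same algorithm as the corrected cell above, repeated for self-containment.
    if len(arr) < 2:
        return arr
    pivot = arr[0]
    less = [x for x in arr[1:] if x <= pivot]
    greater = [x for x in arr[1:] if x > pivot]
    return quicksort(less) + [pivot] + quicksort(greater)


def check_sort(sort_fn, trials: int = 200, seed: int = 0) -> bool:
    """Compare sort_fn with sorted() on random lists (empty and duplicate-heavy included)."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-10, 10) for _ in range(rng.randint(0, 20))]
        # Pass a copy so an in-place sort_fn cannot alter the reference data.
        if sort_fn(list(data)) != sorted(data):
            return False
    return True


print("quicksort matches sorted():", check_sort(quicksort))
```

The same check can be pointed at any function the fine-tuned model produces, which makes regressions visible without reading the code line by line.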
