<div class="align-center">
<a href="https://oumi.ai/"><img src="https://oumi.ai/docs/en/latest/_static/logo/header_logo.png" height="200"></a>

[![Documentation](https://img.shields.io/badge/Documentation-latest-blue.svg)](https://oumi.ai/docs/en/latest/index.html)
[![Discord](https://img.shields.io/discord/1286348126797430814?label=Discord)](https://discord.gg/oumi)
[![GitHub Repo stars](https://img.shields.io/github/stars/oumi-ai/oumi)](https://github.com/oumi-ai/oumi)
<a target="_blank" href="https://colab.research.google.com/github/oumi-ai/oumi/blob/main/notebooks/Oumi - Finetuning Tutorial.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
</div>

👋 Welcome to Open Universal Machine Intelligence (Oumi)!

🚀 Oumi is a fully open-source platform that streamlines the entire lifecycle of foundation models, from [data preparation](https://oumi.ai/docs/en/latest/resources/datasets/datasets.html) and [training](https://oumi.ai/docs/en/latest/user_guides/train/train.html) to [evaluation](https://oumi.ai/docs/en/latest/user_guides/evaluate/evaluate.html) and [deployment](https://oumi.ai/docs/en/latest/user_guides/launch/launch.html). Whether you're developing on a laptop, launching large-scale experiments on a cluster, or deploying models in production, Oumi provides the tools and workflows you need.

🤝 Make sure to join our [Discord community](https://discord.gg/oumi) to get help, share your experiences, and contribute to the project! If you are interested in joining one of the community's open-science efforts, check out our [open collaboration](https://oumi.ai/community) page.

⭐ If you like Oumi and you would like to support it, please give it a star on [GitHub](https://github.com/oumi-ai/oumi).

# Finetuning Overview

In this tutorial, we’ll use LoRA to fine-tune a large language model on multi-turn conversational data.

We'll use the Oumi framework to streamline the process and achieve high-quality results.

Oumi supports [Unsloth](https://unsloth.ai/), a lightweight and efficient patch for the [Transformers](https://huggingface.co/docs/transformers/en/index) library that significantly speeds up LoRA training while reducing memory usage.

We'll cover the following topics:
1. Prerequisites
2. Data Preparation
3. Model Preparation
4. Initial Model Responses
5. Training Config Preparation
6. Launching Training
7. Evaluation
8. Inference


## Prerequisites

❗**NOTICE:** We recommend running this notebook on a GPU. If running on Google Colab, you can use the free T4 GPU runtime (Colab Menu: `Runtime` -> `Change runtime type`).

First, let's install Oumi together with its GPU and Unsloth dependencies. More detailed instructions are available in the [installation guide](https://oumi.ai/docs/en/latest/get_started/installation.html).

In [None]:
%pip install oumi[gpu,unsloth]

## Creating our working directory
For our experiments, we'll use the following folder to save the model, training artifacts, and our working configs.

In [None]:
from pathlib import Path

tutorial_dir = "finetuning_unsloth_tutorial"

Path(tutorial_dir).mkdir(parents=True, exist_ok=True)

## Set up the environment

You may need to set the following environment variables:
- [Optional] `HF_TOKEN`: Your [Hugging Face](https://huggingface.co/docs/hub/en/security-tokens) access token, needed only if you want to access a gated or private model such as Llama.
- [Optional] `WANDB_API_KEY`: Your [Weights & Biases](https://wandb.ai) API key, needed only if you want to log your experiments to wandb.
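
As a minimal sketch, the cell below reports which of these variables are visible to the notebook kernel. The helper name `report_env` is ours for illustration and is not part of Oumi; it only reports whether each variable is set and never prints the secret itself.

```python
import os


def report_env(names=("HF_TOKEN", "WANDB_API_KEY"), environ=None):
    """Return {name: bool} telling whether each variable is set and non-empty."""
    # `environ` is injectable so the helper can be tried without touching os.environ.
    environ = os.environ if environ is None else environ
    return {name: bool(environ.get(name)) for name in names}


for name, is_set in report_env().items():
    print(f"{name}: {'set' if is_set else 'not set'}")
```

Both variables are optional, so "not set" is only a problem if you later request a gated model or enable wandb logging.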

# Getting Started


## Data Preparation
We begin by loading the FineTome-100k dataset from Hugging Face. This dataset contains a variety of multi-turn conversations formatted for fine-tuning language models.

In [None]:
from datasets import load_dataset

dataset = load_dataset("mlabonne/FineTome-100k", split="train")
print(dataset[100])

Next, we preprocess the dataset to convert the raw conversation structure into a standard format of messages, filtering out any system prompts. This step prepares the data for training or inference.

In [None]:
from multiprocessing import cpu_count


def preprocess(example):
    """Preprocess the examples."""
    preprocessed_example = {"messages": []}
    for message in example["conversations"]:
        if message["from"] != "system":
            preprocessed_example["messages"].append(
                {
                    "role": "user"
                    if message["from"] in ["user", "human", "input"]
                    else "assistant",
                    "content": message["value"],
                }
            )
    return preprocessed_example


dataset = dataset.map(
    preprocess,
    num_proc=cpu_count(),
    remove_columns=["conversations", "source", "score"],
)
print(dataset[100])
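
To make the role mapping concrete without downloading the dataset, here is the same logic applied to a hand-written record. The toy conversation is made up for illustration; only its `from`/`value` layout mirrors the dataset. `to_messages` is a hypothetical standalone copy of the logic in `preprocess`.

```python
def to_messages(conversations):
    """Drop system turns and map each remaining speaker to 'user' or 'assistant'."""
    user_aliases = {"user", "human", "input"}
    return [
        {
            "role": "user" if turn["from"] in user_aliases else "assistant",
            "content": turn["value"],
        }
        for turn in conversations
        if turn["from"] != "system"
    ]


# Made-up record in the same shape as one `conversations` entry.
toy = [
    {"from": "system", "value": "You are a helpful assistant."},
    {"from": "human", "value": "What is 2 + 2?"},
    {"from": "gpt", "value": "4"},
]
messages = to_messages(toy)
print(messages)
```

Note that any speaker label outside the user aliases falls through to `assistant`, so it can be worth checking the distinct `from` values in the raw data before training.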

Finally, we save the preprocessed dataset locally for access during training.

In [None]:
dataset_path = str(Path(tutorial_dir) / "mlabonne/FineTome-100k")
dataset.save_to_disk(dataset_path)

Oumi's `HuggingFaceDataset` class wraps the preprocessed dataset. It formats conversations into a structure suitable for training, including tokenization and role handling.

In [None]:
from oumi.builders import build_tokenizer
from oumi.core.configs import ModelParams
from oumi.datasets import HuggingFaceDataset

# Initialize the dataset
tokenizer = build_tokenizer(ModelParams(model_name="unsloth/gemma-3-1b-it"))

dataset = HuggingFaceDataset(
    tokenizer=tokenizer,
    hf_dataset_path=dataset_path,
    dataset_path=dataset_path,
)

# Print a few examples
for i in range(4):
    conversation = dataset.conversation(i)
    print(f"Example {i + 1}:")
    for message in conversation.messages:
        print(f"{message.role}: {message.content[:100]}...")  # Truncate for brevity
    print("\n")
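
The training config later in this tutorial sets `assistant_only` together with an instruction template (`<start_of_turn>user\n`) and a response template (`<start_of_turn>model\n`). Conceptually, this restricts the loss to the assistant's replies. The sketch below illustrates the idea at the string level only: real collators operate on token IDs, the rendered text is a hand-written approximation of Gemma's chat format, and `assistant_spans` is a hypothetical helper, not an Oumi or TRL function.

```python
INSTRUCTION = "<start_of_turn>user\n"
RESPONSE = "<start_of_turn>model\n"


def assistant_spans(text, instruction=INSTRUCTION, response=RESPONSE):
    """Return (start, end) character spans that follow a response marker.

    Each span runs from just after a response marker up to the next
    instruction marker, or to the end of the text if there is none.
    """
    spans = []
    cursor = 0
    while True:
        marker = text.find(response, cursor)
        if marker == -1:
            break
        start = marker + len(response)
        end = text.find(instruction, start)
        if end == -1:
            end = len(text)
        spans.append((start, end))
        cursor = end
    return spans


# Hand-written two-turn conversation in an approximation of Gemma's format.
rendered = (
    "<start_of_turn>user\nHi!<end_of_turn>\n"
    "<start_of_turn>model\nHello.<end_of_turn>\n"
    "<start_of_turn>user\nBye.<end_of_turn>\n"
    "<start_of_turn>model\nGoodbye.<end_of_turn>\n"
)
trained_on = [rendered[start:end] for start, end in assistant_spans(rendered)]
print(trained_on)
```

Everything outside these spans (the user turns and the markers themselves) would be excluded from the loss, so the model learns to produce replies rather than to imitate user messages.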

## Model Preparation

We want a base model with strong general language understanding and coding capabilities, since we'll probe it with a coding prompt.

We also want a model that is small enough to train and run on a single GPU.

For this tutorial, we'll use `unsloth/gemma-3-1b-it` as our base model.

This model is accessible through the Unsloth library, which is seamlessly integrated into Oumi, streamlining our fine-tuning workflow.

## Initial Model Responses

Let's see how our model performs on an example prompt.

In [None]:
%%writefile $tutorial_dir/infer.yaml

model:
  model_name: "unsloth/gemma-3-1b-it"

generation:
  max_new_tokens: 128
  batch_size: 1

In [None]:
from oumi.core.configs import InferenceConfig
from oumi.infer import infer

config = InferenceConfig.from_yaml(str(Path(tutorial_dir) / "infer.yaml"))

input_text = (
    "Write a Python function to implement the quicksort algorithm. "
    "Please include comments explaining each step."
)

results = infer(config=config, inputs=[input_text])

print(results[0])

## Preparing our training experiment



Let's create a YAML file for our training config:

In [None]:
%%writefile $tutorial_dir/train.yaml

model:
  model_name: "unsloth/gemma-3-1b-it"
  model_max_length: 2048
  model_kwargs: {
    "unsloth_model": "FastModel",
    "load_in_4bit": True,
    "load_in_8bit": False,
    "full_finetuning": False,
    "peft": {
        "finetune_vision_layers": False,
        "finetune_language_layers": True,
        "finetune_attention_modules": True,
        "finetune_mlp_modules": True,
        "random_state": 3407,
    }
  }

data:
  train:
    datasets:
      - dataset_name: "HuggingFaceDataset"
        dataset_path: "finetuning_unsloth_tutorial/mlabonne/FineTome-100k"
        split: "train"
        dataset_kwargs: {
          "hf_dataset_path": "finetuning_unsloth_tutorial/mlabonne/FineTome-100k",
          "assistant_only": True,
          "instruction_template": "<start_of_turn>user\n",
          "response_template": "<start_of_turn>model\n",
        }

training:
  use_peft: true
  trainer_type: "TRL_SFT"
  per_device_train_batch_size: 1
  warmup_steps: 5
  max_steps: 5
  learning_rate: 2e-4
  logging_steps: 1
  optimizer: "adamw_8bit"
  weight_decay: 0.01
  lr_scheduler_type: "linear"
  seed: 3407
  dataloader_num_workers: 2
  output_dir: "finetuning_unsloth_tutorial/output"

peft:
  lora_r: 8
  lora_alpha: 8
  lora_dropout: 0
  lora_bias: "none"
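
To get a feel for what `lora_r` and `lora_alpha` control, the sketch below counts trainable parameters for a single adapted weight matrix. It assumes the standard LoRA formulation, where a frozen `d_out x d_in` weight receives an update `(alpha / r) * B @ A` with `A` of shape `r x d_in` and `B` of shape `d_out x r`. The 1024 x 1024 layer size is a made-up example, not Gemma's actual dimension, and `lora_stats` is a hypothetical helper.

```python
def lora_stats(d_in, d_out, r, alpha):
    """Return (full_params, lora_params, scale) for one adapted weight matrix."""
    full_params = d_in * d_out
    lora_params = r * (d_in + d_out)  # A is r x d_in, B is d_out x r
    scale = alpha / r  # multiplier applied to the B @ A update
    return full_params, lora_params, scale


# r and alpha match the config above; the layer size is illustrative only.
full, lora, scale = lora_stats(d_in=1024, d_out=1024, r=8, alpha=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}  scale: {scale}")
```

With `lora_r` equal to `lora_alpha`, the update is applied with a scale of 1. Raising `lora_r` adds capacity (and parameters) linearly, while the ratio to the full matrix stays small.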

## Fine-tuning the model

This starts the fine-tuning process using the Oumi framework. Because we set `max_steps: 5`, the run finishes very quickly. A complete training run over the whole dataset (with a much larger `max_steps`) may take a few hours, depending on your GPU.
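
A back-of-the-envelope calculation shows how little data a 5-step run touches. This sketch assumes one example per device per step (as in the config), no gradient accumulation (an assumption, since the config does not set it), and roughly 100,000 examples as suggested by the dataset's name. `examples_seen` is a hypothetical helper.

```python
def examples_seen(max_steps, per_device_batch_size, num_devices=1, grad_accum_steps=1):
    """Number of training examples consumed after `max_steps` optimizer steps."""
    return max_steps * per_device_batch_size * num_devices * grad_accum_steps


DATASET_SIZE = 100_000  # approximate, taken from the dataset's name

seen = examples_seen(max_steps=5, per_device_batch_size=1)
print(f"{seen} examples, about {seen / DATASET_SIZE:.4%} of the dataset")
```

In other words, this configuration is a smoke test that the pipeline runs end to end; it is not expected to change model quality noticeably.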

### SINGLE GPU

In [None]:
!oumi train -c "$tutorial_dir/train.yaml"

### MULTI-GPU

In [None]:
!oumi distributed torchrun -m oumi train -c "$tutorial_dir/train.yaml"

## Evaluation


As an example, let's create an evaluation configuration file that runs the `mmlu_college_computer_science` task through the LM Harness backend:

In [None]:
%%writefile $tutorial_dir/eval.yaml

model:
  model_name: "finetuning_unsloth_tutorial/output"

tasks:
  - evaluation_backend: lm_harness
    task_name: mmlu_college_computer_science

output_dir: "finetuning_unsloth_tutorial/output/evaluation"
generation:
  batch_size: null # This will let LM HARNESS find the maximum possible batch size.

In [None]:
!oumi evaluate -c "$tutorial_dir/eval.yaml"
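
When reading the reported accuracy, it helps to know the chance level and how noisy a small benchmark is. The sketch below uses the binomial standard error as a rough guide. It assumes four answer choices per question and about 100 questions in this subtask; treat the question count as an assumption and check the evaluation report for the real number. `accuracy_stderr` is a hypothetical helper, not part of Oumi or LM Harness.

```python
import math


def accuracy_stderr(accuracy, num_questions):
    """Binomial standard error of an accuracy measured on `num_questions` items."""
    return math.sqrt(accuracy * (1.0 - accuracy) / num_questions)


NUM_CHOICES = 4  # MMLU questions are four-way multiple choice
NUM_QUESTIONS = 100  # assumed size of this subtask; verify against the report

chance = 1.0 / NUM_CHOICES
stderr = accuracy_stderr(chance, NUM_QUESTIONS)
print(f"chance accuracy: {chance:.2f} +/- {stderr:.3f}")
```

With a standard error of several points, a difference of a point or two between the base and fine-tuned models on this task is within noise, especially after only 5 training steps.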

## Use the Fine-tuned Model

Once we're happy with the results, we can load the base model together with the trained LoRA adapter and run inference on the same prompt as before:

In [None]:
%%writefile $tutorial_dir/trained_infer.yaml

model:
  model_name: "unsloth/gemma-3-1b-it"
  adapter_model: "finetuning_unsloth_tutorial/output"

generation:
  max_new_tokens: 2048
  batch_size: 1

In [None]:
from oumi.core.configs import InferenceConfig
from oumi.infer import infer

config = InferenceConfig.from_yaml(str(Path(tutorial_dir) / "trained_infer.yaml"))

input_text = (
    "Write a Python function to implement the quicksort algorithm. "
    "Please include comments explaining each step."
)

results = infer(config=config, inputs=[input_text])

print(results[0])