<div class="align-center">
<a href="https://oumi.ai/"><img src="https://oumi.ai/docs/en/latest/_static/logo/header_logo.png" height="200"></a>

[![Documentation](https://img.shields.io/badge/Documentation-latest-blue.svg)](https://oumi.ai/docs/en/latest/index.html)
[![Discord](https://img.shields.io/discord/1286348126797430814?label=Discord)](https://discord.gg/oumi)
[![GitHub Repo stars](https://img.shields.io/github/stars/oumi-ai/oumi)](https://github.com/oumi-ai/oumi)
<a target="_blank" href="https://colab.research.google.com/github/oumi-ai/oumi/blob/main/notebooks/Oumi - A Tour.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
</div>

👋 Welcome to Open Universal Machine Intelligence (Oumi)!

🚀 Oumi is a fully open-source platform that streamlines the entire lifecycle of foundation models - from [data preparation](https://oumi.ai/docs/en/latest/resources/datasets/datasets.html) and [training](https://oumi.ai/docs/en/latest/user_guides/train/train.html) to [evaluation](https://oumi.ai/docs/en/latest/user_guides/evaluate/evaluate.html) and [deployment](https://oumi.ai/docs/en/latest/user_guides/launch/launch.html). Whether you're developing on a laptop, launching large scale experiments on a cluster, or deploying models in production, Oumi provides the tools and workflows you need.

🤝 Make sure to join our [Discord community](https://discord.gg/oumi) to get help, share your experiences, and contribute to the project! If you are interested in joining one of the community's open-science efforts, check out our [open collaboration](https://oumi.ai/community) page.

⭐ If you like Oumi and you would like to support it, please give it a star on [GitHub](https://github.com/oumi-ai/oumi).

# A Tour of Oumi

This tutorial will give you a brief overview of Oumi's core functionality. We'll cover:

1. Training a model
1. Performing model inference
1. Evaluating a model against common benchmarks
1. Launching jobs
1. Customizing datasets and clouds

# 📋 Prerequisites

❗**NOTICE:** We recommend running this notebook on a GPU. If running on Google Colab, you can use the free T4 GPU runtime (Colab Menu: `Runtime` -> `Change runtime type`).
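
As a quick sanity check before continuing, the sketch below looks for the `nvidia-smi` tool on the `PATH`. It assumes that an installed NVIDIA driver ships this tool, which is only a heuristic; `gpu_tool_available` is a hypothetical helper name and not part of Oumi.

```python
import shutil


def gpu_tool_available(tool: str = "nvidia-smi") -> bool:
    """Return True if the given GPU utility is found on the PATH.

    Heuristic only: finding the tool suggests an NVIDIA driver is
    installed, but it does not prove a GPU is usable from Python.
    """
    return shutil.which(tool) is not None


if gpu_tool_available():
    print("nvidia-smi found: a GPU runtime is probably active.")
else:
    print("nvidia-smi not found: this notebook will likely run on CPU.")
```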

## Oumi Installation

First, let's install Oumi. You can find more detailed instructions [here](https://oumi.ai/docs/en/latest/get_started/installation.html).


In [1]:
%pip install oumi



In [2]:
import os
from pathlib import Path

tutorial_dir = "tour_tutorial"

Path(tutorial_dir).mkdir(parents=True, exist_ok=True)
os.environ["TOKENIZERS_PARALLELISM"] = "false"  # Disable warnings from HF.

# ⚒️ Training a Model

Oumi supports training both custom and out-of-the-box models. Want to try out a model on HuggingFace? You can do that. Want to train your own custom PyTorch model? No problem.

## A Quick Demo

Let's try training a pre-existing model on HuggingFace. We'll use SmolLM2 135M as it's small and trains quickly.

Oumi uses [training configuration files](https://oumi.ai/docs/en/latest/api/oumi.core.configs.html#oumi.core.configs.TrainingConfig) to specify training parameters. We've already created a training config for SmolLM2, so let's give it a try!

In [3]:
import os

# Fallback in case the setup cell above was skipped. Note that this overrides
# the earlier "tour_tutorial" value; the outputs below use this path.
tutorial_dir = "./oumi_tutorial_output"
os.makedirs(tutorial_dir, exist_ok=True)

In [4]:
yaml_content = f"""
model:
  model_name: "HuggingFaceTB/SmolLM2-135M-Instruct"
  torch_dtype_str: "bfloat16"
  trust_remote_code: True

data:
  train:
    datasets:
      - dataset_name: "yahma/alpaca-cleaned"

training:
  trainer_type: "TRL_SFT"
  per_device_train_batch_size: 2
  max_steps: 10 # Quick "mini" training, for demo purposes only.
  run_name: "smollm2_135m_sft"
  output_dir: "{tutorial_dir}/output"
"""

with open(f"{tutorial_dir}/train.yaml", "w") as f:
    f.write(yaml_content)

In [5]:
!pip install protobuf==3.20.3

Collecting protobuf==3.20.3
  Using cached protobuf-3.20.3-py2.py3-none-any.whl.metadata (720 bytes)
Using cached protobuf-3.20.3-py2.py3-none-any.whl (162 kB)
Installing collected packages: protobuf
  Attempting uninstall: protobuf
    Found existing installation: protobuf 6.33.2
    Uninstalling protobuf-6.33.2:
      Successfully uninstalled protobuf-6.33.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
oumi 0.5.0 requires protobuf>=6.32, but you have protobuf 3.20.3 which is incompatible.
databricks-sdk 0.73.0 requires protobuf!=5.26.*,!=5.27.*,!=5.28.*,!=5.29.0,!=5.29.1,!=5.29.2,!=5.29.3,!=5.29.4,!=6.30.0,!=6.30.1,!=6.31.0,<7.0,>=4.25.8, but you have protobuf 3.20.3 which is incompatible.
opentelemetry-proto 1.37.0 requires protobuf<7.0,>=5.0, but you have protobuf 3.20.3 which is incompatible.
tensorflow-metadata 1.17.2 requires protobuf>=4.25.2; py
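
The resolver message above says that `oumi 0.5.0` requires `protobuf>=6.32`, so pinning `3.20.3` leaves the environment in a conflicting state. A minimal sketch for checking the installed version before importing Oumi follows. `MIN_PROTOBUF` is taken from that message and may differ for other Oumi releases, and the simple numeric parser is an assumption that ignores pre-release suffixes.

```python
import re
from importlib import metadata

# Minimum version taken from the pip message for oumi 0.5.0 (assumption:
# other Oumi releases may require a different bound).
MIN_PROTOBUF = (6, 32)


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric part of each dotted component."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version: str, minimum: tuple = MIN_PROTOBUF) -> bool:
    """Return True if the version is at least the required minimum."""
    return version_tuple(version) >= minimum


try:
    installed = metadata.version("protobuf")
    print(f"protobuf {installed}: compatible={meets_minimum(installed)}")
except metadata.PackageNotFoundError:
    print("protobuf is not installed")
```

If the check reports an incompatible version, upgrading `protobuf` and restarting the runtime before importing Oumi is the safer order of operations.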

In [6]:
import os
from pathlib import Path

# Ensure protobuf is at a compatible version for Oumi
# It's highly recommended to restart the runtime after this installation
# to ensure all modules are loaded correctly.
!pip install protobuf --upgrade --quiet

from oumi.core.configs import TrainingConfig
from oumi.train import train

config = TrainingConfig.from_yaml(str(Path(tutorial_dir) / "train.yaml"))

train(config)

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-ai-generativelanguage 0.6.15 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0dev,>=3.20.2, but you have protobuf 6.33.2 which is incompatible.
grpcio-status 1.71.2 requires protobuf<6.0dev,>=5.26.1, but you have protobuf 6.33.2 which is incompatible.
tensorflow 2.19.0 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0dev,>=3.20.3, but you have protobuf 6.33.2 which is incompatible.
tensorflow 2.19.0 requires tensorboard~=2.19.0, but you have tensorboard 2.20.0 which is incompatible.
gcsfs 2025.3.0 requires fsspec==2025.3.0, but you have fsspec 2024.9.0 which is incompatible.

AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'


[2025-12-09 17:27:15,161][oumi][rank0][pid:13195][MainThread][INFO]][train.py:117] Creating training.output_dir: ./oumi_tutorial_output/output...
[2025-12-09 17:27:15,162][oumi][rank0][pid:13195][MainThread][INFO]][train.py:119] Created training.output_dir absolute path: /content/oumi_tutorial_output/output
[2025-12-09 17:27:15,163][oumi][rank0][pid:13195][MainThread][INFO]][train.py:117] Creating training.telemetry_dir: oumi_tutorial_output/output/telemetry...
[2025-12-09 17:27:15,164][oumi][rank0][pid:13195][MainThread][INFO]][train.py:119] Created training.telemetry_dir absolute path: /content/oumi_tutorial_output/output/telemetry
[2025-12-09 17:27:15,165][oumi][rank0][pid:13195][MainThread][INFO]][torch_utils.py:80] Torch version: 2.8.0+cu128. NumPy version: 2.0.2
[2025-12-09 17:27:15,166][oumi][rank0][pid:13195][MainThread][INFO]][torch_utils.py:88] CUDA version: 12.8 
[2025-12-09 17:27:15,224][oumi][rank0][pid:13195][MainThread][INFO]][torch_utils.py:91] CuDNN version: 91.0.2
[20


[2025-12-09 17:27:17,686][oumi][rank0][pid:13195][MainThread][INFO]][models.py:544] Using the model's built-in chat template for model 'HuggingFaceTB/SmolLM2-135M-Instruct'.
[2025-12-09 17:27:17,687][oumi][rank0][pid:13195][MainThread][INFO]][base_map_dataset.py:91] Creating map dataset (type: AlpacaDataset)... dataset_name: 'yahma/alpaca-cleaned'



[2025-12-09 17:27:23,256][oumi][rank0][pid:13195][MainThread][INFO]][base_map_dataset.py:487] Dataset Info:
	Split: train
	Version: 0.0.0
	Dataset size: 40283906
	Download size: 44307561
	Size: 84591467 bytes
	Rows: 51760
	Columns: ['output', 'input', 'instruction']
[2025-12-09 17:27:23,890][oumi][rank0][pid:13195][MainThread][INFO]][base_map_dataset.py:426] Loaded DataFrame with shape: (51760, 3). Columns:
output         object
input          object
instruction    object
dtype: object


Generating train split: 0 examples [00:00, ? examples/s]

[2025-12-09 17:27:23,963][oumi][rank0][pid:13195][MainThread][INFO]][base_map_dataset.py:312] AlpacaDataset: features=dict_keys(['input_ids', 'attention_mask'])


Generating train split: 0 examples [00:00, ? examples/s]

[2025-12-09 17:28:16,469][oumi][rank0][pid:13195][MainThread][INFO]][base_map_dataset.py:376] Finished transforming dataset (AlpacaDataset)! Speed: 985.81 examples/sec. Examples: 51760. Duration: 52.5 sec. Transform workers: 1.
[2025-12-09 17:28:16,482][oumi][rank0][pid:13195][MainThread][INFO]][models.py:260] Building model using device_map: auto (DeviceRankInfo(world_size=1, rank=0, local_world_size=1, local_rank=0))...
[2025-12-09 17:28:16,586][oumi][rank0][pid:13195][MainThread][INFO]][models.py:336] Using model class: <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'> to instantiate model.


`torch_dtype` is deprecated! Use `dtype` instead!



[2025-12-09 17:28:22,210][oumi][rank0][pid:13195][MainThread][INFO]][torch_utils.py:288] 
Model Parameters Summary:
🔢 Total     parameters: 134,515,008
🔗 Embedding parameters: 28,311,552
🎯 Trainable parameters: 134,515,008
🔒 Frozen    parameters: 0 (0.00%)

[2025-12-09 17:28:22,218][oumi][rank0][pid:13195][MainThread][INFO]][train.py:486] Skipping dataset preparation for TRL_SFT trainer since the dataset is already processed.
[2025-12-09 17:28:22,709][oumi][rank0][pid:13195][MainThread][INFO]][torch_profiler_utils.py:164] PROF: Torch Profiler disabled!


The model is already on multiple devices. Skipping the move to device specified in `args`.


[2025-12-09 17:28:22,756][oumi][rank0][pid:13195][MainThread][INFO]][device_utils.py:343] GPU Metrics Before Training: GPU runtime info: NVidiaGpuRuntimeInfo(device_index=0, device_count=1, used_memory_mb=677.0, temperature=49, fan_speed=None, fan_speeds=None, power_usage_watts=27.549, power_limit_watts=70.0, gpu_utilization=0, memory_utilization=0, performance_state=0, clock_speed_graphics=585, clock_speed_sm=585, clock_speed_memory=5000).
[2025-12-09 17:28:22,757][oumi][rank0][pid:13195][MainThread][INFO]][train.py:558] Training init time: 67.596s
[2025-12-09 17:28:22,757][oumi][rank0][pid:13195][MainThread][INFO]][train.py:559] Starting training... (TrainerType.TRL_SFT, transformers: 4.57.3)


Step,Training Loss


[2025-12-09 17:28:37,186][oumi][rank0][pid:13195][MainThread][INFO]][train.py:566] Training is Complete.
[2025-12-09 17:28:37,197][oumi][rank0][pid:13195][MainThread][INFO]][device_utils.py:343] GPU Metrics After Training: GPU runtime info: NVidiaGpuRuntimeInfo(device_index=0, device_count=1, used_memory_mb=2983.0, temperature=52, fan_speed=None, fan_speeds=None, power_usage_watts=27.944, power_limit_watts=70.0, gpu_utilization=3, memory_utilization=0, performance_state=0, clock_speed_graphics=585, clock_speed_sm=585, clock_speed_memory=5000).
[2025-12-09 17:28:37,198][oumi][rank0][pid:13195][MainThread][INFO]][torch_utils.py:135] Peak GPU memory usage: 2.06 GB
[2025-12-09 17:28:37,198][oumi][rank0][pid:13195][MainThread][INFO]][train.py:573] Saving final state...
[2025-12-09 17:28:37,201][oumi][rank0][pid:13195][MainThread][INFO]][train.py:578] Saving final model...
[2025-12-09 17:28:38,243][oumi][rank0][pid:13195][MainThread][INFO]][hf_trainer.py:127] Model has been saved at ./oumi_t

Congratulations, you've trained your first model using Oumi!
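
To put the size of this demo run in perspective, the arithmetic below estimates how much of the dataset the 10 steps actually touched. It uses the values from the config above (`per_device_train_batch_size: 2`, `max_steps: 10`) and the 51,760 rows and single device reported in the logs. It assumes a gradient accumulation of 1, which the config does not set explicitly.

```python
def examples_seen(
    batch_size_per_device: int,
    max_steps: int,
    num_devices: int = 1,
    grad_accum_steps: int = 1,  # Assumption: not set in the tutorial config.
) -> int:
    """Number of training examples consumed over the whole run."""
    return batch_size_per_device * num_devices * grad_accum_steps * max_steps


seen = examples_seen(batch_size_per_device=2, max_steps=10)
dataset_rows = 51_760  # "Rows" value from the dataset info in the logs.
fraction = seen / dataset_rows

print(f"Examples seen: {seen} of {dataset_rows} ({fraction:.4%})")
```

Under these assumptions the run sees 20 examples, well under 0.1% of the dataset, which is why the tuned model behaves much like the base model.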

You can also train your own custom PyTorch model. We cover that in depth in our [Finetuning Tutorial](https://github.com/oumi-ai/oumi/blob/main/notebooks/Oumi%20-%20Finetuning%20Tutorial.ipynb).

# 🧠 Model Inference

Now that you've trained a model, let's run inference.

In [7]:
yaml_content = f"""
model:
  model_name: "{tutorial_dir}/output"
  torch_dtype_str: "bfloat16"

generation:
  max_new_tokens: 128
  batch_size: 1
"""

with open(f"{tutorial_dir}/infer.yaml", "w") as f:
    f.write(yaml_content)
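
The inference config above points `model_name` at the training output directory, so inference fails if training did not finish. The sketch below is a quick pre-flight check. It assumes the directory holds a Hugging Face style checkpoint with a `config.json` file, and `looks_like_saved_model` is a hypothetical helper rather than an Oumi function.

```python
from pathlib import Path


def looks_like_saved_model(model_dir: str) -> bool:
    """Heuristic: a Hugging Face style checkpoint directory has a config.json."""
    return (Path(model_dir) / "config.json").is_file()


# In the notebook this would be f"{tutorial_dir}/output".
candidate = "oumi_tutorial_output/output"
if looks_like_saved_model(candidate):
    print(f"{candidate} looks like a saved model.")
else:
    print(f"{candidate} has no config.json; run the training cell first.")
```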

In [8]:
from oumi.core.configs import InferenceConfig
from oumi.infer import infer

config = InferenceConfig.from_yaml(str(Path(tutorial_dir) / "infer.yaml"))

input_text = (
    "Remember that we didn't train for long, so the results might not be great."
)

results = infer(config=config, inputs=[input_text])

print(results[0])

[2025-12-09 17:28:38,369][oumi][rank0][pid:13195][MainThread][INFO]][models.py:260] Building model using device_map: auto (DeviceRankInfo(world_size=1, rank=0, local_world_size=1, local_rank=0))...
[2025-12-09 17:28:38,372][oumi][rank0][pid:13195][MainThread][INFO]][models.py:336] Using model class: <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'> to instantiate model.
[2025-12-09 17:28:38,909][oumi][rank0][pid:13195][MainThread][INFO]][models.py:544] Using the model's built-in chat template for model './oumi_tutorial_output/output'.
[2025-12-09 17:28:38,912][oumi][rank0][pid:13195][MainThread][INFO]][native_text_inference_engine.py:151] Setting EOS token id to `2`


`generation_config` default values have been modified to match model-specific defaults: {'pad_token_id': 2, 'bos_token_id': 1}. If this is not desired, please set these values explicitly.


conversation_id='dd887f56-8d2c-5143-8d01-73b5f5be0c67' messages=[USER: Remember that we didn't train for long, so the results might not be great., ASSISTANT: I'm sorry for the inconvenience, but as a chatbot, I don't have the ability to access or process data from external sources. I'm designed to provide information and guidance based on your input, not to collect or store any data. I recommend checking the official website or platform for the most accurate and up-to-date information.] metadata={}


We can also run inference using the pretrained model by slightly tweaking our config:

In [9]:
base_model_config = InferenceConfig.from_yaml(str(Path(tutorial_dir) / "infer.yaml"))
base_model_config.model.model_name = "HuggingFaceTB/SmolLM2-135M-Instruct"

input_text = "Input for the pretrained model: What is your name? "

results = infer(config=base_model_config, inputs=[input_text])

print(results[0])

[2025-12-09 17:28:42,371][oumi][rank0][pid:13195][MainThread][INFO]][models.py:260] Building model using device_map: auto (DeviceRankInfo(world_size=1, rank=0, local_world_size=1, local_rank=0))...
[2025-12-09 17:28:42,464][oumi][rank0][pid:13195][MainThread][INFO]][models.py:336] Using model class: <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'> to instantiate model.
[2025-12-09 17:28:43,409][oumi][rank0][pid:13195][MainThread][INFO]][models.py:544] Using the model's built-in chat template for model 'HuggingFaceTB/SmolLM2-135M-Instruct'.
[2025-12-09 17:28:43,412][oumi][rank0][pid:13195][MainThread][INFO]][native_text_inference_engine.py:151] Setting EOS token id to `2`
conversation_id='59714fc4-9715-5220-b9f5-7f3ed324f822' messages=[USER: Input for the pretrained model: What is your name? , ASSISTANT: My name is Alex Chen. I'm a data scientist and AI assistant, trained on a vast dataset of text data, which I use to train my models for various tasks.] metadata={}


# 📊 Evaluating a Model against Common Benchmarks

You can use Oumi to evaluate pretrained and tuned models against standard benchmarks. For example, let's evaluate our tuned model on the MMLU college computer science task (`mmlu_college_computer_science`):

In [10]:
yaml_content = f"""
model:
  model_name: "{tutorial_dir}/output"
  torch_dtype_str: "bfloat16"

tasks:
  - evaluation_backend: lm_harness
    task_name: mmlu_college_computer_science

generation:
  batch_size: null # This will let LM HARNESS find the maximum possible batch size.
output_dir: "{tutorial_dir}/output/evaluation"
"""

with open(f"{tutorial_dir}/eval.yaml", "w") as f:
    f.write(yaml_content)

In [11]:
from oumi.core.configs import EvaluationConfig
from oumi.evaluate import evaluate

eval_config = EvaluationConfig.from_yaml(str(Path(tutorial_dir) / "eval.yaml"))

# Uncomment the following line to run evals against the V1 HuggingFace Leaderboard.
# This may take a while.
# eval_config.data.datasets[0].dataset_name = "huggingface_leaderboard_v1"

evaluate(eval_config)




[2025-12-09 17:28:52,754][oumi][rank0][pid:13195][MainThread][INFO]][lm_harness.py:334] 	LM Harness `task_params`:
LMHarnessTaskParams(evaluation_backend='lm_harness',
                    task_name='mmlu_college_computer_science',
                    num_samples=None,
                    log_samples=False,
                    eval_kwargs={},
                    num_fewshot=None)
[2025-12-09 17:28:52,755][oumi][rank0][pid:13195][MainThread][INFO]][lm_harness.py:335] 	LM Harness `task_dict`:
{'mmlu_college_computer_science': ConfigurableTask(task_name=mmlu_college_computer_science,output_type=multiple_choice,num_fewshot=None,num_samples=100)}
[2025-12-09 17:28:52,757][oumi][rank0][pid:13195][MainThread][INFO]][lm_harness.py:347] 	LM Harness `model_params`:
{'batch_size': 1,
 'device': 'cuda:0',
 'device_map': 'auto',
 'dtype': torch.bfloat16,
 'max_batch_size': None,
 'max_length': None,
 'parallelize': False,
 'pretrained': './oumi_tutorial_output/output',
 'trust_remote_code': False}
[

100%|██████████| 100/100 [00:00<00:00, 652.49it/s]
Running loglikelihood requests: 100%|██████████| 400/400 [00:05<00:00, 75.99it/s]

[2025-12-09 17:28:58,886][oumi][rank0][pid:13195][MainThread][INFO]][lm_harness.py:369] mmlu_college_computer_science's metric dict is {'acc,none': 0.25,
 'acc_stderr,none': 0.04351941398892446,
 'alias': 'college_computer_science'}







[{'results': {'mmlu_college_computer_science': {'alias': 'college_computer_science',
    'acc,none': 0.25,
    'acc_stderr,none': 0.04351941398892446}}}]
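
To help read these numbers, the sketch below recomputes the standard error and compares the accuracy with random guessing. The task has 100 test questions and the harness issued 400 log-likelihood requests, so there are 4 answer choices per question. The sketch assumes the standard error is derived from the sample standard deviation (dividing by `n - 1`), an assumption that reproduces the reported value.

```python
import math


def bernoulli_stderr(accuracy: float, num_samples: int) -> float:
    """Standard error of the mean of 0/1 scores, using the n - 1 sample variance."""
    return math.sqrt(accuracy * (1.0 - accuracy) / (num_samples - 1))


acc = 0.25          # 'acc,none' from the results above
num_questions = 100  # size of the test split
num_choices = 4      # 400 requests / 100 questions

stderr = bernoulli_stderr(acc, num_questions)
chance = 1.0 / num_choices
z_score = (acc - chance) / stderr

print(f"stderr={stderr:.6f}, chance={chance:.2f}, z={z_score:.2f}")
```

An accuracy equal to the chance level means this short training run shows no measurable ability on the task, which is expected for a 135M-parameter model after 10 steps.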

# ☁️ Launching Jobs

Oftentimes you'll need to run various tasks (training, evaluation, etc.) on remote hardware that's better suited for the task. Oumi can handle this for you by launching jobs on various compute clusters. For more information about running jobs, see our [Running Jobs Remotely tutorial](https://github.com/oumi-ai/oumi/blob/main/notebooks/Oumi%20-%20Running%20Jobs%20Remotely.ipynb). For running jobs on custom clusters, see our [Launching Jobs on Custom Clusters tutorial](https://github.com/oumi-ai/oumi/blob/main/notebooks/Oumi%20-%20Launching%20Jobs%20on%20Custom%20Clusters.ipynb).


Today, Oumi supports running jobs on several cloud provider platforms.

For the latest list, we can call the `which_clouds` function:

In [12]:
import oumi.launcher as launcher

print("Supported Clouds in Oumi:")
for cloud in launcher.which_clouds():
    print(cloud)

Supported Clouds in Oumi:
frontier
local
perlmutter
polaris
runpod
gcp
lambda
aws
azure
slurm


Let's run a simple "Hello World" job locally to demonstrate how to use the Oumi job launcher. This job will echo `Hello World`, then run the same training job executed above. Running this job on a cloud provider like GCP simply involves changing the `cloud` field.

In [13]:
yaml_content = f"""
name: hello-world
resources:
  cloud: local

working_dir: .

envs:
  TEST_ENV_VARIABLE: '"Hello, World!"'
  OUMI_LOGGING_DIR: "{tutorial_dir}/logs"

run: |
  echo "$TEST_ENV_VARIABLE"
  oumi train -c {tutorial_dir}/train.yaml
"""

with open(f"{tutorial_dir}/job.yaml", "w") as f:
    f.write(yaml_content)
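
Since switching providers comes down to the `cloud` field, the sketch below rewrites that field in a job YAML string. `with_cloud` is a hypothetical helper, and the regex assumes a single `cloud:` key on its own line as in the job file above. A remote run will typically also need cloud credentials configured, which this sketch does not cover.

```python
import re


def with_cloud(job_yaml: str, cloud: str) -> str:
    """Return a copy of the job YAML with the first `cloud:` value replaced."""
    pattern = re.compile(r"(?m)^([ \t]*cloud:[ \t]*)\S+")
    # A function replacement avoids escaping issues in the cloud name.
    return pattern.sub(lambda m: m.group(1) + cloud, job_yaml, count=1)


local_job = """\
name: hello-world
resources:
  cloud: local

working_dir: .
"""

print(with_cloud(local_job, "gcp"))
```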

In [14]:
import time

job_config = launcher.JobConfig.from_yaml(str(Path(tutorial_dir) / "job.yaml"))
cluster, job_status = launcher.up(job_config, cluster_name=None)

while job_status and not job_status.done:
    print("Job is running...")
    time.sleep(15)
    job_status = cluster.get_job(job_status.id)
print("Job is done!")

Job is running...
Job is running...
Job is running...
Job is done!
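
The polling loop above runs until the job finishes, however long that takes. A sketch of the same loop with a timeout follows. It relies only on the `done` attribute used above; `wait_for_job` and its parameters are hypothetical names, and the fake statuses stand in for real launcher objects.

```python
import time
from types import SimpleNamespace


def wait_for_job(get_status, poll_seconds=15.0, timeout_seconds=3600.0):
    """Poll get_status() until the job is done, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    status = get_status()
    while status is not None and not status.done:
        if time.monotonic() >= deadline:
            raise TimeoutError("Job did not finish before the timeout.")
        time.sleep(poll_seconds)
        status = get_status()
    return status


# Stand-in statuses: two "running" polls followed by a finished one.
fake_statuses = iter(
    [SimpleNamespace(done=False), SimpleNamespace(done=False), SimpleNamespace(done=True)]
)
final_status = wait_for_job(lambda: next(fake_statuses), poll_seconds=0.0, timeout_seconds=5.0)
print("Job is done!" if final_status.done else "Job is still running.")
```

In the notebook, the callable could be `lambda: cluster.get_job(job_status.id)`, which is the same call the loop above makes.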


The job created logs under our tutorial directory. Let's take a look at the directory:

In [15]:
logs_dir = f"{tutorial_dir}/logs"
# iterdir() returns a lazy generator, so materialize it to display the entries.
sorted(Path(logs_dir).iterdir())

Now let's print the contents of each log file.

In [16]:
for log_file in Path(logs_dir).iterdir():
    print(f"Log file: {log_file}")
    with open(log_file) as f:
        print(f.read())

Log file: oumi_tutorial_output/logs/2025_12_09_17_29_06_868_0.stdout
"Hello, World!"

   ____  _    _ __  __ _____
  / __ \| |  | |  \/  |_   _|
 | |  | | |  | | \  / | | |
 | |  | | |  | | |\/| | | |
 | |__| | |__| | |  | |_| |_
  \____/ \____/|_|  |_|_____|

[2025-12-09 17:29:22,275][oumi][rank0][pid:14116][MainThread][INFO]][distributed.py:616] Setting random seed to 42 on rank 0.
[2025-12-09 17:29:25,789][oumi][rank0][pid:14116][MainThread][INFO]][torch_utils.py:80] Torch version: 2.8.0+cu128. NumPy version: 2.0.2
[2025-12-09 17:29:25,789][oumi][rank0][pid:14116][MainThread][INFO]][torch_utils.py:88] CUDA version: 12.8 
[2025-12-09 17:29:25,793][oumi][rank0][pid:14116][MainThread][INFO]][torch_utils.py:91] CuDNN version: 91.0.2
[2025-12-09 17:29:25,794][oumi][rank0][pid:14116][MainThread][INFO]][torch_utils.py:124] CPU cores: 2 CUDA devices: 1
device(0)='Tesla T4' Capability: (7, 5) Memory: [Total: 14.74GiB Free: 11.99GiB Allocated: 0.0GiB Cached: 0.0GiB]
[2025-12-09 17:29:25,801][
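
Printing whole files gets noisy, so it can help to split each log line into fields. The sketch below parses the bracketed prefix seen in the output above (timestamp, logger, rank, pid, thread, level, source location). The pattern is inferred from these sample lines only and is not a documented Oumi log format; `parse_log_line` is a hypothetical helper.

```python
import re

# Pattern inferred from the log lines shown in this notebook (assumption).
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]"
    r"\[(?P<logger>[^\]]+)\]"
    r"\[(?P<rank>[^\]]+)\]"
    r"\[pid:(?P<pid>\d+)\]"
    r"\[(?P<thread>[^\]]+)\]"
    r"\[(?P<level>[A-Z]+)\]\]?"  # The sample output shows a doubled "]" here.
    r"\[(?P<source>[^\]]+)\]\s?(?P<message>.*)$"
)


def parse_log_line(line: str):
    """Return a dict of fields, or None if the line has no log prefix."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


sample = (
    "[2025-12-09 17:29:22,275][oumi][rank0][pid:14116][MainThread][INFO]]"
    "[distributed.py:616] Setting random seed to 42 on rank 0."
)
fields = parse_log_line(sample)
print(fields["level"], fields["source"], fields["message"])
```

Lines without the prefix, such as the ASCII banner, return `None` and can be skipped or attached to the previous record.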

# ⚙️ Customizing Datasets and Clusters

Oumi offers rich customization that allows users to build custom solutions on top of our existing building blocks. Several of Oumi's primary resources (Datasets, Clouds, etc.) leverage the Oumi Registry when invoked.

This registry allows users to build custom classes that function as drop-in replacements for core functionality.

For more details on registering custom datasets, see the [tutorial here](https://github.com/oumi-ai/oumi/blob/main/notebooks/Oumi%20-%20Datasets%20Tutorial.ipynb).

For a tutorial on writing a custom cloud/cluster for running jobs, see the [tutorial here](https://github.com/oumi-ai/oumi/blob/main/notebooks/Oumi%20-%20Launching%20Jobs%20on%20Custom%20Clusters.ipynb).

You can find further information about the required registry decorators [here](https://oumi.ai/docs/en/latest/api/oumi.core.registry.html#oumi.core.registry.register_cloud_builder).

# 🧭 What's Next?

Congrats on finishing this notebook! Feel free to check out our other [notebooks](https://github.com/oumi-ai/oumi/tree/main/notebooks) in the [Oumi GitHub](https://github.com/oumi-ai/oumi), and give us a star! You can also join the Oumi community over on [Discord](https://discord.gg/oumi).

📰 Want to keep up with news from Oumi? Subscribe to our [Substack](https://blog.oumi.ai/) and [YouTube](https://www.youtube.com/@Oumi_AI)!

⚡ Interested in building custom AI in hours, not months? Apply to get [early access](https://oumi-ai.typeform.com/early-access) to the Oumi Platform, or [chat with us](https://calendly.com/d/ctcx-nps-47m/chat-with-us-get-early-access-to-the-oumi-platform) to learn more!