# Introduction


Welcome to Open Unified Machine Intelligence (OUMI)! We created this platform to democratize the development of large models in the open-source world. We strongly believe that together, as a community, we can push the boundaries of AI.

This tutorial will give you a brief overview of OUMI's core functionality. We'll cover:

1. Training a model
1. Running model inference
1. Evaluating a model against common benchmarks
1. Launching jobs
1. Customizing datasets and clusters

# Prerequisites
## OUMI Installation
First, let's install OUMI. You can find detailed instructions in the [README](https://github.com/oumi-ai/oumi/blob/main/README.md). From the root of a cloned OUMI repository, it should be as simple as:

```bash
pip install -e ".[dev,train]"
```

## Creating our working directory
For our experiments, we'll use the following folder to save the model, training artifacts, and our working configs.

In [1]:
from pathlib import Path

tutorial_dir = "tour_tutorial"

Path(tutorial_dir).mkdir(parents=True, exist_ok=True)

# Training a model

OUMI supports training both custom and out-of-the-box models. Want to try out a model on HuggingFace? We can do that. Want to train your own custom PyTorch model? We've got you covered there too.

## A quick demo

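Let's try training an existing HuggingFace model. We'll use GPT2, as it's small and trains quickly. Note that the config below sets `load_pretrained_weights: False`, so training starts from randomly initialized weights rather than the pretrained checkpoint.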

OUMI uses [training configuration files](https://learning-machines.ai/docs/latest/apidoc/oumi.core.configs.html#oumi.core.configs.TrainingConfig) to specify training parameters. Below is a training config for GPT2. Let's give it a try!

In [2]:
%%writefile $tutorial_dir/train.yaml

model:
  model_name: "gpt2" # 124M params
  model_max_length: 128
  torch_dtype_str: "bfloat16"
  load_pretrained_weights: False
  trust_remote_code: True
  model_kwargs:
    disable_dropout: True

data:
  train:
    datasets:
      - dataset_name: "HuggingFaceFW/fineweb-edu"
        subset: "sample-10BT"
        split: "train"
    stream: True
    pack: True
    target_col: "text"

training:
  trainer_type: TRL_SFT
  per_device_train_batch_size: 2
  max_steps: 10

  enable_gradient_checkpointing: False
  gradient_checkpointing_kwargs:
    use_reentrant: False

  learning_rate: 6.0e-04
  lr_scheduler_type: "cosine_with_min_lr"
  lr_scheduler_kwargs:
    min_lr_rate: 0.1
  warmup_steps: 715
  adam_beta1: 0.9
  adam_beta2: 0.95
  weight_decay: 0.1

  run_name: "gpt2_pt"


Writing tour_tutorial/train.yaml
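With `stream: True` and `pack: True`, the training text is streamed and packed rather than padded: tokenized documents are concatenated into one long token stream that is then cut into fixed-length blocks, so no compute is spent on padding tokens. The plain-Python sketch below illustrates the general technique under stated assumptions (documents are already tokenized, an EOS token separates documents, and a leftover partial block is dropped). Presumably the block size corresponds to `model_max_length: 128`, but this is an illustration, not OUMI's or TRL's actual packing code.

```python
from typing import Iterable, Iterator, List


def pack_sequences(
    token_streams: Iterable[List[int]],
    block_size: int,
    eos_token_id: int,
) -> Iterator[List[int]]:
    """Illustrative sequence packing (not OUMI's implementation).

    Concatenates tokenized documents, separated by an EOS token, and yields
    blocks of exactly `block_size` tokens. A trailing partial block is dropped.
    """
    buffer: List[int] = []
    for tokens in token_streams:
        buffer.extend(tokens)
        buffer.append(eos_token_id)  # mark the document boundary
        # Emit as many full blocks as the buffer currently holds.
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]


# Toy example: three "documents" packed into blocks of 4 tokens, with 0 as EOS.
docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
blocks = list(pack_sequences(docs, block_size=4, eos_token_id=0))
print(blocks)
```

Note how the second block spans a document boundary; that is the trade-off packing makes in exchange for eliminating padding.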


In [3]:
from oumi.core.configs import TrainingConfig
from oumi.train import train

config = TrainingConfig.from_yaml(str(Path(tutorial_dir) / "train.yaml"))
config.training.output_dir = str(Path(tutorial_dir) / "output")

train(config)

[2024-09-19 09:56:43,046][oumi][rank0][pid:231378][MainThread][INFO]][train.py:124] Creating training.output_dir: tour_tutorial/output...
[2024-09-19 09:56:43,047][oumi][rank0][pid:231378][MainThread][INFO]][train.py:126] Created training.output_dir absolute path: /home/user/oumi/notebooks/tour_tutorial/output
[2024-09-19 09:56:43,048][oumi][rank0][pid:231378][MainThread][INFO]][train.py:124] Creating training.telemetry_dir: tour_tutorial/output/telemetry...
[2024-09-19 09:56:43,048][oumi][rank0][pid:231378][MainThread][INFO]][train.py:126] Created training.telemetry_dir absolute path: /home/user/oumi/notebooks/tour_tutorial/output/telemetry
[2024-09-19 09:56:43,049][oumi][rank0][pid:231378][MainThread][INFO]][torch_utils.py:48] Torch version: 2.4.1+cu121. NumPy version: 2.1.1
[2024-09-19 09:56:43,050][oumi][rank0][pid:231378][MainThread][INFO]][torch_utils.py:59] CUDA version: 12.1 CuDNN version: 90.1.0
[2024-09-19 09:56:43,939][oumi][rank0][pid:231378][MainThread][INFO]][torch_utils.


[2024-09-19 09:57:04,048][oumi][rank0][pid:231378][MainThread][INFO]][torch_profiler_utils.py:148] PROF: Torch Profiler disabled!
[2024-09-19 09:57:04,098][oumi][rank0][pid:231378][MainThread][INFO]][training.py:44] SFTConfig(output_dir='tour_tutorial/output',
          overwrite_output_dir=False,
          do_train=False,
          do_eval=False,
          do_predict=False,
          eval_strategy=<IntervalStrategy.NO: 'no'>,
          prediction_loss_only=False,
          per_device_train_batch_size=2,
          per_device_eval_batch_size=8,
          per_gpu_train_batch_size=None,
          per_gpu_eval_batch_size=None,
          gradient_accumulation_steps=1,
          eval_accumulation_steps=None,
          eval_delay=0,
          torch_empty_cache_steps=None,
          learning_rate=0.0006,
          weight_decay=0.1,
          adam_beta1=0.9,
          adam_beta2=0.95,
          adam_epsilon=1e-08,
          max_grad_norm=1.0,
          num_train_epochs=3,
          max_steps=10


[2024-09-19 09:57:07,739][oumi][rank0][pid:231378][MainThread][INFO]][train.py:312] Training is Complete.
[2024-09-19 09:57:07,740][oumi][rank0][pid:231378][MainThread][INFO]][debugging_utils.py:74] Max Memory Usage After Training: GPU memory occupied: 3687.0 MiB.
[2024-09-19 09:57:07,744][oumi][rank0][pid:231378][MainThread][INFO]][debugging_utils.py:106] Device Temperature After Training: GPU temperature: 39.0 C.
[2024-09-19 09:57:07,745][oumi][rank0][pid:231378][MainThread][INFO]][train.py:319] Saving final state...
[2024-09-19 09:57:07,746][oumi][rank0][pid:231378][MainThread][INFO]][train.py:322] Saving final model...
[2024-09-19 09:57:07,958][oumi][rank0][pid:231378][MainThread][INFO]][hf_trainer.py:56] Model has been saved at tour_tutorial/output.


Congratulations, you've trained your first model using OUMI!

You can also train your own custom PyTorch model. We cover that in depth in our [Finetuning Tutorial](https://github.com/oumi-ai/oumi/blob/main/notebooks/OUMI%20-%20Finetuning%20Tutorial.ipynb).
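One thing worth noticing in `train.yaml`: `warmup_steps: 715` is much larger than `max_steps: 10`, so, assuming standard warmup semantics, this short demo run never leaves the learning-rate warmup phase. The sketch below is a simplified re-implementation of a linear-warmup-then-cosine-decay-to-minimum schedule, assuming `cosine_with_min_lr` follows that usual shape; it is not the Transformers code. Plugging in the config values shows the learning rate stays at or below 6e-4 × 10/715 ≈ 8.4e-6, about 1.4% of the configured peak.

```python
import math


def lr_at_step(
    step: int, base_lr: float, warmup_steps: int, total_steps: int, min_lr_rate: float
) -> float:
    """Simplified schedule: linear warmup, then cosine decay to base_lr * min_lr_rate."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 toward base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: progress goes from 0 to 1 over the remaining steps (clamped at 1).
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return base_lr * (min_lr_rate + (1.0 - min_lr_rate) * cosine)


# Values from train.yaml: learning_rate, warmup_steps, max_steps, min_lr_rate.
for step in (0, 5, 10):
    lr = lr_at_step(step, base_lr=6.0e-4, warmup_steps=715, total_steps=10, min_lr_rate=0.1)
    print(f"step {step:2d}: lr = {lr:.2e}")
```

For a longer run where you want the model to actually learn, you would typically keep `warmup_steps` well below `max_steps`.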

# Model Inference

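Now that you've trained a model, let's run inference. Since we trained from randomly initialized weights for only 10 steps, don't expect coherent output.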

In [4]:
%%writefile $tutorial_dir/train_inference_config.yaml

model:
  model_name: "tour_tutorial/output"
  trust_remote_code: true
  torch_dtype_str: "half"
  device_map: "auto"

generation:
  max_new_tokens: 128
  batch_size: 1

Writing tour_tutorial/train_inference_config.yaml


In [11]:
from oumi.core.configs import InferenceConfig
from oumi.infer import infer

config = InferenceConfig.from_yaml(
    str(Path(tutorial_dir) / "train_inference_config.yaml")
)

input_text = (
    "Remember that we didn't train for long, so the results might not be great."
)

results = infer(config.model, config.generation, [input_text])

print(results[0])

[2024-09-19 10:04:03,323][oumi][rank0][pid:231378][MainThread][INFO]][models.py:144] Building model using device_map: auto (DeviceRankInfo(world_size=1, rank=0, local_world_size=1, local_rank=0))...
[2024-09-19 10:04:03,324][oumi][rank0][pid:231378][MainThread][INFO]][models.py:240] Using model class: <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'> to instantiate model.
2024-09-19:10:04:03,340 INFO     [modeling.py:1086] We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Generating Model Responses:   0%|                                                                 | 0/1 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Generating Model Responses: 100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.28it/s]

 stream stream stream stream stream stream stream stream stream stream stream stream stream stream stream stream stream stream hepat hepat hepat hepat hepat hepat hepat hepat hepat hepat hepat hepat hepat hepat hepat hepat hepat hepat hepat circulate circulate circulate circulate circulate circulate circulate circulate circulate rapist rapist rapist Mult Mult Mult Mult Mult combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination combination




We can also run inference with the original pretrained GPT2 weights by changing `model_name` in our config to `gpt2`:

In [10]:
pretrained_config = InferenceConfig.from_yaml(
    str(Path(tutorial_dir) / "train_inference_config.yaml")
)
pretrained_config.model.model_name = "gpt2"

input_text = "Input for the pretrained model: What is your name? "

results = infer(pretrained_config.model, pretrained_config.generation, [input_text])

print(results[0])

[2024-09-19 10:03:55,553][oumi][rank0][pid:231378][MainThread][INFO]][models.py:144] Building model using device_map: auto (DeviceRankInfo(world_size=1, rank=0, local_world_size=1, local_rank=0))...
[2024-09-19 10:03:55,651][oumi][rank0][pid:231378][MainThread][INFO]][models.py:240] Using model class: <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'> to instantiate model.
2024-09-19:10:03:55,666 INFO     [modeling.py:1086] We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Generating Model Responses:   0%|                                                                 | 0/1 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Generating Model Responses: 100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.30it/s]

The first time I saw the new "The Walking Dead" trailer, I was so excited. I was so excited to see the first trailer for the upcoming season of the AMC series. I was so excited to see the first trailer for the upcoming season of the AMC series.

I was so excited to see the first trailer for the upcoming season of the AMC series. I was so excited to see the first trailer for the upcoming season of the AMC series.

I was so excited to see the first trailer for the upcoming season of the AMC series. I was so excited to see the first trailer for the upcoming season of the
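The cell above reloads the YAML to get a fresh config instead of editing `config` from the previous cell, so the two inference runs don't interfere. If you'd rather derive one config from another in memory, a deep copy avoids accidentally mutating shared nested objects. The sketch below uses hypothetical stand-in dataclasses, not OUMI's config classes; whether a given OUMI config object deep-copies cleanly is an assumption worth checking.

```python
import copy
from dataclasses import dataclass


# Hypothetical stand-ins for nested config objects; these are NOT OUMI's classes.
@dataclass
class ModelSettings:
    model_name: str
    torch_dtype_str: str = "half"


@dataclass
class InferenceSettings:
    model: ModelSettings
    max_new_tokens: int = 128


trained_cfg = InferenceSettings(model=ModelSettings(model_name="tour_tutorial/output"))

# copy.copy would share the nested ModelSettings object, so editing it would
# silently change trained_cfg too. deepcopy duplicates the nested objects.
pretrained_cfg = copy.deepcopy(trained_cfg)
pretrained_cfg.model.model_name = "gpt2"

print(trained_cfg.model.model_name)
print(pretrained_cfg.model.model_name)
```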





# Evaluating a model against common benchmarks

You can use OUMI to evaluate both pretrained and fine-tuned models against standard benchmarks. For example, let's evaluate the pretrained GPT2 model on HellaSwag:

In [12]:
%%writefile $tutorial_dir/eval.yaml

model:
  model_name: "gpt2"
  trust_remote_code: True

data:
  datasets:
    - dataset_name: "hellaswag"

generation:
  batch_size: 0  # This will let LM HARNESS decide.

evaluation_framework: LM_HARNESS

Writing tour_tutorial/eval.yaml
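HellaSwag is a four-way multiple-choice task: for each context, the harness computes the log-likelihood of every candidate ending and checks whether the correct one scores highest. The sketch below shows how the two accuracy metrics reported further down (`acc` and `acc_norm`) differ, assuming, as we understand lm-evaluation-harness's multiple-choice scoring, that `acc_norm` divides each log-likelihood by the ending's length in characters. The log-likelihood values are made up for illustration.

```python
from typing import List, Tuple


def score_multiple_choice(
    loglikelihoods: List[float], endings: List[str], gold: int
) -> Tuple[float, float]:
    """Return (acc, acc_norm) for one multiple-choice example.

    acc: 1.0 if the ending with the highest total log-likelihood is the gold one.
    acc_norm: same, but each log-likelihood is first divided by the ending's
    length in characters, so longer endings are not penalized just for length.
    """
    indices = range(len(endings))
    best_raw = max(indices, key=lambda i: loglikelihoods[i])
    best_norm = max(indices, key=lambda i: loglikelihoods[i] / len(endings[i]))
    return float(best_raw == gold), float(best_norm == gold)


# Made-up numbers: the long correct ending (index 1) loses on raw
# log-likelihood but wins after length normalization.
endings = [" runs.", " jumps over the fence and lands.", " sits.", " eats."]
lls = [-6.0, -12.0, -7.0, -8.0]
print(score_multiple_choice(lls, endings, gold=1))
```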


In [13]:
from oumi.core.configs import EvaluationConfig
from oumi.evaluate import evaluate

eval_config = EvaluationConfig.from_yaml(str(Path(tutorial_dir) / "eval.yaml"))
# Uncomment the following line to run evals against the V1 HuggingFace Leaderboard.
# This may take a while.
# eval_config.data.datasets[0].dataset_name = "huggingface_leaderboard_v1"

evaluate(eval_config)

2024-09-19:10:04:21,328 INFO     [evaluator.py:152] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
2024-09-19:10:04:21,328 INFO     [evaluator.py:176] Initializing hf model, with arguments: {'pretrained': 'gpt2', 'trust_remote_code': True, 'parallelize': False}
2024-09-19:10:04:21,329 INFO     [huggingface.py:170] Using device 'cuda:0'



The repository for hellaswag contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/hellaswag.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N]  y



2024-09-19:10:04:44,461 INFO     [evaluator.py:261] Setting fewshot random generator seed to 1234
2024-09-19:10:04:44,462 INFO     [task.py:411] Building contexts for hellaswag on rank 0...
100%|███████████████████████████████████████████████████████████████████████████| 10042/10042 [00:01<00:00, 6011.29it/s]
2024-09-19:10:04:46,709 INFO     [evaluator.py:438] Running loglikelihood requests
Running loglikelihood requests: 100%|████████████████████████████████████████████| 40168/40168 [03:17<00:00, 203.64it/s]
[2024-09-19 10:08:17,897][oumi][rank0][pid:231378][MainThread][INFO]][evaluate.py:199] hellaswag's metric dictionary is {'acc,none': 0.2894841665006971,
 'acc_norm,none': 0.3110934076877116,
 'acc_norm_stderr,none': 0.004619948037222892,
 'acc_stderr,none': 0.004525960965551725,
 'alias': 'hellaswag',
 'elapsed_time_sec': 236.5706021785736}
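The `*_stderr` values indicate how much the accuracy would vary from sampling alone. Assuming they are the standard error of the mean of per-example 0/1 scores over the 10,042 HellaSwag contexts built in the logs above (using the sample standard deviation), a short calculation reproduces the reported values and gives an approximate 95% interval for `acc_norm` of about 0.302 to 0.320, comfortably above the 0.25 random-guess baseline for four endings. The 40,168 log-likelihood requests in the logs are likewise just 4 candidate endings × 10,042 contexts.

```python
import math


def accuracy_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n per-example 0/1 scores.

    Uses the sample standard deviation (n - 1 denominator), which for 0/1
    scores simplifies to sqrt(acc * (1 - acc) / (n - 1)).
    """
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


n_examples = 10042  # contexts built for hellaswag (see the logs above)
n_endings = 4       # candidate endings per HellaSwag example
assert n_examples * n_endings == 40168  # matches "Running loglikelihood requests"

acc_norm = 0.3110934076877116  # from the metric dictionary above
se = accuracy_stderr(acc_norm, n_examples)
low, high = acc_norm - 1.96 * se, acc_norm + 1.96 * se  # ~95% normal interval
print(f"acc_norm = {acc_norm:.4f} ± {1.96 * se:.4f} -> [{low:.4f}, {high:.4f}]")
```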


# Launching Jobs

Often you'll want to run tasks (training, evaluation, etc.) on remote hardware that's better suited to them. OUMI can handle this for you by launching jobs on various compute clusters. For more information about running jobs, see our [Running Jobs Remotely tutorial](https://github.com/oumi-ai/oumi/blob/main/notebooks/OUMI%20-%20Running%20Jobs%20Remotely.ipynb). For running jobs on custom clusters, see our [Launching Jobs on Custom Clusters tutorial](https://github.com/oumi-ai/oumi/blob/main/notebooks/OUMI%20-%20Launching%20Jobs%20on%20Custom%20Clusters.ipynb).


Today, OUMI supports running jobs on several cloud provider platforms.

Let's first write a simple job config. After that, we'll call `launcher.which_clouds()` to get the latest list of supported clouds:

In [14]:
%%writefile $tutorial_dir/job.yaml

name: hello-world
resources:
  cloud: local

# Upload working directory to remote.
working_dir: .

envs:
  TEST_ENV_VARIABLE: '"Hello, World!"'
  OUMI_LOGGING_DIR: "tour_tutorial/logs"


run: |
  set -e  # Exit if any command failed.

  echo "$TEST_ENV_VARIABLE"
  oumi-train -c tour_tutorial/train.yaml


Writing tour_tutorial/job.yaml


In [15]:
import oumi.launcher as launcher

print("Supported Clouds in OUMI:")
for cloud in launcher.which_clouds():
    print(cloud)

Supported Clouds in OUMI:
local
polaris
runpod
gcp
lambda


Let's run a simple "Hello World" job locally to demonstrate how to use the OUMI job launcher. This job will echo `"Hello, World!"` and then run the same GPT2 training job we executed above.

In [16]:
import time

job_config = launcher.JobConfig.from_yaml(str(Path(tutorial_dir) / "job.yaml"))
cluster, job = launcher.up(job_config, cluster_name=None)

while job and not job.done:
    print("Job is running...")
    time.sleep(5)
    job = cluster.get_job(job.id)
print("Job is done!")

Job is running...
Job is running...
Job is running...
Job is running...
Job is running...
Job is running...
Job is running...
Job is done!
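The polling loop above waits forever if a job hangs. A small variation adds a timeout. This is a generic standard-library sketch: `wait_until_done` is a hypothetical helper, not part of `oumi.launcher`, and the commented usage relies only on `cluster.get_job(...)` and `job.done` as used in the cell above.

```python
import time
from typing import Callable


def wait_until_done(
    is_done: Callable[[], bool],
    poll_interval_s: float = 5.0,
    timeout_s: float = 3600.0,
) -> bool:
    """Hypothetical helper: poll `is_done()` until True or `timeout_s` seconds pass.

    Returns True if the job finished and False if we gave up waiting.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_done():
            return True
        time.sleep(poll_interval_s)
    return is_done()  # one last check after the deadline


# Possible usage with the objects from the cell above, treating a missing
# job as finished just like the original loop does:
#   finished = wait_until_done(
#       lambda: (j := cluster.get_job(job.id)) is None or j.done
#   )

# Self-contained demo with a fake job that completes on the third poll.
polls = {"count": 0}


def fake_job_done() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


print(wait_until_done(fake_job_done, poll_interval_s=0.01, timeout_s=5.0))
```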


The job wrote its logs to `tour_tutorial/logs`, the directory set by `OUMI_LOGGING_DIR` in `job.yaml`. Let's take a look:

In [17]:
logs_dir = Path(tutorial_dir) / "logs"
for log_file in logs_dir.iterdir():
    print(f"Log file: {log_file}")
    with open(log_file, "r") as f:
        print(f.read())

Log file: tour_tutorial/logs/2024_09_19_10_10_14_180_0.stdout
"Hello, World!"
{'train_runtime': 2.4145, 'train_samples_per_second': 8.283, 'train_steps_per_second': 4.142, 'train_loss': 10.95625, 'epoch': 1.0}

Log file: tour_tutorial/logs/2024_09_19_10_10_14_180_0.stderr
[2024-09-19 10:10:17,059][oumi][rank0][pid:235563][MainThread][INFO]][train.py:176] Setting random seed to 42 on rank 0.
[2024-09-19 10:10:17,060][oumi][rank0][pid:235563][MainThread][INFO]][train.py:124] Creating training.telemetry_dir: output/telemetry...
[2024-09-19 10:10:17,060][oumi][rank0][pid:235563][MainThread][INFO]][train.py:126] Created training.telemetry_dir absolute path: /home/user/oumi/notebooks/output/telemetry
[2024-09-19 10:10:17,060][oumi][rank0][pid:235563][MainThread][INFO]][torch_utils.py:48] Torch version: 2.4.1+cu121. NumPy version: 2.1.1
[2024-09-19 10:10:17,061][oumi][rank0][pid:235563][MainThread][INFO]][torch_utils.py:59] CUDA version: 12.1 CuDNN version: 90.1.0
[2024-09-19 10:10:17,061][ou
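The stdout log reports a `train_loss` of about 10.96. For context, a model that assigns equal probability to every token in GPT2's 50,257-token vocabulary has a cross-entropy loss of ln(50257) ≈ 10.82 nats. After 10 steps from randomly initialized weights (`load_pretrained_weights: False`), the model is still no better than uniform guessing, which is consistent with the repetitive inference output earlier. The quick check below is a back-of-the-envelope calculation, not something OUMI reports.

```python
import math

vocab_size = 50257  # GPT2 vocabulary size (eos_token_id is 50256 in the logs above)
uniform_loss = math.log(vocab_size)  # cross-entropy of a uniform prediction, in nats
reported_train_loss = 10.95625       # from the job's stdout log

print(f"uniform-guess loss:  {uniform_loss:.3f}")
print(f"reported train loss: {reported_train_loss:.3f}")
```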

# Customizing datasets and clusters

OUMI offers rich customization that allows users to build custom solutions on top of our existing building blocks. Several of OUMI's core resources (datasets, clouds, etc.) are looked up through the OUMI registry when invoked.

This registry allows users to build custom classes that function as drop-in replacements for core functionality.

For more details on registering custom datasets, see the [demo here](https://github.com/oumi-ai/oumi/blob/main/USAGE.md#6-custom-datasets).

For a tutorial on writing a custom cloud/cluster for running jobs, see the [tutorial here](https://github.com/oumi-ai/oumi/blob/main/notebooks/OUMI%20-%20Launching%20Jobs%20on%20Custom%20Clusters.ipynb).

You can find further information about the required registry decorators [here](https://learning-machines.ai/docs/latest/apidoc/oumi.core.html#oumi.core.registry.register_cloud_builder).

# What's next?

Now that you've completed our tour, you're ready to tackle our other [notebook guides](https://github.com/oumi-ai/oumi/tree/main/notebooks). 

Make sure you also take a look at our [Usage guide](https://github.com/oumi-ai/oumi/blob/main/USAGE.md) for an overview of our CLI.
