# Train Models Using LeRobot on MI300X

This guide walks you through setting up an environment for training imitation learning policies with the LeRobot library on a DigitalOcean (DO) instance equipped with AMD MI300X GPUs and ROCm.

## ⚙️ Requirements
- A Hugging Face dataset repo ID containing your training data (`--dataset.repo_id=${HF_USER}/${DATASET_NAME}`).
  If you don’t have a Hugging Face account yet, you can sign up [here](https://huggingface.co/join). After signing up, create an access token [here](https://huggingface.co/settings/tokens).
- A Weights & Biases (wandb) account to enable training visualization and to upload your training evidence to our GitHub.
  You can sign up for wandb [here](https://wandb.ai/signup) and create an API key [here](https://wandb.ai/authorize).
- Access to a DO instance with an AMD MI300X GPU.


## Verify ROCm and GPU availability
This cell uses PyTorch to check that the AMD GPU is visible. The expected output is:
```
CUDA compatible device availability: True
device name [0]: AMD Instinct MI300X VF
```

In [1]:
import torch
# ROCm builds of PyTorch report AMD GPUs through the torch.cuda API.
print('CUDA compatible device availability:', torch.cuda.is_available())
print('device name [0]:', torch.cuda.get_device_name(0))


CUDA compatible device availability: True
device name [0]: AMD Instinct MI300X VF
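Because ROCm builds of PyTorch reuse the `torch.cuda` API, `is_available()` alone does not show which backend is active. The sketch below is one way to report it, assuming that `torch.version.hip` holds a version string on ROCm builds and is `None` otherwise. The helper name `describe_backend` is hypothetical and not part of PyTorch or LeRobot.

```python
def describe_backend():
    """Summarize which PyTorch backend (ROCm/HIP or CUDA) is active."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # Assumption: a version string on ROCm builds, None on other builds.
        "hip_version": getattr(torch.version, "hip", None),
        "gpu_available": torch.cuda.is_available(),
        "devices": [],
    }
    if info["gpu_available"]:
        # List every visible GPU, not only device 0.
        info["devices"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return info


print(describe_backend())
```

On the instance used in this guide, `hip_version` should be non-empty and `devices` should list the MI300X.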


## Install FFmpeg 7.x
This cell uses `apt` to install ffmpeg 7.x for LeRobot.

In [2]:
!add-apt-repository ppa:ubuntuhandbook1/ffmpeg7 -y  # add the PPA that provides FFmpeg 7.x
!apt update && apt install ffmpeg -y

Repository: 'Types: deb
URIs: https://ppa.launchpadcontent.net/ubuntuhandbook1/ffmpeg7/ubuntu/
Suites: noble
Components: main
'
Description:
unofficial build for FFmpeg 7 for Ubuntu 22.04 | 24.04, backport from Debian's deb.multimedia.org repository

If the packages here are helpful, you may buy me a coffee:

         https://ko-fi.com/ubuntuhandbook1
More info: https://launchpad.net/~ubuntuhandbook1/+archive/ubuntu/ffmpeg7
Adding repository.
Hit:1 https://repo.radeon.com/amdgpu/30.10/ubuntu jammy InRelease
Hit:2 https://repo.radeon.com/rocm/apt/7.0 jammy InRelease                     
Hit:3 https://repo.radeon.com/graphics/7.0/ubuntu jammy InRelease              
Hit:4 http://archive.ubuntu.com/ubuntu noble InRelease                         
Get:5 http://security.ubuntu.com/ubuntu noble-security InRelease [126 kB]      
Get:6 http://archive.ubuntu.com/ubuntu noble-updates InRelease [126 kB]        
Get:7 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu noble InRelease [17.8 kB]
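After the install finishes, it can be worth confirming that the `ffmpeg` on your `PATH` really is a 7.x build. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the regular expression assumes the usual `ffmpeg version 7.x...` banner, so it may return `None` for unusual builds.

```python
import re
import shutil
import subprocess


def parse_ffmpeg_major(version_text):
    """Extract the major version from `ffmpeg -version` output, or None."""
    match = re.search(r"ffmpeg version n?(\d+)\.", version_text)
    return int(match.group(1)) if match else None


def ffmpeg_major_version():
    """Return the major version of the ffmpeg on PATH, or None if absent."""
    if shutil.which("ffmpeg") is None:
        return None
    result = subprocess.run(
        ["ffmpeg", "-version"], capture_output=True, text=True, check=False
    )
    return parse_ffmpeg_major(result.stdout)


print("ffmpeg major version:", ffmpeg_major_version())
```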


## Install LeRobot v0.4.2
This cell clones the `lerobot` repository from Hugging Face's GitHub organization, checks out the `v0.4.2` tag, and installs the package in editable mode. To install the additional dependencies needed for training SmolVLA or Pi models, refer to the [official LeRobot documentation](https://huggingface.co/docs/lerobot/index).


In [3]:
!git clone https://github.com/huggingface/lerobot.git
!cd lerobot && git checkout -b v0.4.2 v0.4.2  # pin to the v0.4.2 tag so everyone uses the same version
!cd lerobot && pip install -e .

Cloning into 'lerobot'...
remote: Enumerating objects: 44252, done.
remote: Counting objects: 100% (281/281), done.
remote: Compressing objects: 100% (173/173), done.
remote: Total 44252 (delta 214), reused 113 (delta 105), pack-reused 43971 (from 4)
Receiving objects: 100% (44252/44252), 220.73 MiB | 74.92 MiB/s, done.
Resolving deltas: 100% (28517/28517), done.
Switched to a new branch 'v0.4.2'
Obtaining file:///workspace/lerobot
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting datasets<4.2.0,>=4.0.0 (from lerobot==0.4.2)
  Downloading datasets-4.1.1-py3-none-any.whl.metadata (18 kB)
Collecting diffusers<0.36.0,>=0.27.2 (from lerobot==0.4.2)
  Downloading diffusers-0.35.2-py3-none-any.whl.metadata (20 kB)
Collecting huggingface-hub<0.36.0,>=0.34.2 (from hugg
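To confirm that the editable install picked up the intended release, you can query the installed package metadata. This is a sketch using the standard library's `importlib.metadata`. The helper name `installed_version` and the `EXPECTED` constant are illustrative assumptions.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


EXPECTED = "0.4.2"  # assumption: the tag checked out in the cell above
found = installed_version("lerobot")
if found is None:
    print("lerobot is not installed in this environment")
elif found != EXPECTED:
    print(f"lerobot {found} is installed, expected {EXPECTED}")
else:
    print(f"lerobot {found} is installed")
```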

## Weights & Biases login
This cell installs wandb and logs in to Weights & Biases to enable experiment tracking and logging.

In [None]:
!pip install wandb
import wandb
wandb.login(key="")  # paste your W&B API key between the quotes

## Log in to the Hugging Face Hub

In [20]:
from huggingface_hub import login
login(token="")  # paste your Hugging Face access token between the quotes
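Pasting tokens directly into a notebook makes them easy to leak when the notebook is shared. One alternative is to read them from environment variables, sketched below. The helper names `read_secret` and `login_all` are hypothetical, and the sketch assumes you have exported `WANDB_API_KEY` and `HF_TOKEN` in the environment that launched Jupyter.

```python
import os


def read_secret(name):
    """Return the value of environment variable `name`, or raise if unset."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable first")
    return value


def login_all():
    """Log in to W&B and the Hugging Face Hub with tokens from the environment."""
    # Imported lazily so the function can be defined before the packages exist.
    import wandb
    from huggingface_hub import login

    wandb.login(key=read_secret("WANDB_API_KEY"))
    login(token=read_secret("HF_TOKEN"))
```

Calling `login_all()` once per session would then replace the two login cells above.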

## Start Training Models with LeRobot

This cell uses the `lerobot-train` CLI from the `lerobot` library to train a robot control policy.

Make sure to adjust the following arguments to your setup:

1. `--dataset.repo_id=YOUR_HF_USERNAME/YOUR_DATASET`:  
   Replace this with the Hugging Face Hub repo ID where your dataset is stored, e.g., `lerobot/svla_so100_pickplace`.

2. `--policy.type=act`:  
   Specifies the policy configuration to use. `act` refers to [configuration_act.py](../lerobot/common/policies/act/configuration_act.py), which will automatically adapt to your dataset’s setup (e.g., number of motors and cameras).

3. `--output_dir=outputs/train/...`:  
   Directory where training logs and model checkpoints will be saved.

4. `--job_name=...`:  
   A name for this training job, used for logging and Weights & Biases. The name typically includes the model type (e.g., `act`, `smolvla`), the dataset name, and additional descriptive tags.

5. `--policy.device=cuda`:  
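   Use `cuda` when training on an AMD or NVIDIA GPU. ROCm builds of PyTorch expose AMD GPUs through the `cuda` device type, so no AMD-specific value is needed.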

6. `--wandb.enable=true`:  
   Enables Weights & Biases for visualizing training progress. You must be logged in via `wandb login` before running this.

7. `--policy.push_to_hub=true`:  
   Enables automatic uploading of the trained policy to the Hugging Face Hub. When this is `true`, you must also specify `--policy.repo_id` (e.g., `${HF_USER}/${REPO_NAME}`).
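The arguments above can also be assembled in Python, which keeps the output directory, job name, and Hub repo consistent across runs. This is a minimal sketch, not part of LeRobot. The function `build_train_command` and its defaults are assumptions, and it only emits flags already described in this guide.

```python
import shlex


def build_train_command(dataset_repo_id, policy_repo_id, job_name,
                        policy_type="act", steps=20000, batch_size=8,
                        save_freq=5000, device="cuda", use_wandb=True):
    """Assemble a lerobot-train invocation as an argument list."""
    args = {
        "dataset.repo_id": dataset_repo_id,
        "batch_size": batch_size,
        "save_freq": save_freq,
        "steps": steps,
        "output_dir": f"outputs/train/{policy_type}_{job_name}",
        "job_name": job_name,
        "policy.device": device,
        "policy.type": policy_type,
        # Push to the Hub only when a target repo was given.
        "policy.push_to_hub": "true" if policy_repo_id else "false",
        "wandb.enable": "true" if use_wandb else "false",
    }
    if policy_repo_id:
        args["policy.repo_id"] = policy_repo_id
    return ["lerobot-train"] + [f"--{key}={value}" for key, value in args.items()]


cmd = build_train_command(
    "YOUR_HF_USERNAME/YOUR_DATASET",
    "YOUR_HF_USERNAME/YOUR_POLICY",
    "my_first_run",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, or the printed line can be pasted into a terminal.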

In [19]:
!lerobot-train \
  --dataset.repo_id=masato-ka/greate-akihabara-take2-misson1 \
  --batch_size=8 \
  --save_freq=5000 \
  --steps=20000 \
  --output_dir=outputs/train/act_greate_akihabara_misson1 \
  --job_name=greate_akihabara_misson1 \
  --policy.device=cuda \
  --policy.type=act \
  --policy.push_to_hub=true \
  --policy.repo_id=masato-ka/act-greate-akihabara-take2-misson1 \
  --wandb.enable=true

INFO 2025-12-05 14:16:55 ot_train.py:163 {'batch_size': 8,
 'checkpoint_path': None,
 'dataset': {'episodes': None,
             'image_transforms': {'enable': False,
                                  'max_num_transforms': 3,
                                  'random_order': False,
                                  'tfs': {'affine': {'kwargs': {'degrees': [-5.0,
                                                                            5.0],
                                                                'translate': [0.05,
                                                                              0.05]},
                                                     'type': 'RandomAffine',
                                                     'weight': 1.0},
                                          'brightness': {'kwargs': {'brightness': [0.8,
                                                                                   1.2]},
                                                         't

**Notes**:

- If using a local dataset, add `--dataset.root=/path/to/dataset`.
- Adjust `--batch_size` and `--steps` based on your hardware and dataset.
- Model checkpoints, logs, and training plots will be saved to the specified `--output_dir`.
- Training progress is visualized in your wandb dashboard.
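When adjusting `--steps`, `--batch_size`, and `--save_freq`, a little arithmetic helps estimate how many checkpoints a run will write and how many samples it will draw. The sketch below assumes a checkpoint is written every `save_freq` steps and once more at the final step. Treat that schedule as an assumption and verify it against your own `checkpoints/` directory.

```python
def checkpoint_steps(steps, save_freq):
    """Steps at which a checkpoint is expected under the assumed schedule."""
    marks = list(range(save_freq, steps + 1, save_freq))
    # Assumption: the final step is always checkpointed.
    if not marks or marks[-1] != steps:
        marks.append(steps)
    return marks


def samples_drawn(steps, batch_size):
    """Total number of training samples drawn over the whole run."""
    return steps * batch_size


# Values from the training command in this guide.
print(checkpoint_steps(20000, 5000))  # [5000, 10000, 15000, 20000]
print(samples_drawn(20000, 8))        # 160000
```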


## Upload Checkpoints to Hugging Face
After training finishes, upload the last checkpoint to the Hugging Face Hub. See the [LeRobot README](https://github.com/huggingface/lerobot/blob/v0.4.0/README.md#add-a-pretrained-policy) for details.

In [21]:
!huggingface-cli upload masato-ka/act-greate-akihabara-take2-misson1 outputs/train/act_greate_akihabara_misson1/checkpoints/last/pretrained_model
# e.g. huggingface-cli upload ${HF_USER}/act_so101_3cube_1ksteps \
#  outputs/train/act_so101_3cube_1ksteps/checkpoints/last/pretrained_model

Start hashing 7 files.
Finished hashing 7 files.
Processing Files (0 / 0)      : |                  |  0.00B /  0.00B
New Data Upload               : |                  |  0.00B /  0.00B
  ...d_model/model.safetensors:   0%|              |  782kB /  207MB
Processing Files (0 / 1)      :   0%|              |  782kB /  207MB,   ???B/s
Processing Files (0 / 1)      :   1%|▏             | 2.47MB /  207MB, 8.48MB/s
New Data Upload               :   1%|▏             | 1.69MB /  134MB, 8.48MB/s
Processing Files (0 / 1)      :  10%|█▎            | 19.9MB /  207MB, 47.8MB/s
New Data Upload               :   9%|█▎            | 19.1MB /  201MB, 47.8MB/s
Processing Files (0 / 1)      :  43%|█████▉        | 88.2MB /  207MB,  146MB/s
New Data Upload               :  42%|█████▉        | 87.4MB /  206MB,  146MB/s
  ...zer_processor.safetensors:  96%|█████████████▍| 7.25kB / 7.54kB
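The same upload can be done from Python with `huggingface_hub`, which makes it easier to check that the checkpoint exists before uploading. This is a sketch under stated assumptions. The helper names are hypothetical, and the directory layout is taken from the `huggingface-cli upload` command above.

```python
from pathlib import Path


def last_checkpoint_dir(output_dir):
    """Path of the last saved policy inside a training output directory."""
    return Path(output_dir) / "checkpoints" / "last" / "pretrained_model"


def upload_last_checkpoint(output_dir, repo_id):
    """Upload the last checkpoint folder to a model repo on the Hub."""
    folder = last_checkpoint_dir(output_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"No checkpoint found at {folder}")

    # Imported lazily so the path check works without huggingface_hub installed.
    from huggingface_hub import HfApi

    return HfApi().upload_folder(
        folder_path=str(folder), repo_id=repo_id, repo_type="model"
    )
```

For example, `upload_last_checkpoint("outputs/train/<job name>", "${HF_USER}/<repo name>")` with the placeholders replaced by your own values. You must be logged in to the Hub first.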




## Miscellaneous
1. Once the environment is set up, you can open a terminal session for training by navigating to `File → New Launcher → Other → Terminal`.
2. You can also upload your datasets to the container by clicking the `Upload Files` button in the left pane.

## Q&A
1. If you encounter an error like:
   ```
   FileExistsError: Output directory outputs/train/act_so101_3cube_1ksteps already exists and resume is False. Please change your output directory so that outputs/train/act_so101_3cube_1ksteps is not overwritten. 
   ```
   Remove the existing directory before proceeding:

In [None]:
!rm -fr outputs/train/act_so101_3cube_1ksteps
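Since `rm -fr` deletes whatever path it is given, a guarded Python version can reduce the risk of removing the wrong directory. The sketch below refuses to delete anything outside a chosen root. The helper name `clear_output_dir` and the default root `outputs/train` are assumptions for illustration.

```python
import shutil
from pathlib import Path


def clear_output_dir(path, root="outputs/train"):
    """Delete `path` only if it lies inside `root`; return True if removed."""
    target = Path(path).resolve()
    allowed = Path(root).resolve()
    # Refuse paths that are not strictly below the allowed root.
    if allowed not in target.parents:
        raise ValueError(f"Refusing to delete {target}: not inside {allowed}")
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False
```

For the error above, the call would be `clear_output_dir("outputs/train/act_so101_3cube_1ksteps")`. Alternatively, pass a different `--output_dir` to keep the earlier run.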

2. When running models other than ACT, ensure you install the required additional dependencies for those models.

In [None]:
# For SmolVLA
!cd lerobot && pip install -e ".[smolvla]"
# For Pi
!cd lerobot && pip install -e ".[pi]"

3. To resume training from the last checkpoint, run the command below, replacing the placeholders in angle brackets with your own values:

In [None]:
!lerobot-train \
  --resume=true \
  --config_path=outputs/train/<job name>/checkpoints/last/pretrained_model/train_config.json \
  --steps=<new total steps>
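The resume command can be generated from the job's output directory so that the path to `train_config.json` is not typed by hand. This is a minimal sketch that only reuses the flags shown above. The function name `build_resume_command` is hypothetical.

```python
import shlex
from pathlib import Path


def build_resume_command(job_dir, total_steps):
    """Build a lerobot-train resume invocation for a job's output directory."""
    config = (
        Path(job_dir) / "checkpoints" / "last" / "pretrained_model"
        / "train_config.json"
    )
    return [
        "lerobot-train",
        "--resume=true",
        f"--config_path={config.as_posix()}",
        # The new total number of steps, not the number of extra steps.
        f"--steps={total_steps}",
    ]


print(shlex.join(build_resume_command("outputs/train/my_job", 40000)))
```

Here `outputs/train/my_job` and `40000` are placeholder values. Substitute the output directory and step count of your own run.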