# Fine-tuning Isaac GR00T N1.5 for LeRobot SO-101 Arm

> _This tutorial is based on the [official Hugging Face blog post](https://huggingface.co/blog/nvidia/gr00t-n1-5-so101-tuning) by the NVIDIA team._

<div style="border: 4px solid #f84773ff; padding: 12px; margin: 16px 0; border-radius: 4px;">
<strong>⚠️ Note:</strong> You don't need to run any of the commands here. We have already fine-tuned the model for you, so just get familiar with the fine-tuning steps.
</div>

## Introduction

This notebook provides a comprehensive tutorial on how to post-train (fine-tune) [Isaac GR00T N1.5](https://huggingface.co/nvidia/GR00T-N1.5-3B) using teleoperation data from a single SO-101 robot arm.

[NVIDIA Isaac GR00T](https://developer.nvidia.com/isaac/gr00t) (Generalist Robot 00 Technology) is a research and development platform for building robot foundation models and data pipelines, designed to accelerate the creation of intelligent, adaptable robots.

Isaac GR00T N1.5 is the first major update to Isaac GR00T N1, the world's first open foundation model for generalized humanoid robot reasoning and skills. This cross-embodiment model processes multimodal inputs, including language and images, to perform manipulation tasks across diverse environments. It is adaptable through post-training for specific embodiments, tasks, and environments.

![GR00T Demo](https://cdn-uploads.huggingface.co/production/uploads/67b8da81d01134f89899b4a7/rbIZbAfvia_oWaztGlRbu.gif)

## Step 1: Dataset Preparation

Users can fine-tune GR00T N1.5 with any LeRobot dataset. For this tutorial, we'll use the [table cleanup task](https://huggingface.co/spaces/lerobot/visualize_dataset?dataset=youliangtan%2Fso100-table-cleanup) as an example.

**Important**: Datasets for the SO-100 or SO-101 are not included in GR00T N1.5's initial pre-training. Therefore, we'll be training it as a `new_embodiment`.

![Dataset Example](https://cdn-uploads.huggingface.co/production/uploads/67c205dafa508474d715f1d6/fu3Xd1XX4fP2TYJ6hv38Z.png)

### 1.1 Download Your Dataset

**Download Example Dataset**

Download the pre-existing [so101-table-cleanup](https://huggingface.co/datasets/youliangtan/so101-table-cleanup) dataset:

```bash
huggingface-cli download \
    --repo-type dataset youliangtan/so101-table-cleanup \
    --local-dir ./demo_data/so101-table-cleanup
```
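Before moving on, it can help to confirm that the download landed where expected. The sketch below is a minimal check, assuming the dataset follows the usual LeRobot layout with `meta/`, `data/`, and `videos/` subdirectories. That layout and the helper name `missing_subdirs` are assumptions for illustration, not something this tutorial specifies.

```python
from pathlib import Path

# Assumed LeRobot-style layout; adjust if your dataset is organized differently.
EXPECTED_SUBDIRS = ("meta", "data", "videos")


def missing_subdirs(dataset_root, expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectories that are absent under dataset_root."""
    root = Path(dataset_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_subdirs("./demo_data/so101-table-cleanup")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

An empty result only means the directories exist; it does not validate their contents, which is what the loading check in section 1.3 is for.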

### 1.2 Configure Modality File

The `modality.json` file provides essential information about state and action modalities to make the dataset "GR00T-compatible".

**For a dual-camera setup (like the SO-101):**

```bash
cp getting_started/examples/so100_dualcam__modality.json ./demo_data/so101-table-cleanup/meta/modality.json
```
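A quick sanity check on the copied file can catch indexing mistakes early. The following is a rough sketch, assuming `modality.json` has top-level `state` and `action` sections that map a part name to a `start`/`end` index range. That schema, the helper name `check_slices`, and the sample values are all assumptions for illustration; inspect the real file for its actual contents.

```python
import json


def check_slices(modality, sections=("state", "action")):
    """Return a list of problems found in the start/end ranges of each section."""
    problems = []
    for section in sections:
        entries = modality.get(section)
        if not entries:
            problems.append(f"missing or empty section: {section}")
            continue
        for name, spec in entries.items():
            start, end = spec.get("start"), spec.get("end")
            if not (isinstance(start, int) and isinstance(end, int)):
                problems.append(f"{section}.{name}: start/end must be integers")
            elif start >= end:
                problems.append(f"{section}.{name}: empty range {start}..{end}")
    return problems


# Hypothetical content for illustration only, not the shipped file.
example = json.loads("""
{
  "state":  {"single_arm": {"start": 0, "end": 5}, "gripper": {"start": 5, "end": 6}},
  "action": {"single_arm": {"start": 0, "end": 5}, "gripper": {"start": 5, "end": 6}}
}
""")
print(check_slices(example))  # prints []
```

To run it against the real file, load `./demo_data/so101-table-cleanup/meta/modality.json` with `json.load` and pass the result to `check_slices`.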

### 1.3 Verify Dataset Loading

Test that your dataset can be loaded correctly:

```bash
python scripts/load_dataset.py \
    --dataset-path ./demo_data/so101-table-cleanup \
    --plot-state-action \
    --video-backend torchvision_av
```

This script will visualize the dataset and confirm it's properly formatted for GR00T training.

## Step 2: Fine-tuning the Model

Now we'll fine-tune GR00T N1.5 using the prepared dataset. The fine-tuning process adapts the pre-trained model to your specific robotic embodiment and tasks.

### 2.1 Basic Fine-tuning Command

Execute the following command to start fine-tuning:

```bash
python scripts/gr00t_finetune.py \
    --dataset-path ./demo_data/so101-table-cleanup/ \
    --num-gpus 1 \
    --output-dir ./so101-checkpoints \
    --max-steps 10000 \
    --data-config so100_dualcam \
    --video-backend torchvision_av
```

### 2.2 Fine-tuning Parameters Explained

- `--dataset-path`: Path to the prepared dataset, including its `meta/modality.json`
- `--num-gpus`: Number of GPUs to use for training
- `--output-dir`: Directory where model checkpoints will be saved
- `--max-steps`: Maximum number of training steps
- `--data-config`: Data configuration matching your camera setup (`so100_dualcam` for the dual-camera setup used here)
- `--video-backend`: Backend used to read the dataset videos (`torchvision_av` here)

**Advanced Training Parameters:**

- `--no-tune_diffusion_model`: Skips tuning the diffusion model, which reduces VRAM usage by ~10 GB
- `--batch-size`: Adjust based on available GPU memory
- `--lora-rank`: Lower values reduce memory usage
- `--dataloader-num-workers`: Adjust to balance CPU load and memory use
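When experimenting with several flag combinations, it can be convenient to assemble the command in Python instead of editing a shell line by hand. The sketch below uses only the flags listed above. The helper name `build_finetune_command` and the batch size of 16 are hypothetical choices for illustration, not recommended settings.

```python
import shlex


def build_finetune_command(dataset_path, output_dir, *, num_gpus=1, max_steps=10000,
                           data_config="so100_dualcam", video_backend="torchvision_av",
                           tune_diffusion_model=True, extra=None):
    """Assemble the argument list for scripts/gr00t_finetune.py."""
    cmd = [
        "python", "scripts/gr00t_finetune.py",
        "--dataset-path", str(dataset_path),
        "--num-gpus", str(num_gpus),
        "--output-dir", str(output_dir),
        "--max-steps", str(max_steps),
        "--data-config", data_config,
        "--video-backend", video_backend,
    ]
    # Append the memory-saving switch only when diffusion tuning is disabled.
    if not tune_diffusion_model:
        cmd.append("--no-tune_diffusion_model")
    # Extra flag/value pairs, e.g. {"--batch-size": 16}; values are illustrative.
    for flag, value in (extra or {}).items():
        cmd += [flag, str(value)]
    return cmd


cmd = build_finetune_command(
    "./demo_data/so101-table-cleanup/",
    "./so101-checkpoints",
    tune_diffusion_model=False,
    extra={"--batch-size": 16},
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Building a list rather than a single string avoids shell quoting problems with paths that contain spaces.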

### 2.3 Monitoring Training Progress

During training, monitor:

- Loss curves in the output logs
- Checkpoint saving frequency
- Memory usage
- Training time estimates

Training typically takes several hours depending on dataset size and hardware.
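One lightweight way to track checkpoint saving is to scan the output directory. The sketch below assumes checkpoints are written as `checkpoint-<step>` subdirectories under `--output-dir`, a common trainer convention that this tutorial does not confirm. The helper names `list_checkpoints` and `progress` are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming convention: <output-dir>/checkpoint-<step>
CHECKPOINT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def list_checkpoints(output_dir):
    """Return (step, path) pairs for checkpoint directories, sorted by step."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    found = []
    for child in root.iterdir():
        match = CHECKPOINT_PATTERN.match(child.name)
        if match and child.is_dir():
            found.append((int(match.group(1)), child))
    return sorted(found, key=lambda pair: pair[0])


def progress(output_dir, max_steps):
    """Return the fraction of max_steps covered by the newest checkpoint (0.0 if none)."""
    checkpoints = list_checkpoints(output_dir)
    return checkpoints[-1][0] / max_steps if checkpoints else 0.0


if __name__ == "__main__":
    print(f"{progress('./so101-checkpoints', 10000):.0%} of steps checkpointed")
```

This reports only what has been saved to disk, so it lags behind the live step counter shown in the training logs.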
