Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions docs/source/_toctree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,12 @@
title: Using LeRobotDataset
- local: porting_datasets_v3
title: Porting Large Datasets
- local: using_dataset_tools
title: Using the Dataset Tools
title: "Datasets"
- sections:
- local: act
title: ACT
- local: smolvla
title: SmolVLA
- local: pi0
Expand Down
92 changes: 92 additions & 0 deletions docs/source/act.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
# ACT (Action Chunking with Transformers)

ACT is a **lightweight and efficient policy for imitation learning**, especially well-suited for fine-grained manipulation tasks. It's the **first model we recommend when you're starting out** with LeRobot due to its fast training time, low computational requirements, and strong performance.

<div class="video-container">
<iframe
width="100%"
height="415"
src="https://www.youtube.com/embed/ft73x0LfGpM"
title="LeRobot ACT Tutorial"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
></iframe>
</div>

_Watch this tutorial from the LeRobot team to learn how ACT works: [LeRobot ACT Tutorial](https://www.youtube.com/watch?v=ft73x0LfGpM)_

## Model Overview

Action Chunking with Transformers (ACT) was introduced in the paper [Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware](https://arxiv.org/abs/2304.13705) by Zhao et al. The policy was designed to enable precise, contact-rich manipulation tasks using affordable hardware and minimal demonstration data.

### Why ACT is Great for Beginners

ACT stands out as an excellent starting point for several reasons:

- **Fast Training**: Trains in a few hours on a single GPU
- **Lightweight**: Only ~80M parameters, making it efficient and easy to work with
- **Data Efficient**: Often achieves high success rates with just 50 demonstrations

### Architecture

ACT uses a transformer-based architecture with three main components:

1. **Vision Backbone**: ResNet-18 processes images from multiple camera viewpoints
2. **Transformer Encoder**: Synthesizes information from camera features, joint positions, and a learned latent variable
3. **Transformer Decoder**: Generates coherent action sequences using cross-attention

The policy takes as input:

- Multiple RGB images (e.g., from wrist cameras, front/top cameras)
- Current robot joint positions
- A latent style variable `z` (learned during training, set to zero during inference)

And outputs a chunk of `k` future action sequences.

## Installation Requirements

1. Install LeRobot by following our [Installation Guide](./installation).
2. ACT is included in the base LeRobot installation, so no additional dependencies are needed!

## Training ACT

ACT works seamlessly with the standard LeRobot training pipeline. Here's a complete example for training ACT on your dataset:

```bash
lerobot-train \
--dataset.repo_id=${HF_USER}/your_dataset \
--policy.type=act \
--output_dir=outputs/train/act_your_dataset \
--job_name=act_your_dataset \
--policy.device=cuda \
--wandb.enable=true \
--policy.repo_id=${HF_USER}/act_policy
```

### Training Tips

1. **Start with defaults**: ACT's default hyperparameters work well for most tasks
2. **Training duration**: Expect a few hours for 100k training steps on a single GPU
3. **Batch size**: Start with batch size 8 and adjust based on your GPU memory

### Train using Google Colab

If your local computer doesn't have a powerful GPU, you can utilize Google Colab to train your model by following the [ACT training notebook](./notebooks#training-act).

## Evaluating ACT

Once training is complete, you can evaluate your ACT policy using the `lerobot-record` command with your trained policy. This will run inference and record evaluation episodes:

```bash
lerobot-record \
--robot.type=so100_follower \
--robot.port=/dev/ttyACM0 \
--robot.id=my_robot \
--robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 640, height: 480, fps: 30}}" \
--display_data=true \
--dataset.repo_id=${HF_USER}/eval_act_your_dataset \
--dataset.num_episodes=10 \
--dataset.single_task="Your task description" \
--policy.path=${HF_USER}/act_policy
```
7 changes: 4 additions & 3 deletions docs/source/il_robots.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -513,13 +513,14 @@ from lerobot.cameras.opencv.configuration_opencv import OpenCVCameraConfig
from lerobot.datasets.lerobot_dataset import LeRobotDataset
from lerobot.datasets.utils import hw_to_dataset_features
from lerobot.policies.act.modeling_act import ACTPolicy
from lerobot.policies.factory import make_pre_post_processors
from lerobot.robots.so100_follower.config_so100_follower import SO100FollowerConfig
from lerobot.robots.so100_follower.so100_follower import SO100Follower
from lerobot.scripts.lerobot_record import record_loop
from lerobot.utils.control_utils import init_keyboard_listener
from lerobot.utils.utils import log_say
from lerobot.utils.visualization_utils import init_rerun
from lerobot.record import record_loop
from lerobot.policies.factory import make_processor


NUM_EPISODES = 5
FPS = 30
Expand Down Expand Up @@ -562,7 +563,7 @@ init_rerun(session_name="recording")
# Connect the robot
robot.connect()

preprocessor, postprocessor = make_processor(
preprocessor, postprocessor = make_pre_post_processors(
policy_cfg=policy,
pretrained_path=HF_MODEL_ID,
dataset_stats=dataset.meta.stats,
Expand Down
4 changes: 2 additions & 2 deletions docs/source/integrate_hardware.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ To that end, we provide the [`Robot`](https://github.com/huggingface/lerobot/blo

- Your own robot which exposes a communication interface (e.g. serial, CAN, TCP)
- A way to read sensor data and send motor commands programmatically, e.g. manufacturer's SDK or API, or your own protocol implementation.
- LeRobot installed in your environment. Follow our [Installation Guide](./installation.mdx).
- LeRobot installed in your environment. Follow our [Installation Guide](./installation).

## Choose your motors

Expand Down Expand Up @@ -65,7 +65,7 @@ class MyCoolRobotConfig(RobotConfig):
```
<!-- prettier-ignore-end -->

[Cameras tutorial](./cameras.mdx) to understand how to detect and add your camera.
[Cameras tutorial](./cameras) to understand how to detect and add your camera.

Next, we'll create our actual robot class which inherits from `Robot`. This abstract class defines a contract you must follow for your robot to be usable with the rest of the LeRobot tools.

Expand Down
6 changes: 3 additions & 3 deletions docs/source/introduction_processors.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -297,9 +297,9 @@ LeRobot provides many registered processor steps. Here are the most commonly use

### Next Steps

- **[Implement Your Own Processor](implement_your_own_processor.mdx)** - Create custom processor steps
- **[Debug Your Pipeline](debug_processor_pipeline.mdx)** - Troubleshoot and optimize pipelines
- **[Processors for Robots and Teleoperators](processors_robots_teleop.mdx)** - Real-world integration patterns
- **[Implement Your Own Processor](./implement_your_own_processor)** - Create custom processor steps
- **[Debug Your Pipeline](./debug_processor_pipeline)** - Troubleshoot and optimize pipelines
- **[Processors for Robots and Teleoperators](./processors_robots_teleop)** - Real-world integration patterns

## Summary

Expand Down
33 changes: 33 additions & 0 deletions docs/source/lerobot-dataset-v3.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -279,3 +279,36 @@ python -m lerobot.datasets.v30.convert_dataset_v21_to_v30 --repo-id=<HF_USER/DAT
- Aggregates parquet files: `episode-0000.parquet`, `episode-0001.parquet`, … β†’ **`file-0000.parquet`**, …
- Aggregates mp4 files: `episode-0000.mp4`, `episode-0001.mp4`, … β†’ **`file-0000.mp4`**, …
- Updates `meta/episodes/*` (chunked Parquet) with per‑episode lengths, tasks, and byte/frame offsets.

## Common Issues

### Always call `finalize()` before pushing

When creating or recording datasets, you **must** call `dataset.finalize()` to properly close parquet writers. See the [PR #1903](https://github.com/huggingface/lerobot/pull/1903) for more details.

```python
from lerobot.datasets.lerobot_dataset import LeRobotDataset

# Create dataset and record episodes
dataset = LeRobotDataset.create(...)

for episode in range(num_episodes):
# Record frames
for frame in episode_data:
dataset.add_frame(frame)
dataset.save_episode()

# Call finalize() when done recording and before push_to_hub()
dataset.finalize() # Closes parquet writers, writes metadata footers
dataset.push_to_hub()
```

**Why is this necessary?**

Dataset v3.0 uses incremental parquet writing with buffered metadata for efficiency. The `finalize()` method:

- Flushes any buffered episode metadata to disk
- Closes parquet writers to write footer metadata, otherwise the parquet files will be corrupt
- Ensures the dataset is valid for loading

Without calling `finalize()`, your parquet files will be incomplete and the dataset won't load properly.
2 changes: 1 addition & 1 deletion docs/source/phone_teleop.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -79,7 +79,7 @@ After running the example:
- Android: after starting the script, open the printed local URL on your phone, tap Start, then press and hold Move.
- iOS: open HEBI Mobile I/O first; B1 enables motion. A3 controls the gripper.

Additionally you can customize mapping or safety limits by editing the processor steps shown in the examples. You can also remap inputs (e.g., use a different analog input) or adapt the pipeline to other robots (e.g., LeKiwi) by modifying the input and kinematics steps. More about this in the [Processors for Robots and Teleoperators](./processors_robots_teleop.mdx) guide.
Additionally you can customize mapping or safety limits by editing the processor steps shown in the examples. You can also remap inputs (e.g., use a different analog input) or adapt the pipeline to other robots (e.g., LeKiwi) by modifying the input and kinematics steps. More about this in the [Processors for Robots and Teleoperators](./processors_robots_teleop) guide.

- Run this example to record a dataset, which saves absolute end effector observations and actions:

Expand Down
102 changes: 102 additions & 0 deletions docs/source/using_dataset_tools.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
# Using Dataset Tools

This guide covers the dataset tools utilities available in LeRobot for modifying and editing existing datasets.

## Overview

LeRobot provides several utilities for manipulating datasets:

1. **Delete Episodes** - Remove specific episodes from a dataset
2. **Split Dataset** - Divide a dataset into multiple smaller datasets
3. **Merge Datasets** - Combine multiple datasets into one. The datasets must have identical features, and episodes are concatenated in the order specified in `repo_ids`
4. **Add Features** - Add new features to a dataset
5. **Remove Features** - Remove features from a dataset

The core implementation is in `lerobot.datasets.dataset_tools`.
An example script detailing how to use the tools API is available in `examples/dataset/use_dataset_tools.py`.

## Command-Line Tool: lerobot-edit-dataset

`lerobot-edit-dataset` is a command-line script for editing datasets. It can be used to delete episodes, split datasets, merge datasets, add features, and remove features.

Run `lerobot-edit-dataset --help` for more information on the configuration of each operation.

### Usage Examples

#### Delete Episodes

Remove specific episodes from a dataset. This is useful for filtering out undesired data.

```bash
# Delete episodes 0, 2, and 5 (modifies original dataset)
lerobot-edit-dataset \
--repo_id lerobot/pusht \
--operation.type delete_episodes \
--operation.episode_indices "[0, 2, 5]"

# Delete episodes and save to a new dataset (preserves original dataset)
lerobot-edit-dataset \
--repo_id lerobot/pusht \
--new_repo_id lerobot/pusht_after_deletion \
--operation.type delete_episodes \
--operation.episode_indices "[0, 2, 5]"
```

#### Split Dataset

Divide a dataset into multiple subsets.

```bash
# Split by fractions (e.g. 80% train, 20% test, 20% val)
lerobot-edit-dataset \
--repo_id lerobot/pusht \
--operation.type split \
--operation.splits '{"train": 0.8, "test": 0.2, "val": 0.2}'

# Split by specific episode indices
lerobot-edit-dataset \
--repo_id lerobot/pusht \
--operation.type split \
--operation.splits '{"task1": [0, 1, 2, 3], "task2": [4, 5]}'
```

There are no constraints on the split names, they can be determined by the user. Resulting datasets are saved under the repo id with the split name appended, e.g. `lerobot/pusht_train`, `lerobot/pusht_task1`, `lerobot/pusht_task2`.

#### Merge Datasets

Combine multiple datasets into a single dataset.

```bash
# Merge train and validation splits back into one dataset
lerobot-edit-dataset \
--repo_id lerobot/pusht_merged \
--operation.type merge \
--operation.repo_ids "['lerobot/pusht_train', 'lerobot/pusht_val']"
```

#### Remove Features

Remove features from a dataset.

```bash
# Remove a camera feature
lerobot-edit-dataset \
--repo_id lerobot/pusht \
--operation.type remove_feature \
--operation.feature_names "['observation.images.top']"
```

### Push to Hub

Add the `--push_to_hub` flag to any command to automatically upload the resulting dataset to the Hugging Face Hub:

```bash
lerobot-edit-dataset \
--repo_id lerobot/pusht \
--new_repo_id lerobot/pusht_after_deletion \
--operation.type delete_episodes \
--operation.episode_indices "[0, 2, 5]" \
--push_to_hub
```

There is also a tool for adding features to a dataset that is not yet covered in `lerobot-edit-dataset`.
Loading