## Converting Existing LeRobot Datasets from v2.1 to v3.0

Hi! You've stumbled upon a notebook by mandryl.io for converting LeRobot v2.1 datasets to the newest, v3.0 format.

We created this because we kept running into workflow difficulties in our own experiments. We hope it helps you get past the early setup hassles and on to what matters: training AI models for robots!

For any dataset you port over, we kindly ask that you attribute the original dataset source :)

### Prerequisites

- A Hugging Face access token with read/write repo permissions
- A `repo_id` for a dataset you wish to port from `v2.1` to `v3.0`

In [1]:
%pip install lerobot==0.4.3 --quiet

In [2]:
import huggingface_hub
huggingface_hub.login()

### Configuration

Enter the two repo IDs when prompted below, then run the rest of the notebook. No other changes are needed.

In [3]:
ORIGINAL_REPO_ID = input("Enter the repo ID of the v2.1 dataset to convert (e.g. youliangtan/so101-table-cleanup): ").strip()
DESTINATION_REPO_ID = input("Enter the repo ID where you want to upload the v3.0 dataset (e.g. my-org/so101-table-cleanup): ").strip()

print(f"\nSource:      {ORIGINAL_REPO_ID}")
print(f"Destination: {DESTINATION_REPO_ID}")


Source:      5hadytru/so101_IF_7
Destination: mandryl-io/so101_IF_7
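
Because both values come from free-form input, a quick format check before downloading can catch typos early. The sketch below assumes repo IDs follow the `namespace/name` shape used in the examples above. `looks_like_repo_id` is a hypothetical helper, not part of `huggingface_hub`, and its pattern is deliberately looser than the Hub's actual naming rules.

```python
import re

# Hypothetical helper: loose check for the "namespace/name" shape.
# This is an approximation, not the Hub's authoritative validation.
_REPO_ID_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*")


def looks_like_repo_id(value: str) -> bool:
    """Return True if value is two names (letters, digits, '.', '_', '-') joined by one slash."""
    return _REPO_ID_PATTERN.fullmatch(value) is not None
```

In the notebook, this could run right after the prompts, for example raising a `ValueError` if either `ORIGINAL_REPO_ID` or `DESTINATION_REPO_ID` fails the check.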


### Step 1 — Download the v2.1 dataset

In [6]:
from huggingface_hub import snapshot_download
from pathlib import Path

local_dataset_dir = Path("./datasets")
local_dataset_dir.mkdir(exist_ok=True)

local_path = snapshot_download(
    repo_id=ORIGINAL_REPO_ID,
    repo_type="dataset",
    local_dir=local_dataset_dir / ORIGINAL_REPO_ID,
)

print(f"Dataset downloaded to: {local_path}")

Returning existing local_dir `datasets/5hadytru/so101_IF_7` as remote repo cannot be accessed in `snapshot_download` (429 Client Error: Too Many Requests for url: https://huggingface.co/api/datasets/5hadytru/so101_IF_7/revision/main (Request ID: Root=1-69930469-4a1c80e006c697183ecbe4d9;40535b95-5c12-4464-81b4-4983c087de38)

We had to rate limit your IP (34.158.62.229). To continue using our service, create a HF account or login to your existing account, and make sure you pass a HF_TOKEN if you're using the API.).


Dataset downloaded to: /content/datasets/5hadytru/so101_IF_7
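
The log above shows the Hub answering with `429 Too Many Requests`, so `snapshot_download` returned a local copy that already existed instead of fetching a fresh one. One possible way to make sure the request is authenticated is to pass a token explicitly. The sketch below assumes the token is stored in the `HF_TOKEN` environment variable mentioned in that message. `download_with_token` is a hypothetical wrapper, not part of `huggingface_hub`.

```python
import os
from pathlib import Path


def download_with_token(repo_id: str, target_dir: Path, env=os.environ) -> str:
    """Download a dataset snapshot, passing HF_TOKEN explicitly when it is set."""
    # Imported lazily so the helper can be defined before the package is installed.
    from huggingface_hub import snapshot_download

    # An unset or empty variable becomes None, which lets huggingface_hub
    # fall back to any token saved by a previous login.
    token = env.get("HF_TOKEN") or None
    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        local_dir=target_dir / repo_id,
        token=token,
    )
```

Calling `download_with_token(ORIGINAL_REPO_ID, local_dataset_dir)` would mirror the download cell above, with the token passed through when available.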


### Step 2 — Convert to v3.0

In [7]:
from lerobot.datasets.v30.convert_dataset_v21_to_v30 import convert_dataset

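convert_dataset(
    repo_id=ORIGINAL_REPO_ID,
    branch=None,
    data_file_size_in_mb=None,    # None: keep the library's default file size
    video_file_size_in_mb=None,   # None: keep the library's default file size
    root=str(local_dataset_dir),  # parent folder of the downloaded dataset
    push_to_hub=False,            # the upload is handled separately in Step 3
    force_conversion=False,
)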

print("Conversion complete!")

ValueError: Local dataset has codebase version 'v3.0', expected 'v2.1'. This script is specifically for converting v2.1 datasets to v3.0.
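
The `ValueError` above means the local copy under `./datasets` already reports `v3.0`, so there is nothing left to convert. A minimal sketch of checking this up front follows. It assumes the dataset records its version under the `codebase_version` key of `meta/info.json`, the usual LeRobot layout. `read_codebase_version` is a hypothetical helper, not a LeRobot API.

```python
import json
from pathlib import Path


def read_codebase_version(dataset_root):
    """Return the codebase_version recorded in meta/info.json, or None if the file is missing."""
    info_path = Path(dataset_root) / "meta" / "info.json"
    if not info_path.is_file():
        return None
    with info_path.open(encoding="utf-8") as f:
        return json.load(f).get("codebase_version")
```

For example, if `read_codebase_version(local_dataset_dir / ORIGINAL_REPO_ID)` returns `"v3.0"`, Step 2 can be skipped. A return value of `"v2.1"` means the conversion should proceed.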

### Step 3 — Upload to Hugging Face

In [None]:
from huggingface_hub import HfApi

# Local folder expected to hold the converted dataset.
local_converted_path = str(local_dataset_dir / ORIGINAL_REPO_ID)

api = HfApi()

# Create the destination dataset repo; exist_ok=True makes re-runs harmless.
api.create_repo(repo_id=DESTINATION_REPO_ID, repo_type="dataset", exist_ok=True)

# Upload the converted folder to the destination repo.
api.upload_folder(
    folder_path=local_converted_path,
    repo_id=DESTINATION_REPO_ID,
    repo_type="dataset",
)

print(f"Uploaded to {DESTINATION_REPO_ID}")

In [None]:
from huggingface_hub import HfApi

api = HfApi()
api.create_tag(
    repo_id=DESTINATION_REPO_ID,
    repo_type="dataset",
    tag="v3.0",
    tag_message="Tagging v3.0 to match codebase_version",
)

print(f"Tagged {DESTINATION_REPO_ID} with v3.0")

### Step 4 — Verify the converted dataset

In [None]:
from lerobot.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset(
    repo_id=DESTINATION_REPO_ID,
    force_cache_sync=True,
)

print(f"Dataset loaded successfully from {DESTINATION_REPO_ID}!")
print(f"Dataset metadata: {dataset.meta}")