A high-throughput orchestrator for synthetic robotics datasets; named after Harry Domin of R.U.R., the visionary who sought to free humanity from labor through the mass production of robots.
domin_demo.mp4
domin makes the generation of synthetic datasets for Visual-Language-Action (VLA) models easy, performant, and scalable.
While most current simulation in robotics is geared towards Reinforcement Learning (RL), this package leverages the same massively parallel, multi-environment paradigm to generate training data. This allows you to produce large-scale datasets using NVIDIA Isaac Sim and Isaac Lab with a single configuration file.
This package generates datasets in the popular LeRobot format, compatible with Hugging Face's LeRobot library. The core dataset builder is a modified and extended version of the LeRobot dataset builder, optimized for simulation environments.
Key features inherited from our extended dataset_builder:
- Ease of Use: A single command to install the package and a single configuration file can generate a complete dataset.
- Simultaneous Episode Recording: Record multiple episodes (environments) in parallel for high throughput, significantly speeding up data generation.
- Scheduled Re-recording: Robust handling of failed episodes. If an episode fails, it is automatically cleared and scheduled for a retry in the next batch, ensuring dataset completeness without manual intervention.
- Metadata & Custom Metrics: Easily save arbitrary metadata (e.g., success rates, simulation parameters) and automatically compute episode statistics in the dataset's
info.json. - LeRobot Format Compatibility: Produces datasets in the standard LeRobot format (Parquet files with embedded or external images), ready for training.
You can install this package via pip. Note that you must specify the NVIDIA PyPI index for isaacsim and related dependencies.
pip install domin --extra-index-url https://pypi.nvidia.comFor advanced installation instructions, including setting up Isaac Sim and creating a development environment, please see CONTRIBUTING.md.
To generate a dataset, use the domin-gen command with a dataset configuration file.
The dataset configuration file should inherit domin.base_dataset_config.BaseDatasetConfig (see example).
domin-gen examples/dexterous_dataset_config.py --num_envs 10 --num_episodes 100config_path: Path to the Python file containing the dataset configuration (e.g.,examples/dexterous_dataset_config.py).--num_envs: (Optional) Number of parallel environments to simulate (default: 1).--num_episodes: (Required) Total number of episodes to record.
This project builds upon the excellent work of the Hugging Face LeRobot team. The domin.dataset_builder module is a modified adaptation of their dataset building tools, tailored for the specific needs of massive parallel simulation in Isaac Lab. I gratefully acknowledge their contributions to the open-source robotics community.