A streamlined library for pretraining vision-language-action (VLA) models on robotics datasets. Derived from LeRobot, this library focuses specifically on efficient pretraining workflows across multi-GPU setups and SLURM clusters, and serves as an official reproduction kit for SmolVLA.
VLAb is designed for researchers who want to pretrain VLA models on HuggingFace datasets efficiently. It provides:
- Pretraining-Focused Architecture: Built-in model architecture and data-processing logic so you can iterate quickly on real-world datasets without environment setup overhead.
- SmolVLA Reproduction: Official reproduction kit for SmolVLA pretraining, including nearly the exact datasets, configurations, and workflows used to train the original model
- Simple Setup with Reduced Dependencies: Single-command environment creation with conda env create -f environment.yml
- Distributed Training: Multi-GPU and multi-node support via Accelerate, tested on single machines and SLURM clusters
- Multi-Dataset Support: Train on multiple datasets simultaneously with configurable sampling strategies
Important: This library is optimized for pretraining. For fine-tuning and inference, we recommend using LeRobot with the latest updates. See the Migration to LeRobot section for checkpoint compatibility.
- Installation
- Quick Start
- Reproducing SmolVLA Training
- Video Backend Configuration
- Migration to LeRobot
- Troubleshooting
- Additional Resources
- Citation
- Project Structure
conda env create -f environment.yml
conda activate vlab
export PYTHONPATH="${PWD}/src:${PYTHONPATH}"

For persistence, add to your shell config:
echo 'export PYTHONPATH="${PWD}/src:${PYTHONPATH}"' >> ~/.bashrc
source ~/.bashrc

Run the installation test:

python tests/test_installation.py

Expected output:
============================================================
VLAb Installation Test
============================================================
✓ TrainPipelineConfig
✓ Policy factory
✓ Dataset factory
============================================================
✓ All tests passed!
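If the test script passes, the package should also import directly once PYTHONPATH is set; as a quick extra check (the lerobot module name follows from the src/lerobot layout shown in Project Structure):

python -c "import lerobot; print('lerobot is importable')"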
Hugging Face Hub authentication is only needed if you download datasets or models from the Hub:
huggingface-cli login

Train SmolVLA2 on two datasets with a single GPU:
accelerate launch --config_file accelerate_configs/single_gpu.yaml \
src/lerobot/scripts/train.py \
--policy.type=smolvla2 \
--policy.repo_id=HuggingFaceTB/SmolVLM2-500M-Video-Instruct \
--dataset.repo_id="Beegbrain/pick_lemon_and_drop_in_bowl,Chojins/chess_game_000_white_red" \
--dataset.video_backend=pyav \
--output_dir="./outputs/training" \
--batch_size=8 \
--steps=10000 \
--wandb.enable=true \
--wandb.project="smolvla2-quickstart"Note: --policy.repo_id specifies the base vision-language model (SmolVLM) to use. The trained model will be saved to --output_dir.
This will:
- Download the two specified datasets from the HuggingFace Hub
- Train SmolVLA2 on a single GPU for 10,000 steps
- Save checkpoints to ./outputs/training
- Log metrics to Weights & Biases
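If a long run is interrupted, upstream LeRobot's train.py resumes from the checkpoint's saved training config; whether the same flags apply unchanged in this fork is an assumption (the train_smolvla_resume.slurm script and train.py --help are the authoritative references). A minimal sketch under that assumption:

# Hedged sketch: --resume/--config_path and the checkpoint layout are assumed from upstream LeRobot
accelerate launch --config_file accelerate_configs/single_gpu.yaml \
  src/lerobot/scripts/train.py \
  --config_path=./outputs/training/checkpoints/last/pretrained_model/train_config.json \
  --resume=true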
If you want to use VLAb with the SmolVLA pretraining datasets and reproduce SmolVLA results, use the following community datasets:
- Community Dataset v1: 128 datasets from 55 contributors (11.1K episodes, 5.1M frames, 46.9 hours, 119.3 GB), the curated subset used to pretrain SmolVLA with quality filtering and manual task description curation
- Community Dataset v2: 340 datasets from 117 contributors (6.3K episodes, 5M frames, 46.6 hours, 59 GB) with LeRobot v2.0/v2.1 format support
Both datasets feature SO-100 robotic arm demonstrations focused on tabletop manipulation tasks, pick-and-place operations, and everyday object interactions. Community Dataset v1 represents a curated, high-quality subset with manually verified task descriptions, while v2 expands the collection with more contributors and datasets.
Dataset Structure: Both datasets use a hierarchical structure with contributor subdirectories:
community_dataset_v1/v2/
├── contributor1/
│   ├── dataset_name_1/
│   │   ├── data/
│   │   ├── videos/
│   │   └── meta/
│   └── dataset_name_2/
├── contributor2/
│   └── dataset_name_3/
└── ...
Note that downloading may take some time (3-4 hours), especially for the first dataset.
# Download Community Dataset v1 (128 datasets, 11.1K episodes, 119.3 GB)
hf download HuggingFaceVLA/community_dataset_v1 \
--repo-type=dataset \
--local-dir /path/local_dir/community_dataset_v1
# Download Community Dataset v2 (340 datasets, 6.3K episodes, 59 GB)
hf download HuggingFaceVLA/community_dataset_v2 \
--repo-type=dataset \
--local-dir /path/local_dir/community_dataset_v2
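If you only need part of a collection (for example a single contributor's datasets), hf download accepts --include glob patterns; the contributor name below is a placeholder matching the structure above:

# Download a single contributor's subdirectory (contributor1 is a placeholder)
hf download HuggingFaceVLA/community_dataset_v1 \
  --repo-type=dataset \
  --include "contributor1/*" \
  --local-dir /path/local_dir/community_dataset_v1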
VLAb uses Accelerate for distributed training. Choose between the provided configs or create your own:

# Configure your own (one-time setup)
accelerate config
# Or use provided configs
accelerate launch --config_file accelerate_configs/single_gpu.yaml ...
accelerate launch --config_file accelerate_configs/multi_gpu.yaml ...
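For multi-node runs outside SLURM, Accelerate's standard multi-node flags can also be passed at launch time; the node count, rank, and head-node address below are placeholders:

# Placeholders: run once per node with the matching --machine_rank
accelerate launch --config_file accelerate_configs/multi_gpu.yaml \
  --num_machines 2 --machine_rank 0 \
  --main_process_ip <head-node-ip> --main_process_port 29500 \
  src/lerobot/scripts/train.py ...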
If you've pre-downloaded datasets, specify the root directory:

# Train on locally downloaded datasets
accelerate launch --config_file accelerate_configs/multi_gpu.yaml \
src/lerobot/scripts/train.py \
--policy.type=smolvla2 \
--policy.repo_id=HuggingFaceTB/SmolVLM2-500M-Video-Instruct \
--dataset.repo_id="community_dataset_v2/airthebear/so100_GL,community_dataset_v2/acrampette/third_arm_01" \
--dataset.root="/path/to/datasets" \
--dataset.video_backend=pyav \
--dataset.features_version=2 \
--output_dir="./outputs/training" \
--batch_size=8 \
--steps=200000 \
--wandb.enable=true \
--wandb.project="smolvla2-training"Important: When using --dataset.root, the --dataset.repo_id paths should be relative to the root directory. For example:
- Root: /path/to/datasets
- Repo ID: community_dataset_v1/user/dataset_name
- Dataset location: /path/to/datasets/community_dataset_v1/user/dataset_name
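Before launching a long run, it is worth confirming that the root and repo ID resolve to a real dataset directory (paths below are the illustrative ones above):

# Should list the dataset's metadata files (e.g. info.json in the LeRobot format)
ls /path/to/datasets/community_dataset_v1/user/dataset_name/meta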
For distributed training on SLURM clusters, we provide example scripts:
- examples/scripts/train_smolvla_optimized_fresh.slurm: Start training from scratch
- examples/scripts/train_smolvla_resume.slurm: Resume from checkpoint
Usage:
- Edit the script to match your cluster configuration (partitions, nodes, GPUs, etc.)
- Submit the job:
sbatch examples/scripts/reproduce_smolvla.slurm

For detailed documentation on SLURM scripts, dataset configuration, and advanced options, see the Examples README.
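Once submitted, standard SLURM tooling is enough to follow progress; the exact log filename depends on the #SBATCH --output setting in the script (slurm-<jobid>.out is SLURM's default):

squeue -u $USER            # check the job's state
tail -f slurm-<jobid>.out  # follow the training log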
Checkpoints from VLAb may not be directly compatible with the latest LeRobot version due to updated normalization formats. To use your pretrained models with LeRobot:
- Convert Checkpoint: Use the migration script from LeRobot
- Fine-tune: Follow the LeRobot fine-tuning guide
- Inference: Use LeRobot's updated inference pipeline
If you encounter corrupted files, outdated metadata, or persistent errors:
Manual Cache Cleanup
rm -rf ~/.cache/huggingface/datasets
rm -rf ~/.cache/huggingface/hub
rm -rf ~/.cache/huggingface/transformers
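To check how much space the cache is using before (or after) deleting it:

du -sh ~/.cache/huggingface/*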
Automatic Cleanup in SLURM

In SLURM scripts, set CLEAN_CACHE=true to automatically clean the cache before training.
Note: After cleaning cache, datasets will be re-downloaded on first use.
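If the script reads CLEAN_CACHE from the submission environment rather than defining it internally (check the script before relying on this), the flag can be toggled per job without editing the file:

# Assumption: the SLURM script honors CLEAN_CACHE from the environment
sbatch --export=ALL,CLEAN_CACHE=true examples/scripts/train_smolvla_resume.slurm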
For additional help, open an issue on GitHub.
- LeRobot GitHub: Main LeRobot repository for fine-tuning and inference
- SmolVLA Fine-tuning Guide: Complete guide for fine-tuning with LeRobot
- LeRobot Installation: Detailed installation instructions
- Accelerate Documentation: Distributed training configuration
If you use this library in your research, please cite:
@misc{aubakirova2025vlab,
author = {Dana Aubakirova and Mustafa Shukor and Jade Cholgari and Leandro von Werra},
title = {VLAb: Your Laboratory for Pretraining VLAs},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/huggingface/vlab}}
}

And the SmolVLA paper:
@article{shukor2025smolvla,
title = {SmolVLA: A vision-language-action model for affordable and efficient robotics},
author = {Shukor, Mustafa and Aubakirova, Dana and Capuano, Francesco and Kooijmans, Pepijn and Palma, Steven and Zouitine, Adil and Aractingi, Michel and Pascal, Caroline and Russi, Martino and Marafioti, Andres and Alibert, Simon and Cord, Matthieu and Wolf, Thomas and Cadene, Remi},
year = {2025},
journal = {arXiv preprint},
eprint = {2506.01844},
archivePrefix = {arXiv},
primaryClass = {cs.RO}
}

VLAb/
├── src/lerobot/                   # Core library code
│   ├── configs/                   # Configuration classes
│   ├── datasets/                  # Dataset loading and processing
│   ├── optim/                     # Optimizers and schedulers
│   ├── policies/                  # Policy implementations
│   │   └── smolvla2/              # SmolVLA2 architecture
│   ├── scripts/                   # Training scripts
│   └── utils/                     # Utility functions
├── examples/                      # Example scripts and notebooks
│   ├── scripts/                   # SLURM training scripts
│   └── all_datasets_relative.txt  # Pretraining dataset list
├── tests/                         # Test scripts
│   └── test_installation.py       # Installation verification script
├── accelerate_configs/            # Accelerate configuration files
├── environment.yml                # Conda environment specification
└── README.md                      # This file