
VLAb: Your Laboratory for Pretraining VLAs


A streamlined library for pretraining vision-language-action (VLA) models on robotics datasets. Derived from LeRobot, it focuses on efficient pretraining workflows across multi-GPU setups and SLURM clusters, and serves as an official reproduction kit for SmolVLA.

Overview

VLAb is designed for researchers who want to pretrain VLA models on HuggingFace datasets efficiently. It provides:

  • Pretraining-Focused Architecture: Built-in architecture and data-processing logic so you can iterate quickly on real-world datasets without environment-setup overhead
  • SmolVLA Reproduction: Official reproduction kit for SmolVLA pretraining, including nearly the exact datasets, configurations, and workflows used to train the original model
  • Simple Setup with Reduced Dependencies: Single-command environment creation with conda env create -f environment.yml
  • Distributed Training: Multi-GPU and multi-node support via Accelerate, tested on single machines and SLURM clusters
  • Multi-Dataset Support: Train on multiple datasets simultaneously with configurable sampling strategies

Important: This library is optimized for pretraining. For fine-tuning and inference, we recommend using LeRobot with the latest updates. See the Migration to LeRobot section for checkpoint compatibility.


Table of Contents

  • Installation
  • Quick Start
  • Reproducing SmolVLA Training
  • Local Training with Accelerate
  • SLURM Cluster Training
  • Migration to LeRobot
  • Troubleshooting
  • Additional Resources
  • Citation
  • Project Structure


Installation

Step 1: Create Environment

conda env create -f environment.yml
conda activate vlab

Step 2: Set Python Path (IMPORTANT)

export PYTHONPATH="${PWD}/src:${PYTHONPATH}"

For persistence, add to your shell config:

echo 'export PYTHONPATH="${PWD}/src:${PYTHONPATH}"' >> ~/.bashrc
source ~/.bashrc
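
To confirm the path is picked up before running the full test below, here is a minimal import check (assuming the library is importable as lerobot from src/, as the project layout suggests):

python -c "import lerobot; print(lerobot.__file__)"

The printed path should point inside <repo>/src/lerobot/.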

Step 3: Verify Installation

python tests/test_installation.py

Expected output:

============================================================
VLAb Installation Test
============================================================
✓ TrainPipelineConfig
✓ Policy factory
✓ Dataset factory
============================================================
✅ All tests passed!

Step 4: Configure HuggingFace (Optional)

Only needed if downloading datasets or models from the Hub:

huggingface-cli login
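
To verify that the login worked, the standard Hub CLI check applies:

huggingface-cli whoami

On clusters or in batch jobs, you can instead export a token non-interactively (HF_TOKEN is the environment variable recognized by huggingface_hub):

export HF_TOKEN=hf_xxx   # illustrative placeholder token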

Quick Start

Train SmolVLA2 on two datasets with a single GPU:

accelerate launch --config_file accelerate_configs/single_gpu.yaml \
    src/lerobot/scripts/train.py \
    --policy.type=smolvla2 \
    --policy.repo_id=HuggingFaceTB/SmolVLM2-500M-Video-Instruct \
    --dataset.repo_id="Beegbrain/pick_lemon_and_drop_in_bowl,Chojins/chess_game_000_white_red" \
    --dataset.video_backend=pyav \
    --output_dir="./outputs/training" \
    --batch_size=8 \
    --steps=10000 \
    --wandb.enable=true \
    --wandb.project="smolvla2-quickstart"

Note: --policy.repo_id specifies the base vision-language model (SmolVLM) to use. The trained model will be saved to --output_dir.

This will:

  1. Download the two specified datasets from the HuggingFace Hub
  2. Train SmolVLA2 on a single GPU for 10,000 steps
  3. Save checkpoints to ./outputs/training
  4. Log metrics to Weights & Biases

Reproducing SmolVLA Training

If you want to use VLAb with the SmolVLA pretraining datasets and reproduce SmolVLA results, use the following community datasets:

SmolVLA Community Datasets

  • Community Dataset v1: 128 datasets from 55 contributors (11.1K episodes, 5.1M frames, 46.9 hours, 119.3 GB); the curated subset used to pretrain SmolVLA, with quality filtering and manual task description curation
  • Community Dataset v2: 340 datasets from 117 contributors (6.3K episodes, 5M frames, 46.6 hours, 59 GB) with LeRobot v2.0/v2.1 format support

Both datasets feature SO-100 robotic arm demonstrations focused on tabletop manipulation tasks, pick-and-place operations, and everyday object interactions. Community Dataset v1 represents a curated, high-quality subset with manually verified task descriptions, while v2 expands the collection with more contributors and datasets.

Dataset Structure: Both datasets use a hierarchical structure with contributor subdirectories:

community_dataset_v1/v2/
├── contributor1/
│   ├── dataset_name_1/
│   │   ├── data/
│   │   ├── videos/
│   │   └── meta/
│   └── dataset_name_2/
├── contributor2/
│   └── dataset_name_3/
└── ...
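
Once a copy has been downloaded locally (see the next section), one way to enumerate the contributor/dataset pairs relative to the download root is the sketch below. It is illustrative only; the repository also ships a precomputed list in examples/all_datasets_relative.txt.

# List contributor/dataset pairs relative to the dataset root
cd /path/local_dir/community_dataset_v1
find . -mindepth 2 -maxdepth 2 -type d | sed 's|^\./||' | sort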

Downloading Datasets

Note that downloading may take some time (roughly 3-4 hours), especially for the first dataset.

# Download Community Dataset v1 (128 datasets, 11.1K episodes, 119.3 GB)
hf download HuggingFaceVLA/community_dataset_v1 \
       --repo-type=dataset \
       --local-dir /path/local_dir/community_dataset_v1

# Download Community Dataset v2 (340 datasets, 6.3K episodes, 59 GB)
hf download HuggingFaceVLA/community_dataset_v2 \
       --repo-type=dataset \
       --local-dir /path/local_dir/community_dataset_v2
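
If you only want to experiment with a handful of contributors first, the download can be restricted with include patterns (assuming your huggingface_hub version supports the --include filter; the patterns below are illustrative):

# Download only two contributors' folders from Community Dataset v2
hf download HuggingFaceVLA/community_dataset_v2 \
       --repo-type=dataset \
       --include "airthebear/*" "acrampette/*" \
       --local-dir /path/local_dir/community_dataset_v2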

Local Training with Accelerate

VLAb uses Accelerate for distributed training. Choose between provided configs or create your own:

# Configure your own (one-time setup)
accelerate config

# Or use provided configs
accelerate launch --config_file accelerate_configs/single_gpu.yaml ...
accelerate launch --config_file accelerate_configs/multi_gpu.yaml ...
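
The YAML files simply capture standard Accelerate settings; the same effect can be obtained by passing Accelerate flags directly on the command line (the process count below is only an example):

# Launch on 4 GPUs with bf16 mixed precision, without a config file
accelerate launch --multi_gpu --num_processes=4 --mixed_precision=bf16 \
    src/lerobot/scripts/train.py ...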

Training from Local Datasets

If you've pre-downloaded datasets, specify the root directory:

# Train on locally downloaded datasets
accelerate launch --config_file accelerate_configs/multi_gpu.yaml \
    src/lerobot/scripts/train.py \
    --policy.type=smolvla2 \
    --policy.repo_id=HuggingFaceTB/SmolVLM2-500M-Video-Instruct \
    --dataset.repo_id="community_dataset_v2/airthebear/so100_GL,community_dataset_v2/acrampette/third_arm_01" \
    --dataset.root="/path/to/datasets" \
    --dataset.video_backend=pyav \
    --dataset.features_version=2 \
    --output_dir="./outputs/training" \
    --batch_size=8 \
    --steps=200000 \
    --wandb.enable=true \
    --wandb.project="smolvla2-training"

Important: When using --dataset.root, the --dataset.repo_id paths should be relative to the root directory. For example:

  • Root: /path/to/datasets
  • Repo ID: community_dataset_v1/user/dataset_name
  • Dataset location: /path/to/datasets/community_dataset_v1/user/dataset_name
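
A quick sanity check is to confirm that the resolved path exists and contains the usual LeRobot subfolders:

ls /path/to/datasets/community_dataset_v1/user/dataset_name
# expected: data/  meta/  videos/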

SLURM Cluster Training

For distributed training on SLURM clusters, we provide example scripts:

  • examples/scripts/train_smolvla_optimized_fresh.slurm: Start training from scratch
  • examples/scripts/train_smolvla_resume.slurm: Resume from checkpoint

Usage:

  1. Edit the script to match your cluster configuration (partitions, nodes, GPUs, etc.)
  2. Submit the job:
sbatch examples/scripts/train_smolvla_optimized_fresh.slurm

For detailed documentation on SLURM scripts, dataset configuration, and advanced options, see the Examples README.
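
For orientation, the core of such a script is an sbatch header followed by the same accelerate launch command used locally. The sketch below assumes a single node with 8 GPUs and a partition named gpu; adapt the resource requests, paths, and dataset list to your cluster, and prefer the provided scripts for real runs.

#!/bin/bash
#SBATCH --job-name=smolvla2-pretrain
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gres=gpu:8
#SBATCH --cpus-per-task=32
#SBATCH --time=48:00:00

# Environment setup (see Installation)
conda activate vlab
export PYTHONPATH="${PWD}/src:${PYTHONPATH}"

accelerate launch --config_file accelerate_configs/multi_gpu.yaml \
    src/lerobot/scripts/train.py \
    --policy.type=smolvla2 \
    --policy.repo_id=HuggingFaceTB/SmolVLM2-500M-Video-Instruct \
    --dataset.root="/path/to/datasets" \
    --dataset.repo_id="community_dataset_v2/airthebear/so100_GL" \
    --dataset.video_backend=pyav \
    --dataset.features_version=2 \
    --output_dir="./outputs/slurm_training" \
    --batch_size=8 \
    --steps=200000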

Migration to LeRobot

Checkpoints from VLAb may not be directly compatible with the latest LeRobot version due to updated normalization formats. To use your pretrained models with LeRobot:

  1. Convert Checkpoint: Use the migration script from LeRobot
  2. Fine-tune: Follow the LeRobot fine-tuning guide
  3. Inference: Use LeRobot's updated inference pipeline

Troubleshooting

Cache Issues

If you encounter corrupted files, outdated metadata, or persistent errors:

Manual Cache Cleanup

rm -rf ~/.cache/huggingface/datasets
rm -rf ~/.cache/huggingface/hub
rm -rf ~/.cache/huggingface/transformers

Automatic Cleanup in SLURM

In SLURM scripts, set CLEAN_CACHE=true to automatically clean cache before training.
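
The effect is the same as wrapping the manual cleanup above in a guard on that variable, roughly (a sketch of the pattern, not the exact script contents):

if [ "${CLEAN_CACHE}" = "true" ]; then
    rm -rf ~/.cache/huggingface/datasets
    rm -rf ~/.cache/huggingface/hub
    rm -rf ~/.cache/huggingface/transformers
fi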

Note: After cleaning cache, datasets will be re-downloaded on first use.

For additional help, open an issue on GitHub.

Additional Resources

Citation

If you use this library in your research, please cite:

@misc{aubakirova2025vlab,
  author = {Dana Aubakirova and Mustafa Shukor and Jade Cholgari and Leandro von Werra},
  title = {VLAb: Your Laboratory for Pretraining VLAs},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/vlab}}
}

And the SmolVLA paper:

@article{shukor2025smolvla,
  title   = {SmolVLA: A vision-language-action model for affordable and efficient robotics},
  author  = {Shukor, Mustafa and Aubakirova, Dana and Capuano, Francesco and Kooijmans, Pepijn and Palma, Steven and Zouitine, Adil and Aractingi, Michel and Pascal, Caroline and Russi, Martino and Marafioti, Andres and Alibert, Simon and Cord, Matthieu and Wolf, Thomas and Cadene, Remi},
  year    = {2025},
  journal = {arXiv preprint},
  eprint  = {2506.01844},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO}
}

Project Structure

VLAb/
├── src/lerobot/                   # Core library code
│   ├── configs/                   # Configuration classes
│   ├── datasets/                  # Dataset loading and processing
│   ├── optim/                     # Optimizers and schedulers
│   ├── policies/                  # Policy implementations
│   │   └── smolvla2/              # SmolVLA2 architecture
│   ├── scripts/                   # Training scripts
│   └── utils/                     # Utility functions
├── examples/                      # Example scripts and notebooks
│   ├── scripts/                   # SLURM training scripts
│   └── all_datasets_relative.txt  # Pretraining dataset list
├── tests/                         # Test scripts
│   └── test_installation.py       # Installation verification script
├── accelerate_configs/            # Accelerate configuration files
├── environment.yml                # Conda environment specification
└── README.md                      # This file
