
VLAb: Your Laboratory for Pretraining VLAs


A streamlined library for pretraining vision-language-action (VLA) models on robotics datasets. Derived from LeRobot, it focuses on efficient pretraining workflows across multi-GPU setups and SLURM clusters, and serves as an official reproduction kit for SmolVLA.

Overview

VLAb is designed for researchers who want to pretrain VLA models on HuggingFace datasets efficiently. It provides:

  • Pretraining-Focused Architecture: Built-in architecture and data-processing logic so you can iterate quickly on real-world datasets without environment-setup overhead
  • SmolVLA Reproduction: Official reproduction kit for SmolVLA pretraining, including nearly the exact datasets, configurations, and workflows used to train the original model
  • Simple Setup with Reduced Dependencies: Single-command environment creation with conda env create -f environment.yml
  • Distributed Training: Multi-GPU and multi-node support via Accelerate, tested on single machines and SLURM clusters
  • Multi-Dataset Support: Train on multiple datasets simultaneously with configurable sampling strategies

Important: This library is optimized for pretraining. For fine-tuning and inference, we recommend using LeRobot with the latest updates. See the Migration to LeRobot section for checkpoint compatibility.


Table of Contents

  • Installation
  • Quick Start
  • Reproducing SmolVLA Training
  • Local Training with Accelerate
  • SLURM Cluster Training
  • Migration to LeRobot
  • Troubleshooting
  • Additional Resources
  • Citation
  • Project Structure


Installation

Step 1: Create Environment

conda env create -f environment.yml
conda activate vlab

Step 2: Set Python Path (IMPORTANT)

export PYTHONPATH="${PWD}/src:${PYTHONPATH}"

For persistence, add to your shell config:

echo 'export PYTHONPATH="${PWD}/src:${PYTHONPATH}"' >> ~/.bashrc
source ~/.bashrc
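
To confirm the path is picked up before running the full test below, here is a minimal import check (assuming the library is importable as lerobot from src/, as the project layout suggests):

python -c "import lerobot; print(lerobot.__file__)"

The printed path should point inside <repo>/src/lerobot/.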

Step 3: Verify Installation

python tests/test_installation.py

Expected output:

============================================================
VLAb Installation Test
============================================================
✓ TrainPipelineConfig
✓ Policy factory
✓ Dataset factory
============================================================
✅ All tests passed!

Step 4: Configure HuggingFace (Optional)

Only needed if downloading datasets or models from the Hub:

huggingface-cli login
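
To verify that the login worked, the standard Hub CLI check applies:

huggingface-cli whoami

On clusters or in batch jobs, you can instead export a token non-interactively (HF_TOKEN is the environment variable recognized by huggingface_hub):

export HF_TOKEN=hf_xxx   # illustrative placeholder token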

Quick Start

Train SmolVLA2 on two datasets with a single GPU:

accelerate launch --config_file accelerate_configs/single_gpu.yaml \
    src/lerobot/scripts/train.py \
    --policy.type=smolvla2 \
    --policy.repo_id=HuggingFaceTB/SmolVLM2-500M-Video-Instruct \
    --dataset.repo_id="Beegbrain/pick_lemon_and_drop_in_bowl,Chojins/chess_game_000_white_red" \
    --dataset.video_backend=pyav \
    --output_dir="./outputs/training" \
    --batch_size=8 \
    --steps=10000 \
    --wandb.enable=true \
    --wandb.project="smolvla2-quickstart"

Note: --policy.repo_id specifies the base vision-language model (SmolVLM) to use. The trained model will be saved to --output_dir.

This will:

  1. Download the two specified datasets from the HuggingFace Hub
  2. Train SmolVLA2 on a single GPU for 10,000 steps
  3. Save checkpoints to ./outputs/training
  4. Log metrics to Weights & Biases

Reproducing SmolVLA Training

If you want to use VLAb with the SmolVLA pretraining datasets and reproduce SmolVLA results, use the following community datasets:

SmolVLA Community Datasets

  • Community Dataset v1: 128 datasets from 55 contributors (11.1K episodes, 5.1M frames, 46.9 hours, 119.3 GB); the curated subset used to pretrain SmolVLA, with quality filtering and manual task description curation
  • Community Dataset v2: 340 datasets from 117 contributors (6.3K episodes, 5M frames, 46.6 hours, 59 GB) with LeRobot v2.0/v2.1 format support

Both datasets feature SO-100 robotic arm demonstrations focused on tabletop manipulation tasks, pick-and-place operations, and everyday object interactions. Community Dataset v1 represents a curated, high-quality subset with manually verified task descriptions, while v2 expands the collection with more contributors and datasets.

Dataset Structure: Both datasets use a hierarchical structure with contributor subdirectories:

community_dataset_v1/v2/
├── contributor1/
│   ├── dataset_name_1/
│   │   ├── data/
│   │   ├── videos/
│   │   └── meta/
│   └── dataset_name_2/
├── contributor2/
│   └── dataset_name_3/
└── ...
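
Once a copy has been downloaded locally (see the next section), one way to enumerate the contributor/dataset pairs relative to the download root is the sketch below. It is illustrative only; the repository also ships a precomputed list in examples/all_datasets_relative.txt.

# List contributor/dataset pairs relative to the dataset root
cd /path/local_dir/community_dataset_v1
find . -mindepth 2 -maxdepth 2 -type d | sed 's|^\./||' | sort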

Downloading Datasets

Note that downloading may take some time (roughly 3-4 hours), especially for the first dataset.

# Download Community Dataset v1 (128 datasets, 11.1K episodes, 119.3 GB)
hf download HuggingFaceVLA/community_dataset_v1 \
       --repo-type=dataset \
       --local-dir /path/local_dir/community_dataset_v1

# Download Community Dataset v2 (340 datasets, 6.3K episodes, 59 GB)
hf download HuggingFaceVLA/community_dataset_v2 \
       --repo-type=dataset \
       --local-dir /path/local_dir/community_dataset_v2
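
If you only want to experiment with a handful of contributors first, the download can be restricted with include patterns (assuming your huggingface_hub version supports the --include filter; the patterns below are illustrative):

# Download only two contributors' folders from Community Dataset v2
hf download HuggingFaceVLA/community_dataset_v2 \
       --repo-type=dataset \
       --include "airthebear/*" "acrampette/*" \
       --local-dir /path/local_dir/community_dataset_v2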

Local Training with Accelerate

VLAb uses Accelerate for distributed training. Choose between provided configs or create your own:

# Configure your own (one-time setup)
accelerate config

# Or use provided configs
accelerate launch --config_file accelerate_configs/single_gpu.yaml ...
accelerate launch --config_file accelerate_configs/multi_gpu.yaml ...
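
The YAML files simply capture standard Accelerate settings; the same effect can be obtained by passing Accelerate flags directly on the command line (the process count below is only an example):

# Launch on 4 GPUs with bf16 mixed precision, without a config file
accelerate launch --multi_gpu --num_processes=4 --mixed_precision=bf16 \
    src/lerobot/scripts/train.py ...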

Training from Local Datasets

If you've pre-downloaded datasets, specify the root directory:

# Train on locally downloaded datasets
accelerate launch --config_file accelerate_configs/multi_gpu.yaml \
    src/lerobot/scripts/train.py \
    --policy.type=smolvla2 \
    --policy.repo_id=HuggingFaceTB/SmolVLM2-500M-Video-Instruct \
    --dataset.repo_id="community_dataset_v2/airthebear/so100_GL,community_dataset_v2/acrampette/third_arm_01" \
    --dataset.root="/path/to/datasets" \
    --dataset.video_backend=pyav \
    --dataset.features_version=2 \
    --output_dir="./outputs/training" \
    --batch_size=8 \
    --steps=200000 \
    --wandb.enable=true \
    --wandb.project="smolvla2-training"

Important: When using --dataset.root, the --dataset.repo_id paths should be relative to the root directory. For example:

  • Root: /path/to/datasets
  • Repo ID: community_dataset_v1/user/dataset_name
  • Dataset location: /path/to/datasets/community_dataset_v1/user/dataset_name
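
A quick sanity check is to confirm that the resolved path exists and contains the usual LeRobot subfolders:

ls /path/to/datasets/community_dataset_v1/user/dataset_name
# expected: data/  meta/  videos/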

SLURM Cluster Training

For distributed training on SLURM clusters, we provide example scripts:

  • examples/scripts/train_smolvla_optimized_fresh.slurm: Start training from scratch
  • examples/scripts/train_smolvla_resume.slurm: Resume from checkpoint

Usage:

  1. Edit the script to match your cluster configuration (partitions, nodes, GPUs, etc.)
  2. Submit the job:
sbatch examples/scripts/train_smolvla_optimized_fresh.slurm

For detailed documentation on SLURM scripts, dataset configuration, and advanced options, see the Examples README.
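
For orientation, the core of such a script is an sbatch header followed by the same accelerate launch command used locally. The sketch below assumes a single node with 8 GPUs and a partition named gpu; adapt the resource requests, paths, and dataset list to your cluster, and prefer the provided scripts for real runs.

#!/bin/bash
#SBATCH --job-name=smolvla2-pretrain
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gres=gpu:8
#SBATCH --cpus-per-task=32
#SBATCH --time=48:00:00

# Environment setup (see Installation)
conda activate vlab
export PYTHONPATH="${PWD}/src:${PYTHONPATH}"

accelerate launch --config_file accelerate_configs/multi_gpu.yaml \
    src/lerobot/scripts/train.py \
    --policy.type=smolvla2 \
    --policy.repo_id=HuggingFaceTB/SmolVLM2-500M-Video-Instruct \
    --dataset.root="/path/to/datasets" \
    --dataset.repo_id="community_dataset_v2/airthebear/so100_GL" \
    --dataset.video_backend=pyav \
    --dataset.features_version=2 \
    --output_dir="./outputs/slurm_training" \
    --batch_size=8 \
    --steps=200000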

Migration to LeRobot

Checkpoints from VLAb may not be directly compatible with the latest LeRobot version due to updated normalization formats. To use your pretrained models with LeRobot:

  1. Convert Checkpoint: Use the migration script from LeRobot
  2. Fine-tune: Follow the LeRobot fine-tuning guide
  3. Inference: Use LeRobot's updated inference pipeline

Troubleshooting

Cache Issues

If you encounter corrupted files, outdated metadata, or persistent errors:

Manual Cache Cleanup

rm -rf ~/.cache/huggingface/datasets
rm -rf ~/.cache/huggingface/hub
rm -rf ~/.cache/huggingface/transformers

Automatic Cleanup in SLURM

In SLURM scripts, set CLEAN_CACHE=true to automatically clean cache before training.
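
The effect is the same as wrapping the manual cleanup above in a guard on that variable, roughly (a sketch of the pattern, not the exact script contents):

if [ "${CLEAN_CACHE}" = "true" ]; then
    rm -rf ~/.cache/huggingface/datasets
    rm -rf ~/.cache/huggingface/hub
    rm -rf ~/.cache/huggingface/transformers
fi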

Note: After cleaning cache, datasets will be re-downloaded on first use.

For additional help, open an issue on GitHub.

Additional Resources

Citation

If you use this library in your research, please cite:

@misc{aubakirova2025vlab,
  author = {Dana Aubakirova and Mustafa Shukor and Jade Cholgari and Leandro von Werra},
  title = {VLAb: Your Laboratory for Pretraining VLAs},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/vlab}}
}

And the SmolVLA paper:

@article{shukor2025smolvla,
  title   = {SmolVLA: A vision-language-action model for affordable and efficient robotics},
  author  = {Shukor, Mustafa and Aubakirova, Dana and Capuano, Francesco and Kooijmans, Pepijn and Palma, Steven and Zouitine, Adil and Aractingi, Michel and Pascal, Caroline and Russi, Martino and Marafioti, Andres and Alibert, Simon and Cord, Matthieu and Wolf, Thomas and Cadene, Remi},
  year    = {2025},
  journal = {arXiv preprint},
  eprint  = {2506.01844},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO}
}

Project Structure

VLAb/
├── src/lerobot/                   # Core library code
│   ├── configs/                   # Configuration classes
│   ├── datasets/                  # Dataset loading and processing
│   ├── optim/                     # Optimizers and schedulers
│   ├── policies/                  # Policy implementations
│   │   └── smolvla2/              # SmolVLA2 architecture
│   ├── scripts/                   # Training scripts
│   └── utils/                     # Utility functions
├── examples/                      # Example scripts and notebooks
│   ├── scripts/                   # SLURM training scripts
│   └── all_datasets_relative.txt  # Pretraining dataset list
├── tests/                         # Test scripts
│   └── test_installation.py       # Installation verification script
├── accelerate_configs/            # Accelerate configuration files
├── environment.yml                # Conda environment specification
└── README.md                      # This file
