
🌌 CorridorVLA

Direct spatial constraints for Vision-Language-Action models via sparse physical anchors



🔍 TL;DR

  • Explores an alternative to common visual-style spatial guidance (e.g., predicting future images/videos), using text-style physical anchors instead
  • Predicts sparse end-effector Δ-positions
  • Uses them to impose an explicit corridor constraint on action generation
  • Achieves an 83.21% success rate on LIBERO-Plus

🧠 Motivation

Existing VLA paradigm

  • Spatial guidance is encoded as visual-style tokens or latent features
  • Action generation is influenced indirectly through the backbone features

CorridorVLA

  • Predict compact physical quantities (spatial anchors)
  • Apply them as direct constraints in the loss
  • No need for heavy visual intermediate representations

🏗️ Method Overview

Key components

(1) Sparse Anchor Prediction

  • Predict $K$ future Δ-position anchors
  • Represent trajectory structure in a compact form
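As a rough illustration of how the anchor targets can be constructed, the sketch below subsamples $K$ future waypoints from a dense end-effector trajectory and expresses each as a Δ-position relative to the current pose. The function name, shapes, and sampling scheme are assumptions for illustration, not the repository's actual API.

```python
import numpy as np

def make_anchor_targets(traj, K=3):
    """traj: (T, 3) future end-effector positions; traj[0] is the current pose.
    Returns (K, 3) Δ-position anchors at evenly spaced future steps."""
    T = traj.shape[0]
    # pick K evenly spaced future time steps (excluding the current pose)
    idx = np.linspace(1, T - 1, K).round().astype(int)
    # express each selected waypoint as a delta from the current pose
    return traj[idx] - traj[0]
```

This keeps the supervision signal compact: $K$ vectors of 3 numbers each, rather than a dense image- or video-level representation of the future.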

(2) Action Augmentation

  • Concatenate state-related physical quantities (e.g., Δ-positions) to the action vector
  • Enable joint prediction of state and action, providing implicit alignment between state space and action space
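The augmentation step amounts to widening the prediction target so the head regresses state and action jointly. A minimal sketch, assuming a flat action vector and a 3-D Δ-position (names and dimensions are hypothetical):

```python
import numpy as np

def augment_action(action, delta_pos):
    """action: (..., A) raw action vector; delta_pos: (..., 3) Δ-position.
    Returns the concatenated target predicted jointly by the action head."""
    return np.concatenate([action, delta_pos], axis=-1)

def split_prediction(pred, action_dim):
    """Recover the action part and the Δ-position part of a joint prediction."""
    return pred[..., :action_dim], pred[..., action_dim:]
```

Because both quantities come out of the same head, gradients on the Δ-position slots implicitly align the state space with the action space.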

(3) Corridor Loss

  • Defines a tolerance region over the predicted trajectory
  • Penalizes deviations outside the region while allowing smooth convergence within it

👉 Behaves like a structured smooth-L1 in trajectory space
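One way to realize such a loss is a Huber-style penalty whose transition point is the corridor half-width: deviations inside the tolerance band get a gentle quadratic pull, deviations outside grow linearly, and the two branches meet smoothly at the boundary. This is a sketch under assumed names and shapes, not the repository's exact formulation.

```python
import numpy as np

def corridor_loss(pred_traj, ref_traj, eps=0.05):
    """pred_traj, ref_traj: (T, 3) end-effector positions; eps: corridor half-width.
    Quadratic inside the corridor, linear outside (smooth-L1-like in trajectory space)."""
    d = np.linalg.norm(pred_traj - ref_traj, axis=-1)  # (T,) distance to reference
    quad = 0.5 * d ** 2 / eps   # smooth convergence inside the corridor
    lin = d - 0.5 * eps         # linear penalty outside; matches quad at d == eps
    return np.where(d <= eps, quad, lin).mean()
```

With `eps=0.05`, a 3 cm deviation costs 0.009 while a 10 cm deviation costs 0.075, so the gradient stays bounded outside the corridor instead of exploding quadratically.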


📊 Results

LIBERO-Plus (GR00T-based)

| Variant | Description | AVG (%) |
|---------|-------------|---------|
| base    |             | 75.23 |
| c1      | query=3     | 77.25 |
| c2      | + extra data | 77.25 |
| c3      | + Δpos anchors | 79.21 |
| c4      | + corridor loss (CorridorVLA) | 83.21 |

📈 Improvement:

  • +7.98 percentage points over the base model (75.23 → 83.21)
  • Largest single gain comes from the explicit spatial constraint (corridor loss)

⚙️ Implementation

  • Built on StarVLA@e1e6457c6cd124248f5ce7b2d3d40fb74f48c6fc
  • Minimal changes:
      • a few extra prediction slots
      • additional loss terms
  • No heavy architecture redesign


📦 Code

👉 https://github.com/lidc54/corridorVLA

```shell
# clone repo
git clone https://github.com/lidc54/corridorVLA
cd corridorVLA
```

📌 Key Insights

  • Spatial guidance can be explicit (loss-level) instead of implicit (feature-level)
  • Physical quantities are more action-aligned and more interpretable than visual intermediates
  • Simple constraints can significantly improve stability and reduce unstructured exploration

🙏 Acknowledgements

  • Built upon StarVLA
  • Inspired by recent advances in VLA and generative action models
  • [2026/04/09] 🌟 Huge thanks to Awesome-Latent-Space for including our work in their survey
    “The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook”

⭐ Star Us!

If you find this project useful, consider giving it a ⭐ on GitHub!


📖 Citation

If you find this work useful, please cite:

```bibtex
@article{corridorvla2026,
  title={CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors},
  author={Dachong Li and ZhuangZhuang Chen and Jin Zhang and Jianqiang Li},
  year={2026},
  eprint={2604.21241},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2604.21241},
}
```
