Direct spatial constraints for Vision-Language-Action models via sparse physical anchors
- Explore an alternative to common visual-style spatial guidance (e.g., predicting future images/videos) using text-style physical anchors
- Predict sparse end-effector Δ-positions
- Use them to impose an explicit corridor constraint on action generation
- Achieves 83.21% success rate on LIBERO-Plus
Prior approaches:
- Spatial guidance is encoded as visual-style tokens or latent features
- Action generation is influenced only indirectly, through backbone features

This work:
- Predict compact physical quantities (spatial anchors)
- Apply them as direct constraints in the loss
- No heavy visual intermediate representations needed
(1) Sparse Anchor Prediction
- Predict $K$ future Δ-position anchors
- Represent trajectory structure in a compact form
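As a rough sketch of anchor construction (in NumPy; `make_delta_anchors` and the even-spacing rule are illustrative assumptions, not necessarily the paper's exact scheme), $K$ sparse anchors can be read off a future end-effector trajectory as Δ-positions relative to the current pose:

```python
import numpy as np

def make_delta_anchors(traj, k):
    """Subsample K future waypoints from an end-effector trajectory
    and express them as Δ-positions relative to the current position.
    traj: (T, 3) array of xyz positions, traj[0] being the current pose.
    Returns a (K, 3) array of sparse Δ-position anchors."""
    t = len(traj)
    # evenly spaced indices over the future part of the trajectory
    idx = np.linspace(1, t - 1, k).round().astype(int)
    return traj[idx] - traj[0]
```

Any subsampling rule (e.g. time-based or curvature-based) would slot in the same way; the point is that $K \ll T$, so the anchors stay compact.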
(2) Action Augmentation
- Concatenate state-related physical quantities (e.g., Δ-positions) to the action vector
- Enable joint prediction of state and action, providing implicit alignment between state space and action space
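A minimal sketch of this augmentation (NumPy; the helper names and a 7-DoF action dimension are assumptions for illustration): the state quantities are simply concatenated onto each action chunk so one head predicts both, and the action part is sliced back out at inference.

```python
import numpy as np

def augment_actions(actions, delta_pos):
    """Concatenate state-related quantities (here: per-step Δ-positions)
    onto the action chunk so state and action are predicted jointly.
    actions: (H, A) action chunk; delta_pos: (H, 3) Δ-positions.
    Returns an (H, A + 3) joint target."""
    return np.concatenate([actions, delta_pos], axis=-1)

def split_joint_prediction(pred, action_dim):
    """Recover the action part (and the state part) from a joint
    prediction at inference time."""
    return pred[..., :action_dim], pred[..., action_dim:]
```

Because the two halves share one prediction head, gradients from the state targets implicitly align the state and action spaces without any extra module.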
(3) Corridor Loss
- Defines a tolerance region over the predicted trajectory
- Penalizes deviations outside the region while allowing smooth convergence within it
👉 Behaves like a structured smooth-L1 in trajectory space
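A plausible sketch of such a loss (NumPy; `tol` and `beta` are assumed hyperparameters, not values from the paper): deviations inside the corridor half-width cost nothing, and the excess beyond it is penalized with a smooth-L1 shape, quadratic near the boundary and linear far from it.

```python
import numpy as np

def corridor_loss(pred, target, tol=0.05, beta=0.01):
    """Dead-zone smooth-L1 over the anchor trajectory.
    pred, target: arrays of predicted / anchor positions.
    tol: corridor half-width; deviations below it incur zero penalty.
    beta: smooth-L1 transition width on the excess deviation."""
    d = np.abs(pred - target)
    excess = np.maximum(d - tol, 0.0)   # zero inside the corridor
    quad = 0.5 * excess**2 / beta       # smooth near the corridor boundary
    lin = excess - 0.5 * beta           # linear growth far outside it
    return np.where(excess < beta, quad, lin).mean()
```

With `tol = 0`, this reduces to an ordinary smooth-L1, which is why the corridor loss can be read as a structured smooth-L1 in trajectory space.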
| Variant | Description | AVG |
|---|---|---|
| base | baseline | 75.23 |
| c1 | query=3 | 77.25 |
| c2 | + extra data | 77.25 |
| c3 | + Δpos anchors | 79.21 |
| c4 | + corridor loss (CorridorVLA) | 83.21 |
📈 Improvement:
- +7.98 points over the base model (75.23 → 83.21)
- Largest gain comes from the explicit spatial constraint (corridor loss)
Built on StarVLA@e1e6457c6cd124248f5ce7b2d3d40fb74f48c6fc

Minimal changes:
- a few extra prediction slots
- additional loss terms
- no heavy architecture redesign
👉 https://github.com/lidc54/corridorVLA

```shell
# clone the repository
git clone https://github.com/lidc54/corridorVLA
cd corridorVLA
```
Spatial guidance can be:
- explicit (loss-level) instead of implicit (feature-level)

Physical quantities are:
- more action-aligned
- more interpretable

Simple constraints can:
- significantly improve stability
- reduce unstructured exploration
- Built upon StarVLA
- Inspired by recent advances in VLA and generative action models
- [2026/04/09] 🌟 Huge thanks to Awesome-Latent-Space for including our work in their survey "The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook"
If you find this project useful, consider giving it a ⭐ on GitHub!
If you find this work useful, please cite:
```bibtex
@article{corridorvla2025,
  title={CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors},
  author={Dachong Li and ZhuangZhuang Chen and Jin Zhang and Jianqiang Li},
  year={2026},
  eprint={2604.21241},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2604.21241},
}
```
