SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models (ICML'26 Spotlight)
Official implementation of "SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models".
Hyeonbeom Choi* | Daechul Ahn* | Youhan Lee | Taewook Kang | Seongwon Cho | Jonghyun Choi†
* Equal contribution † Corresponding author
TL;DR. Vision-Language-Action (VLA) models have emerged as a promising paradigm for general-purpose robotic control, and test-time scaling (TTS) has gained attention as a way to improve their robustness beyond what training alone provides. However, existing TTS methods require additional training, external verifiers, and multiple forward passes, making them impractical to deploy. Moreover, they intervene only at action decoding while keeping visual representations fixed, which is insufficient under perceptual ambiguity. We propose SCALE, a simple inference strategy inspired by Active Inference theory that jointly modulates visual perception and action execution based on the model's own uncertainty, requiring no additional training, no verifier, and only a single forward pass.
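The idea can be pictured with a minimal sketch (all function names and the specific modulation rule below are illustrative assumptions, not the actual SCALE implementation): read uncertainty from the model's own action distribution, then use it within the same forward pass to re-weight visual evidence ("looking") and to set the action sampling temperature ("execution").

```python
import torch
import torch.nn.functional as F

def self_uncertainty(action_logits: torch.Tensor) -> torch.Tensor:
    """Normalized entropy of the action-token distribution, in [0, 1].

    action_logits: (batch, num_action_tokens)
    """
    probs = F.softmax(action_logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-9)).sum(dim=-1)
    max_entropy = torch.log(torch.tensor(float(action_logits.shape[-1])))
    return entropy / max_entropy

def uncertainty_conditioned_step(visual_features: torch.Tensor,
                                 action_logits: torch.Tensor,
                                 look_gain: float = 1.0,
                                 temp_floor: float = 0.5):
    """Hypothetical sketch: jointly modulate looking and execution by self-uncertainty.

    visual_features: (batch, feature_dim); action_logits: (batch, num_action_tokens).
    The gain/temperature rule here is only a stand-in for the paper's mechanism.
    """
    u = self_uncertainty(action_logits)                         # (batch,)
    # "Looking": when uncertain, up-weight the visual evidence.
    attended = visual_features * (1.0 + look_gain * u.unsqueeze(-1))
    # "Execution": when uncertain, act more conservatively by sharpening
    # the action distribution (lower sampling temperature).
    temperature = torch.clamp(1.0 - u, min=temp_floor)          # (batch,)
    probs = F.softmax(action_logits / temperature.unsqueeze(-1), dim=-1)
    action = torch.multinomial(probs, num_samples=1).squeeze(-1)
    return attended, action, u
```

Note that everything above is derived from a single set of logits, so no extra forward passes or external verifiers are involved, which mirrors the deployment constraint described in the TL;DR.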
- Release the paper on arXiv
- Open the project page for SCALE
- Release the code for SCALE
@inproceedings{choi2026scale,
  title={SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models},
  author={Hyeonbeom Choi and Daechul Ahn and Youhan Lee and Taewook Kang and Seongwon Cho and Jonghyun Choi},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026}
}