GitHub - HaozheQi/AdaptToken

AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding

Haozhe Qi¹,²*, Kevin Qu³, Mahdi Rad¹, Rui Wang¹, Alexander Mathis², Marc Pollefeys¹,³

¹Microsoft Spatial AI Lab ²EPFL ³ETH Zurich

*Work done during an internship at Microsoft.

AdaptToken is a training-free framework for long video understanding with multimodal large language models (MLLMs). It uses response entropy as a global uncertainty signal to allocate token budgets across video groups, and cross-modal attention to rank tokens within each group. This yields strong long-context performance and also enables an efficient early-stopping variant, AdaptToken-Lite.
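Because the code has not been released yet, the following is only a hypothetical sketch of the two ideas named above, not the AdaptToken implementation. It assumes that (1) the model's answer distribution for each video group is available as a list of probabilities, (2) groups with lower response entropy, where the model is more certain, should receive a larger share of a fixed token budget, with the split computed by a softmax over negative entropy, and (3) a per-token cross-modal attention score is available for ranking. The function names `response_entropy`, `allocate_budgets` and `select_tokens`, and the `temperature` parameter, are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence


def response_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one group's answer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def allocate_budgets(entropies: Sequence[float], total_budget: int,
                     temperature: float = 1.0) -> List[int]:
    """Split total_budget across video groups (hypothetical scheme).

    ASSUMPTION: lower entropy (a more certain model) earns a larger share,
    via a softmax over negative entropy. The real AdaptToken rule may differ.
    """
    scores = [-h / temperature for h in entropies]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total_w = sum(weights)
    raw = [total_budget * w / total_w for w in weights]
    budgets = [math.floor(r) for r in raw]
    # Largest-remainder rounding: give the tokens lost to flooring back
    # to the groups with the biggest fractional parts.
    leftover = total_budget - sum(budgets)
    by_remainder = sorted(range(len(raw)),
                          key=lambda i: raw[i] - budgets[i], reverse=True)
    for i in by_remainder[:leftover]:
        budgets[i] += 1
    return budgets


def select_tokens(attention: Sequence[float], budget: int) -> List[int]:
    """Keep the `budget` tokens with the highest cross-modal attention.

    Returned indices are re-sorted so the kept tokens stay in temporal order.
    """
    ranked = sorted(range(len(attention)),
                    key=lambda i: attention[i], reverse=True)
    return sorted(ranked[:budget])


if __name__ == "__main__":
    # Toy answer distributions for three video groups (made-up numbers).
    group_probs = [[0.9, 0.05, 0.05], [0.4, 0.3, 0.3], [0.34, 0.33, 0.33]]
    entropies = [response_entropy(p) for p in group_probs]
    budgets = allocate_budgets(entropies, total_budget=100)
    print(budgets)
    print(select_tokens([0.1, 0.7, 0.2, 0.9], budget=2))
```

In this sketch the softmax temperature controls how sharply the budget concentrates on the most certain groups, and largest-remainder rounding keeps the per-group budgets summing exactly to the total.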

Code Release

We are currently preparing the codebase for release. Stay tuned.

Citation

If you find our work useful, please consider citing:

@misc{qi2026adapttoken,
      title={AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding}, 
      author={Haozhe Qi and Kevin Qu and Mahdi Rad and Rui Wang and Alexander Mathis and Marc Pollefeys},
      year={2026},
      eprint={2603.28696},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.28696}, 
}
