HaozheQi/AdaptToken

AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding

Haozhe Qi1,2*, Kevin Qu3, Mahdi Rad1, Rui Wang1, Alexander Mathis2, Marc Pollefeys1,3

1Microsoft Spatial AI Lab          2EPFL          3ETH Zurich

*work done during an internship at Microsoft

Project Page arXiv

AdaptToken is a training-free framework for long video understanding with multimodal large language models (MLLMs). It uses response entropy as a global uncertainty signal to allocate token budgets across video groups, and cross-modal attention to rank tokens within each group. This enables both strong long-context performance and an efficient early-stopping variant (AdaptToken-Lite).
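
Since the official code is not yet released, the following is only a rough, illustrative sketch of the two ideas described above and not the authors' implementation. All function names are hypothetical. It assumes that groups with lower response entropy receive a larger share of the token budget (the direction and the softmax weighting are assumptions, not taken from the paper), and that tokens inside a group are ranked by a precomputed attention score.

```python
import math


def response_entropy(probs):
    """Shannon entropy (in nats) of a probability distribution over responses."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def allocate_budgets(entropies, total_budget):
    """Split total_budget across video groups using softmax(-entropy) weights.

    ASSUMPTION: lower entropy (a more confident response) earns a larger share.
    The actual AdaptToken allocation rule may differ.
    """
    weights = [math.exp(-h) for h in entropies]
    norm = sum(weights)
    budgets = [int(total_budget * w / norm) for w in weights]
    # Give the rounding remainder to the lowest-entropy group.
    budgets[entropies.index(min(entropies))] += total_budget - sum(budgets)
    return budgets


def select_tokens(attn_scores, k):
    """Return indices of the k highest-scoring tokens, kept in original order.

    ASSUMPTION: attn_scores holds one cross-modal attention score per token.
    """
    ranked = sorted(range(len(attn_scores)),
                    key=lambda i: attn_scores[i], reverse=True)
    return sorted(ranked[:k])
```

For example, `allocate_budgets([0.1, 1.2], 100)` gives the first (more confident) group the larger share while the shares still sum to 100, and `select_tokens` then keeps that many top-ranked tokens within each group.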

[Figure: AdaptToken]

Code Release

We are currently preparing the codebase for release. Stay tuned.

Citation

If you find our work useful, please consider citing:

@misc{qi2026adapttoken,
      title={AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding}, 
      author={Haozhe Qi and Kevin Qu and Mahdi Rad and Rui Wang and Alexander Mathis and Marc Pollefeys},
      year={2026},
      eprint={2603.28696},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.28696}, 
}
