LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs

Jihwan Kim¹,², Nikhil Parthasarathy¹, Danfeng Qin¹, Junhwa Hur¹, Deqing Sun¹, Bohyung Han¹,², Ming-Hsuan Yang¹, Boqing Gong¹

¹Google DeepMind ²Seoul National University

TL;DR: We propose LiteFrame, a highly efficient video encoder for Video Large Language Models that unlocks scalable, long-form video understanding by resolving inefficiencies in both the LLM and the ViT.

🚧 Note: Code and model weights will be released soon.

1-Min Overview 🚀

LiteFrame.mp4

News 📰

[2026.05.18] Our paper, LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs, is now available on arXiv.

Citation 📃

If you find our work useful for your research, please consider citing:

@article{kim2026liteframe,
  title={LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs},
  author={Kim, Jihwan and Parthasarathy, Nikhil and Qin, Danfeng and Hur, Junhwa and Sun, Deqing and Han, Bohyung and Yang, Ming-Hsuan and Gong, Boqing},
  journal={arXiv preprint arXiv:2605.17260},
  year={2026}
}
