jjihwan/LiteFrame


LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs

Jihwan Kim1,2, Nikhil Parthasarathy1, Danfeng Qin1, Junhwa Hur1, Deqing Sun1, Bohyung Han1,2, Ming-Hsuan Yang1, Boqing Gong1

1Google DeepMind, 2Seoul National University



TL;DR: We propose LiteFrame, a highly efficient video encoder for Video Large Language Models that unlocks scalable, long-form video understanding by resolving inefficiencies in both the LLM and the ViT.

🚧 Note: Code and model weights will be released soon.

1-Min Overview 🚀

LiteFrame.mp4

News 📰

  • [2026.05.18] Our paper, LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs, is now available on arXiv.

Citation 📃

If you find our work useful for your research, please consider citing:

@article{kim2026liteframe,
  title={LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs},
  author={Kim, Jihwan and Parthasarathy, Nikhil and Qin, Danfeng and Hur, Junhwa and Sun, Deqing and Han, Bohyung and Yang, Ming-Hsuan and Gong, Boqing},
  journal={arXiv preprint arXiv:2605.17260},
  year={2026}
}

About

Official repository for LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs
