[ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"

lcqysl/FrameThinker

FrameThinker

This is the official repository for the core code of the paper "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting".

📖 About The Project

FrameThinker is a framework for long-video reasoning. Traditional models passively process a fixed set of pre-sampled frames, which is inefficient for long videos. FrameThinker instead interrogates the video actively through a multi-turn, iterative process, spotlighting relevant frame sequences to gather evidence. A Cognitive Consistency Verification (CCV) module guides this process so that the reasoning stays logical and interpretable. Across six challenging benchmarks, FrameThinker improves accuracy by an average of +10.4% over the baseline. Notably, it surpasses the strong LongVILA-R1 and sets a new state of the art on the LongVideo-Reason benchmark while using only 20.6 frames on average.
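The multi-turn loop described above can be illustrated with a small, self-contained sketch. This is a hypothetical illustration and not the repository's implementation. Several parts are assumptions:

- the `spotlight` and `run_episode` helpers;
- the tuple-based action format (`("spotlight", start, end)` or `("answer", text)`);
- the default frame budgets.

The trained model is replaced by a stand-in `policy` callable, and the CCV module is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple, Union

# Hypothetical action format: ("spotlight", start, end) or ("answer", text).
Action = Union[Tuple[str, int, int], Tuple[str, str]]
Policy = Callable[[List[List[int]]], Action]


def spotlight(start: int, end: int, num_frames: int) -> List[int]:
    """Uniformly sample up to `num_frames` distinct frame indices in [start, end]."""
    if end < start or num_frames <= 0:
        return []
    if num_frames == 1:
        return [(start + end) // 2]
    step = (end - start) / (num_frames - 1)
    return sorted({start + round(i * step) for i in range(num_frames)})


@dataclass
class Trajectory:
    observed: List[List[int]] = field(default_factory=list)  # frame indices shown per turn
    answer: Optional[str] = None

    @property
    def frames_used(self) -> int:
        # Count each distinct frame once, even if several turns revisit it.
        return len({i for turn in self.observed for i in turn})


def run_episode(policy: Policy, num_video_frames: int, initial_frames: int = 8,
                frames_per_spotlight: int = 8, max_turns: int = 5) -> Trajectory:
    traj = Trajectory()
    # Turn 0: a coarse, uniformly sampled overview of the whole video.
    traj.observed.append(spotlight(0, num_video_frames - 1, initial_frames))
    for _ in range(max_turns):
        action = policy(traj.observed)
        if action[0] == "answer":
            traj.answer = action[1]
            break
        # Otherwise zoom into the requested segment, clipped to the video bounds.
        _, start, end = action
        start, end = max(0, start), min(num_video_frames - 1, end)
        traj.observed.append(spotlight(start, end, frames_per_spotlight))
    return traj


def scripted_policy(observed: List[List[int]]) -> Action:
    # Stand-in for the model: inspect one segment, then commit to an answer.
    if len(observed) == 1:
        return ("spotlight", 1200, 1500)
    return ("answer", "B")


if __name__ == "__main__":
    traj = run_episode(scripted_policy, num_video_frames=3000)
    print(traj.answer, traj.frames_used)
```

The sketch counts distinct frames across turns. This is one way a per-question frame budget could be measured, but the paper's exact accounting may differ.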

Figure: An illustration of the FrameThinker framework.

Figure: An example of FrameThinker solving a reasoning task.

Figure: An example of a multi-step reasoning process.

🚀 Getting Started

Prerequisites

  • Python==3.10
  • vllm==0.9.1
  • transformers==4.52.4
  • Other dependencies listed in requirements.txt
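The versions above are pinned exactly, so a quick pre-flight check can catch mismatches before a long training run. The `check_environment` helper below is a hypothetical convenience script, not part of the repository. It only compares installed distribution versions, read through the standard-library `importlib.metadata`, against the pins listed above. It does not inspect requirements.txt.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the prerequisites list above.
PINNED = {"vllm": "0.9.1", "transformers": "4.52.4"}
REQUIRED_PYTHON = (3, 10)


def check_environment(pinned=PINNED, required_python=REQUIRED_PYTHON):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    if sys.version_info[:2] != tuple(required_python):
        problems.append(
            f"python: expected {required_python[0]}.{required_python[1]}, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    for pkg, expected in pinned.items():
        try:
            found = version(pkg)
        except PackageNotFoundError:
            problems.append(f"{pkg}: not installed (expected {expected})")
            continue
        if found != expected:
            problems.append(f"{pkg}: expected {expected}, found {found}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("MISMATCH", issue)
    print("OK" if not issues else f"{len(issues)} mismatch(es)")
```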

💾 Data

Our training data is sourced from the following open-source projects:

LongVideoReason, Video-R1, Video-Holmes, CG-Bench

⚙️ Training

bash examples/agent/train_frame_thinker.sh

🤝 Merge Checkpoints

After training, merge the saved FSDP checkpoint shards into a Hugging Face-format model:

python merge_script.py \
    --backend fsdp \
    --hf_model_path /path/to/original/hf-model \
    --local_dir /path/to/your/checkpoints \
    --target_dir /path/to/save/merged_hf_model
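Before running inference, it may help to sanity-check the merged output directory. The `check_merged_dir` sketch below is an assumption-based check, not part of the repository. It assumes the merged model follows the usual Hugging Face layout: a `config.json` plus `*.safetensors` or `*.bin` weights, optionally sharded with a `model.safetensors.index.json` whose `weight_map` lists the shard files. It does not check tokenizer or processor files.

```python
import json
from pathlib import Path


def check_merged_dir(target_dir):
    """Return a list of problems found in a merged Hugging Face model directory."""
    root = Path(target_dir)
    problems = []
    if not (root / "config.json").is_file():
        problems.append("missing config.json")
    # Any top-level weight file counts; sharded and single-file layouts both match.
    weights = list(root.glob("*.safetensors")) + list(root.glob("*.bin"))
    if not weights:
        problems.append("no *.safetensors or *.bin weight files")
    # If a sharded safetensors index exists, every shard it references must be present.
    index = root / "model.safetensors.index.json"
    if index.is_file():
        weight_map = json.loads(index.read_text()).get("weight_map", {})
        for shard in sorted(set(weight_map.values())):
            if not (root / shard).is_file():
                problems.append(f"missing shard {shard}")
    return problems


if __name__ == "__main__":
    import sys
    print(check_merged_dir(sys.argv[1]) or "OK")
```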

🚀 Inference & Evaluation

python examples/agent/infer.py
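The README does not document the output format of `infer.py`. The sketch below therefore assumes hypothetical per-question records with `benchmark`, `prediction`, and `answer` keys holding multiple-choice letters. It computes three numbers:

- per-benchmark accuracy;
- a macro average across benchmarks;
- the mean per-benchmark gain over a baseline, in percentage points.

This is one common way to report an average improvement such as the +10.4% figure above, but the paper's exact protocol may differ. The helper names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def normalize(ans: str) -> str:
    # Compare option letters case-insensitively, ignoring surrounding whitespace and periods.
    return ans.strip().strip(".").upper()


def per_benchmark_accuracy(records: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for r in records:
        total[r["benchmark"]] += 1
        correct[r["benchmark"]] += int(normalize(r["prediction"]) == normalize(r["answer"]))
    # Accuracy in percent for each benchmark seen in the records.
    return {b: 100.0 * correct[b] / total[b] for b in total}


def macro_average(acc: Mapping[str, float]) -> float:
    # Unweighted mean over benchmarks, so small and large benchmarks count equally.
    return sum(acc.values()) / len(acc)


def average_gain(model_acc: Mapping[str, float], baseline_acc: Mapping[str, float]) -> float:
    # Mean per-benchmark difference (percentage points) over benchmarks both runs share.
    shared = model_acc.keys() & baseline_acc.keys()
    return sum(model_acc[b] - baseline_acc[b] for b in shared) / len(shared)


if __name__ == "__main__":
    # Toy records for illustration only; adapt the keys to the actual infer.py output.
    records = [
        {"benchmark": "bench_a", "prediction": "b.", "answer": "B"},
        {"benchmark": "bench_a", "prediction": "C", "answer": "B"},
        {"benchmark": "bench_b", "prediction": " d ", "answer": "D"},
    ]
    acc = per_benchmark_accuracy(records)
    print(acc, macro_average(acc))
```

A macro average weights every benchmark equally. A pooled accuracy over all questions would instead favor the largest benchmark.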

🙏 Acknowledgements

We would like to express our sincere gratitude to the open-source community and the creators of the foundational projects that made this work possible.

Our implementation is built upon the excellent codebases of verl and DeepEyes. Their work provided a strong foundation and significantly accelerated our research. We highly recommend checking out their projects.
