
MARC: Memory-Augmented RL Token Compression for Efficient Video Understanding


Official implementation of MARC (Memory-Augmented RL Token Compression), accepted at ICLR 2026.

🔥 News

  • [2026/02/02] Preliminary code release including training and inference scripts
  • [2026/01/22] Our paper is accepted at ICLR 2026!

Note: Training data and VMR code will be released in the future.

Overview

MARC is a novel framework for efficient video understanding that combines:

  • Visual Memory Retriever (VMR): Segments videos into event-level fragments and retrieves query-relevant clips (a toy sketch of the retrieval idea follows this list)
  • Compression Group Relative Policy Optimization (C-GRPO): An RL-based distillation strategy that compresses video tokens while preserving reasoning ability
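
To make the retrieval idea concrete, here is a toy sketch of query-relevant frame selection via embedding similarity. This is illustrative only: the official VMR code has not been released yet (see the note above), and the function below is a hypothetical stand-in, not the repository's API.

# Toy illustration of query-relevant retrieval; NOT the released VMR code.
# Frame and query embeddings are assumed to come from any shared
# vision-language embedding space.
import torch
import torch.nn.functional as F

def retrieve_relevant_frames(frame_embs, query_emb, top_k=8):
    """Return the indices of the top-k frames most similar to the query."""
    frame_embs = F.normalize(frame_embs, dim=-1)   # (num_frames, dim)
    query_emb = F.normalize(query_emb, dim=-1)     # (dim,)
    scores = frame_embs @ query_emb                # cosine similarity per frame
    top = scores.topk(top_k).indices
    return top.sort().values                       # keep temporal order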

Key Results

  • 95% reduction in visual tokens (64 frames → 1 frame equivalent)
  • 72% reduction in GPU memory usage
  • 23.9% reduction in generation latency
  • Nearly identical performance to 64-frame baseline (42.20 vs 42.21 mean accuracy)

📐 Setup

git clone https://github.com/Gimlettt/MARC
cd MARC

# Create and activate conda environment
conda create -n marc python=3.11
conda activate marc

# Install base dependencies
bash setup.sh

# Install additional required packages
pip install wandb==0.18.3
pip install tensorboardx
pip install qwen_vl_utils torchvision
pip install flash-attn --no-build-isolation
pip install nltk
pip install rouge_score
pip install deepspeed
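
As an optional sanity check, verify that the key packages import cleanly before proceeding:

# Optional: confirm the environment is usable
python -c "import torch, transformers, flash_attn, deepspeed; print(torch.__version__, transformers.__version__)"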

Replace Transformers Source Files

After installing transformers, you need to replace two files in your transformers installation with the modified versions that enable compression:

  1. Replace <TRANSFORMERS_PATH>/models/qwen2_5_vl/modeling_qwen2_5_vl.py with qwen2_5_vl/modeling_qwen2_5_vl(compress).py
  2. Replace <TRANSFORMERS_PATH>/models/qwen2_5_vl/processing_qwen2_5_vl.py with qwen2_5_vl/processing_qwen2_5_vl(compress).py

You can find your transformers installation path by running:

python -c "import transformers; import os; print(os.path.dirname(transformers.__file__))"
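
If you prefer to script the replacement, the sketch below copies both files in one go. It assumes you run it from the repository root and that the modified files ship under qwen2_5_vl/ with the exact "(compress)" names shown above:

# Copy the compression-enabled files over the stock transformers ones.
# Run from the MARC repository root.
import os, shutil, transformers

tf_dir = os.path.dirname(transformers.__file__)
for name in ("modeling_qwen2_5_vl", "processing_qwen2_5_vl"):
    src = f"qwen2_5_vl/{name}(compress).py"
    dst = os.path.join(tf_dir, "models", "qwen2_5_vl", f"{name}.py")
    shutil.copy(src, dst)
    print(f"Replaced {dst}")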

🔮 Inference

For a complete inference example, see inference_script/inference_example.py.
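
As a minimal starting point before reading that script, the snippet below follows the standard Qwen2.5-VL inference pattern (MARC is built on Qwen2.5-VL-3B). The checkpoint name is a placeholder; refer to inference_script/inference_example.py for the exact MARC-specific settings:

# Sketch of Qwen2.5-VL-style video inference; the model ID below is a
# placeholder, not an official MARC checkpoint.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2.5-VL-3B-Instruct"  # swap in your MARC checkpoint
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto",
    attn_implementation="flash_attention_2")
processor = AutoProcessor.from_pretrained(model_id)

messages = [{"role": "user", "content": [
    {"type": "video", "video": "path/to/video.mp4"},
    {"type": "text", "text": "What happens in this clip?"},
]}]
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out[:, inputs.input_ids.shape[1]:],
                             skip_special_tokens=True)[0])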

Benchmark Evaluation

To evaluate on benchmarks, use:

bash inference_script/eval_bench.sh

🚀 Training

C-GRPO Training

To train with Compression Group Relative Policy Optimization:

bash training_script/run_grpo_video.sh

Training script: training_script/grpo.py
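
For intuition, the "group relative" part of C-GRPO refers to the GRPO advantage estimate: several completions are sampled per prompt, and each completion's reward is standardized against its own group. The sketch below shows that generic computation (as used by GRPO-style methods and available in TRL, which this project builds on); it is not the C-GRPO compression reward, which lives in training_script/grpo.py:

# Generic group-relative advantage used by GRPO-style methods.
# rewards has shape (num_prompts, group_size): one row per prompt,
# one column per sampled completion.
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-4)   # standardize within each group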

Supervised Fine-Tuning (SFT)

For comparison with standard SFT:

bash training_script/run_sft_video.sh

Training script: training_script/sft_video.py

Results

Performance Comparison

| Method | VSI | VideoMMMU | MMVU | MVBench | TempCompass | VideoMME | Mean |
|---|---|---|---|---|---|---|---|
| Qwen2.5-VL-3B (64f) | 32.93 | 35.33 | 48.64 | 44.77 | 38.05 | 53.55 | 42.21 |
| MARC-3B (1f) | 27.55 | 33.11 | 51.99 | 45.82 | 55.34 | 39.44 | 42.20 |

Efficiency Improvements

  • Visual Tokens: 2589.93 → 122.69 (95% reduction)
  • GPU Memory: 41.6 GB → 11.5 GB (72% reduction)
  • Generation Latency: 0.46s → 0.35s (23.9% reduction)

Training Data

We use a subset of the Video-R1-260K dataset:

  • 5K samples for C-GRPO training
  • Includes both video and image data
  • Covers multiple domains: Knowledge, Math, Chart, Spatial, OCR, General reasoning
  • See training data distribution in the paper

Note: Training data will be released in the future.

Citation

If you find MARC useful for your research, please cite:

@article{wu2025marc,
  title={MARC: Memory-Augmented RL Token Compression for Efficient Video Understanding},
  author={Wu, Peiran and Yu, Zhuorui and Liu, Yunze and Wu, Chi-Hao and Zhou, Enmin and Shen, Junxiao},
  journal={arXiv preprint arXiv:2510.07915},
  year={2025}
}

Acknowledgments

This project builds upon:

  • Video-R1 for the base training framework
  • Qwen2.5-VL for the base vision-language model
  • TRL for the GRPO implementation

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Contact

For questions and feedback, please open an issue on GitHub.
