🌌 MARS2 @ ICCV 2025

Multimodal Reasoning and Slow Thinking in the Large Model Era: Towards System 2 and Beyond

📅 October 2025 | 📍 Honolulu, Hawaii | 📖 ICCV 2025 Workshop

🔥 Official report: "MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook"


🔍 About MARS2

The era of Large Reasoning Models (LRMs) has begun, bringing new opportunities and challenges to the computer vision and multimodal AI community. While Large Language Models (LLMs) excel at semantic understanding, the next frontier is System-2 style slow thinking—reasoning beyond pattern recognition to multi-step, causal, and neuro-symbolic reasoning.

MARS2 (Multimodal Reasoning and Slow Thinking) is an official workshop of ICCV 2025. Our goal is to unite researchers from computer vision, multimodal learning, and reasoning to explore how AI systems can achieve flexible, robust, and interpretable reasoning.

🌟 Key Features

  • Workshop at ICCV 2025
    Hosted in Honolulu, Hawaii, featuring keynote talks from world-renowned researchers in AI and computer vision.

  • High-Stakes Reasoning Competition
A large-scale challenge with a ¥100,000 (~$14,000) prize pool, testing multimodal large language models (MLLMs) on a diverse set of reasoning-oriented benchmarks.

  • Open-Source Baselines & Experiments
Official repositories with baseline implementations built on state-of-the-art models such as Qwen2.5-VL, InternVL3, Ferret, and Groma to facilitate research and participation.

  • Fostering Community Collaboration
    Bringing together experts to define the next frontier of AI by bridging computer vision, NLP, and System-2 reasoning.


🏆 The MARS2 Challenge

The workshop hosts the MARS2 Multimodal Reasoning Challenge, designed to push the boundaries of current MLLMs on complex reasoning tasks. The challenge comprises two tracks, each with corresponding baseline models and official code repositories.

| Track | Task Description | Core Models | Repositories |
| --- | --- | --- | --- |
| Track 1 | Fine-Grained Referring & Grounding: given an image and a textual query, the model must output the bounding box coordinates of the referred object (a minimal inference sketch follows the table). | Ferret, Qwen2.5-VL, Groma | MARS2_Track1_Ferret, MARS2_Track1_Qwen2.5-VL, MARS2_Track1_Groma |
| Track 2 | VQA with Spatial Awareness: evaluates a model's ability to reason about spatial relationships, relative positions, commonsense, and counterfactual scenarios. | Qwen2.5-VL, InternVL3, Mllms_know | MARS2_Track2_QwenVL, MARS2_Track2_InternVL3, MARS2_Track2_Mllms_know |
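
As a concrete illustration of the Track 1 task, the sketch below queries Qwen2.5-VL for a single referring-expression grounding via Hugging Face Transformers (the `qwen_vl_utils` helper package is published alongside the model). The checkpoint ID, image path, prompt wording, and output handling are illustrative assumptions; the official track repositories define the exact input and submission formats.

```python
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info  # helper shipped alongside Qwen2.5-VL

MODEL_ID = "Qwen/Qwen2.5-VL-7B-Instruct"  # illustrative checkpoint choice

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# One referring-expression query; the image path and prompt are placeholders.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "example.jpg"},
        {"type": "text",
         "text": "Locate the person holding the red umbrella and "
                 "output its bounding box coordinates."},
    ],
}]

prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[prompt], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, not the echoed prompt.
trimmed = [out[inp.shape[0]:] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```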

The challenge tests performance on benchmarks including LENS and AdsQA, with open-ended reasoning tasks designed for System-2 evaluation.
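
Grounding outputs like those in Track 1 are commonly scored by intersection-over-union (IoU) between predicted and ground-truth boxes; the official metric and threshold are set by the organizers, so the snippet below is only a reference implementation of the standard definition.

```python
def box_iou(a: tuple, b: tuple) -> float:
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) pixel coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

# Example: two 50x50 boxes overlapping in a 30x30 region -> IoU = 900 / 4100.
print(box_iou((10, 10, 60, 60), (30, 30, 80, 80)))  # ~0.2195
```

A prediction is conventionally counted correct at IoU ≥ 0.5, but check the challenge rules for the threshold actually used.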


💻 Official Repositories

This organization serves as the official hub for all workshop activities, competition tracks, and related experiments.

| Repository | Description |
| --- | --- |
| 🌐 Homepage | The official website for the MARS2 @ ICCV 2025 workshop, including the schedule, call for papers, organizers, and keynote speakers. |
| MARS2_Track1_Ferret | Baseline implementation for Track 1, using the Ferret model for referring and grounding tasks. |
| MARS2_Track1_Qwen2.5-VL | An alternative baseline for Track 1, using the Qwen2.5-VL model for referring and grounding tasks. |
| MARS2_Track1_Groma | A batch-inference pipeline for Track 1, using Groma, a grounded MLLM with strong region understanding and visual grounding capabilities (a schematic batch loop is sketched below the table). |
| MARS2_Track2_Qwen2.5-VL | Baseline implementation for Track 2, using the Qwen2.5-VL model for visual question answering with spatial awareness. |
| MARS2_Track2_InternVL3 | An alternative baseline for Track 2, using the InternVL3 model for spatial reasoning tasks. |
| MARS2_Track2_Mllms_know | An experimental implementation of the ICLR 2025 paper on training-free perception of small visual details, built on Qwen2.5-VL. |

⚠️ These model-specific repositories provide experimental support for benchmarking and the challenge tasks. They are intended as baselines and resources for participants.
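
For orientation, here is a schematic of the kind of batch-inference loop such pipelines implement. Everything in it is hypothetical, including the file names, the JSONL field names, and the `run_model` stand-in; the real entry points and I/O formats are defined in the repositories themselves.

```python
import json

def run_model(image_path: str, query: str) -> list[float]:
    """Hypothetical stand-in for a grounding-model call.

    A real pipeline would load one of the baselines above and return the
    predicted box; here we return a dummy box purely for illustration.
    """
    return [0.0, 0.0, 1.0, 1.0]

# Hypothetical input format: one JSON object per line with a sample id,
# an image path, and a referring expression.
with open("queries.jsonl") as fin, open("predictions.jsonl", "w") as fout:
    for line in fin:
        sample = json.loads(line)
        box = run_model(sample["image"], sample["query"])
        fout.write(json.dumps({"id": sample["id"], "bbox": box}) + "\n")
```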

🙏 Acknowledgement

We gratefully acknowledge the contributions of the following open-source projects, which form the foundation for our experimental extensions and benchmarking.

  • Qwen: A powerful series of large language and vision-language models by Alibaba Cloud.
  • InternVL: A foundational open-source vision-language model designed for advanced multimodal understanding.
  • Ferret: An MLLM capable of referring and grounding anything anywhere at any granularity.
  • Groma: An MLLM with exceptional region understanding and visual grounding capabilities.

👍 Citation

If you use MARS2 materials, benchmarks, or code in your research, please cite our workshop:

```
@inproceedings{xu2025mars2,
  author    = {Xu, Peng and Xiong, Shengwu and Zhang, Jiajun and Chen, Yaxiong and Zhou, Bowen and Loy, Chen Change and Clifton, David and Lee, Kyoung Mu and Van Gool, Luc and others},
  title     = {MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook},
  booktitle = {ICCV Workshop},
  year      = {2025}
}
```

```
@article{yao2025lens,
  author  = {Yao, Ruilin and Zhang, Bo and Huang, Jirui and Long, Xinwei and Zhang, Yifang and Zou, Tianyu and Wu, Yufei and Su, Shichao and Xu, Yifan and Zeng, Wenxi and Yang, Zhaoyu and Li, Guoyou and Zhang, Shilan and Li, Zichan and Chen, Yaxiong and Xiong, Shengwu and Xu, Peng and Zhang, Jiajun and Zhou, Bowen and Clifton, David and Van Gool, Luc},
  title   = {LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models},
  journal = {arXiv preprint arXiv:2505.15616},
  year    = {2025}
}
```

```
@inproceedings{long2025adsqa,
  author    = {Long, Xinwei and Tian, Kai and Xu, Peng and Jia, Guoli and Li, Jingxuan and Yang, Sa and Shao, Yihua and Zhang, Kaiyan and Jiang, Che and Xu, Hao and Liu, Yang and Ma, Jiaheng and Zhou, Bowen},
  title     = {AdsQA: Towards Advertisement Video Understanding},
  booktitle = {ICCV},
  year      = {2025}
}
```
