🚀 February 20, 2026 - Conan has been accepted to CVPR 2026.
🚀 November 17, 2025 — We release the Conan-91k dataset!
🚀 October 25, 2025 — We release the training framework of Conan!
🚀 October 23, 2025 — We release the paper!
🚀 October 20, 2025 — We are excited to release Conan-7B and its accompanying evaluation toolkit, Conan-Eval!
🚀 September 30, 2025 — Conan-SFT-7B has officially landed on Hugging Face!
Conan is a Multimodal Large Language Model (MLLM) whose reasoning is inspired by a detective's investigative process. It excels in:
- Identifying multi-scale frames of visual evidence.
- Reasoning over cross-frame clues to connect information.
- Deciding plausible actions based on its deductions.
🏆 Performance Highlights
Here's a glimpse of Conan's impressive capabilities:

- Clone the Repository:
git clone https://github.com/OuyangKun10/Conan.git
cd Conan
- Create and Activate Environment:
conda create --name Conan python=3.10
conda activate Conan
- Install Dependencies:
cd ms-swift
pip install -e .
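Before installing or training, it can help to confirm that the `Conan` environment created above is the active one. A minimal sketch, relying on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets; `require_conda_env` is a hypothetical helper and not part of the repository.

```shell
# Hypothetical helper: succeed only when the named conda environment is active.
# Relies on CONDA_DEFAULT_ENV, which `conda activate` sets to the environment name.
require_conda_env() {
  local expected="$1"
  if [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]; then
    echo "conda environment '$expected' is active"
  else
    echo "expected conda environment '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
    return 1
  fi
}
```

Calling `require_conda_env Conan` at the top of a script makes it stop early if the environment was not activated.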
cd ms-swift/training_scripts
- Textual Reasoning
bash Conan-SFT-Stage1.sh
- Multimodal Alignment Reasoning
bash Conan-SFT-Stage2.sh
- Vision-centric Reasoning
bash Conan-SFT-Stage3.sh
- AIR RLVR
bash Conan-server.sh
bash Conan-AIR-RLVR.sh
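The stage numbering suggests the three SFT scripts run in order, each building on the previous one. A minimal sketch of a wrapper that runs a list of scripts sequentially and stops at the first failure, assuming each script exits non-zero when it fails; `run_stages` is a hypothetical helper and not part of the repository. The RLVR step is left out because the README does not say whether `Conan-server.sh` must keep running while `Conan-AIR-RLVR.sh` executes.

```shell
# Hypothetical helper: run the given scripts from one directory, in order,
# stopping at the first missing or failing script.
run_stages() {
  local dir="$1"
  shift
  local script
  for script in "$@"; do
    if [ ! -f "$dir/$script" ]; then
      echo "missing: $dir/$script" >&2
      return 1
    fi
    echo "running: $script"
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$dir" && bash "$script") || {
      echo "failed: $script" >&2
      return 1
    }
  done
}
```

From the repository root, this could be invoked as `run_stages ms-swift/training_scripts Conan-SFT-Stage1.sh Conan-SFT-Stage2.sh Conan-SFT-Stage3.sh`.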
The Conan-Eval toolkit supports evaluation across several benchmarks:
- Multi-step Reasoning Benchmarks:
- Long-video Understanding Benchmarks:
- Usage: To run the evaluation, simply execute:
bash run.sh
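To keep a record of an evaluation run, the output can be saved alongside the exit status. A small sketch, assuming `run.sh` reports its results on stdout or stderr (the README does not say where results are written); `run_logged` and the log file name are hypothetical.

```shell
# Hypothetical helper: run a script, save its combined stdout/stderr to a log
# file, print the log, and return the script's own exit status.
run_logged() {
  local script="$1"
  local log="$2"
  local rc=0
  bash "$script" > "$log" 2>&1 || rc=$?
  cat "$log"
  return "$rc"
}
```

For example, `run_logged run.sh eval.log` runs the evaluation and leaves a copy of its output in `eval.log`. The output appears only after the script finishes, which keeps the helper simple at the cost of live progress.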
We extend our sincere gratitude to the following projects for their invaluable contributions and inspiration:
