OuyangKun10/Conan

Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence

🔥 Updates

🚀 February 20, 2026 - Conan was accepted to CVPR 2026.

🚀 November 17, 2025 — We release the Conan-91k dataset!

🚀 October 25, 2025 — We release the training framework of Conan!

🚀 October 23, 2025 — We release the paper!

🚀 October 20, 2025 — We are excited to release Conan-7B and its accompanying evaluation toolkit, Conan-Eval!

🚀 September 30, 2025 - Conan-SFT-7B has officially landed on Hugging Face!

Introduction

Conan is a Multimodal Large Language Model (MLLM) whose reasoning is inspired by a detective's investigative process. It excels in:

  1. Identifying multi-scale frames of visual evidence.
  2. Reasoning over cross-frame clues to connect information.
  3. Deciding plausible actions based on its deductions.

🏆 Performance Highlights

Here's a glimpse of Conan's capabilities:

⚙️ Environment Setup

  • Clone the Repository:

    git clone https://github.com/OuyangKun10/Conan.git
    cd Conan

  • Create and Activate Environment:

    conda create --name Conan python=3.10
    conda activate Conan

  • Install Dependencies:

    cd ms-swift
    pip install -e .
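
After activating the environment, it can be worth confirming that the interpreter on the PATH is the Python 3.10 requested above. The following is a minimal sketch, not part of the repository: the helper name `version_matches` is hypothetical, and the check assumes `python` resolves to the conda environment's interpreter.

```shell
#!/usr/bin/env bash
# Sketch only: version_matches is a hypothetical helper, not part of Conan.

# version_matches WANT HAVE
# Returns 0 when HAVE (e.g. "Python 3.10.14") has the major.minor WANT (e.g. "3.10").
version_matches() {
    local want="$1"
    local have="${2#Python }"   # strip the leading "Python " if present
    case "${have}" in
        "${want}".*|"${want}") return 0 ;;
        *) return 1 ;;
    esac
}

# Only run the check if a python executable is available.
if command -v python >/dev/null 2>&1; then
    found="$(python --version 2>&1)"
    if version_matches "3.10" "${found}"; then
        echo "Python version OK: ${found}"
    else
        echo "Warning: expected Python 3.10, found ${found}" >&2
    fi
fi
```

If the warning appears, the most likely cause is that `conda activate Conan` was not run in the current shell.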

🏋️ Training

    cd ms-swift/training_scripts

  1. Textual Reasoning

    bash Conan-SFT-Stage1.sh

  2. Multimodal Alignment Reasoning

    bash Conan-SFT-Stage2.sh

  3. Vision-centric Reasoning

    bash Conan-SFT-Stage3.sh

  4. AIR RLVR

    bash Conan-server.sh
    bash Conan-AIR-RLVR.sh
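
The three SFT stages run one after another, so a small wrapper can chain them and stop at the first failure. The sketch below is an illustration, not a script shipped with the repository: `run_stage`, `run_sft_stages`, and the `DRY_RUN` switch are hypothetical names, and it assumes it is launched from `ms-swift/training_scripts`. The AIR RLVR step is left out on purpose, on the assumption that `Conan-server.sh` may need to keep running in its own terminal while `Conan-AIR-RLVR.sh` trains.

```shell
#!/usr/bin/env bash
# Sketch only: chains the three SFT stage scripts named in this README.
# run_stage, run_sft_stages and DRY_RUN are hypothetical, not part of Conan.
set -eu

# DRY_RUN=1 (default here) only prints what would run; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# run_stage SCRIPT: announce the script, then run it unless in dry-run mode.
run_stage() {
    local script="$1"
    echo "==> ${script}"
    if [ "${DRY_RUN}" = "0" ]; then
        bash "${script}"   # with `set -e`, a failing stage aborts the chain
    fi
}

# Run SFT stages 1 to 3 in order.
run_sft_stages() {
    local stage
    for stage in 1 2 3; do
        run_stage "Conan-SFT-Stage${stage}.sh"
    done
}

run_sft_stages
```

Saved under a name of your choice, for example `run_sft.sh`, it would be started with `DRY_RUN=0 bash run_sft.sh` once the dry-run output looks right.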

📊 Evaluation

The Conan-Eval toolkit supports evaluation across the following benchmark groups:

  1. Multi-step Reasoning Benchmarks:

  2. Long-video Understanding Benchmarks:

  3. Usage: To run the evaluation, simply execute:

    bash run.sh
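
Evaluation runs can be long, so it may help to keep a copy of the output. The following is a minimal sketch, assuming bash and that `run.sh` sits in the current directory; the helper `run_logged` and the log file name pattern are hypothetical and not part of Conan-Eval.

```shell
#!/usr/bin/env bash
# Sketch only: run_logged is a hypothetical helper, not part of Conan-Eval.

# run_logged SCRIPT LOGFILE
# Runs SCRIPT with bash, shows its output live, writes a copy to LOGFILE,
# and returns the script's own exit status rather than tee's.
run_logged() {
    local script="$1" log="$2"
    bash "${script}" 2>&1 | tee "${log}"
    return "${PIPESTATUS[0]}"
}

# Only start the evaluation when run.sh exists in the current directory.
if [ -f run.sh ]; then
    run_logged run.sh "eval_$(date +%Y%m%d_%H%M%S).log"
fi
```

Returning the first element of `PIPESTATUS` matters here: without it, a failed evaluation would be reported as successful because `tee` itself exits cleanly.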

Acknowledgments

We extend our sincere gratitude to the following projects for their invaluable contributions and inspiration:
