This repository contains the official implementation of Rea²Seg, currently supporting two types of candidate mask generators for inference and evaluation:
- SESAME / LLaVA-v1.5-7B: trained from scratch based on see-say-segment/sesame.
- LENS / Qwen2.5-VL-3B: uses weights trained by hustvl/LENS, with facebook/sam2-hiera-large.
The mask evaluator is finetuned from OpenGVLab/InternVL3-8B.
- 📦 Checkpoints
- 📚 Benchmarks and Datasets
- 🛠️ Environments
- 🚀 Demo
- 📊 Evaluation
- 🙏 Acknowledgements
- 📖 Citation
Download the following checkpoints and place them at the target paths below.
| Component | Source | Target Path |
|---|---|---|
| SESAME generator | snowball521/Sesame_Generator | checkpoint/Sesame_Generator |
| CLIP vision tower (SESAME only) | openai/clip-vit-large-patch14-336 | checkpoint/clip-vit-large-patch14-336 |
| LENS CoT generator | OuyBin/LENS_ReasonSeg_CoT | checkpoint/LENS_ReasonSeg_CoT |
| SAM2 large (LENS only) | facebook/sam2-hiera-large | checkpoint/sam2-hiera-large |
| InternVL scorer | snowball521/Internvl3_Scorer_8B | checkpoint/Internvl3_Scorer_8B |
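The table above can be turned into a small download script. This is a minimal sketch, assuming every source is a Hugging Face Hub model repository and that the `huggingface_hub` package is installed; neither is stated above. `download_plan` and `download_all` are hypothetical helpers, not part of this repository.

```python
from pathlib import Path

# Source repository -> target path, copied from the table above.
CHECKPOINTS = {
    "snowball521/Sesame_Generator": "checkpoint/Sesame_Generator",
    "openai/clip-vit-large-patch14-336": "checkpoint/clip-vit-large-patch14-336",
    "OuyBin/LENS_ReasonSeg_CoT": "checkpoint/LENS_ReasonSeg_CoT",
    "facebook/sam2-hiera-large": "checkpoint/sam2-hiera-large",
    "snowball521/Internvl3_Scorer_8B": "checkpoint/Internvl3_Scorer_8B",
}


def download_plan(root="."):
    """Build one set of snapshot_download keyword arguments per table row."""
    return [
        {"repo_id": repo_id, "local_dir": str(Path(root) / rel_path)}
        for repo_id, rel_path in CHECKPOINTS.items()
    ]


def download_all(root="."):
    """Fetch every checkpoint (assumes Hugging Face Hub hosting)."""
    # Imported lazily so the plan can be inspected without the package.
    from huggingface_hub import snapshot_download

    for kwargs in download_plan(root):
        snapshot_download(**kwargs)
```

Calling `download_all()` from the repository root would place each checkpoint at its target path. Only the rows for the generator you plan to use, plus the scorer, are needed.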
ReasonSeg-SGDR is a challenging benchmark for reasoning segmentation. It comprehensively evaluates a model’s perception, grounding, and reasoning abilities across multiple dimensions, including discriminative recognition, spatial reasoning, geometric reasoning, and multi-step reasoning, with fine-grained mask generation.
Download ReasonSeg-SGDR from snowball521/ReasonSeg-SGDR, then place it at:
dataset/reason_seg/ReasonSeg-SGDR
Then run preprocess_reasonseg_sgdr.py to extract it.
Download ReasonSeg from Google Drive, then place it at:
dataset/reason_seg/ReasonSeg
Rea²Seg-16K is a large-scale CoT-annotated training dataset for reasoning segmentation. It covers diverse reasoning types and mask granularities, spanning object-level to semantic-level.
Download Rea²Seg-16K from snowball521/Rea2Seg-16K, then place it at:
dataset/reason_seg/Rea2Seg-16K
Then run preprocess_rea2seg_16k.py to extract it.
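Before running the demo or evaluation, it can help to confirm that the three dataset directories named above exist. The sketch below only checks that each directory is present; it does not validate the contents, and `missing_datasets` is a hypothetical helper rather than part of this repository.

```python
from pathlib import Path

# Dataset directories named in this section, relative to the repository root.
DATASET_DIRS = [
    "dataset/reason_seg/ReasonSeg-SGDR",
    "dataset/reason_seg/ReasonSeg",
    "dataset/reason_seg/Rea2Seg-16K",
]


def missing_datasets(root="."):
    """Return the expected dataset directories that do not exist under root."""
    return [rel for rel in DATASET_DIRS if not (Path(root) / rel).is_dir()]


if __name__ == "__main__":
    for rel in missing_datasets():
        print(f"missing: {rel}")
```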
Three conda environments are used:
- rea2seg_sesame: SESAME mask generator and demo (Python 3.9).
- rea2seg_lens: LENS mask generator and demo (Python 3.10).
- internvl_scorer: InternVL mask scoring for evaluation (Python 3.10).
The demo scripts run entirely inside their respective generator environment.
The eval scripts automatically switch between the generator environment and internvl_scorer.
Adjust the PyTorch --index-url for your CUDA version if needed.
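The automatic switch between a generator environment and internvl_scorer can be pictured as two commands run one after the other, each inside its own environment. The sketch below uses `conda run -n` for this; that is an assumption about one way to do it, and the eval scripts in this repository may switch environments differently. `conda_run_command` and `run_two_stage` are hypothetical helpers, as are the command lists passed to them.

```python
import subprocess


def conda_run_command(env_name, command):
    """Prefix a command so that it runs inside the named conda environment."""
    return ["conda", "run", "--no-capture-output", "-n", env_name, *command]


def run_two_stage(generator_env, generator_cmd, scorer_cmd, runner=subprocess.run):
    """Run mask generation, then scoring, each in its own environment."""
    stages = (
        (generator_env, generator_cmd),    # e.g. rea2seg_sesame or rea2seg_lens
        ("internvl_scorer", scorer_cmd),   # scoring always uses the scorer env
    )
    for env_name, command in stages:
        # check=True stops the pipeline if a stage exits with an error.
        runner(conda_run_command(env_name, command), check=True)
```

The `runner` parameter exists only so the command construction can be inspected without launching conda.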
conda create -n rea2seg_sesame python=3.9 -y
conda activate rea2seg_sesame
pip install pybind11==2.11.1
pip install -r requirements_rea2seg_sesame.txt

conda create -n rea2seg_lens python=3.10 -y
conda activate rea2seg_lens
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements_rea2seg_lens.txt

conda create -n internvl_scorer python=3.10 -y
conda activate internvl_scorer
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements_internvl_scorer.txt

conda activate rea2seg_sesame
python demo_rea2seg_sesame.py --save_logs

conda activate rea2seg_lens
python demo_rea2seg_lens.py --save_logs

--save_logs provides intermediate outputs for all candidate masks and visualizations of:
- Attention maps from prompt tokens to image tokens inside the mask generator;
- Visual feature similarity maps within SAM's ViT encoder.
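To make the second kind of visualization concrete: a feature similarity map compares the feature vector of one query patch with the feature vector of every other patch. The sketch below uses cosine similarity on a plain nested list; the choice of cosine similarity, the grid layout, and the function names are all assumptions for illustration, not the code that `--save_logs` runs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_map(features, query_rc):
    """Compare every patch with the query patch.

    features: H x W grid (nested lists) of per-patch feature vectors.
    query_rc: (row, col) of the query patch.
    Returns an H x W grid of cosine similarities to the query patch.
    """
    row, col = query_rc
    query = features[row][col]
    return [[cosine(vec, query) for vec in grid_row] for grid_row in features]
```

Rendering the returned grid as a heatmap over the image shows which regions the encoder treats as similar to the query patch.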
Evaluation outputs are written to eval_log/ by default.
The evaluation covers:
- all ReasonSeg-SGDR categories: discriminative, geometric, multi-step, and spatial;
- ReasonSeg test and val splits.
The script generates candidate masks in rea2seg_sesame and then switches to internvl_scorer for scoring:
bash eval_rea2seg_sesame.sh

The script generates candidate masks in rea2seg_lens and then switches to internvl_scorer for scoring:
bash eval_rea2seg_lens.sh

This project is based on the following open-source projects:
Feel free to open issues if you have any questions. If you find our project helpful, please give us a star and cite our work.
@misc{gao2026reasontwicesegmentationcandidate,
title={Reason Twice: Segmentation via Candidate Discovery and Comparative Reasoning},
author={Xinyan Gao and Haoran Hao and Xiangyu Yue},
year={2026},
eprint={2606.09303},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2606.09303},
}