Nithin Sivakumaran | Justin Chih-Yao Chen | David Wan | Yue Zhang | Jaehong Yoon | Elias Stengel-Eskin | Mohit Bansal
Figure: Previous work has explored using (A) multiple agents in debate to refine their reasoning, but this approach is limited by the abilities of the agents themselves. Alternatively, some methods employ a (B) top-down LLM agent that invokes vision tools, yet they plan tool usage based solely on the question and overlook the visual information itself.
In our method (C), we facilitate a discussion among multiple agents with targeted intervention from a pool of vision tools. These tools address disagreements detected in a debate of VLM agents, and their specialized vision outputs and agreement scores feed back into the subsequent discussion.
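As a rough illustration of the disagreement signal described above, the sketch below computes a majority-vote agreement score over agent answers and recruits a tool when the score falls below a threshold. All names here (`agreement_score`, `needs_tool`, the 0.75 threshold) are hypothetical and are not taken from the DART codebase.

```python
from collections import Counter

def agreement_score(answers):
    """Fraction of agents that agree with the majority answer.

    Hypothetical helper; DART's actual scoring may differ.
    """
    counts = Counter(answers)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(answers)

def needs_tool(answers, threshold=0.75):
    """Recruit a vision tool when agent agreement is low (threshold is illustrative)."""
    return agreement_score(answers) < threshold

# Three VLM agents answer a multiple-choice question; two agree.
print(agreement_score(["A", "A", "B"]))  # 2 of 3 agents agree
print(needs_tool(["A", "A", "B"]))       # low agreement, so a tool is recruited
```

The key design point is that tool invocation is driven by observed disagreement among the agents rather than planned up front from the question alone.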
We create an environment with Python 3.9 and install the required packages.

```shell
conda create --name dart python=3.9
conda activate dart
pip install -r requirements.txt
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python==0.2.45 --force-reinstall --no-cache-dir
pip install flash-attn --no-build-isolation
```

NaturalBench and MMMU are automatically downloaded through the HuggingFace `datasets` package in `dataset.py`.
To set up A-OKVQA:
- Download the compressed annotation file: https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz
- Download the MSCOCO 2017 validation set images: https://cocodataset.org/#download
- Download the image ID to file name mapping we provide: https://drive.google.com/file/d/1f2mXf06iMoUVDHIr3BWtxFY_L113WLHj/view?usp=sharing
- Update the corresponding file paths in the `aokvqa()` function in `dataset.py`.
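If you want to construct the image path mapping yourself, COCO 2017 names each image file after its ID zero-padded to 12 digits. A minimal sketch (the `val2017` directory name is an assumption about where you extracted the images):

```python
from pathlib import Path

def coco_image_path(image_id, root="val2017"):
    """Map a COCO image ID to its file path.

    COCO 2017 images are named by their numeric ID zero-padded to 12 digits;
    `root` should point at the directory the validation images were extracted into.
    """
    return Path(root) / f"{int(image_id):012d}.jpg"

print(coco_image_path(397133))  # val2017/000000397133.jpg
```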
To start an evaluation, run `dart.py`:

```shell
python dart.py \
    --cfg configs/default.yaml \
    --exp_name {exp_name} \
    --output_file {output_file}.json
```

Arguments:
- `--cfg` (str, default: `configs/default.yaml`): Path to a configuration file in the `configs/` directory.
- `--output_file` (str, default: `default.json`): Name of the final JSON results file.
- `--exp_name` (str, default: `aokvqa`): Experiment name, used only to organize results. Outputs are saved to `results/<exp_name>/<timestamp>/`.
- `--resume` (default: 0): Index to resume from (0-based).
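The 0-based `--resume` semantics can be pictured as skipping examples already processed in a previous run. The loop below is a hypothetical sketch, not DART's actual evaluation code:

```python
def run_eval(examples, resume=0):
    """Evaluate examples starting at a 0-based resume index (illustrative)."""
    results = []
    for idx, ex in enumerate(examples):
        if idx < resume:  # skip examples finished in an earlier run
            continue
        results.append({"index": idx, "input": ex})
    return results

# Resuming at index 2 skips the first two examples.
out = run_eval(["q0", "q1", "q2", "q3"], resume=2)
print([r["index"] for r in out])  # [2, 3]
```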
```bibtex
@article{sivaku2025dart,
  title={DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning},
  author={Nithin Sivakumaran and Justin Chih-Yao Chen and David Wan and Yue Zhang and Jaehong Yoon and Elias Stengel-Eskin and Mohit Bansal},
  journal={arXiv preprint arXiv:2512.07132},
  year={2025}
}
```