[EMNLP 2024] Official repository for paper "From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis"

steven-ccq/VisualReasoner

VisualReasoner

Official repository for the EMNLP 2024 paper "From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis"


⚙️ Setup

git clone https://github.com/steven-ccq/VisualReasoner.git
cd VisualReasoner

Environment

# Python 3.8
pip install -r requirements.txt

Grounding DINO

cd tools
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO/
pip install -e .
mkdir weights
cd weights
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
cd ../../..  # return to the VisualReasoner root directory
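
Before running inference, it can help to confirm that the Grounding DINO checkout and checkpoint are where the commands above put them. The following is a minimal sketch, not part of this repository: the helper names `grounding_paths` and `load_grounding_dino` are hypothetical, and the config file name is taken from the upstream GroundingDINO repository on the assumption that the Swin-T config matches the downloaded checkpoint.

```python
from pathlib import Path

# Relative paths inside a GroundingDINO checkout. The weights path mirrors the
# setup commands; the config path is an assumption based on the upstream repo.
CONFIG_REL = Path("groundingdino/config/GroundingDINO_SwinT_OGC.py")
WEIGHTS_REL = Path("weights/groundingdino_swint_ogc.pth")


def grounding_paths(grounding_basedir):
    """Return (config_path, weights_path) for a GroundingDINO checkout."""
    base = Path(grounding_basedir)
    return base / CONFIG_REL, base / WEIGHTS_REL


def load_grounding_dino(grounding_basedir="tools/GroundingDINO"):
    """Load the detector, failing early if a required file is missing."""
    config_path, weights_path = grounding_paths(grounding_basedir)
    for path in (config_path, weights_path):
        if not path.is_file():
            raise FileNotFoundError(f"Expected file not found: {path}")
    # Imported lazily so the path check works even before `pip install -e .`.
    from groundingdino.util.inference import load_model
    return load_model(str(config_path), str(weights_path))
```

Calling `load_grounding_dino()` from the repository root should either return the loaded model or point at the missing file.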

Planner Model

Download the adapter.

Merge it with llava-1.5-7b-hf to obtain the Planner model.

Rename the merged Planner model directory to planner and move it into models/.
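
The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes the adapter is a PEFT/LoRA checkpoint, that the base model is the Hugging Face Hub checkpoint `llava-hf/llava-1.5-7b-hf`, and that `peft` and a `transformers` version with LLaVA support are installed. The function names `planner_dir` and `merge_adapter` are hypothetical.

```python
from pathlib import Path


def planner_dir(repo_root="."):
    """Location the setup steps expect for the merged Planner model."""
    return Path(repo_root) / "models" / "planner"


def merge_adapter(adapter_dir, base_model="llava-hf/llava-1.5-7b-hf", repo_root="."):
    """Merge an adapter into the base model and save it as models/planner."""
    # Heavy imports are local so this module can be imported without them.
    import torch
    from peft import PeftModel
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    base = LlavaForConditionalGeneration.from_pretrained(
        base_model, torch_dtype=torch.float16
    )
    # Assumption: adapter_dir holds a PEFT/LoRA adapter for this base model.
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

    out_dir = planner_dir(repo_root)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged.save_pretrained(out_dir)
    # Save the processor as well so the folder can be loaded on its own.
    AutoProcessor.from_pretrained(base_model).save_pretrained(out_dir)
    return out_dir
```

Saving directly to `models/planner` covers the rename and move in one step; `merge_adapter("path/to/adapter")` would be run from the repository root, where the adapter path is a placeholder.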

🚀 Inference

First, download the test sets by following the instructions in the data/ directory.

A ready-made script is provided for each test task:

# TextVQA
bash textvqa.sh
# TallyQA
bash tallyqa.sh
# ST-VQA
bash stvqa.sh
# GQA
bash gqa.sh

The parameters used in the scripts are described in the table below:

Argument            Description
input               Path to the input file
output              Path to the output file
vlm_module          Path to the Answer model
src                 Path to the image folder
model               Path to the Planner model
grounding_basedir   Path to the Grounding DINO directory

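To make the table concrete, here is a hedged sketch of how these arguments could be declared with `argparse`. The argument names and descriptions come from the table; whether the repository's scripts use `argparse`, which arguments are required, and the example paths are all assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the arguments listed in the table."""
    parser = argparse.ArgumentParser(description="Illustrative inference arguments")
    parser.add_argument("--input", required=True, help="Path to the input file")
    parser.add_argument("--output", required=True, help="Path to the output file")
    parser.add_argument("--vlm_module", required=True, help="Path to the Answer model")
    parser.add_argument("--src", required=True, help="Path to the image folder")
    parser.add_argument("--model", required=True, help="Path to the Planner model")
    parser.add_argument(
        "--grounding_basedir", required=True, help="Path to the Grounding DINO directory"
    )
    return parser


# Example invocation with placeholder paths (all values are hypothetical).
example_args = build_parser().parse_args([
    "--input", "data/textvqa.json",
    "--output", "textvqa.json",
    "--vlm_module", "models/answer_model",
    "--src", "data/images",
    "--model", "models/planner",
    "--grounding_basedir", "tools/GroundingDINO",
])
print(example_args.model)
```

The `model` and `grounding_basedir` values match the locations used in the setup section; the other paths depend on where the test sets and Answer model are stored.
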
🎯 Evaluation

# TextVQA
python eval/eval_textvqa.py --input=textvqa.json
# TallyQA
python eval/eval_tallyqa.py --input=tallyqa.json
# ST-VQA
# No local script; submit predictions to the official evaluation server:
# https://rrc.cvc.uab.es/?ch=11
# GQA
python eval/eval_gqa.py --input=gqa.json
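
To run the three local evaluations in one go, a small driver like the one below may be convenient. It is a sketch only: the script paths and prediction file names are copied from the commands above, while the helper names `build_eval_commands` and `run_all` are hypothetical. ST-VQA is left out because only a URL, not a local script, is listed for it.

```python
import subprocess
import sys

# Task name -> (evaluation script, predictions file), as in the commands above.
EVAL_JOBS = {
    "TextVQA": ("eval/eval_textvqa.py", "textvqa.json"),
    "TallyQA": ("eval/eval_tallyqa.py", "tallyqa.json"),
    "GQA": ("eval/eval_gqa.py", "gqa.json"),
}


def build_eval_commands(python=sys.executable):
    """Return one argv list per task, without running anything."""
    return {
        task: [python, script, f"--input={predictions}"]
        for task, (script, predictions) in EVAL_JOBS.items()
    }


def run_all():
    """Run each evaluation in turn from the repository root; stop on the first failure."""
    for task, cmd in build_eval_commands().items():
        print(f"== {task} ==")
        subprocess.run(cmd, check=True)
```

Separating command construction from execution makes it easy to inspect the commands before calling `run_all()`.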

🎈 Data

We also release a 1M dataset synthesized with the least-to-most method, available as 🤗VisualReasoner-1M. A variant containing 30k end-to-end reasoning processes is available as 🤗VisualReasoner-30k.
