Official repository for the EMNLP 2024 paper "From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis"
git clone https://github.com/steven-ccq/VisualReasoner.git
cd VisualReasoner

# Python 3.8
pip install -r requirements.txt

cd tools
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO/
pip install -e .
mkdir weights
cd weights
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
cd ..
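As a quick sanity check that Grounding DINO and the downloaded weights load correctly, here is a minimal sketch (run from the tools/GroundingDINO/ directory; the config path is the one shipped with the Grounding DINO repository):

```python
# Minimal load test for Grounding DINO; loads on CPU just to verify the
# install and the checkpoint downloaded above.
from groundingdino.util.inference import load_model

model = load_model(
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",  # config shipped with the repo
    "weights/groundingdino_swint_ogc.pth",              # checkpoint downloaded above
    device="cpu",
)
print(type(model).__name__)
```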
Download the adapter and merge it with llava-1.5-7b-hf to obtain the Planner model. Rename the merged model to planner and move it into models/.
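If the downloaded adapter is a PEFT (LoRA) adapter, the merge can be done in a few lines of Python. A minimal sketch, assuming the transformers and peft packages; the adapter path is a placeholder:

```python
# Merge a LoRA adapter into llava-1.5-7b-hf and save the result as the
# Planner model. "path/to/adapter" is a placeholder for the downloaded adapter.
from peft import PeftModel
from transformers import AutoProcessor, LlavaForConditionalGeneration

base = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")
model = PeftModel.from_pretrained(base, "path/to/adapter")
merged = model.merge_and_unload()  # fold the adapter weights into the base model

merged.save_pretrained("models/planner")
# Save the processor alongside the weights so the model directory is self-contained.
AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf").save_pretrained("models/planner")
```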
First, download the corresponding test sets following the instructions in the data/ directory.
For convenience, we provide a script for each test task:
# TextVQA
bash textvqa.sh
# TallyQA
bash tallyqa.sh
# ST-VQA
bash stvqa.sh
# GQA
bash gqa.sh

The parameters used in the scripts are described in the table below:
| Argument | Description |
|---|---|
| `input` | Path to the input file |
| `output` | Path to the output file |
| `vlm_module` | Path to the Answer model |
| `src` | Path to the image folder |
| `model` | Path to the Planner model |
| `grounding_basedir` | Path to the Grounding DINO directory |
Evaluate the resulting output files with the corresponding scripts:

# TextVQA
python eval/eval_textvqa.py --input=textvqa.json
# TallyQA
python eval/eval_tallyqa.py --input=tallyqa.json
# ST-VQA: submit predictions to the official evaluation server
# https://rrc.cvc.uab.es/?ch=11
# GQA
python eval/eval_gqa.py --input=gqa.json

We also provide a 1M dataset synthesized using the least-to-most method, available at 🤗VisualReasoner-1M, along with a 30k variant containing end-to-end reasoning processes, available at 🤗VisualReasoner-30k.
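For reference, a minimal sketch of loading the released data with the Hugging Face datasets library; the hub ids below are placeholders, use the actual ids behind the links above:

```python
# Loading sketch; replace the placeholder ids with the hub ids linked above
# (🤗VisualReasoner-1M / 🤗VisualReasoner-30k).
from datasets import load_dataset

ds_1m = load_dataset("<org>/VisualReasoner-1M")    # placeholder hub id
ds_30k = load_dataset("<org>/VisualReasoner-30k")  # placeholder hub id
print(ds_1m)
```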
