Detect Every Thing with Few Examples

Update: This paper, previously submitted to ICLR 2024, has been resubmitted to another conference. The new draft improves the presentation of this work and simplifies the implementation. I will update the code later.

This repository presents DE-ViT, an open-set object detector. In contrast to the popular open-vocabulary approach, we follow the few-shot formulation and represent each category with a few support images rather than with language. Our results show the potential of using images as category representations. DE-ViT establishes a new state of the art on the open-vocabulary, few-shot, and one-shot object detection benchmarks on COCO and LVIS.

Installation

git clone https://github.com/mlzxy/devit.git
conda create -n devit python=3.9
conda activate devit
pip install -r devit/requirements.txt
pip install -e ./devit

Next, see Downloads.md for instructions on setting up the datasets and model checkpoints.

Running Scripts

Download the datasets and checkpoints before running any scripts.

Demo

python3 ./demo/demo.py # will generate demo/output/ycb.out.jpg

The notebook demo/build_prototypes.ipynb builds prototypes for YCB objects using ViT-L/14 and our provided example images.

Training

vit=l task=ovd dataset=coco bash scripts/train.sh  # train open-vocabulary COCO with ViT-L

# task=ovd / fsod / osod
# dataset=coco / lvis
# vit=s / b / l

# few-shot env var `shot = 5 / 10 / 30`
vit=l task=fsod shot=10 bash scripts/train.sh 

# one-shot env var `split = 1 / 2 / 3 / 4`
vit=l task=osod split=1 bash scripts/train.sh

# detectron2 options can be provided through args, e.g.,
task=ovd dataset=lvis bash scripts/train.sh MODEL.MASK_ON True # train lvis with mask head

# another env var, `num_gpus = 1 / 2 ...`, controls
# how many GPUs are used
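Because each training run is configured purely through environment variables, a sweep over the few-shot and one-shot settings can be generated with a small loop. The sketch below is a dry run: the helper `build_train_cmd` is hypothetical (it is not part of the repository) and only prints `scripts/train.sh` invocations, using the `shot` and `split` values listed in the comments above. It assumes ViT-L for every run.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints scripts/train.sh invocations instead of running them.
# build_train_cmd is a hypothetical helper, not part of the DE-ViT repository.
set -euo pipefail

# build_train_cmd VIT TASK EXTRA
#   VIT   : s / b / l
#   TASK  : ovd / fsod / osod
#   EXTRA : the task-specific variable, e.g. "dataset=coco", "shot=10", "split=1"
build_train_cmd() {
  local vit="$1" task="$2" extra="$3"
  echo "vit=${vit} task=${task} ${extra} bash scripts/train.sh"
}

# Few-shot runs for each shot count mentioned in the README.
for shot in 5 10 30; do
  build_train_cmd l fsod "shot=${shot}"
done

# One-shot runs for each of the four splits.
for split in 1 2 3 4; do
  build_train_cmd l osod "split=${split}"
done
```

Printing first makes it easy to review the sweep; piping the output into `bash` from the repository root would then run the commands one after another.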

Evaluation

All evaluations can be run without training, as long as the checkpoints are downloaded.

The script-level environment variables are the same as for training.

vit=l task=ovd dataset=coco bash scripts/eval.sh # evaluate COCO OVD with ViT-L/14

vit=l task=ovd dataset=lvis bash scripts/eval.sh DE.TOPK 3 MODEL.MASK_ON True # evaluate LVIS OVD with ViT-L/14
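Evaluation takes the same environment variables as training, plus optional detectron2 `KEY VALUE` overrides appended after the script name. The sketch below prints (but does not run) open-vocabulary evaluation commands for all three ViT sizes on COCO, and the LVIS command with the overrides shown above. The helper `build_eval_cmd` is hypothetical, and the loop assumes checkpoints for the ViT-S, ViT-B, and ViT-L models have all been downloaded.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints scripts/eval.sh invocations instead of running them.
# build_eval_cmd is a hypothetical helper, not part of the DE-ViT repository.
set -euo pipefail

# build_eval_cmd VIT DATASET [DETECTRON2_KEY VALUE ...]
build_eval_cmd() {
  local vit="$1" dataset="$2"
  shift 2
  local cmd="vit=${vit} task=ovd dataset=${dataset} bash scripts/eval.sh"
  # Append any detectron2 overrides verbatim, separated by single spaces.
  if [ "$#" -gt 0 ]; then
    cmd="${cmd} $*"
  fi
  echo "${cmd}"
}

# COCO open-vocabulary evaluation for every backbone size.
for vit in s b l; do
  build_eval_cmd "${vit}" coco
done

# LVIS open-vocabulary evaluation with the overrides used in the README.
build_eval_cmd l lvis DE.TOPK 3 MODEL.MASK_ON True
```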

RPN Training (COCO)

bash scripts/train_rpn.sh ARG
# change ARG to ovd / os1 / os2 / os3 / os4 / fs14
# corresponds to open-vocabulary / one-shot splits 1-4 / few-shot
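The `ARG` values map one-to-one onto the settings listed in the comment above. The sketch below encodes that mapping in a `case` statement so a typo is caught before a long RPN run starts, then prints each command with its setting. The function name `describe_rpn_arg` is hypothetical, and the script only prints the commands rather than executing them.

```shell
#!/usr/bin/env bash
# Sketch: validate and describe scripts/train_rpn.sh arguments (dry run).
# describe_rpn_arg is a hypothetical helper, not part of the DE-ViT repository.
set -euo pipefail

# describe_rpn_arg ARG: prints the setting for ARG; fails on unknown values.
describe_rpn_arg() {
  case "$1" in
    ovd) echo "open-vocabulary" ;;
    os1|os2|os3|os4) echo "one-shot split ${1#os}" ;;  # strip "os" prefix
    fs14) echo "few-shot" ;;
    *) echo "unknown RPN setting: $1" >&2; return 1 ;;
  esac
}

# Print every valid command alongside the setting it trains.
for arg in ovd os1 os2 os3 os4 fs14; do
  printf '%-5s %-18s bash scripts/train_rpn.sh %s\n' \
    "${arg}" "$(describe_rpn_arg "${arg}")" "${arg}"
done
```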

See Tools.md for instructions on building prototypes and preparing weights.

Acknowledgement

This repository is built on top of RegionCLIP and DINOv2. We thank the community for their efforts.

Citation

@misc{zhang2023detect,
      title={Detect Every Thing with Few Examples}, 
      author={Xinyu Zhang and Yuting Wang and Abdeslam Boularias},
      year={2023},
      eprint={2309.12969},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
