Skip to content

[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

Notifications You must be signed in to change notification settings

FoundationVision/GenerateU

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Generative Region-Language Pretraining for Open-Ended Object Detection

Monash University   ByteDance Inc. 
CVPR 2024

⭐ If GenerateU is helpful to your projects, please help star this repo. Thanks! 🤗


Highlight

  • GenerateU is accepted by CVPR2024.
  • We introduce generative open-ended object detection, which is a more general and practical setting where categorical information is not explicitly defined. Such a setting is especially meaningful for scenarios where users lack precise knowledge of object cate- gories during inference.
  • Our GenerateU achieves comparable results to the open-vocabulary object detection method GLIP, even though the category names are not seen by GenerateU during inference.

Results

Zero-shot domain transfer to LVIS

pseudo-label_examples

Visualizations

👨🏻‍🎨 Pseudo-label Examples

pseudo-label_examples

🎨 Zero-shot LVIS

pseudo-label_examples

Overview

overall_structure

Dependencies and Installation

  1. Clone Repo

    git clone https://github.com/clin1223/GenerateU.git
  2. Create Conda Environment and Install Dependencies

    # create new anaconda env
    conda create -n GenerateU python=3.8 -y
    conda activate GenerateU
    
    # install python dependencies
    pip3 install -e . --user
    pip3 install -r requirements.txt 
    
    # compile Deformable DETR
    cd projects/DDETRS/ddetrs/models/deformable_detr/ops
    bash make.sh
    • CUDA >= 11.3
    • PyTorch >= 1.10.0
    • Torchvision >= 0.11.1
    • Other required packages in requirements.txt

Get Started

Prepare pretrained models

Download our pretrained models from here to the weights folder. For training, prepare the backbone weight Swin-Tiny and Swin-Large following instruction in tools/convert-pretrained-swin-model-to-d2.py

The directory structure will be arranged as:

weights
   |- vg_swinT.pth
   |- vg_swinL.pth
   |- vg_grit5m_swinT.pth
   |- vg_grit5m_swinL.pth
   |- swin_tiny_patch4_window7_224.pkl
   |- swin_large_patch4_window12_384_22k.pkl

Dataset preparation

VG Dataset

LVIS Dataset

(Optional) GrIT-20M Dataset

Dataset strcture should look like:

|-- datasets
`-- |-- vg
    |-- |-- images/
    |-- |-- train_from_objects.json
 `-- |-- lvis
    |-- |-- val2017/
    |-- |-- lvis_v1_minival.json
    |-- |-- lvis_v1_clip_a+cname_ViT-H.npy

Training

By default, we train GenerateU using 16 A100 GPUs. You can also train on a single node, but this might prevent you from reproducing the results presented in the paper.

Single-Node Training

When pretraining with VG, single node is enough. On a single node with 8 GPUs, run

python3 launch.py --nn 1 --uni 1 \
--config-file projects/DDETRS/configs/vg_swinT.yaml OUTPUT_DIR outputs/${EXP_NAME}

Multiple-Node Training

# On node 0, run
python3 launch.py --nn 2 --port <PORT> --worker_rank 0 --master_address <MASTER_ADDRESS> \
--uni 1 --config-file /path/to/config/name.yaml  OUTPUT_DIR outputs/${EXP_NAME}
# On node 1, run
python3 launch.py --nn 2 --port <PORT> --worker_rank 1 --master_address <MASTER_ADDRESS> \
--uni 1 --config-file /path/to/config/name.yaml OUTPUT_DIR outputs/${EXP_NAME}

<MASTER_ADDRESS> should be the IP address of node 0. <PORT> should be the same among multiple nodes. If <PORT> is not specifed, programm will generate a random number as <PORT>.

Evaluation

To evaluate a model with a trained/ pretrained model, run

python3 launch.py --nn 1 --eval-only --uni 1 --config-file /path/to/config/name.yaml  \
OUTPUT_DIR outputs/${EXP_NAME}  MODEL.WEIGHTS /path/to/weight.pth

Citation

If you find our repo useful for your research, please consider citing our paper:

@inproceedings{lin2024generateu,
   title={Generative Region-Language Pretraining for Open-Ended Object Detection},
   author={Chuang, Lin and Yi, Jiang and Lizhen, Qu and Zehuan, Yuan and Jianfei, Cai},
   booktitle={Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
   year={2024}
}

Contact

If you have any questions, please feel free to reach me out at chuang.lin@monash.edu.

Acknowledgement

This code is based on UNINEXT. Some code are brought from FlanT5. Thanks for their awesome works.

Special thanks to Bin Yan and Junfeng Wu for their valuable contributions.

Releases

No releases published

Packages

No packages published