Code implementation for the CVPR 2024 paper "Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models" (PnP-OVSS).
❗ Only the code for BLIP with Pascal Context is provided here.
Requirements:
- CUDA version: 11.7
- GPU memory: 49140 MiB
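A quick way to confirm the driver and toolkit (assuming the NVIDIA command-line tools are on your PATH):

```bash
nvidia-smi       # driver version and total GPU memory
nvcc --version   # CUDA toolkit version; should report 11.7
```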
Build the LAVIS environment following the instructions from the LAVIS repository:

```bash
conda create -n lavis python=3.8
conda activate lavis
pip install salesforce-lavis
git clone https://github.com/salesforce/LAVIS.git
cd LAVIS
pip install -e .
```
This installs the latest torch; you may need to pin a torch build that matches your CUDA version.
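For example, for CUDA 11.7 a matching build can be pulled from the PyTorch wheel index (the exact version pin here is only an illustration, not a requirement of this repo):

```bash
pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu117
```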
You might also need to downgrade transformers:

```bash
pip install transformers==4.25
```
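A minimal sanity check that the environment resolves:

```bash
python -c "import torch, transformers, lavis; print(torch.__version__, transformers.__version__, torch.cuda.is_available())"
```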
Download Gradient_Free_Optimizers_master and put it under LAVIS (this is for random search; you can ignore it for now).

Git clone pydensecrf and put it under LAVIS.
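For example, assuming the standard pydensecrf repository (building its extension needs cython and a C++ compiler):

```bash
cd LAVIS
git clone https://github.com/lucasb-eyer/pydensecrf.git
# if the compiled extension is needed rather than the plain source tree:
pip install cython && pip install -e pydensecrf
```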
Datasets:

- Pascal VOC
- Pascal Context: download the dataset following the instructions from mmsegmentation (a conversion sketch follows the directory tree below)
- COCO Object
- COCO Stuff
- ADE20K
- Cityscapes: download the dataset following the instructions from mmsegmentation

Expected layout for Pascal Context:

```
LAVIS
├── mmsegmentation
│   ├── VOCdevkit
│   │   ├── VOC2010
│   │   │   ├── JPEGImages
│   │   │   ├── SegmentationClassContext
```
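A sketch of the mmsegmentation route for Pascal Context. Assumptions: trainval_merged.json (the Pascal Context annotations) sits inside VOC2010, the Detail API is installed, and your mmsegmentation version still ships tools/convert_datasets/pascal_context.py (newer releases have moved this script):

```bash
cd mmsegmentation
# generates SegmentationClassContext from the raw annotations
python tools/convert_datasets/pascal_context.py VOCdevkit VOCdevkit/VOC2010/trainval_merged.json
```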
Download all the files in this repository and put them under LAVIS, then replace the following files in your LAVIS installation with the versions from this repository:

- /home/user/LAVIS/lavis/models/blip_models/blip_image_text_matching.py
- /home/user/LAVIS/lavis/configs/models/blip_itm_large.yaml
- /home/user/LAVIS/lavis/models/med.py
- /home/user/LAVIS/lavis/models/vit.py
- /home/user/LAVIS/lavis/models/base_model.py
- /home/user/LAVIS/lavis/processors/blip_processors.py
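A sketch of the replacement step, assuming this repository was cloned next to LAVIS as PnP-OVSS (a hypothetical path) with the replacement files at its top level:

```bash
cd /home/user/LAVIS
cp ../PnP-OVSS/blip_image_text_matching.py lavis/models/blip_models/
cp ../PnP-OVSS/blip_itm_large.yaml         lavis/configs/models/
cp ../PnP-OVSS/med.py                      lavis/models/
cp ../PnP-OVSS/vit.py                      lavis/models/
cp ../PnP-OVSS/base_model.py               lavis/models/
cp ../PnP-OVSS/blip_processors.py          lavis/processors/
```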
For Pascal Context:

```bash
bash PSC_halving.sh
```

For COCO Object and COCO Stuff:

```bash
bash New_eval_cam_PSC.sh
```
The output will have the following structure:

```
LAVIS
├── New_Cbatch_Eval_test_ddp_0126_768_flickrfinetune_zeroshot_halvingdrop_Cityscapes
│   ├── gradcam
│   │   ├── max_att_block_num8_del_patch_numsort_thresh005
│   │   │   ├── drop_iter0
│   │   │   │   ├── img_att_forclasses   (attention maps)
│   │   │   │   ├── Union_check0928      (visualizations of the attention maps)
│   │   │   │   ├── highest_att_save     (indices of patches to be dropped)
│   │   │   ├── drop_iter1
│   │   │   ├── drop_iter2
│   │   │   ├── drop_iter3
│   │   │   ├── drop_iter4
```
An example run command (here for Cityscapes, matching the output tree above):

```bash
CUDA_VISIBLE_DEVICES=3 python pnp_get_attention_textloc_weaklysupervised_search_Cityscapes.py \
    --save_path New_Cbatch_Eval_test_ddp_0126_448_flickrfinetune_zeroshot_halvingdrop_Cityscapes \
    --master_port 10990 --gen_multiplecap_withpnpvqa label --world_size 1 \
    --del_patch_num sort_thresh005 \
    --img_size 768 \
    --batch_size 2 \
    --max_att_block_num 8 --drop_iter 5 --prune_att_head 9 --sort_threshold 0.05
```
To change the image size, you may also need to modify the image size in /home/user/LAVIS/lavis/configs/models/blip_itm_large.yaml.
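A hedged one-liner for that edit, assuming the resolution is stored under image_size keys in the config (verify against the file shipped in this repository before running):

```bash
sed -i 's/image_size: [0-9]\+/image_size: 768/' /home/user/LAVIS/lavis/configs/models/blip_itm_large.yaml
```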
Remember to match the save_path in {xxx}_halving.sh with the cam_out_dir in New_eval_cam_{xx}.sh