Masked Collaborative Contrast for Weakly Supervised Semantic Segmentation


Abstract

This study introduces an effective approach, Masked Collaborative Contrast (MCC), to emphasize semantic regions in weakly supervised semantic segmentation. MCC incorporates concepts from masked image modeling and contrastive learning to devise Transformer blocks that induce keys to contract toward semantically pertinent regions. Unlike prevalent techniques that directly erase patch regions in the input image when generating masks, we exploit the neighborhood relations of patch tokens by deriving masks from the affinity matrix of keys. Moreover, we generate positive and negative samples for contrastive learning by taking the masked local output and contrasting it with the global output. Extensive experiments on commonly used datasets evidence that the proposed MCC mechanism effectively aligns global and local views within the image, attaining impressive performance.
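As a rough illustration of these two ingredients, the PyTorch sketch below masks patch tokens through the key affinity matrix and contrasts a masked local embedding with the global one. All names, the top-k keep rule, and the InfoNCE formulation are illustrative assumptions, not the repository's exact implementation.

import torch
import torch.nn.functional as F

def key_affinity_mask(keys, keep_ratio=0.75):
    # keys: (B, N, D) key embeddings of N patch tokens.
    # Instead of erasing image patches, keep the tokens that receive the
    # highest average affinity from the others (an assumed selection rule).
    affinity = torch.softmax(keys @ keys.transpose(1, 2) / keys.shape[-1] ** 0.5, dim=-1)
    score = affinity.mean(dim=1)                               # (B, N) mean incoming affinity
    idx = score.topk(int(keys.shape[1] * keep_ratio), dim=1).indices
    return torch.zeros_like(score, dtype=torch.bool).scatter_(1, idx, True)

def global_local_contrast(global_emb, local_emb, temperature=0.1):
    # InfoNCE-style loss: the masked local embedding of image i is the
    # positive for its own global embedding; other images in the batch
    # serve as negatives.
    g = F.normalize(global_emb, dim=-1)                        # (B, D)
    l = F.normalize(local_emb, dim=-1)                         # (B, D)
    logits = g @ l.t() / temperature                           # (B, B) similarity matrix
    return F.cross_entropy(logits, torch.arange(g.shape[0], device=g.device))

# Toy usage with random tokens.
B, N, D = 4, 196, 64
keys, tokens = torch.randn(B, N, D), torch.randn(B, N, D)
mask = key_affinity_mask(keys)
local_emb = (tokens * mask.unsqueeze(-1)).sum(1) / mask.sum(1, keepdim=True)
global_emb = tokens.mean(1)
loss = global_local_contrast(global_emb, local_emb)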


Data Preparation

VOC dataset

1. Download

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_11-May-2012.tar

2. Download the augmented annotations

The augmented annotations can be downloaded from the SBD dataset. After downloading SegmentationClassAug.zip, unzip it and move it to VOCdevkit/VOC2012, for example as sketched below.
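A minimal Python sketch of this step, assuming the archive extracts to a top-level SegmentationClassAug/ folder:

import shutil
import zipfile

# Extract the downloaded archive, then move the folder into VOC2012.
with zipfile.ZipFile("SegmentationClassAug.zip") as zf:
    zf.extractall(".")
shutil.move("SegmentationClassAug", "VOCdevkit/VOC2012/SegmentationClassAug")

The resulting directory layout should be: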

VOCdevkit/
└── VOC2012
    ├── Annotations
    ├── ImageSets
    ├── JPEGImages
    ├── SegmentationClass
    ├── SegmentationClassAug
    └── SegmentationObject

COCO dataset

1. Download

wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip

Unzip them and arrange the train and validation images in a VOC-style directory structure:

MSCOCO/
├── annotations
├── JPEGImages
│    ├── train2014
│    └── val2014
└── SegmentationClass
     ├── train2014
     └── val2014

2. Generating VOC-style segmentation labels for COCO

To generate VOC-style segmentation labels for the COCO dataset, use parse_coco.py.

python ./datasets/parse_coco.py --split train --year 2014 --to-voc12 false --coco-path $coco_path
python ./datasets/parse_coco.py --split val --year 2014 --to-voc12 false --coco-path $coco_path
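parse_coco.py is the authoritative converter; as a rough sketch of what such a conversion involves (assuming pycocotools is installed, with an illustrative category remapping), one could write:

import numpy as np
from PIL import Image
from pycocotools.coco import COCO

coco = COCO("MSCOCO/annotations/instances_train2014.json")
# Map the 80 non-contiguous COCO category ids to contiguous labels 1..80.
cat_map = {cid: i + 1 for i, cid in enumerate(sorted(coco.getCatIds()))}

for img_id in coco.getImgIds():
    info = coco.loadImgs(img_id)[0]
    label = np.zeros((info["height"], info["width"]), dtype=np.uint8)  # 0 = background
    for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
        # Rasterize each annotation and paint its remapped class id.
        label[coco.annToMask(ann).astype(bool)] = cat_map[ann["category_id"]]
    out_name = info["file_name"].rsplit(".", 1)[0] + ".png"
    Image.fromarray(label).save("MSCOCO/SegmentationClass/train2014/" + out_name)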

Create environment

Clone this repo

git clone https://github.com/fwu11/mcc.git
cd mcc

Install the dependencies

conda create -n py38 python=3.8
conda activate py38
conda install pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirement.txt

Build Reg Loss

To use the regularized loss, download and compile the Python extension; see the Regularized Loss project referenced in the Acknowledgement.

Create softlinks to the datasets

ln -s $your_dataset_path/VOCdevkit VOCdevkit
ln -s $your_dataset_path/MSCOCO MSCOCO

Train

## for VOC
CUDA_VISIBLE_DEVICES="0,1,2,3" python -m torch.distributed.launch --nproc_per_node=4 --master_port=29501 scripts/dist_train_voc_seg_neg.py --work_dir work_dir_voc --spg 1
## for COCO
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python -m torch.distributed.launch --nproc_per_node=8 --master_port=29501 scripts/dist_train_coco_seg_neg.py --work_dir work_dir_coco --spg 1

Evaluation

## for VOC
python tools/infer_seg_voc.py --model_path $model_path --backbone vit_base_patch16_224 --infer val
## for COCO
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python -m torch.distributed.launch --nproc_per_node=8 --master_port=29501 tools/infer_seg_voc.py --model_path $model_path --backbone vit_base_patch16_224 --infer val

Results

Here we report the performance on the VOC and COCO datasets. MS+CRF denotes multi-scale testing and CRF post-processing; a sketch of the CRF step follows the table.

| Dataset | Backbone | val  | Log | Weights | val (with MS+CRF) | test (with MS+CRF) |
| ------- | -------- | ---- | --- | ------- | ----------------- | ------------------ |
| VOC     | DeiT-B   | 68.8 | log | weights | 70.3              | 71.2               |
| COCO    | DeiT-B   | 41.1 | log | weights | 42.3              | --                 |
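For reference, a minimal sketch of the CRF step using pydensecrf. The kernel parameters below are common defaults, not necessarily those used in this repo; the multi-scale part simply averages softmax scores over resized inputs and is omitted.

import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image, probs, iters=10):
    # image: (H, W, 3) uint8 RGB; probs: (C, H, W) float32 softmax scores.
    c, h, w = probs.shape
    d = dcrf.DenseCRF2D(w, h, c)
    d.setUnaryEnergy(unary_from_softmax(probs))
    d.addPairwiseGaussian(sxy=3, compat=3)            # smoothness kernel
    d.addPairwiseBilateral(sxy=80, srgb=13,           # appearance kernel
                           rgbim=np.ascontiguousarray(image), compat=10)
    q = np.array(d.inference(iters)).reshape(c, h, w)
    return q.argmax(axis=0)                           # refined (H, W) label map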

Citation

@inproceedings{wu2024masked,
  title={Masked Collaborative Contrast for Weakly Supervised Semantic Segmentation},
  author={Wu, Fangwen and He, Jingxuan and Yin, Yufei and Hao, Yanbin and Huang, Gang and Cheng, Lechao},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={862--871},
  year={2024}
}

Acknowledgement

Our code is developed based on ToCo. We also use the Regularized Loss and DenseCRF. We appreciate their great work.
