Concordia University
Hasib Zunair and A. Ben Hamza
[Paper] [Project] [Demo] [BibTeX]
This is the official code for our WACV 2024 paper:
Learning to Recognize Occluded and Small Objects with Partial Inputs
We propose a learning algorithm that explicitly focuses on context from neighbouring regions around objects and learns a distribution of association across classes. The goal is to handle in-the-wild situations where only part of an object is visible, but where humans would readily use context to infer the class's presence.
This code requires Python 3.8.12 and CUDA 11.2. Create and activate the following conda environment.
conda update conda
conda env create -f environment.yml
conda activate msl
The VOC2007, COCO2014 and Wider-Attribute datasets are expected to have the following structure:
|- datasets/
|-- VOCdevkit/
|---- VOC2007/
|------ JPEGImages/
|------ Annotations/
|------ ImageSets/
......
|-- COCO2014/
|---- annotations/
|---- images/
|------ train2014/
|------ val2014/
......
|-- WIDER/
|---- Annotations/
|------ wider_attribute_test.json
|------ wider_attribute_trainval.json
|---- Image/
|------ train/
|------ val/
|------ test/
...
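Before generating annotations, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of this repository: the helper name `missing_dirs` and the list `EXPECTED_DIRS` are hypothetical, and the list only covers the directories shown in the tree above.

```python
from pathlib import Path

# Sub-directories taken from the layout above, relative to the datasets root.
EXPECTED_DIRS = [
    "VOCdevkit/VOC2007/JPEGImages",
    "VOCdevkit/VOC2007/Annotations",
    "VOCdevkit/VOC2007/ImageSets",
    "COCO2014/annotations",
    "COCO2014/images/train2014",
    "COCO2014/images/val2014",
    "WIDER/Annotations",
    "WIDER/Image/train",
    "WIDER/Image/val",
    "WIDER/Image/test",
]


def missing_dirs(root="datasets"):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


# Hypothetical usage: list whatever is still missing under ./datasets.
print(missing_dirs("datasets"))
```

An empty list means every directory named in the tree was found; the elided entries (`...`) are not checked.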
Then run the following commands to generate the JSON annotation files for these datasets.
python utils/prepare/prepare_voc.py --data_path datasets/VOCdevkit
python utils/prepare/prepare_coco.py --data_path datasets/COCO2014
python utils/prepare/prepare_wider.py --data_path datasets/WIDER
These commands write the annotation JSON files to ./data/voc07, ./data/coco and ./data/wider. Finally, download the masks of random streaks and holes of arbitrary shapes from SCRIBBLES.zip and put them inside the datasets folder.
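To illustrate how a mask of streaks or holes can turn an image into a partial input, here is a minimal NumPy sketch on synthetic arrays. It is an assumption for illustration only: the function `apply_mask` is hypothetical, and the convention that non-zero mask pixels mark the hidden region may differ from the files in SCRIBBLES.zip and from how train.py uses them.

```python
import numpy as np


def apply_mask(image, mask):
    """Zero out image pixels wherever mask is non-zero.

    image: (H, W, 3) uint8 array; mask: (H, W) array.
    Assumption: non-zero mask pixels mark the region to hide.
    """
    keep = (mask == 0)[..., None]  # (H, W, 1), broadcasts over the channels
    return image * keep


# Synthetic example: a white 4x4 image with a 2x2 hole in the top-left corner.
image = np.full((4, 4, 3), 255, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:2, :2] = 1
masked = apply_mask(image, mask)
print(masked[..., 0])
```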
# MSL ResNet with CutMix
CUDA_VISIBLE_DEVICES=0 python train.py --exp_name msl_rescm_voc --batch_size 6 --total_epoch 60 --num_heads 1 --lam 0.1 --dataset voc07 --num_cls 20 --cutmix data/resnet101_cutmix_pretrained.pth
# MSL ViT
CUDA_VISIBLE_DEVICES=0 python train.py --exp_name msl_vitl_voc --model vit_L16_224 --img_size 224 --batch_size 6 --total_epoch 60 --num_heads 1 --lam 0.3 --dataset voc07 --num_cls 20
# MSL ResNet with CutMix
CUDA_VISIBLE_DEVICES=0 python train.py --exp_name msl01_0.3,0.2,0.5_rescm_coco --batch_size 6 --total_epoch 60 --num_heads 6 --lam 0.4 --dataset coco --num_cls 80 --cutmix data/resnet101_cutmix_pretrained.pth
# MSL ViT
CUDA_VISIBLE_DEVICES=0 python train.py --exp_name msl_vitl_coco --model vit_L16_224 --img_size 224 --batch_size 6 --total_epoch 40 --num_heads 8 --lam 1 --dataset coco --num_cls 80
# MSL ViT-L
CUDA_VISIBLE_DEVICES=0 python train.py --exp_name msl_vitl_wider --model vit_L16_224 --img_size 224 --batch_size 6 --total_epoch 40 --num_heads 1 --lam 0.3 --dataset wider --num_cls 14
# MSL ViT-B
CUDA_VISIBLE_DEVICES=0 python train.py --exp_name msl_vitb_wider --model vit_B16_224 --img_size 224 --batch_size 6 --total_epoch 40 --num_heads 1 --lam 0.3 --dataset wider --num_cls 14
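When running several of these configurations, the argument lists can be assembled programmatically. The sketch below is not part of this repository: `build_train_command` is a hypothetical helper that only uses flags appearing in the commands above, and it builds the command without executing it.

```python
import shlex


def build_train_command(exp_name, dataset, num_cls, num_heads, lam,
                        total_epoch, batch_size=6, extra=()):
    """Assemble a train.py argument list from the flags used above."""
    cmd = [
        "python", "train.py",
        "--exp_name", exp_name,
        "--batch_size", str(batch_size),
        "--total_epoch", str(total_epoch),
        "--num_heads", str(num_heads),
        "--lam", str(lam),
        "--dataset", dataset,
        "--num_cls", str(num_cls),
    ]
    cmd.extend(extra)  # e.g. --model / --img_size for the ViT variants
    return cmd


# Rebuild the MSL ViT-B Wider-Attribute configuration shown above.
cmd = build_train_command(
    "msl_vitb_wider", "wider", num_cls=14, num_heads=1, lam=0.3,
    total_epoch=40, extra=["--model", "vit_B16_224", "--img_size", "224"],
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` with `CUDA_VISIBLE_DEVICES` set in the environment to launch the run.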
# MSL ResNet with CutMix
CUDA_VISIBLE_DEVICES=0 python val.py --num_heads 1 --lam 0.1 --dataset voc07 --num_cls 20 --load_from checkpoint/msl_c_voc.pth
# MSL ResNet with CutMix
CUDA_VISIBLE_DEVICES=0 python val.py --num_heads 6 --lam 0.4 --dataset coco --num_cls 80 --load_from checkpoint/msl_c_coco.pth
CUDA_VISIBLE_DEVICES=0 python val.py --model vit_B16_224 --img_size 224 --num_heads 1 --lam 0.3 --dataset wider --num_cls 14 --load_from checkpoint/msl_v_wider.pth
All experiments are conducted on a single NVIDIA 3080Ti GPU. For additional implementation details and results, please refer to the supplementary materials section in the paper.
We provide pretrained models on GitHub Releases for reproducibility.
Dataset | Backbone | mAP (%) | Download |
---|---|---|---|
VOC2007 | MSL-C | 86.4 | download |
COCO2014 | MSL-C | 96.1 | download |
Wider-Attribute | MSL-V | 90.6 | download |
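For readers unfamiliar with the metric, mAP in multi-label classification is commonly the mean over classes of the per-class average precision. The sketch below assumes the un-interpolated definition of average precision; the evaluation code in val.py may use a different variant, and the function names here are hypothetical.

```python
def average_precision(scores, labels):
    """Un-interpolated AP for one class.

    scores: predicted confidences; labels: 1 for positive, 0 for negative.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive
    return precision_sum / hits if hits else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """Mean of per-class AP in percent; rows are images, columns are classes."""
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        scores = [row[c] for row in score_matrix]
        labels = [row[c] for row in label_matrix]
        aps.append(average_precision(scores, labels))
    return 100.0 * sum(aps) / num_classes


# Toy example: three images, two classes.
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
toy_labels = [[1, 0], [0, 1], [1, 0]]
print(f"mAP: {mean_average_precision(toy_scores, toy_labels):.2f}")
```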
We provide prediction demos of our models. The demo images (picked from VOC2007) are already in ./utils/demo_images/, so you can simply run demo.py with our MSL models pretrained on VOC2007:
CUDA_VISIBLE_DEVICES=0 python demo.py --model resnet101 --dataset voc07 --load_from checkpoint/msl_c_voc.pth --img_dir utils/demo_images
which produces output like this:
utils/demo_images/000001.jpg prediction: dog,person,
utils/demo_images/000004.jpg prediction: car,
utils/demo_images/000002.jpg prediction: train,
...
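If the predictions are needed downstream, lines in the format shown above can be parsed into a mapping from image path to labels. This is a minimal sketch that assumes the output keeps exactly this `<path> prediction: a,b,` shape; `parse_prediction_line` is a hypothetical helper, not part of demo.py.

```python
def parse_prediction_line(line):
    """Split '<path> prediction: a,b,' into (path, [labels])."""
    path, _, labels = line.partition(" prediction: ")
    # The trailing comma yields an empty string, which is filtered out.
    return path.strip(), [name for name in labels.strip().split(",") if name]


lines = [
    "utils/demo_images/000001.jpg prediction: dog,person,",
    "utils/demo_images/000004.jpg prediction: car,",
]
predictions = dict(parse_prediction_line(line) for line in lines)
print(predictions)
```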
A web demo is available here.
@inproceedings{zunair2024msl,
title={Learning to Recognize Occluded and Small Objects with Partial Inputs},
author={Zunair, Hasib and Hamza, A Ben},
booktitle={Proc. IEEE Winter Conference on Applications of Computer Vision},
year={2024}
}
My notes for reference
[Oct 24, 2023] Accepted to WACV 2024! Wohooo. :D
[Sept 24, 2023] Semantic segmentation scripts added in this repo, built on https://github.com/hasibzunair/masksup-segmentation. Results were not added to the paper due to time constraints. Keeping it here for future reference!
This repository was built on top of CSRA and our previous work MaskSup, which explores masked supervision in semantic segmentation. Please consider acknowledging these projects.