Our code is based on https://github.com/facebookresearch/maskrcnn-benchmark and developed with Python 3.6.5 & PyTorch 1.1.0.
Conventional deep learning based methods for object detection require a large amount of bounding box annotations for training, which is expensive to obtain such high quality annotated data. Few-shot object detection, which learns to adapt to novel classes with only a few annotated examples, is very challenging since the fine-grained feature of novel object can be easily overlooked with only a few data available. In this work, aiming to fully exploit features of annotated novel object and capture fine-grained features of query object, we propose Dense Relation Distillation with Context-aware Aggregation (DCNet) to tackle the few-shot detection problem. Built on the meta-learning based framework, Dense Relation Distillation module targets at fully exploiting support features, where support features and query feature are densely matched, covering all spatial locations in a feed-forward fashion. The abundant usage of the guidance information endows model the capability to handle common challenges such as appearance changes and occlusions. Moreover, to better capture scale-aware features, Context-aware Aggregation module adaptively harnesses features from different scales for a more comprehensive feature representation. Extensive experiments illustrate that our proposed approach achieves state-of-the-art results on PASCAL VOC and MS COCO datasets. For more details, please refer to our CVPR paper (arxiv).
Check INSTALL.md for installation instructions. Since maskrcnn-benchmark has been deprecated, please follow these instructions carefully (e.g. version of Python packages).
First, you need to download the VOC & COCO datasets.
We recommend to symlink the path of the datasets to datasets/
as follows
We use minival
and valminusminival
sets from Detectron (filelink).
mkdir -p datasets/coco
ln -s /path_to_coco_dataset/annotations datasets/coco/annotations
ln -s /path_to_coco_dataset/train2014 datasets/coco/train2014
ln -s /path_to_coco_dataset/test2014 datasets/coco/test2014
ln -s /path_to_coco_dataset/val2014 datasets/coco/val2014
ln -s /path_to_VOCdevkit_dir datasets/voc
For multiple runs, you need to specify the seed in the script.
bash tools/fewshot_exp/datasets/init_fs_dataset_standard.sh
This will also generate the datasets on base classes for base training.
Scripts for training and evaluation on PASCAL VOC dataset.
experiments/DRD/
├── prepare_dataset.sh
├── base_train.sh
├── fine_tune.sh
└── get_result.sh
Configurations of base & few-shot experiments are:
experiments/DRD/configs/
├── base
│ └── e2e_voc_split*_base.yaml
└── standard
└── e2e_voc_split*_*shot_finetune.yaml
Modify them if needed. If you have any question about these parameters (e.g. batchsize), please refer to maskrcnn-benchmark for quick solutions.
- Run the following for base training on 3 VOC splits
cd experiments/DRD
bash base_train.sh
This will generate base models (e.g. model_voc_split1_base.pth
) and corresponding pre-trained models (e.g. voc0712_split1base_pretrained.pth
).
- Run the following for few-shot fine-tuning
bash fine_tune.sh
This will perform evaluation on 1/2/3/5/10 shot of 3 splits.
Result folder is fs_exp/voc_standard_results
by default, and you can get a quick summary by:
bash get_result.sh
@inproceedings{hu2021dense,
title={Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection},
author={Hu, Hanzhe and Bai, Shuai and Li, Aoxue and Cui, Jinshi and Wang, Liwei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={10185--10194},
year={2021}
}
- Context-aware Aggregation
- Training scripts on COCO dataset