AFI-GAN: Improving Feature Interpolation of Feature Pyramid Networks via Adversarial Training for Object Detection
A novel feature interpolator that can substitute for existing interpolation methods.
This is the official code for our paper, implemented on top of Detectron2.
Recent convolutional object detectors learn strong semantic features by combining features propagated along multiple pathways. To combine features of different resolutions, coarser feature maps are upsampled with a simple interpolation method (e.g., nearest-neighbor or bilinear). However, simple interpolation often yields noisy and blurred features. To resolve this, we propose a novel adversarially trained interpolator that can substitute for traditional interpolation effortlessly. Specifically, we design AFI-GAN, consisting of an AF interpolator and a feature patch discriminator. In addition, we present progressive adversarial learning and AFI-GAN losses to generate multi-scale features for downstream detection tasks. Moreover, once a pre-trained AF interpolator is available, it can be fine-tuned with recent multi-scale detectors without further adversarial learning. We demonstrate the effectiveness and flexibility of our AF interpolator, achieving box and mask APs higher by 2.2% and 1.6% on average than those of other interpolation methods. Moreover, we achieve an impressive detection score of 57.3% mAP on the MS COCO dataset.
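For reference, the top-down merge that the AF interpolator plugs into can be sketched in NumPy. This is an illustrative sketch, not the repo's implementation: `nearest_upsample_2x` stands in for the simple interpolation the paper replaces, and the `upsample` argument marks the slot a learned interpolator (hypothetical interface) would fill.

```python
import numpy as np

def nearest_upsample_2x(feat):
    """2x nearest-neighbor upsampling of a (C, H, W) feature map,
    as used in a standard FPN top-down pathway."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fpn_merge(coarse, lateral, upsample=nearest_upsample_2x):
    """Merge a coarser pyramid level with a lateral feature map.
    Swapping `upsample` for a learned module is the idea behind
    replacing simple interpolation with the AF interpolator."""
    return upsample(coarse) + lateral

# Toy features: a P5-resolution map merged into a P4-resolution lateral map.
p5 = np.random.rand(256, 8, 8)
c4_lateral = np.random.rand(256, 16, 16)
p4 = fpn_merge(p5, c4_lateral)
print(p4.shape)  # (256, 16, 16)
```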
- Adversarially trained feature interpolator: For robust multi-scale object detection, the AF interpolator generates high-quality up-sampled features. To this end, we train the interpolator via adversarial learning.
- Progressive adversarial learning: To avoid overfitting and to improve the interpolation ability of AFI-GAN step by step for a specific detection task, we present progressive adversarial learning.
- Substituting simple interpolation modules: Replacing the simple interpolation module of a multi-scale feature extractor (e.g., FPN, PAFPN, or BiFPN) with the AF interpolator improves accuracy over simple interpolation.
- High flexibility across backbones and detectors: In practice, the AF interpolator can be reused even when it was trained with other backbones and detection heads (e.g., RetinaNet, Faster R-CNN, Mask R-CNN, Cascade R-CNN, FCOS, and CenterMask).
Interpolation | Detection Head | Backbone | box AP | mask AP | Download |
---|---|---|---|---|---|
NN | FCOS | R-50-FPN | 39.7 | - | - |
NN | FCOS | R-50-BiFPN | 40.6 | - | - |
NN | Mask R-CNN | R-50-FPN | 39.0 | 35.5 | - |
NN | Mask R-CNN | R-50-PAFPN | 39.0 | 35.6 | model |
NN | CenterMask | R-50-BiFPN | 40.6 | 35.8 | - |
NN | Cascade R-CNN◊ | Swin-T-BiFPN | 48.3 | - | - |
NN | Cascade R-CNN | S-101-PAFPN | 48.6 | 41.9 | model |
AFI | FCOS | R-50-FPN | 42.6 | - | - |
AFI | FCOS | R-50-BiFPN | 43.9 | - | - |
AFI | Mask R-CNN | R-50-FPN | 41.5 | 37.4 | - |
AFI | Mask R-CNN | R-50-PAFPN | 40.9 | 36.9 | model |
AFI | CenterMask | R-50-BiFPN | 43.8 | 38.2 | - |
AFI | Cascade R-CNN | S-101-PAFPN | 49.4 | 42.6 | model |
AFI | Cascade R-CNN◊ | Swin-T-BiFPN | 51.7 | - | - |
AFI | Cascade R-CNN† | S-101-PAFPN | 51.6 | 44.7 | model |
AFI | Cascade R-CNN◊† | Swin-L-BiFPN | 57.3 | - | model |
- NN and AFI denote nearest-neighbor interpolation and the proposed AF interpolation, respectively.
- S and Swin denote ResNeSt and Swin Transformer backbone networks, respectively.
- ◊ and † denote self-training on the COCO unlabeled set and multi-scale testing, respectively.
AFI-GAN is built on Detectron2. Please install Detectron2 v0.1.1 in advance.
Prepare the COCO dataset as described below:

```
datasets/
  coco/
    {train,val,test}2017/
    annotations/
      instances_{train,val}2017.json
      image_info_test2017.json
```
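The layout above can be sanity-checked before training. The helper below is not part of the repo; it simply verifies that the expected Detectron2 COCO paths exist under a given root:

```python
import os

# Expected COCO layout for Detectron2 (paths relative to the dataset root).
EXPECTED = [
    "coco/train2017",
    "coco/val2017",
    "coco/test2017",
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "coco/annotations/image_info_test2017.json",
]

def missing_coco_paths(root="datasets"):
    """Return the expected COCO paths that are missing under `root`."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    missing = missing_coco_paths()
    if missing:
        print("Missing dataset paths:", *missing, sep="\n  ")
    else:
        print("COCO layout looks complete.")
```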
To run the Swin Transformer backbone, the following additional Python packages are needed:

```
pip install timm==0.5.4
pip install dataclasses==0.8
```
As described in the paper, we improve the interpolation ability of AFI-GAN by progressive training.
To this end, we provide three scripts: stage1_train.py, stage2_train.py, and stage3_train.py, which implement "Step 1. AFI-GAN Training", "Step 2. Multi-Scale AF Extractor Training", and "Step 3. Target Detector Training", respectively.
We also provide three configuration files for our progressive adversarial learning. You can use them as references to train other models.
For example, to train our AFI-GAN, first run Step 1 (AFI-GAN training):
```
python3 stage1_train.py \
    --num-gpus 4 \
    --config-file configs/step1_afigan_training/step1_AFI-GAN_training_mask_rcnn_R_50_FPN_1x.yaml
```
Then run Step 2 (multi-scale AF extractor training):
```
python3 stage2_train.py \
    --num-gpus 4 \
    --config-file configs/step2_af_extractor_training/step2_AF-Extractor_training_mask_rcnn_R_50_FPN_1x.yaml
```
Finally, run Step 3 (target detector training):
```
python3 stage3_train.py \
    --num-gpus 4 \
    --config-file configs/step3_target_detector_training/step3_AFI-GAN_maskrcnn_R_50_FPN_3x.yaml
```
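The three stages above run in sequence. As a convenience, they can be driven from a small script; this is a hypothetical helper (not shipped with the repo) that builds each stage's command line from the script and config names shown above:

```python
# Script/config pairs for the three progressive-training stages,
# matching the commands in this README.
STAGES = [
    ("stage1_train.py",
     "configs/step1_afigan_training/step1_AFI-GAN_training_mask_rcnn_R_50_FPN_1x.yaml"),
    ("stage2_train.py",
     "configs/step2_af_extractor_training/step2_AF-Extractor_training_mask_rcnn_R_50_FPN_1x.yaml"),
    ("stage3_train.py",
     "configs/step3_target_detector_training/step3_AFI-GAN_maskrcnn_R_50_FPN_3x.yaml"),
]

def stage_cmd(script, config, num_gpus=4):
    """Assemble one stage's command; pass the list to subprocess.run to execute."""
    return ["python3", script, "--num-gpus", str(num_gpus), "--config-file", config]

for script, config in STAGES:
    print(" ".join(stage_cmd(script, config)))
```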
Model evaluation can be done as follows:
- To run inference with batch size 1:
```
python3 run_net.py \
    --num-gpus 1 \
    --eval-only \
    --config-file path/to/the/config-file.yaml \
    SOLVER.IMS_PER_BATCH 1 \
    MODEL.WEIGHTS path/to/the/model.pth \
    OUTPUT_DIR path/to/the/directory
```
- To run inference with batch size 8:
```
python3 run_net.py \
    --num-gpus 8 \
    --eval-only \
    --config-file path/to/the/config-file.yaml \
    SOLVER.IMS_PER_BATCH 8 \
    MODEL.WEIGHTS path/to/the/model.pth \
    OUTPUT_DIR path/to/the/directory
```
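The two commands differ only in the GPU count and batch size. A small helper (hypothetical, not part of the repo) makes that parameterization explicit by assembling the command list for either case:

```python
def eval_cmd(config, weights, out_dir, batch=1):
    """Build the evaluation command shown above. Following the README's
    examples, the GPU count is set equal to the batch size."""
    return [
        "python3", "run_net.py",
        "--num-gpus", str(batch),
        "--eval-only",
        "--config-file", config,
        # Detectron2-style config overrides follow the flags.
        "SOLVER.IMS_PER_BATCH", str(batch),
        "MODEL.WEIGHTS", weights,
        "OUTPUT_DIR", out_dir,
    ]

print(" ".join(eval_cmd("path/to/the/config-file.yaml",
                        "path/to/the/model.pth",
                        "path/to/the/directory", batch=8)))
```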
Please cite our paper if you find this repo helpful:
```
@article{lee_2023_pr_afigan,
  title   = {AFI-GAN: Improving feature interpolation of feature pyramid networks via adversarial training for object detection},
  journal = {Pattern Recognition},
  volume  = {138},
  pages   = {109365},
  year    = {2023},
  issn    = {0031-3203},
  doi     = {https://doi.org/10.1016/j.patcog.2023.109365},
  url     = {https://www.sciencedirect.com/science/article/pii/S0031320323000663},
  author  = {Seong-Ho Lee and Seung-Hwan Bae}
}
```
We referred to the following code to conduct these experiments.