
RetinaNet with sampler


Introduction

It took us quite a lot of time to develop a reasonable solution to this competition.

We decided to work with RetinaNet and the focal loss, described in the paper Focal Loss for Dense Object Detection. If you are new to RetinaNet, we recommend skimming the blog post The intuition behind RetinaNet.
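
The classification loss itself is compact enough to sketch. Below is a minimal PyTorch take on the alpha-balanced focal loss from the paper (binary form, as applied to RetinaNet's classification subnet); alpha=0.25 and gamma=2.0 follow the paper's defaults, while the function name and tensor shapes are just our illustration, not the competition code.

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
        """Alpha-balanced focal loss (Lin et al., 2017), binary form.

        logits and targets share the same shape; targets are 0/1.
        FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)
        """
        # Plain BCE gives -log(p_t); keep it unreduced so we can reweight it.
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p = torch.sigmoid(logits)
        p_t = p * targets + (1 - p) * (1 - targets)  # probability of the true class
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        # (1 - p_t)**gamma down-weights easy, well-classified examples,
        # which is what tames the flood of easy negatives in dense detection.
        return (alpha_t * (1 - p_t) ** gamma * bce).mean()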

Experiments

All our experiments are available here: Experiments 🚀

Dataset

We work with the GoogleAI object detection open-images-v4 dataset. It is large and difficult, and it will probably stay with us for a while, so it is a good idea to get familiar with it. Check the dataset_exploration notebook.

Part 1: Analysis of the problem

We quickly decided that this competition consists of two subproblems, each to be approached separately:

  • The first subproblem is the classes related to people and clothing: their bboxes overlap a lot and there are multiple bboxes per image. Here we have approximately 80 classes.
  • The second subproblem is the remaining classes. Here we take all of these classes and divide them into 7 bins, each occupied by classes with similar frequency in the dataset. We need such bins to prepare a proper epoch, as described below (a sketch of the binning follows this list).
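
To make the binning concrete, here is a minimal sketch of how such frequency bins can be built with pandas quantiles. The `label` column and the annotation schema are hypothetical stand-ins; this is an illustration, not the repo's code.

    import pandas as pd

    def assign_frequency_bins(annotations, n_bins=7):
        """Map each class label to one of `n_bins` bins of similar frequency.

        annotations: DataFrame with one row per bbox and a 'label' column
        (hypothetical schema). Returns a Series: class label -> bin 0..n_bins-1.
        """
        counts = annotations["label"].value_counts()
        # qcut puts classes with similar occurrence counts into the same bin;
        # duplicates='drop' guards against ties collapsing the bin edges.
        return pd.qcut(counts, q=n_bins, labels=False, duplicates="drop")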

Preprocessing for training

  • When we run training on the remaining classes, we make sure that each class has a similar number of occurrences within an epoch; we implemented a sampler to do this work. Thanks to this, the problem is more balanced: in practice we oversample rare classes and subsample frequent ones. The 7 bins mentioned above are used here.
  • Next, we calculate the aspect ratio of each image and build batches only from images with similar aspect ratios. We need this for the next step, resizing: after the resize, all images in a batch are squeezed by a similar amount, so the training signal is better balanced. (A combined sketch of the sampler and the aspect-ratio batching follows this list.)
  • At this point we are ready to feed batches to the network. Images share similar aspect ratios and classes within the epoch are balanced, so the training signal is stronger.
  • The resulting experiment looks like this one.
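
To show how the sampler and the aspect-ratio batching fit together, here is a minimal sketch of an epoch builder. It draws a class-balanced set of image indices (oversampling rare bins, subsampling frequent ones) and then chunks them into batches of similar aspect ratio. All names (`image_bins`, `aspect_ratios`, `samples_per_bin`) are hypothetical, as is the assumption that each image has already been assigned the bin of, say, its rarest class; the real sampler lives in the repository code.

    import numpy as np

    def balanced_aspect_ratio_batches(image_bins, aspect_ratios,
                                      samples_per_bin=5000, batch_size=16,
                                      rng=np.random):
        """Build one epoch: class-balanced indices, batched by aspect ratio.

        image_bins:    array mapping image index -> frequency bin (0..6).
        aspect_ratios: array mapping image index -> width / height.
        """
        epoch = []
        for b in np.unique(image_bins):
            members = np.flatnonzero(image_bins == b)
            # Oversample rare bins (with replacement), subsample frequent ones.
            epoch.append(rng.choice(members, size=samples_per_bin,
                                    replace=len(members) < samples_per_bin))
        epoch = np.concatenate(epoch)

        # Sort by aspect ratio so each batch holds similarly shaped images;
        # a shared resize then distorts every image in the batch alike.
        epoch = epoch[np.argsort(aspect_ratios[epoch])]
        batches = [epoch[i:i + batch_size].tolist()
                   for i in range(0, len(epoch), batch_size)]
        rng.shuffle(batches)  # shuffle batch order, keep batches homogeneous
        return batches

The returned list of index lists can then be handed to a PyTorch DataLoader through its batch_sampler argument.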

Other remarks

  • It is good to start experimenting with a few classes (say, 10) to get a better feel for the problem. We also ran training on 10 classes.
  • We noticed that for rare classes, augmentations are necessary :)

Open Questions

  • What to do with highly overlapping bboxes (the people and clothing subproblem)?

Part 2: Technical details

Code 💻

What you can see on the master branch is the training procedure for 10 classes; you can extend it to work on the entire dataset. We worked with this code to iterate quickly over various ideas.

Preprocessing

Nothing fancy here. We are working with the standard PyTorch abstractions:

  • Dataset
  • DataLoader
    • We preprocess the images with the standard PyTorch statistics for pretrained models:
    MEAN = [0.485, 0.456, 0.406]
    STD  = [0.229, 0.224, 0.225]
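
As a minimal sketch, the normalization can be expressed with torchvision transforms; the constants are the ImageNet statistics quoted above, while the surrounding pipeline (resizing, augmentations) is assumed and omitted.

    from torchvision import transforms

    MEAN = [0.485, 0.456, 0.406]
    STD = [0.229, 0.224, 0.225]

    # ToTensor scales pixel values to [0, 1]; Normalize then applies
    # (x - MEAN) / STD per channel, as the pretrained backbone expects.
    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=MEAN, std=STD),
    ])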

Model - RetinaNet 👁️

Validation 💎

  • It took a while, but we managed to wrap the competition metric calculation from tensorflow/models/research/object_detection in a nice, easy-to-use function. If you are interested in this part, just go to src/utils.py.
  • We use the valid_ids provided by the organizers, but we added a little twist: we can choose which object classes to train/evaluate on. This is good for debugging and lets you check whether the network is learning anything (a sketch of the underlying evaluator follows).
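
For reference, the evaluator from tensorflow/models that such a wrapper builds on can be driven roughly as below. This is a sketch from memory of the 2018-era object_detection API (class and field names may differ between releases), not a copy of src/utils.py; the category list is made up.

    import numpy as np
    from object_detection.core import standard_fields
    from object_detection.utils import object_detection_evaluation

    # One entry per class; ids are 1-indexed in this API.
    categories = [{"id": 1, "name": "Person"}, {"id": 2, "name": "Palm tree"}]
    evaluator = object_detection_evaluation.OpenImagesDetectionEvaluator(categories)

    # Boxes are [ymin, xmin, ymax, xmax].
    evaluator.add_single_ground_truth_image_info(
        image_id="img_0",
        groundtruth_dict={
            standard_fields.InputDataFields.groundtruth_boxes:
                np.array([[0.1, 0.1, 0.8, 0.9]], dtype=np.float32),
            standard_fields.InputDataFields.groundtruth_classes:
                np.array([1], dtype=np.int64),
        })
    evaluator.add_single_detected_image_info(
        image_id="img_0",
        detections_dict={
            standard_fields.DetectionResultFields.detection_boxes:
                np.array([[0.1, 0.1, 0.8, 0.9]], dtype=np.float32),
            standard_fields.DetectionResultFields.detection_scores:
                np.array([0.9], dtype=np.float32),
            standard_fields.DetectionResultFields.detection_classes:
                np.array([1], dtype=np.int64),
        })
    print(evaluator.evaluate())  # dict: metric name -> value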