This repository has been archived by the owner on Jun 22, 2022. It is now read-only.

RetinaNet with sampler

Kamil A. Kaczmarek edited this page Aug 9, 2018 · 4 revisions

Palm tree 🌴



It took us quite a lot of time to develop a reasonable solution for this competition.

We decided to work with RetinaNet and focal loss, described in the paper Focal Loss for Dense Object Detection. If you are new to RetinaNet, I recommend skimming the blog post The intuition behind RetinaNet.


All our experiments are available here: Experiments 🚀


The competition uses the Google AI Open Images V4 object detection dataset. It is large and difficult, and I think it will stay with us for a while, so it is probably a good idea to get familiar with it. Check the dataset_exploration notebook.

Part 1: Analysis of the problem

We quickly decided that this competition consists of two subproblems, each to be approached separately:

  • The first subproblem covers classes related to people and clothing, where the bboxes overlap a lot and there are multiple bboxes per image. There are approximately 80 such classes.
  • The second subproblem covers the remaining classes. We divide these classes into 7 bins, each holding classes with similar frequency in the dataset. We need the bins to assemble a properly balanced epoch, as described below.
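
One way to build such frequency bins is sketched below. The page does not say how the bin boundaries were chosen, so the equal-count split and the helper name `bin_classes_by_frequency` are assumptions made for illustration only.

```python
from collections import Counter


def bin_classes_by_frequency(labels, num_bins=7):
    """Group class names into bins of similar frequency.

    `labels` is a flat iterable of class names, one entry per bounding box.
    Classes are sorted by box count (most frequent first) and cut into
    contiguous groups holding roughly the same number of classes.
    This equal-count split is an assumption, not the authors' rule.
    """
    counts = Counter(labels)
    ordered = [name for name, _ in counts.most_common()]
    num_bins = min(num_bins, len(ordered))
    bins = [[] for _ in range(num_bins)]
    for rank, name in enumerate(ordered):
        # Map the rank of the class to a bin index in [0, num_bins).
        bins[rank * num_bins // len(ordered)].append(name)
    return bins


# Toy usage with four classes and two bins.
toy_labels = ["car"] * 8 + ["dog"] * 4 + ["harp"] * 2 + ["kite"]
toy_bins = bin_classes_by_frequency(toy_labels, num_bins=2)
```

With the toy labels, the two most frequent classes land in the first bin and the two rarest in the second.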

Preprocessing for training

  • When we train on the remaining classes, we make sure that each class has a similar number of occurrences within an epoch. We implemented a sampler to do this, which gives us a more balanced problem. In practice, we oversample rare classes and subsample frequent classes. The 7 bins mentioned above are used here.
  • Next, we calculate the aspect ratio of each image and build each batch only from images with similar aspect ratios. We need this for the next step, the resize: after resizing, all images in a batch are squeezed by a similar amount, so the training signal is better balanced.
  • At this point we are ready to feed a batch to the network. Images in a batch have similar aspect ratios and classes within the epoch are balanced, so the training signal is stronger.
  • The resulting experiment looks like this one.
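
The two preprocessing ideas above (class balancing, then batching by aspect ratio) can be sketched in plain Python. This is not the sampler from the repository: the function names, the `image_ids_by_class` and `sizes` structures, and the single `per_class` target (instead of per-bin handling) are all assumptions made to keep the example short.

```python
import random


def sample_balanced_epoch(image_ids_by_class, per_class, seed=0):
    """Draw `per_class` image ids for every class.

    Frequent classes are subsampled (drawn without replacement), rare
    classes are oversampled (drawn with replacement). Every class is
    assumed to have at least one image.
    """
    rng = random.Random(seed)
    epoch = []
    for class_name in sorted(image_ids_by_class):
        ids = image_ids_by_class[class_name]
        if len(ids) >= per_class:
            epoch.extend(rng.sample(ids, per_class))
        else:
            epoch.extend(rng.choices(ids, k=per_class))
    return epoch


def batch_by_aspect_ratio(image_ids, sizes, batch_size):
    """Sort images by width / height and cut the list into batches.

    `sizes` maps image id -> (width, height). Neighbours in the sorted
    list have similar aspect ratios, so each batch is distorted by a
    similar amount when resized to a common shape.
    """
    ordered = sorted(image_ids, key=lambda i: sizes[i][0] / sizes[i][1])
    return [ordered[s:s + batch_size] for s in range(0, len(ordered), batch_size)]


# Toy usage: one frequent class, one rare class.
ids_by_class = {"cat": [1, 2, 3, 4, 5, 6], "harp": [7]}
epoch = sample_balanced_epoch(ids_by_class, per_class=3)

sizes = {1: (100, 100), 2: (200, 100), 3: (110, 100), 4: (190, 100)}
batches = batch_by_aspect_ratio([1, 2, 3, 4], sizes, batch_size=2)
```

In a real pipeline the batches would usually be shuffled between epochs so that the network does not always see the same aspect ratios in the same order.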

Other remarks

  • It is good to start experimenting with a few classes (like 10) to get a better feel for the problem. We also run training on 10 classes.
  • We noticed that augmentations are necessary for rare classes :)

Open Questions

  • What to do with highly overlapping bboxes (the people and clothing subproblem)?

Part 2: Technical details

Code 💻

What you can see on the master branch is the training procedure for 10 classes; you can extend it to work on the entire dataset. We used this code to iterate quickly over various ideas.


Nothing fancy here. We work with the standard PyTorch abstractions:

  • Dataset
  • Dataloader
    • We preprocess the images with the standard ImageNet mean and standard deviation that PyTorch pretrained models expect:
    MEAN = [0.485, 0.456, 0.406] 
    STD  = [0.229, 0.224, 0.225]
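
As a quick illustration of what these values do, the sketch below normalizes a single 8-bit RGB pixel by hand. In a PyTorch pipeline this is normally done with `torchvision.transforms.ToTensor()` followed by `torchvision.transforms.Normalize(MEAN, STD)`; whether the repository does it exactly that way is an assumption, and `normalize_pixel` is a hypothetical helper.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return [(value / 255.0 - m) / s for value, m, s in zip(rgb, MEAN, STD)]


black = normalize_pixel([0, 0, 0])
white = normalize_pixel([255, 255, 255])
```

Each channel is shifted and scaled separately, so a black pixel ends up at `-MEAN / STD` and a white pixel at `(1 - MEAN) / STD` per channel.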

Model - RetinaNet 👁️

Validation 💎

  • It took a while, but we managed to wrap the competition metric calculation from tensorflow/models/research/object_detection in a nice, easy-to-use function. If you are interested in this part, just go to the src/
  • We use the valid_ids provided by the organizers, with a little twist: we can choose which object classes to train and evaluate on. This is good for debugging and lets you check whether the network is learning anything.
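
The class-selection twist could look roughly like the sketch below. The annotation layout (a dict from image id to a list of box dicts with a `"label"` key) and the helper name `filter_to_classes` are hypothetical; the repository may store annotations differently.

```python
def filter_to_classes(annotations, class_names):
    """Keep only boxes of the selected classes.

    Images left without any box are dropped here. That is a choice of
    this sketch: for a stricter evaluation you may want to keep them,
    since predictions on them count as false positives.
    """
    wanted = set(class_names)
    filtered = {}
    for image_id, boxes in annotations.items():
        kept = [box for box in boxes if box["label"] in wanted]
        if kept:
            filtered[image_id] = kept
    return filtered


# Toy usage: debug on a single class.
toy_annotations = {
    "img_a": [{"label": "Cat", "bbox": (0.1, 0.1, 0.5, 0.5)},
              {"label": "Car", "bbox": (0.2, 0.2, 0.9, 0.9)}],
    "img_b": [{"label": "Car", "bbox": (0.0, 0.0, 1.0, 1.0)}],
}
cats_only = filter_to_classes(toy_annotations, ["Cat"])
```

Applying the same filter to both the training and validation annotations keeps the two sets consistent while you debug on a handful of classes.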