
RetinaNet with sampler


Introduction

It took us quite a lot of time to develop a reasonable solution to this competition.

We decided to work with RetinaNet and the focal loss, described in the paper Focal Loss for Dense Object Detection. If you are new to RetinaNet, we recommend skimming the blog post The intuition behind RetinaNet.
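
The classification loss itself is compact enough to sketch. Below is a minimal PyTorch take on the alpha-balanced focal loss from the paper (binary form, as applied to RetinaNet's classification subnet); alpha=0.25 and gamma=2.0 follow the paper's defaults, while the function name and tensor shapes are just our illustration, not the competition code.

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
        """Alpha-balanced focal loss (Lin et al., 2017), binary form.

        logits and targets share the same shape; targets are 0/1.
        FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)
        """
        # Plain BCE gives -log(p_t); keep it unreduced so we can reweight it.
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p = torch.sigmoid(logits)
        p_t = p * targets + (1 - p) * (1 - targets)  # probability of the true class
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        # (1 - p_t)**gamma down-weights easy, well-classified examples,
        # which is what tames the flood of easy negatives in dense detection.
        return (alpha_t * (1 - p_t) ** gamma * bce).mean()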

Experiments

All our experiments are available here: Experiments 🚀

Dataset

We work with the GoogleAI object detection open-images-v4 dataset. It is large and difficult, and it will probably stay with us for a while, so it is a good idea to get familiar with it. Check the dataset_exploration notebook.

Part 1: Analysis of the problem

We quickly decided that this competition consists of two subproblems, each to be approached separately:

  • The first subproblem is the classes related to people and clothing: their bboxes overlap a lot and there are multiple bboxes per image. Here we have approximately 80 classes.
  • The second subproblem is the remaining classes. Here we take all of these classes and divide them into 7 bins, each occupied by classes with similar frequency in the dataset. We need such bins to prepare a proper epoch, as described below (a sketch of the binning follows this list).
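
To make the binning concrete, here is a minimal sketch of how such frequency bins can be built with pandas quantiles. The `label` column and the annotation schema are hypothetical stand-ins; this is an illustration, not the repo's code.

    import pandas as pd

    def assign_frequency_bins(annotations, n_bins=7):
        """Map each class label to one of `n_bins` bins of similar frequency.

        annotations: DataFrame with one row per bbox and a 'label' column
        (hypothetical schema). Returns a Series: class label -> bin 0..n_bins-1.
        """
        counts = annotations["label"].value_counts()
        # qcut puts classes with similar occurrence counts into the same bin;
        # duplicates='drop' guards against ties collapsing the bin edges.
        return pd.qcut(counts, q=n_bins, labels=False, duplicates="drop")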

Preprocessing for training

  • When we run training on the remaining classes, we make sure that each class has a similar number of occurrences within an epoch; we implemented a sampler to do this work. Thanks to this, the problem is more balanced: in practice we oversample rare classes and subsample frequent ones. The 7 bins mentioned above are used here.
  • Next, we calculate the aspect ratio of each image and build batches only from images with similar aspect ratios. We need this for the next step, resizing: after the resize, all images in a batch are squeezed by a similar amount, so the training signal is better balanced. (A combined sketch of the sampler and the aspect-ratio batching follows this list.)
  • At this point we are ready to feed batches to the network. Images share similar aspect ratios and classes within the epoch are balanced, so the training signal is stronger.
  • The resulting experiment looks like this one.
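
To show how the sampler and the aspect-ratio batching fit together, here is a minimal sketch of an epoch builder. It draws a class-balanced set of image indices (oversampling rare bins, subsampling frequent ones) and then chunks them into batches of similar aspect ratio. All names (`image_bins`, `aspect_ratios`, `samples_per_bin`) are hypothetical, as is the assumption that each image has already been assigned the bin of, say, its rarest class; the real sampler lives in the repository code.

    import numpy as np

    def balanced_aspect_ratio_batches(image_bins, aspect_ratios,
                                      samples_per_bin=5000, batch_size=16,
                                      rng=np.random):
        """Build one epoch: class-balanced indices, batched by aspect ratio.

        image_bins:    array mapping image index -> frequency bin (0..6).
        aspect_ratios: array mapping image index -> width / height.
        """
        epoch = []
        for b in np.unique(image_bins):
            members = np.flatnonzero(image_bins == b)
            # Oversample rare bins (with replacement), subsample frequent ones.
            epoch.append(rng.choice(members, size=samples_per_bin,
                                    replace=len(members) < samples_per_bin))
        epoch = np.concatenate(epoch)

        # Sort by aspect ratio so each batch holds similarly shaped images;
        # a shared resize then distorts every image in the batch alike.
        epoch = epoch[np.argsort(aspect_ratios[epoch])]
        batches = [epoch[i:i + batch_size].tolist()
                   for i in range(0, len(epoch), batch_size)]
        rng.shuffle(batches)  # shuffle batch order, keep batches homogeneous
        return batches

The returned list of index lists can then be handed to a PyTorch DataLoader through its batch_sampler argument.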

Other remarks

  • It is good to start experimenting with a few classes (say, 10) to get a better feel for the problem. We also ran training on 10 classes.
  • We noticed that for rare classes, augmentations are necessary :)

Open Questions

  • What to do with highly overlapping bboxes (the people and clothing subproblem)?

Part 2: Technical details

Code 💻

What you can see on the master branch is the training procedure for 10 classes; you can extend it to work on the entire dataset. We worked with this code to iterate quickly over various ideas.

Preprocessing

Nothing fancy here. We are working with the standard PyTorch abstractions:

  • Dataset
  • DataLoader
    • We preprocess the images with the standard PyTorch statistics for pretrained models:
    MEAN = [0.485, 0.456, 0.406]
    STD  = [0.229, 0.224, 0.225]
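
As a minimal sketch, the normalization can be expressed with torchvision transforms; the constants are the ImageNet statistics quoted above, while the surrounding pipeline (resizing, augmentations) is assumed and omitted.

    from torchvision import transforms

    MEAN = [0.485, 0.456, 0.406]
    STD = [0.229, 0.224, 0.225]

    # ToTensor scales pixel values to [0, 1]; Normalize then applies
    # (x - MEAN) / STD per channel, as the pretrained backbone expects.
    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=MEAN, std=STD),
    ])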

Model - RetinaNet 👁️

Validation 💎

  • It took a while, but we managed to wrap the competition metric calculation from tensorflow/models/research/object_detection in a nice, easy-to-use function. If you are interested in this part, just go to src/utils.py.
  • We use the valid_ids provided by the organizers, but we added a little twist: we can choose which object classes to train/evaluate on. This is good for debugging and lets you check whether the network is learning anything (a sketch of the underlying evaluator follows).
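
For reference, the evaluator from tensorflow/models that such a wrapper builds on can be driven roughly as below. This is a sketch from memory of the 2018-era object_detection API (class and field names may differ between releases), not a copy of src/utils.py; the category list is made up.

    import numpy as np
    from object_detection.core import standard_fields
    from object_detection.utils import object_detection_evaluation

    # One entry per class; ids are 1-indexed in this API.
    categories = [{"id": 1, "name": "Person"}, {"id": 2, "name": "Palm tree"}]
    evaluator = object_detection_evaluation.OpenImagesDetectionEvaluator(categories)

    # Boxes are [ymin, xmin, ymax, xmax].
    evaluator.add_single_ground_truth_image_info(
        image_id="img_0",
        groundtruth_dict={
            standard_fields.InputDataFields.groundtruth_boxes:
                np.array([[0.1, 0.1, 0.8, 0.9]], dtype=np.float32),
            standard_fields.InputDataFields.groundtruth_classes:
                np.array([1], dtype=np.int64),
        })
    evaluator.add_single_detected_image_info(
        image_id="img_0",
        detections_dict={
            standard_fields.DetectionResultFields.detection_boxes:
                np.array([[0.1, 0.1, 0.8, 0.9]], dtype=np.float32),
            standard_fields.DetectionResultFields.detection_scores:
                np.array([0.9], dtype=np.float32),
            standard_fields.DetectionResultFields.detection_classes:
                np.array([1], dtype=np.int64),
        })
    print(evaluator.evaluate())  # dict: metric name -> value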