# Introduction

Motivation

## Evaluation
This competition is evaluated on the F2 Score at different intersection over union (IoU) thresholds. The IoU of a proposed set of object pixels and a set of true object pixels is calculated as:
$$IoU(A,B)=\frac{|A \cap B|}{|A \cup B|}.$$
The metric sweeps over a range of IoU thresholds, at each point calculating an F2 Score. The threshold values range from 0.5 to 0.95 with a step size of 0.05: (0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95). In other words, at a threshold of 0.5, a predicted object is considered a "hit" if its intersection over union with a ground truth object is greater than 0.5.
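
As a rough illustration of the definition above, the IoU of two pixel sets and the threshold sweep could be sketched as follows. The function names and the representation of a mask as a set of `(row, col)` tuples are assumptions made for this example, not part of the competition code.

```python
def iou(pred_pixels, true_pixels):
    """Intersection over union of two sets of (row, col) pixel coordinates."""
    pred, true = set(pred_pixels), set(true_pixels)
    union = pred | true
    if not union:
        return 0.0  # assumption: two empty masks are scored as IoU 0
    return len(pred & true) / len(union)


# The ten thresholds 0.5, 0.55, ..., 0.95 listed in the text.
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def is_hit(pred_pixels, true_pixels, threshold):
    """A prediction counts as a hit if its IoU is greater than the threshold."""
    return iou(pred_pixels, true_pixels) > threshold
```

For example, two 2x2 squares that share one column of two pixels have an intersection of 2 pixels and a union of 6 pixels, so their IoU is 1/3 and the prediction is a miss at every threshold.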

At each threshold value t, the F2 Score value is calculated based on the number of true positives (TP), false negatives (FN), and false positives (FP) resulting from comparing the predicted object to all ground truth objects. The following equation is equivalent to F2 Score when β is set to 2:
$$F_\beta(t)= \frac{(1+\beta^2) \cdot TP(t)}{(1+\beta^2) \cdot TP(t)+\beta^2 \cdot FN(t)+FP(t)}.$$
A true positive is counted when a single predicted object matches a ground truth object with an IoU above the threshold. A false positive indicates a predicted object had no associated ground truth object. A false negative indicates a ground truth object had no associated predicted object. The average F2 Score of a single image is then calculated as the mean of the above F2 Score values at each IoU threshold:
$$\frac{1}{|thresholds|} \sum_t F_2(t).$$
Lastly, the score returned by the competition metric is the mean taken over the individual average F2 Scores of each image in the test dataset.
https://www.kaggle.com/c/airbus-ship-detection#evaluation
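
The per-image score described above could be sketched as below. This is a simplified illustration, not the official scoring code: it assumes the IoU values between every predicted and every ground truth object are already available as a matrix, and that masks do not overlap, so that above a threshold of 0.5 each prediction can match at most one ground truth object. The names `f_beta` and `image_f2` are hypothetical.

```python
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from true positive, false positive and false negative counts."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Assumption: an image with no objects and no predictions scores 0 here.
    return (1 + b2) * tp / denom if denom else 0.0


def image_f2(iou_matrix, n_true, thresholds=THRESHOLDS):
    """Mean F2 over all thresholds; iou_matrix[i][j] = IoU(pred i, truth j)."""
    n_pred = len(iou_matrix)
    scores = []
    for t in thresholds:
        hits = [(i, j) for i in range(n_pred) for j in range(n_true)
                if iou_matrix[i][j] > t]
        tp = len({j for _, j in hits})           # matched ground truth objects
        fp = n_pred - len({i for i, _ in hits})  # predictions without a match
        fn = n_true - tp                         # ground truth without a match
        scores.append(f_beta(tp, fp, fn))
    return sum(scores) / len(scores)
```

With two predictions and two ground truth ships, where only one pair overlaps with an IoU of 0.72, the F2 Score is 0.5 for the five thresholds up to 0.7 and 0 for the remaining five, giving an image score of 0.25.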

## Prizes

- 1st place - \$25,000
- 2nd place - \$15,000
- 3rd place - \$5,000
- Algorithm Speed Prize (Post competition prize) - \$15,000

The goal of the Algorithm Speed Prize is to encourage participants to develop the fastest running algorithm that will still produce accurate results. The speed prize will be open to the top performing teams on the leaderboard.

## Timeline

* July 30, 2018 - Start Date.
* September 27, 2018 - Entry deadline. 
* September 27, 2018 - Team Merger deadline.
* October 4, 2018 - Final submission deadline.
* October 18, 2018 - Final submission deadline for the Algorithm Speed Prize.
* November 22, 2018 - Announcement of the winner of the Algorithm Speed Prize.

The deadlines were extended because of a data leak, which is described in detail in the next section.

# Data understanding

The original training set contained 104700 satellite images with 131030 ships in them.
The test set contained 88486 images. 15 images were corrupted (1 from the training set and the rest from the test set). All images have a resolution of 768x768 px and were cut from larger images. No location or time data were provided in the challenge. Kaggle had assured participants that the images do not overlap.

Besides ships and water, the images also contain clouds, haze, wakes behind ships (which do not count as part of the ship), coastal areas, docks, marinas, reflections, waves and other floating objects such as buoys, barges, wind turbines, etc. There are also some image-cutting artifacts.

Most of the images are empty; only ~39% of them contain ships. The histogram shows that images with only one ship dominate.

A few weeks into the competition, a severe data leak was found: the test images turned out to be just shifted training images, and there were also overlaps within the training set. The challenge hosts then provided masks for the test set as well, so that it could also be used for training. A new test set was to be provided in the first week of October. Until then, we focused on training and validating on the old data.

https://www.kaggle.com/c/airbus-ship-detection/discussion/64388

# Methodology

Since the idea of the challenge was not only to achieve good accuracy in ship detection and localization but also to develop a fast, high-performance algorithm, we proposed using an additional technique for this purpose. We wanted to compare two models that are widely used for image segmentation and object localization: Mask R-CNN and U-Net.
U-Net was developed for biomedical data but has recently shown good results for satellite images as well. Mask R-CNN is a successor of Fast R-CNN and Faster R-CNN and is now officially part of Facebook's Detectron project.
Both of these models have a deep architecture and many specific heuristics to improve their precision. Both are characterized by a large number of parameters, which makes them relatively slow in training and prediction.
Thus, we tried to combine them with a simpler and faster (pre)classification method. In the first stage, this method scans through the images and rejects the empty ones, so that in the second stage our large models process only the selected images, which contain ships with high probability. Consequently, we aimed to compare the accuracy and performance of Mask R-CNN and U-Net on the whole test data and on only the images selected during preclassification.
Due to some technical difficulties we did not combine the preclassification and localization blocks into one model, so the estimates of the performance improvement are based on the ratio of the number of images selected during preclassification to the total number of test images. This should be considered a lower bound on the performance improvement, because it does not include possible parallel processing of the data through these two stages.

# Preclassification
We aimed to compare several different methods for preclassification but, because of some technical difficulties and a shortage of time, we focused only on simple convolutional neural network models. We tried different architectures and settled on two models, each with two convolutional layers with 5x5 kernels and one linear layer:
1) 8 & 16 features (CNN_8)
2) 32 & 64 features (CNN_32)
Each convolutional layer was followed by a ReLU function and max-pooling with a 2x2 kernel.
We used cross-entropy as the loss function and stochastic gradient descent for optimization. After a few tries we chose a learning rate of 0.001, a momentum of 0.9 and a weight decay of 0.0005. We trained the models until the loss stopped decreasing, which was after about 50 epochs.
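
A minimal PyTorch sketch of such a network is given below for orientation. It is a reconstruction from the description above, not the original code: the class name `SmallCNN`, the RGB input, the 192x192 input size, the absence of padding and the two output classes are all assumptions.

```python
import torch
from torch import nn


class SmallCNN(nn.Module):
    """Two 5x5 conv layers (each with ReLU and 2x2 max-pooling) and one linear layer."""

    def __init__(self, features=(8, 16), in_size=192, in_channels=3, n_classes=2):
        super().__init__()
        f1, f2 = features
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, f1, kernel_size=5),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(f1, f2, kernel_size=5),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Assumption: no padding, so each 5x5 conv shrinks the side by 4
        # and each 2x2 pooling halves it (rounding down).
        side = ((in_size - 4) // 2 - 4) // 2
        self.fc = nn.Linear(f2 * side * side, n_classes)

    def forward(self, x):
        x = self.conv(x)
        return self.fc(torch.flatten(x, 1))


cnn_8 = SmallCNN(features=(8, 16))
cnn_32 = SmallCNN(features=(32, 64))

# Loss and optimizer with the hyperparameters quoted in the text.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(
    cnn_8.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005
)
```

Calling `cnn_8(batch)` on a tensor of shape `(N, 3, 192, 192)` returns `N` pairs of class weights, which `criterion` compares with the integer class labels.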

We also compared these models with different image resize factors and different balances of positive and negative training examples. Finally, we compared several of these combinations with different thresholds on the softmax output.
For this work we used PyTorch.

### Data preparation
There was no complete PyTorch data loader for this challenge that included all the required functions, so we developed our own using examples from the practice week and some Kaggle kernels. During preprocessing the images were resized and augmented. The corrupted images and images with a file size smaller than 40 KB (~200 images) were dropped. 10% of the training set was used only for validation during training.
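
The file-size filter and the 10% hold-out could look roughly like the sketch below. The helper names, the fixed random seed and the reading of "40 KB" as 40 * 1024 bytes are assumptions; the actual loader is not shown in this report.

```python
import os
import random

MIN_BYTES = 40 * 1024  # assumption: "40 KB" means 40 * 1024 bytes


def drop_small_files(paths, min_bytes=MIN_BYTES):
    """Keep only the files whose size on disk is at least min_bytes."""
    return [p for p in paths if os.path.getsize(p) >= min_bytes]


def train_val_split(items, val_fraction=0.1, seed=0):
    """Shuffle deterministically and hold out val_fraction of the items."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = round(len(items) * val_fraction)
    return items[n_val:], items[:n_val]
```

Fixing the seed keeps the validation subset identical between runs, so that loss curves of different models remain comparable.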

### Resize factor
To accelerate the preclassification, we used resized versions of the original 768x768 px images:
* factor 2 (384x384 px)
* factor 4 (192x192 px)
* factor 8 (96x96 px)

Since some ships take up only a few percent of the original image area, with resize factor 8 they would occupy only a few pixels and would be lost during classification. So after a short check we rejected this factor for detailed study.
With resize factor 2, both models converged to predicting either always 1 or always 0.
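
To make the effect of the resize factors concrete, the short sketch below computes the image side and the approximate footprint of a ship after downscaling. The 40x10 px ship is a hypothetical example, not a measurement from the data set.

```python
ORIGINAL_SIDE = 768  # px, side of the original images


def resized_side(factor, side=ORIGINAL_SIDE):
    """Side length of the image after downscaling by an integer factor."""
    return side // factor


def resized_footprint(width_px, height_px, factor):
    """Approximate width and height of an object after downscaling."""
    return width_px / factor, height_px / factor


for factor in (2, 4, 8):
    side = resized_side(factor)
    w, h = resized_footprint(40, 10, factor)  # hypothetical 40x10 px ship
    print(f"factor {factor}: image {side}x{side} px, ship ~{w:.1f}x{h:.1f} px")
```

At factor 8 such a ship would be only about one pixel wide, which is in line with the decision to drop this factor.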

### Data balancing
One of the techniques we tried is balancing the ratio of negative examples by using only 30% of the empty images from the training data together with all the images with ships in them. In this case the numbers of positive and negative examples are approximately equal. This heuristic can be considered unclean from a data science perspective, because the structure of the training data is supposed to be similar to that of the test data, but in combination with a specific threshold on the softmax output it gave us a suitable result for resize factor 4.
Unfortunately, for resize factor 2 it did not prevent the models from converging to extreme states.
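
The balancing step could be sketched as follows. The function name, the `(image_id, has_ship)` sample format and the fixed seed are assumptions made for this illustration.

```python
import random


def balance(samples, keep_empty=0.3, seed=0):
    """Keep all images with ships and a random fraction of the empty ones.

    samples: list of (image_id, has_ship) pairs.
    """
    rng = random.Random(seed)
    with_ships = [s for s in samples if s[1]]
    empty = [s for s in samples if not s[1]]
    kept_empty = rng.sample(empty, round(len(empty) * keep_empty))
    balanced = with_ships + kept_empty
    rng.shuffle(balanced)
    return balanced
```

Only the training subset would be balanced this way; the validation and test data keep their original class ratio, which is why the decision threshold has to be tuned afterwards.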

### Training
Here we show some curves of the loss function and accuracy.
We used simple binary accuracy: 1 if the prediction is right and 0 otherwise. For our task it is not the optimal metric, so we mostly focused on the confusion matrices. You can see how their values change during training.
Here we have only two types of errors:
* false positive, when the model predicts ships on empty images,
* false negative, when images with ships are rejected and lost for the next stage.

The first reduces the efficiency of the algorithm and the second its precision. So in any case, our task is to find a trade-off between these two characteristics. We assume that minimizing the false negative error has higher priority here.
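
The two error types and the quantities that matter for the two-stage pipeline can be counted as in the sketch below. The function names are hypothetical; the inputs are boolean "has ship" labels per image.

```python
def confusion_counts(predicted, actual):
    """Return (tp, fp, fn, tn) for boolean 'has ship' labels."""
    tp = fp = fn = tn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1
        elif p and not a:
            fp += 1  # empty image passed on to the second stage
        elif a:
            fn += 1  # image with ships rejected by the preclassifier
        else:
            tn += 1
    return tp, fp, fn, tn


def stage_rates(predicted, actual):
    """Fraction of ship images lost and fraction of all images rejected."""
    tp, fp, fn, tn = confusion_counts(predicted, actual)
    total = tp + fp + fn + tn
    missed = fn / (tp + fn) if tp + fn else 0.0
    rejected = (fn + tn) / total if total else 0.0
    return missed, rejected
```

A good preclassifier keeps the first value small while making the second one as large as possible.
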
Previously we predicted the class of an image by simply taking the class with the maximal weight. For more precise model tuning, we passed the weights through the softmax function, which returns values in [0,1], and compared results for different thresholds. The previous predictions correspond to a threshold of 0.5. We decided that the best trade-off between the two errors is given by the CNN_8 model with resize factor 2 trained on balanced data with a threshold of 0.4.
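
The thresholding described above can be illustrated with a few lines of plain Python. The ordering of the two outputs as `(no_ship, ship)` and the function names are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class weights."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def has_ship(logits, threshold=0.5):
    """Classify as 'ship' if the softmax probability reaches the threshold.

    Assumption: logits are ordered as (no_ship, ship).
    """
    return softmax(logits)[1] >= threshold
```

For the weights `(0.2, 0.0)` the ship probability is about 0.45, so the image is rejected at a threshold of 0.5 but kept at 0.4. Lowering the threshold therefore trades more false positives for fewer false negatives.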

### Predicting
After choosing the best model and resize factor, we ran the preclassification on the test data (~88k images) for two thresholds: 0.4 and 0.5. As described previously, a data leak occurred in the challenge, so the hosts also provided masks for this test set. Using these data we evaluated our predictions once more, this time more accurately. With the 0.4 threshold we lose only ~5% of the images with ships in them while rejecting ~35% of all images; thus, the performance of the next stage can be improved by a factor of at least 1.53 compared to processing the whole dataset.
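
The speed-up estimate can be reproduced with a short calculation. It assumes that the run time of the second stage is proportional to the number of images it processes and leaves the run time of the preclassifier itself aside. If a fraction $r$ of the $N$ test images is rejected, the speed-up $S$ of the second stage is

$$S = \frac{N}{(1-r) \cdot N} = \frac{1}{1-r}, \qquad r \approx 0.35 \;\Rightarrow\; S \approx \frac{1}{0.65} \approx 1.5,$$

which agrees with the factor quoted above, given that 35% is itself a rounded value.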

### Run time
The model was run on an Amazon Web Services (AWS) p2.xlarge instance with the following specifications, using GPU (CUDA) acceleration:
- 4 Core CPU (Intel Xeon E5‑2686 v4 Broadwell)
- 61 GB RAM
- 1 GPU (NVIDIA K80 with 2496 cores and 12 GB memory) 

One run on the test data set (88486 images) takes 9:25 (min:sec) on average, which is much faster than either model needs in the second stage.
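
Expressed as throughput, and assuming the quoted time covers the complete pass over the test set, this corresponds to

$$\frac{88486 \text{ images}}{9 \cdot 60 + 25 \text{ s}} = \frac{88486}{565} \approx 157 \text{ images/s},$$

or roughly 6.4 ms per image.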