# **ESRI DATA SCIENCE CHALLENGE 2019**


In this challenge, we are given 224x224 images containing cars and swimming pools, labelled in PASCAL VOC format. Since we'll be using RetinaNet for this challenge, we'll make use of the keras-retinanet package from GitHub.
* [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002) - Research paper describing RetinaNet and the focal loss it uses
* [keras-retinanet](https://github.com/fizyr/keras-retinanet) - Keras RetinaNet implementation
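
As a quick illustration of the focal loss from the paper linked above, here is a minimal sketch of the binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function name `focal_loss` is made up for this note, and keras-retinanet's own loss works on whole tensors of anchors, so treat this only as a way to see how the loss down-weights easy examples.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for a single binary prediction.

    p: predicted probability of the positive class, y: 1 or 0.
    alpha=0.25 and gamma=2.0 are the values the paper reports as working best.
    """
    p = min(max(p, 1e-7), 1 - 1e-7)            # avoid log(0)
    p_t = p if y == 1 else 1 - p               # probability given to the true class
    alpha_t = alpha if y == 1 else 1 - alpha   # class balancing weight
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.95, 1)   # confident and correct
hard = focal_loss(0.05, 1)   # confident and wrong
print(f"easy positive: {easy:.6f}, hard positive: {hard:.6f}")
```

With `gamma=0` and `alpha=0.5` the expression reduces to half of the ordinary cross-entropy, which is a handy sanity check.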

In [5]:
!pip3 install opencv-python imgaug keras-retinanet



In [6]:
import xml.etree.ElementTree as ET
import os
import numpy as np
import keras
import math
import tensorflow as tf
import cv2
from os import listdir, walk
from os.path import join
from keras_retinanet.bin.train import create_generators,create_models,create_callbacks
from keras_retinanet.models import backbone,load_model,convert_model
from keras_retinanet.utils.config import read_config_file,parse_anchor_parameters
from keras_retinanet.utils.visualization import draw_boxes
from sklearn.model_selection import train_test_split
from imgaug import augmenters as iaa

tf.set_random_seed(31) # SEEDS MAKE RESULTS MORE REPRODUCIBLE
np.random.seed(17)

Using TensorFlow backend.


TODO: Change classes to, at most, 1. Or maybe to 'Trash'

In [7]:
classes = ['1','2']

# Load and Convert Annotations

Here we load annotations given in PASCAL VOC Format and save them in CSV Format as required by keras-retinanet package

TODO: I don't think I have my annotations in VOC format, so I think I can kill this and replace it with another function that writes to `out_file` appropriately.

def convert_annotation(image_id, filename):
    # Parse the PASCAL VOC XML file for this image
    tree = ET.parse('training_data/labels/%s.xml' % image_id)
    root = tree.getroot()
    # root.iter() always returns an iterator (never None), so collect the
    # objects into a list to detect images without any annotation
    objects = list(root.iter('object'))

    # Append rows in keras-retinanet CSV format: path,x1,y1,x2,y2,class_name
    with open(filename, 'a') as out_file:
        if not objects:
            # An image without objects gets one row with empty fields
            out_file.write(f'training_data/images/{image_id}.jpg,,,,,\n')
            return
        for obj in objects:
            cls = obj.find('name').text
            if cls not in classes:
                continue

            xmlbox = obj.find('bndbox')
            x1 = math.ceil(float(xmlbox.find('xmin').text))
            y1 = math.ceil(float(xmlbox.find('ymin').text))
            x2 = math.ceil(float(xmlbox.find('xmax').text))
            y2 = math.ceil(float(xmlbox.find('ymax').text))
            # Skip degenerate boxes with zero width or height
            if x1 == x2 or y1 == y2:
                continue

            out_file.write(f'training_data/images/{image_id}.jpg,{x1},{y1},{x2},{y2},{cls}\n')
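
To make the target format concrete, here is a small self-contained sketch of the same conversion running on an in-memory XML string instead of files. `SAMPLE_XML`, `voc_to_rows` and the image path are invented for illustration; the row layout (`path,x1,y1,x2,y2,class_name`, with empty fields for an image without objects) is the CSV format keras-retinanet reads.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical annotation: one box of class '1' and one of an unknown class '9'
SAMPLE_XML = """<annotation>
  <object><name>1</name>
    <bndbox><xmin>10.2</xmin><ymin>20.0</ymin><xmax>50.7</xmax><ymax>60.1</ymax></bndbox>
  </object>
  <object><name>9</name>
    <bndbox><xmin>1</xmin><ymin>1</ymin><xmax>5</xmax><ymax>5</ymax></bndbox>
  </object>
</annotation>"""

def voc_to_rows(xml_text, image_path, classes):
    """Return keras-retinanet CSV rows for one PASCAL VOC annotation."""
    root = ET.fromstring(xml_text)
    objects = list(root.iter('object'))
    if not objects:
        return [f'{image_path},,,,,']            # image without annotations
    rows = []
    for obj in objects:
        cls = obj.find('name').text
        if cls not in classes:
            continue                             # ignore classes we don't train on
        box = obj.find('bndbox')
        x1, y1, x2, y2 = (math.ceil(float(box.find(tag).text))
                          for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        if x1 == x2 or y1 == y2:
            continue                             # drop zero-area boxes
        rows.append(f'{image_path},{x1},{y1},{x2},{y2},{cls}')
    return rows

for row in voc_to_rows(SAMPLE_XML, 'training_data/images/example.jpg', ['1', '2']):
    print(row)
```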

# Training and Validation split

Normally 10-30% of the images go into the validation set. The cell below holds out 20% of the annotation rows for validation (`test_size=0.2`) and trains on the remaining 80%.

In [24]:
import csv
import pandas as pd

with open('training_data/images/deduped_annotations.csv') as f:
    ls = list(csv.reader(f))[1:]  # drop the first row (assumed to be the header)

def to_annotation(l):
    # Output order follows keras-retinanet's CSV format: path,x1,y1,x2,y2,class_name
    # Note swap of 1st and 2nd Y
    return ("training_data/images/" + l[0][11:], l[1], l[4], l[3], l[2], 1)

# Hold out 20% of the annotation rows for validation
x_train, x_val = train_test_split([to_annotation(l) for l in ls], test_size=0.2)
pd.DataFrame(x_train).to_csv("annotations.csv", header=None, index=None)
pd.DataFrame(x_val).to_csv("val_annotations.csv", header=None, index=None)
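
One thing to watch: `train_test_split` above shuffles individual annotation rows, so if an image carries several boxes (an assumption about this dataset), the same image can land in both the training and validation files. A minimal sketch of splitting by image path instead, using only the standard library; `split_by_image` is a hypothetical helper, not part of keras-retinanet or scikit-learn.

```python
import random

def split_by_image(rows, val_fraction=0.2, seed=17):
    """Split annotation rows so that all boxes of an image stay together.

    rows: tuples whose first element is the image path.
    """
    paths = sorted({row[0] for row in rows})      # unique images, stable order
    random.Random(seed).shuffle(paths)            # reproducible shuffle
    n_val = int(round(len(paths) * val_fraction))
    val_paths = set(paths[:n_val])
    train = [row for row in rows if row[0] not in val_paths]
    val = [row for row in rows if row[0] in val_paths]
    return train, val

# Toy data: 10 images with 2 boxes each
toy_rows = [(f"img_{i}.jpg", 0, 0, 10, 10, 1) for i in range(10) for _ in range(2)]
train_rows, val_rows = split_by_image(toy_rows)
print(len(train_rows), "training rows,", len(val_rows), "validation rows")
```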

In [25]:
with open('classes.csv','w') as f:
    #f.write('1,0\n2,1\n')
    f.write('1,0\n')

# Anchor Parameters

1. Anchor parameters are used to decide how anchor boxes will be generated for the model.
1. As we're dealing mostly with small boxes, which can be highly elongated, we'll change ratios and scales to fit our needs.
1. test_anchors.ipynb is used to visualize anchors on ground truth boxes

TODO: Well... our anchor boxes will be small, but not sure about the highly elongated. I imagine that might result in tweaks to ratios? Like maybe that specifies expected aspect ratios?

In [26]:
with open('config.ini','w') as f:
    f.write('[anchor_parameters]\nsizes   = 32 64 128 256 512\nstrides = 8 16 32 64 128\nratios  = 0.25 0.5 0.75 1 1.5 2 4 6 8 10\nscales  = 0.5 1 2\n')
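
To see what the `ratios` and `scales` in this config mean, here is a sketch of how anchor shapes can be derived for one pyramid level. It follows my reading of keras-retinanet's anchor generation, where each anchor has area `(size * scale)^2` and `ratio` is height divided by width; `anchor_shapes` is a name made up for this note, so check it against the library before relying on it.

```python
import math

def anchor_shapes(base_size, ratios, scales):
    """Return (ratio, scale, width, height) for every ratio/scale pair."""
    shapes = []
    for ratio in ratios:
        for scale in scales:
            area = (base_size * scale) ** 2    # area is fixed by size and scale
            width = math.sqrt(area / ratio)    # ratio = height / width
            height = width * ratio
            shapes.append((ratio, scale, width, height))
    return shapes

ratios = [0.25, 0.5, 0.75, 1, 1.5, 2, 4, 6, 8, 10]
scales = [0.5, 1, 2]
shapes = anchor_shapes(32, ratios, scales)     # size 32 pairs with stride 8

print(len(shapes), "anchors per location")
for ratio, scale, w, h in shapes[:3]:
    print(f"ratio={ratio} scale={scale}: {w:.1f} wide x {h:.1f} high")
```

Under this convention, ratios below 1 give wide, flat boxes and ratios above 1 give tall, narrow ones, so the long list of ratios is what covers elongated objects.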

# Some Hyperparameters

We will rescale our images to 672x672 (three times the original 224x224 side) for better precision.

TODO: I dislike this. And what is 672 x 672 ? 
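
For reference, a sketch of how `image_min_side` and `image_max_side` interact, based on my understanding of keras-retinanet's resize logic: the shorter side is scaled up to `min_side` unless that would push the longer side past `max_side`. `resize_scale` is a stand-in name used here for illustration.

```python
def resize_scale(height, width, min_side, max_side):
    """Scale factor that brings the shorter side to min_side,
    capped so the longer side never exceeds max_side."""
    scale = min_side / min(height, width)
    if max(height, width) * scale > max_side:
        scale = max_side / max(height, width)
    return scale

print(resize_scale(224, 224, 672, 672))   # square challenge images
print(resize_scale(100, 400, 672, 672))   # a hypothetical non-square image
```

For square inputs both limits coincide, so setting them to the same value simply fixes the output size.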

In [39]:
b = backbone('resnet50')
training_len = 5408
testing_len = 1353

class args:
    batch_size = 64
    config = read_config_file('config.ini')
    random_transform = True # Image augmentation
    annotations = 'annotations.csv'
    val_annotations = 'val_annotations.csv'
    classes = 'classes.csv'
    image_min_side = 672
    image_max_side = 672
    dataset_type = 'csv'
    tensorboard_dir = ''
    evaluation = False
    snapshots = True
    snapshot_path = "saved/"
    backbone = 'resnet50'
    epochs = 5
    steps = training_len//(batch_size)
    weighted_average = True

In [40]:
train_gen,valid_gen = create_generators(args,b.preprocess_image)

# Image Augmentation

In addition to the augmentations already done by keras-retinanet [here](https://github.com/fizyr/keras-retinanet/blob/master/keras_retinanet/bin/train.py#L227), we'll use a package called imgaug to further augment the data.


In [41]:
sometimes = lambda aug: iaa.Sometimes(0.5, aug)
# Define our sequence of augmentation steps that will be applied to every image.
seq = iaa.Sequential(
    [
        #
        # Execute 1 to 9 of the following (less important) augmenters per
        # image. Don't execute all of them, as that would often be way too
        # strong.
        #
        iaa.SomeOf((1, 9),
            [

                        # Blur each image with varying strength using
                        # gaussian blur (sigma between 0 and .5),
                        # average/uniform blur (kernel size 1x1)
                        # median blur (kernel size 1x1).
                        iaa.OneOf([
                            iaa.GaussianBlur((0,0.5)),
                            iaa.AverageBlur(k=(1)),
                            iaa.MedianBlur(k=(1)),
                        ]),

                        # Sharpen each image, overlay the result with the original
                        # image using an alpha between 0 (no sharpening) and 1
                        # (full sharpening effect).
                        iaa.Sharpen(alpha=(0, 0.25), lightness=(0.75, 1.5)),

                        # Add gaussian noise to some images.
                        # In 50% of these cases, the noise is randomly sampled per
                        # channel and pixel.
                        # In the other 50% of all cases it is sampled once per
                        # pixel (i.e. brightness change).
                        iaa.AdditiveGaussianNoise(
                            loc=0, scale=(0.0, 0.01*255), per_channel=0.5
                        ),

                        # Either drop randomly 1 to 10% of all pixels (i.e. set
                        # them to black) or drop 3 to 15% of the pixels on a
                        # coarse version of the image with 2-5% of the original
                        # size, leading to large dropped rectangles.
                        iaa.OneOf([
                            iaa.Dropout((0.01, 0.1), per_channel=0.5),
                            iaa.CoarseDropout(
                                (0.03, 0.15), size_percent=(0.02, 0.05),
                                per_channel=0.2
                            ),
                        ]),

                        # Add a value of -5 to 5 to each pixel.
                        iaa.Add((-5, 5), per_channel=0.5),

                        # Change brightness of images (85-115% of original value).
                        iaa.Multiply((0.85, 1.15), per_channel=0.5),

                        # Improve or worsen the contrast of images.
                        iaa.ContrastNormalization((0.75, 1.25), per_channel=0.5),

                        # Convert each image to grayscale and then overlay the
                        # result with the original with random alpha. I.e. remove
                        # colors with varying strengths.
                        iaa.Grayscale(alpha=(0.0, 0.25)),

                        # In some images distort local areas with varying strength.
                        sometimes(iaa.PiecewiseAffine(scale=(0.001, 0.01)))
                    ],
            # do all of the above augmentations in random order
            random_order=True
        )
    ],
    # do all of the above augmentations in random order
    random_order=True
)
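
The behaviour of `iaa.SomeOf((1, 9), ..., random_order=True)` can be pictured with a toy version in plain Python: pick a random number of augmenters within the range, then apply that subset in random order. This is only a sketch of the idea, not imgaug's implementation; `some_of`, `clip` and the three toy augmenters are invented here and operate on a flat list of pixel values.

```python
import random

def clip(value):
    """Keep a pixel value inside the 0-255 range."""
    return max(0, min(255, int(round(value))))

# Toy stand-ins for pixel-level augmenters
TOY_AUGMENTERS = [
    ("add", lambda img: [clip(p + 5) for p in img]),
    ("multiply", lambda img: [clip(p * 1.15) for p in img]),
    ("darken", lambda img: [clip(p * 0.85) for p in img]),
]

def some_of(image, augmenters, n_range, rng):
    """Apply between n_range[0] and n_range[1] augmenters in random order."""
    low, high = n_range
    n = rng.randint(low, min(high, len(augmenters)))
    chosen = rng.sample(augmenters, n)   # distinct picks, already in random order
    applied = []
    for name, fn in chosen:
        image = fn(image)
        applied.append(name)
    return image, applied

rng = random.Random(31)
pixels = [0, 100, 250]
augmented, applied = some_of(pixels, TOY_AUGMENTERS, (1, 3), rng)
print(applied, augmented)
```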

In [42]:
def augment_train_gen(train_gen,visualize=False):
    '''
    Creates a generator using another generator with applied image augmentation.
    Args
        train_gen  : keras-retinanet generator object.
        visualize  : Boolean; False will convert bounding boxes to their anchor box targets for the model.
    '''
    imgs = []
    boxes = []
    targets = []
    size = train_gen.size()
    idx = 0
    while True:
        while len(imgs) < args.batch_size:
            image       = train_gen.load_image(idx % size)
            annotations = train_gen.load_annotations(idx % size)
            image,annotations = train_gen.random_transform_group_entry(image,annotations)
            imgs.append(image)            
            boxes.append(annotations['bboxes'])
            targets.append(annotations)
            idx += 1
        if visualize:
            imgs = seq.augment_images(imgs)
            im2 = np.array(imgs)
            boxes = np.array(boxes)
            yield im2,boxes
        else:
            imgs = seq.augment_images(imgs)
            imgs,targets = train_gen.preprocess_group(imgs,targets)
            imgs = train_gen.compute_inputs(imgs)
            targets = train_gen.compute_targets(imgs,targets)
            im2 = np.array(imgs)
            yield im2,targets
        imgs = []
        boxes = []
        targets = []
        

# Visualize augmentations

import matplotlib.pyplot as plt

skip_batches = 5
columns = 2
rows = 8

for batch_idx, (imgs, boxes) in enumerate(augment_train_gen(train_gen, visualize=True)):
    # Skip the first few batches, then plot one batch and stop
    if batch_idx <= skip_batches:
        continue
    fig = plt.figure(figsize=(24, 96))
    # Plot the first columns*rows images of the batch with their boxes
    for i in range(columns * rows):
        draw_boxes(imgs[i], boxes[i], (0, 255, 0), thickness=1)
        fig.add_subplot(rows, columns, i + 1)
        plt.imshow(cv2.cvtColor(imgs[i], cv2.COLOR_BGR2RGB))
    plt.show()
    break


# More Hyperparameters

We'll use a learning rate of 0.001 and freeze the backbone weights.

In [43]:
model, training_model, prediction_model = create_models(
            backbone_retinanet=b.retinanet,
            num_classes=train_gen.num_classes(),
            weights=None,
            multi_gpu=False,
            freeze_backbone=True,
            lr=1e-3,
            config=args.config
        )

In [44]:
callbacks = create_callbacks(
    model,
    training_model,
    prediction_model,
    valid_gen,
    args,
)

# Download pretrained model

We download a model pretrained on the COCO dataset and load its weights. Layers whose shapes do not match our model (the last few layers) are skipped.

In [45]:
#!wget https://github.com/fizyr/keras-retinanet/releases/download/0.5.1/resnet50_coco_best_v2.1.0.h5

In [46]:
training_model.load_weights('resnet50_coco_best_v2.1.0.h5',skip_mismatch=True,by_name=True)
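
The effect of `by_name=True` together with `skip_mismatch=True` can be sketched with plain dictionaries: a weight is copied only when a layer of the same name exists in the checkpoint and its shape matches. `load_by_name`, the layer names and the shapes below are illustrative assumptions (a COCO head sized for 9 anchors x 80 classes versus 30 anchors x 1 class here), not values read from the actual files.

```python
def load_by_name(model_weights, checkpoint, skip_mismatch=True):
    """Copy checkpoint values into model_weights where name and shape agree.

    Both arguments map layer name -> (shape, values).
    Returns the names that were loaded and the names that were skipped.
    """
    loaded, skipped = [], []
    for name, (shape, _) in model_weights.items():
        if name not in checkpoint:
            continue                                  # layer absent from checkpoint
        ckpt_shape, ckpt_values = checkpoint[name]
        if ckpt_shape != shape:
            if not skip_mismatch:
                raise ValueError(f"shape mismatch for {name}: {ckpt_shape} vs {shape}")
            skipped.append(name)                      # keep the model's own weights
            continue
        model_weights[name] = (shape, ckpt_values)
        loaded.append(name)
    return loaded, skipped

# Illustrative shapes only
model = {
    "conv1": ((7, 7, 3, 64), "random init"),
    "pyramid_classification": ((3, 3, 256, 30), "random init"),
}
coco = {
    "conv1": ((7, 7, 3, 64), "pretrained"),
    "pyramid_classification": ((3, 3, 256, 720), "pretrained"),
}
loaded, skipped = load_by_name(model, coco)
print("loaded:", loaded, "skipped:", skipped)
```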



# Train the model

We will train for 5 epochs (`args.epochs`).

In [None]:
training_model.fit_generator(generator=augment_train_gen(train_gen),
        steps_per_epoch=args.steps,
        epochs=args.epochs,
        verbose=1,
        callbacks=callbacks,)

Epoch 1/5
