# Improved Object Detection

This is a slightly improved version of the naive object detection approach.
Here the sliding window is run over the output of the last non-dense layer
of VGG16 (called `block5_pool`) rather than over the input image.
The part of the network up to that layer is called the _bottom_ model, and it can be retrieved
by passing `include_top=False` when loading the model from
the `vgg16` module.

A side effect is that the bottom model is then independent of the input size.
If the image is twice as large in each dimension, the output of the last layer of the
bottom model is twice as large in each spatial dimension as well.
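
The size relationship can be checked with a little arithmetic. The sketch below assumes the standard VGG16 layout, in which five 2x2 max-pooling layers each halve the spatial size and the last convolutional block has 512 channels; the helper name `bottom_output_shape` is made up for this illustration.

```python
def bottom_output_shape(height, width, n_pools=5, channels=512):
    """Estimate the block5_pool output shape for a given input size.

    Assumes each of the n_pools pooling layers halves height and width
    (rounding down), as in the standard VGG16 architecture.
    """
    factor = 2 ** n_pools  # 32 for VGG16
    return (height // factor, width // factor, channels)


small = bottom_output_shape(224, 224)  # default VGG16 input size
large = bottom_output_shape(448, 448)  # double-size image
print(small, large)
```

Under these assumptions a 224x224 image maps to a 7x7x512 output and a 448x448 image to 14x14x512.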

Our approach is therefore:

* Separate bottom and top model
* Run the (double-size) image through the bottom model
* The output (14x14x512) will then be split into regions of 7x7x512 (sliding window)
* These regions will be fed into the top model

The output is essentially the same, but the full image is run through
the bottom model only once, and only the top model (which has just a few
layers) is run for each region.
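
As a rough illustration of the saving, the sketch below compares the convolutional work of the two approaches. It assumes that the cost of the convolutional layers grows in proportion to the input area and it ignores the cost of the top model; the function name and its defaults are illustrative and not part of the original code.

```python
def conv_work_ratio(crop_size=224, image_size=448, n_windows=49):
    """Ratio of convolutional work: naive approach vs. shared bottom model.

    Naive: every window is a crop_size x crop_size image pushed through
    the whole network. Shared: the full image passes through the
    convolutional layers once.
    """
    naive_area = n_windows * crop_size ** 2
    shared_area = image_size ** 2
    return naive_area / shared_area


ratio = conv_work_ratio()
print(f"naive approach does about {ratio:.2f}x the convolutional work")
```

This is only a back-of-the-envelope estimate; actual timings depend on hardware and batching.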

The code is taken almost verbatim from https://github.com/DOsinga/deep_learning_cookbook.

In [None]:
from keras.applications import vgg16
from keras import backend as K
from keras.preprocessing.image import load_img, img_to_array
from keras.models import Model, load_model
from keras.layers import Flatten, Dense, Input, TimeDistributed
import numpy as np
from collections import Counter, defaultdict
from keras.preprocessing import image
from PIL import ImageDraw

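# Note: the scipy.misc image helpers (imread, imresize, imsave, fromimage, toimage)
# are not used below and were removed from newer SciPy releases, so they are not imported.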

try:
    from io import BytesIO
except ImportError:
    from StringIO import StringIO as BytesIO
import PIL
from IPython.display import clear_output, Image, display, HTML

## Helper Routines

Some helper routines to pre-process an image, and to show a pre-processed image again.

In [None]:
def showarray(a, fmt='jpeg'):
    f = BytesIO()
    PIL.Image.fromarray(a).save(f, fmt)
    display(Image(data=f.getvalue()))

def preprocess_image(image_path, target_size=None):
    img = load_img(image_path, target_size=target_size)
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = vgg16.preprocess_input(img)
    return img

def deprocess_image(x, w, h):
    x = x.copy()
    if K.image_data_format() == 'channels_first':
        x = x.reshape((3, w, h))
        x = x.transpose((1, 2, 0))
    else:
        x = x.reshape((w, h, 3))
    # Remove zero-center by mean pixel
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    # 'BGR'->'RGB'
    x = x[:, :, ::-1]
    x = np.clip(x, 0, 255).astype('uint8')
    return x
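
To make clear what the two helpers undo and redo, here is a NumPy-only sketch of the same transformation. It assumes the "caffe"-style preprocessing used for VGG16 (RGB to BGR, then subtraction of the per-channel means that `deprocess_image` adds back) and a channels-last layout. The names `caffe_preprocess` and `caffe_deprocess` are made up for this example, and unlike `deprocess_image` the sketch rounds before casting to `uint8`.

```python
import numpy as np

# Per-channel means in BGR order, the same constants used in deprocess_image
MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_preprocess(rgb):
    """RGB uint8 image (H, W, 3) -> zero-centered BGR float32 array."""
    bgr = rgb[..., ::-1].astype(np.float32)
    return bgr - MEAN_BGR


def caffe_deprocess(x):
    """Inverse of caffe_preprocess: add the means back, BGR -> RGB."""
    bgr = x + MEAN_BGR
    rgb = bgr[..., ::-1]
    # Round before casting so float error cannot shift a pixel value
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
restored = caffe_deprocess(caffe_preprocess(rgb))
print(np.array_equal(restored, rgb))
```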

## Loading pretrained classifier

In [None]:
base_model = vgg16.VGG16(weights='imagenet', include_top=True)
base_model.summary()

## Getting Top Model

To get the _top_ model we have to recreate its layers manually and then copy over the trained
weights from the full `vgg16` model.

In [None]:
def create_top_model(base_model):
    inputs = Input(shape=(7, 7, 512), name='input')
    flatten = Flatten(name='flatten')(inputs)
    fc1 = Dense(4096, activation='relu', name='fc1')(flatten)
    fc2 = Dense(4096, activation='relu', name='fc2')(fc1)
    predictions = Dense(1000, activation='softmax', name='predictions')(fc2)
    model = Model(inputs, predictions, name='top_model')
    for layer in model.layers:
        if layer.name != 'input':
            print(layer.name)
            layer.set_weights(base_model.get_layer(layer.name).get_weights())
    return model

top_model = create_top_model(base_model)
top_model.summary()
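
A quick way to sanity-check the recreated top model is to compare the parameter counts printed by `top_model.summary()` with a hand calculation. The sketch below derives them from the layer sizes used in `create_top_model`; the helper `dense_params` is introduced only for this example.

```python
def dense_params(n_in, n_out):
    """Number of trainable parameters of a dense layer (weights plus biases)."""
    return n_in * n_out + n_out


flat = 7 * 7 * 512  # size of the flattened 7x7x512 input

param_counts = {
    "fc1": dense_params(flat, 4096),
    "fc2": dense_params(4096, 4096),
    "predictions": dense_params(4096, 1000),
}
total_params = sum(param_counts.values())

for name, count in param_counts.items():
    print(f"{name:12s} {count:>12,d}")
print(f"{'total':12s} {total_params:>12,d}")
```

If the summary reports different numbers, the layer shapes do not match and copying the weights would be expected to fail.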

## Getting Bottom Model

To get the _bottom_ model we just call `vgg16.VGG16` again, this time with `include_top=False`.

In [None]:
bottom_model = vgg16.VGG16(weights='imagenet', include_top=False)
bottom_model.summary()

## Object Detection

### Loading image

Load an image, preprocess it, and display the preprocessed image for verification.

In [None]:
cat_dog2 = preprocess_image('data/cat_dog.jpg', target_size=(448, 448))
showarray(deprocess_image(cat_dog2, 448, 448))

### Run the Bottom Model

We now run the bottom model on the whole image.

In [None]:
bottom_out = bottom_model.predict(cat_dog2)
bottom_out.shape

### Creating regions

Using a sliding 7x7 window, create 49 regions, which will then be run through the top model.

In [None]:
crops = []
rects = []
for x in range(7):
    for y in range(7):
        crops.append(bottom_out[0, x: x + 7, y: y + 7, :])
        rects.append((y * 32, x * 32, 224 + y * 32, 224 + x * 32))
crops = np.asarray(crops)
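
The mapping from window offsets to pixel rectangles in the loop above can be reproduced without the model. The sketch below assumes that one feature-map cell corresponds to a 32-pixel step in the input image, so a 7-cell window covers 224 pixels; `window_rects` is a hypothetical helper and not part of the original notebook.

```python
def window_rects(positions=7, window_px=224, stride_px=32):
    """Pixel rectangles (left, top, right, bottom) for each window offset.

    The outer loop walks rows and the inner loop walks columns, which
    matches the order in which the crops are appended.
    """
    rects = []
    for row in range(positions):
        for col in range(positions):
            left, top = col * stride_px, row * stride_px
            rects.append((left, top, left + window_px, top + window_px))
    return rects


rects_demo = window_rects()
# A 7-cell window fits into a 14-cell feature map at 8 offsets per axis;
# the loop above uses the first 7 of them.
max_positions = 14 - 7 + 1
print(len(rects_demo), rects_demo[0], rects_demo[-1], max_positions)
```

With the default settings the last rectangle ends at pixel 416, so the final 32 pixels on the right and bottom edges are only reached by an eighth offset.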

### Run Top Model on Regions

Run the top model on all regions and group the regions by their top-1 predicted label.

In [None]:
preds = top_model.predict(crops)
crop_scores = defaultdict(list)
for idx, pred in enumerate(vgg16.decode_predictions(preds, top=1)):
    _, label, weight = pred[0]
    crop_scores[label].append((idx, weight))
crop_scores.keys()
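
The grouping step can be tried out without running the model. The sketch below feeds hand-written predictions in the `(class_id, label, score)` layout returned by `decode_predictions` into the same grouping logic; the class ids and scores are placeholders invented for this example.

```python
from collections import defaultdict

# Made-up top-1 predictions for four crops: (class_id, label, score)
decoded = [
    [("id_tabby", "tabby", 0.61)],
    [("id_retriever", "golden_retriever", 0.83)],
    [("id_tabby", "tabby", 0.74)],
    [("id_retriever", "golden_retriever", 0.35)],
]


def group_by_label(decoded_preds):
    """Map each top-1 label to a list of (crop index, score) pairs."""
    scores = defaultdict(list)
    for idx, pred in enumerate(decoded_preds):
        _, label, score = pred[0]
        scores[label].append((idx, score))
    return scores


crop_scores_demo = group_by_label(decoded)
print(dict(crop_scores_demo))
```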

### Show top results

Using manually selected classes, draw the highest-scoring region for each class.

In [None]:
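def draw_best_region_for_label(scores, draw, label, color=(0,0,0)):
    # Pick the crop index with the highest score for this label
    idx = max(scores[label], key=lambda t:t[1])[0]
    # Outline the corresponding region on the image
    draw.rectangle(rects[idx], outline=color)
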
cat_dog_img = image.load_img('data/cat_dog.jpg', target_size=(448, 448))
draw = ImageDraw.Draw(cat_dog_img)
draw_best_region_for_label(crop_scores, draw, 'tabby', (255,0,0))
draw_best_region_for_label(crop_scores, draw, 'golden_retriever', (0,255,0))
cat_dog_img
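
The labels above are chosen by hand, so a label that never came out as a top-1 prediction would leave `max` with an empty list and raise a `ValueError`. A minimal sketch of a more defensive lookup follows; `best_region` is a hypothetical helper and the scores are made up.

```python
def best_region(scores, label):
    """Return the crop index with the highest score for label, or None."""
    entries = scores.get(label)  # .get avoids creating an empty entry
    if not entries:
        return None
    return max(entries, key=lambda t: t[1])[0]


demo_scores = {"tabby": [(0, 0.61), (2, 0.74)]}
print(best_region(demo_scores, "tabby"))
print(best_region(demo_scores, "golden_retriever"))
```

A caller could then skip drawing whenever the result is `None`.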