<a href="https://colab.research.google.com/github/patbaa/demo_notebooks/blob/master/cnn_inference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CNN inference notebook

ImageNet consists of more than 1 million images separated into 1000 different categories. ImageNet was the _default_ dataset for computer vision (especially classification) benchmarks in the 2010s.
Most state-of-the-art models are trained on ImageNet to prove their effectiveness. Luckily, these models are available in TensorFlow along with their trained weights.

So to try out a trained convolutional neural network we can simply download these weights; there is no need to train the model for hours or days.

In this notebook we will use VGG16.

In [0]:
%tensorflow_version 2.x

In [0]:
import cv2
import numpy as np
from PIL import Image
from tensorflow.keras.applications.vgg16 import *
%pylab inline

### Loading VGG16 model
 - include_top $\to$ whether to include the fully connected layers at the end
 - weights $\to$ we want to use the weights that were trained on the ImageNet dataset
 - input_tensor, input_shape $\to$ optional. If include_top is True, the input shape must be (224, 224, 3)
 - pooling $\to$ if include_top is False, we can choose the final pooling layer (None, 'avg' or 'max')
 - classes $\to$ when include_top is True and weights is None, we can set the number of classes
 
**In modern CNNs the last pooling layer is often a global pooling layer (max or, more commonly, average), so the output of that layer is (1x1xC) regardless of the input image dimensions.**
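
A minimal NumPy sketch of why global pooling removes the dependence on the input size. The helper names `global_max_pool` and `global_avg_pool` and the feature-map shapes are made up for illustration; they are not Keras functions.

```python
import numpy as np

def global_max_pool(feature_map):
    """Collapse the spatial axes of an (H, W, C) feature map with max."""
    return feature_map.max(axis=(0, 1))

def global_avg_pool(feature_map):
    """Collapse the spatial axes of an (H, W, C) feature map with mean."""
    return feature_map.mean(axis=(0, 1))

rng = np.random.default_rng(0)
small = rng.random((7, 7, 512))     # illustrative feature map of a small input
large = rng.random((10, 15, 512))   # illustrative feature map of a larger, non-square input

# Whatever the spatial size was, only the C channel values remain
pooled_shapes = [global_max_pool(small).shape, global_max_pool(large).shape,
                 global_avg_pool(small).shape, global_avg_pool(large).shape]
print(pooled_shapes)
```

All four shapes are `(512,)`, so a classifier placed after the global pooling layer always receives a vector of the same length.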
 
The first time the cell below is run the weights are downloaded; on later runs they are loaded from a local file!

In [0]:
model = VGG16(include_top=True,
              weights='imagenet',
              input_tensor=None,
              input_shape=None,
              pooling=None,
              classes=1000)

### Shape: (batch_size, x, y, color_channel)
If needed, the channels_first format (batch_size, color_channel, x, y) can be used as well.
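
As a small illustration, the two layouts differ only in the position of the channel axis, so a batch can be converted with a transpose. The dummy batch below is an assumption made for the example; it is not data used elsewhere in this notebook.

```python
import numpy as np

# A dummy batch of 2 images in channels_last layout: (batch, spatial, spatial, channels)
batch_last = np.zeros((2, 224, 224, 3), dtype=np.float32)

# Move the channel axis to position 1 to get channels_first: (batch, channels, spatial, spatial)
batch_first = np.transpose(batch_last, (0, 3, 1, 2))

# Move it back to the end to restore channels_last
restored = np.transpose(batch_first, (0, 2, 3, 1))

print(batch_last.shape, batch_first.shape, restored.shape)
```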

In [0]:
model.summary()

In [0]:
!wget https://scx1.b-cdn.net/csz/news/800/2018/2-dog.jpg -O dog.jpg -q    
# Credit: CC0 Public Domain
pil_dog = Image.open('dog.jpg')
cv2_dog = cv2.imread('dog.jpg')

plt.figure(figsize=(12, 6))
plt.subplot(121)
plt.imshow(pil_dog)
plt.axis('off')

plt.subplot(122)
plt.imshow(cv2_dog)
plt.axis('off')
plt.show()

# Heh!?

## Python and images

There are two main libraries in Python for handling images: PIL (Pillow) and cv2 (OpenCV). Both are fine, but remember:
 - **PIL** uses **RGB** as color channel order
 - **cv2** uses **BGR** as color channel order
 
This can lead to significant confusion! One might train a model on images loaded with cv2, while the customer feeds it images loaded with PIL, and the performance will be different! 
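
A tiny synthetic example (not one of the notebook's images) of what the channel order means for the stored numbers: the same red pixel is written differently in RGB and in BGR, and reversing the last axis converts between the two.

```python
import numpy as np

# A 1x2 "image": one pure red and one pure blue pixel, stored in RGB order
rgb = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the first and third channels (RGB <-> BGR)
bgr = rgb[..., ::-1]

print(bgr[0, 0].tolist())                     # the red pixel, now written as (B, G, R)
print(np.array_equal(bgr[..., ::-1], rgb))    # reversing twice restores the original
```

If BGR data is displayed or processed as if it were RGB, red and blue are swapped, which is exactly the effect seen in the right-hand plot above.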

In [0]:
plt.figure(figsize=(12, 6))
plt.subplot(121)
plt.imshow(pil_dog)
plt.axis('off')

plt.subplot(122)
plt.imshow(cv2_dog[...,::-1]) # reverse color channel order (BGR -> RGB)
plt.axis('off')
plt.show()

## Image preprocessing

Usually images are preprocessed before being fed to a neural network. We have seen before that scaling the data helps many machine learning models, and the same holds for convolutional neural networks. There are many ways to preprocess the data, including:
 - color channel ordering
 - mean subtraction
 - scaling by the standard deviation
 
If we want to use a pre-trained model and keep its accuracy we must preprocess our images the same way as the training images were preprocessed.

For VGG16 in tf.keras the following preprocessing steps were applied during training:
 - mean removal
 - color channel order is BGR
 
The BGR means are 103.939, 116.779, 123.68, which were calculated from the ImageNet training data. To find this information one needs to read the API documentation (in lucky cases) or check the source code.  
Also, preprocess_input functions are often provided, but be careful: it is not possible to recover the color channel ordering from a 3D numpy array!
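
To see why the ordering matters for mean removal, here is a small worked example on a single synthetic pixel whose color equals the mean quoted above. The pixel and the variable names are assumptions made for illustration.

```python
import numpy as np

BGR_MEANS = np.array([103.939, 116.779, 123.68])   # values quoted above, in B, G, R order

# One pixel whose color equals the ImageNet mean, written in RGB order
mean_pixel_rgb = np.array([[[123.68, 116.779, 103.939]]])

# Correct: convert RGB -> BGR first, then subtract the BGR means
correct = mean_pixel_rgb[..., ::-1] - BGR_MEANS

# Wrong: subtract the BGR means from data that is still in RGB order
wrong = mean_pixel_rgb - BGR_MEANS

print(correct)   # all zeros
print(wrong)     # about +19.7 and -19.7 in the first and last channel
```

Both arrays have the same shape and dtype, so nothing in the array itself reveals which one was processed correctly.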

In [0]:
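def preprocess(img, RGB=True):
    '''
    Resize to 224x224, convert to BGR and subtract the ImageNet BGR means.
    For RGB images (PIL) use RGB=True
    For BGR images (cv2) use RGB=False
    '''
    if isinstance(img, np.ndarray):
        # cv2 images are numpy arrays, which have no PIL-style resize
        img = cv2.resize(img, (224, 224)).astype(float)
    else:
        img = np.array(img.resize((224, 224))).astype(float)
    if RGB:
        img = img[...,::-1] # RGB -> BGR, the order VGG16 expects
    img -= [103.939, 116.779, 123.68] # BGR means of the ImageNet training data

    return img[None, ...] # to form a batch with batch_size of 1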

# Generating the predictions
 - we need 224x224 pixel images
   - crop
   - resize
 - each prediction is a 1000-long vector (one value per class)
 - the provided decode_predictions function shows the top predictions in human-readable format
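
Conceptually, turning a prediction vector into readable top predictions is a sort by probability followed by a label lookup. The sketch below uses a made-up 4-class example and a hypothetical `top_k` helper; it is not how decode_predictions is implemented.

```python
import numpy as np

def top_k(probabilities, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    order = np.argsort(probabilities)[::-1][:k]   # indices sorted by descending probability
    return [(labels[i], float(probabilities[i])) for i in order]

# Toy stand-in for a model output: 4 classes instead of 1000
labels = ['cat', 'dog', 'eagle', 'car']           # hypothetical class names
probs = np.array([0.05, 0.70, 0.20, 0.05])

print(top_k(probs, labels, k=2))   # [('dog', 0.7), ('eagle', 0.2)]
```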

## Resize

In [0]:
pil_dog.resize((224, 224))

In [0]:
preds = model.predict(preprocess(pil_dog))
preds[0][:10]

In [0]:
preds = model.predict(preprocess(pil_dog))
decode_predictions(preds)

Wrong color channel order:

In [0]:
preds = model.predict(preprocess(pil_dog, RGB=False))
decode_predictions(preds)

## Crop

In [0]:
dx = 80
dy = 0
img2 = pil_dog.resize((int(pil_dog.size[0]*0.5), int(pil_dog.size[1]*0.5))).crop((dx, dy, dx+224, dy+224))
img2

In [0]:
preds = model.predict(preprocess(img2))
decode_predictions(preds)
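
The offsets dx and dy above were picked by hand. A common alternative is a center crop; the helper below is a hypothetical sketch that only computes the crop box, in the (left, upper, right, lower) order that PIL's crop expects.

```python
def center_crop_box(width, height, size=224):
    """Return a (left, upper, right, lower) box for a centered size x size crop."""
    if width < size or height < size:
        raise ValueError('image is smaller than the requested crop')
    left = (width - size) // 2
    upper = (height - size) // 2
    return (left, upper, left + size, upper + size)

print(center_crop_box(400, 300))   # (88, 38, 312, 262)
```

With a PIL image `img`, this could be used as `img.crop(center_crop_box(*img.size))`, since `img.size` is (width, height).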

# Flipping images

In [0]:
pil_dog.transpose(Image.FLIP_LEFT_RIGHT).resize((224, 224))

In [0]:
preds = model.predict(preprocess(pil_dog.transpose(Image.FLIP_LEFT_RIGHT)))
decode_predictions(preds)

The model did not see top-bottom flipped images during training!

In [0]:
pil_dog.transpose(Image.FLIP_TOP_BOTTOM).resize((224, 224))

In [0]:
preds = model.predict(preprocess(pil_dog.transpose(Image.FLIP_TOP_BOTTOM)))
decode_predictions(preds)
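
On the array level the two flips are just reversals of one axis. A tiny synthetic example (the 2x3 array is made up so that the result is easy to read):

```python
import numpy as np

# A tiny 2x3 single-channel "image"
img = np.array([[1, 2, 3],
                [4, 5, 6]])

left_right = img[:, ::-1]   # mirror the columns, same as np.fliplr(img)
top_bottom = img[::-1, :]   # mirror the rows, same as np.flipud(img)

print(left_right.tolist())  # [[3, 2, 1], [6, 5, 4]]
print(top_bottom.tolist())  # [[4, 5, 6], [1, 2, 3]]
```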

Upside down & wrong color channel order $\to$ pretty bad 

In [0]:
plt.imshow(np.array(pil_dog.transpose(Image.FLIP_TOP_BOTTOM).resize((224, 224)))[...,::-1])
plt.axis('off')
plt.show()

In [0]:
preds = model.predict(preprocess(pil_dog.transpose(Image.FLIP_TOP_BOTTOM), RGB=False))
decode_predictions(preds)

# Other examples

In [0]:
!wget -q -O eagle.jpg https://3.bp.blogspot.com/-ZgrHOoWo8Bs/WFCLpESwZ8I/AAAAAAAAT64/h25mc-NsUPA6qqLR0PqX1xN3DqA_M-PCQCEw/s1600/DSC_0452.JPG
# credits to https://seasonsinthevalley.blogspot.com/
eagle = Image.open('eagle.jpg')
eagle.resize((600, 400))

In [21]:
preds = model.predict(preprocess(eagle))
decode_predictions(preds)

[[('n01616318', 'vulture', 0.41468802),
  ('n01614925', 'bald_eagle', 0.40434498),
  ('n01608432', 'kite', 0.095860355),
  ('n01582220', 'magpie', 0.017725317),
  ('n01829413', 'hornbill', 0.01550091)]]

In [0]:
eagle = eagle.crop((500, 250, 800, 550))
eagle

In [0]:
preds = model.predict(preprocess(eagle))
decode_predictions(preds)

With proper cropping the **bald_eagle probability improved significantly**! A further possibility: take multiple crops of the image and keep the average of the predictions, the most probable tile's prediction, etc.  
Cropping an image that contains a too small object does not make sense (99 crops of background, 1 crop of the object).   
Cropping a good, large image does not make much sense either (the individual crops carry too little information: an eyeball, an ear, etc.).
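
A minimal sketch of the multi-crop idea, under stated assumptions: `tile_boxes` and `combine_predictions` are hypothetical helpers, and the per-tile probabilities are made up to stand in for model outputs.

```python
import numpy as np

def tile_boxes(width, height, size=224, stride=224):
    """List (left, upper, right, lower) boxes of size x size tiles inside the image."""
    return [(x, y, x + size, y + size)
            for y in range(0, height - size + 1, stride)
            for x in range(0, width - size + 1, stride)]

def combine_predictions(tile_preds):
    """Return (mean over tiles, prediction of the most confident tile)."""
    tile_preds = np.asarray(tile_preds)
    averaged = tile_preds.mean(axis=0)
    most_confident = tile_preds[tile_preds.max(axis=1).argmax()]
    return averaged, most_confident

boxes = tile_boxes(448, 448)   # 4 non-overlapping tiles
# Made-up per-tile probabilities for 3 classes, standing in for model.predict output
fake_preds = [[0.6, 0.3, 0.1],
              [0.2, 0.7, 0.1],
              [0.9, 0.05, 0.05],
              [0.3, 0.3, 0.4]]
averaged, most_confident = combine_predictions(fake_preds)
print(len(boxes), averaged.argmax(), most_confident.argmax())
```

In the notebook, each box could be passed to a PIL image's crop method and the result to preprocess and model.predict; which of the two combination rules works better depends on the image.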