<a href="https://colab.research.google.com/github/robotics-upo/rva-course-material/blob/master/deeplearningbasics/deploying_imageclassification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

In this session, we will see how to use an already trained model in our own code, together with OpenCV.

We will use the model we trained in the previous session, and we will also see how to use models from TensorFlow Hub, both as they are and for transfer learning.

* **TensorFlow**: open source library for machine learning https://www.tensorflow.org/
* **Keras**: high-level API for TensorFlow
* **TensorFlow Hub**: repository of trained models https://www.tensorflow.org/hub?hl=es-419
* **OpenCV**: http://opencv.org

We will use these tools to detect objects from the CIFAR-10 categories in images.

In [None]:
#Numpy module
import numpy as np

#Import OpenCV
import cv2
#Import tensorflow
import tensorflow as tf

#We can use OpenCV in Colab, but not its GUI functions for displaying images
#(such as cv2.imshow), so we use matplotlib for generating plots
from matplotlib import pyplot as plt
from matplotlib import cm

#We use the scikit-image library to read images from a URL
#In OpenCV, the function to read from a file is cv2.imread
from skimage import io


# Loading data and models

First, we will load from file the model we trained in the previous session. We can mount Google Drive in the Colab machine and load the file from there.

If you are not using Colab, you can load the models and files from your local folders.



In [None]:
#Mount the drive
from google.colab import drive
drive.mount('/content/drive')

I am assuming you have a folder called **colabfiles** in your Google Drive **root** folder. 

Within the folder, we have the model **classif.h5**.

If the folders and files are called differently, change the paths and names.


In [None]:
#Let's load the CNN model using the Keras API

model = tf.keras.models.load_model('/content/drive/My Drive/colabfiles/classif.h5')
#summary() prints the architecture itself and returns None
model.summary()

Once we have finished with Google Drive we can unmount it. If you need it for further tasks, leave it mounted.

In [None]:
#Unmount the drive if we no longer need it
drive.flush_and_unmount()

Let's load an example image and show it.

In [None]:
#Let's load an image
imrgb = io.imread('https://robotics.upo.es/~lmercab/rva/test.jpg')

plt.imshow(imrgb, cmap=plt.cm.binary)

# Using our CNN for prediction

We can process patches through our network. Recall that our network takes 32x32 images (normalized between 0 and 1) as input and outputs probabilities for each of the 10 classes of CIFAR-10.


In the next example, we select a 32x32 patch corresponding to one of the cars.

Notice that the method `predict` expects a batch of images, that is, for this network, a tensor of shape (N, 32, 32, 3), where N is the size of the batch. That is why we add a leading dimension to the patch. This can be done in NumPy in several ways:

* Using `expand_dims`: https://numpy.org/doc/stable/reference/generated/numpy.expand_dims.html
* Using NumPy indexing and slicing and the `newaxis` object: https://numpy.org/doc/stable/reference/arrays.indexing.html
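
As a quick check of the two options above, the following sketch uses a zero-filled array as a stand-in for a real patch (the array and the variable names are illustrative and not part of the notebook) and confirms that both options produce a batch of the same shape.

```python
import numpy as np

# Stand-in for a normalized 32x32 RGB patch (illustrative data only)
patch = np.zeros((32, 32, 3), dtype=np.float32)

# Option 1: np.expand_dims inserts a new axis at position 0
batch_a = np.expand_dims(patch, axis=0)

# Option 2: indexing with np.newaxis adds the same leading axis
batch_b = patch[np.newaxis, ...]

print(batch_a.shape, batch_b.shape)
```

Both arrays have shape (1, 32, 32, 3), that is, a batch containing a single image.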

In [None]:
cifar10_labels = ['airplane', 'automobile', 'bird','cat','deer','dog','frog',
'horse','ship', 'truck']

#32x32 image patch, normalized
im_patch = imrgb[140:140+32,75:75+32]/255.0
plt.imshow(im_patch, cmap=plt.cm.binary)

#The network expects batches of images as input. We have to expand the
#dimensions of the image
prediction = model.predict(im_patch[np.newaxis,...])

#The former operation is equivalent to the following two:
#im_patch_final = np.expand_dims(im_patch, 0)
#prediction = model.predict(im_patch_final)

print('Network output', prediction)
print('Model prediction:', cifar10_labels[np.argmax(prediction[0])])


We can use our trained network to look for cars in the image. We also want to detect multiple cars, not only one.

The idea is to cover the whole image, extracting 32x32 patches and keeping those that the network classifies as automobile (class 1 in the CIFAR-10 dataset).
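
The patch positions for such a search can be sketched as a small generator. This is a minimal illustration, not the notebook's own code: the function name `sliding_windows` and its parameters are hypothetical, and the loop bound is chosen, as an assumption about the intended behavior, so that the last window that fits entirely in the image is included.

```python
def sliding_windows(height, width, size=32, stride=32):
    """Yield the (row, col) top-left corner of every window that fits fully."""
    # "+ 1" so that the last valid start position (dimension - size) is included
    for r in range(0, height - size + 1, stride):
        for c in range(0, width - size + 1, stride):
            yield r, c

# Example on a hypothetical 96x64 image: 3 rows x 2 columns of windows
corners = list(sliding_windows(96, 64))
print(corners)
```

A smaller `stride` covers the image more densely, at the cost of more network evaluations.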

In [None]:
#We make a copy to draw on it
im_result = imrgb.copy()

prob_threshold = 0.7

#Loop over the image looking for cars
#We extract 32x32 patches each time. We go in steps of
#32. This could be reduced to cover the image more densely, at the cost of time
#The "+ 1" makes the last start position (size - 32) reachable
for r in range(0,imrgb.shape[0] - 32 + 1, 32):
  for c in range(0,imrgb.shape[1] - 32 + 1, 32):
    im_patch = imrgb[r:r+32,c:c+32]/255.0
    prediction = model.predict(im_patch[np.newaxis,...])
    #Draw a rectangle on the original image if the probability of car
    #(class 1) is over a threshold
    if(np.argmax(prediction[0]) == 1 and prediction[0][1]>prob_threshold):
      upper_left = (c, r)
      bottom_right = (c + 32, r+32)
      #Draw a red rectangle
      cv2.rectangle(im_result,upper_left, bottom_right, (255,0,0), 2)
     
plt.imshow(im_result, cmap=plt.cm.binary)

# Searching at different scales

The main problem is that our network expects cars to fit in 32x32 patches. Of course, some cars in our image are larger than that.

We need to search for cars at multiple scales in the image. For that, we can use image pyramids, as we have seen in previous sessions:

* https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_pyramids/py_pyramids.html
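
When a window is found in a downsampled level of the pyramid, its coordinates have to be mapped back to the original image before drawing. The sketch below shows that mapping; the helper name `to_original_scale` is hypothetical, and the mapping is only approximate when an image dimension is odd, since the downsampled size is then rounded.

```python
def to_original_scale(r, c, size, factor):
    """Map a square window found in a downsampled image to the original image.

    factor is the total downsampling factor (2 after one pyrDown, 4 after two).
    Returns (upper_left, bottom_right) as (x, y) tuples, the point order that
    OpenCV drawing functions such as cv2.rectangle expect.
    """
    upper_left = (c * factor, r * factor)
    bottom_right = ((c + size) * factor, (r + size) * factor)
    return upper_left, bottom_right

# A 32x32 window at row 8, column 16 of the image downsampled by 4
print(to_original_scale(8, 16, 32, 4))
```

In the example, the 32x32 window covers a 128x128 region of the original image.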


In [None]:
#Create an image pyramid
#Downsample the image by 2
imrgb_2 = cv2.pyrDown(imrgb)
plt.imshow(imrgb_2, cmap=plt.cm.binary)


In [None]:
#Downsample the former by 2 (so the original by 4)
imrgb_4 = cv2.pyrDown(imrgb_2)
plt.imshow(imrgb_4, cmap=plt.cm.binary)


In [None]:
#Search in downsampled images

#im_result_4 = imrgb_4.copy()
for r in range(0,imrgb_4.shape[0] - 32 + 1, 8):
  for c in range(0,imrgb_4.shape[1] - 32 + 1, 8):
    im_patch = imrgb_4[r:r+32,c:c+32]/255.0
    prediction = model.predict(im_patch[np.newaxis,...])
    #Draw a rectangle on the original image if the probability of car
    #(class 1) is over a threshold
    if(np.argmax(prediction[0]) == 1 and prediction[0][1]>prob_threshold):
      #Draw in the original scale image
      upper_left = (c*4, r*4)
      bottom_right = (c*4 + 32*4, r*4+32*4)
      cv2.rectangle(im_result,upper_left, bottom_right, (0,255,0), 2)

for r in range(0,imrgb_2.shape[0] - 32 + 1, 16):
  for c in range(0,imrgb_2.shape[1] - 32 + 1, 16):
    im_patch = imrgb_2[r:r+32,c:c+32]/255.0
    prediction = model.predict(im_patch[np.newaxis,...])
    #Draw a rectangle on the original image if the probability of car
    #(class 1) is over a threshold
    if(np.argmax(prediction[0]) == 1 and prediction[0][1]>prob_threshold):
      #Draw in the original scale image
      upper_left = (c*2, r*2)
      bottom_right = (c*2 + 32*2, r*2+32*2)
      cv2.rectangle(im_result,upper_left, bottom_right, (0,0,255), 2)
     
plt.imshow(im_result, cmap=plt.cm.binary)

As you can see, searching the image for objects this way is costly: the network is evaluated once for every patch at every scale.

In practice, one can use **object detection networks**, designed to output the bounding boxes and classes of the objects directly from the image efficiently, reusing many operations. Recall that CNNs also intrinsically analyze the image at different scales.

# Using pre-trained models

TensorFlow Hub (other frameworks offer similar repositories) contains many pre-trained models that you can use in your code.

One example is the MobileNetV2 model for image classification, trained on the ImageNet dataset:

* TF Hub Model: https://tfhub.dev/google/tf2-preview/mobilenet_v2/classification/4
* ImageNet: https://www.image-net.org/

In [None]:
#Utilities to use TensorFlow Hub
import tensorflow_hub as hub

classifier_model ="https://tfhub.dev/google/tf2-preview/mobilenet_v2/classification/4"

IMAGE_SHAPE = (224, 224)

#We create a sequential network using Keras with "just one layer", which is the
#Mobilenetv2 model itself 
classifier = tf.keras.Sequential([
    hub.KerasLayer(classifier_model, input_shape=IMAGE_SHAPE+(3,))
])

#We download here the labels for the ImageNet classes. The file lists
#1001 labels: the 1000 ImageNet classes plus a "background" class
labels_path = tf.keras.utils.get_file('ImageNetLabels.txt','https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt')
with open(labels_path) as f:
    imagenet_labels = np.array(f.read().splitlines())


Let's try some examples. The network requires images of size (224,224) as input. You can resize an image using `tf.image.resize`.

In [None]:
im_input = io.imread('https://robotics.upo.es/~lmercab/rva/coach.jpg')

im_input = im_input/255.0

plt.figure(figsize=(10,10))
plt.imshow(im_input)

Using the method `predict` we obtain a vector with one score per class (for this TF Hub model the scores are logits, not normalized probabilities). With `np.argmax`, we get the class with the highest score.
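
Assuming the scores returned by `predict` are unnormalized logits (check the model page to confirm this for the model you use), a softmax turns them into probabilities without changing which class wins. The following is a minimal NumPy sketch with made-up scores; the function name `softmax` and the example values are illustrative.

```python
import numpy as np

def softmax(logits):
    """Convert a 1-D vector of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow in exp without changing the result
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / np.sum(exp)

# Illustrative scores for three classes (not real model output)
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs, probs.argmax())
```

Since softmax is monotonic, `np.argmax` gives the same class whether it is applied to the raw scores or to the probabilities.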

In [None]:
print('Original size:', im_input.shape)
res = tf.image.resize(im_input, IMAGE_SHAPE)

print('New size:', res.shape)

#The network returns a 1001-size vector of scores (logits) for the different classes
result = classifier.predict(res[np.newaxis, ...])
print('Output shape',result.shape)

predicted_class = np.argmax(result[0], axis=-1)
print('Predicted class',predicted_class)


Recall that `imagenet_labels` contains the textual labels for the different classes.

In [None]:
plt.figure()  
plt.imshow(res)
plt.axis('off')
predicted_class_name = imagenet_labels[predicted_class]
_ = plt.title("Prediction: " + predicted_class_name.title())

# Transfer Learning

One very interesting application of pre-trained models is transfer learning, that is, reusing a model trained on a given dataset for a related task.

The underlying idea is that, if the features the network has learned to extract are general enough, they should also work in the new related task.

In this case, we can train just the last "classification layers" and keep the weights of the convolutional layers intact.

We are going to use this to address the CIFAR-10 classification problem using the MobileNetV2 model trained on the much larger ImageNet dataset.

First, let's download the CIFAR-10 dataset as we did in the last session.

In [None]:
from tensorflow.keras.utils import to_categorical

#Load train dataset for CIFAR10
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

plt.imshow(x_test[100], cmap=plt.cm.binary)

#Normalize data
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# Convert class vectors to binary class matrices.
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)


For transfer learning, instead of getting the whole MobileNetV2 model, we obtain all its layers except the classification part.

TensorFlow Hub makes this available at: https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4

Thus, we create a sequential model, as we have done before. We use the "feature network" from MobileNetV2 as the first layer (remember, this internally consists of many layers). But now we add a dense layer with softmax activation on top of the feature output to obtain the 10 classes of CIFAR-10.

In [None]:
#Create the Keras model
#The feature network provides 1280-dimensional "flattened" vectors.
#We add after that a fully connected layer to obtain the output
transfer_model = tf.keras.Sequential([
    hub.KerasLayer("https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4", output_shape=[1280],
                   trainable=False),  # Can be True.
    tf.keras.layers.Dense(10, activation='softmax')
])

transfer_model.build([None, 224, 224, 3])  # Batch input shape.

transfer_model.summary()

Now, we can train the former network with the CIFAR-10 training set. 

Pay attention to the parameter `trainable=False` for the MobileNetV2 feature network. This means that, when training, only the weights of the last classification layer are adjusted; the rest stay fixed. So we have a very big network, but thanks to transfer learning we reuse most of its weight values.
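
To get a feeling for how few weights are actually trained, the following sketch counts the parameters of the final dense layer, assuming it maps the 1280 features to the 10 classes as in the model above. The helper name `dense_param_count` is hypothetical.

```python
def dense_param_count(n_inputs, n_outputs):
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_outputs + n_outputs

# 1280 features in, 10 CIFAR-10 classes out
print(dense_param_count(1280, 10))  # 12810
```

This number should match the trainable parameter count reported by the model summary when the feature network is frozen.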

Alternatively, you can set `trainable` to `True`. In this case, the weights of the feature network are fine-tuned as well.

Before training, we should resize the images to the expected input size of the MobileNetV2 network. We will only use the first 10000 images of the CIFAR-10 training set due to memory constraints of the Colab machine.
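
A rough estimate shows why memory becomes a constraint. The sketch below assumes the resized images are stored as a dense float32 tensor (4 bytes per value); the helper name `resized_batch_bytes` is hypothetical, and the figures ignore any additional memory used by the framework.

```python
def resized_batch_bytes(n_images, height=224, width=224, channels=3, bytes_per_value=4):
    """Bytes needed to hold n_images as a dense float32 tensor."""
    return n_images * height * width * channels * bytes_per_value

GIB = 1024 ** 3
# 10000 images (the subset used here) versus the full 50000-image training set
for n in (10000, 50000):
    print(n, "images:", round(resized_batch_bytes(n) / GIB, 1), "GiB")
```

Under these assumptions, the 10000-image subset alone needs about 5.6 GiB, and the full training set about five times as much.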

In [None]:
x_resized = tf.image.resize(x_train[0:10000,:,:,:], IMAGE_SHAPE)

print(x_resized.shape)
print(y_train.shape)

Let's train the new model

In [None]:
transfer_model.compile(loss="categorical_crossentropy",
              optimizer="sgd",
              metrics = ['accuracy'])

_ = transfer_model.fit(x_resized, y_train[0:10000,:], batch_size=32, validation_split=0.1, epochs=10, verbose = 1)

We can now check the accuracy of the new model on the test set. Again, we use just 1000 of the 10000 test images due to RAM restrictions; handling memory better in our code would remove this limitation.
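
One possible way to handle memory better is to process the test set in chunks, so that only one chunk of resized images exists at a time. The sketch below illustrates the idea with NumPy only; `accuracy_in_chunks` and `predict_fn` are hypothetical names, and the toy "model" stands in for the real network.

```python
import numpy as np

def accuracy_in_chunks(predict_fn, x, y_onehot, chunk_size=100):
    """Accuracy computed chunk by chunk, so only one chunk is preprocessed at a time.

    predict_fn is a hypothetical callable that takes a chunk of images
    (doing any resizing it needs) and returns per-class scores.
    """
    correct = 0
    for start in range(0, len(x), chunk_size):
        scores = predict_fn(x[start:start + chunk_size])
        predicted = np.argmax(scores, axis=-1)
        expected = np.argmax(y_onehot[start:start + chunk_size], axis=-1)
        correct += int(np.sum(predicted == expected))
    return correct / len(x)

# Toy check with a fake "model" that always predicts class 0
x_fake = np.zeros((250, 4, 4, 3), dtype=np.float32)
y_fake = np.eye(10)[np.arange(250) % 10]  # one-hot labels 0..9 repeating
always_zero = lambda batch: np.tile(np.eye(10)[0], (len(batch), 1))
print(accuracy_in_chunks(always_zero, x_fake, y_fake))
```

In the notebook, `predict_fn` could be something like `lambda chunk: transfer_model.predict(tf.image.resize(chunk, IMAGE_SHAPE))`, although this has not been tested here.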

In [None]:
x_resized = tf.image.resize(x_test[2000:3000,:,:,:], IMAGE_SHAPE)

_ , test_acc = transfer_model.evaluate(x_resized, y_test[2000:3000,:])

print('Accuracy:', test_acc*100)

Let's use it to see the prediction on the same image as above. We select the same patch. Compare the values of the probabilities.

In [None]:
#32x32 image patch, normalized
im_patch = imrgb[140:140+32,75:75+32]/255.0

#Resize the patch to the expected input size
im_patch = tf.image.resize(im_patch, IMAGE_SHAPE)

plt.imshow(im_patch, cmap=plt.cm.binary)

#The network expects batches of images as input. We have to expand the
#dimensions of the image
prediction = transfer_model.predict(im_patch[np.newaxis,...])

print('Network output:', prediction)
print('Model prediction:', cifar10_labels[np.argmax(prediction[0])] )