# How to use pre-trained models to classify objects in photos #

Convolutional neural networks now outperform humans on some computer vision tasks, such as image classification.
<br><br>
That is, given a photo of an object, we can ask the computer which of 1000 specific object categories the photo belongs to.
<br><br>
Keras provides the following image classification models with weights pre-trained on ImageNet:
* VGG16
* VGG19
* ResNet50
* InceptionV3
* InceptionResNetV2
* Xception
* MobileNet
<br><br>
![imagenet](https://imgur.com/am6MnJe.png)

In [1]:
import platform
import tensorflow
import keras
print("Platform: {}".format(platform.platform()))
print("Tensorflow version: {}".format(tensorflow.__version__))
print("Keras version: {}".format(keras.__version__))

%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
from IPython.display import Image

Using TensorFlow backend.


Platform: Windows-10-10.0.16299-SP0
Tensorflow version: 1.4.0
Keras version: 2.1.2


# Develop a simple photo classifier #
## VGG16 ##
### 1. Get the sample image ###

First, we need an image to classify.

This tutorial uses the following image:
![image](https://imgur.com/mLTMI9Q.jpg)

### 2. Load the VGG model ###
Load the pre-trained VGG16 model and its weights through Keras.

In [2]:
from keras.applications.vgg16 import VGG16

model_vgg16 = VGG16()

### 3. Load and prepare the image ###

Next, we load the image and convert it to the tensor format required by the pre-trained network.

Keras provides some tools to help with this step.

First, we load the image with the load_img() function and resize it to 224x224 pixels.

In [3]:
from keras.preprocessing.image import load_img

# load image
img_file = 'cat.jpg'
image = load_img(img_file, target_size=(224, 224)) # Because the model input for VGG16 is 224x224

Next, we convert the pixels to a NumPy array with the img_to_array() function so that we can work with them in Keras.

In [4]:
from keras.preprocessing.image import img_to_array

image = img_to_array(image) # RGB

print("image.shape:", image.shape)
print(type(image))

image.shape: (224, 224, 3)
<class 'numpy.ndarray'>


<br><br>
The VGG16 network expects a batch of one or more RGB images as input, so the input array must be four-dimensional:

(batch size, image height, image width, color channels) -> (batch_size, img_height, img_width, img_channels)

We have only one sample (one image), so we call reshape() to add the extra batch dimension.

In [5]:
# Add a batch dimension to the tensor
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))

print("image.shape:", image.shape)

image.shape: (1, 224, 224, 3)
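
The reshape step above can also be written with `np.expand_dims`, and several same-sized images can be stacked into one batch. The following is a minimal sketch; the zero-filled array is only a stand-in for the output of img_to_array, not real image data.

```python
import numpy as np

# Stand-in for the array returned by img_to_array: height x width x channels.
image = np.zeros((224, 224, 3), dtype="float32")

# Option 1: explicit reshape, as in the cell above.
batch_a = image.reshape((1,) + image.shape)

# Option 2: np.expand_dims inserts a new axis of length 1 at position 0.
batch_b = np.expand_dims(image, axis=0)

# Several images of the same size can be stacked into one batch.
batch_of_three = np.stack([image, image, image], axis=0)

print(batch_a.shape)         # (1, 224, 224, 3)
print(batch_b.shape)         # (1, 224, 224, 3)
print(batch_of_three.shape)  # (3, 224, 224, 3)
```

Both options give the same result; `np.expand_dims` simply avoids spelling out each dimension by hand.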


<br><br>
Next, we need to preprocess the image in the same way the ImageNet training data was prepared for VGG. From the paper:

> The only preprocessing we do is subtracting the mean RGB value, computed on the training set, from each pixel.

Keras provides a function called preprocess_input() to prepare a new image as input for the VGG network.

In [6]:
from keras.applications.vgg16 import preprocess_input

# Prepare the image for the VGG model
image = preprocess_input(image)
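
For intuition, the mean subtraction described in the quote can be sketched in plain NumPy. This is an illustration under assumptions, not the library source: the per-channel mean values and the RGB-to-BGR channel flip reflect what Keras's VGG16 preprocessing is understood to do by default, and `vgg_style_preprocess` is a hypothetical name. In practice, rely on preprocess_input.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed values, see note above).
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype="float32")

def vgg_style_preprocess(batch_rgb):
    """Sketch of VGG-style preprocessing for a (N, H, W, 3) RGB batch in [0, 255]."""
    batch = batch_rgb.astype("float32")   # astype returns a copy
    batch_bgr = batch[..., ::-1]          # reverse the channel axis: RGB -> BGR
    return batch_bgr - IMAGENET_MEAN_BGR  # broadcast over the last axis

# A dummy batch in which every pixel is pure red (R=255, G=0, B=0).
dummy = np.zeros((1, 224, 224, 3), dtype="float32")
dummy[..., 0] = 255.0

out = vgg_style_preprocess(dummy)
print(out.shape)     # (1, 224, 224, 3)
print(out[0, 0, 0])  # [B - mean_B, G - mean_G, R - mean_R]
```

The shape is unchanged; only the channel order and the value range differ from the raw pixels.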

### 4. Make a prediction ###

We call the model's predict() function to get the probability that the image belongs to each of the 1000 known object types.

In [7]:
# Probability of all output categories

y_pred = model_vgg16.predict(image)
y_pred.shape

(1, 1000)
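
The (1, 1000) output holds one row of class probabilities per input image. The sketch below uses random scores in place of a real network and a hand-written softmax to show the properties of such a row; `fake_scores` and `fake_pred` are illustrative names, not model output.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over the last axis."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.RandomState(0)       # fixed seed so the sketch is repeatable
fake_scores = rng.randn(1, 1000)     # stand-in for the network's raw scores
fake_pred = softmax(fake_scores)     # same shape as y_pred: (1, 1000)

print(fake_pred.shape)               # (1, 1000)
print(fake_pred.sum(axis=1))         # each row sums to 1 (up to rounding)
print(int(fake_pred[0].argmax()))    # index of the most probable class
```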

### 5. Interpret the prediction ###

Keras provides a function called decode_predictions() to interpret the probabilities.

It returns a list of categories with the probability of each. For simplicity, we will only report the category with the highest probability.

In [8]:
from keras.applications.vgg16 import decode_predictions

# Convert probabilities to class labels
label = decode_predictions(y_pred)

print(label)

# Retrieve the most likely result, i.e. the one with the highest probability
label = label[0][0]

print("{} ({:.2f}%)".format(label[1], label[2]*100))

[[('n02124075', 'Egyptian_cat', 0.41167232), ('n02123045', 'tabby', 0.16184787), ('n02123159', 'tiger_cat', 0.14059816), ('n04589890', 'window_screen', 0.06338048), ('n04209239', 'shower_curtain', 0.014743416)]]
Egyptian_cat (41.17%)
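
Conceptually, picking the top results means sorting the probability vector and looking up the matching names. The sketch below shows that idea on a made-up four-class example; `top_k` is a hypothetical helper and the probabilities are invented, so it illustrates the ranking step only, not how decode_predictions is implemented.

```python
import numpy as np

def top_k(probabilities, class_names, k=5):
    """Return the k (name, probability) pairs with the highest probability."""
    order = np.argsort(probabilities)[::-1][:k]   # indices, highest first
    return [(class_names[i], float(probabilities[i])) for i in order]

# Hypothetical four-class example; the real model has 1000 classes.
names = ["Egyptian_cat", "tabby", "tiger_cat", "window_screen"]
probs = np.array([0.5, 0.2, 0.25, 0.05])

best = top_k(probs, names, k=3)
print(best)  # [('Egyptian_cat', 0.5), ('tiger_cat', 0.25), ('tabby', 0.2)]
```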


## ResNet50

In [9]:
from keras.applications.resnet50 import ResNet50
from keras.preprocessing.image import load_img, img_to_array
from keras.applications.resnet50 import preprocess_input
from keras.applications.resnet50 import decode_predictions

# Load the model with ImageNet weights
model_resnet50 = ResNet50(weights='imagenet')

img_file = 'cat.jpg'
image = load_img(img_file, target_size=(224, 224)) 
image = img_to_array(image) # RGB
print("image.shape:", image.shape)

image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
print("image.shape:", image.shape)

image = preprocess_input(image)

y_pred = model_resnet50.predict(image)
label = decode_predictions(y_pred)
print(label)

label = label[0][0]

print("{} ({:.2f}%)".format(label[1], label[2]*100))

image.shape: (224, 224, 3)
image.shape: (1, 224, 224, 3)
[[('n02124075', 'Egyptian_cat', 0.52090442), ('n02123045', 'tabby', 0.13889943), ('n02342885', 'hamster', 0.10467245), ('n02123159', 'tiger_cat', 0.079205178), ('n02127052', 'lynx', 0.018718716)]]
Egyptian_cat (52.09%)


## InceptionV3

In [10]:
from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing.image import load_img, img_to_array
from keras.applications.inception_v3 import preprocess_input
from keras.applications.inception_v3 import decode_predictions

model_inception_v3 = InceptionV3(weights='imagenet')
img_file = 'cat.jpg'

# InceptionV3 expects 299x299 input
image = load_img(img_file, target_size=(299, 299)) 
image = img_to_array(image) # RGB
print("image.shape:", image.shape)

image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
print("image.shape:", image.shape)

image = preprocess_input(image)
y_pred = model_inception_v3.predict(image)
label = decode_predictions(y_pred)
print(label)

label = label[0][0]

print("{} ({:.2f}%)".format(label[1], label[2]*100))

image.shape: (299, 299, 3)
image.shape: (1, 299, 299, 3)
[[('n02123159', 'tiger_cat', 0.53256637), ('n02124075', 'Egyptian_cat', 0.25048947), ('n02123045', 'tabby', 0.12913629), ('n02127052', 'lynx', 0.011345633), ('n02971356', 'carton', 0.0025923138)]]
tiger_cat (53.26%)
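
Note that each model module has its own preprocess_input, and they do not all do the same thing. As far as can be told, the Inception-family version rescales pixel values from [0, 255] to [-1, 1] instead of subtracting channel means. The sketch below illustrates that assumed behavior; `scale_to_unit_range` is a hypothetical name, not a Keras function.

```python
import numpy as np

def scale_to_unit_range(batch):
    """Map pixel values from [0, 255] to [-1, 1] (assumed Inception-style scaling)."""
    return batch.astype("float32") / 127.5 - 1.0

pixels = np.array([0.0, 127.5, 255.0])
scaled = scale_to_unit_range(pixels)
print(scaled)  # [-1.  0.  1.]
```

This is why the preprocess_input import should always come from the same module as the model being used.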


## InceptionResNetV2

In [11]:
from keras.applications.inception_resnet_v2 import InceptionResNetV2
from keras.preprocessing.image import load_img, img_to_array
from keras.applications.inception_resnet_v2 import preprocess_input
from keras.applications.inception_resnet_v2 import decode_predictions

model_inception_resnet_v2 = InceptionResNetV2(weights='imagenet')
img_file = 'cat.jpg'

# InceptionResNetV2 expects 299x299 input
image = load_img(img_file, target_size=(299, 299)) 

image = img_to_array(image) # RGB
print("image.shape:", image.shape)

image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
print("image.shape:", image.shape)

image = preprocess_input(image)
y_pred = model_inception_resnet_v2.predict(image)
label = decode_predictions(y_pred)
print(label)

label = label[0][0]
print("{} ({:.2f}%)".format(label[1], label[2]*100))

image.shape: (299, 299, 3)
image.shape: (1, 299, 299, 3)
[[('n02123159', 'tiger_cat', 0.50081289), ('n02123045', 'tabby', 0.35746354), ('n02124075', 'Egyptian_cat', 0.061717406), ('n02127052', 'lynx', 0.00906057), ('n03657121', 'lens_cap', 0.00097865588)]]
tiger_cat (50.08%)


## MobileNet

In [12]:
from keras.applications.mobilenet import MobileNet
from keras.preprocessing.image import load_img, img_to_array
from keras.applications.mobilenet import preprocess_input
from keras.applications.mobilenet import decode_predictions

model_mobilenet = MobileNet(weights='imagenet')
img_file = 'cat.jpg'

image = load_img(img_file, target_size=(224, 224)) 
image = img_to_array(image) # RGB
print("image.shape:", image.shape)

image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
print("image.shape:", image.shape)

image = preprocess_input(image)
y_pred = model_mobilenet.predict(image)

label = decode_predictions(y_pred)
print(label)

label = label[0][0]
print("{} ({:.2f}%)".format(label[1], label[2]*100))

Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.6/mobilenet_1_0_224_tf.h5
image.shape: (224, 224, 3)
image.shape: (1, 224, 224, 3)
[[('n02123045', 'tabby', 0.34371877), ('n02124075', 'Egyptian_cat', 0.31817058), ('n02123159', 'tiger_cat', 0.26132542), ('n02127052', 'lynx', 0.015763907), ('n03657121', 'lens_cap', 0.01001161)]]
tabby (34.37%)


# Conclusion

To use these networks, you need to understand the architecture and the expected input tensor of each one. For example, VGG16, ResNet50, and MobileNet take 224x224 images here, while InceptionV3 and InceptionResNetV2 take 299x299.

Knowing the number of trainable parameters and the pre-trained weights available for each network also helps when choosing one for an image recognition task.
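
The five sections repeat the same pipeline and differ only in the model, the input size, and the preprocessing function. One way to make that explicit is a lookup table plus a small helper, sketched below. `INPUT_SIZES` and `classify` are hypothetical names, the sizes are the ones used in the cells above, and the stand-in callables exist only so the sketch runs without downloading any weights.

```python
import numpy as np

# Input sizes used in the cells above (height, width).
INPUT_SIZES = {
    "VGG16": (224, 224),
    "ResNet50": (224, 224),
    "InceptionV3": (299, 299),
    "InceptionResNetV2": (299, 299),
    "MobileNet": (224, 224),
}

def classify(image_array, predict, preprocess, decode):
    """Run the shared pipeline: add batch axis, preprocess, predict, decode.

    image_array: (H, W, 3) array already resized for the model.
    predict, preprocess, decode: callables such as model.predict,
    preprocess_input, and decode_predictions from the matching module.
    """
    batch = np.expand_dims(image_array, axis=0)
    batch = preprocess(batch)
    y_pred = predict(batch)
    top = decode(y_pred)[0][0]   # best (class_id, name, probability) triple
    return top[1], float(top[2])

# Stand-in callables (not Keras) with a made-up two-class output.
def fake_predict(batch):
    return np.array([[0.1, 0.9]])

def fake_decode(y_pred):
    labels = [("id0", "dog"), ("id1", "cat")]
    order = np.argsort(y_pred[0])[::-1]
    return [[(labels[i][0], labels[i][1], y_pred[0][i]) for i in order]]

height, width = INPUT_SIZES["VGG16"]
dummy_image = np.zeros((height, width, 3))
name, prob = classify(dummy_image, fake_predict, lambda b: b, fake_decode)
print("{} ({:.2f}%)".format(name, prob * 100))  # cat (90.00%)
```

With Keras, the same helper would be called with a real model's predict method and the preprocess_input and decode_predictions functions imported from that model's module.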

<br><br><br>
References:
* [How to Use The Pre-Trained VGG Model to Classify Objects in Photographs](https://machinelearningmastery.com/use-pre-trained-vgg-model-classify-objects-photographs/)
* [Keras Available models](https://keras.io/applications/)