# Image Recognition Models
- VGG (University of Oxford) - very standard design
- ResNet-50 (Microsoft Research) - more complex design
- Inception v3 (Google) - even more complex design
- MobileNet (Google) - low resource usage for mobile devices
- NASNet (Google) - designed by algorithms

## Two Uses
- Use the **trained model directly to do image recognition**
- **Transfer Learning**: Adapt an existing model to recognize new types of objects instead of starting from scratch

--------

# 1) Using Pre-trained network for image recognition

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import numpy as np
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications import vgg16

In [19]:
file_path = '/content/drive/MyDrive/Deeplearning_image_recognition/'

In [None]:
# Load VGG 16 model that was pre-trained against ImageNet database
model = vgg16.VGG16()

# Load the image file, resizing it to 224x224 pixels (required by this VGG model)
img = image.load_img(file_path + 'bay.jpg', target_size=(224, 224))

# Convert the image to a numpy array
x = image.img_to_array(img)

# Add a fourth dimension to the image (since Keras expects a batch of images, not a single image)
x = np.expand_dims(x, axis=0)

# Preprocess the pixel values the same way they were preprocessed when the network was trained
# (the vgg16 module ships a built-in function for this, so we can use it)
x = vgg16.preprocess_input(x)

# Run the image through the deep neural network to make a prediction
# The result holds one likelihood for each of the 1000 classes the model was trained to recognize
predictions = model.predict(x)

# Look up the names of the predicted classes. Index zero is the result for the first image.
# The `top` parameter sets how many of the highest-scoring predictions are returned
predicted_classes = vgg16.decode_predictions(predictions, top=9)

print("Top predictions for this image:")

for imagenet_id, name, likelihood in predicted_classes[0]:
    print("Prediction: {} - {:.6f}".format(name, likelihood))

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json
Top predictions for this image:
Prediction: seashore - 0.395212
Prediction: promontory - 0.326130
Prediction: lakeside - 0.119613
Prediction: breakwater - 0.062801
Prediction: sandbar - 0.045267
Prediction: cliff - 0.011845
Prediction: dock - 0.009196
Prediction: boathouse - 0.003278
Prediction: valley - 0.003194
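
To make the `top` parameter less of a black box, here is a minimal sketch of the selection step: sort the likelihoods and keep the highest few. The helper `top_k`, the five class names and the numbers are made up for illustration; the real `decode_predictions` works on the 1000 ImageNet classes and also returns the ImageNet id.

```python
import numpy as np

def top_k(probabilities, class_names, k=3):
    """Return the k (name, probability) pairs with the highest probability."""
    # argsort sorts ascending, so take the last k indices and reverse them
    best = np.argsort(probabilities)[-k:][::-1]
    return [(class_names[i], float(probabilities[i])) for i in best]

# Hypothetical 5-class output, standing in for one row of `predictions`
names = ["seashore", "promontory", "lakeside", "dock", "valley"]
probs = np.array([0.40, 0.33, 0.12, 0.01, 0.14])

for name, p in top_k(probs, names, k=3):
    print("Prediction: {} - {:.2f}".format(name, p))
```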


-------

# 2) Transfer Learning as an alternative to training a new neural network
- Use a model trained on one set of data as the starting point for modeling a new set of data.

To understand how transfer learning works, let's take a look at how a convolutional neural network processes an image layer by layer. A typical convolutional neural network is structured as shown below. The network is made up of a series of convolutional layers, and the training process teaches each of those layers to activate when it sees certain patterns in the input image. Those layers learn to tell images apart by looking for those unique patterns.

![1.png](img/1.png)

For convolution layer 1, we can see that it is looking for very basic patterns, like splotches of color and lines in an image.

![2.png](img/2.png)

For convolution layer 2, the patterns are starting to get a little more complex.

![3.png](img/3.png)

![4.png](img/4.png)

![5.png](img/5.png)

![6.png](img/6.png)

---------

**The basic idea is that neural networks learn to detect simple patterns in the first layer, and then the next layer uses that information to look for slightly more complex patterns, and so on, through all the convolutional layers. The final layer of the neural network is a densely connected layer that uses the information from the convolutional layers to decide which object is in the image.**
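
As a rough illustration of what "a layer activates when it sees a pattern" means, the sketch below applies a single hand-written 3x3 filter to a tiny grayscale image. Everything here is made up for the example: a real network learns its filter values during training and has many filters per layer. The helper `cross_correlate_valid` is a plain, slow version of the sliding-window operation a convolutional layer performs.

```python
import numpy as np

def cross_correlate_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Toy 5x6 images: one with a dark-to-bright vertical edge, one completely flat
edge_image = np.array([[0, 0, 0, 1, 1, 1]] * 5, dtype=float)
flat_image = np.ones((5, 6))

# A filter that responds to "dark on the left, bright on the right"
vertical_edge_filter = np.array([[-1, 0, 1],
                                 [-1, 0, 1],
                                 [-1, 0, 1]], dtype=float)

def activation(image):
    # ReLU keeps positive responses only, like a typical convolutional layer
    return np.maximum(cross_correlate_valid(image, vertical_edge_filter), 0)

print(activation(edge_image))  # non-zero only where the window covers the edge
print(activation(flat_image))  # all zeros: no edge, no activation
```

Later layers apply the same idea to the response maps of earlier layers, which is how simple patterns get combined into more complex ones.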


With transfer learning, we start with a neural network that has already been trained to recognize objects from a large dataset like ImageNet.
- To reuse this neural network with new data, we can simply slice off the last layer. We keep all the layers that detect patterns, but remove the part that maps those patterns to specific objects. We call this pre-trained neural network a feature extractor because we use it to extract training features from images.
- Next, we create a new neural network to replace the last layer of the original network. This is the only part that we have to train ourselves.
- When we build our new image recognition system, we pass our new training images through the feature extractor and save the results for each training image to a file.
- Then, we use those extracted features to train the new neural network. Since the feature extractor already recognizes shapes and patterns, our new neural network only has to learn which patterns map to which objects. Because this new neural network isn't doing much work, it can learn with a small amount of training data.

![7.png](img/7.png)

![8.png](img/8.png)

![9.png](img/9.png)

![10.png](img/10.png)

![11.png](img/11.png)

When we want to test a new image, we first have to pass it through the same feature extractor. Then we can use those extracted features as input to our newly trained neural network, which gives us the final prediction.

![12.png](img/12.png)
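
The whole workflow can be sketched on a toy problem without any deep learning library. This is only an illustration under made-up assumptions: the "images" are random 4x4 arrays (bright vs. dark), the frozen feature extractor is a fixed averaging matrix standing in for the pre-trained convolutional layers, and the new "last layer" is a small logistic regression trained with plain gradient descent. The point is the structure: the extractor's weights are never updated, only the small head is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (made up): class 0 = dark images, class 1 = bright images
n_per_class = 20
dark = rng.uniform(0.0, 0.4, size=(n_per_class, 4, 4))
bright = rng.uniform(0.6, 1.0, size=(n_per_class, 4, 4))
images = np.concatenate([dark, bright])
labels = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

# Frozen feature extractor: fixed weights that are never trained here.
# It averages the top half and the bottom half of each image (2 features).
W_frozen = np.zeros((16, 2))
W_frozen[:8, 0] = 1 / 8
W_frozen[8:, 1] = 1 / 8

def extract_features(batch):
    return batch.reshape(len(batch), -1) @ W_frozen

# Step 1: pass every training image through the extractor once
features = extract_features(images)

# Step 2: train only the new head (logistic regression) on those features
w = np.zeros(2)
b = 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(features @ w + b)))  # sigmoid output
    grad = p - labels                           # d(binary cross-entropy)/d(logit)
    w -= features.T @ grad / len(labels)        # learning rate of 1.0
    b -= grad.mean()

# Step 3: to predict, run the extractor first, then the trained head
def predict(batch):
    f = extract_features(batch)
    return 1 / (1 + np.exp(-(f @ w + b)))

new_bright = rng.uniform(0.6, 1.0, size=(1, 4, 4))
train_accuracy = np.mean((predict(images) > 0.5) == labels)
print("Training accuracy:", train_accuracy)
print("P(bright) for a new bright image: {:.2f}".format(predict(new_bright)[0]))
```

The sections below follow the same three steps, with VGG16 as the feature extractor and a small Keras model as the head.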


## 2.1) When to use Transfer Learning
- Always try it first, because it's quick!
- Very useful when you don't have a lot of training data but already have a model that solves a similar problem.

Training a neural network from scratch is sort of like teaching a baby to read. The baby has to learn about letters and words and sentences before it can read and understand anything. Transfer learning is more like asking an adult that already knows how to read to learn something new. Since the adult already knows how to read, they need less material to learn a new topic. They don't need alphabet flashcards and spelling tests. The same basic idea applies to neural networks. If we only have a few hundred training images for our image recognition system, we don't have enough data to teach our model from scratch, so it makes sense to start with a model trained for something else and adapt it to our problem.

-----

# 3) Extracting features with a pre-trained neural network
- We will pass the dog and not-dog training images through a pre-trained model.
- Then we extract the features and save them to a file.

## VGG16 pre-trained model creation
- `weights`: which pre-trained weights to load, such as `imagenet` for weights trained on the ImageNet dataset.
- `include_top`: set to `False` when using the pre-trained model for feature extraction. `False` means we chop off the densely connected classifier at the end of the neural network (in Keras terminology, `top` means the end of the network). So by saying `include_top=False`, we tell Keras to give us the neural network without that final part attached.
- `input_shape`: the shape of the images we want to feed in. If we want to use a larger image size, we can bump it up here.


In [2]:
from pathlib import Path
import numpy as np
import joblib

from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications import vgg16

In [3]:
# Path to folder with training data
folder_path = '/content/drive/MyDrive/Deeplearning_image_recognition/training_data/'

dog_path = Path(folder_path) / 'dogs'
not_dog_path = Path(folder_path) / 'not_dogs'

In [4]:
images = []
labels = []

# Load all the not dogs images
for i in not_dog_path.glob('*.png'):
  # load the image from disk
  img = image.load_img(i)

  # convert image into a numpy array
  img_array = image.img_to_array(img)

  # add the image to the list of images
  images.append(img_array)

  # for each 'not dog' image, the expected value should be 0
  labels.append(0)

# Load all dogs images
for i in dog_path.glob('*.png'):
  # load the image from disk
  img = image.load_img(i)

  # convert image to a numpy array
  img_array = image.img_to_array(img)

  # add the image to the list of images
  images.append(img_array)

  # for each 'dog' image, the expected value should be 1
  labels.append(1)

In [5]:
# Create a single numpy array with all the images we loaded
# Keras expects a numpy array instead of a normal Python list
x_train = np.array(images)

# Also convert the labels to a numpy array
y_train = np.array(labels)

# Preprocess the images the way VGG16 expects (RGB -> BGR, subtract the ImageNet channel means)
x_train = vgg16.preprocess_input(x_train)

# Load a pre-trained neural network to use as a feature extractor
pretrained_nn = vgg16.VGG16(weights='imagenet', include_top=False, input_shape=(64, 64, 3))

# Extract features for each image (all in one pass)
features_x = pretrained_nn.predict(x_train)

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
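
For reference, `vgg16.preprocess_input` does not rescale pixels into a fixed range. As far as the Keras documentation describes it, it converts the images from RGB to BGR and zero-centers each channel using the ImageNet means. The function below is a NumPy approximation written for this note (the name `vgg16_style_preprocess` is made up), not the Keras implementation itself.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (values used by Keras' "caffe" mode)
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68])

def vgg16_style_preprocess(rgb_batch):
    """Approximate vgg16.preprocess_input for a float batch shaped (N, H, W, 3)."""
    bgr = rgb_batch[..., ::-1]        # reorder the channels RGB -> BGR
    return bgr - IMAGENET_BGR_MEANS   # zero-center each channel, no rescaling

# A 2x2 image that is pure red in RGB
batch = np.zeros((1, 2, 2, 3))
batch[..., 0] = 255.0

out = vgg16_style_preprocess(batch)
print(out[0, 0, 0])  # blue and green end up negative, red stays large
```

This is why the same preprocessing function has to be applied to the training images and to any image we later want to classify.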


In [6]:
# Save the array of extracted features to a file
joblib.dump(features_x, 'x_train.dat')

# Save the matching array of expected values to a file
joblib.dump(y_train, 'y_train.dat')

['y_train.dat']

--------

# 4) Training a new neural network with extracted features
With transfer learning, VGG16 has already extracted the features for us. As a result:
- When loading data, we don't need to load the raw images. Instead we load the **extracted features (X) and the labels (y)**.
- When we create the neural network, we no longer need any convolutional layers. **We only need to create and train the final dense layers of the network.**
- As our problem is to predict dog or not dog (a binary classification problem), we use `binary_crossentropy` as the loss.
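
To see why `binary_crossentropy` suits a yes/no problem, here is a small pure-Python sketch of the formula, the mean of `-(y*log(p) + (1-y)*log(1-p))`. The function and the example numbers are illustrative only; Keras computes the loss internally.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy for labels in {0, 1} and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0, 1]                    # 1 = dog, 0 = not dog
confident_and_right = [0.9, 0.1, 0.8]
confident_and_wrong = [0.1, 0.9, 0.2]

print(binary_crossentropy(labels, confident_and_right))  # small loss
print(binary_crossentropy(labels, confident_and_wrong))  # much larger loss
```

The loss grows quickly when the model is confidently wrong, which pushes the sigmoid output toward 1 for dogs and toward 0 for everything else.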


In [11]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten

from pathlib import Path
import joblib

In [7]:
# Load data set
# Instead of loading raw images, we load the features extracted by VGG16 and their labels
x_train = joblib.load('x_train.dat')
y_train = joblib.load('y_train.dat')

In [8]:
x_train.shape, y_train.shape

((58, 2, 2, 512), (58,))

In [9]:
x_train.shape[1:]

(2, 2, 512)
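
The `(2, 2, 512)` shape can be worked out by hand. VGG16 has five blocks that each end in a 2x2 max-pooling layer, so the width and height are halved five times, and the last convolutional block has 512 filters. The helper below is just this arithmetic written out (the function name is made up), assuming the standard VGG16 layout.

```python
def vgg16_feature_shape(height, width):
    """Output shape of VGG16 without its top layers for a given input size."""
    n_pooling_layers = 5  # one 2x2 max-pool at the end of each of the 5 blocks
    n_channels = 512      # filters in the last convolutional block
    for _ in range(n_pooling_layers):
        height //= 2
        width //= 2
    return (height, width, n_channels)

print(vgg16_feature_shape(64, 64))    # (2, 2, 512), matches x_train.shape[1:]
print(vgg16_feature_shape(224, 224))  # (7, 7, 512)
```

So each 64x64 training image is reduced to 2 * 2 * 512 = 2048 feature values.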

In [12]:
# Create a model and add layers
model = Sequential()

model.add(Flatten(input_shape=x_train.shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(
    loss = 'binary_crossentropy',
    optimizer = 'adam',
    metrics = ['accuracy']
)
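
As a quick sanity check on how small this new network is, we can count its trainable weights by hand. A dense layer has one weight per input-output pair plus one bias per unit; `Flatten` and `Dropout` have no weights. The helper name below is made up for this note.

```python
def dense_params(n_inputs, n_units):
    """Number of trainable values in a dense layer: weights plus biases."""
    return n_inputs * n_units + n_units

flatten_out = 2 * 2 * 512                     # 2048 values per image after Flatten
hidden = dense_params(flatten_out, 256)       # Dense(256, activation='relu')
output = dense_params(256, 1)                 # Dense(1, activation='sigmoid')

print(hidden, output, hidden + output)        # 524544 257 524801
```

That is still far more weights than our 58 training images, which is one reason the `Dropout(0.5)` layer is there to limit overfitting.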

In [13]:
# Train the model
model.fit(
    x_train,
    y_train,
    epochs=10,
    shuffle=True
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7fafc806f890>

In [14]:
# Save the neural network structure
model_structure = model.to_json()
f = Path('transfer_learning_model_structure.json')
f.write_text(model_structure)

# Save neural network's trained weights
model.save_weights('transfer_learning_model_weights.h5')


-------


# 5) Making predictions with transfer learning

In [15]:
from tensorflow.keras.models import model_from_json
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications import vgg16

import numpy as np
from pathlib import Path

In [16]:
# Load json file that contains model's structure
f = Path('transfer_learning_model_structure.json')
model_structure = f.read_text()

In [17]:
# Recreate keras model from json structure
model = model_from_json(model_structure)

# Reload model's trained weights
model.load_weights('transfer_learning_model_weights.h5')

In [39]:
# Load the image file to test, resizing it to 64x64 pixels (required by this model)
# target_size is (height, width); the channel dimension is not part of it
img = image.load_img(file_path + 'dog.png', target_size=(64, 64))

# Convert image to numpy array
img = image.img_to_array(img)

# Add a fourth dimension to the image (since Keras expects a batch of images, not a single one)
images = np.expand_dims(img, axis=0)

# Apply the same VGG16 preprocessing that was used for the training images
images = vgg16.preprocess_input(images)

In [40]:
# Use the pre-trained neural network to extract features from our test image (the same way we did to train the model)
# Basically, get the features of the test image so that we can pass them to our model for the final prediction
feature_extraction_model = vgg16.VGG16(weights='imagenet', include_top=False, input_shape=(64, 64, 3))
features = feature_extraction_model.predict(images)



In [41]:
# Given the extracted features, make a final prediction using our own model
results = model.predict(features)


In [42]:
# Since we are only testing one image with one possible class, we only need to check the first result's first element
single_result = results[0][0]

# Print the result
print("Likelihood that this image contains a dog: {}%".format(int(single_result * 100)))

Likelihood that this image contains a dog: 100%
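
If we want a yes/no answer instead of a percentage, we can compare the sigmoid output with a cut-off. The sketch below assumes a threshold of 0.5, which is a common default but not something the model dictates, and the helper `to_label` is made up for this note. It also shows `{:.1%}` formatting, which rounds instead of truncating the way `int()` does.

```python
def to_label(probability, threshold=0.5):
    """Map the sigmoid output to a class name (1 = dog, 0 = not dog)."""
    return "dog" if probability >= threshold else "not dog"

# Made-up example values standing in for `single_result`
for value in (0.9973, 0.2):
    print("{} ({:.1%})".format(to_label(value), value))
```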
