# Computer Vision and Transfer Learning

## CNN Explainer

CNN Explainer is an interactive visualization tool that demonstrates how forward propagation works in a convolutional neural network (CNN).

[https://poloclub.github.io/cnn-explainer/](https://poloclub.github.io/cnn-explainer/)

## Trainable Parameters

- Input layer: The input layer has nothing to learn. It only supplies the shape of the input image, so it has no learnable parameters. Number of parameters = 0.
- CONV layer: This is where a CNN learns, so it has weight matrices. Each filter has width m, height n and depth d, where d is the number of filters (channels) in the previous layer, and the current layer has k such filters. Each filter also has one bias term. The number of parameters in a CONV layer is therefore ((m * n * d) + 1) * k, where the 1 accounts for the bias of each filter. In words: ((filter width * filter height * number of filters in the previous layer) + 1) * number of filters in the current layer.
- POOL layer: A pooling layer has no learnable parameters, because it only applies a fixed operation (such as taking the maximum or the average) to each window; there are no weights for backpropagation to update. Number of parameters = 0.
- Fully Connected (FC) layer: This layer has learnable parameters, and it usually has the most parameters of any layer type, because every neuron is connected to every neuron in the previous layer. The count is the product of the number of neurons in the current layer c and the number of neurons in the previous layer p, plus one bias per neuron in the current layer: (c * p) + c = (p + 1) * c.
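The two formulas above can be written as small Python functions. This is a sketch; the function names are hypothetical and not part of any library. As a sanity check, they reproduce the first two VGG16 parameter counts in the Keras summaries later in these notes, and the Dense count from the example network.

```python
def conv2d_params(m, n, d, k):
    """Learnable parameters of a Conv2D layer: k filters of size m x n x d, plus one bias each."""
    return (m * n * d + 1) * k


def dense_params(p, c):
    """Learnable parameters of a fully connected layer: c neurons, each with p weights and one bias."""
    return (p + 1) * c


# Input, pooling, dropout and flatten layers contribute 0 parameters.
print(conv2d_params(3, 3, 3, 64))    # 1792  (VGG16 block1_conv1, RGB input)
print(conv2d_params(3, 3, 64, 64))   # 36928 (VGG16 block1_conv2)
print(dense_params(784, 256))        # 200960
```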

Let's take a simple network:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(filters=8, kernel_size=(5, 5), padding='same',
                 activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(filters=16, kernel_size=(3, 3), padding='same',
                 activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.summary()
```

![Model](Model.jpg)

- Input size: 28, 28, 1 (grayscale image, hence one channel; the resolution is 28 x 28 pixels)
- Conv layer 1 - 8 kernels, kernel size 5,5, stride 1
    - ((filter width * filter height * number of filters in the previous layer) + 1) * number of filters
    - (5 * 5 * 1 + 1) * 8 = 208
    - Output shape: 28, 28, 8 ('same' padding keeps the 28 x 28 size)
- Pooling layer - pool size 2,2; no trainable params
    - Output shape: 14, 14, 8
- Dropout layer - no trainable params
    - Output shape: 14, 14, 8
- Conv layer 2 - 16 kernels, kernel size 3,3, stride 1
    - (3 * 3 * 8 + 1) * 16 = 1168
    - Output shape: 14, 14, 16
- Pooling layer - pool size 2,2, strides 2,2; no trainable params
    - Output shape: 7, 7, 16
- Dropout layer - no trainable params
    - Output shape: 7, 7, 16
- Flatten layer - no trainable params
    - Output shape: 784 (7 * 7 * 16)
- Dense layer - 256 neurons
    - 784 * 256 + 256 = 200704 + 256 = 200960
    - Output shape: 256
- Dropout layer - no trainable params
    - Output shape: 256
- Dense layer - 10 neurons
    - 256 * 10 + 10 = 2570
    - Output shape: 10

Total Trainable Params = 208 + 1168 + 200960 + 2570 = 204906
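A minimal plain-Python sketch that reproduces the table above by tracking the output shape layer by layer. It assumes 'same' padding with stride 1 for both convolutions (so they keep the spatial size) and 2x2 pooling with stride 2 (so it halves the size, rounding down). The `layers` list is a hand-written description of the Keras model, not something read from Keras.

```python
# Hand-written description of the example network: (kind, settings)
layers = [
    ("conv", {"k": 8, "size": 5}),
    ("pool", {}),
    ("dropout", {}),
    ("conv", {"k": 16, "size": 3}),
    ("pool", {}),
    ("dropout", {}),
    ("flatten", {}),
    ("dense", {"units": 256}),
    ("dropout", {}),
    ("dense", {"units": 10}),
]

shape = (28, 28, 1)  # height, width, channels of the grayscale input
total = 0
for kind, cfg in layers:
    if kind == "conv":
        h, w, d = shape
        params = (cfg["size"] * cfg["size"] * d + 1) * cfg["k"]
        shape = (h, w, cfg["k"])      # 'same' padding, stride 1: spatial size unchanged
    elif kind == "pool":
        h, w, d = shape
        params = 0
        shape = (h // 2, w // 2, d)   # 2x2 window, stride 2
    elif kind == "flatten":
        params = 0
        shape = (shape[0] * shape[1] * shape[2],)
    elif kind == "dense":
        p = shape[0]
        params = p * cfg["units"] + cfg["units"]
        shape = (cfg["units"],)
    else:  # dropout only zeroes activations during training; nothing to learn
        params = 0
    total += params
    print(f"{kind:<8} {str(shape):<14} {params}")

print("Total trainable params:", total)  # 204906
```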

## CNN Backpropagation

[https://becominghuman.ai/back-propagation-in-convolutional-neural-networks-intuition-and-code-714ef1c38199](https://becominghuman.ai/back-propagation-in-convolutional-neural-networks-intuition-and-code-714ef1c38199)

[https://pavisj.medium.com/convolutions-and-backpropagations-46026a8f5d2c](https://pavisj.medium.com/convolutions-and-backpropagations-46026a8f5d2c)

### Understanding the Chain Rule

![Chain Rule](ChainRule.png)

The forward pass on the left computes z as a function f(x, y) of the input variables x and y. The right side of the figure shows the backward pass. Given dL/dz, the gradient of the loss L with respect to z received from the layer above, the gradients of the loss with respect to x and y follow from the chain rule: dL/dx = dL/dz * dz/dx and dL/dy = dL/dz * dz/dy.
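For a concrete case, take f(x, y) = x * y and assume the loss is L = z ** 2 (both arbitrary choices for illustration), so that dL/dz = 2z. The local gradients are dz/dx = y and dz/dy = x, and the chain rule multiplies each by the upstream gradient. A finite-difference check confirms the result.

```python
def forward(x, y):
    z = x * y          # f(x, y)
    loss = z ** 2      # assumed loss, so dL/dz = 2z
    return z, loss


x, y = 3.0, -4.0
z, loss = forward(x, y)

# Backward pass
dL_dz = 2 * z          # upstream gradient arriving from "above"
dL_dx = dL_dz * y      # chain rule: dL/dx = dL/dz * dz/dx, with dz/dx = y
dL_dy = dL_dz * x      # chain rule: dL/dy = dL/dz * dz/dy, with dz/dy = x
print(dL_dx, dL_dy)    # 96.0 -72.0

# Finite-difference check of dL/dx
eps = 1e-6
numeric = (forward(x + eps, y)[1] - forward(x - eps, y)[1]) / (2 * eps)
print(abs(numeric - dL_dx) < 1e-4)  # True
```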

Forward Pass:
![Forward Pass](CNNFPass.gif)

Backward Pass:
![Backward Pass](CNNBPass.gif)
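The same idea applies to a convolution layer. Below is a minimal NumPy sketch, assuming a single-channel input, a single filter, stride 1, no padding and no bias. Each output value is the sum of an input patch times the filter (the cross-correlation that deep-learning libraries call convolution), so in the backward pass each upstream gradient value is spread back onto that input patch (giving dL/dx) and onto the filter (giving dL/dw). The function names are hypothetical, and the loops are written for clarity rather than speed.

```python
import numpy as np


def conv2d_forward(x, w):
    """'Valid' cross-correlation of a 2-D input x with a 2-D filter w, stride 1."""
    kh, kw = w.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out


def conv2d_backward(x, w, dout):
    """Given dL/dout, return (dL/dx, dL/dw) for conv2d_forward."""
    kh, kw = w.shape
    dx = np.zeros_like(x)
    dw = np.zeros_like(w)
    for i in range(dout.shape[0]):
        for j in range(dout.shape[1]):
            # out[i, j] used this input patch and every filter weight
            dw += dout[i, j] * x[i:i + kh, j:j + kw]
            dx[i:i + kh, j:j + kw] += dout[i, j] * w
    return dx, dw


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))
w = rng.standard_normal((3, 3))
dout = rng.standard_normal((3, 3))  # stand-in upstream gradient, i.e. L = sum(out * dout)
dx, dw = conv2d_backward(x, w, dout)

# Numerical check of dL/dw with central differences
eps = 1e-6
num_dw = np.zeros_like(w)
for a in range(3):
    for b in range(3):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[a, b] += eps
        w_minus[a, b] -= eps
        num_dw[a, b] = (np.sum(conv2d_forward(x, w_plus) * dout)
                        - np.sum(conv2d_forward(x, w_minus) * dout)) / (2 * eps)
print(np.allclose(dw, num_dw, atol=1e-5))  # True
```

With all-ones inputs the pattern is easy to see: each entry of dL/dx counts how many output windows covered that input pixel (1 in the corners, 4 in the centre of a 3x3 input with a 2x2 filter).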

## Transfer Learning

Training a neural network for computer vision or natural language processing on a large dataset can take weeks. Transfer learning reuses the feature representations learned by a pre-trained model, so you don't have to train a new model from scratch.

Pre-trained models are usually trained on massive datasets that serve as standard benchmarks in computer vision; the Keras models used below, for example, load weights trained on ImageNet by default. These weights can then be reused in other computer vision tasks.

A pre-trained model can be used directly to make predictions on a new task, or integrated into the training of a new model. Building on a pre-trained model typically shortens training and can lower generalization error.

Transfer learning is particularly useful when the training dataset is small. In that case you can, for example, initialize the weights of the new model with the weights of the pre-trained model. As you will see later, transfer learning can also be applied to natural language processing problems.

### Why
Given a new application, look for opportunities to reuse knowledge (e.g. architectures and weights) from similar learning problems that were trained on large amounts of data.

Transfer Learning!

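Humans are great at transfer learning (e.g. riding a bicycle helps when learning to ride a motorbike, tennis skills carry over to badminton, and knowing one language helps with learning another).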

### How it works
![Transfer Learning](TransferLearning.png)

## Sample Program

```python
# example of loading the vgg16 model
from tensorflow.keras.applications.vgg16 import VGG16
# load model
vgg_model = VGG16()
# summarize the model
vgg_model.summary()
```

```
Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
 input_13 (InputLayer)       [(None, 224, 224, 3)]     0

 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792

 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928

 block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0

 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856

 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584

 block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0
```

```python
# example of loading the inception v3 model
from tensorflow.keras.applications.inception_v3 import InceptionV3
# load model
inception_model = InceptionV3()
# summarize the model
inception_model.summary()
```

```
Model: "inception_v3"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to
 input_14 (InputLayer)          [(None, 299, 299, 3  0           []
                                )]

 conv2d_188 (Conv2D)            (None, 149, 149, 32  864         ['input_14[0][0]']
                                )

 batch_normalization_188 (Batch  (None, 149, 149, 32  96         ['conv2d_188[0][0]']
 Normalization)                 )
```

```python
# example of loading the resnet50 model
from tensorflow.keras.applications.resnet50 import ResNet50
# load model
resnet_model = ResNet50()
# summarize the model
resnet_model.summary()
```

```
Model: "resnet50"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to
 input_15 (InputLayer)          [(None, 224, 224, 3  0           []
                                )]

 conv1_pad (ZeroPadding2D)      (None, 230, 230, 3)  0           ['input_15[0][0]']

 conv1_conv (Conv2D)            (None, 112, 112, 64  9472        ['conv1_pad[0][0]']
                                )
```
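The first rows of the InceptionV3 and ResNet50 summaries can be checked by hand. Two details go beyond the formulas in the Trainable Parameters section, and both are assumptions about how the Keras implementations are built: InceptionV3's convolutions are created without a bias term (`use_bias=False`) and are followed by `BatchNormalization(scale=False)`, whose 96 parameters are one beta, one moving mean and one moving variance per channel (the last two are updated as running statistics, not by backprop); ResNet50's `conv1_conv` keeps its bias. The spatial sizes follow the usual output-size formula floor((in + 2 * pad - kernel) / stride) + 1. The helper names below are hypothetical.

```python
def conv_output_size(size, kernel, stride, pad=0):
    """Spatial output size of a convolution (no dilation)."""
    return (size + 2 * pad - kernel) // stride + 1


def conv2d_params(m, n, d, k, use_bias=True):
    """Learnable weights of a Conv2D layer with k filters of shape m x n x d."""
    return (m * n * d + (1 if use_bias else 0)) * k


# InceptionV3 conv2d_188: 32 filters, 3x3, stride 2, 'valid' padding, no bias
print(conv_output_size(299, 3, 2))                 # 149
print(conv2d_params(3, 3, 3, 32, use_bias=False))  # 864
print(3 * 32)  # batch_normalization_188: beta, moving mean, moving variance -> 96

# ResNet50 conv1_conv: input padded by 3 on each side (conv1_pad: 224 -> 230),
# 64 filters, 7x7, stride 2, with bias
print(conv_output_size(224, 7, 2, pad=3))          # 112
print(conv2d_params(7, 7, 3, 64))                  # 9472
```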
                                                                                           

```python
# example of using a pre-trained model as a classifier
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.applications.vgg16 import decode_predictions
from tensorflow.keras.applications.vgg16 import VGG16
# load an image from file
image = load_img('dog.jpg', target_size=(224, 224))
# convert the image pixels to a numpy array
image = img_to_array(image)
# reshape data for the model: add a batch dimension -> (1, 224, 224, 3)
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
# prepare the image for the VGG model
image = preprocess_input(image)
# load the model
model = VGG16()
# predict the probability across all output classes
yhat = model.predict(image)
# convert the probabilities to class labels
label = decode_predictions(yhat)
# retrieve the most likely result, i.e. the class with the highest probability
label = label[0][0]
# print the classification
print('%s (%.2f%%)' % (label[1], label[2] * 100))
```

```
Doberman (35.42%)
```


```python
# example of using the vgg16 model as a feature extraction model
from pickle import dump
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Model
# load an image from file
image = load_img('dog.jpg', target_size=(224, 224))
# convert the image pixels to a numpy array
image = img_to_array(image)
# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
# prepare the image for the VGG model
image = preprocess_input(image)
# load model
model = VGG16()
# remove the output layer: end the model at fc2, the second-to-last layer
model = Model(inputs=model.inputs, outputs=model.layers[-2].output)
# get extracted features
features = model.predict(image)
print(features.shape)
# save to file (the with-block closes the file afterwards)
with open('dog.pkl', 'wb') as f:
    dump(features, f)
```

```
(1, 4096)
```


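```python
# 'model' is now the truncated model that ends at fc2,
# so its second-to-last layer is fc1
model.layers[-2].output
```

```
<KerasTensor: shape=(None, 4096) dtype=float32 (created by layer 'fc1')>
```

```python
features[0][100:]
```

```
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)
```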
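The extracted features are the 4096 outputs of fc2, one of VGG16's three fully connected layers (fc1, fc2 and the 1000-class predictions layer). Applying the FC formula (p + 1) * c to these layers, with fc1 reading the flattened 7 x 7 x 512 output of block5_pool for a 224 x 224 input, shows that about 124 million of VGG16's roughly 138 million parameters sit in the fully connected head, which illustrates why FC layers usually dominate the parameter count. The `dense_params` helper is hypothetical.

```python
def dense_params(p, c):
    """(p + 1) * c: each of the c neurons has p weights and one bias."""
    return (p + 1) * c


flat = 7 * 7 * 512                       # flattened block5_pool output for a 224x224 input
fc1 = dense_params(flat, 4096)
fc2 = dense_params(4096, 4096)
predictions = dense_params(4096, 1000)   # 1000 ImageNet classes

print(fc1)                               # 102764544
print(fc1 + fc2 + predictions)           # 123642856
```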

```python
# example of extending the vgg16 model
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
# load model without classifier layers
model = VGG16(include_top=False, input_shape=(300, 300, 3))
# add new classifier layers on top of the last layer (block5_pool)
flat1 = Flatten()(model.layers[-1].output)
class1 = Dense(1024, activation='relu')(flat1)
output = Dense(10, activation='softmax')(class1)
# define new model
model = Model(inputs=model.inputs, outputs=output)
# summarize
model.summary()
# ...
```

```
Model: "model_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
 input_18 (InputLayer)       [(None, 300, 300, 3)]     0

 block1_conv1 (Conv2D)       (None, 300, 300, 64)      1792

 block1_conv2 (Conv2D)       (None, 300, 300, 64)      36928

 block1_pool (MaxPooling2D)  (None, 150, 150, 64)      0

 block2_conv1 (Conv2D)       (None, 150, 150, 128)     73856

 block2_conv2 (Conv2D)       (None, 150, 150, 128)     147584

 block2_pool (MaxPooling2D)  (None, 75, 75, 128)       0
```
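Because Flatten turns the whole final feature map into one long vector, the new Dense(1024) layer ends up with far more parameters than the convolutional base. A quick calculation with the FC formula from the Trainable Parameters section, assuming VGG16's five 2x2, stride-2 max-pooling layers with 'valid' padding (each halves the spatial size, rounding down) on the 300 x 300 input and 512 channels in block5:

```python
# Spatial size after each of VGG16's five 2x2, stride-2 max-pooling layers
size = 300
for _ in range(5):
    size //= 2                 # 300 -> 150 -> 75 -> 37 -> 18 -> 9
flat = size * size * 512       # block5 has 512 channels

dense1 = flat * 1024 + 1024    # Dense(1024) on the flattened features
dense2 = 1024 * 10 + 10        # Dense(10) softmax head
print(size, flat)              # 9 41472
print(dense1, dense2)          # 42468352 10250
```

That first Dense layer alone is roughly three times the size of the VGG16 convolutional base (about 14.7 million parameters). Replacing Flatten with a `GlobalAveragePooling2D` layer is a common alternative; it would reduce the input of that layer to 512 values, i.e. 512 * 1024 + 1024 = 525,312 parameters.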

```python
# load model without classifier layers
model = VGG16(include_top=False, input_shape=(300, 300, 3))
# mark some layers as not trainable
model.get_layer('block1_conv1').trainable = False
model.get_layer('block1_conv2').trainable = False
model.get_layer('block2_conv1').trainable = False
model.get_layer('block2_conv2').trainable = False
```

```python
model.summary()
```

```
Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
 input_19 (InputLayer)       [(None, 300, 300, 3)]     0

 block1_conv1 (Conv2D)       (None, 300, 300, 64)      1792

 block1_conv2 (Conv2D)       (None, 300, 300, 64)      36928

 block1_pool (MaxPooling2D)  (None, 150, 150, 64)      0

 block2_conv1 (Conv2D)       (None, 150, 150, 128)     73856

 block2_conv2 (Conv2D)       (None, 150, 150, 128)     147584

 block2_pool (MaxPooling2D)  (None, 75, 75, 128)       0
```
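Freezing changes which weights the optimizer updates, not the architecture, so the per-layer rows above look the same as before; the difference shows up in the "Trainable params" and "Non-trainable params" totals that Keras prints at the end of the summary (cut off in the output above). A sketch of that arithmetic in plain Python, assuming VGG16's standard layout of 3x3 convolutions (64, 128, 256, 512 and 512 filters in blocks 1 to 5, with 2, 2, 3, 3 and 3 convolutions per block); the helper name is hypothetical:

```python
def conv2d_params(m, n, d, k):
    """((m * n * d) + 1) * k, as in the Trainable Parameters section."""
    return (m * n * d + 1) * k


# VGG16 convolutional base: (filters, number of 3x3 conv layers) per block
blocks = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]

layer_params = []
channels = 3  # RGB input
for filters, repeats in blocks:
    for _ in range(repeats):
        layer_params.append(conv2d_params(3, 3, channels, filters))
        channels = filters

total = sum(layer_params)
frozen = sum(layer_params[:4])  # block1_conv1, block1_conv2, block2_conv1, block2_conv2
print(total)           # 14714688
print(frozen)          # 260160
print(total - frozen)  # 14454528
```

The four frozen layers hold under 2% of the base's weights, so this choice mainly keeps the earliest filters, which typically capture generic edge and colour patterns, fixed during training.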