
## Transfer Learning using MNIST data
To demonstrate transfer learning, a CNN is first trained to identify the digits 5, 6, 7, 8, and 9. Its feature layers are then frozen, and only the classification layers are retrained to identify the digits 0, 1, 2, 3, and 4. The result shows how well features learned on the set 5-9 transfer to the set 0-4.




In [None]:
import datetime
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

In [None]:
# Used to help timing functions
now = datetime.datetime.now

In [None]:
# Set parameters
batch_size = 128
num_classes = 5
epochs = 5

In [None]:
# More parameters
img_rows, img_cols = 28, 28
filters = 32
pool_size = 2
kernel_size = 3

In [None]:
## As input, the function takes a model, a training set, a test set, the number of classes, and the input shape
## The model object itself holds the state that says which layers are frozen and which are trained
# Reshape data
# Normalize data
# One hot encode the target label
# Compile model
# Train model on the training data
# Evaluate model on the testing data

def train_model(model, train, test, num_classes, input_shape):
  x_train = train[0].reshape((train[0].shape[0],) + input_shape)
  x_test = test[0].reshape((test[0].shape[0],) + input_shape)
  x_train = x_train.astype('float32')
  x_test = x_test.astype('float32')
  x_train /= 255
  x_test /= 255
  print('x_train shape:', x_train.shape)
  print(x_train.shape[0], 'train samples')
  print(x_test.shape[0], 'test samples')

  # Convert class vectors to binary class matrices (one-hot encoding)
  y_train = keras.utils.to_categorical(train[1], num_classes)
  y_test = keras.utils.to_categorical(test[1], num_classes)

  model.compile(loss='categorical_crossentropy',
                optimizer='adadelta',
                metrics=['accuracy'])

  t = now()
  model.fit(x_train, y_train,
            batch_size=batch_size,
            epochs=epochs,
            verbose=1,
            validation_data=(x_test,y_test))
  print('Training time; %s' % (now() - t))

  score = model.evaluate(x_test, y_test, verbose=0)
  print('Test score: ',score[0])
  print('Test accuracy ', score[1])
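The preprocessing steps inside `train_model` (reshape, normalize, one-hot encode) can be tried in isolation on a tiny array. The following is a minimal NumPy-only sketch: the helper name `preprocess` and the 2x2 toy image are hypothetical, and `np.eye(num_classes)[labels]` is used here as a stand-in for `keras.utils.to_categorical`.

```python
import numpy as np

def preprocess(images, labels, num_classes, input_shape):
    """Hypothetical helper mirroring the first steps of train_model."""
    # Add the channel axis and scale pixel values from [0, 255] to [0, 1]
    x = images.reshape((images.shape[0],) + tuple(input_shape))
    x = x.astype("float32") / 255
    # One-hot encode: row i of the identity matrix encodes class i
    y = np.eye(num_classes, dtype="float32")[labels]
    return x, y

# One toy 2x2 "image" with label 3 (illustrative data, not MNIST)
images = np.array([[[0, 255], [51, 102]]], dtype="uint8")
labels = np.array([3])

x, y = preprocess(images, labels, num_classes=5, input_shape=(2, 2, 1))
print(x.shape)  # (1, 2, 2, 1)
print(y)        # [[0. 0. 0. 1. 0.]]
```

Because the labels index rows of an identity matrix, they must lie in the range 0 to `num_classes - 1`, which is why the notebook subtracts 5 from the labels of the digits 5-9.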

In [None]:
# Load the MNIST data, already split into train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Create two datasets: one with digits below 5 and one with 5 and above
x_train_lt5 = x_train[y_train < 5]
y_train_lt5 = y_train[y_train < 5]
x_test_lt5 = x_test[y_test < 5]
y_test_lt5 = y_test[y_test < 5]

x_train_gte5 = x_train[y_train >= 5]
y_train_gte5 = y_train[y_train >= 5] - 5
x_test_gte5 = x_test[y_test >= 5]
y_test_gte5 = y_test[y_test >= 5] - 5


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
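The split above relies on NumPy boolean masking, and the labels of the upper half are shifted down by 5 so that both datasets use class indices 0 to 4. A small sketch with made-up labels (the arrays below are illustrative stand-ins, not MNIST data) shows the effect:

```python
import numpy as np

# Hypothetical labels and one stand-in "image id" per sample
labels = np.array([0, 7, 4, 5, 9, 2])
images = np.arange(6)

# Samples whose digit is below 5 keep their labels unchanged
images_lt5 = images[labels < 5]
labels_lt5 = labels[labels < 5]

# Samples whose digit is 5 or above get labels shifted into 0..4
images_gte5 = images[labels >= 5]
labels_gte5 = labels[labels >= 5] - 5

print(images_lt5, labels_lt5)    # [0 2 5] [0 4 2]
print(images_gte5, labels_gte5)  # [1 3 4] [2 0 4]
```

After the shift, class index 0 means "digit 5" in the first training run and "digit 0" in the second, so the same 5-way softmax output can be reused for both tasks.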


In [None]:
# Define "feature" layers
# Add convolution layers
# Add max pool layer
# Add dropout layer
# Add flatten layer
# The early (feature) layers may transfer to a new task; they will be frozen later


input_shape = (img_rows, img_cols, 1)
feature_layers = [
    Conv2D(filters, kernel_size,
           padding='valid',
           input_shape = input_shape),
    Activation('relu'),
    Conv2D(filters, kernel_size),
    Activation('relu'),
    MaxPooling2D(pool_size=pool_size),
    Dropout(0.25),
    Flatten(),

]

In [None]:
# Define classification layers
# Add dense layers
# These layers predict classes from the features produced by the feature layers. This part of the model will need to be retrained.

classification_layers = [
    Dense(128),
    Activation('relu'),
    Dropout(0.5),
    Dense(num_classes),
    Activation('softmax'),
]

In [None]:
# Create model by combining both layers:
model = Sequential(feature_layers + classification_layers)

In [None]:
# Print model summary
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_2 (Conv2D)           (None, 26, 26, 32)        320       
                                                                 
 activation_2 (Activation)   (None, 26, 26, 32)        0         
                                                                 
 conv2d_3 (Conv2D)           (None, 24, 24, 32)        9248      
                                                                 
 activation_3 (Activation)   (None, 24, 24, 32)        0         
                                                                 
 max_pooling2d (MaxPooling2D  (None, 12, 12, 32)       0         
 )                                                               
                                                                 
 dropout (Dropout)           (None, 12, 12, 32)        0         
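The output shapes and parameter counts in the summary can be checked by hand. The sketch below assumes stride-1 convolutions with `'valid'` padding (the Keras `Conv2D` defaults) and a single input channel; the helper names are hypothetical.

```python
def conv_output_size(size, kernel_size):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel_size + 1

def conv_param_count(kernel_size, in_channels, out_channels):
    """Kernel weights plus one bias per output channel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

img_size, kernel_size, filters, pool_size = 28, 3, 32, 2

after_conv1 = conv_output_size(img_size, kernel_size)     # 26
after_conv2 = conv_output_size(after_conv1, kernel_size)  # 24
after_pool = after_conv2 // pool_size                     # 12
flattened = after_pool * after_pool * filters             # 4608

conv1_params = conv_param_count(kernel_size, 1, filters)        # 320
conv2_params = conv_param_count(kernel_size, filters, filters)  # 9248

print(after_conv1, after_conv2, after_pool, flattened)
print(conv1_params, conv2_params)
```

The values 26, 24, and 12 and the counts 320 and 9248 match the rows of the summary shown here; the flattened size of 4608 is what the first dense layer would receive.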
                                                        

In [None]:
# Train model on the digits 5,6,7,8,9
train_model(model, (x_train_gte5, y_train_gte5), (x_test_gte5, y_test_gte5), num_classes, input_shape=(28, 28, 1))


x_train shape: (29404, 28, 28, 1)
29404 train samples
4861 test samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training time; 0:05:23.853652
Test score:  1.4260905981063843
Test accuracy  0.7899609208106995


### Freeze Layers


In [None]:
# Freeze only the feature layers; train_model() recompiles the model, so the change takes effect
for l in feature_layers:
  l.trainable = False
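What freezing means for the weight updates can be illustrated without Keras. The following is a minimal NumPy sketch, not the notebook's model: a toy two-layer network on random data (all sizes and names are hypothetical) in which only the classification weights receive gradient updates while the feature weights stay fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 64 samples, 8 input features, 3 classes (all hypothetical)
x = rng.normal(size=(64, 8))
y_onehot = np.eye(3)[rng.integers(0, 3, size=64)]

# "Feature" weights (frozen) and "classification" weights (trainable)
w_feat = rng.normal(scale=0.5, size=(8, 16))
w_head = rng.normal(scale=0.5, size=(16, 3))
w_feat_before = w_feat.copy()
w_head_before = w_head.copy()

def forward(x, w_feat, w_head):
    h = np.maximum(x @ w_feat, 0.0)                  # ReLU features
    logits = h @ w_head
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return h, p / p.sum(axis=1, keepdims=True)       # softmax probabilities

def cross_entropy(p, y_onehot):
    return -np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=1))

loss_before = cross_entropy(forward(x, w_feat, w_head)[1], y_onehot)

for _ in range(200):
    h, p = forward(x, w_feat, w_head)
    grad_head = h.T @ (p - y_onehot) / len(x)  # gradient for the head only
    w_head -= 0.05 * grad_head                 # w_feat is never updated

loss_after = cross_entropy(forward(x, w_feat, w_head)[1], y_onehot)
print(np.array_equal(w_feat, w_feat_before))  # True: frozen weights unchanged
print(loss_after < loss_before)               # True: the head still learns
```

In Keras the same effect comes from setting `trainable = False` on the layers and then compiling the model again, which `train_model` does on every call.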

In the summary below, compare the number of *total params*, *trainable params*, and *non-trainable params*: the weights of the frozen feature layers should now be counted as non-trainable.

In [None]:
# Print model summary
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_2 (Conv2D)           (None, 26, 26, 32)        320       
                                                                 
 activation_2 (Activation)   (None, 26, 26, 32)        0         
                                                                 
 conv2d_3 (Conv2D)           (None, 24, 24, 32)        9248      
                                                                 
 activation_3 (Activation)   (None, 24, 24, 32)        0         
                                                                 
 max_pooling2d (MaxPooling2D  (None, 12, 12, 32)       0         
 )                                                               
                                                                 
 dropout (Dropout)           (None, 12, 12, 32)        0         
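The captured summary stops before the dense layers and the parameter totals, but the expected split between trainable and non-trainable parameters can be worked out from the layer definitions. This is a hand calculation under the assumption that the flattened feature vector has 12 * 12 * 32 = 4608 entries; the layer labels in the list are descriptive placeholders, not Keras layer names.

```python
flattened = 12 * 12 * 32   # size of the Flatten output
num_classes = 5

# (description, parameter count, frozen?)
layers = [
    ("first conv",   3 * 3 * 1 * 32 + 32,     True),
    ("second conv",  3 * 3 * 32 * 32 + 32,    True),
    ("dense 128",    flattened * 128 + 128,   False),
    ("dense output", 128 * num_classes + num_classes, False),
]

non_trainable = sum(count for _, count, frozen in layers if frozen)
trainable = sum(count for _, count, frozen in layers if not frozen)
total = trainable + non_trainable

print("total:", total)                  # total: 600165
print("trainable:", trainable)          # trainable: 590597
print("non-trainable:", non_trainable)  # non-trainable: 9568
```

Most of the parameters sit in the first dense layer, so freezing the convolutional layers removes only a small share of the trainable weights; the time saved per epoch comes mainly from skipping backpropagation through the convolutions.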
                                                        

In [None]:
# Train model on the digits 0,1,2,3,4
train_model(model, (x_train_lt5, y_train_lt5), (x_test_lt5, y_test_lt5), num_classes, input_shape=(28, 28, 1))

x_train shape: (30596, 28, 28, 1)
30596 train samples
5139 test samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training time; 0:02:22.724829
Test score:  1.3185744285583496
Test accuracy  0.7581241726875305


Note that the results on classifying 0-4 (about 76% test accuracy) are comparable to those achieved on 5-9 (about 79%) after 5 full epochs, and training took less than half the time (about 2.4 minutes versus 5.4 minutes). This is despite the fact that only the classification layers were retrained; the frozen feature layers have never seen what the digits 0-4 look like.

