# Transfer Learning
* Training neural network model_A on data related to 6 classes, 
* Then train another neural network model_B on remaining 2 classes.
* Pre-trained weights are used to tune last layer so as to classify these 2 classes (Transfer learning), 
* And compare results of 2 models - model_B (normal training) and model_B_on_A (transfer learning)

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras

In [8]:
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()

In [9]:
X_train_full = X_train_full[:30000]
y_train_full = y_train_full[:30000]

In [10]:
X_test = X_test[:5000]
y_test = y_test[:5000]

In [11]:
#Normalising
X_train_full = X_train_full / 255.0
X_test = X_test / 255.0

In [12]:
X_valid, X_train = X_train_full[:5000], X_train_full[5000:]

In [13]:
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

**Why are we doing this?**

We will train a model on set A (classification task with 8 classes), and try to reuse it to tackle set B (binary classification). We hope to transfer a little bit of knowledge from task A to task B, since classes in set A (sneakers, ankle boots, coats, t-shirts, etc.) are somewhat similar to classes in set B (sandals and shirts). However, since we are using Dense layers, only patterns that occur at the same location can be reused (in contrast, convolutional layers will transfer much better, since learned patterns can be detected anywhere on the image

In [23]:
def split_dataset(X, y):
    y_5_or_6 = (y == 5) | (y == 6) # sandals or shirts
    y_A = y[~y_5_or_6]
    y_A[y_A > 6] -= 2 # class indices 7, 8, 9 should be moved to 5, 6, 7
    y_B = (y[y_5_or_6] == 6).astype(np.float32) # binary classification task: is it a shirt (class 6)?
    return ((X[~y_5_or_6], y_A), (X[y_5_or_6], y_B))

In [24]:
(X_train_A, y_train_A), (X_train_B, y_train_B) = split_dataset(X_train, y_train)

In [25]:
(X_valid_A, y_valid_A), (X_valid_B, y_valid_B) = split_dataset(X_valid, y_valid)

In [26]:
(X_test_A, y_test_A), (X_test_B, y_test_B) = split_dataset(X_test, y_test)

In [27]:
tf.random.set_seed(42)
np.random.seed(42)

In [43]:
model_A = keras.models.Sequential()
model_A.add(keras.layers.Flatten(input_shape=[28, 28]))
for n_hidden in (300, 100, 50, 50, 50):
    model_A.add(keras.layers.Dense(n_hidden, activation="selu"))
model_A.add(keras.layers.Dense(8, activation="softmax"))

In [49]:
model_A.compile(loss= "sparse_categorical_crossentropy",
    optimizer= keras.optimizers.SGD(learning_rate=1e-3),
    metrics=["accuracy"])

In [50]:
history = model_A.fit(X_train_A, y_train_A, epochs=5,
            validation_data=(X_valid_A, y_valid_A))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [51]:
model_A.save("my_model_A.h5")

In [54]:
# Build and Fit the Model B

In [55]:
model_B = keras.models.Sequential()
model_B.add(keras.layers.Flatten(input_shape=[28, 28]))
for n_hidden in (300, 100, 50, 50, 50):
    model_B.add(keras.layers.Dense(n_hidden, activation="selu"))
model_B.add(keras.layers.Dense(1, activation="softmax"))

In [56]:
model_B.compile(loss= "binary_crossentropy",
    optimizer= keras.optimizers.SGD(lr=1e-3),
    metrics=["accuracy"])

  "The `lr` argument is deprecated, use `learning_rate` instead.")


In [58]:
history = model_B.fit(X_train_B, y_train_B, epochs=5,
            validation_data=(X_valid_B, y_valid_B))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


## Creating new model based on existing model A
check number of trainable parameters for previously trained model_B.

Create a new model model_B_on_A which has the pre-trained parameters of model_A but customized final dense layer with only 1 neuron.

Finally, Compare the performance of both the models - model_B and model_B_on_A

In [61]:
model_B.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_1 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_6 (Dense)              (None, 300)               235500    
_________________________________________________________________
dense_7 (Dense)              (None, 100)               30100     
_________________________________________________________________
dense_8 (Dense)              (None, 50)                5050      
_________________________________________________________________
dense_9 (Dense)              (None, 50)                2550      
_________________________________________________________________
dense_10 (Dense)             (None, 50)                2550      
_________________________________________________________________
dense_11 (Dense)             (None, 1)                

In [62]:
model_A_clone = keras.models.clone_model(model_A)

In [63]:
model_A_clone.set_weights(model_A.get_weights())

In [64]:
model_A_clone.get_weights()

[array([[ 0.02448623, -0.00877784, -0.0218912 , ..., -0.02766109,
          0.03859095, -0.06889368],
        [ 0.0047635 , -0.03105379, -0.05866694, ...,  0.00603038,
         -0.02763411, -0.04165162],
        [-0.06191033, -0.0690196 ,  0.07101057, ..., -0.04237805,
          0.07121388, -0.07332943],
        ...,
        [-0.03043905,  0.02141099, -0.05357783, ..., -0.0005746 ,
          0.00270009,  0.05603222],
        [ 0.07061882, -0.06976722,  0.07059094, ..., -0.00371876,
          0.00033509,  0.02873355],
        [-0.06023801,  0.01576762, -0.02584785, ..., -0.00528044,
          0.00270329, -0.067969  ]], dtype=float32),
 array([-4.23583295e-03,  4.98862588e-04, -5.68915624e-03, -1.54783111e-03,
        -1.88093865e-03, -6.74478943e-03,  4.95258765e-03, -4.44602128e-03,
        -1.30842754e-03,  3.82664357e-03, -3.83113627e-03, -4.57089022e-03,
         1.22007693e-03,  1.67986451e-04, -4.97296173e-03, -1.06554257e-03,
        -4.71460586e-03, -3.02621326e-03, -7.97565561e

In [65]:
model_B_on_A = keras.models.Sequential(model_A.layers[:-1])

In [66]:
model_B_on_A.add(keras.layers.Dense(1, activation="sigmoid"))

In [68]:
model_B_on_A.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 300)               235500    
_________________________________________________________________
dense_1 (Dense)              (None, 100)               30100     
_________________________________________________________________
dense_2 (Dense)              (None, 50)                5050      
_________________________________________________________________
dense_3 (Dense)              (None, 50)                2550      
_________________________________________________________________
dense_4 (Dense)              (None, 50)                2550      
_________________________________________________________________
dense_12 (Dense)             (None, 1)                

In [69]:
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = False

In [70]:
model_B_on_A.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 300)               235500    
_________________________________________________________________
dense_1 (Dense)              (None, 100)               30100     
_________________________________________________________________
dense_2 (Dense)              (None, 50)                5050      
_________________________________________________________________
dense_3 (Dense)              (None, 50)                2550      
_________________________________________________________________
dense_4 (Dense)              (None, 50)                2550      
_________________________________________________________________
dense_12 (Dense)             (None, 1)                

In [73]:
model_B_on_A.compile(loss="binary_crossentropy",
                    optimizer=keras.optimizers.SGD(learning_rate=1e-3),
                    metrics=["accuracy"])

In [74]:
history = model_B_on_A.fit(X_train_B, y_train_B, epochs=5,
                   validation_data=(X_valid_B, y_valid_B))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


## Evaluating the Models

In [77]:
model_B.evaluate(X_test_B,y_test_B)



[0.03187718987464905, 0.49844881892204285]

In [78]:
model_B_on_A.evaluate(X_test_B,y_test_B)



[0.0845453068614006, 0.9906928539276123]

**Observation**:
* Accuracy of model_B: 0.4984
* Accuracy of model_B_on_A: 0.9907

* **Even if acuuracy was similar, performance/training time/resources of model_B_on_A, are vastly reduced when compared with model_B**