This is the second part of the NN-Training Breakout Activity (LP24).

_This notebook contains code from chapter 11 (**Training Deep Neural Networks**) of the textbook._

## Tackling training problems

In this activity, you will be experimenting with:

*   **Applying transfer learning**



Run the set-up code cell below.

In [None]:
import sys
assert sys.version_info >= (3, 7)

from packaging import version
import tensorflow as tf
assert version.parse(tf.__version__) >= version.parse("2.8.0")
import matplotlib.pyplot as plt
import numpy as np

tf.random.set_seed(42)

**Let's load the fashion-MNIST dataset for our transfer learning experiments**

In [None]:
fashion_mnist = tf.keras.datasets.fashion_mnist.load_data()
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist
X_train, y_train = X_train_full[:-5000], y_train_full[:-5000]
X_valid, y_valid = X_train_full[-5000:], y_train_full[-5000:]

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

pixel_means = X_train.mean(axis=0, keepdims=True)
pixel_stds = X_train.std(axis=0, keepdims=True)
X_train_scaled = (X_train - pixel_means) / pixel_stds
X_valid_scaled = (X_valid - pixel_means) / pixel_stds
X_test_scaled = (X_test - pixel_means) / pixel_stds
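The cell above standardizes each pixel position separately. It subtracts that pixel's mean and divides by its standard deviation, both computed on the training images only, and reuses those training statistics for the validation and test sets. Below is a minimal NumPy sketch of the same computation on a tiny made-up array. The function name `standardize_per_pixel` and the `eps` guard for constant pixels are assumptions added here and are not part of the original cell:

```python
import numpy as np

def standardize_per_pixel(X_train, *others, eps=1e-8):
    """Standardize every pixel position with statistics from X_train only.

    eps is an added safeguard (not in the original cell) for pixels whose
    training standard deviation is 0.
    """
    X_train = X_train.astype(np.float64)
    means = X_train.mean(axis=0, keepdims=True)  # one mean per pixel position
    stds = X_train.std(axis=0, keepdims=True)    # one std per pixel position

    def scale(X):
        return (X.astype(np.float64) - means) / (stds + eps)

    return (scale(X_train),) + tuple(scale(X) for X in others)

# Made-up "images": 4 training samples and 1 test sample of 2x2 pixels
X_tr = np.array([[[0, 10], [20, 30]],
                 [[2, 10], [22, 34]],
                 [[4, 10], [24, 38]],
                 [[6, 10], [26, 42]]], dtype=np.uint8)
X_te = np.array([[[3, 10], [23, 36]]], dtype=np.uint8)

X_tr_scaled, X_te_scaled = standardize_per_pixel(X_tr, X_te)
print(X_tr_scaled.mean(axis=0))  # approximately 0 for every pixel
print(X_tr_scaled.std(axis=0))   # approximately 1, except 0 for the constant pixel
```

The validation and test images are scaled with the training means and standard deviations, so no information from those sets leaks into the preprocessing.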

## Transfer learning

Transfer learning in the context of neural networks is a technique where a model trained on one task or dataset is reused as the starting point for a new model trained on a different but related task or dataset. Instead of training the new network from scratch, we leverage the knowledge gained from solving the first problem.

The basic idea behind transfer learning is that features learned by a model while solving one task can be valuable for solving a different task, especially if the tasks share some underlying structure or patterns.

In the figure below, the top model is trained on the large ImageNet dataset and later fine-tuned for a separate binary classification task. The assumption is that the shapes and patterns learned by the network will also be useful for the second task.

![Transfer Learning](https://www.mdpi.com/sensors/sensors-23-00570/article_deploy/html/images/sensors-23-00570-g001-550.jpg)

To demonstrate the idea summarized above, we will split fashion-MNIST into two tasks:

*   Task A: All fashion-MNIST except T-shirts/tops and pullovers (8 classes)
*   Task B: Only T-shirts/tops and pullovers (2 classes), with just 200 training images in total



* `X_train_A`: all images of all items except for T-shirts/tops and pullovers (classes 0 and 2).
* `X_train_B`: a much smaller training set of just the first 200 images of T-shirts/tops and pullovers.

The validation set and the test set are also split this way, but without restricting the number of images.
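To see what the split does to the labels, here is a pure-Python sketch of the same label logic as `split_dataset` (defined further down), applied to a short made-up label list. Task A keeps the 8 remaining classes and renumbers them 0 to 7 (in increasing order of the original id, made explicit here with `sorted`), while task B keeps only classes 0 and 2 and turns them into a binary label: 1.0 for Pullover, 0.0 for T-shirt/top. The function name `split_labels` is hypothetical:

```python
def split_labels(labels, pos_class_id=2, neg_class_id=0, n_classes=10):
    """Mirror split_dataset's label handling on a plain list of class ids."""
    b_ids = {pos_class_id, neg_class_id}
    # Old ids kept for task A, in increasing order, mapped to new ids 0..7
    old_ids = sorted(set(range(n_classes)) - b_ids)
    remap = {old: new for new, old in enumerate(old_ids)}
    y_A = [remap[y] for y in labels if y not in b_ids]
    # Task B: binary target, 1.0 for the positive class (Pullover)
    y_B = [1.0 if y == pos_class_id else 0.0 for y in labels if y in b_ids]
    return y_A, y_B, remap

y_A, y_B, remap = split_labels([0, 1, 2, 3, 9, 2, 5, 0])
print(remap)  # {1: 0, 3: 1, 4: 2, 5: 3, 6: 4, 7: 5, 8: 6, 9: 7}
print(y_A)    # [0, 1, 7, 3]
print(y_B)    # [0.0, 1.0, 1.0, 0.0]
```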

We will train a model on set A (classification task with 8 classes) and try to reuse it to tackle set B (binary classification). We hope to transfer a little bit of knowledge from task A to task B, since classes in set A (trousers, dresses, coats, sandals, shirts, sneakers, bags, and ankle boots) are somewhat similar to classes in set B (T-shirts/tops and pullovers). However, since we are using `Dense` layers, only patterns that occur at the same location can be reused. Convolutional layers transfer much better, since learned patterns can be detected anywhere in the image, as we will see in chapter 14.
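The point about `Dense` layers can be checked on a toy 1-D "image". A dense unit computes a fixed weighted sum over pixel positions, so a unit whose weights match a pattern at one location gives no response when the same pattern appears elsewhere. A convolution instead slides one small kernel over every position. This is a minimal pure-Python sketch with made-up values; no Keras is involved:

```python
def dense_response(weights, image):
    # A dense unit: one weight per input position, fixed in place
    return sum(w * x for w, x in zip(weights, image))

def conv_responses(kernel, image):
    # A 1-D "valid" cross-correlation: the same kernel applied at every offset
    k = len(kernel)
    return [sum(kernel[j] * image[i + j] for j in range(k))
            for i in range(len(image) - k + 1)]

pattern = [1, 2, 1]
image_left = [1, 2, 1, 0, 0, 0, 0, 0]   # pattern at positions 0-2
image_right = [0, 0, 0, 0, 0, 1, 2, 1]  # same pattern shifted to positions 5-7

# Dense weights "learned" for the pattern at the left position only
dense_w = pattern + [0] * 5

print(dense_response(dense_w, image_left))        # 6
print(dense_response(dense_w, image_right))       # 0
print(max(conv_responses(pattern, image_left)))   # 6
print(max(conv_responses(pattern, image_right)))  # 6
```

The shifted pattern is invisible to the dense unit but is found just as strongly by the convolution, which is why features learned by `Dense` layers transfer only when patterns stay in the same place.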

**Task B:** only 200 labeled training images are available. We want to train on this small labeled dataset and evaluate on all Fashion-MNIST test images of these 2 classes.

**Preparing data for Task A and Task B**

In [None]:
# Split Fashion MNIST into tasks A and B

pos_class_id = class_names.index("Pullover")
neg_class_id = class_names.index("T-shirt/top")

def split_dataset(X, y):
    y_for_B = (y == pos_class_id) | (y == neg_class_id)
    y_A = y[~y_for_B]
    y_B = (y[y_for_B] == pos_class_id).astype(np.float32)
    old_class_ids = list(set(range(10)) - set([neg_class_id, pos_class_id]))
    for old_class_id, new_class_id in zip(old_class_ids, range(8)):
        y_A[y_A == old_class_id] = new_class_id  # reorder class ids for A
    return ((X[~y_for_B], y_A), (X[y_for_B], y_B))

# Split the standardized data (computed above) into tasks A and B
(X_train_A, y_train_A), (X_train_B, y_train_B) = split_dataset(X_train_scaled, y_train)
(X_valid_A, y_valid_A), (X_valid_B, y_valid_B) = split_dataset(X_valid_scaled, y_valid)
(X_test_A, y_test_A), (X_test_B, y_test_B) = split_dataset(X_test_scaled, y_test)

# Restricting train data in task B to just 200 samples
X_train_B = X_train_B[:200]
y_train_B = y_train_B[:200]



**Defining and training a model for task A, saving the trained model in a file**

In [None]:
tf.random.set_seed(42)

model_A = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28]),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(8, activation="softmax")
])

model_A.compile(loss="sparse_categorical_crossentropy",
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
                metrics=["accuracy"])
history = model_A.fit(X_train_A, y_train_A, epochs=20,
                      validation_data=(X_valid_A, y_valid_A))
model_A.save("my_model_A")

**As a baseline, let's first tackle task B from scratch, without transfer learning:**

In [None]:
tf.random.set_seed(42)
model_B = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28]),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(100, activation="relu",
                          kernel_initializer="he_normal"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

model_B.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
                metrics=["accuracy"])
history = model_B.fit(X_train_B, y_train_B, epochs=20,
                      validation_data=(X_valid_B, y_valid_B))

print('--------------------------')
model_B_perf = model_B.evaluate(X_test_B, y_test_B)
print('Accuracy on test data:', model_B_perf[1])

Note down the performance you obtained. We will now apply transfer learning and compare.

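**Transfer learning steps:**
*   Load the model trained on task A
*   Remove its output layer (8 classes)
*   Add a new output layer for the 2 classes of task B (a single sigmoid unit)
*   Train this model on the task B data (only 200 samples): first with the reused layers frozen, then with all layers trainable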

We will evaluate this model on all Fashion-MNIST test samples from these 2 classes, not just on a small subset.

Note that if we built `model_B_on_A` directly from `model_A.layers[:-1]`, the two models would share layers, so training one would update both. To avoid that, we build `model_B_on_A` on top of a *clone* of `model_A`:

In [None]:
# Load model from file
model_A = tf.keras.models.load_model("my_model_A")
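# Clone model A so that training model_B_on_A does not modify model A.
# clone_model() copies the architecture but creates freshly initialized
# weights, so we also copy model A's trained weights into the clone.
model_A_clone = tf.keras.models.clone_model(model_A)
model_A_clone.set_weights(model_A.get_weights())
# Build model B from the clone's layers, dropping the 8-class output layer
model_B_on_A = tf.keras.Sequential(model_A_clone.layers[:-1])
# Add a new output layer: a single sigmoid unit for the binary task B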
tf.random.set_seed(42)
model_B_on_A.add(tf.keras.layers.Dense(1, activation="sigmoid"))
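Why sharing layers is a problem can be seen without Keras: a model holds references to layer objects, so two models built from the same objects see each other's updates. Below is a minimal pure-Python analogy. `ToyLayer` is a hypothetical stand-in for a Keras layer, and `copy.deepcopy` plays the role of cloning. Unlike `deepcopy`, `tf.keras.models.clone_model` gives the new layers freshly initialized weights, which is why the cell above also calls `set_weights`:

```python
import copy

class ToyLayer:
    """Hypothetical stand-in for a Keras layer: it only holds a list of weights."""
    def __init__(self, weights):
        self.weights = list(weights)

    def train_step(self, delta):
        # Pretend a gradient step shifted every weight by delta
        self.weights = [w + delta for w in self.weights]

model_A_layers = [ToyLayer([0.5, -0.5]), ToyLayer([1.0]), ToyLayer([2.0])]

# Reusing the layer objects directly: both "models" point to the same layers
shared_B = model_A_layers[:-1]
shared_B[0].train_step(0.25)
print(model_A_layers[0].weights)  # [0.75, -0.25]: model A changed too

# Copying the layers first (analogous to clone_model + set_weights)
cloned_B = copy.deepcopy(model_A_layers[:-1])
cloned_B[0].train_step(0.25)
print(model_A_layers[0].weights)  # still [0.75, -0.25]
print(cloned_B[0].weights)        # [1.0, 0.0]
```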

In [None]:
# Freeze the reused layers at first, so that only the new output layer is trained
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = False

# Compiling and running training
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)
model_B_on_A.compile(loss="binary_crossentropy", optimizer=optimizer,
                     metrics=["accuracy"])

history = model_B_on_A.fit(X_train_B, y_train_B, epochs=4,
                           validation_data=(X_valid_B, y_valid_B))

In [None]:
# Then unfreeze the reused layers and fine-tune the whole model
# (recompile the model after changing `trainable`)
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = True

optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)
model_B_on_A.compile(loss="binary_crossentropy", optimizer=optimizer,
                     metrics=["accuracy"])
history = model_B_on_A.fit(X_train_B, y_train_B, epochs=16,
                           validation_data=(X_valid_B, y_valid_B))
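The two-phase schedule in the last two cells (train only the new output layer with the reused layers frozen, then unfreeze and fine-tune everything) can be sketched on a toy two-parameter model, `y_hat = head * body * x`, trained with plain gradient descent on mean squared error. Here `body` stands for the reused layers and `head` for the new output layer, and "freezing" simply means skipping the update for `body`. All names and values are made up for illustration; this is only an analogy for Keras's `trainable` flag:

```python
def grads(body, head, data):
    """Gradients of the mean squared error for the toy model y_hat = head * body * x."""
    g_body = g_head = 0.0
    for x, y in data:
        err = head * body * x - y
        g_body += 2 * err * head * x / len(data)
        g_head += 2 * err * body * x / len(data)
    return g_body, g_head

def train(body, head, data, epochs, lr, body_trainable):
    for _ in range(epochs):
        g_body, g_head = grads(body, head, data)
        if body_trainable:  # a frozen parameter is simply not updated
            body -= lr * g_body
        head -= lr * g_head
    return body, head

data = [(1.0, 3.0), (2.0, 6.0)]  # made-up "task B" data following y = 3x

# Phase 1: "reused" body frozen, only the fresh head is trained
body1, head1 = train(1.5, 0.1, data, epochs=50, lr=0.05, body_trainable=False)
# Phase 2: everything trainable (fine-tuning)
body2, head2 = train(body1, head1, data, epochs=50, lr=0.05, body_trainable=True)

print(body1)          # 1.5: the frozen parameter did not move in phase 1
print(head1 * body1)  # close to 3
```

A common rationale for this schedule is that the new output layer starts from random weights, so its large early errors could otherwise produce large gradient updates that damage the reused weights.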

So, what's the performance after applying transfer learning?

In [None]:
model_B_on_A_perf = model_B_on_A.evaluate(X_test_B, y_test_B)
print('Accuracy on test data:', model_B_on_A_perf[1])

**Task 1:** Compute the improvement as the percentage reduction in error rate, where error rate = 1 - accuracy.
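For reference, the percentage reduction in error rate is 100 × (error_before - error_after) / error_before, with error = 1 - accuracy. Below is a small helper; the name `error_reduction_pct` is hypothetical, and it is shown on made-up accuracies rather than on actual results from this notebook:

```python
def error_reduction_pct(acc_before, acc_after):
    """Percentage reduction in error rate, where error rate = 1 - accuracy.

    Assumes acc_before < 1, otherwise there is no error left to reduce.
    """
    err_before = 1.0 - acc_before
    err_after = 1.0 - acc_after
    return 100.0 * (err_before - err_after) / err_before

# Made-up accuracies, for illustration only
print(error_reduction_pct(0.90, 0.95))  # about 50 (error drops from 10% to 5%)
```

In this notebook, the two accuracies would be `model_B_perf[1]` and `model_B_on_A_perf[1]`.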

**Final task:**
When you finish, go back to the first notebook and add a summary of these experiments at the end of it. You only need to upload the first notebook.