<a href="https://colab.research.google.com/github/naskinovai/TL/blob/main/transfer_learning_006.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Agenda:
- how to train an image classifier with tf.keras
- how to load and preprocess data
- what options tf.keras offers for building models
- how model training is evaluated

What is image classification?
Image classification is the task of training a model to decide which of one or more classes an image belongs to.

How to Train an Image Classifier

Step 1. Data Processing
Step 2. Model Building
Step 3. Train and evaluate

Transfer Learning Process
- Transfer learning with a pre-trained ResNet50 ImageNet model.
- Data augmentation: image resizing, random cropping, and flipping along the horizontal and vertical axes.
- The fit-one-cycle method to optimise learning rate selection for our training.
- Discriminative learning rates for fine-tuning.
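
The one-cycle idea mentioned in the list above can be sketched as a learning-rate schedule that ramps up to a maximum value and then anneals down again. This is only an illustration, not the training code of this notebook (the cells below use a fixed Adam learning rate). The function name one_cycle_lr and all numeric values are assumptions chosen for the example.

```python
import math


def one_cycle_lr(step, total_steps, max_lr=1e-3, start_lr=1e-4,
                 final_lr=1e-6, warmup_frac=0.3):
    """Hypothetical one-cycle schedule: cosine ramp up, then cosine anneal down."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Phase 1: increase the learning rate from start_lr to max_lr.
        progress = step / warmup_steps
        return start_lr + (max_lr - start_lr) * (1 - math.cos(math.pi * progress)) / 2
    # Phase 2: decrease the learning rate from max_lr to final_lr.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return final_lr + (max_lr - final_lr) * (1 + math.cos(math.pi * progress)) / 2


# Learning rate at the start, at the peak and at the end of a 100-step cycle.
for step in (0, 30, 100):
    print(step, one_cycle_lr(step, total_steps=100))
```

A schedule like this could be passed to training through a learning-rate scheduler callback; how the peak value is chosen is left open here.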

Optimisation and regularisation techniques used in our training:
- Dropout
- Weight decay
- Batch normalisation
- Average and max pooling
- Adam optimiser
- ReLU activations

Data Source
PatchCam (Kaggle)
PCam was prepared by Bas Veeling, a PhD student in machine learning for health from the Netherlands, specifically to help machine learning practitioners interested in working on this particular problem. It consists of 327,680 colour images of 96x96 pixels. An excellent overview of the dataset can be found here: http://basveeling.nl/posts/pcam/. The data can also be downloaded from GitHub, where there is further information on it: https://github.com/basveeling/pcam

Each image is annotated with a binary label indicating the presence of metastatic tissue.
Each image has shape [96, 96, 3] (colour image) and a binary label of either 0 or 1. The data-set comes with a train, validation and test split.
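
As a quick check of the split, the sizes reported by the dataset info printed later in this notebook can be turned into fractions. The batch size of 32 is the value used in the input pipeline below; the variable names are only illustrative.

```python
# Split sizes as reported by the dataset info for patch_camelyon.
splits = {"train": 262144, "validation": 32768, "test": 32768}

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

print(total)      # 327680
print(fractions)  # {'train': 0.8, 'validation': 0.1, 'test': 0.1}

# Number of training batches per epoch for a batch size of 32.
batch_size = 32
steps_per_epoch = splits["train"] // batch_size
print(steps_per_epoch)  # 8192
```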

From the author’s words:
PCam packs the clinically-relevant task of metastasis detection into a straight-forward binary image classification task, akin to CIFAR-10 and MNIST. Models can easily be trained on a single GPU in a couple hours, and achieve competitive scores in the Camelyon16 tasks of tumor detection and whole-slide image diagnosis. Furthermore, the balance between task-difficulty and tractability makes it a prime suspect for fundamental machine learning research on topics as active learning, model uncertainty, and explainability.

Loading the Data-set
To load the data-set we first import the necessary libraries. We then call tfds.load(), which downloads and prepares the data-set on the first call and reuses the prepared copy afterwards.

Setting with_info=True makes tfds.load() also return information about our data-set, which is stored in the info variable.

Setting as_supervised=True would load each example as an (image, label) tuple. The loading cell in this notebook does not pass that argument; it gets the same tuple structure by mapping the split_data function over each split.

Our data-set consists of train, validation and test splits. Each split is assigned to its own variable (train_ds, valid_ds and test_ds).

PREPROCESS THE DATA
Before feeding our data into the CNN it will have to go through some form of preprocessing.
Each pixel value in our data-set ranges from 0 to 255, which we scale to between 0 and 1. In this notebook the scaling is done by a Rescaling(1./255) layer at the start of the baseline model rather than in the input pipeline.
After that we batch and prefetch the train, validation and test data.
Prefetch grabs the next dataset item in advance while the current one is being processed, to reduce latency. Since dataset operations usually run on the CPU, the idea is to use otherwise idle CPU power (and I/O bandwidth) while the GPU is processing a batch. Most dataset input pipelines should end with a call to prefetch. This often improves latency and throughput, at the cost of additional memory to store the prefetched elements.
Prefetching decouples the time when data is produced from the time when data is consumed. In particular, the transformation uses a background thread and an internal buffer to prefetch elements from the input dataset ahead of the time they are requested.
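
The background-thread-plus-buffer mechanism described above can be illustrated in plain Python. This is a simplified sketch of the idea, not how tf.data implements prefetch; the names prefetch and slow_batches and the sleep times are assumptions made for the example.

```python
import queue
import threading
import time


def prefetch(iterable, buffer_size=2):
    """Yield items from `iterable`, preparing them ahead of time in a background thread."""
    buffer = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def producer():
        for item in iterable:
            buffer.put(item)   # blocks while the buffer is full
        buffer.put(sentinel)   # signals that the input is exhausted

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()    # blocks until an item is ready
        if item is sentinel:
            return
        yield item


def slow_batches(n, delay=0.01):
    """Stand-in for data loading: each batch takes `delay` seconds to prepare."""
    for i in range(n):
        time.sleep(delay)
        yield i


results = []
for batch in prefetch(slow_batches(5), buffer_size=2):
    time.sleep(0.01)           # stand-in for a training step on the current batch
    results.append(batch)

print(results)  # [0, 1, 2, 3, 4]
```

While the consumer is busy with one batch, the producer thread is already preparing the next ones, up to buffer_size items ahead. This is the trade of extra memory for lower waiting time described in the text.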

Transfer learning + Fine-tuning = Better Generalisation
Transfer learning alone brings us much further than training our network from scratch. But this method is prone to optimisation difficulties between fragile co-adapted layers when a pre-trained network is connected to new layers. We counter this by fine-tuning our model: we make all layers of our network, including the pre-trained ResNet50 layers, trainable. Once the layers are unfrozen, training updates all of them. (See [6])

This leads to better results and a better ability to generalise to new examples.
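
The freeze-then-unfreeze procedure can be sketched with tf.keras as follows. To keep the example small and free of downloads, a tiny randomly initialised network stands in for the pre-trained base; that stand-in, the layer sizes and the two learning rates are assumptions for illustration, not the settings of this notebook.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical stand-in for a pre-trained base network (random weights, no download).
base = tf.keras.Sequential([
    tf.keras.Input((96, 96, 3)),
    layers.Conv2D(8, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
], name="base")

# New classification head on top of the base.
model = tf.keras.Sequential([
    tf.keras.Input((96, 96, 3)),
    base,
    layers.Dense(1, activation="sigmoid"),
])

# Phase 1 (transfer learning): freeze the base so only the new head is trained.
base.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="binary_crossentropy")
frozen_count = len(model.trainable_weights)

# Phase 2 (fine-tuning): unfreeze all layers and recompile with a lower learning rate.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss="binary_crossentropy")
unfrozen_count = len(model.trainable_weights)

print(frozen_count, unfrozen_count)
```

In a real run, model.fit() would be called after each compile step. The lower learning rate in the second phase is a common precaution against destroying the pre-trained features.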

In [1]:
#Transfer Learning with a Pretrained ConvNet

In [2]:
%tensorflow_version 2.x

In [3]:
import tensorflow as tf

In [4]:
tf.version.VERSION

'2.6.0'

In [5]:

import matplotlib.pyplot as plt
import numpy as np
import os

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import models, layers, optimizers
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
import tensorflow_datasets as tfds

# Set random seed
from numpy.random import seed
seed(1)
from tensorflow.random import set_seed
set_seed(1)

#Load data

In [6]:
from google.colab import drive

drive.mount('/content/gdrive')
root_path = '/content/gdrive/My Drive/Patch_Camelyon_tfds/'

data, info = tfds.load("patch_camelyon", with_info=True, data_dir=root_path, shuffle_files=True)

train_ds = data['train']
valid_ds = data['validation']
test_ds = data['test']

Mounted at /content/gdrive
Downloading and preparing dataset patch_camelyon/2.0.0 (download: 7.48 GiB, generated: Unknown size, total: 7.48 GiB) to /content/gdrive/My Drive/Patch_Camelyon_tfds/patch_camelyon/2.0.0...
Dataset patch_camelyon downloaded and prepared to /content/gdrive/My Drive/Patch_Camelyon_tfds/patch_camelyon/2.0.0. Subsequent calls will reuse this data.


In [7]:

# Need to split each instance of the dataset in image and label
def split_data(data):
  return data["image"], data["label"]

BATCH_SIZE = 32
AUTOTUNE = tf.data.experimental.AUTOTUNE

train_ds = train_ds.map(split_data, num_parallel_calls=AUTOTUNE).shuffle(1024).batch(BATCH_SIZE).prefetch(buffer_size=AUTOTUNE)
valid_ds = valid_ds.map(split_data, num_parallel_calls=AUTOTUNE).batch(BATCH_SIZE).cache().prefetch(buffer_size=AUTOTUNE)
test_ds = test_ds.map(split_data, num_parallel_calls=AUTOTUNE).batch(BATCH_SIZE).prefetch(buffer_size=AUTOTUNE)

# Small portion of the dataset used during the initial tests
# small_train_ds = train_ds.map(split_data, num_parallel_calls=AUTOTUNE).take(1024).cache().shuffle(1024).batch(BATCH_SIZE).prefetch(buffer_size=AUTOTUNE)
# small_valid_ds = valid_ds.map(split_data, num_parallel_calls=AUTOTUNE).take(1024).batch(BATCH_SIZE).cache().prefetch(buffer_size=AUTOTUNE)

#Visualize data

In [8]:
# !pip install --upgrade tensorflow-datasets
# fig = tfds.show_examples(train_ds, info)

In [9]:
print(info)

tfds.core.DatasetInfo(
    name='patch_camelyon',
    version=2.0.0,
    description='The PatchCamelyon benchmark is a new and challenging image classification
dataset. It consists of 327.680 color images (96 x 96px) extracted from
histopathologic scans of lymph node sections. Each image is annoted with a
binary label indicating presence of metastatic tissue. PCam provides a new
benchmark for machine learning models: bigger than CIFAR10, smaller than
Imagenet, trainable on a single GPU.',
    homepage='https://patchcamelyon.grand-challenge.org/',
    features=FeaturesDict({
        'id': Text(shape=(), dtype=tf.string),
        'image': Image(shape=(96, 96, 3), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
    }),
    total_num_examples=327680,
    splits={
        'test': 32768,
        'train': 262144,
        'validation': 32768,
    },
    supervised_keys=('image', 'label'),
    citation="""@misc{b_s_veeling_j_linmans_j_winkens_t_cohen_2018_

In [11]:
# Show 16 examples of images with their labels.
# train_ds was already mapped to (image, label) tuples and batched in In [7],
# so we unbatch it and unpack the tuple instead of indexing by key.
plt.figure(figsize=(15,15))
for i, (img, label) in enumerate(train_ds.unbatch().take(16)):
  plt.subplot(4,4,i+1)
  plt.xticks([])
  plt.yticks([])
  plt.grid(False)
  plt.imshow(img.numpy())
  plt.title(int(label.numpy()))
plt.show()

#Data Preprocessing


In [None]:

data_augmentation = tf.keras.Sequential([
  layers.experimental.preprocessing.RandomZoom((-0.125,0), fill_mode='reflect'),
  layers.experimental.preprocessing.RandomFlip("horizontal_and_vertical"),
  layers.experimental.preprocessing.RandomRotation((-0.5,0.5)),
  layers.experimental.preprocessing.RandomContrast(0.4)
], name="data_augmentation")

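# Colour augmentations wrapped as layers. The factory functions have their own
# names so that re-running this cell does not overwrite them with layer objects.
# Note: these lambdas take no training argument, so the random ops are also
# applied at inference time.
def make_random_brightness_layer(factor=0.4):
  return layers.Lambda(lambda x: tf.image.random_brightness(x, factor))

def make_random_hue_layer(factor=0.2):
  return layers.Lambda(lambda x: tf.image.random_hue(x, factor))

def make_random_saturation_layer(lower=0.8, upper=1.2):
  return layers.Lambda(lambda x: tf.image.random_saturation(x, lower, upper))

random_brightness_layer = make_random_brightness_layer()
random_hue_layer = make_random_hue_layer()
random_saturation_layer = make_random_saturation_layer()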

In [None]:
def show(image, label):
  plt.figure()
  plt.imshow(image)
  plt.title(int(label.numpy()))
  plt.axis('off')

# Show examples of images after applying data augmentation
for image, label in train_ds.take(2):
  image = layers.experimental.preprocessing.Rescaling(1./255)(image)
  image = random_brightness_layer(image)
  image = random_hue_layer(image)
  image = random_saturation_layer(image)
  image = data_augmentation(image)
  show(image[0], label[0])
  print(image[0].dtype)

#Data pipeline

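The baseline model in the next cell is a small CNN trained from scratch:
The first layers take the input_shape of (96, 96, 3), rescale the pixel values to between 0 and 1 and apply the data augmentation defined above
The convolutional part has three blocks, each made of two Conv2D layers with filters of size 3 (i.e. 3 x 3) followed by max pooling with a pool size of 2
The number of filters doubles from block to block: 16, 32, 64
The convolutional layers use padding = 'valid', so the images are not padded and each convolution shrinks the width and height of the feature map by 2 pixels
No kernel_initializer is set, so the weights start from the default random initialisation of Keras
An L2 kernel_regularizer of 0.0001 is applied to the convolutional and hidden Dense layers
The activation function used is 'relu'
The Flatten layer flattens the feature maps before they are passed on to the Dense layers
We have two hidden Dense layers with 256 and 128 units, each followed by Dropout with a rate of 0.5
We expect our model to return a single output between 0 and 1, so our output layer contains just 1 unit with a 'sigmoid' activation function

The Adam optimizer is used with a learning rate of 0.0005
Since our labels are already encoded as 0's and 1's, we use 'binary_crossentropy' as our loss function
A ModelCheckpoint callback with save_best_only=True saves the model whenever val_loss improves. An early stopping callback can also be used to monitor val_loss and stop training if val_loss does not decrease for 10 epochs; EarlyStopping is imported above but is not passed to model.fit() in this notebook
model.fit() starts the training of our model, here for 30 epochs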
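
The early stopping rule described above (stop when val_loss has not improved for a given number of epochs) can be written out in a few lines of plain Python. This is a simplified model of the rule, not the Keras implementation; the function name and the validation losses are made up for illustration, and a patience of 3 is used so that the example stays short.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which training would stop, or None if it never stops early."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Improvement: remember the new best value and reset the counter.
            best = loss
            wait = 0
        else:
            # No improvement: count the epoch and stop once patience is used up.
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the last improvement happens at epoch 3.
val_losses = [0.60, 0.50, 0.45, 0.46, 0.47, 0.46, 0.48]
print(early_stopping_epoch(val_losses, patience=3))  # 6
```

In tf.keras the corresponding callback is EarlyStopping(monitor='val_loss', patience=10), which would be added to the callbacks list passed to model.fit().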

In [None]:
model_base = tf.keras.Sequential([
  tf.keras.Input((96, 96, 3)),
  layers.experimental.preprocessing.Rescaling(1./255),
  data_augmentation,
  random_brightness_layer,
  random_hue_layer,
  random_saturation_layer,

  layers.Conv2D(16, 3, padding='valid', activation='relu', kernel_regularizer=keras.regularizers.l2(0.0001)),
  layers.Conv2D(16, 3, padding='valid', activation='relu', kernel_regularizer=keras.regularizers.l2(0.0001)),
  layers.MaxPooling2D(pool_size=(2,2), strides=(2,2)),

  layers.Conv2D(32, 3, padding='valid', activation='relu', kernel_regularizer=keras.regularizers.l2(0.0001)),
  layers.Conv2D(32, 3, padding='valid', activation='relu', kernel_regularizer=keras.regularizers.l2(0.0001)),
  layers.MaxPooling2D(pool_size=(2,2), strides=(2,2)),

  layers.Conv2D(64, 3, padding='valid', activation='relu', kernel_regularizer=keras.regularizers.l2(0.0001)),
  layers.Conv2D(64, 3, padding='valid', activation='relu', kernel_regularizer=keras.regularizers.l2(0.0001)),
  layers.MaxPooling2D(pool_size=(2,2), strides=(2,2)),

  layers.Flatten(),
  layers.Dense(256, activation='relu', kernel_regularizer=keras.regularizers.l2(0.0001)),
  layers.Dropout(0.5),
  layers.Dense(128, activation='relu', kernel_regularizer=keras.regularizers.l2(0.0001)),
  layers.Dropout(0.5),
  layers.Dense(1, activation='sigmoid')
])

model_base.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])

model_base.summary()
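
The shapes and parameter counts that model_base.summary() reports can be worked out by hand from the layer definitions above. The sketch below does this arithmetic in plain Python, assuming stride-1 'valid' convolutions and 2 x 2 pooling as in the cell above; the helper names are illustrative.

```python
def conv_valid(size, kernel=3):
    """Output width/height of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def max_pool(size, pool=2, stride=2):
    """Output width/height of max pooling without padding."""
    return (size - pool) // stride + 1


size, channels = 96, 3
total_params = 0

# Three blocks of two 3x3 convolutions followed by 2x2 max pooling.
for filters in (16, 32, 64):
    for _ in range(2):
        size = conv_valid(size)
        total_params += (3 * 3 * channels + 1) * filters  # weights + biases
        channels = filters
    size = max_pool(size)

# Flatten, then Dense layers with 256, 128 and 1 units.
flat = size * size * channels
inputs = flat
for units in (256, 128, 1):
    total_params += (inputs + 1) * units  # weights + biases
    inputs = units

print(size)          # 8
print(flat)          # 4096
print(total_params)  # 1153937
```

The spatial size goes 96, 94, 92, 46, 44, 42, 21, 19, 17, 8. Most of the parameters sit in the first Dense layer, which connects the 4096 flattened features to 256 units.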

In [None]:

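# Set name of the path where to save the model
folder = os.path.join(root_path,'models_NN',"PCam_NN_Final2")
savename = "PCam_NN_Final2_lr0_0005_L20_0001_30epochs.h5"
# Make sure the folder exists before the checkpoint and the plots are written to it
os.makedirs(folder, exist_ok=True)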

In [None]:

epochs=30
callbacks = [ModelCheckpoint(os.path.join(folder,savename), save_best_only=True, verbose=1)]

history = model_base.fit(
  train_ds,
  validation_data = valid_ds,
  epochs = epochs,
  callbacks = callbacks
)

#Load Model

In [None]:
model = models.load_model(os.path.join(folder,savename))

In [None]:
os.path.join(folder,savename)

In [None]:
model.summary()

In [None]:
test_loss, test_accuracy = model.evaluate(test_ds)
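
The two numbers returned by model.evaluate() are the mean binary cross-entropy and the accuracy over the test set. A minimal sketch of both metrics in plain Python follows, assuming a decision threshold of 0.5; the labels and predicted probabilities are made up for the example.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum(int(p > threshold) == y for y, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels and sigmoid outputs for four images.
y_true = [1, 0, 1, 0]
y_pred = [0.9, 0.2, 0.4, 0.6]

loss = binary_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(round(loss, 4), acc)
```

Here the first two predictions are on the correct side of the threshold and the last two are not, so the accuracy is 0.5, while the loss also reflects how confident each prediction was.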

In [None]:

#Evaluate and plot the graphs of accuracy and loss.
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

plt.figure()
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.ylabel('Accuracy')
plt.ylim([min(plt.ylim()),1])
plt.title('Training and Validation Accuracy')
plt.xlabel('epoch')
plt.grid()
plt.savefig(os.path.join(folder,"accuracy.png"))

plt.figure()
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.ylabel('Cross Entropy')
# plt.ylim([0,1.0])
plt.title('Training and Validation Loss')
plt.xlabel('epoch')
plt.grid()
plt.savefig(os.path.join(folder,"loss.png"))
plt.show()

# Save some info about the best model (the epoch with the lowest validation loss)
best_epoch = np.argmin(val_loss)
with open(os.path.join(folder,"info.txt"), "w") as f:
  f.write("Best epoch: " + str(best_epoch + 1) + "\n")
  f.write("Training accuracy: " + str(acc[best_epoch]) + "\n")
  f.write("Validation accuracy: " + str(val_acc[best_epoch]) + "\n")
  f.write("Training loss: " + str(loss[best_epoch]) + "\n")
  f.write("Validation loss: " + str(val_loss[best_epoch]) + "\n\n")
  f.write("Test accuracy: " + str(test_accuracy) + "\n")
  f.write("Test loss: " + str(test_loss) + "\n")

In [None]:
import IPython
!pip install -q -U keras-tuner
# The package installed by keras-tuner is imported as keras_tuner in current releases
import keras_tuner as kt

In [None]:
# Define hypermodel
def model_builder(hp):
  hp_l2_regularizer = hp.Choice('l2_regularizer', values = [1e-2, 1e-3, 1e-4]) 
  hp_units = hp.Choice('units', values = [32, 64, 128, 256, 512])
  hp_learning_rate = hp.Choice('learning_rate', values = [1e-2, 5e-3, 1e-3, 5e-4, 1e-4]) 

  model = tf.keras.Sequential([
    tf.keras.Input((96, 96, 3)),
    layers.experimental.preprocessing.Rescaling(1./255),
    data_augmentation,
    random_brightness_layer,
    random_hue_layer,
    random_saturation_layer,

    layers.Conv2D(16, 3, padding='valid', activation='relu', kernel_regularizer=keras.regularizers.l2(hp_l2_regularizer)),
    layers.Conv2D(16, 3, padding='valid', activation='relu', kernel_regularizer=keras.regularizers.l2(hp_l2_regularizer)),
    layers.MaxPooling2D(pool_size=(2,2), strides=(2,2)),

    layers.Conv2D(32, 3, padding='valid', activation='relu', kernel_regularizer=keras.regularizers.l2(hp_l2_regularizer)),
    layers.Conv2D(32, 3, padding='valid', activation='relu', kernel_regularizer=keras.regularizers.l2(hp_l2_regularizer)),
    layers.MaxPooling2D(pool_size=(2,2), strides=(2,2)),

    layers.Conv2D(64, 3, padding='valid', activation='relu', kernel_regularizer=keras.regularizers.l2(hp_l2_regularizer)),
    layers.Conv2D(64, 3, padding='valid', activation='relu', kernel_regularizer=keras.regularizers.l2(hp_l2_regularizer)),
    layers.MaxPooling2D(pool_size=(2,2), strides=(2,2)),

    layers.Flatten(),
    layers.Dense(hp_units, activation='relu', kernel_regularizer=keras.regularizers.l2(hp_l2_regularizer)),
    layers.Dropout(0.5),
    layers.Dense(hp_units, activation='relu', kernel_regularizer=keras.regularizers.l2(hp_l2_regularizer)),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
  ])

  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate),
                loss=tf.keras.losses.BinaryCrossentropy(),
                metrics=['accuracy'])

  return model
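
The three hp.Choice calls in model_builder define a discrete search space. Enumerating it in plain Python shows how many configurations an exhaustive grid search would have to train in full, which is the cost that Hyperband tries to reduce by training many configurations for only a few epochs and continuing with the most promising ones. The dictionary below simply copies the values from the cell above.

```python
import itertools

# Values copied from the hp.Choice calls in model_builder.
search_space = {
    "l2_regularizer": [1e-2, 1e-3, 1e-4],
    "units": [32, 64, 128, 256, 512],
    "learning_rate": [1e-2, 5e-3, 1e-3, 5e-4, 1e-4],
}

# Every possible combination of the three hyperparameters.
combinations = list(itertools.product(*search_space.values()))
print(len(combinations))  # 75
print(combinations[0])    # (0.01, 32, 0.01)
```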

In [None]:

# Search for the best hyperparameters using hyperband
tuner = kt.Hyperband(model_builder,
                     objective = 'val_loss', 
                     max_epochs = 10)   

class ClearTrainingOutput(tf.keras.callbacks.Callback):
  def on_train_end(*args, **kwargs):
    IPython.display.clear_output(wait = True)

tuner.search(train_ds, epochs = 10, validation_data = valid_ds, callbacks = [ClearTrainingOutput()])

# Get the optimal hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials = 1)[0]

print(f"""
Units: {best_hps.get('units')} \n
Optimal learning rate for the optimizer: {best_hps.get('learning_rate')} \n
L2 regularization factor: {best_hps.get('l2_regularizer')}
""")

#Transfer Learning


In [None]:
# Import from tensorflow.keras so that this section uses the same Keras as the rest of the notebook
from tensorflow.keras.applications.nasnet import NASNetMobile, preprocess_input
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.layers import Dense, Input, Dropout, Concatenate, GlobalMaxPooling2D, GlobalAveragePooling2D
from tensorflow.keras.layers import Flatten
from tensorflow.keras.losses import binary_crossentropy
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

In [None]:
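def get_model_classif_nasnet():
    inputs = Input((96, 96, 3))
    # Pre-trained NASNetMobile without its classification head
    base_model = NASNetMobile(include_top=False, input_shape=(96, 96, 3))#, weights=None
    # Apply the NASNet preprocessing inside the model, since the input pipeline
    # feeds raw 0-255 pixel values (preprocess_input was imported but unused)
    x = preprocess_input(inputs)
    x = base_model(x)
    # Combine max-pooled, average-pooled and flattened features
    out1 = GlobalMaxPooling2D()(x)
    out2 = GlobalAveragePooling2D()(x)
    out3 = Flatten()(x)
    out = Concatenate(axis=-1)([out1, out2, out3])
    out = Dropout(0.5)(out)
    out = Dense(1, activation="sigmoid", name="3_")(out)
    model = Model(inputs, out)
    model.compile(optimizer=Adam(0.0001), loss=binary_crossentropy, metrics=['acc'])
    model.summary()

    return model
model = get_model_classif_nasnet()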

In [None]:
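# train_ds and valid_ds are already batched (BATCH_SIZE = 32 in In [7]),
# so no batch size is passed to fit().
root_path = '/content/gdrive/My Drive/Patch_Camelyon_mdls/'
h5_path = "model_nasnet.h5"
checkpoint_path = os.path.join(root_path, h5_path)
checkpoint = ModelCheckpoint(checkpoint_path, monitor='val_acc', verbose=1, save_best_only=True, mode='max')

# Two consecutive training runs that use the same checkpoint callback
history = model.fit(
    train_ds,
    validation_data = valid_ds,
    epochs=2, verbose=1,
    callbacks=[checkpoint])
history = model.fit(
    train_ds,
    validation_data = valid_ds,
    epochs=6, verbose=1,
    callbacks=[checkpoint])

# Reload the best weights from the path the checkpoint was saved to
model.load_weights(checkpoint_path)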