### Transfer Learning
Transfer Learning refers to taking an already pre-trained model and modifying it to suit a different, custom task.

The base model is frozen, its last (top) few layers, called the 'head' of the model, are replaced with new ones, and only this new head is trained on the custom dataset.

- Large & different dataset: train the whole model
- Large & similar dataset: do Fine Tuning
- Small & different dataset: do Fine Tuning
- Small & similar dataset: Transfer Learning
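
The four rules of thumb above can be written as a small lookup table. This is only an illustrative sketch: the name `choose_strategy` and the string labels are hypothetical and not part of any library.

```python
# Hypothetical helper that encodes the four rules of thumb listed above.
STRATEGIES = {
    ("large", "different"): "train the whole model",
    ("large", "similar"): "fine tuning",
    ("small", "different"): "fine tuning",
    ("small", "similar"): "transfer learning",
}


def choose_strategy(dataset_size: str, similarity: str) -> str:
    """Return the suggested strategy for a dataset size and similarity."""
    key = (dataset_size.lower(), similarity.lower())
    if key not in STRATEGIES:
        raise ValueError(f"Unknown combination: {key}")
    return STRATEGIES[key]


print(choose_strategy("small", "similar"))  # transfer learning
```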

The base model's lower layers capture low-level features (such as edges and textures), which are mostly general and transfer well between tasks.

### Fine Tuning
Fine Tuning is used when the dataset is larger: the base model is not entirely frozen, which allows the model to learn information about the new task.
- Do not fine tune the whole network; unfreezing only a few top layers is enough. Fine Tuning is meant to adapt that specific part of the network to our dataset.
- Do it only after the transfer learning step is completed. Otherwise the untrained custom head produces large gradients that disrupt the pre-trained weights of the unfrozen base-model layers.

In [1]:
# Run this cell to download the dataset - zip file (66,999 KB) will be downloaded into a folder 'dataset'

import requests, zipfile
from pathlib import Path

dataset_url = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'

Path('./dataset').mkdir(exist_ok=True, parents=True)

if (Path('./dataset/cats_and_dogs_filtered').exists()):
    print('Dataset available and extracted.')
else:
    print('Dataset not available.\nDownloading...')

    with open(Path('./dataset/cats_and_dogs_filtered.zip'), mode='wb') as f:
        f.write(requests.get(dataset_url).content)
        
    print('Extracting...')
    with zipfile.ZipFile(file=Path('./dataset/cats_and_dogs_filtered.zip'), mode='r') as f:
        f.extractall(Path('./dataset/'))

if (Path('./dataset/cats_and_dogs_filtered.zip').exists()):    
    Path('./dataset/cats_and_dogs_filtered.zip').unlink()
    print('Deleted .zip file')

print('OK GO JER')

Dataset available and extracted.
OK GO JER


In [2]:
import numpy as np
import matplotlib.pyplot as plt

from tqdm.notebook import tqdm  # tqdm_notebook is deprecated in favour of tqdm.notebook.tqdm
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [3]:
train_dir = Path('./dataset/cats_and_dogs_filtered/train')
validation_dir = Path('./dataset/cats_and_dogs_filtered/validation')

#### Building the Model
We load a pre-trained model - MobileNetV2

In [4]:
from tensorflow.keras.applications import MobileNetV2

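# MobileNetV2's pre-trained ImageNet weights support (96, 96), (128, 128), (160, 160), (192, 192), (224, 224)
IMG_SHAPE = (128, 128, 3)
base_model = MobileNetV2(input_shape=IMG_SHAPE, include_top=False, weights='imagenet')

# Freeze the base model so that only the custom head is trained in the transfer learning step
base_model.trainable = False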

base_model.summary()

Model: "mobilenetv2_1.00_128"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 128, 128, 3) 0                                            
__________________________________________________________________________________________________
Conv1_pad (ZeroPadding2D)       (None, 129, 129, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
Conv1 (Conv2D)                  (None, 64, 64, 32)   864         Conv1_pad[0][0]                  
__________________________________________________________________________________________________
bn_Conv1 (BatchNormalization)   (None, 64, 64, 32)   128         Conv1[0][0]                      
_______________________________________________________________________________

In [5]:
base_model.output

<tf.Tensor 'out_relu/Identity:0' shape=(None, 4, 4, 1280) dtype=float32>
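
The `(None, 4, 4, 1280)` shape above follows from the architecture: MobileNetV2 halves the spatial resolution five times (an overall stride of 32) and, at width multiplier 1.0, ends with 1280 channels. A minimal sketch of that arithmetic, assuming a square input whose side is a multiple of 32 (`feature_map_shape` is a hypothetical helper, not a Keras function):

```python
def feature_map_shape(input_size: int, total_stride: int = 32, channels: int = 1280):
    """Shape (height, width, channels) of the final feature map for a square input."""
    side = input_size // total_stride
    return (side, side, channels)


# The five input sizes listed in the cell above
for size in (96, 128, 160, 192, 224):
    print(size, feature_map_shape(size))
```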

#### Defining the Custom Head

In [6]:
from tensorflow.keras.layers import GlobalAveragePooling2D

global_average_layer = GlobalAveragePooling2D()(base_model.output)
global_average_layer

<tf.Tensor 'global_average_pooling2d/Identity:0' shape=(None, 1280) dtype=float32>
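
Global average pooling reduces each of the 1280 feature maps to a single number by averaging over its 4 x 4 spatial positions, which is why the shape goes from `(None, 4, 4, 1280)` to `(None, 1280)`. The NumPy sketch below reproduces the operation on random stand-in data; the array `feature_maps` is an assumption for illustration, not real model output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for base_model.output on a batch of 2 images: (batch, height, width, channels)
feature_maps = rng.random((2, 4, 4, 1280))

# Global average pooling: take the mean over the two spatial axes
pooled = feature_maps.mean(axis=(1, 2))

print(pooled.shape)  # (2, 1280)
```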

In [7]:
from tensorflow.keras.layers import Dense

# binary classification (2 classes): a single sigmoid unit outputs the probability of one class
prediction_layer = Dense(units=1, activation='sigmoid')(global_average_layer)
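
A single sigmoid unit is enough for two classes because its output can be read as the probability of one class, with the other class taking the remainder. The sketch below shows how a raw score is squashed and thresholded; the helper names and the 0.5 threshold are assumptions for illustration, not something the notebook sets.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function, maps any real score into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_label(z: float, threshold: float = 0.5) -> int:
    """Return 1 if the sigmoid probability reaches the threshold, else 0."""
    return int(sigmoid(z) >= threshold)


for score in (-2.0, 0.0, 2.0):
    print(score, round(sigmoid(score), 3), predict_label(score))
```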

In [8]:
from tensorflow.keras.models import Model

model = Model(inputs=base_model.input, outputs=prediction_layer)
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 128, 128, 3) 0                                            
__________________________________________________________________________________________________
Conv1_pad (ZeroPadding2D)       (None, 129, 129, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
Conv1 (Conv2D)                  (None, 64, 64, 32)   864         Conv1_pad[0][0]                  
__________________________________________________________________________________________________
bn_Conv1 (BatchNormalization)   (None, 64, 64, 32)   128         Conv1[0][0]                      
______________________________________________________________________________________________

#### Compiling the Model
The two networks, the base model MobileNetV2 and the custom head prediction layer, were combined into one model above; we now compile it.

In [9]:
from tensorflow.keras.optimizers import RMSprop

# manually set the learning rate (`learning_rate` replaces the deprecated `lr` argument)
model.compile(optimizer=RMSprop(learning_rate=0.0001), loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 128, 128, 3) 0                                            
__________________________________________________________________________________________________
Conv1_pad (ZeroPadding2D)       (None, 129, 129, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
Conv1 (Conv2D)                  (None, 64, 64, 32)   864         Conv1_pad[0][0]                  
__________________________________________________________________________________________________
bn_Conv1 (BatchNormalization)   (None, 64, 64, 32)   128         Conv1[0][0]                      
______________________________________________________________________________________________
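
The `binary_crossentropy` loss used in the compile step penalises confident wrong predictions heavily. A plain-Python sketch of the formula, averaged over a batch, is given below; the clipping constant `eps` is an assumption added here to avoid `log(0)`, and the sample values are made up.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]
good = [0.9, 0.1, 0.8, 0.2]  # mostly correct, confident predictions
bad = [0.1, 0.9, 0.2, 0.8]   # mostly wrong, confident predictions

print(binary_crossentropy(labels, good))
print(binary_crossentropy(labels, bad))
```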

#### Preprocess data

In [10]:
data_gen_train = ImageDataGenerator(rescale=1/255.0)
data_gen_valid = ImageDataGenerator(rescale=1/255.0)

train_generator = data_gen_train.flow_from_directory(train_dir, target_size=(128, 128), batch_size=128, class_mode='binary')
valid_generator = data_gen_valid.flow_from_directory(validation_dir, target_size=(128, 128), batch_size=128, class_mode='binary')

from tensorflow.keras.callbacks import EarlyStopping

callbacks = [
    EarlyStopping(patience=3),
]
model.fit(train_generator, epochs=25, validation_data=valid_generator, callbacks=callbacks)

Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
Train for 16 steps, validate for 8 steps
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25


<tensorflow.python.keras.callbacks.History at 0x2a2d3faef98>
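
The step counts in the log ("Train for 16 steps, validate for 8 steps") follow from the dataset sizes and the batch size of 128: the last, smaller batch still counts as a step. A minimal sketch of the arithmetic (`steps_per_epoch` is a hypothetical helper name):

```python
import math


def steps_per_epoch(num_images: int, batch_size: int) -> int:
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)


print(steps_per_epoch(2000, 128))  # 16
print(steps_per_epoch(1000, 128))  # 8
```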

The validation loss seems to be increasing, indicating overfitting. The `EarlyStopping(patience=3)` callback therefore halted training after 5 of the 25 epochs.
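
To make the role of `patience=3` concrete, here is a simplified simulation of the early stopping rule: training stops once the monitored validation loss has failed to improve for 3 consecutive epochs. The loss values are made up for illustration (the notebook output does not show them), and the function is a sketch of the idea rather than the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses: best at epoch 2, then steadily worse
hypothetical_losses = [0.30, 0.25, 0.27, 0.29, 0.33, 0.40]
print(early_stopping_epoch(hypothetical_losses))  # 5
```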

In [11]:
valid_loss, valid_accuracy = model.evaluate(valid_generator)



In [12]:
print(f'Validation accuracy: {valid_accuracy}')

Validation accuracy: 0.9750000238418579


### Fine Tuning

In [13]:
# Unfreeze the whole base model first; the lower layers are frozen again in the next cell
base_model.trainable = True

print(f'No. of layers in the base model: {len(base_model.layers)}')

No. of layers in the base model: 155


In [14]:
# Fine tune the layers from index 100 onwards (indices 100-154)
for layer in base_model.layers[:100]:
    layer.trainable = False # Freeze layers 0-99
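
The slice `[:100]` covers indices 0-99, so with 155 layers this leaves 55 layers (indices 100-154) trainable. The sketch below checks that bookkeeping with a stand-in class, so it runs without TensorFlow; `FakeLayer` and `FINE_TUNE_AT` are hypothetical names used only for illustration.

```python
class FakeLayer:
    """Stand-in for a Keras layer; it only carries an index and a `trainable` flag."""

    def __init__(self, index: int):
        self.index = index
        self.trainable = True


layers = [FakeLayer(i) for i in range(155)]  # same count as the base model

FINE_TUNE_AT = 100
for layer in layers[:FINE_TUNE_AT]:
    layer.trainable = False  # freeze layers 0-99

frozen = [layer.index for layer in layers if not layer.trainable]
trainable = [layer.index for layer in layers if layer.trainable]

print(len(frozen), len(trainable))   # 100 55
print(trainable[0], trainable[-1])   # 100 154
```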

In [15]:
# Recompile the model so that the changes to `trainable` take effect
model.compile(optimizer=RMSprop(learning_rate=0.0001), loss='binary_crossentropy', metrics=['accuracy'])

callbacks = [
    EarlyStopping(patience=3),
]
model.fit(train_generator, epochs=25, validation_data=valid_generator, callbacks=callbacks)

Train for 16 steps, validate for 8 steps
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25


<tensorflow.python.keras.callbacks.History at 0x2a2d2a28b70>

In [16]:
valid_loss, valid_accuracy = model.evaluate(valid_generator)

print(f'Validation accuracy after fine tuning: {valid_accuracy}')

Validation accuracy after fine tuning: 0.9700000286102295


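The model probably overfitted on the dataset: validation accuracy dropped slightly after fine tuning, from 0.975 to 0.970. Fine tuning should be done on larger datasets, or on datasets that differ more from the data the base model was trained on.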