# Lecture Notes: Transfer Learning

## I. Introduction and Definition

Transfer Learning is a machine learning technique in which a model trained on one dataset is reused as the starting point for a task on a second, different dataset.

*   **Core Concept:** Transfer Learning focuses on **storing knowledge** gained while solving a previous problem and applying that knowledge to solve a **different but related problem**.
*   **Significance:** This method has become very popular over the last five years. Leading figures in deep learning have suggested that Transfer Learning **"will be the next big thing"** and expect it to be the next driver of machine learning success in industry after supervised learning.
*   **Inspiration:** Transfer Learning is inspired by real-life learning, where knowledge from one domain is applied to another (e.g., balancing skills learned on a bicycle making it easier to learn a motorcycle).

## II. Motivation: Why Use Transfer Learning?

Training custom Deep Learning (DL) models presents significant challenges, making Transfer Learning an appealing solution.

1.  **Data Hunger:** DL models are inherently **data hungry**, requiring a large amount of labeled data (e.g., 10,000 images) to train a custom model.
2.  **Cost and Effort of Labeling:** Collecting a large number of images is feasible (e.g., by scraping), but labeling them (e.g., marking whether each image contains a cat or a dog) is manual work that is slow and **costly**.
3.  **Time Required for Training:** Training DL models, especially on big datasets, takes a **lot of time**.

**Solution:** If training a custom DL model is difficult due to these issues, the common solution is to use **pre-trained models**.

## III. Pre-Trained Models and ImageNet

The foundation of Transfer Learning relies on models already trained on massive datasets.

*   **Definition:** A pre-trained model is a Convolutional Neural Network (CNN) model that has been trained on a large dataset.
*   **ImageNet:** A very large dataset of everyday objects and animals used for training these models. The version commonly used for pre-training contains approximately **1.4 million images** belonging to **1000 classes** (including different dog and cat breeds).
*   **Competition:** ImageNet hosted an annual competition called ILSVRC (ImageNet Large Scale Visual Recognition Challenge). Top-performing models from this competition (like VGG, ResNet, and InceptionNet) are highly effective and can be reused in other projects.
*   **The Problem Transfer Learning Solves:** While pre-trained models are useful, they may not know the specific classes required for a new project (e.g., ImageNet's 1000 classes may not include "phone" or "tablet" needed for a specific classification task). Transfer Learning addresses how to leverage the pre-trained knowledge even when the new task involves unknown classes.

## IV. How Knowledge Transfer Works in CNNs

Transfer learning applies the knowledge learned from a previous task to a new task.

*   **CNN Architecture Breakdown:** A pre-trained CNN model (like VGG16) is typically divided into two main parts:
    1.  **Convolutional Base (Conv Base):** This part consists of the convolutional layers. Its function is to **extract features** and spatial information from the image pixels.
    2.  **Fully Connected Layers (FC Layers):** This part consists of the dense layers and the output layer. Its function is **classification**.
*   **Feature Extraction Progression:** The early convolutional layers extract **primitive features** (like edges). As the network progresses to deeper layers, it extracts more complex patterns and features (like shapes).
*   **Rationale for Transfer:** Since the primitive, general features of real-world objects are largely shared across tasks, the knowledge learned by the Conv Base on a large dataset like ImageNet is broadly useful. Therefore, there is **no need to re-invent the wheel** by retraining those basic feature extraction layers. The core idea is to reuse the existing knowledge (the learned weights in the Conv Base) and apply it to the new task.
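
To make "primitive features" concrete, the following is a minimal pure-Python sketch of what a single convolutional filter computes. The helper `conv2d_valid`, the toy image, and the Sobel-style kernel are illustrative assumptions and are not taken from VGG16; real early-layer filters are learned from data, although they frequently end up behaving like edge detectors of this kind.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists), stride 1, no padding.

    Like CNN layers, this computes a cross-correlation (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# Toy 5x6 grayscale image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written vertical-edge kernel (hypothetical stand-in for a learned filter).
vertical_edge = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0, 4, 4, 0]: the filter responds only at the edge
```

The response is zero over flat regions and large where the brightness changes, which is the sense in which an early layer "detects edges". Because edges occur in almost every natural image, filters like this transfer well between tasks.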

## V. Two Techniques for Transfer Learning

Transfer Learning is typically applied using one of two primary methods: Feature Extraction or Fine-Tuning.

### 1. Feature Extraction

This technique utilizes the pre-trained Conv Base as a fixed feature extractor.

*   **Process:**
    *   The original FC/Dense layers are removed.
    *   New Dense layers and a new output layer (e.g., a single neuron with sigmoid for binary classification) are attached.
    *   The **Conv Base is frozen** by setting its `trainable` value to `False`. The weights in the Conv Base do not change during training.
    *   Only the newly added FC layers are trained on the specific data.
*   **Use Case:** Feature Extraction is suitable when the new image classification task (e.g., Cat/Dog classification) has labels that are **similar** to the data the model was originally trained on (e.g., ImageNet, which already includes many animals).

### 2. Fine-Tuning

This technique allows for partial retraining of the pre-trained model.

*   **Process:**
    *   The FC layers are replaced, similar to Feature Extraction.
    *   Crucially, **the last few convolutional layers** of the Conv Base are **unfrozen** (`trainable=True`) and retrained, while the very early layers remain frozen.
    *   The entire trainable path (unfrozen Conv layers + new Dense layers) is then trained on the new data.
*   **Use Case:** Fine-Tuning is necessary when the new problem is **very different** from the original dataset (e.g., classifying phones versus tablets, if ImageNet contained no relevant data for these specific objects). Since the knowledge of the later Conv layers might be too specific to the old task, unfreezing them allows the model to better adapt.
*   **Implementation Note:** When performing Fine-Tuning, it is advisable to use a very low learning rate with an optimizer like RMSprop.
---

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Feature Extraction

In [4]:
!pip install tensorflow


In [2]:
import tensorflow
from tensorflow import keras
from keras import Sequential
from keras.layers import Dense,Flatten
from keras.applications.vgg16 import VGG16

In [3]:
conv_base = VGG16(
    weights='imagenet',
    include_top = False,
    input_shape=(150,150,3)
)

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58889256/58889256 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


In [4]:
conv_base.summary()

In [5]:
model = Sequential()

model.add(conv_base)
model.add(Flatten())
model.add(Dense(256,activation='relu'))
model.add(Dense(1,activation='sigmoid'))

In [6]:
model.summary()

In [7]:
conv_base.trainable = False

In [8]:
model.summary()
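
The effect of freezing can be checked by hand. The sketch below recomputes the parameter counts from the standard VGG16 layout (13 convolutions with 3x3 kernels, five 2x2 max-pools) and the head defined above; the helper names are my own. If the architecture is the standard one, the trainable and non-trainable totals in the second `model.summary()` should match these numbers.

```python
# (in_channels, out_channels) of the 13 conv layers in VGG16, all 3x3 kernels.
VGG16_CONVS = [
    (3, 64), (64, 64),                       # block1
    (64, 128), (128, 128),                   # block2
    (128, 256), (256, 256), (256, 256),      # block3
    (256, 512), (512, 512), (512, 512),      # block4
    (512, 512), (512, 512), (512, 512),      # block5
]


def conv_params(c_in, c_out, k=3):
    """Weights (k*k*c_in*c_out) plus one bias per output channel."""
    return k * k * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights (n_in*n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


conv_base_params = sum(conv_params(i, o) for i, o in VGG16_CONVS)

# Each of the five max-pools halves the spatial size (rounding down).
side = 150
for _ in range(5):
    side //= 2
flat_features = side * side * 512            # size of the Flatten() output

head_params = dense_params(flat_features, 256) + dense_params(256, 1)

print("conv base (frozen, non-trainable):", conv_base_params)   # 14714688
print("flattened features:", flat_features)                     # 8192
print("new head (trainable):", head_params)                     # 2097665
print("total:", conv_base_params + head_params)                 # 16812353
```

Only about 2.1 million of the roughly 16.8 million parameters are updated during feature extraction, which is why this approach trains comparatively quickly.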

In [9]:
train_ds = keras.utils.image_dataset_from_directory(
    directory = '/content/drive/MyDrive/dag_cot/training_set/training_set',
    labels='inferred',
    label_mode = 'int',
    batch_size=32,
    image_size=(150,150)
)

validation_ds = keras.utils.image_dataset_from_directory(
    directory = '/content/drive/MyDrive/dag_cot/test_set/test_set',
    labels='inferred',
    label_mode = 'int',
    batch_size=32,
    image_size=(150,150)
)

Found 8006 files belonging to 2 classes.
Found 2023 files belonging to 2 classes.
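
As a quick sanity check on these file counts, the number of batches per epoch follows from the batch size. This is a small sketch with a hypothetical helper name; it assumes the final partial batch is kept, which is the default behaviour when no remainder is dropped.

```python
import math


def steps_per_epoch(num_files, batch_size):
    """Number of batches needed to see every file once (last batch may be smaller)."""
    return math.ceil(num_files / batch_size)


train_steps = steps_per_epoch(8006, 32)
val_steps = steps_per_epoch(2023, 32)
last_train_batch = 8006 - (train_steps - 1) * 32

print(train_steps)        # 251
print(val_steps)          # 64
print(last_train_batch)   # 6 images in the final training batch
```

The value 251 is the step total that the Keras progress bar displays during `model.fit`.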


In [10]:
# Normalize
def process(image,label):
    image = tensorflow.cast(image/255. ,tensorflow.float32)
    return image,label

train_ds = train_ds.map(process)
validation_ds = validation_ds.map(process)

In [11]:
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
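
For intuition about the `binary_crossentropy` loss chosen here, the following is a plain-Python sketch of the formula for 0/1 labels and sigmoid outputs. The function name and the small `eps` clip are my own assumptions for numerical safety; this is not the Keras implementation itself.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0]
confident_right = binary_crossentropy(labels, [0.9, 0.1])
unsure = binary_crossentropy(labels, [0.5, 0.5])
confident_wrong = binary_crossentropy(labels, [0.1, 0.9])

print(round(confident_right, 4))  # 0.1054
print(round(unsure, 4))           # 0.6931 (= ln 2, the loss of always guessing 0.5)
print(round(confident_wrong, 4))  # 2.3026
```

A loss near 0.69 therefore corresponds to predictions no better than always outputting 0.5, and confident mistakes are penalised much more heavily than hesitant ones.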

In [None]:
history = model.fit(train_ds,epochs=10,validation_data=validation_ds)

Epoch 1/10
73/251 ━━━━━━━━━━━━━━━━━━━━ 10:19 3s/step - accuracy: 0.6865 - loss: 0.9580

In [None]:
import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'],color='red',label='train')
plt.plot(history.history['val_accuracy'],color='blue',label='validation')
plt.legend()
plt.show()

## Data Augmentation in Feature Extraction

In [None]:
from keras.layers import Dense,Flatten,Conv2D,MaxPooling2D,BatchNormalization,Dropout
from tensorflow.keras import layers, models

data_augmentation = Sequential([
    layers.RandomFlip("horizontal",input_shape=(150,150,3)),    # random horizontal flips
    layers.RandomRotation(0.1),         # small rotations
    layers.RandomZoom(0.1),             # zoom
])
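
To show what these augmentation settings mean, here is a small pure-Python sketch. Both helpers are hypothetical illustrations, not Keras code: the first mimics a random left-right mirror on a tiny image, and the second converts a `RandomRotation` factor (documented by Keras as a fraction of a full turn, 2π) into degrees.

```python
import random


def random_horizontal_flip(image, rng):
    """Mirror each row left-to-right with probability 0.5; otherwise return a copy."""
    if rng.random() < 0.5:
        return [list(reversed(row)) for row in image]
    return [list(row) for row in image]


def rotation_range_degrees(factor):
    """A factor f allows rotations between -f and +f of a full 360-degree turn."""
    return (-factor * 360, factor * 360)


tiny_image = [[1, 2, 3],
              [4, 5, 6]]
rng = random.Random(0)  # seeded only so repeated runs behave the same
print(random_horizontal_flip(tiny_image, rng))

low, high = rotation_range_degrees(0.1)
print(low, high)  # roughly -36 and +36 degrees
```

Each epoch the network therefore sees slightly different versions of the same photographs, which tends to reduce overfitting on a dataset of this size.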

In [None]:
train_ds = keras.utils.image_dataset_from_directory(
    directory = '/content/drive/MyDrive/dag_cot/training_set/training_set',
    labels='inferred',
    label_mode = 'int',
    batch_size=32,
    image_size=(150,150)
)

validation_ds = keras.utils.image_dataset_from_directory(
    directory = '/content/drive/MyDrive/dag_cot/test_set/test_set',
    labels='inferred',
    label_mode = 'int',
    batch_size=32,
    image_size=(150,150)
)

In [None]:
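# Start a fresh model; adding to the earlier `model` would stack these layers after its sigmoid output
model = Sequential()

model.add(data_augmentation)          # random transforms, active only during training
model.add(layers.Rescaling(1./255))   # scale pixels to [0, 1] inside the model

model.add(conv_base)
model.add(Flatten())
model.add(Dense(256,activation='relu'))
model.add(Dense(1,activation='sigmoid'))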

In [None]:
conv_base.trainable = False

In [None]:
model.summary()

In [None]:
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])

In [None]:
history = model.fit(train_ds,epochs=10,validation_data=validation_ds)

In [None]:
import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'],color='red',label='train')
plt.plot(history.history['val_accuracy'],color='blue',label='validation')
plt.legend()
plt.show()

# Fine-Tuning

In [None]:
# Fine-tune only the block5 convolutional layers and the dense layers

In [None]:
conv_base.trainable = True

set_trainable = False

for layer in conv_base.layers:
  if layer.name == 'block5_conv1':
    set_trainable = True
  if set_trainable:
    layer.trainable = True
  else:
    layer.trainable = False

for layer in conv_base.layers:
  print(layer.name,layer.trainable)
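
The freezing rule in this loop can be traced without loading the network. The sketch below replays the same logic over the standard VGG16 layer names and channel sizes (the input layer is omitted because its name varies between Keras versions) and tallies how many convolutional parameters end up trainable; the variable names are my own.

```python
# (layer name, in_channels, out_channels); pooling layers have no weights.
LAYERS = [
    ("block1_conv1", 3, 64), ("block1_conv2", 64, 64), ("block1_pool", None, None),
    ("block2_conv1", 64, 128), ("block2_conv2", 128, 128), ("block2_pool", None, None),
    ("block3_conv1", 128, 256), ("block3_conv2", 256, 256), ("block3_conv3", 256, 256),
    ("block3_pool", None, None),
    ("block4_conv1", 256, 512), ("block4_conv2", 512, 512), ("block4_conv3", 512, 512),
    ("block4_pool", None, None),
    ("block5_conv1", 512, 512), ("block5_conv2", 512, 512), ("block5_conv3", 512, 512),
    ("block5_pool", None, None),
]


def n_params(c_in, c_out):
    """3x3 conv weights plus biases; 0 for pooling layers."""
    if c_in is None:
        return 0
    return 3 * 3 * c_in * c_out + c_out


# Same rule as the notebook loop: everything from block5_conv1 onward is trainable.
set_trainable = False
flags = {}
for name, _, _ in LAYERS:
    if name == "block5_conv1":
        set_trainable = True
    flags[name] = set_trainable

trainable_params = sum(n_params(i, o) for name, i, o in LAYERS if flags[name])
frozen_params = sum(n_params(i, o) for name, i, o in LAYERS if not flags[name])

print(trainable_params)  # 7079424 (the three block5 convolutions)
print(frozen_params)     # 7635264 (blocks 1 to 4)
```

Fine-tuning block5 thus roughly quadruples the number of trainable parameters compared with training the new head alone, which is one reason a small learning rate is advisable.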

In [None]:
conv_base.summary()

In [None]:
model = Sequential()

model.add(conv_base)
model.add(Flatten())
model.add(Dense(256,activation='relu'))
model.add(Dense(1,activation='sigmoid'))

In [None]:
train_ds = keras.utils.image_dataset_from_directory(
    directory = '/content/drive/MyDrive/dag_cot/training_set/training_set',
    labels='inferred',
    label_mode = 'int',
    batch_size=32,
    image_size=(150,150)
)

validation_ds = keras.utils.image_dataset_from_directory(
    directory = '/content/drive/MyDrive/dag_cot/test_set/test_set',
    labels='inferred',
    label_mode = 'int',
    batch_size=32,
    image_size=(150,150)
)

In [None]:
def process(image,label):
    image = tensorflow.cast(image/255. ,tensorflow.float32)
    return image,label

train_ds = train_ds.map(process)
validation_ds = validation_ds.map(process)

In [None]:
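# Low learning rate so the pre-trained block5 weights are adjusted gently (1e-5 is an example value)
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-5),loss='binary_crossentropy',metrics=['accuracy'])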

In [None]:
history = model.fit(train_ds,epochs=10,validation_data=validation_ds)

In [None]:
import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'],color='red',label='train')
plt.plot(history.history['val_accuracy'],color='blue',label='validation')
plt.legend()
plt.show()

## Fine-Tuning with Data Augmentation

In [None]:
from keras.layers import Dense,Flatten,Conv2D,MaxPooling2D,BatchNormalization,Dropout
from tensorflow.keras import layers, models

data_augmentation = Sequential([
    layers.RandomFlip("horizontal",input_shape=(150,150,3)),    # random horizontal flips
    layers.RandomRotation(0.1),         # small rotations
    layers.RandomZoom(0.1),             # zoom
])

In [None]:
train_ds = keras.utils.image_dataset_from_directory(
    directory = '/content/drive/MyDrive/dag_cot/training_set/training_set',
    labels='inferred',
    label_mode = 'int',
    batch_size=32,
    image_size=(150,150)
)

validation_ds = keras.utils.image_dataset_from_directory(
    directory = '/content/drive/MyDrive/dag_cot/test_set/test_set',
    labels='inferred',
    label_mode = 'int',
    batch_size=32,
    image_size=(150,150)
)

In [None]:
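# Start a fresh model; adding to the earlier `model` would stack these layers after its sigmoid output
model = Sequential()

model.add(data_augmentation)          # random transforms, active only during training
model.add(layers.Rescaling(1./255))   # scale pixels to [0, 1] inside the model

model.add(conv_base)
model.add(Flatten())
model.add(Dense(256,activation='relu'))
model.add(Dense(1,activation='sigmoid'))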

In [None]:
conv_base.trainable = True

set_trainable = False

for layer in conv_base.layers:
  if layer.name == 'block5_conv1':
    set_trainable = True
  if set_trainable:
    layer.trainable = True
  else:
    layer.trainable = False

for layer in conv_base.layers:
  print(layer.name,layer.trainable)

In [None]:
model.summary()

In [None]:
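# Low learning rate so the pre-trained block5 weights are adjusted gently (1e-5 is an example value)
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-5),loss='binary_crossentropy',metrics=['accuracy'])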

In [None]:
history = model.fit(train_ds,epochs=10,validation_data=validation_ds)

In [None]:
import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'],color='red',label='train')
plt.plot(history.history['val_accuracy'],color='blue',label='validation')
plt.legend()
plt.show()