# <font color="#418FDE" size="6.5" uppercase>**Transfer Learning**</font>

>Last update: 20260126.
    
By the end of this lecture, you will be able to:
- Load and adapt pretrained CNN backbones from tf.keras.applications for new classification tasks. 
- Configure layer freezing and unfreezing strategies to balance feature reuse and task-specific learning. 
- Implement a staged training schedule for fine-tuning and evaluate its impact on performance. 


## **1. Working With Pretrained Models**

### **1.1. Using tf.keras.applications**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_06/Lecture_B/image_01_01.jpg?v=1769414305" width="250">



>* Use ready-made pretrained CNNs from keras applications
>* Reuse rich learned features instead of training from scratch

>* Choose weights, heads, and input size options
>* Use backbone features, add custom task-specific layers

>* Pretrained backbones plug into larger models easily
>* Modularity enables multi-task heads and rapid experimentation



In [None]:
#@title Python Code - Using tf.keras.applications

# This script shows basic transfer learning usage.
# We use tf.keras.applications for pretrained models.
# All steps are small and beginner friendly.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras modules.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Load CIFAR10 dataset from Keras datasets.
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Select a small subset for quick demonstration.
train_samples = 1000
test_samples = 200
x_train = x_train[:train_samples]
y_train = y_train[:train_samples]

# Slice test data to keep runtime small.
x_test = x_test[:test_samples]
y_test = y_test[:test_samples]

# Convert to float32 but keep the 0-255 pixel range.
# MobileNetV2 preprocess_input expects 0-255 and rescales to [-1, 1] itself.
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")

# Define target image size for backbone.
img_height, img_width = 96, 96

# Resize images to match backbone expectations.
x_train_resized = tf.image.resize(x_train, (img_height, img_width))
x_test_resized = tf.image.resize(x_test, (img_height, img_width))

# Confirm resized shapes before building model.
print("Train batch shape:", x_train_resized.shape)
print("Test batch shape:", x_test_resized.shape)

# Choose number of target classes here.
num_classes = 10

# Create input layer matching resized images.
inputs = keras.Input(shape=(img_height, img_width, 3))

# Use preprocessing layer for MobileNetV2.
preprocess = keras.applications.mobilenet_v2.preprocess_input
x = layers.Lambda(preprocess)(inputs)

# Load MobileNetV2 backbone without top classifier.
base_model = keras.applications.MobileNetV2(
    input_shape=(img_height, img_width, 3),
    include_top=False,
    weights="imagenet",
)

# Freeze backbone weights for initial training.
base_model.trainable = False

# Pass preprocessed inputs through backbone.
features = base_model(x, training=False)

# Apply global pooling to reduce feature maps.
x = layers.GlobalAveragePooling2D()(features)

# Add small dense layer for better representation.
x = layers.Dense(64, activation="relu")(x)

# Add final classification layer for CIFAR10.
outputs = layers.Dense(num_classes, activation="softmax")(x)

# Build full transfer learning model.
model = keras.Model(inputs=inputs, outputs=outputs)

# Compile model with simple optimizer and loss.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train only new head while backbone is frozen.
history_frozen = model.fit(
    x_train_resized,
    y_train,
    epochs=2,
    batch_size=32,
    validation_split=0.2,
    verbose=0,
)

# Evaluate frozen model on small test subset.
loss_frozen, acc_frozen = model.evaluate(
    x_test_resized,
    y_test,
    verbose=0,
)

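# Unfreeze only the last 20 backbone layers for fine tuning.
# Mark the backbone trainable first, then re-freeze everything below them.
base_model.trainable = True
for layer in base_model.layers[:-20]:
    layer.trainable = False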

# Recompile with lower learning rate for safety.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Fine tune model briefly with unfrozen layers.
history_finetune = model.fit(
    x_train_resized,
    y_train,
    epochs=1,
    batch_size=32,
    validation_split=0.2,
    verbose=0,
)

# Evaluate fine tuned model on test subset.
loss_finetune, acc_finetune = model.evaluate(
    x_test_resized,
    y_test,
    verbose=0,
)

# Print concise summary of both evaluation stages.
print("Frozen backbone accuracy:", round(acc_frozen, 4))
print("Fine tuned accuracy:", round(acc_finetune, 4))
print("Backbone trainable layers:", sum(l.trainable for l in base_model.layers))




### **1.2. Input Preprocessing Essentials**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_06/Lecture_B/image_01_02.jpg?v=1769414364" width="250">



>* Match image size, channels, and scaling exactly
>* Mismatched preprocessing breaks features and hurts transfer

>* Models expect specific pixel ranges and normalization
>* Wrong scaling harms activations, stability, and accuracy

>* Match backbone preprocessing, then add domain tweaks
>* Keep the entire preprocessing pipeline consistent everywhere
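
As a rough illustration of why scaling must match, the sketch below reproduces in plain Python the arithmetic that MobileNetV2 preprocessing applies (divide by 127.5, subtract 1). The helper name `mobilenet_v2_scale` is hypothetical and exists only for this example; it is not part of Keras. It compares correctly scaled pixels with pixels that were already divided by 255 before preprocessing.

```python
# Hypothetical helper mirroring the MobileNetV2 scaling arithmetic:
# pixel values in [0, 255] are mapped to [-1, 1].
def mobilenet_v2_scale(pixel):
    return pixel / 127.5 - 1.0

# Correct usage: raw pixel values in the 0-255 range.
raw_pixels = [0.0, 127.5, 255.0]
correct = [mobilenet_v2_scale(p) for p in raw_pixels]

# Common mistake: pixels were already divided by 255, then scaled again.
prescaled = [p / 255.0 for p in raw_pixels]
double_scaled = [mobilenet_v2_scale(p) for p in prescaled]

# Compare the value ranges the backbone would actually receive.
print("Correct range:", min(correct), max(correct))
print("Double-scaled range:", min(double_scaled), max(double_scaled))
```

With double scaling, every pixel collapses into a narrow band just above -1, so the backbone sees almost no contrast between images.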



In [None]:
#@title Python Code - Input Preprocessing Essentials

# This script shows essential input preprocessing concepts.
# We compare raw and model specific preprocessing for images.
# Focus is on beginner friendly TensorFlow vision workflows.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import numpy as np
import tensorflow as tf

# Set deterministic seeds for reproducibility.
np.random.seed(42)
tf.random.set_seed(42)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Select a small pretrained backbone from applications.
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

# Load CIFAR10 dataset for simple image examples.
(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()

# Take a tiny subset to keep runtime small.
x_sample = x_train[:8]
y_sample = y_train[:8]

# Check basic shape to ensure expected dimensions.
print("Original sample shape:", x_sample.shape)

# Define target size expected by MobileNetV2.
target_height, target_width = 224, 224

# Create a simple resizing layer for images.
resize_layer = tf.keras.layers.Resizing(
    target_height, target_width
)

# Convert sample images to float32 for preprocessing.
x_sample_float = x_sample.astype("float32")

# Resize images to match backbone input size.
x_resized = resize_layer(x_sample_float)

# Confirm resized shape before further processing.
print("Resized sample shape:", x_resized.shape)

# Show raw pixel range before any normalization.
print("Raw min, max:", x_resized.numpy().min(), x_resized.numpy().max())

# Apply MobileNetV2 specific preprocessing function.
x_preprocessed = preprocess_input(x_resized)

# Show new pixel range after model preprocessing.
print("Preprocessed min, max:", x_preprocessed.numpy().min(), x_preprocessed.numpy().max())

# Build a minimal model using the pretrained backbone.
base_model = MobileNetV2(
    include_top=False,
    weights="imagenet",
    input_shape=(target_height, target_width, 3)
)

# Freeze backbone to focus on preprocessing effects.
base_model.trainable = False

# Add simple pooling and classification head.
inputs = tf.keras.Input(shape=(target_height, target_width, 3))
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# Compile model with simple configuration.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

# Run a forward pass with correctly preprocessed images.
correct_preds = model.predict(x_preprocessed, verbose=0)

# Run a forward pass with mismatched scaling (0-1 instead of -1 to 1).
# The head is untrained, so only the difference between outputs is meaningful.
raw_scaled = x_resized / 255.0
raw_preds = model.predict(raw_scaled, verbose=0)

# Compute average confidence for both preprocessing choices.
correct_conf = float(np.mean(np.max(correct_preds, axis=1)))
raw_conf = float(np.mean(np.max(raw_preds, axis=1)))

# Print a short comparison of average confidences.
print("Avg confidence with correct preprocessing:", round(correct_conf, 4))
print("Avg confidence with raw scaled images:", round(raw_conf, 4))

# Final line prints reminder about matching preprocessing expectations.
print("Remember to match size and normalization to backbone.")



### **1.3. Selecting CNN Backbones**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_06/Lecture_B/image_01_03.jpg?v=1769414408" width="250">



>* Choose backbones balancing accuracy, speed, and memory
>* Match model size to task needs and hardware

>* Check if pretrained features match your domain
>* Prefer backbones proven on similar specialized data

>* Match architecture details to training and deployment
>* Consider connections, efficiency, and input resolution trade-offs
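
To make the efficiency and resolution trade-off concrete, here is a minimal sketch that counts weights and multiply-adds for one standard convolution versus one depthwise separable convolution, the building block used by MobileNet-style backbones. The function names and the layer sizes (3x3 kernel, 64 input channels, 128 output channels) are assumptions chosen for illustration, and biases are ignored.

```python
# Weights in a standard convolution (bias ignored).
def standard_conv_params(kernel, c_in, c_out):
    return kernel * kernel * c_in * c_out

# Weights in a depthwise separable convolution:
# one k x k filter per input channel, then a 1 x 1 pointwise convolution.
def separable_conv_params(kernel, c_in, c_out):
    return kernel * kernel * c_in + c_in * c_out

# Multiply-adds for stride 1 with "same" padding:
# every weight is applied once per output position.
def multiply_adds(params, height, width):
    return params * height * width

# Illustrative layer sizes (assumed, not taken from a specific model).
kernel, c_in, c_out = 3, 64, 128
std = standard_conv_params(kernel, c_in, c_out)
sep = separable_conv_params(kernel, c_in, c_out)
print("Standard conv weights:", std)
print("Separable conv weights:", sep)

# Compute cost grows with the square of the input side length.
for side in (96, 160, 224):
    print("Input", side, "-> separable multiply-adds:", multiply_adds(sep, side, side))
```

The same reasoning explains why a smaller input resolution such as 96x96 is a common choice for quick experiments: the per-layer cost drops quadratically with the side length.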



## **2. Layer Freezing Techniques**

### **2.1. Controlling Layer Trainability**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_06/Lecture_B/image_02_01.jpg?v=1769414438" width="250">



>* Choose which pretrained layers can still learn
>* Freeze generic features, adapt deeper layers to task

>* Freezing layers cuts compute cost and stabilizes features
>* Selective unfreezing adds domain-specific details while preserving generality

>* Match freezing depth to task and data
>* Treat trainability as gradual, tuning layers for adaptation



In [None]:
#@title Python Code - Controlling Layer Trainability

# This script shows controlling layer trainability.
# We use a small pretrained backbone example.
# Focus on freezing and unfreezing convolutional layers.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras modules.
import tensorflow as tf
from tensorflow import keras

# Set deterministic seeds for reproducibility.
random.seed(7)
np.random.seed(7)
tf.random.set_seed(7)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Select device based on GPU availability.
physical_gpus = tf.config.list_physical_devices("GPU")
if physical_gpus:
    device_type = "GPU"
else:
    device_type = "CPU"

# Print which device type will be used.
print("Using device type:", device_type)

# Load CIFAR10 dataset from Keras datasets.
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Confirm dataset shapes before subsampling.
print("Train shape:", x_train.shape, y_train.shape)

# Select a small subset for quick training.
subset_size = 2000
x_train_small = x_train[:subset_size]
y_train_small = y_train[:subset_size]

# Normalize pixel values to range zero one.
x_train_small = x_train_small.astype("float32") / 255.0
x_test_small = x_test.astype("float32") / 255.0

# Convert labels to integer vectors.
y_train_small = y_train_small.flatten()
y_test_small = y_test.flatten()

# Define input shape for the model.
input_shape = (32, 32, 3)

# Create a small toy backbone that stands in for a pretrained one.
backbone_input = keras.Input(shape=input_shape)

# Add convolutional feature extractor layers.
x = keras.layers.Conv2D(32, (3, 3), activation="relu")(
    backbone_input
)

# Add a second convolutional block layer.
x = keras.layers.Conv2D(64, (3, 3), activation="relu")(
    x
)

# Add pooling to reduce spatial dimensions.
x = keras.layers.MaxPooling2D(pool_size=(2, 2))(
    x
)

# Add another convolutional block layer.
x = keras.layers.Conv2D(128, (3, 3), activation="relu")(
    x
)

# Add global average pooling for features.
features = keras.layers.GlobalAveragePooling2D()(x)

# Build the backbone model object.
backbone = keras.Model(backbone_input, features, name="toy_backbone")

# Pretend backbone is pretrained and freeze.
for layer in backbone.layers:
    layer.trainable = False

# Show how many layers are trainable now.
trainable_count = np.sum([layer.trainable for layer in backbone.layers])
print("Trainable layers after freeze:", int(trainable_count))

# Create new classification head for CIFAR10.
head_input = keras.Input(shape=input_shape)

# Pass images through the frozen backbone.
h = backbone(head_input, training=False)

# Add a small dense classification head.
h = keras.layers.Dense(64, activation="relu")(h)
outputs = keras.layers.Dense(10, activation="softmax")(h)

# Build the full transfer learning model.
model = keras.Model(head_input, outputs, name="transfer_model")

# Compile model with simple optimizer.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train only the new head while backbone frozen.
history_frozen = model.fit(
    x_train_small,
    y_train_small,
    epochs=2,
    batch_size=64,
    verbose=0,
    validation_split=0.1,
)

# Evaluate model performance with frozen backbone.
loss_frozen, acc_frozen = model.evaluate(
    x_test_small[:1000],
    y_test_small[:1000],
    verbose=0,
)

# Unfreeze last convolutional block for fine tuning.
for layer in backbone.layers[-3:]:
    layer.trainable = True

# Recompile with lower learning rate now.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train again with partially trainable backbone.
history_unfrozen = model.fit(
    x_train_small,
    y_train_small,
    epochs=2,
    batch_size=64,
    verbose=0,
    validation_split=0.1,
)

# Evaluate model performance after unfreezing.
loss_unfrozen, acc_unfrozen = model.evaluate(
    x_test_small[:1000],
    y_test_small[:1000],
    verbose=0,
)

# Print concise comparison of both training stages.
print("Frozen backbone accuracy:", round(acc_frozen, 4))
print("Unfrozen block accuracy:", round(acc_unfrozen, 4))
print("Trainable layers after unfreeze:", int(np.sum([layer.trainable for layer in backbone.layers])))




### **2.2. Training Classifier Head Only**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_06/Lecture_B/image_02_02.jpg?v=1769414490" width="250">



>* Freeze pretrained backbone as fixed feature extractor
>* Train small head for new task efficiently

>* Works best when tasks share visual patterns
>* Gives strong, fast baseline and comparison point

>* Head-only training regularizes and reduces overfitting
>* Use as baseline before progressively unfreezing layers



In [None]:
#@title Python Code - Training Classifier Head Only

# This script shows training classifier head only.
# We freeze a pretrained backbone and train top layers.
# This demonstrates conservative transfer learning strategy.

# !pip install tensorflow==2.20.0

# Import required libraries for TensorFlow and NumPy.
import os
import random
import numpy as np
import tensorflow as tf

# Set deterministic seeds for reproducible behavior.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version in one concise line.
print("TensorFlow version:", tf.__version__)

# Select device preference based on GPU availability.
physical_gpus = tf.config.list_physical_devices("GPU")
if physical_gpus:
    device_name = "GPU"
else:
    device_name = "CPU"

# Print which device type will likely be used.
print("Using device type:", device_name)

# Load CIFAR10 dataset using Keras utilities.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Convert to float32 but keep the 0-255 pixel range.
# preprocess_input inside the model rescales 0-255 to [-1, 1].
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")

# Reduce dataset size for faster demonstration.
train_samples = 2000
test_samples = 500
x_train_small = x_train[:train_samples]
y_train_small = y_train[:train_samples]

# Create small test subset for quick evaluation.
x_test_small = x_test[:test_samples]
y_test_small = y_test[:test_samples]

# Validate shapes to avoid unexpected broadcasting.
print("Train subset shape:", x_train_small.shape)
print("Test subset shape:", x_test_small.shape)

# Define image size expected by MobileNetV2 backbone.
img_height, img_width = 96, 96
num_classes = 10

# Resize images using TensorFlow image operations.
resize_layer = tf.keras.layers.Resizing(img_height, img_width)
x_train_resized = resize_layer(x_train_small)
x_test_resized = resize_layer(x_test_small)

# Confirm resized shapes before building model.
print("Resized train shape:", x_train_resized.shape)

# Create base model from pretrained MobileNetV2.
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(img_height, img_width, 3),
    include_top=False,
    weights="imagenet",
)

# Freeze all backbone layers to act as feature extractor.
for layer in base_model.layers:
    layer.trainable = False

# Show how many layers are frozen in backbone.
print("Backbone layers frozen:", len(base_model.layers))

# Create input layer matching resized image shape.
inputs = tf.keras.Input(shape=(img_height, img_width, 3))

# Apply preprocessing specific to MobileNetV2.
preprocessed = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)

# Pass preprocessed images through frozen backbone.
features = base_model(preprocessed, training=False)

# Pool spatial dimensions to single feature vector.
pooled = tf.keras.layers.GlobalAveragePooling2D()(features)

# Add a single softmax dense layer as the classifier head.
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(pooled)

# Build final model combining backbone and head.
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Confirm only classifier head parameters are trainable.
trainable_count = np.sum([np.prod(v.shape) for v in model.trainable_weights])
non_trainable_count = np.sum([np.prod(v.shape) for v in model.non_trainable_weights])

# Print parameter counts for clarity and comparison.
print("Trainable params (head only):", int(trainable_count))
print("Non-trainable params (backbone):", int(non_trainable_count))

# Compile model with simple optimizer and loss.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train only classifier head using small subset.
history = model.fit(
    x_train_resized,
    y_train_small,
    epochs=3,
    batch_size=32,
    validation_split=0.1,
    verbose=0,
)

# Evaluate performance on small test subset.
loss, acc = model.evaluate(
    x_test_resized,
    y_test_small,
    verbose=0,
)

# Print concise summary of evaluation results.
print("Test loss with frozen backbone:", round(float(loss), 4))
print("Test accuracy with frozen backbone:", round(float(acc), 4))




### **2.3. Progressive Layer Unfreezing**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_06/Lecture_B/image_02_03.jpg?v=1769414556" width="250">



>* Start training with backbone frozen, head trainable
>* Gradually unfreeze deeper layers to avoid forgetting

>* Different CNN layers capture increasingly abstract features
>* Unfreeze higher layers first; keep generic features

>* Adjust learning rates when unfreezing deeper layers
>* Track validation metrics to balance stability and adaptation
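
The staged idea above can be sketched as a plain-Python schedule that lists, for each stage, how many backbone layers stay frozen and which learning rate to use. The helper `build_unfreeze_schedule` and its parameters are hypothetical, and the layer count of 154 is an assumed value; in practice read it from `len(base_model.layers)`.

```python
# Hypothetical helper: plan which layers to freeze and which rate to use per stage.
def build_unfreeze_schedule(total_layers, layers_per_stage, num_stages,
                            head_lr=1e-3, decay=0.1):
    schedule = []
    for stage in range(num_stages):
        # Stage 0 keeps the whole backbone frozen; later stages unfreeze from the top.
        unfrozen = min(stage * layers_per_stage, total_layers)
        schedule.append({
            "stage": stage,
            "freeze_below": total_layers - unfrozen,
            "learning_rate": head_lr * (decay ** stage),
        })
    return schedule

# Assumed layer count for illustration only.
schedule = build_unfreeze_schedule(total_layers=154, layers_per_stage=20, num_stages=3)
for step in schedule:
    print(step)
```

Each entry could then drive one training stage: freeze `base_model.layers[:freeze_below]`, recompile with the listed learning rate, and train for a few epochs while watching validation metrics.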



In [None]:
#@title Python Code - Progressive Layer Unfreezing

# This script demonstrates progressive layer unfreezing.
# We use a small pretrained CNN on CIFAR10 images.
# Focus on freezing and unfreezing backbone layers.

# !pip install tensorflow==2.20.0

# Import required libraries safely.
import os
import random
import numpy as np
import tensorflow as tf

# Set deterministic seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version briefly.
print("TensorFlow version:", tf.__version__)

# Detect whether a GPU is available.
gpus = tf.config.list_physical_devices("GPU")
print("Using device type:", "GPU" if gpus else "CPU")

# Load CIFAR10 dataset from keras datasets.
(cifar_x_train, cifar_y_train), (cifar_x_test, cifar_y_test) = (
    tf.keras.datasets.cifar10.load_data()
)

# Use a very small subset for quick training.
train_samples = 200
test_samples = 100
x_train = cifar_x_train[:train_samples]
y_train = cifar_y_train[:train_samples]

# Slice test subset for evaluation.
x_test = cifar_x_test[:test_samples]
y_test = cifar_y_test[:test_samples]

# Convert to float32 but keep the 0-255 pixel range.
# preprocess_input inside the model rescales 0-255 to [-1, 1].
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")

# Validate shapes before building model.
print("Train shape:", x_train.shape, y_train.shape)
print("Test shape:", x_test.shape, y_test.shape)

# Define image size expected by backbone.
img_height, img_width = 96, 96
num_classes = 10

# Resize images using a Keras Resizing layer.
resize_layer = tf.keras.layers.Resizing(img_height, img_width)
x_train_resized = resize_layer(x_train)
x_test_resized = resize_layer(x_test)

# Confirm resized shapes are correct.
print("Resized train shape:", x_train_resized.shape)
print("Resized test shape:", x_test_resized.shape)

# Create a small pretrained backbone model.
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(img_height, img_width, 3),
    include_top=False,
    weights="imagenet",
)

# Freeze all backbone layers initially.
for layer in base_model.layers:
    layer.trainable = False

# Add global pooling and dense classifier.
inputs = tf.keras.Input(shape=(img_height, img_width, 3))
x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
x = base_model(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

# Build the full transfer learning model.
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Compile model for initial head training.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train only classifier head with frozen backbone.
history_head = model.fit(
    x_train_resized,
    y_train,
    epochs=2,
    batch_size=32,
    validation_data=(x_test_resized, y_test),
    verbose=0,
)

# Evaluate after head training phase.
loss_head, acc_head = model.evaluate(
    x_test_resized,
    y_test,
    verbose=0,
)

# Progressively unfreeze top backbone layers.
for layer in base_model.layers[-20:]:
    layer.trainable = True

# Recompile with lower learning rate for fine tuning.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train again with partially unfrozen backbone.
history_fine = model.fit(
    x_train_resized,
    y_train,
    epochs=2,
    batch_size=32,
    validation_data=(x_test_resized, y_test),
    verbose=0,
)

# Evaluate after progressive unfreezing.
loss_fine, acc_fine = model.evaluate(
    x_test_resized,
    y_test,
    verbose=0,
)

# Print concise comparison of accuracies.
print("Accuracy after head training:", round(acc_head, 4))
print("Accuracy after fine tuning:", round(acc_fine, 4))




## **3. Staged Fine Tuning**

### **3.1. Two Stage Training**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_06/Lecture_B/image_03_01.jpg?v=1769414618" width="250">



>* Freeze backbone, train only new classification head
>* Stabilizes training, preserves features, needs less labeled data

>* Unfreeze backbone layers and train model end-to-end
>* Refine features for new domain, boosting performance

>* Two stages balance speed, resources, and risk
>* Compare metrics to judge benefits of deeper fine tuning



In [None]:
#@title Python Code - Two Stage Training

# This script demonstrates two stage training.
# We use a small pretrained CNN backbone.
# We keep runtime short and outputs minimal.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import tensorflow and keras utilities.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Detect available device type for information.
physical_gpus = tf.config.list_physical_devices("GPU")
print("GPUs available:", len(physical_gpus))

# Load CIFAR10 dataset from keras datasets.
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Use a small subset for quick demonstration.
train_samples = 2000
test_samples = 1000

# Slice the dataset to the chosen subset.
x_train_small = x_train[:train_samples]
y_train_small = y_train[:train_samples]

# Slice the test dataset subset similarly.
x_test_small = x_test[:test_samples]
y_test_small = y_test[:test_samples]

# Normalize images to range zero one.
x_train_small = x_train_small.astype("float32") / 255.0
x_test_small = x_test_small.astype("float32") / 255.0

# Confirm shapes are as expected.
print("Train subset shape:", x_train_small.shape)
print("Test subset shape:", x_test_small.shape)

# Define image size expected by backbone.
img_height, img_width = 96, 96
num_classes = 10

# Resize images to match backbone input.
resize_layer = keras.Sequential([
    layers.Resizing(96, 96),
])

# Apply resizing to the training and test subsets.
x_train_resized = resize_layer(x_train_small)
x_test_resized = resize_layer(x_test_small)

# Validate resized shapes before modeling.
print("Resized train shape:", x_train_resized.shape)

# Choose a small pretrained backbone model.
backbone = keras.applications.MobileNetV2(
    input_shape=(96, 96, 3),
    include_top=False,
    weights="imagenet",
)

# Freeze backbone initially for stage one.
backbone.trainable = False
print("Backbone trainable stage1:", backbone.trainable)

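# Build classification head on top of backbone.
# Images are in [0, 1]; MobileNetV2 expects [-1, 1], so rescale first.
inputs = keras.Input(shape=(96, 96, 3))
x = layers.Rescaling(scale=2.0, offset=-1.0)(inputs)
x = backbone(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(num_classes, activation="softmax")(x)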

# Create the full transfer learning model.
model = keras.Model(inputs, outputs)

# Compile model for first training stage.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train only the classification head first.
history_stage1 = model.fit(
    x_train_resized,
    y_train_small,
    epochs=2,
    batch_size=64,
    validation_split=0.2,
    verbose=0,
)

# Evaluate performance after first stage.
loss1, acc1 = model.evaluate(
    x_test_resized,
    y_test_small,
    verbose=0,
)

# Print concise metrics for stage one.
print("Stage1 test accuracy:", round(float(acc1), 4))

# Unfreeze some backbone layers for stage two.
backbone.trainable = True
for layer in backbone.layers[:80]:
    layer.trainable = False

# Confirm backbone is now partially trainable.
trainable_count = np.sum([layer.trainable for layer in backbone.layers])
print("Backbone trainable layers:", int(trainable_count))

# Recompile with lower learning rate for fine tuning.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train both head and selected backbone layers.
history_stage2 = model.fit(
    x_train_resized,
    y_train_small,
    epochs=2,
    batch_size=64,
    validation_split=0.2,
    verbose=0,
)

# Evaluate performance after second stage.
loss2, acc2 = model.evaluate(
    x_test_resized,
    y_test_small,
    verbose=0,
)

# Print concise metrics for stage two.
print("Stage2 test accuracy:", round(float(acc2), 4))

# Show simple comparison of both training stages.
print("Accuracy gain:", round(float(acc2 - acc1), 4))



### **3.2. Gentle Backbone Learning Rates**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_06/Lecture_B/image_03_02.jpg?v=1769414687" width="250">



>* Backbone needs very small learning rate
>* Preserve learned features while gently adapting to task

>* Use smaller learning rate for pretrained backbone
>* Faster head, slower backbone improves stability, generalization

>* Start with tiny learning rates when unfreezing
>* Adjust or decay rates to improve stable generalization
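
The "faster head, slower backbone" idea can be illustrated with a toy gradient descent in plain Python. This is a sketch on a made-up two-parameter model, not Keras code: the parameter names, learning rates, input, and target are all assumptions chosen to show how a smaller rate keeps the pretrained weight close to its starting value.

```python
# Toy model: prediction = head + backbone * x, with one weight per group.
params = {"backbone": 1.0, "head": 0.0}

# Assumed per-group learning rates: gentle for the backbone, larger for the head.
learning_rates = {"backbone": 1e-5, "head": 1e-3}

def gradients(p, x=2.0, target=3.0):
    # Gradients of the squared error (prediction - target) ** 2.
    error = p["head"] + p["backbone"] * x - target
    return {"backbone": 2 * error * x, "head": 2 * error}

# Run plain gradient descent with a separate rate per parameter group.
start = dict(params)
for _ in range(100):
    grads = gradients(params)
    for name in params:
        params[name] -= learning_rates[name] * grads[name]

# Measure how far each group moved from its starting value.
backbone_shift = abs(params["backbone"] - start["backbone"])
head_shift = abs(params["head"] - start["head"])
print("Backbone shift:", backbone_shift)
print("Head shift:", head_shift)
```

The backbone weight barely moves while the head absorbs most of the adaptation, which is the behavior a gentle backbone rate aims for.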



In [None]:
#@title Python Code - Gentle Backbone Learning Rates

# This script shows gentle backbone learning rates.
# We fine tune a pretrained CNN with staged training.
# Focus on safe small learning rates for backbones.

# !pip install tensorflow==2.20.0

# Import required libraries for TensorFlow training.
import os
import random
import numpy as np
import tensorflow as tf

# Set deterministic seeds for reproducible behavior.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version in one concise line.
print("TensorFlow version:", tf.__version__)

# Detect available device type for information only.
physical_gpus = tf.config.list_physical_devices("GPU")
print("GPUs available:", len(physical_gpus))

# Load CIFAR10 dataset from keras built in source.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Select a small subset for quick demonstration.
train_samples = 2000
test_samples = 500
x_train = x_train[:train_samples]

# Slice labels and test images to chosen subset.
y_train = y_train[:train_samples]
x_test = x_test[:test_samples]
y_test = y_test[:test_samples]

# Convert to float32 but keep the 0-255 pixel range.
# preprocess_input inside the model rescales 0-255 to [-1, 1].
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")

# Resize images to match MobileNetV2 expected size.
resize_layer = tf.keras.layers.Resizing(160, 160)
x_train_resized = resize_layer(x_train)

# Apply same resizing operation to test images.
x_test_resized = resize_layer(x_test)

# Confirm shapes are as expected before modeling.
print("Train shape:", x_train_resized.shape)
print("Test shape:", x_test_resized.shape)

# Define number of target classes for CIFAR10.
num_classes = 10

# Load MobileNetV2 backbone without top classifier.
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3),
    include_top=False,
    weights="imagenet",
)

# Freeze backbone initially for head only training.
backbone.trainable = False

# Build classification head on top of backbone.
inputs = tf.keras.Input(shape=(160, 160, 3))
scaled = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)

# Pass scaled inputs through pretrained backbone network.
features = backbone(scaled, training=False)
pooled = tf.keras.layers.GlobalAveragePooling2D()(features)

# Add small dense layer for extra capacity.
hidden = tf.keras.layers.Dense(64, activation="relu")(pooled)
outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(hidden)

# Create full model combining backbone and head.
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Compile model for stage one with higher rate.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train only classification head for few epochs.
history_stage1 = model.fit(
    x_train_resized,
    y_train,
    epochs=2,
    batch_size=64,
    validation_split=0.2,
    verbose=0,
)

# Unfreeze backbone for gentle fine tuning stage.
backbone.trainable = True

# Set one gentle learning rate; it applies to the backbone and the head.
backbone_learning_rate = 1e-5

# Use smaller rate to avoid destroying pretrained features.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=backbone_learning_rate),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train whole model briefly with gentle backbone rate.
history_stage2 = model.fit(
    x_train_resized,
    y_train,
    epochs=2,
    batch_size=64,
    validation_split=0.2,
    verbose=0,
)

# Evaluate model performance after staged fine tuning.
loss, accuracy = model.evaluate(
    x_test_resized,
    y_test,
    batch_size=64,
    verbose=0,
)

# Print concise summary of both training stages.
print("Stage1 final val acc:", history_stage1.history["val_accuracy"][-1])
print("Stage2 final val acc:", history_stage2.history["val_accuracy"][-1])

# Show test accuracy after gentle backbone fine tuning.
print("Test accuracy after fine tuning:", accuracy)
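
To judge the impact of the second stage, it can help to join the per-stage histories into one timeline and note where fine tuning began. The sketch below is one possible way to do that, assuming each stage is a dictionary shaped like `history_stage1.history`. The helper names `merge_histories` and `stage_boundaries` are hypothetical, and the numbers are made up for illustration; they are not results from the script above.

```python
def merge_histories(*stage_histories):
    """Concatenate per-epoch metric lists from several training stages."""
    merged = {}
    for stage in stage_histories:
        for name, values in stage.items():
            merged.setdefault(name, []).extend(values)
    return merged


def stage_boundaries(*stage_histories, key="loss"):
    """Return the epoch index at which each stage after the first begins."""
    boundaries, total = [], 0
    for stage in stage_histories[:-1]:
        total += len(stage[key])
        boundaries.append(total)
    return boundaries


# Hypothetical values shaped like `history_stage1.history` and
# `history_stage2.history`; they are not measured results.
stage1 = {"loss": [1.9, 1.2], "val_accuracy": [0.55, 0.68]}
stage2 = {"loss": [1.0, 0.8], "val_accuracy": [0.70, 0.74]}

merged = merge_histories(stage1, stage2)
print("Merged val accuracy:", merged["val_accuracy"])
print("Fine tuning starts at epoch index:", stage_boundaries(stage1, stage2))
```

With the real objects, the call would be `merge_histories(history_stage1.history, history_stage2.history)`. The merged lists can then be plotted as a single curve with a vertical marker at the boundary index.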



### **3.3. Overfitting Monitoring Strategies**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_06/Lecture_B/image_03_03.jpg?v=1769414796" width="250">



>* Use a realistic validation set and metrics
>* Watch validation loss diverging from training as warning

>* Track per-class metrics to detect uneven learning
>* Use confusion patterns to spot bias, adjust training

>* Use validation-based early stopping and checkpoints
>* Stress-test checkpoints to ensure real-world robustness
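
The per-class and confusion-pattern checks listed above can be sketched with plain NumPy. This is a minimal illustration, assuming integer class labels; the helper names `confusion_matrix` and `per_class_recall` are hypothetical, and the labels and predictions are invented for a 3-class example.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    y_true = np.asarray(y_true).reshape(-1)
    y_pred = np.asarray(y_pred).reshape(-1)
    matrix = np.zeros((num_classes, num_classes), dtype=np.int64)
    # Add 1 at every (true, predicted) pair, including repeated pairs.
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    support = matrix.sum(axis=1)
    # Classes that never appear in the labels get a recall of 0.
    return np.divide(
        np.diag(matrix), support,
        out=np.zeros(len(matrix), dtype=np.float64), where=support > 0,
    )


# Hypothetical labels and predictions for a 3-class problem.
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 0, 0, 2]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
recall = per_class_recall(cm)
print(cm)
print("Per-class recall:", recall)
```

In this made-up example, class 2 is confused with class 0 half of the time, which overall accuracy alone would hide. For a trained Keras classifier, predictions could be obtained with `model.predict(x_val).argmax(axis=1)`; the `reshape(-1)` call flattens CIFAR10 labels, which have shape `(N, 1)`.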



In [None]:
#@title Python Code - Overfitting Monitoring Strategies

# This script shows how to monitor staged fine tuning.
# We use a small pretrained CNN (MobileNetV2) for transfer learning.
# We monitor validation loss to detect overfitting.

# !pip install tensorflow==2.20.0

# Import required libraries safely.
import os
import random
import numpy as np
import tensorflow as tf

# Set deterministic seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version briefly.
print("TensorFlow version:", tf.__version__)

# Detect whether a GPU is available.
gpus = tf.config.list_physical_devices("GPU")
print("Using GPU" if gpus else "Using CPU only")

# Load CIFAR10 dataset from keras datasets.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Use small subset for quick demonstration.
train_samples = 2000
test_samples = 500
x_train = x_train[:train_samples]
y_train = y_train[:train_samples]

# Use a slice of the test set as the validation set for this demo.
x_val = x_test[:test_samples]
y_val = y_test[:test_samples]

# Convert images to float32 but keep the 0-255 pixel range.
# preprocess_input inside the model rescales pixels to [-1, 1] itself.
x_train = x_train.astype("float32")
x_val = x_val.astype("float32")

# Validate shapes before building model.
print("Train shape:", x_train.shape, y_train.shape)
print("Val shape:", x_val.shape, y_val.shape)

# Load small pretrained backbone from applications.
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights="imagenet"
)

# Freeze backbone initially for head training.
base_model.trainable = False

# Build preprocessing and resizing layers.
inputs = tf.keras.Input(shape=(32, 32, 3))
resize = tf.keras.layers.Resizing(96, 96)(inputs)
preprocessed = tf.keras.applications.mobilenet_v2.preprocess_input(resize)

# Add backbone and pooling layer.
features = base_model(preprocessed, training=False)
pooled = tf.keras.layers.GlobalAveragePooling2D()(features)

# Add small classifier head for CIFAR10.
outputs = tf.keras.layers.Dense(10, activation="softmax")(pooled)
model = tf.keras.Model(inputs, outputs)

# Compile model for first training stage.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy", metrics=["accuracy"],
)

# Define early stopping callback for monitoring.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=2, restore_best_weights=True,
)

# Train only classifier head first stage.
history_stage1 = model.fit(
    x_train, y_train, epochs=3, batch_size=64,
    validation_data=(x_val, y_val), verbose=0, callbacks=[early_stop],
)

# Record best validation loss from stage one.
best_val_loss_stage1 = min(history_stage1.history["val_loss"])
print("Stage1 best val_loss:", round(best_val_loss_stage1, 4))

# Unfreeze the backbone, then re-freeze all but its last 20 layers.
base_model.trainable = True
for layer in base_model.layers[:-20]:
    layer.trainable = False

# Recompile with lower learning rate for stability.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy", metrics=["accuracy"],
)

# Define new early stopping for second stage.
early_stop_stage2 = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=2, restore_best_weights=True,
)

# Train second stage with partial unfreezing.
history_stage2 = model.fit(
    x_train, y_train, epochs=5, batch_size=64,
    validation_data=(x_val, y_val), verbose=0,
    callbacks=[early_stop_stage2],
)

# Get best validation loss from second stage.
best_val_loss_stage2 = min(history_stage2.history["val_loss"])
print("Stage2 best val_loss:", round(best_val_loss_stage2, 4))

# Decide if fine tuning improved generalization.
if best_val_loss_stage2 < best_val_loss_stage1:
    decision_message = "Fine tuning helped, keep new weights."
else:
    decision_message = "Fine tuning hurt, prefer earlier checkpoint."

# Print concise monitoring summary lines.
print("Monitoring summary:")
print("Stage1 epochs:", len(history_stage1.history["loss"]))
print("Stage2 epochs:", len(history_stage2.history["loss"]))
print("Decision:", decision_message)

# Evaluate final model on validation subset.
val_loss, val_acc = model.evaluate(
    x_val, y_val, verbose=0,
)
print("Final val_loss and val_accuracy:", round(val_loss, 4), round(val_acc, 4))
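
Early stopping reacts to validation loss alone. The divergence warning sign described in this section (validation loss rising while training loss keeps falling) can also be checked directly on a recorded history. The sketch below is one simple heuristic, not a standard Keras utility; the function name `first_divergence_epoch` and the loss curves are hypothetical.

```python
def first_divergence_epoch(train_loss, val_loss, patience=2):
    """Return the 0-based epoch that starts a run of `patience` consecutive
    epochs in which validation loss rises while training loss falls,
    or None if no such run exists."""
    run_start, run_length = None, 0
    for epoch in range(1, min(len(train_loss), len(val_loss))):
        val_up = val_loss[epoch] > val_loss[epoch - 1]
        train_down = train_loss[epoch] < train_loss[epoch - 1]
        if val_up and train_down:
            if run_length == 0:
                run_start = epoch
            run_length += 1
            if run_length >= patience:
                return run_start
        else:
            # The run is broken, so start counting again.
            run_length = 0
    return None


# Hypothetical loss curves, not results from the script above.
train_loss = [1.80, 1.20, 0.90, 0.70, 0.55, 0.45]
val_loss = [1.70, 1.30, 1.10, 1.15, 1.22, 1.30]

print("Divergence starts at epoch index:",
      first_divergence_epoch(train_loss, val_loss))
```

With a real run, the inputs would be `history_stage2.history["loss"]` and `history_stage2.history["val_loss"]`. The short schedules used in this lecture record too few epochs for the check to be informative, so it is better suited to longer training runs.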



# <font color="#418FDE" size="6.5" uppercase>**Transfer Learning**</font>


In this lecture, you learned to:
- Load and adapt pretrained CNN backbones from tf.keras.applications for new classification tasks. 
- Configure layer freezing and unfreezing strategies to balance feature reuse and task-specific learning. 
- Implement a staged training schedule for fine-tuning and evaluate its impact on performance. 

In the next module (Module 7), we will cover 'NLP with TensorFlow'.