# <font color="#418FDE" size="6.5" uppercase>**Transfer Learning**</font>

>Last update: 20260129.
    
By the end of this lecture, you will be able to:
- Load pretrained vision models from torchvision and adapt them to new classification tasks. 
- Configure layer freezing and unfreezing strategies to balance training speed and performance. 
- Fine‑tune a pretrained model on a custom dataset and compare results to training from scratch. 


## **1. Using Pretrained Models**

### **1.1. torchvision models overview**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_B/image_01_01.jpg?v=1769748490" width="250">



>* Torchvision provides many pretrained vision architectures
>* Their learned features can be reused for new image tasks

>* Architectures differ in accuracy, size, and speed
>* Easily switch models to match hardware and goals

>* Models split into feature extractor and head
>* Reuse features, swap heads for new tasks
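
The backbone and head split can be sketched directly in PyTorch. The snippet below is a minimal illustration, not torchvision's own code: `TinyBackbone` is a hypothetical stand-in with random weights, so nothing is downloaded. It mirrors the layout of torchvision's ResNet models, whose final classifier is exposed as the `fc` attribute, and swaps that head for a new 3-class layer.

```python
import torch
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a pretrained network (random weights)."""

    def __init__(self, num_classes: int = 1000):
        super().__init__()
        # Feature extractor: convolution, pooling, flatten.
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Classification head, named `fc` as in torchvision ResNets.
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.features(x))


model = TinyBackbone()

# Swap the head: keep the input width, change the number of classes.
new_num_classes = 3
model.fc = nn.Linear(model.fc.in_features, new_num_classes)

# A batch of two 64x64 RGB images now yields three scores per image.
batch = torch.rand(2, 3, 64, 64)
output_shape = tuple(model(batch).shape)
print("Output shape:", output_shape)
```

With a real torchvision model the same two steps apply, although the head attribute differs by architecture (some models expose a `classifier` module instead of `fc`).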



In [None]:
#@title Python Code - torchvision models overview

# This script introduces torchvision style model concepts.
# We simulate a torchvision overview using TensorFlow models.
# Focus is on pretrained style loading and adaptation.

# TensorFlow is available by default in this environment.
# !pip install tensorflow

# Import required TensorFlow and system modules.
import os
import random
import numpy as np
import tensorflow as tf

# Set deterministic seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version in one concise line.
print("TensorFlow version:", tf.__version__)

# Prepare a tiny synthetic image dataset for demonstration.
num_classes = 3
image_height = 64
image_width = 64
channels = 3

# Create small random image tensors and labels.
num_samples = 60
x_data = tf.random.uniform(
    shape=(num_samples, image_height, image_width, channels)
)
y_data = tf.random.uniform(
    shape=(num_samples,), maxval=num_classes, dtype=tf.int32
)

# Convert labels to categorical one hot encoding.
y_data_cat = tf.keras.utils.to_categorical(
    y_data, num_classes=num_classes
)

# Split data into simple train and validation subsets.
train_size = 40
x_train = x_data[:train_size]
y_train = y_data_cat[:train_size]

# Prepare validation subset from remaining samples.
x_val = x_data[train_size:]
y_val = y_data_cat[train_size:]

# Confirm shapes are as expected before modeling.
print("Train shape:", x_train.shape, y_train.shape)
print("Val shape:", x_val.shape, y_val.shape)

# Build a MobileNetV2 backbone; weights=None keeps it randomly initialized.
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(image_height, image_width, channels),
    include_top=False,
    weights=None
)

# Add a global pooling layer to compress spatial dimensions.
global_pool = tf.keras.layers.GlobalAveragePooling2D()

# Create a new classification head for our tiny task.
classifier_head = tf.keras.layers.Dense(
    num_classes, activation="softmax"
)

# Connect base model and head into a full model.
inputs = tf.keras.Input(
    shape=(image_height, image_width, channels)
)
features = base_model(inputs, training=False)
pooled = global_pool(features)
outputs = classifier_head(pooled)

# Define the complete model object.
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Show the number of top-level layers in the assembled model.
print("Total layers:", len(model.layers))

# Compile the model with simple optimizer and loss.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)

# Train briefly to mimic adapting a backbone to a new task.
history = model.fit(
    x_train,
    y_train,
    validation_data=(x_val, y_val),
    epochs=2,
    batch_size=8,
    verbose=0
)

# Evaluate model performance on validation subset.
val_loss, val_acc = model.evaluate(
    x_val, y_val, verbose=0
)

# Print concise results to connect with overview ideas.
print("Validation loss:", round(float(val_loss), 4))
print("Validation accuracy:", round(float(val_acc), 4))




### **1.2. Model Weights Metadata**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_B/image_01_02.jpg?v=1769748565" width="250">



>* Metadata describes how pretrained weights were created
>* It guides correct inputs and reuse for tasks

>* Metadata describes original labels and output mapping
>* Guides replacing final layer and choosing models

>* Metadata defines required preprocessing and input settings
>* Performance metrics provide baselines and reveal training issues
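
One concrete use of this metadata is input normalization. The sketch below assumes the per-channel mean and standard deviation commonly published for ImageNet-pretrained models; the `metadata` dictionary and the `normalize` helper are illustrative names, not a library API. The values shipped with the specific weights being loaded should always take precedence.

```python
import numpy as np

# Assumed metadata for illustration: the widely used ImageNet statistics.
metadata = {
    "mean": (0.485, 0.456, 0.406),
    "std": (0.229, 0.224, 0.225),
}


def normalize(image: np.ndarray, meta: dict) -> np.ndarray:
    """Normalize an HxWx3 image with values in [0, 1], channel by channel."""
    mean = np.asarray(meta["mean"], dtype=np.float32)
    std = np.asarray(meta["std"], dtype=np.float32)
    return (image.astype(np.float32) - mean) / std


# An image whose pixels all equal the channel means maps to zeros.
channel_mean = np.asarray(metadata["mean"], dtype=np.float32)
image = np.broadcast_to(channel_mean, (4, 4, 3)).copy()
normalized = normalize(image, metadata)
print("Normalized shape:", normalized.shape)
print("Max abs value:", float(np.abs(normalized).max()))
```

Skipping this step, or using statistics from a different dataset, shifts the input distribution away from what the pretrained layers saw during training.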



In [None]:
#@title Python Code - Model Weights Metadata

# This script shows model weights metadata concepts.
# We simulate torchvision style metadata using TensorFlow tools.
# Focus is on understanding metadata, not heavy training.

# Optional install if TensorFlow is missing.
# !pip install tensorflow

# Import standard libraries for environment checks.
import os
import random
import sys

# Import TensorFlow for simple image model utilities.
import tensorflow as tf

# Set deterministic seeds for reproducible behavior.
random.seed(42)

# Set TensorFlow random seed for reproducibility.
tf.random.set_seed(42)

# Print TensorFlow version in one concise line.
print("TensorFlow version:", tf.__version__)

# Define a small dictionary mimicking weights metadata.
resnet_like_metadata = {
    "arch": "resnet_like",
    "dataset": "CIFAR10_small",
    "num_classes": 10,
    "input_size": (32, 32),
    "input_channels": 3,
}

# Add normalization statistics to the metadata dictionary.
resnet_like_metadata.update({
    "mean": (0.4914, 0.4822, 0.4465),
    "std": (0.2470, 0.2435, 0.2616),
})

# Add label names and training recipe version.
resnet_like_metadata.update({
    "labels": [
        "airplane",
        "automobile",
        "bird",
        "cat",
        "deer",
        "dog",
        "frog",
        "horse",
        "ship",
        "truck",
    ],
    "recipe": "v1_basic_augmentation",
})

# Add simple accuracy metrics from original benchmark.
resnet_like_metadata.update({
    "top1_acc": 0.92,
    "top5_acc": 0.995,
})

# Show key metadata fields that guide reuse.
print("Original dataset:", resnet_like_metadata["dataset"])

# Print expected input size from metadata.
print("Expected input size:", resnet_like_metadata["input_size"])

# Print normalization statistics for correct preprocessing.
print("Channel mean:", resnet_like_metadata["mean"])

# Print number of classes in the original task.
print("Num classes:", resnet_like_metadata["num_classes"])

# Show a few example labels from original task.
print("Example labels:", resnet_like_metadata["labels"][:3])

# Print original benchmark accuracy for quick reference.
print("Top1 accuracy:", resnet_like_metadata["top1_acc"])

# Define a new smaller label space for transfer learning.
new_labels = ["cat", "dog", "horse"]

# Derive new number of classes from new labels list.
new_num_classes = len(new_labels)

# Build a tiny model head using metadata dimensions.
inputs = tf.keras.Input(
    shape=(
        resnet_like_metadata["input_size"][0],
        resnet_like_metadata["input_size"][1],
        resnet_like_metadata["input_channels"],
    )
)

# Stand in for a frozen backbone with a parameter-free pooling layer.
backbone_output = tf.keras.layers.GlobalAveragePooling2D()(inputs)

# Create new classification head for custom labels.
outputs = tf.keras.layers.Dense(
    new_num_classes,
    activation="softmax",
)(backbone_output)

# Build the transfer model combining backbone and head.
transfer_model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Verify output shape matches new label space.
print("New head output shape:", transfer_model.output_shape)

# Confirm that metadata guided our adaptation choices.
print("Adapted to labels:", new_labels)




### **1.3. Input Size Constraints**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_B/image_01_03.jpg?v=1769748627" width="250">



>* Pretrained models expect specific image size and channels
>* Match preprocessing to model requirements to avoid issues

>* Input size affects accuracy and computation cost
>* Resize images to model’s expected resolution carefully

>* Different architectures tolerate different input resolutions
>* Standardize size, crop or pad, then fine-tune
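
The resize-then-crop step can be worked through as plain arithmetic. The sketch below assumes a common evaluation recipe for ImageNet-style models (scale the shorter side to 256, then take a 224 by 224 center crop); `resize_then_center_crop` is a hypothetical helper, and the right numbers for a given model come from its own preprocessing metadata.

```python
def resize_then_center_crop(height, width, resize_to=256, crop=224):
    """Return the resized (height, width) and the (top, left) crop offset.

    The shorter side is scaled to `resize_to`, keeping the aspect ratio,
    and a `crop` x `crop` window is then taken from the center.
    """
    scale = resize_to / min(height, width)
    new_height = round(height * scale)
    new_width = round(width * scale)
    top = (new_height - crop) // 2
    left = (new_width - crop) // 2
    return (new_height, new_width), (top, left)


# A 300 x 400 image, the same size used in the cell below.
resized, offset = resize_then_center_crop(300, 400)
print("Resized to:", resized)
print("Crop offset (top, left):", offset)
```

Compared with stretching the image straight to 224 by 224, this keeps the aspect ratio intact at the cost of discarding a border.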



In [None]:
#@title Python Code - Input Size Constraints

# This script shows image input size constraints.
# We use TensorFlow to inspect resized image tensors.
# Focus on how models expect specific input shapes.

# !pip install tensorflow

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and image utilities.
import tensorflow as tf
from tensorflow.keras import layers

# Set deterministic random seeds.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)

# Print TensorFlow version briefly.
print("TensorFlow version:", tf.__version__)

# Define a helper that prints a tensor's shape.
def describe_tensor(name, tensor):
    shape = tensor.shape
    print(name, "shape:", tuple(shape))

# Choose a target input size like common vision models.
input_height, input_width = 224, 224
num_channels = 3

# Create a dummy RGB image with arbitrary size.
original_height, original_width = 300, 400
random_image = np.random.randint(
    0, 256, size=(original_height, original_width, num_channels), dtype=np.uint8
)

# Convert the NumPy image to a TensorFlow tensor.
image_tensor = tf.convert_to_tensor(random_image, dtype=tf.float32)

# Show the original tensor shape.
describe_tensor("Original", image_tensor)

# Build a resizing layer that mimics preprocessing.
resize_layer = layers.Resizing(input_height, input_width)

# Apply resizing to match model expected input.
resized_tensor = resize_layer(tf.expand_dims(image_tensor, axis=0))

# Remove batch dimension for easier shape display.
resized_tensor_single = tf.squeeze(resized_tensor, axis=0)

# Show the resized tensor shape.
describe_tensor("Resized", resized_tensor_single)

# Validate that resized tensor matches expected size.
expected_shape = (input_height, input_width, num_channels)
if tuple(resized_tensor_single.shape) != expected_shape:
    raise ValueError("Resized tensor shape mismatch with expected size")

# Demonstrate what happens with a grayscale single channel image.
gray_image = np.random.randint(
    0, 256, size=(original_height, original_width, 1), dtype=np.uint8
)

# Convert grayscale image to tensor.
gray_tensor = tf.convert_to_tensor(gray_image, dtype=tf.float32)

# Show grayscale tensor shape before adaptation.
describe_tensor("Grayscale", gray_tensor)

# Convert grayscale to RGB by repeating channels.
rgb_from_gray = tf.repeat(gray_tensor, repeats=3, axis=2)

# Show adapted grayscale tensor shape.
describe_tensor("Gray as RGB", rgb_from_gray)

# Resize the adapted grayscale tensor to expected size.
resized_gray = resize_layer(tf.expand_dims(rgb_from_gray, axis=0))

# Remove batch dimension again.
resized_gray_single = tf.squeeze(resized_gray, axis=0)

# Final confirmation print about input size handling.
print("Final resized grayscale shape:", tuple(resized_gray_single.shape))



## **2. Layer Freezing Techniques**

### **2.1. Controlling Requires Grad**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_B/image_02_01.jpg?v=1769748657" width="250">



>* Use requires_grad to choose which layers train
>* Freezing layers saves compute and preserves pretrained features

>* Freeze early layers to keep generic features
>* Train deeper layers to adapt quickly with less data

>* Adjust freezing depth to balance speed, performance
>* Unfreeze more layers when domain differs significantly
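
In PyTorch itself, the flag lives on each parameter. The following minimal sketch uses two small `nn.Linear` modules as hypothetical stand-ins for a pretrained backbone and a new head; it freezes the backbone with `requires_grad_(False)`, passes only the trainable parameters to the optimizer, and checks which parameters received gradients after one step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for a pretrained backbone and a new head.
backbone = nn.Sequential(nn.Linear(10, 16), nn.ReLU())
head = nn.Linear(16, 3)
model = nn.Sequential(backbone, head)

# Freeze the backbone: autograd stops tracking these parameters.
for param in backbone.parameters():
    param.requires_grad_(False)

# Hand only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

# One training step on random data.
inputs = torch.rand(4, 10)
targets = torch.tensor([0, 1, 2, 1])
loss = nn.functional.cross_entropy(model(inputs), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Frozen parameters never receive a gradient; head parameters do.
frozen_has_grad = any(p.grad is not None for p in backbone.parameters())
head_has_grad = all(p.grad is not None for p in head.parameters())
print("Trainable tensors:", len(trainable))
print("Backbone received gradients:", frozen_has_grad)
print("Head received gradients:", head_has_grad)
```

Filtering the parameter list is optional, since parameters without gradients are skipped by the optimizer, but it makes the intent explicit.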



In [None]:
#@title Python Code - Controlling Requires Grad

# This script shows controlling requires_grad conceptually.
# We simulate freezing and unfreezing layers using simple flags.
# Focus is on understanding gradient flow decisions.

# TensorFlow is available by default in this environment.
# Uncomment next line if running elsewhere and missing tensorflow.
# !pip install tensorflow==2.20.0

# Import required modules from TensorFlow.
import tensorflow as tf

# Set a global random seed for deterministic behavior.
tf.random.set_seed(42)

# Print TensorFlow version in one concise line.
print("TensorFlow version:", tf.__version__)

# Create a tiny dummy image batch with fixed values.
images = tf.ones((4, 8, 8, 3))

# Create tiny dummy labels for a three class task.
labels = tf.constant([0, 1, 2, 1], dtype=tf.int32)

# Build a simple sequential convolutional vision model.
base_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(4, 3, activation="relu"),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
])

# Build a small classification head for three classes.
classifier_head = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Define a simple optimizer for demonstration.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

# Define a sparse categorical crossentropy loss function.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

# Helper function to run one training step with control flags.
def train_step(freeze_base, freeze_head):
    # Use GradientTape to record operations for gradients.
    with tf.GradientTape(persistent=False) as tape:
        # Forward pass through base model.
        features = base_model(images, training=True)
        
        # Forward pass through classifier head (softmax probabilities).
        probs = classifier_head(features, training=True)
        
        # Compute loss between predicted probabilities and labels.
        loss = loss_fn(labels, probs)
    
    # Collect trainable variables from both parts.
    base_vars = base_model.trainable_variables
    head_vars = classifier_head.trainable_variables
    
    # Decide which variables should receive gradients.
    train_vars = []
    
    # Append base variables only when not frozen.
    if not freeze_base:
        train_vars.extend(base_vars)
    
    # Append head variables only when not frozen.
    if not freeze_head:
        train_vars.extend(head_vars)
    
    # Compute gradients only for selected variables.
    grads = tape.gradient(loss, train_vars)
    
    # Apply gradients to update selected parameters.
    if train_vars:
        optimizer.apply_gradients(zip(grads, train_vars))
    
    # Return scalar loss value for inspection.
    return float(loss.numpy())

# Run one step with base frozen and head trainable.
loss_freeze_base = train_step(freeze_base=True, freeze_head=False)

# Run one step with both base and head trainable.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_train_all = train_step(freeze_base=False, freeze_head=False)

# Run one step with both base and head frozen.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_freeze_all = train_step(freeze_base=True, freeze_head=True)

# Print concise summary of the three scenarios.
print("Loss with frozen base, trainable head:", loss_freeze_base)

# Show loss when all layers are trainable together.
print("Loss with all layers trainable:", loss_train_all)

# Show loss when everything is frozen, no updates applied.
print("Loss with all layers frozen:", loss_freeze_all)

# Print how many head variables are updated when the base is frozen.
print("Trainable variables when base frozen:", len(classifier_head.trainable_variables))



### **2.2. Feature Extractor Mode**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_B/image_02_02.jpg?v=1769748736" width="250">



>* Use pretrained network as frozen feature generator
>* Train small new classifier, saving time and data

>* Freeze backbone, replace and train new head
>* Use frozen features while updating final classifier

>* Best when domains are similar or data scarce
>* Provides strong baseline and enables later fine-tuning
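
A PyTorch version of feature extractor mode might look like the sketch below. The small convolutional `backbone` is a hypothetical stand-in with random weights, and the data is random; the point is the pattern: put the backbone in `eval()` mode, compute features once under `torch.no_grad()`, and train only a linear head on those cached features.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained backbone (random weights here).
backbone = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
backbone.eval()  # inference behavior for layers such as dropout or batch norm

# Random data standing in for a small labeled dataset.
images = torch.rand(32, 1, 28, 28)
labels = torch.randint(0, 10, (32,))

# Extract features once, without building an autograd graph.
with torch.no_grad():
    features = backbone(images)

weights_before = backbone[0].weight.clone()

# Train only a small linear head on the cached features.
head = nn.Linear(features.shape[1], 10)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-2)
for _ in range(5):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(head(features), labels)
    loss.backward()
    optimizer.step()

backbone_unchanged = torch.equal(weights_before, backbone[0].weight)
print("Feature shape:", tuple(features.shape))
print("Backbone unchanged:", backbone_unchanged)
```

Caching features this way only works when the inputs are fixed; with random data augmentation the backbone has to run on every batch.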



In [None]:
#@title Python Code - Feature Extractor Mode

# This script shows feature extractor mode basics.
# We use TensorFlow to mimic transfer learning.
# Focus on freezing layers and training classifier.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras utilities.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Load MNIST dataset for a tiny vision example.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Select a small subset to keep runtime short.
train_samples = 2000
test_samples = 500
x_train = x_train[:train_samples]
y_train = y_train[:train_samples]

# Slice test data subset safely.
x_test = x_test[:test_samples]
y_test = y_test[:test_samples]

# Validate shapes before further processing.
print("Train shape:", x_train.shape, y_train.shape)
print("Test shape:", x_test.shape, y_test.shape)

# Normalize pixel values to range zero one.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Add channel dimension for convolutional layers.
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# Confirm new shapes after expanding channels.
print("Train shape after expand:", x_train.shape)
print("Test shape after expand:", x_test.shape)

# Build a small convolutional backbone model.
backbone_inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(16, (3, 3), activation="relu")(backbone_inputs)
x = layers.MaxPooling2D((2, 2))(x)

# Add another convolutional block for richer features.
x = layers.Conv2D(32, (3, 3), activation="relu")(x)
x = layers.MaxPooling2D((2, 2))(x)

# Flatten features to feed the classifier head.
x = layers.Flatten()(x)
backbone_outputs = layers.Dense(64, activation="relu")(x)
backbone_model = keras.Model(backbone_inputs, backbone_outputs)

# Pretend backbone is pretrained by quick training.
pretrain_model = keras.Sequential([
    backbone_model,
    layers.Dense(10, activation="softmax"),
])

# Compile pretraining model with simple settings.
pretrain_model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Pretrain briefly to simulate learned features.
pretrain_model.fit(
    x_train,
    y_train,
    epochs=1,
    batch_size=64,
    verbose=0,
)

# Freeze backbone to enable feature extractor mode.
for layer in backbone_model.layers:
    layer.trainable = False

# Build new classifier head for a new task.
new_inputs = keras.Input(shape=(28, 28, 1))
features = backbone_model(new_inputs, training=False)

# Add small dense layer and new output layer.
features = layers.Dense(32, activation="relu")(features)
new_outputs = layers.Dense(10, activation="softmax")(features)
feature_extractor_model = keras.Model(new_inputs, new_outputs)

# Confirm which layers are trainable or frozen.
trainable_status = [(layer.name, layer.trainable)
                    for layer in feature_extractor_model.layers]
print("Layer trainable flags:")
print(trainable_status)

# Compile the feature extractor model; only the new head will train.
feature_extractor_model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train only the new head while backbone stays frozen.
feature_extractor_model.fit(
    x_train,
    y_train,
    epochs=2,
    batch_size=64,
    verbose=0,
)

# Evaluate performance on the small test subset.
loss, acc = feature_extractor_model.evaluate(
    x_test,
    y_test,
    verbose=0,
)

# Print concise summary of feature extractor results.
print("Feature extractor test accuracy:", round(acc, 4))



### **2.3. Progressive Layer Unfreezing**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_B/image_02_03.jpg?v=1769748808" width="250">



>* Gradually unfreeze deeper layers during training
>* Preserves pretrained features while reducing overfitting risk

>* Start with higher learning rates on head
>* Unfreeze blocks gradually, stopping when validation worsens

>* Focus training on most important model layers
>* Adapt unfreezing depth to domain shift size
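
The staged schedule can be sketched in PyTorch as follows. The three `nn.Linear` blocks are hypothetical placeholders for backbone stages, and `unfreeze_last` and `count_trainable` are illustrative helpers rather than library functions; the training loop itself is left as a comment.

```python
import torch.nn as nn

# Hypothetical backbone made of three blocks, ordered from input to output.
blocks = nn.ModuleList([
    nn.Linear(8, 8),
    nn.Linear(8, 8),
    nn.Linear(8, 8),
])
head = nn.Linear(8, 3)


def unfreeze_last(blocks: nn.ModuleList, count: int) -> None:
    """Freeze every block, then unfreeze the last `count` blocks."""
    for index, block in enumerate(blocks):
        trainable = index >= len(blocks) - count
        for param in block.parameters():
            param.requires_grad_(trainable)


def count_trainable(module: nn.Module) -> int:
    """Count parameter tensors that will receive gradients."""
    return sum(1 for p in module.parameters() if p.requires_grad)


# Stage 0 trains the head only; each later stage opens one more block.
schedule = []
for stage in range(len(blocks) + 1):
    unfreeze_last(blocks, stage)
    schedule.append(count_trainable(blocks))
    # A real loop would train here for a few epochs, typically with a
    # smaller learning rate at each stage, and stop once validation
    # metrics stop improving.

print("Trainable backbone tensors per stage:", schedule)
print("Head tensors (always trainable):", count_trainable(head))
```

If the optimizer was built from only the trainable parameters, it needs to be rebuilt, or extended with `add_param_group`, after each unfreezing step.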



In [None]:
#@title Python Code - Progressive Layer Unfreezing

# This script shows progressive layer unfreezing.
# We use TensorFlow for a tiny vision example.
# Focus on freezing and unfreezing convolutional layers.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras utilities.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Load MNIST dataset from Keras datasets.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Use a very small subset for quick training.
train_samples = 2000
test_samples = 500
x_train = x_train[:train_samples]
y_train = y_train[:train_samples]

# Slice test data subset safely.
x_test = x_test[:test_samples]
y_test = y_test[:test_samples]

# Expand grayscale images to three channels.
x_train = np.repeat(x_train[..., np.newaxis], 3, axis=3)
x_test = np.repeat(x_test[..., np.newaxis], 3, axis=3)

# Resize images to match small CNN expectations.
img_size = 32
x_train = tf.image.resize(x_train, (img_size, img_size)).numpy()
x_test = tf.image.resize(x_test, (img_size, img_size)).numpy()

# Normalize pixel values to range zero one.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Confirm shapes are as expected.
print("Train shape:", x_train.shape, "Test shape:", x_test.shape)

# Build a small convolutional backbone model.
def build_backbone():
    inputs = keras.Input(shape=(img_size, img_size, 3))
    x = layers.Conv2D(16, 3, activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    model = keras.Model(inputs, x, name="backbone")
    return model

# Create backbone and classification head model.
backbone = build_backbone()
backbone.trainable = False
inputs = keras.Input(shape=(img_size, img_size, 3))
features = backbone(inputs)
outputs = layers.Dense(10, activation="softmax")(features)
model = keras.Model(inputs, outputs, name="mnist_progressive")

# Compile model with simple optimizer and loss.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train only the new classification head first.
history_stage1 = model.fit(
    x_train,
    y_train,
    epochs=2,
    batch_size=64,
    verbose=0,
    validation_split=0.1,
)

# Evaluate after first stage training.
loss1, acc1 = model.evaluate(x_test, y_test, verbose=0)
print("Stage1 test accuracy (head only):", round(acc1, 3))

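# Progressive unfreezing of deeper backbone layers.
# Re-enable the backbone container first: in some Keras versions a frozen
# parent hides its children's weights even if they are marked trainable.
backbone.trainable = True
for layer in backbone.layers[:-2]:
    layer.trainable = False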

# Recompile with lower learning rate for stability.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=5e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train again with partially unfrozen backbone.
history_stage2 = model.fit(
    x_train,
    y_train,
    epochs=2,
    batch_size=64,
    verbose=0,
    validation_split=0.1,
)

# Evaluate after progressive unfreezing stage.
loss2, acc2 = model.evaluate(x_test, y_test, verbose=0)
print("Stage2 test accuracy (unfrozen tail):", round(acc2, 3))

# Show how many backbone layers are now trainable.
trainable_count = np.sum([int(l.trainable) for l in backbone.layers])
print("Trainable backbone layers after stage2:", int(trainable_count))



## **3. Fine Tuning Experiments**

### **3.1. Dataset Label Mapping**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_B/image_03_01.jpg?v=1769748887" width="250">



>* Define consistent mapping from classes to integers
>* Use same mapping across pipeline to avoid errors

>* Fix the class list order before assigning indices
>* Store the mapping so training and evaluation share it

>* Consistent labels are required for fair comparisons
>* Shared canonical mapping improves reliability and reproducibility
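
A canonical mapping can be built and shared with nothing more than the standard library. In the sketch below, the class names and labels are made up for illustration. Sorting the names first makes the indices independent of discovery order, which is also how torchvision's `ImageFolder` assigns indices from folder names.

```python
import json

# Class names in arbitrary discovery order (illustrative values).
discovered = ["dog", "cat", "horse"]

# Sort once so the mapping does not depend on discovery order.
class_names = sorted(discovered)
label_to_index = {name: idx for idx, name in enumerate(class_names)}
index_to_label = {idx: name for name, idx in label_to_index.items()}

# Serialize the mapping so training and evaluation share one definition.
saved = json.dumps(label_to_index, sort_keys=True)
restored = json.loads(saved)

# Encode string labels with the restored mapping, then decode them again.
raw_labels = ["horse", "cat", "dog", "cat"]
encoded = [restored[name] for name in raw_labels]
decoded = [index_to_label[idx] for idx in encoded]

print("Mapping:", restored)
print("Encoded:", encoded)
print("Round trip matches:", decoded == raw_labels)
```

Writing the JSON string to a file next to the model checkpoint keeps predictions interpretable long after training.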



In [None]:
#@title Python Code - Dataset Label Mapping

# This script shows dataset label mapping.
# We use TensorFlow image dataset utilities.
# Focus is on clear class index mapping.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and image utilities.
import tensorflow as tf
from tensorflow.keras import layers

# Set deterministic random seeds.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)

# Configure TensorFlow global seed.
tf.random.set_seed(seed_value)

# Print TensorFlow version once.
print("TensorFlow version:", tf.__version__)

# Create a tiny directory dataset structure.
base_dir = "tiny_animals_dataset"
class_names = ["cat", "dog"]

# Ensure base directory exists.
os.makedirs(base_dir, exist_ok=True)

# Create one subdirectory per class.
for name in class_names:
    os.makedirs(os.path.join(base_dir, name), exist_ok=True)

# Create small random images and save.
height, width, channels = 32, 32, 3
num_images_per_class = 3

# Helper function to create random image.
def create_random_image(path):
    array = np.random.randint(0, 256, (height, width, channels))
    img = tf.keras.utils.array_to_img(array)
    img.save(path)

# Generate images for each class folder.
for name in class_names:
    folder = os.path.join(base_dir, name)
    for i in range(num_images_per_class):
        filename = f"img_{i}.png"
        create_random_image(os.path.join(folder, filename))

# Build a deterministic label mapping dictionary.
label_to_index = {name: idx for idx, name in enumerate(class_names)}

# Also build the inverse mapping dictionary.
index_to_label = {idx: name for name, idx in label_to_index.items()}

# Print both mappings clearly.
print("Label to index mapping:", label_to_index)
print("Index to label mapping:", index_to_label)

# Load dataset using image_dataset_from_directory.
train_ds = tf.keras.utils.image_dataset_from_directory(
    base_dir,
    labels="inferred",
    label_mode="int",
    class_names=class_names,
    image_size=(height, width),
    batch_size=2,
    shuffle=False,
    seed=seed_value,
)

# Take one small batch from dataset.
images, labels = next(iter(train_ds))

# Validate shapes before further use.
print("Batch images shape:", images.shape)
print("Batch labels shape:", labels.shape)

# Collect every label the dataset yields, not just the first batch.
unique_labels = sorted(
    {int(label) for _, batch in train_ds for label in batch.numpy()}
)

# Print mapping used by the dataset.
for idx in unique_labels:
    print("Dataset label", idx, "means", index_to_label[int(idx)])

# Build a tiny model head matching label count.
num_classes = len(class_names)
model = tf.keras.Sequential([
    layers.Input(shape=(height, width, channels)),
    layers.Flatten(),
    layers.Dense(8, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])

# Compile model with sparse categorical crossentropy.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train briefly to show mapping usage.
history = model.fit(train_ds, epochs=1, verbose=0)

# Evaluate once to complete example.
loss, acc = model.evaluate(train_ds, verbose=0)
print("Tiny model accuracy with mapping:", float(acc))



### **3.2. Choosing Learning Rates**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_B/image_03_02.jpg?v=1769748948" width="250">



>* Learning rate controls how strongly weights change
>* Balance preserving pretrained features with adapting quickly

>* Use higher learning rate for new head
>* Use lower rate for backbone; differential learning

>* Treat learning rate choice as systematic experiments
>* Compare runs to balance stability, adaptation, and baselines
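
Differential learning rates map onto PyTorch optimizer parameter groups. The sketch below again uses small hypothetical stand-ins for the backbone and head, and the two learning rates are illustrative starting points, not recommendations.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for a pretrained backbone and a new head.
backbone = nn.Sequential(nn.Linear(10, 16), nn.ReLU())
head = nn.Linear(16, 3)

# One optimizer, two parameter groups, each with its own learning rate.
optimizer = torch.optim.Adam([
    {"params": backbone.parameters(), "lr": 1e-5},
    {"params": head.parameters(), "lr": 1e-3},
])

group_lrs = [group["lr"] for group in optimizer.param_groups]
print("Learning rates (backbone, head):", group_lrs)
```

Keeping both groups in one optimizer means a single `optimizer.step()` updates the head quickly while changing the pretrained weights only slightly.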



In [None]:
#@title Python Code - Choosing Learning Rates

# This script compares two learning rates for fine tuning.
# We use a tiny CNN on a small MNIST subset.
# Focus is on how learning rate changes training.

# !pip install tensorflow==2.20.0

# Import required standard libraries.
import os
import random
import numpy as np

# Import tensorflow and keras utilities.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Load MNIST dataset from keras datasets.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Select a small subset for quick experiments.
num_train = 4000
num_val = 1000
x_train = x_train[: num_train + num_val]
y_train = y_train[: num_train + num_val]

# Split subset into train and validation parts.
x_val = x_train[num_train:]
y_val = y_train[num_train:]
x_train = x_train[:num_train]
y_train = y_train[:num_train]

# Normalize images to range zero one.
x_train = x_train.astype("float32") / 255.0
x_val = x_val.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Add channel dimension for convolutional layers.
x_train = np.expand_dims(x_train, axis=-1)
x_val = np.expand_dims(x_val, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# Validate shapes before building models.
print("Train shape:", x_train.shape, y_train.shape)

# Define a simple convolutional feature extractor.
def create_backbone():
    inputs = keras.Input(shape=(28, 28, 1))
    x = layers.Conv2D(16, 3, activation="relu")(inputs)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    outputs = layers.Dense(64, activation="relu")(x)
    model = keras.Model(inputs, outputs, name="backbone")
    return model

# Create a backbone; it is pretrained below through a temporary head.
backbone = create_backbone()

# Attach a temporary head for pretraining classification.
pretrain_outputs = layers.Dense(10, activation="softmax")(backbone.output)
pretrain_model = keras.Model(backbone.input, pretrain_outputs)
pretrain_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Pretrain briefly with silent training settings.
pretrain_model.fit(
    x_train,
    y_train,
    epochs=2,
    batch_size=64,
    verbose=0,
)

# Freeze backbone to simulate starting fine tuning.
backbone.trainable = False

# Function builds full model with new classification head.
def build_finetune_model(backbone_model):
    inputs = keras.Input(shape=(28, 28, 1))
    x = backbone_model(inputs, training=False)
    outputs = layers.Dense(10, activation="softmax")(x)
    model = keras.Model(inputs, outputs, name="finetune_model")
    return model

# Build two identical models for different learning rates.
model_low_lr = build_finetune_model(backbone)
model_high_lr = build_finetune_model(backbone)

# Choose two learning rates for comparison.
low_lr = 1e-4
high_lr = 1e-2

# Compile models with different learning rates.
model_low_lr.compile(
    optimizer=keras.optimizers.Adam(learning_rate=low_lr),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Compile second model with higher learning rate.
model_high_lr.compile(
    optimizer=keras.optimizers.Adam(learning_rate=high_lr),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train both models briefly on same data.
history_low = model_low_lr.fit(
    x_train,
    y_train,
    epochs=2,
    batch_size=64,
    validation_data=(x_val, y_val),
    verbose=0,
)

# Train high learning rate model silently.
history_high = model_high_lr.fit(
    x_train,
    y_train,
    epochs=2,
    batch_size=64,
    validation_data=(x_val, y_val),
    verbose=0,
)

# Evaluate both models on validation data.
val_loss_low, val_acc_low = model_low_lr.evaluate(
    x_val,
    y_val,
    verbose=0,
)

# Evaluate high learning rate model silently.
val_loss_high, val_acc_high = model_high_lr.evaluate(
    x_val,
    y_val,
    verbose=0,
)

# Print concise comparison of learning rate effects.
print("Low lr:", low_lr, "val_acc:", round(val_acc_low, 4))
print("High lr:", high_lr, "val_acc:", round(val_acc_high, 4))
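
The cell above stores `history_low` and `history_high` but prints only the final validation accuracy. As a minimal sketch, assuming each history is a Keras-style dict of per-epoch lists with a `val_accuracy` key, a small helper can lay the runs out side by side. The name `format_history_table` is hypothetical and not part of Keras.

```python
def format_history_table(histories, metric="val_accuracy"):
    """Return a text table with one row per epoch and one column per run.

    histories maps a run name to a dict of per-epoch lists, which is the
    shape assumed here for a Keras `history.history` object.
    """
    names = list(histories)
    n_epochs = max(len(histories[name][metric]) for name in names)
    header = "epoch  " + "  ".join(f"{name:>10}" for name in names)
    rows = [header]
    for epoch in range(n_epochs):
        cells = []
        for name in names:
            values = histories[name][metric]
            # Runs may have different lengths; mark missing epochs with "-".
            cell = f"{values[epoch]:.4f}" if epoch < len(values) else "-"
            cells.append(f"{cell:>10}")
        rows.append(f"{epoch + 1:>5}  " + "  ".join(cells))
    return "\n".join(rows)
```

With the variables from the cell above, `print(format_history_table({"low_lr": history_low.history, "high_lr": history_high.history}))` would show how each learning rate progresses epoch by epoch rather than only where it ends.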




### **3.3. Comparing baselines**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_B/image_03_03.jpg?v=1769749030" width="250">



>* Compare fine-tuned models to strong baselines
>* Match setups to isolate benefits of pretraining

>* Include realistic lightweight and non-deep-learning baselines
>* Compare to baselines to judge real-world benefits

>* Compare models across learning speed and robustness
>* Evaluate multiple metrics to judge fine tuning benefits
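
The points above about lightweight baselines and multiple metrics can be sketched with the standard library alone. This is an illustrative example, not the lecture's method: the helper names (`accuracy`, `macro_f1`, `majority_class_baseline`) and the toy labels are assumptions invented for the demonstration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def majority_class_baseline(y_train, n_predictions):
    """Predict the most frequent training label for every example."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return [most_common_label] * n_predictions


# Tiny imbalanced toy labels, invented purely for illustration.
y_train_toy = [0, 0, 0, 0, 0, 0, 1, 1, 2]
y_test_toy = [0, 0, 0, 0, 1, 2]
y_pred_toy = majority_class_baseline(y_train_toy, len(y_test_toy))

print("accuracy:", round(accuracy(y_test_toy, y_pred_toy), 4))
print("macro F1:", round(macro_f1(y_test_toy, y_pred_toy), 4))
```

On imbalanced labels like these, the majority-class baseline scores much higher on accuracy than on macro F1. This is one reason to report more than one metric and to check that a fine-tuned model beats such a trivial baseline on each of them.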



In [None]:
#@title Python Code - Comparing baselines

# This script compares two simple vision setups.
# It uses TensorFlow for quick experiments.
# Focus is on comparing training from scratch with simulated fine tuning.

# !pip install tensorflow==2.20.0

# Import required standard and numeric libraries.
import random
import numpy as np

# Import TensorFlow and Keras utilities.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Load MNIST dataset from Keras datasets.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Use a small subset for quick experiments.
train_samples = 4000
test_samples = 1000
x_train = x_train[:train_samples]
y_train = y_train[:train_samples]

# Slice test data subset safely.
x_test = x_test[:test_samples]
y_test = y_test[:test_samples]

# Validate shapes before further processing.
print("Train shape:", x_train.shape, y_train.shape)
print("Test shape:", x_test.shape, y_test.shape)

# Normalize images to range zero one.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Add channel dimension for convolution layers.
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# Confirm new shapes after expansion.
print("Train shape expanded:", x_train.shape)
print("Test shape expanded:", x_test.shape)

# Define a simple CNN model builder.
def build_cnn_model():
    inputs = keras.Input(shape=(28, 28, 1))
    x = layers.Conv2D(16, (3, 3), activation="relu")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(32, (3, 3), activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(10, activation="softmax")(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model

# Build baseline model trained from scratch.
baseline_model = build_cnn_model()

# Compile baseline model with simple settings.
baseline_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train baseline model silently for few epochs.
history_baseline = baseline_model.fit(
    x_train,
    y_train,
    epochs=3,
    batch_size=64,
    verbose=0,
    validation_split=0.1,
)

# Evaluate baseline model on test subset.
loss_base, acc_base = baseline_model.evaluate(
    x_test,
    y_test,
    verbose=0,
)

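# Simulate a pretrained model with extra training epochs.
# Caveat: this "pretraining" reuses the fine tuning data, so the
# comparison below mostly measures 5 epochs against 3 epochs.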
pretrained_model = build_cnn_model()

# Compile pretrained model with same settings.
pretrained_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Pretrain on same data for additional epochs.
pretrained_model.fit(
    x_train,
    y_train,
    epochs=2,
    batch_size=64,
    verbose=0,
)

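# Fine tune pretrained model for few epochs.
# Note: the validation split was already seen during pretraining,
# so validation metrics in this history are optimistic.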
history_finetune = pretrained_model.fit(
    x_train,
    y_train,
    epochs=3,
    batch_size=64,
    verbose=0,
    validation_split=0.1,
)

# Evaluate fine tuned model on test subset.
loss_ft, acc_ft = pretrained_model.evaluate(
    x_test,
    y_test,
    verbose=0,
)

# Print concise comparison of baseline and fine tuned models.
print("Baseline test accuracy:", round(acc_base, 4))
print("Fine tuned test accuracy:", round(acc_ft, 4))
print("Accuracy difference:", round(acc_ft - acc_base, 4))
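
The script above compares final test accuracy only. To also look at learning speed and robustness, a rough sketch under stated assumptions follows: `epochs_to_reach` and `summarize_runs` are hypothetical helper names, and they expect plain Python lists such as the per-epoch values stored in a Keras `history.history` dict.

```python
from statistics import mean, stdev


def epochs_to_reach(values, threshold):
    """Return the 1-based epoch at which values first reach threshold, or None."""
    for epoch, value in enumerate(values, start=1):
        if value >= threshold:
            return epoch
    return None


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) of accuracies from repeated runs."""
    # A single run has no spread to estimate, so report 0.0 in that case.
    spread = stdev(accuracies) if len(accuracies) > 1 else 0.0
    return mean(accuracies), spread
```

For example, `epochs_to_reach(history_baseline.history["val_accuracy"], 0.95)` and the same call on `history_finetune` would indicate which model reaches a chosen target first. Here `0.95` is an arbitrary threshold. Collecting `acc_base` and `acc_ft` from several runs with different `seed_value` settings and passing each list to `summarize_runs` gives a rough sense of how stable the accuracy difference is.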




# <font color="#418FDE" size="6.5" uppercase>**Transfer Learning**</font>


In this lecture, you learned to:
- Load pretrained vision models from torchvision and adapt them to new classification tasks. 
- Configure layer freezing and unfreezing strategies to balance training speed and performance. 
- Fine‑tune a pretrained model on a custom dataset and compare results to training from scratch. 

In the next Module (Module 6), we will go over 'NLP with PyTorch'.