
# Mini Project: Transfer Learning with Keras

Transfer learning is a machine learning technique where a model trained on one task is used as a starting point to solve a different but related task. Instead of training a model from scratch, transfer learning leverages the knowledge learned from the source task and applies it to the target task. This approach is especially useful when the target task has limited data or computational resources.

In transfer learning, the pre-trained model, also known as the "base model" or "source model," is typically trained on a large dataset and a more general problem (e.g., image classification on ImageNet, a vast dataset with millions of labeled images). The knowledge learned by the base model in the form of feature representations and weights captures common patterns and features in the data.

To perform transfer learning, the following steps are commonly followed:

1. Pre-training: The base model is trained on a source task using a large dataset, which can take a considerable amount of time and computational resources.

2. Feature Extraction: After pre-training, the base model is used as a feature extractor. The last few layers (classifier layers) of the model are discarded, and the remaining layers (feature extraction layers) are retained. These layers serve as feature extractors, producing meaningful representations of the data.

3. Fine-tuning: The retained feature extraction layers are connected to a new set of layers, often called the "classifier layers" or "task-specific layers." These new layers are randomly initialized, and the model is trained on the target task with a smaller dataset. The weights of the base model can be kept frozen, or some of its later layers can be unfrozen and updated with a lower learning rate to adapt them to the target task.
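
The freeze-then-unfreeze workflow described in steps 2 and 3 can be sketched in Keras as follows. This is a minimal illustration, not the notebook's solution: the tiny `stand_in_base` network and its layer names are hypothetical placeholders for a real pre-trained model, chosen so the sketch runs without downloading weights.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Hypothetical tiny "base model" standing in for a real pre-trained network.
base = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(32, 32, 3)),
        layers.Conv2D(8, 3, activation="relu", name="early_conv"),
        layers.Conv2D(16, 3, activation="relu", name="late_conv"),
    ],
    name="stand_in_base",
)

# Phase 1 (feature extraction): freeze the whole base, train only the new head.
base.trainable = False
inputs = tf.keras.Input(shape=(32, 32, 3))
x = base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = Model(inputs, outputs)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
frozen_phase_weights = len(model.trainable_weights)  # head kernel + head bias

# Phase 2 (fine-tuning): unfreeze only the last base layer and recompile
# with a lower learning rate so the pre-trained weights change slowly.
base.trainable = True
for layer in base.layers[:-1]:
    layer.trainable = False
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
fine_tune_phase_weights = len(model.trainable_weights)  # adds late_conv kernel + bias

print(frozen_phase_weights, fine_tune_phase_weights)
```

Recompiling after changing `trainable` matters because the set of trainable weights is fixed when the model is compiled.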

Transfer learning has several benefits:

1. Reduced training time and resource requirements: Since the base model has already learned generic features, transfer learning can save time and resources compared to training a model from scratch.

2. Improved generalization: Transfer learning helps the model generalize better to the target task, especially when the target dataset is small and related to the source dataset.

3. Better performance: By starting from a model that is already trained on a large dataset, transfer learning can lead to better performance on the target task, especially in scenarios with limited data.

4. Effective feature extraction: The feature extraction layers of the pre-trained model can serve as powerful feature extractors for different tasks, even when the task domains differ.

Transfer learning is commonly used in various domains, including computer vision, natural language processing (NLP), and speech recognition, where pre-trained models are fine-tuned for specific applications like object detection, sentiment analysis, or speech-to-text.

In this mini-project you will perform fine-tuning using Keras with a pre-trained VGG16 model on the CIFAR-10 dataset.

First, import all the libraries you'll need.

In [5]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split

The CIFAR-10 dataset is a widely used benchmark dataset in the field of computer vision and machine learning. The name comes from the Canadian Institute for Advanced Research (CIFAR). CIFAR-10 was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton at the University of Toronto and described in Krizhevsky's 2009 technical report "Learning Multiple Layers of Features from Tiny Images."

The dataset consists of 60,000 color images, each of size 32x32 pixels, belonging to ten different classes. Each class contains 6,000 images. The ten classes in CIFAR-10 are:

1. Airplane
2. Automobile
3. Bird
4. Cat
5. Deer
6. Dog
7. Frog
8. Horse
9. Ship
10. Truck

The images are evenly distributed across the classes, making CIFAR-10 a balanced dataset. The dataset is divided into two sets: a training set and a test set. The training set contains 50,000 images, while the test set contains the remaining 10,000 images.

CIFAR-10 is often used for tasks such as image classification, object recognition, and transfer learning experiments. The relatively small size of the images and the variety of classes make it a challenging dataset for training machine learning models, especially deep neural networks. It also serves as a good dataset for teaching and learning purposes due to its manageable size and straightforward class labels.

Here are your tasks:

1. Load the CIFAR-10 dataset after referencing the documentation [here](https://keras.io/api/datasets/cifar10/).
2. Normalize the pixel values so they're all in the range [0, 1].
3. Apply One Hot Encoding to the train and test labels using the [to_categorical](https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical) function.
4. Further split the training data into training and validation sets using [train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html). Use only 10% of the data for validation.
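
To make tasks 2 to 4 concrete, here is a small NumPy-only sketch of what each preprocessing step does to the arrays. It uses synthetic data in place of CIFAR-10, and the variable names are illustrative assumptions. In the notebook itself, `to_categorical` and `train_test_split` do this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for CIFAR-10: 100 uint8 images and integer labels of shape (N, 1).
images = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
labels = rng.integers(0, 10, size=(100, 1))

# Normalize: uint8 values 0..255 become floats in [0, 1].
images_norm = images.astype("float32") / 255.0

# One-hot encode: row i of the identity matrix is the encoding of class i.
labels_onehot = np.eye(10, dtype="float32")[labels.ravel()]

# Hold out 10% for validation after shuffling the indices.
indices = rng.permutation(len(images_norm))
n_val = int(0.1 * len(indices))
val_idx, train_idx = indices[:n_val], indices[n_val:]
x_tr, y_tr = images_norm[train_idx], labels_onehot[train_idx]
x_va, y_va = images_norm[val_idx], labels_onehot[val_idx]

print(x_tr.shape, x_va.shape, labels_onehot.shape)
```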

In [6]:
from tensorflow.keras.datasets import cifar10

# Load dataset (downloads automatically if not available)
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print("Training data shape:", x_train.shape)
print("Test data shape:", x_test.shape)

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 3s 0us/step
Training data shape: (50000, 32, 32, 3)
Test data shape: (10000, 32, 32, 3)


In [7]:
# Normalize the pixel values to [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

In [8]:
from tensorflow.keras.utils import to_categorical

# One-hot encode the labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

In [9]:
from sklearn.model_selection import train_test_split

# Split the data into training and validation sets
x_train, x_val, y_train, y_val = train_test_split(
    x_train, y_train, test_size=0.1, random_state=42
)

print("Train shape:", x_train.shape)
print("Validation shape:", x_val.shape)

Train shape: (45000, 32, 32, 3)
Validation shape: (5000, 32, 32, 3)


VGG16 (Visual Geometry Group 16) is a deep convolutional neural network architecture that was developed by the Visual Geometry Group at the University of Oxford. It was proposed by researchers Karen Simonyan and Andrew Zisserman in their paper titled "Very Deep Convolutional Networks for Large-Scale Image Recognition," which was presented at the International Conference on Learning Representations (ICLR) in 2015.

The VGG16 architecture gained significant popularity for its simplicity and effectiveness in image classification tasks. It was one of the pioneering models that demonstrated the power of deeper neural networks for visual recognition tasks.

Key characteristics of the VGG16 architecture:

1. Architecture: VGG16 has 16 weight layers, hence the name: 13 convolutional layers followed by 3 fully connected layers. These layers are stacked one after another, forming a deep neural network.

2. Convolutional Layers: The main building blocks of VGG16 are the convolutional layers. It primarily uses 3x3 convolutional filters throughout the network, which allows it to capture local features effectively.

3. Max Pooling: After each block of convolutional layers, VGG16 applies a max-pooling layer with a 2x2 window and stride 2, which halves the spatial dimensions (width and height) of the feature maps and reduces the computation required by later layers.

4. Fully Connected Layers: Towards the end of the network, VGG16 has fully connected layers that act as a classifier to make predictions based on the learned features.

5. Activation Function: The network uses the Rectified Linear Unit (ReLU) activation function for all hidden layers, which helps with faster convergence during training.

6. Number of Filters: The number of filters per convolutional layer grows with depth, from 64 in the first block to 512 in the last two blocks. Stacking multiple layers allows VGG16 to learn complex hierarchical features.

7. Output Layer: The output layer consists of 1000 units, corresponding to 1000 ImageNet classes. VGG16 was originally trained on the large-scale ImageNet dataset, which contains millions of images from 1000 different classes.

VGG16 was instrumental in showing that increasing the depth of a neural network can significantly improve its performance on image recognition tasks. However, the main drawback of VGG16 is its high number of parameters, making it computationally expensive and memory-intensive to train. Despite this limitation, VGG16 remains an essential benchmark architecture and has paved the way for even deeper and more efficient models in the field of computer vision, such as ResNet, DenseNet, and EfficientNet.
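
The effect of the pooling layers and the size of the parameter count can be checked with simple arithmetic. The sketch below assumes the standard VGG16 layout (five blocks of 3x3 convolutions with "same" padding, each followed by 2x2 max pooling). The names `VGG16_BLOCKS` and `conv_base_stats` are illustrative and not part of Keras.

```python
# (number of conv layers, filters per layer) for the five VGG16 blocks.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def conv_base_stats(input_size=32, in_channels=3):
    """Return (final spatial size, final channels, conv parameter count)."""
    size, channels, params = input_size, in_channels, 0
    for n_layers, filters in VGG16_BLOCKS:
        for _ in range(n_layers):
            # 3x3 kernel weights plus one bias per filter; 'same' padding keeps the size.
            params += 3 * 3 * channels * filters + filters
            channels = filters
        size //= 2  # 2x2 max pooling with stride 2 halves width and height
    return size, channels, params


print(conv_base_stats(32))   # CIFAR-10 sized input  -> (1, 512, 14714688)
print(conv_base_stats(224))  # ImageNet sized input  -> (7, 512, 14714688)
```

For a 32x32 CIFAR-10 image, the five pooling steps shrink the feature map to 1x1 with 512 channels. The convolutional base holds about 14.7 million parameters regardless of input size. Most of the remaining parameters in the full network sit in the fully connected layers, which are dropped when the top is excluded.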

Here are your tasks:

1. Load [VGG16](https://keras.io/api/applications/vgg/#vgg16-function) as a base model. Make sure to exclude the top layer.
2. Freeze all the layers in the base model. We'll be using these weights as a feature extraction layer to forward to layers that are trainable.

In [10]:
base_model = VGG16(
    weights='imagenet',        # Pretrained on ImageNet
    include_top=False,         # Exclude fully connected (top) layers
    input_shape=(32, 32, 3)    # CIFAR-10 image size
)

print("✅ VGG16 base model loaded successfully (without top layers).")
base_model.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58889256/58889256 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
✅ VGG16 base model loaded successfully (without top layers).


In [11]:
# Freeze the layers in the base model
for layer in base_model.layers:
    layer.trainable = False

# Confirm that all layers are frozen
trainable_layers = [layer.name for layer in base_model.layers if layer.trainable]
print(f"\nAll base layers are frozen. Trainable layers: {len(trainable_layers)}")


All base layers are frozen. Trainable layers: 0


Now, we'll add some trainable layers to the base model.

1. Using the base model, add a [GlobalAveragePooling2D](https://keras.io/api/layers/pooling_layers/global_average_pooling2d/) layer, followed by a [Dense](https://keras.io/api/layers/core_layers/dense/) layer of length 256 with ReLU activation. Finally, add a classification layer with 10 units, corresponding to the 10 CIFAR-10 classes, with softmax activation.
2. Create a Keras [Model](https://keras.io/api/models/model/) that takes in appropriate inputs and outputs.
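
As a rough illustration of what this head does, the NumPy sketch below mimics global average pooling on a stand-in feature map and counts the head's trainable parameters. It assumes the base output has shape `(batch, 1, 1, 512)`, which is what VGG16 without its top produces for 32x32 inputs. The helper `dense_params` is hypothetical.

```python
import numpy as np

# Stand-in for the base model output on a batch of 4 images:
# (batch, height, width, channels).
feature_maps = np.random.default_rng(0).normal(size=(4, 1, 1, 512)).astype("float32")

# Global average pooling: average over the two spatial axes, leaving (batch, channels).
pooled = feature_maps.mean(axis=(1, 2))


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Dense(256) on the 512-dimensional pooled vector, then Dense(10) for the classes.
head_params = dense_params(512, 256) + dense_params(256, 10)
print(pooled.shape, head_params)  # (4, 512) 133898
```

With the base frozen, these 133,898 parameters are the only ones updated during training.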

In [12]:
x = GlobalAveragePooling2D()(base_model.output)    # Add a global average pooling layer

In [13]:
x = Dense(256, activation='relu')(x)    # Add a fully connected layer with 256 units and ReLU activation


In [14]:
output = Dense(10, activation='softmax')(x)  # Add the final classification layer with 10 units (for CIFAR-10 classes) and softmax activation

In [15]:
# Create the fine-tuned model
model = Model(inputs=base_model.input, outputs=output)
print("✅ VGG16 base + custom classification head created successfully!")
model.summary()

✅ VGG16 base + custom classification head created successfully!


With your model complete it's time to train it and assess its performance.

1. Compile your model using an appropriate loss function. Feel free to play around with the optimizer, but a good starting optimizer might be Adam with a learning rate of 0.001.
2. Fit your model on the training data. Use the validation data to print the accuracy for each epoch. Try training for 10 epochs. Note, training can take a few hours so go ahead and grab a cup of coffee.

**Optional**: See if you can implement an [Early Stopping](https://keras.io/api/callbacks/early_stopping/) criterion as a callback function.
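
The idea behind early stopping with a patience setting can be sketched in plain Python. This is a simplified model of the logic, not the Keras implementation. It assumes higher scores are better, and the validation accuracies are invented for illustration.

```python
def early_stopping_epoch(val_scores, patience):
    """Return the 1-based epoch at which training would stop, or None if it never does.

    Assumes higher scores are better (e.g. validation accuracy) and that any
    strict improvement resets the patience counter.
    """
    best = float("-inf")
    wait = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best, wait = score, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation accuracies, invented for illustration.
scores = [0.50, 0.55, 0.58, 0.57, 0.56, 0.60]
print(early_stopping_epoch(scores, patience=2))  # 5: two epochs in a row without improvement
print(early_stopping_epoch(scores, patience=3))  # None: epoch 6 improves before patience runs out
```

A small patience saves time but can stop before a late improvement, as the second call shows.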

In [29]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize pixel values
x_train = x_train.astype('float32') / 255.0
x_test  = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, 10)
y_test  = to_categorical(y_test, 10)

# --- ✅ SPEED BOOST 1: Use only a subset of data ---
# Use 20% of training data for faster runs
subset_size = int(0.2 * len(x_train))
x_train = x_train[:subset_size]
y_train = y_train[:subset_size]

# Split train into train/validation sets
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.1, random_state=42)

print(f"Training set: {x_train.shape}, Validation set: {x_val.shape}")

base_model = VGG16(weights='imagenet', include_top=False, input_shape=(32, 32, 3))
for layer in base_model.layers:
    layer.trainable = False

# --- Add custom classification head ---
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(128, activation='relu')(x)   # ✅ SPEED BOOST 2: smaller dense layer
output = Dense(10, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=output)


# Compile the model
model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

Training set: (9000, 32, 32, 3), Validation set: (1000, 32, 32, 3)


In [23]:
from tensorflow.keras.callbacks import EarlyStopping

# --- Early Stopping to prevent long training ---
early_stop = EarlyStopping(
    monitor='val_accuracy',
    patience=2,
    restore_best_weights=True,
    verbose=1
)

# --- Train the model ---
history = model.fit(
    x_train, y_train,
    epochs=10,                    # limit to 10 epochs
    batch_size=64,                # ✅ SPEED BOOST 3: balanced mini-batches
    validation_data=(x_val, y_val),
    callbacks=[early_stop],
    verbose=1
)  # Train the model

Epoch 1/10
141/141 ━━━━━━━━━━━━━━━━━━━━ 130s 911ms/step - accuracy: 0.3072 - loss: 1.9630 - val_accuracy: 0.5300 - val_loss: 1.4369
Epoch 2/10
141/141 ━━━━━━━━━━━━━━━━━━━━ 130s 822ms/step - accuracy: 0.5117 - loss: 1.4200 - val_accuracy: 0.5510 - val_loss: 1.3258
Epoch 3/10
141/141 ━━━━━━━━━━━━━━━━━━━━ 151s 885ms/step - accuracy: 0.5531 - loss: 1.2951 - val_accuracy: 0.5560 - val_loss: 1.2849
Epoch 4/10
141/141 ━━━━━━━━━━━━━━━━━━━━ 135s 837ms/step - accuracy: 0.5801 - loss: 1.2191 - val_accuracy: 0.5770 - val_loss: 1.2265
Epoch 5/10
141/141 ━━━━━━━━━━━━━━━━━━━━ 148s 883ms/step - accuracy: 0.5945 - loss: 1.1697 - val_accuracy: 0.5830 - val_loss: 1.1999
Epoch 6/10
141/141 ━━━━━━━━━━━━━━━━━━━━ 142s 888ms/step - accuracy: 0.6087 - loss: 1.1365 - val_accuracy: 0.5710 - val_loss: 1.2029
Epoc

With your model trained, it's time to assess how well it performs on the test data.

1. Use your trained model to calculate the accuracy on the test set. Is the model performance better than random?
2. Experiment! See if you can tweak your model to improve performance.  
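
To make "better than random" concrete, the sketch below computes accuracy from one-hot labels and predicted probabilities, then compares it with the chance level for balanced classes. The helper `accuracy_from_probs` and the tiny 3-class example are made up for illustration. In the notebook, `model.evaluate` reports the accuracy directly.

```python
import numpy as np


def accuracy_from_probs(y_true_onehot, y_prob):
    """Fraction of rows where the highest-probability class matches the true class."""
    predicted = np.argmax(y_prob, axis=1)
    actual = np.argmax(y_true_onehot, axis=1)
    return float(np.mean(predicted == actual))


# Tiny made-up example with 3 classes and 4 samples.
y_true = np.eye(3)[[0, 1, 2, 1]]
y_prob = np.array([
    [0.7, 0.2, 0.1],  # predicts class 0, correct
    [0.1, 0.8, 0.1],  # predicts class 1, correct
    [0.5, 0.3, 0.2],  # predicts class 0, wrong (true class is 2)
    [0.2, 0.6, 0.2],  # predicts class 1, correct
])

acc = accuracy_from_probs(y_true, y_prob)
chance = 1.0 / y_true.shape[1]  # uniform random guessing on balanced classes
print(acc, acc > chance)  # 0.75 True
```

For CIFAR-10 the same reasoning gives a chance level of 1/10, since the ten classes are balanced.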

In [28]:

test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=1)

print("\n📊 --- Model Evaluation on Test Data ---")
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")

if test_accuracy > 0.10:
    print("\n🎯 The model performs better than random guessing (10% accuracy).")
else:
    print("\n⚠️ The model performs at or below random chance — it’s not learning effectively yet.")

print("\n🧩 Interpretation:")
print("- Random guessing across 10 CIFAR-10 classes gives 10% accuracy.")
print("- If your test accuracy is 0.6–0.75, that’s solid performance for a frozen VGG16 base.")
print("- Accuracy above 0.80 means your model generalizes very well!")
print("- You can fine-tune further to improve this performance.")

print("\n🚀 Experiment! Try these methods to boost performance:")

print("\n1️⃣ Fine-tuning the top layers:")
print("   → Unfreeze the last few convolutional blocks and train with a lower learning rate.")
print("""
for layer in base_model.layers[-4:]:
    layer.trainable = True
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
""")
print("   🔍 Explanation: Allows the model to adjust high-level filters to CIFAR-10 features, which can improve accuracy.")

print("\n2️⃣ Data Augmentation:")
print("   → Randomly flip, rotate, or shift images to create new samples.")
print("""
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True
)
datagen.fit(x_train)
""")
print("   🔍 Explanation: Increases data diversity, helps prevent overfitting, and improves robustness.")

print("\n3️⃣ Regularization:")
print("   → Add Dropout or L2 penalty to dense layers to reduce overfitting.")
print("""
from tensorflow.keras.layers import Dropout
x = Dense(128, activation='relu', kernel_regularizer='l2')(x)
x = Dropout(0.5)(x)
""")
print("   🔍 Explanation: Forces the network to learn more robust, generalizable features.")

print("\n4️⃣ Train Longer with Early Stopping:")
print("   → Train for up to 20–30 epochs, but stop automatically when val_loss stops improving.")
print("""
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
""")
print("   🔍 Explanation: Captures the best model state without wasting extra training time.")

print("\n5️⃣ Use more training data:")
print("   → If you’re using only 20% of the dataset for speed, try 50% or full data once runtime allows.")
print("   🔍 Explanation: More data improves generalization, especially for deeper CNNs.")

print("\n✅ Summary:")
print("- The model’s test accuracy confirms whether transfer learning was effective.")
print("- If it’s well above 10%, the ImageNet features from VGG16 transferred usefully to CIFAR-10.")
print("- Fine-tuning, data augmentation, and longer training can raise accuracy further.")


  # Evaluate the model on the test set

313/313 ━━━━━━━━━━━━━━━━━━━━ 118s 377ms/step - accuracy: 0.5603 - loss: 1.2743

📊 --- Model Evaluation on Test Data ---
Test Loss: 1.2839
Test Accuracy: 0.5575

🎯 The model performs better than random guessing (10% accuracy).

🧩 Interpretation:
- Random guessing across 10 CIFAR-10 classes gives 10% accuracy.
- If your test accuracy is 0.6–0.75, that’s solid performance for a frozen VGG16 base.
- Accuracy above 0.80 means your model generalizes very well!
- You can fine-tune further to improve this performance.

🚀 Experiment! Try these methods to boost performance:

1️⃣ Fine-tuning the top layers:
   → Unfreeze the last few convolutional blocks and train with a lower learning rate.

for layer in base_model.layers[-4:]:
    layer.trainable = True
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

   🔍 Explanation: Allows the model to adjust high-level filters to CIFAR-10 f