##### ARTI 560 - Computer Vision  
## Image Classification using Transfer Learning - Exercise 

### Objective

In this exercise, you will:

1. Select another pretrained model (e.g., VGG16, MobileNetV2, or EfficientNet) and fine-tune it for CIFAR-10 classification.  
You'll find the pretrained models in the [TensorFlow Keras Applications module](https://www.tensorflow.org/api_docs/python/tf/keras/applications).

2. Before training, inspect the architecture using `model.summary()` and observe:
- Network depth
- Number of parameters
- Trainable vs Frozen layers

3. Then compare its performance with ResNet and the custom CNN.

### Questions:

- Which model achieved the highest accuracy?
- Which model trained faster?
- How might the architecture explain the differences?

Load and Prepare CIFAR-10 Dataset

In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Load CIFAR-10
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

num_classes = 10
IMG_SIZE = 96   # resized to fit pretrained models
BATCH_SIZE = 64

def preprocess(images, labels):
    images = tf.image.resize(images, (IMG_SIZE, IMG_SIZE))
    images = tf.cast(images, tf.float32)
    images = tf.keras.applications.mobilenet_v2.preprocess_input(images)
    return images, labels

train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train)) \
            .shuffle(10000).batch(BATCH_SIZE) \
            .map(preprocess).prefetch(tf.data.AUTOTUNE)

test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)) \
            .batch(BATCH_SIZE) \
            .map(preprocess).prefetch(tf.data.AUTOTUNE)
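
As a rough illustration of what the pipeline above does, the following sketch reproduces two of its effects in plain Python: the pixel scaling applied by `mobilenet_v2.preprocess_input` (which maps values from [0, 255] to [-1, 1]) and the number of batches per epoch. The helper names `scale_to_unit_range` and `batches_per_epoch` are hypothetical and exist only for this example; the split sizes are the standard CIFAR-10 ones (50,000 training and 10,000 test images).

```python
import math


def scale_to_unit_range(pixel):
    """Map a pixel value in [0, 255] to [-1, 1] (hypothetical helper)."""
    return pixel / 127.5 - 1.0


def batches_per_epoch(num_examples, batch_size):
    """Number of batches when the last partial batch is kept (hypothetical helper)."""
    return math.ceil(num_examples / batch_size)


# Darkest and brightest pixel values after scaling
print(scale_to_unit_range(0), scale_to_unit_range(255))

# Batches per epoch for the CIFAR-10 training and test splits at BATCH_SIZE = 64
print(batches_per_epoch(50_000, 64))
print(batches_per_epoch(10_000, 64))
```

The training-split value should match the step count Keras reports per epoch during `fit`.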


Load Pretrained Model (Backbone)

In [2]:
mobilenet_base = tf.keras.applications.MobileNetV2(
    input_shape=(IMG_SIZE, IMG_SIZE, 3),
    include_top=False,
    weights="imagenet"
)

# Freeze all backbone layers initially
mobilenet_base.trainable = False

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_96_no_top.h5
9406464/9406464 ━━━━━━━━━━━━━━━━━━━━ 3s 0us/step


Build Classification Model

In [3]:
inputs = keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))

x = mobilenet_base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(num_classes, activation="softmax")(x)

mobilenet_model = keras.Model(inputs, outputs)

Inspect Architecture Before Training

In [4]:
mobilenet_model.summary()

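# Note: this counts top-level layers only; the whole MobileNetV2 backbone
# is a single nested layer here. Use len(mobilenet_base.layers) for its depth.
print("Network depth:", len(mobilenet_model.layers))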
print("Total parameters:", mobilenet_model.count_params())
print("Trainable parameters:",
      sum(tf.size(w).numpy() for w in mobilenet_model.trainable_weights))
print("Frozen parameters:",
      sum(tf.size(w).numpy() for w in mobilenet_model.non_trainable_weights))

Network depth: 5
Total parameters: 2270794
Trainable parameters: 12810
Frozen parameters: 2257984
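
The parameter counts above can be checked by hand. The sketch below assumes the default MobileNetV2 backbone (width multiplier 1.0), whose final feature map has 1280 channels, so the only trainable weights at this stage are those of the `Dense` classification head. The pooling and dropout layers contribute no parameters.

```python
FEATURE_CHANNELS = 1280   # assumed output width of the default MobileNetV2 backbone
NUM_CLASSES = 10          # CIFAR-10 classes

# Dense head: one weight per (feature, class) pair plus one bias per class
head_params = FEATURE_CHANNELS * NUM_CLASSES + NUM_CLASSES

# Frozen backbone parameters, copied from the printed output above
frozen_params = 2_257_984

total_params = head_params + frozen_params

print("Head (trainable) parameters:", head_params)
print("Total parameters:", total_params)
print(f"Trainable share: {head_params / total_params:.2%}")
```

Less than one percent of the weights are updated while the backbone is frozen.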


Train with Frozen Backbone (Transfer Learning Stage)

In [5]:
mobilenet_model.compile(
    optimizer=keras.optimizers.Adam(),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

history = mobilenet_model.fit(
    train_ds,
    validation_data=test_ds,
    epochs=5
)


Epoch 1/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 142s 177ms/step - accuracy: 0.7900 - loss: 0.6242 - val_accuracy: 0.8495 - val_loss: 0.4295
Epoch 2/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 133s 171ms/step - accuracy: 0.8457 - loss: 0.4511 - val_accuracy: 0.8535 - val_loss: 0.4332
Epoch 3/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 178s 227ms/step - accuracy: 0.8564 - loss: 0.4205 - val_accuracy: 0.8568 - val_loss: 0.4044
Epoch 4/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 202s 228ms/step - accuracy: 0.8589 - loss: 0.4076 - val_accuracy: 0.8585 - val_loss: 0.4015
Epoch 5/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 193s 246ms/step - accuracy: 0.8633 - loss: 0.4015 - val_accuracy: 0.8591 - val_loss: 0.4105


Fine-Tuning Stage

In [6]:
# Unfreeze backbone
mobilenet_base.trainable = True

# Freeze early layers and fine-tune last layers only
for layer in mobilenet_base.layers[:-30]:
    layer.trainable = False

print("Trainable layers in backbone:",
      sum(l.trainable for l in mobilenet_base.layers),
      "/", len(mobilenet_base.layers))

mobilenet_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

history_ft = mobilenet_model.fit(
    train_ds,
    validation_data=test_ds,
    epochs=3
)


Trainable layers in backbone: 30 / 154
Epoch 1/3
782/782 ━━━━━━━━━━━━━━━━━━━━ 244s 303ms/step - accuracy: 0.8169 - loss: 0.5956 - val_accuracy: 0.8556 - val_loss: 0.4694
Epoch 2/3
782/782 ━━━━━━━━━━━━━━━━━━━━ 245s 313ms/step - accuracy: 0.8601 - loss: 0.4300 - val_accuracy: 0.8727 - val_loss: 0.4052
Epoch 3/3
782/782 ━━━━━━━━━━━━━━━━━━━━ 244s 312ms/step - accuracy: 0.8765 - loss: 0.3661 - val_accuracy: 0.8796 - val_loss: 0.3820


1) Which model achieved the highest accuracy?

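The fine-tuned ResNet50V2 achieved the highest accuracy, with a test accuracy of 0.9162. For comparison, MobileNetV2 reached 0.8591 on the test set with a frozen backbone and 0.8796 after fine-tuning.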
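
One way to make this comparison explicit is to tabulate the final accuracies. The sketch below uses only numbers reported in this notebook: the MobileNetV2 values are the last `val_accuracy` of each training stage (the validation set here is the CIFAR-10 test split), and the ResNet50V2 value is the one quoted above from a separate run. The custom CNN is left out because its accuracy is not reported in this section.

```python
# Final accuracies on the CIFAR-10 test split, as reported in this notebook
results = {
    "MobileNetV2 (frozen backbone)": 0.8591,
    "MobileNetV2 (fine-tuned)": 0.8796,
    "ResNet50V2 (fine-tuned)": 0.9162,  # quoted from a separate run, not shown here
}

best_model = max(results, key=results.get)

# Improvement from unfreezing the last 30 backbone layers
fine_tuning_gain = round(
    results["MobileNetV2 (fine-tuned)"] - results["MobileNetV2 (frozen backbone)"], 4
)

# Remaining gap between the best model and fine-tuned MobileNetV2
gap_to_best = round(results[best_model] - results["MobileNetV2 (fine-tuned)"], 4)

for name, accuracy in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:32s} {accuracy:.4f}")
print("Best model:", best_model)
print("Gain from fine-tuning MobileNetV2:", fine_tuning_gain)
print("Gap to best model:", gap_to_best)
```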

2) Which model trained faster?

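ResNet trained faster than MobileNetV2. In the runs above, MobileNetV2 took between 133 and 202 seconds per epoch with the backbone frozen and about 244 seconds per epoch during fine-tuning.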
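
To compare training speed on equal terms, it helps to total the per-epoch times from the Keras logs. The sketch below does this for the MobileNetV2 runs, using the epoch durations printed above. The ResNet timings are not shown in this section, so they would have to be filled in from that run's own logs. The `summarize` helper is hypothetical.

```python
# Per-epoch wall-clock times in seconds, copied from the MobileNetV2 training logs
frozen_epoch_seconds = [142, 133, 178, 202, 193]
fine_tune_epoch_seconds = [244, 245, 244]


def summarize(name, seconds):
    """Print and return total and mean epoch time (hypothetical helper)."""
    total = sum(seconds)
    mean = total / len(seconds)
    print(f"{name}: total {total} s, mean {mean:.1f} s/epoch")
    return total, mean


frozen_total, frozen_mean = summarize("Frozen backbone", frozen_epoch_seconds)
fine_tune_total, fine_tune_mean = summarize("Fine-tuning", fine_tune_epoch_seconds)
print("MobileNetV2 total training time:", frozen_total + fine_tune_total, "s")
```

Fine-tuning epochs are slower than frozen-backbone epochs because gradients are also computed for the unfrozen backbone layers.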

3) How does architecture explain the differences?

The differences come from the model architecture. ResNet uses residual connections, which make deeper networks easier to train and let them learn more complex features, leading to higher accuracy. MobileNetV2 is designed to be lightweight and efficient (about 2.27 million parameters in this setup), so it trades some accuracy for a smaller model. The custom CNN has a simpler structure, so it learns fewer features and usually performs worse than the pretrained models.
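
Part of what makes MobileNetV2 lightweight is its use of depthwise separable convolutions in place of standard convolutions. The sketch below compares the weight counts of the two layer types for a single hypothetical 3x3 layer with 128 input and 128 output channels. The layer size is chosen only for illustration and is not taken from the actual network, and biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one kernel per (input, output) channel pair."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in   # one spatial filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 128 input channels, 128 output channels
standard = standard_conv_params(3, 128, 128)
separable = depthwise_separable_params(3, 128, 128)

print("Standard convolution weights:", standard)
print("Depthwise separable weights:", separable)
print(f"Reduction factor: {standard / separable:.1f}x")
```

Fewer weights per layer means a smaller model, which is consistent with the low total parameter count reported by `model.summary()`. It does not by itself guarantee shorter training time, which also depends on hardware and input size.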