##### ARTI 560 - Computer Vision  
## Image Classification using Transfer Learning - Exercise 

### Objective

In this exercise, you will:

1. Select another pretrained model (e.g., VGG16, MobileNetV2, or EfficientNet) and fine-tune it for CIFAR-10 classification.  
You'll find the pretrained models in the [TensorFlow Keras Applications module](https://www.tensorflow.org/api_docs/python/tf/keras/applications).

2. Before training, inspect the architecture using `model.summary()` and observe:
- Network depth
- Number of parameters
- Trainable vs. frozen layers

3. Then compare its performance with ResNet and the custom CNN.

### Questions:

- Which model achieved the highest accuracy?
- Which model trained faster?
- How might the architecture explain the differences?

In [2]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

class_names = [
    "airplane","automobile","bird","cat","deer",
    "dog","frog","horse","ship","truck"
]

y_train = y_train.squeeze().astype("int64")
y_test  = y_test.squeeze().astype("int64")

x_train = x_train.astype("float32")
x_test  = x_test.astype("float32")


data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.05),
    layers.RandomZoom(0.1),
], name="augmentation")

# 1) MobileNetV2 backbone
mobilenet_base = MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet'
)
mobilenet_base.trainable = False 

# 2) Full Model Architecture
mobilenet_model = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    data_augmentation,
    layers.Resizing(224, 224, interpolation="bilinear"),
    layers.Lambda(preprocess_input), 
    mobilenet_base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(10)
], name="cifar10_mobilenetv2")

mobilenet_model.summary()

# 3) Phase 1: Initial Training
mobilenet_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

print("Starting Initial Training...")
mobilenet_model.fit(x_train, y_train, validation_split=0.1, epochs=3, batch_size=64)

# 4) Phase 2: Fine-Tuning
mobilenet_base.trainable = True
for layer in mobilenet_base.layers[:100]:
    layer.trainable = False

mobilenet_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

print("\nStarting Fine-Tuning...")
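# No batch_size is passed here, so Keras falls back to its default of 32
# (phase 1 used 64), which doubles the number of steps per epoch.
mobilenet_model.fit(x_train, y_train, validation_split=0.1, epochs=6, initial_epoch=3)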

Starting Initial Training...
Epoch 1/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 84s 112ms/step - accuracy: 0.5665 - loss: 1.2398 - val_accuracy: 0.8028 - val_loss: 0.5760
Epoch 2/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 78s 111ms/step - accuracy: 0.7175 - loss: 0.8087 - val_accuracy: 0.8214 - val_loss: 0.5189
Epoch 3/3
704/704 ━━━━━━━━━━━━━━━━━━━━ 78s 111ms/step - accuracy: 0.7292 - loss: 0.7791 - val_accuracy: 0.8254 - val_loss: 0.5132

Starting Fine-Tuning...
Epoch 4/6
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 151s 99ms/step - accuracy: 0.6627 - loss: 1.0081 - val_accuracy: 0.8330 - val_loss: 0.4707
Epoch 5/6
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 136s 97ms/step - accuracy: 0.7755 - loss: 0.6548 - val_accuracy: 0.8550 - val_loss: 0.4030
Epoch 6/6
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 136s 97ms/step - accuracy: 0.8065 - loss: 0.560

<keras.src.callbacks.history.History at 0x7f3e4c18a270>
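
The step counts in the log above differ between the two phases (704 vs. 1407) because the first `fit` call passes `batch_size=64`, while the second relies on the Keras default of 32. Here is a minimal sketch of the arithmetic, assuming the standard 50,000-image CIFAR-10 training set with 10% held out for validation. `steps_per_epoch` is a hypothetical helper name used for illustration, not a Keras function.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Batches needed to cover num_samples once (the last batch may be partial)."""
    return math.ceil(num_samples / batch_size)

# Assumption: 50,000 training images, of which validation_split=0.1 holds out 5,000.
num_train = 50_000 - 5_000

print(steps_per_epoch(num_train, 64))  # phase 1 (batch_size=64) -> 704
print(steps_per_epoch(num_train, 32))  # phase 2 (default batch size of 32) -> 1407
print(steps_per_epoch(10_000, 32))     # test-set evaluation -> 313
```

Because the batch size changes between phases, the per-epoch times of the two phases are not directly comparable; the ms/step figure is comparable only after accounting for the different number of images per step.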

In [3]:
# Total depth
backbone_depth = len(mobilenet_base.layers)
total_depth = len(mobilenet_model.layers)

print(f"Backbone Depth: {backbone_depth} layers") 
print(f"Top-level Blocks: {total_depth}")

Backbone Depth: 154 layers
Top-level Blocks: 7


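- Network depth: 154 layers in the MobileNetV2 backbone (7 top-level layers in the full model)

- Number of parameters: 2,270,794

- Trainable vs. frozen layers: with the backbone frozen for initial training, 154 frozen and 0 trainable backbone layers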
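
The observations above can be cross-checked with a few lines of code. This is a minimal sketch rather than the notebook's original method: `count_trainable_frozen` and `dense_params` are hypothetical helper names, the stand-in backbone only mimics the `.layers` and `.trainable` attributes of a Keras model so that the block runs without downloading weights, and the 1280-channel feature width is an assumption about MobileNetV2 at its default width multiplier.

```python
from types import SimpleNamespace

def count_trainable_frozen(model):
    """Return (trainable, frozen) layer counts for any object exposing `.layers`."""
    trainable = sum(1 for layer in model.layers if layer.trainable)
    return trainable, len(model.layers) - trainable

def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

# Stand-in for the 154-layer backbone with every layer frozen (hypothetical stub);
# with the real model, call count_trainable_frozen(mobilenet_base) instead.
stub_backbone = SimpleNamespace(
    layers=[SimpleNamespace(trainable=False) for _ in range(154)]
)
print(count_trainable_frozen(stub_backbone))  # (0, 154)

# Mimic the fine-tuning phase: unfreeze everything, then re-freeze the first 100 layers.
for layer in stub_backbone.layers:
    layer.trainable = True
for layer in stub_backbone.layers[:100]:
    layer.trainable = False
print(count_trainable_frozen(stub_backbone))  # (54, 100)

# Assumption: global average pooling yields 1280 features feeding Dense(10).
head = dense_params(1280, 10)
print(head)               # 12810 parameters in the classification head
print(2_270_794 - head)   # 2257984 parameters left for the backbone
```

Under these assumptions, almost all of the 2,270,794 parameters sit in the backbone, so only the small classification head is updated during the initial training phase.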


In [7]:
# Evaluate the model on the test set
test_loss, test_acc = mobilenet_model.evaluate(x_test, y_test, verbose=2)

313/313 - 16s - 51ms/step - accuracy: 0.8749 - loss: 0.3706


Final Model Comparison

In [8]:
results = {
    "Custom CNN test acc": 0.8741999864578247,          
    "ResNet fine-tuned test acc": 0.9161999821662903,   
    "MobileNetV2 acc": round(float(test_acc), 4)
}

print("\n--- Model Comparison Summary ---")
for model_name, accuracy in results.items():
    print(f"{model_name}: {accuracy:.4f}")

best_model = max(results, key=results.get)
print(f"\nModel with highest accuracy: {best_model}")


--- Model Comparison Summary ---
Custom CNN test acc: 0.8742
ResNet fine-tuned test acc: 0.9162
MobileNetV2 acc: 0.8749

Model with highest accuracy: ResNet fine-tuned test acc
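
To put the accuracy differences in the summary above into perspective, the gaps can be expressed as a number of test images. This is a minimal sketch, assuming the standard 10,000-image CIFAR-10 test split and using the rounded accuracies printed above; `rank_models` is a hypothetical helper name.

```python
# Test accuracies as printed in the comparison summary.
results = {
    "Custom CNN": 0.8742,
    "ResNet fine-tuned": 0.9162,
    "MobileNetV2": 0.8749,
}
NUM_TEST_IMAGES = 10_000  # assumption: the standard CIFAR-10 test split

def rank_models(results, num_images):
    """Sort models by accuracy and express each gap to the best model in images."""
    best_acc = max(results.values())
    ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
    return [(name, acc, round((best_acc - acc) * num_images)) for name, acc in ranked]

for name, acc, gap in rank_models(results, NUM_TEST_IMAGES):
    print(f"{name}: {acc:.4f} ({gap} fewer correct images than the best model)")
```

On these numbers, MobileNetV2 and the custom CNN differ by only about 7 test images, while both trail ResNet by more than 400.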


- Which model achieved the highest accuracy? ResNet (0.9162 test accuracy, versus 0.8749 for MobileNetV2 and 0.8742 for the custom CNN).
- Which model trained faster? The custom CNN.
- How might the architecture explain the differences? A model with fewer parameters, such as the custom CNN, trains faster, while the residual connections in ResNet likely helped it reach higher accuracy.
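
The "trained faster" answer is easier to defend when each model's wall-clock training time is recorded in the same way. This is a minimal sketch of one way to do that, not the method used in this notebook: `timed` and `training_times` are hypothetical names, and the placeholder workload stands in for a real `fit` call.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, log):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log[label] = time.perf_counter() - start

training_times = {}

# Placeholder workload; in the notebook this block would wrap a call such as
# mobilenet_model.fit(...), once per model being compared.
with timed("placeholder workload", training_times):
    sum(i * i for i in range(100_000))

for label, seconds in training_times.items():
    print(f"{label}: {seconds:.3f}s")
```

Timing every model with the same helper, on the same hardware and with the same number of epochs, keeps the comparison fair.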