# Lab 6 (Week 7) — Convolutional Neural Networks (CNNs) — Solution v6

Generated 2025-08-16 14:52:43.

This notebook strictly follows the student lab instructions for F.1, F.2, and F.3.



## F.1 LeNet-Style CNN on MNIST — Strict


In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models

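# Load MNIST, scale pixels to [0, 1], and add a trailing channel axis -> (N, 28, 28, 1)
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test  = x_test.astype('float32') / 255.0
x_train = np.expand_dims(x_train, -1)
x_test  = np.expand_dims(x_test, -1)

# LeNet-style stack: two 5x5 conv + 2x2 max-pool stages, then three dense layers
model_lenet = models.Sequential([
    layers.Input(shape=(28,28,1)),  # explicit Input layer instead of input_shape= on Conv2D
    layers.Conv2D(32, (5,5), activation='relu'),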
    layers.MaxPool2D(pool_size=(2,2)),
    layers.Conv2D(64, (5,5), activation='relu'),
    layers.MaxPool2D(pool_size=(2,2)),
    layers.Flatten(),
    layers.Dense(120, activation='relu'),
    layers.Dense(84, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
model_lenet.summary()
model_lenet.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = model_lenet.fit(x_train, y_train, validation_split=0.1, epochs=3, batch_size=128, verbose=2)
test_loss, test_acc = model_lenet.evaluate(x_test, y_test, verbose=0)
print({'mnist_test_accuracy': float(test_acc), 'mnist_test_loss': float(test_loss)})


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step




Epoch 1/3
422/422 - 20s - 46ms/step - accuracy: 0.9357 - loss: 0.2176 - val_accuracy: 0.9815 - val_loss: 0.0598
Epoch 2/3
422/422 - 1s - 3ms/step - accuracy: 0.9832 - loss: 0.0553 - val_accuracy: 0.9877 - val_loss: 0.0425
Epoch 3/3
422/422 - 1s - 3ms/step - accuracy: 0.9880 - loss: 0.0383 - val_accuracy: 0.9855 - val_loss: 0.0491
{'mnist_test_accuracy': 0.9850999712944031, 'mnist_test_loss': 0.04505486041307449}
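
The shapes and parameter counts that `model_lenet.summary()` reports can be checked by hand. The sketch below is a plain-Python illustration, not part of the lab code; the helper names (`conv_out`, `pool_out`, `conv_params`, `dense_params`) are hypothetical. It assumes stride-1 convolutions with "valid" padding and non-overlapping 2×2 pooling, which are the Keras defaults used in the cell above.

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1

def pool_out(size, pool=2):
    """Output width/height of non-overlapping max pooling."""
    return size // pool

def conv_params(kernel, in_ch, out_ch):
    """Weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

size = 28
size = pool_out(conv_out(size, 5))   # 28 -> 24 -> 12
size = pool_out(conv_out(size, 5))   # 12 -> 8 -> 4
flat = size * size * 64              # features entering the first Dense layer

total = (
    conv_params(5, 1, 32)
    + conv_params(5, 32, 64)
    + dense_params(flat, 120)
    + dense_params(120, 84)
    + dense_params(84, 10)
)
print(flat, total)  # 1024 186110
```

Most of the parameters sit in the first dense layer (1024 × 120 weights), even in a network this small.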


## F.2 AlexNet-like CNN (CIFAR-10) — Strict


In [2]:
from tensorflow import keras
from tensorflow.keras import layers

(x_train_c, y_train_c), (x_test_c, y_test_c) = keras.datasets.cifar10.load_data()
# Labels arrive with shape (N, 1); flatten them to a 1-D vector of class indices
y_train_c = y_train_c.flatten()
y_test_c = y_test_c.flatten()
x_train_c = x_train_c.astype('float32') / 255.0
x_test_c  = x_test_c.astype('float32') / 255.0

inputs = layers.Input(shape=(32,32,3))
x = layers.Conv2D(64,(3,3),strides=1,activation='relu',padding='same')(inputs)
x = layers.MaxPool2D(2,2)(x)
x = layers.Conv2D(192,(3,3),activation='relu',padding='same')(x)
x = layers.MaxPool2D(2,2)(x)
x = layers.Conv2D(384,(3,3),activation='relu',padding='same')(x)
x = layers.Conv2D(256,(3,3),activation='relu',padding='same')(x)
x = layers.Conv2D(256,(3,3),activation='relu',padding='same')(x)
x = layers.MaxPool2D(2,2)(x)
x = layers.Flatten()(x)
x = layers.Dense(1024, activation='relu')(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(512, activation='relu')(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(10, activation='softmax')(x)
model_alex = keras.Model(inputs, outputs)
model_alex.summary()
model_alex.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history_c = model_alex.fit(x_train_c, y_train_c, validation_split=0.1, epochs=10, batch_size=128, verbose=2)
test_loss_c, test_acc_c = model_alex.evaluate(x_test_c, y_test_c, verbose=0)
print({'cifar10_test_accuracy': float(test_acc_c), 'cifar10_test_loss': float(test_loss_c)})


Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 3s 0us/step


Epoch 1/10
352/352 - 28s - 78ms/step - accuracy: 0.3470 - loss: 1.7300 - val_accuracy: 0.5050 - val_loss: 1.3490
Epoch 2/10
352/352 - 2s - 6ms/step - accuracy: 0.5506 - loss: 1.2360 - val_accuracy: 0.6356 - val_loss: 1.0035
Epoch 3/10
352/352 - 2s - 6ms/step - accuracy: 0.6421 - loss: 1.0191 - val_accuracy: 0.6820 - val_loss: 0.8973
Epoch 4/10
352/352 - 2s - 6ms/step - accuracy: 0.7027 - loss: 0.8488 - val_accuracy: 0.7048 - val_loss: 0.8561
Epoch 5/10
352/352 - 2s - 6ms/step - accuracy: 0.7453 - loss: 0.7359 - val_accuracy: 0.7398 - val_loss: 0.7700
Epoch 6/10
352/352 - 2s - 6ms/step - accuracy: 0.7819 - loss: 0.6318 - val_accuracy: 0.7470 - val_loss: 0.7518
Epoch 7/10
352/352 - 2s - 6ms/step - accuracy: 0.8078 - loss: 0.5527 - val_accuracy: 0.7644 - val_loss: 0.7114
Epoch 8/10
352/352 - 2s - 6ms/step - accuracy: 0.8351 - loss: 0.4773 - val_accuracy: 0.7590 - val_loss: 0.7621
Epoch 9/10
352/352 - 2s - 6ms/step - accuracy: 0.8554 - loss: 0.4170 - val_accuracy: 0.7650 - val_loss: 0.7509
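
The step counts in the training logs (422, 352, 1563) follow from the dataset size, the validation split, and the batch size. The sketch below reproduces them in plain Python. It assumes that Keras keeps the first `floor(n * (1 - validation_split))` samples for training and counts a final partial batch as a step; `steps_per_epoch` is a hypothetical helper, not a Keras function.

```python
import math

def steps_per_epoch(n_samples, batch_size, validation_split=0.0):
    """Number of batches per epoch after holding out the validation fraction."""
    n_train = math.floor(n_samples * (1.0 - validation_split))
    return math.ceil(n_train / batch_size)

print(steps_per_epoch(60000, 128, 0.1))  # F.1 MNIST: 422
print(steps_per_epoch(50000, 128, 0.1))  # F.2 CIFAR-10: 352
print(steps_per_epoch(50000, 32))        # F.3 tf.data pipeline, no split: 1563
```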

## F.3 Transfer Learning & Fine-Tuning (MobileNetV2, 224×224) — Strict


In [5]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

(x_train_t, y_train_t), (x_test_t, y_test_t) = keras.datasets.cifar10.load_data()
# Labels arrive with shape (N, 1); flatten them to a 1-D vector of class indices
y_train_t = y_train_t.flatten()
y_test_t = y_test_t.flatten()

IMG_SIZE = 224

def resize_and_preprocess(image, label):
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    image = keras.applications.mobilenet_v2.preprocess_input(image)
    return image, label

# Create tf.data.Dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train_t, y_train_t))
test_dataset = tf.data.Dataset.from_tensor_slices((x_test_t, y_test_t))

# Apply preprocessing and batching
BATCH_SIZE = 32
train_dataset = train_dataset.map(resize_and_preprocess).batch(BATCH_SIZE).prefetch(buffer_size=tf.data.AUTOTUNE)
test_dataset = test_dataset.map(resize_and_preprocess).batch(BATCH_SIZE).prefetch(buffer_size=tf.data.AUTOTUNE)


base = keras.applications.MobileNetV2(weights='imagenet', include_top=False, input_shape=(IMG_SIZE,IMG_SIZE,3))
base.trainable = False
inputs = keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
x = base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation='softmax')(x)
model_tl = keras.Model(inputs, outputs)
model_tl.summary()
model_tl.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

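# Stage 1: train only the new classification head while the base stays frozen.
# Note: the test split doubles as validation data in this run.
history_tl = model_tl.fit(train_dataset, validation_data=test_dataset, epochs=3, verbose=2)

# Stage 2: unfreeze the base, then re-freeze all but its last 30 layers.
base.trainable = True
for layer in base.layers[:-30]:
    layer.trainable = False
# Recompile so the trainability change takes effect; use a much smaller learning rate.
model_tl.compile(optimizer=keras.optimizers.Adam(1e-5), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Fine-tune on the same datasets
history_ft = model_tl.fit(train_dataset, validation_data=test_dataset, epochs=2, verbose=2)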
test_loss_tl, test_acc_tl = model_tl.evaluate(test_dataset, verbose=0)
print({'transfer_learning_final_test_accuracy': float(test_acc_tl), 'transfer_learning_test_loss': float(test_loss_tl)})

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224_no_top.h5
9406464/9406464 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


Epoch 1/3
1563/1563 - 48s - 31ms/step - accuracy: 0.8085 - loss: 0.5626 - val_accuracy: 0.8374 - val_loss: 0.4784
Epoch 2/3
1563/1563 - 19s - 12ms/step - accuracy: 0.8529 - loss: 0.4267 - val_accuracy: 0.8439 - val_loss: 0.4619
Epoch 3/3
1563/1563 - 19s - 12ms/step - accuracy: 0.8631 - loss: 0.3956 - val_accuracy: 0.8440 - val_loss: 0.4585
Epoch 1/2
1563/1563 - 51s - 32ms/step - accuracy: 0.8351 - loss: 0.4808 - val_accuracy: 0.8664 - val_loss: 0.4132
Epoch 2/2
1563/1563 - 22s - 14ms/step - accuracy: 0.8935 - loss: 0.3137 - val_accuracy: 0.8771 - val_loss: 0.3724
{'transfer_learning_final_test_accuracy': 0.8770999908447266, 'transfer_learning_test_loss': 0.37235385179519653}
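
Two details of the F.3 input pipeline are easy to verify with arithmetic. First, MobileNetV2's `preprocess_input` rescales pixel values from [0, 255] to [-1, 1], which is why the images are not divided by 255 beforehand. Second, resizing inside `tf.data` avoids holding all upscaled images in memory at once. The sketch below is a plain-Python illustration; `mobilenet_v2_scale` is a hypothetical stand-in for the scaling step, not the Keras function itself.

```python
def mobilenet_v2_scale(pixel):
    """Map a pixel value in [0, 255] to [-1, 1] (the scaling MobileNetV2 expects)."""
    return pixel / 127.5 - 1.0

print(mobilenet_v2_scale(0), mobilenet_v2_scale(127.5), mobilenet_v2_scale(255))
# -1.0 0.0 1.0

# Memory needed to materialize all 50,000 training images at 224x224x3 in float32
IMG_SIZE = 224
bytes_per_image = IMG_SIZE * IMG_SIZE * 3 * 4
total_gb = 50000 * bytes_per_image / 1e9
print(bytes_per_image, round(total_gb, 1))  # 602112 30.1
```

At roughly 30 GB, the resized training set would not fit in typical Colab RAM, so mapping the resize lazily per batch is the practical choice.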


## Discussion (answer in complete sentences)

1. Compare LeNet and AlexNet design choices (kernel sizes, depth, FC layers). How do these affect compute and accuracy?  
2. Why does transfer learning typically outperform training from scratch on small datasets? Which layers would you fine-tune and why?  
3. If your AlexNet-like model underperforms, which two architecture or training changes would you try first, and why?

## Discussion Questions & Sample Answers

---

### 1. Compare LeNet and AlexNet design choices (kernel sizes, depth, FC layers). How do these affect compute and accuracy?

**Sample Answer:**

- **LeNet (1990s):**
  - Shallow network with 2 convolutional layers (5×5 kernels).
  - Small number of filters (6 and 16 in the original LeNet-5; the F.1 variant in this lab uses 32 and 64).
  - Fully connected layers with ~120 and 84 units.
  - Designed for simple tasks like digit recognition (MNIST).

- **AlexNet (2012):**
  - Deeper network with 5 convolutional layers.
  - First conv layer uses 11×11 kernels (stride 4), later 5×5 and 3×3.
  - Many more filters (hundreds).
  - Very large fully connected layers (4096 units each).
  - Designed for large-scale ImageNet classification.

- **Effect:**
  - Deeper networks with more filters → higher capacity and accuracy on complex data, at the cost of more compute per image.
  - Large kernels (like 11×11) capture wider patterns but are more expensive: an 11×11 kernel has 121 weights per input channel, versus 9 for a 3×3 kernel.
  - Huge fully connected layers dominate the parameter count, boosting capacity but also the risk of overfitting and the memory footprint.
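
To put rough numbers on these effects, the sketch below counts parameters for a few layers. It is a plain-Python illustration with hypothetical helper names. The AlexNet figures assume the commonly cited layer sizes of the original architecture (96 filters of 11×11 over 3 input channels, a 6×6×256 feature map entering the first FC layer, and 1000 ImageNet classes); they are not taken from this lab's smaller model.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

# Per-input-channel weight count of one kernel
weights_11x11 = 11 * 11
weights_3x3 = 3 * 3

# Original AlexNet: first conv layer versus the three FC layers
alexnet_conv1 = conv_params(11, 3, 96)
alexnet_fc = (
    dense_params(6 * 6 * 256, 4096)
    + dense_params(4096, 4096)
    + dense_params(4096, 1000)
)

# LeNet-style head: 120 -> 84 -> 10
lenet_fc = dense_params(120, 84) + dense_params(84, 10)

print(weights_11x11, weights_3x3)  # 121 9
print(alexnet_conv1)               # 34944
print(alexnet_fc)                  # 58631144
print(lenet_fc)                    # 11014
```

Under these assumptions the three AlexNet FC layers hold about 58.6 million parameters, far more than its first convolutional layer.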

---

### 2. Why does transfer learning typically outperform training from scratch on small datasets? Which layers would you fine-tune and why?

**Sample Answer:**

- Pretrained models (e.g., trained on ImageNet) already capture general features such as edges, textures, and shapes.
- Small datasets don’t have enough data to learn these features from scratch.
- Transfer learning gives a strong initialization, improving accuracy and training speed.

**Fine-tuning strategy:**
- Keep **lower convolutional layers frozen** (they learn generic features).
- Fine-tune **higher convolutional layers** (they adapt to task-specific features).
- Always re-train the **classification head** for the new dataset’s classes.
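
The "freeze the lower layers, fine-tune the upper ones" pattern comes down to a slice over the layer list, as in `base.layers[:-30]` in F.3. The sketch below shows the slice semantics without TensorFlow. `FakeLayer` and the layer count of 100 are hypothetical stand-ins chosen for illustration, not properties of MobileNetV2.

```python
class FakeLayer:
    """Hypothetical stand-in for a Keras layer with a `trainable` flag."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

# Pretend backbone with 100 layers, ordered from input (index 0) to output
backbone = [FakeLayer(f"layer_{i}") for i in range(100)]

# Freeze everything except the last 30 layers (the ones closest to the output)
for layer in backbone[:-30]:
    layer.trainable = False

n_trainable = sum(layer.trainable for layer in backbone)
first_trainable = next(layer.name for layer in backbone if layer.trainable)
print(n_trainable, first_trainable)  # 30 layer_70
```

In real Keras code the model must be recompiled after changing `trainable`, as the F.3 cell does before the second `fit` call.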

---

### 3. If your AlexNet-like model underperforms, which two architecture or training changes would you try first, and why?

**Sample Answer:**

1. **Reduce FC layer size or use Global Average Pooling (GAP):**
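   - The original AlexNet’s 4096-unit FC layers (and, to a lesser degree, the 1024- and 512-unit layers in this lab’s model) hold most of the parameters and are prone to overfitting.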
   - GAP or smaller FC layers reduce parameters and improve generalization.

2. **Add more regularization or augmentation:**
   - Techniques like dropout, weight decay, and stronger data augmentation (flips, crops, color jitter) improve robustness and reduce overfitting.
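
As one possible form of the augmentation mentioned above, the sketch below applies a random horizontal flip followed by a pad-and-crop to a single CIFAR-sized image using NumPy. The function name `random_flip_and_crop`, the padding of 4 pixels, and the reflect padding mode are assumptions made for illustration; color jitter is left out, and this is not the lab's prescribed pipeline.

```python
import numpy as np

def random_flip_and_crop(image, rng, pad=4):
    """Randomly flip an HxWxC image left-right, then take a random HxW crop
    from a reflect-padded copy (shifts the content by up to `pad` pixels)."""
    h, w, _ = image.shape
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w, :]

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3)).astype("float32")  # stand-in for one scaled CIFAR-10 image
augmented = random_flip_and_crop(image, rng)
print(augmented.shape, augmented.dtype)  # (32, 32, 3) float32
```

The output keeps the input's shape and value range, so it can be fed to the same model without further changes.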

*(Alternative if memory is an issue: replace large kernels like 11×11 with smaller 7×7 or 3×3 to save compute.)*
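
To see how much Global Average Pooling would save in the F.2 model, the sketch below compares the parameter count of the first dense layer with `Flatten` versus GAP. It is a plain-Python illustration (`dense_params` is a hypothetical helper) and assumes the F.2 layout: a 32×32 input, "same"-padded convolutions, three 2×2 pooling stages, and 256 channels before the head.

```python
def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

spatial = 32 // 2 // 2 // 2        # three 2x2 pools: 32 -> 16 -> 8 -> 4
channels = 256

flatten_features = spatial * spatial * channels   # Flatten keeps every position
gap_features = channels                           # GAP averages each channel to one value

with_flatten = dense_params(flatten_features, 1024)
with_gap = dense_params(gap_features, 1024)
print(flatten_features, with_flatten)  # 4096 4195328
print(gap_features, with_gap)          # 256 263168
```

Under these assumptions, swapping `Flatten` for GAP cuts the first dense layer from about 4.2 million parameters to about 0.26 million.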