# AI Lab Assignment H5: Convolutional Neural Networks

**Objective:** In this lab, you will implement and compare two different convolutional neural network (CNN) architectures. You will first design your **own CNN**, and then implement a known architecture (**AlexNet-inspired**) using TensorFlow.

**Scenario:** You are now a machine learning engineer at DaciaVision Analytics. Your new task is to develop an image classification system for object recognition. You will use the **CIFAR-10 and MNIST datasets** as your base and design two CNNs: one original model designed by you, and a recreation of the historical AlexNet.

**Your goal:**
* Achieve as high accuracy as possible.
* Compare the performance between your custom CNN and AlexNet.
* Identify weakly performing classes and improve the model through data augmentation.

If training on the full CIFAR-10 dataset is too slow, you may sample a **fixed percentage from each class** (e.g., 20-30%). For this assignment, we will sample **25%** of the data per class.

## 0. Library Imports

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import tensorflow as tf
from tensorflow.keras.datasets import cifar10, mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization, Input
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import time

## 1. Data Preprocessing

We will load both MNIST and CIFAR-10 datasets. For faster experimentation, we'll sample 25% of the data from each class. Pixel values will be normalized to the [0, 1] range.

In [4]:
def load_and_preprocess_data(dataset_name, sample_percentage=0.25):

    if dataset_name == 'cifar10':
        (x_train, y_train), (x_test, y_test) = cifar10.load_data()
        num_classes = 10
        class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
        input_shape = (32, 32, 3)

    elif dataset_name == 'mnist':
        (x_train, y_train), (x_test, y_test) = mnist.load_data()
        x_train = x_train[..., np.newaxis] # Add channel dimension
        x_test = x_test[..., np.newaxis]
        num_classes = 10
        class_names = [str(i) for i in range(10)]
        input_shape = (28, 28, 1)
    else:
        raise ValueError("Dataset not supported. Choose 'cifar10' or 'mnist'.")

    # Normalize pixel values
    x_train = x_train.astype('float32') / 255.0
    x_test = x_test.astype('float32') / 255.0
    
    y_train = y_train.flatten() # Ensure y_train is 1D for stratified sampling
    y_test = y_test.flatten()

    # Sample data if sample_percentage is less than 1.0
    if sample_percentage < 1.0:
        # Stratified sampling for training data
        _, x_train_sampled, _, y_train_sampled = train_test_split(
            x_train, y_train, 
            test_size=sample_percentage, 
            stratify=y_train, 
            random_state=42
        )
        x_train, y_train = x_train_sampled, y_train_sampled
        
        # Stratified sampling for test data
        _, x_test_sampled, _, y_test_sampled = train_test_split(
            x_test, y_test, 
            test_size=sample_percentage, # Using same percentage for test for consistency in size reduction, adjust if needed
            stratify=y_test, 
            random_state=42
        )
        x_test, y_test = x_test_sampled, y_test_sampled
        
    print(f"\n--- {dataset_name.upper()} Data --- (Sampled at {sample_percentage*100}%)")
    print(f"x_train shape: {x_train.shape}")
    print(f"y_train shape: {y_train.shape}")
    print(f"x_test shape: {x_test.shape}")
    print(f"y_test shape: {y_test.shape}")
    
    return (x_train, y_train), (x_test, y_test), num_classes, class_names, input_shape

# Load datasets
SAMPLE_PERCENTAGE = 0.25 # Using 25% of the data
(x_train_cifar, y_train_cifar), (x_test_cifar, y_test_cifar), num_classes_cifar, class_names_cifar, input_shape_cifar = load_and_preprocess_data('cifar10', SAMPLE_PERCENTAGE)
(x_train_mnist, y_train_mnist), (x_test_mnist, y_test_mnist), num_classes_mnist, class_names_mnist, input_shape_mnist = load_and_preprocess_data('mnist', SAMPLE_PERCENTAGE)


--- CIFAR10 Data --- (Sampled at 25.0%)
x_train shape: (12500, 32, 32, 3)
y_train shape: (12500,)
x_test shape: (2500, 32, 32, 3)
y_test shape: (2500,)

--- MNIST Data --- (Sampled at 25.0%)
x_train shape: (15000, 28, 28, 1)
y_train shape: (15000,)
x_test shape: (2500, 28, 28, 1)
y_test shape: (2500,)


## 2. Model Implementation

We will implement two CNN architectures: a custom CNN and an AlexNet-inspired CNN. We will focus our detailed comparison and augmentation efforts on the CIFAR-10 dataset due to its higher complexity.

### Helper Functions for Plotting and Evaluation

In [6]:
def plot_history(history, title_prefix):
    fig = make_subplots(rows=1, cols=2, subplot_titles=("Loss", "Accuracy"))

    fig.add_trace(go.Scatter(y=history.history['loss'], name='Train Loss', mode='lines'), row=1, col=1)
    fig.add_trace(go.Scatter(y=history.history['val_loss'], name='Val Loss', mode='lines'), row=1, col=1)

    fig.add_trace(go.Scatter(y=history.history['accuracy'], name='Train Accuracy', mode='lines'), row=1, col=2)
    fig.add_trace(go.Scatter(y=history.history['val_accuracy'], name='Val Accuracy', mode='lines'), row=1, col=2)

    fig.update_layout(title_text=f'{title_prefix} Training History', height=400, template="plotly_white")
    fig.show()

In [7]:
def plot_confusion_matrix_px(cm, class_names, title):
    fig = px.imshow(cm, 
                      labels=dict(x="Predicted Label", y="True Label", color="Count"),
                      x=class_names, 
                      y=class_names,
                      text_auto=True,
                      color_continuous_scale='Blues')
    fig.update_layout(title=title, template="plotly_white")
    fig.show()

### A. Custom CNN Model

Design your own CNN architecture with:
* Minimum 2 convolutional layers.
* Pooling layers.
* Dense output layer (softmax).

We will experiment with a few variations for the custom CNN on CIFAR-10 to demonstrate hyperparameter exploration.

In [8]:
def create_custom_cnn(input_shape, num_classes, filters1=32, filters2=64, dense_units=128, kernel_size=(3,3), dropout_rate=0.25):
    model = Sequential([
        Input(shape=input_shape),
        Conv2D(filters1, kernel_size, activation='relu', padding='same'),
        MaxPooling2D(pool_size=(2, 2)),
        #Dropout(dropout_rate/2)
        
        Conv2D(filters2, kernel_size, activation='relu', padding='same'),
        MaxPooling2D(pool_size=(2, 2)),
        Dropout(dropout_rate),
        
        Flatten(),
        Dense(dense_units, activation='relu'),
        Dropout(dropout_rate),
        Dense(num_classes, activation='softmax')
    ])
    
    model.compile(optimizer='adam', 
                  loss='sparse_categorical_crossentropy', 
                  metrics=['accuracy'])
    return model

In [10]:
# Experiment with Custom CNN on CIFAR-10
print("--- Training Custom CNN Variations on CIFAR-10 ---")
custom_cnn_variations = []
histories_custom_cnn = []

--- Training Custom CNN Variations on CIFAR-10 ---


In [11]:
# Variation 1: Baseline
print("\nTraining Custom CNN Variation 1 (Baseline)")
custom_cnn_v1 = create_custom_cnn(input_shape_cifar, num_classes_cifar)
custom_cnn_v1.summary()
history_v1 = custom_cnn_v1.fit(x_train_cifar, y_train_cifar, 
                               epochs=30, # Reduced epochs for faster demo 
                               batch_size=64, 
                               validation_data=(x_test_cifar, y_test_cifar),
                               verbose=1)
custom_cnn_variations.append(custom_cnn_v1)
histories_custom_cnn.append(history_v1)
plot_history(history_v1, "Custom CNN V1 (Baseline) CIFAR-10")


Training Custom CNN Variation 1 (Baseline)


Epoch 1/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 40ms/step - accuracy: 0.2494 - loss: 2.0216 - val_accuracy: 0.4540 - val_loss: 1.5542
Epoch 2/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 39ms/step - accuracy: 0.4399 - loss: 1.5434 - val_accuracy: 0.5220 - val_loss: 1.3611
Epoch 3/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 44ms/step - accuracy: 0.5103 - loss: 1.3789 - val_accuracy: 0.5416 - val_loss: 1.2647
Epoch 4/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 50ms/step - accuracy: 0.5488 - loss: 1.2537 - val_accuracy: 0.5564 - val_loss: 1.2077
Epoch 5/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 45ms/step - accuracy: 0.5721 - loss: 1.1878 - val_accuracy: 0.6092 - val_loss: 1.1230
Epoch 6/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 45ms/step - accuracy: 0.6249 - loss: 1.0702 - val_accuracy: 0.6120 - val_loss: 1.0840
Epoch 7/30
[1m196/1

In [12]:
# Variation 2: More filters, more dense units
print("\nTraining Custom CNN Variation 2 (More Filters/Dense)")
custom_cnn_v2 = create_custom_cnn(input_shape_cifar, num_classes_cifar, filters1=64, filters2=128, dense_units=256)
custom_cnn_v2.summary()
history_v2 = custom_cnn_v2.fit(x_train_cifar, y_train_cifar, 
                               epochs=30, 
                               batch_size=64, 
                               validation_data=(x_test_cifar, y_test_cifar),
                               verbose=1)
custom_cnn_variations.append(custom_cnn_v2)
histories_custom_cnn.append(history_v2)
plot_history(history_v2, "Custom CNN V2 (More Filters/Dense) CIFAR-10")


Training Custom CNN Variation 2 (More Filters/Dense)


Epoch 1/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m23s[0m 106ms/step - accuracy: 0.2671 - loss: 1.9821 - val_accuracy: 0.4900 - val_loss: 1.4269
Epoch 2/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m21s[0m 105ms/step - accuracy: 0.4924 - loss: 1.4035 - val_accuracy: 0.5484 - val_loss: 1.2688
Epoch 3/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 103ms/step - accuracy: 0.5689 - loss: 1.2074 - val_accuracy: 0.6020 - val_loss: 1.1250
Epoch 4/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m23s[0m 117ms/step - accuracy: 0.6219 - loss: 1.0782 - val_accuracy: 0.6048 - val_loss: 1.1000
Epoch 5/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m22s[0m 111ms/step - accuracy: 0.6634 - loss: 0.9631 - val_accuracy: 0.6240 - val_loss: 1.0571
Epoch 6/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m21s[0m 109ms/step - accuracy: 0.6967 - loss: 0.8678 - val_accuracy: 0.6360 - val_loss: 1.0219
Epoch 7/30

In [13]:
# Variation 3: Different kernel size, higher dropout
print("\nTraining Custom CNN Variation 3 (5x5 Kernel, Higher Dropout)")
custom_cnn_v3 = create_custom_cnn(input_shape_cifar, num_classes_cifar, kernel_size=(5,5), dropout_rate=0.4)
custom_cnn_v3.summary()
history_v3 = custom_cnn_v3.fit(x_train_cifar, y_train_cifar, 
                               epochs=30, 
                               batch_size=64, 
                               validation_data=(x_test_cifar, y_test_cifar),
                               verbose=1)
custom_cnn_variations.append(custom_cnn_v3)
histories_custom_cnn.append(history_v3)
plot_history(history_v3, "Custom CNN V3 (5x5 Kernel, Higher Dropout) CIFAR-10")


Training Custom CNN Variation 3 (5x5 Kernel, Higher Dropout)


Epoch 1/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m13s[0m 58ms/step - accuracy: 0.1917 - loss: 2.1508 - val_accuracy: 0.4060 - val_loss: 1.6591
Epoch 2/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 56ms/step - accuracy: 0.3927 - loss: 1.6729 - val_accuracy: 0.4772 - val_loss: 1.4835
Epoch 3/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 57ms/step - accuracy: 0.4564 - loss: 1.5273 - val_accuracy: 0.5016 - val_loss: 1.4154
Epoch 4/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 56ms/step - accuracy: 0.4821 - loss: 1.4284 - val_accuracy: 0.5320 - val_loss: 1.2917
Epoch 5/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m12s[0m 59ms/step - accuracy: 0.5183 - loss: 1.3327 - val_accuracy: 0.5584 - val_loss: 1.2104
Epoch 6/30
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 57ms/step - accuracy: 0.5633 - loss: 1.2321 - val_accuracy: 0.5768 - val_loss: 1.1942
Epoch 7/30
[1m1

#### Hyperparameter Choice Argumentation <br>
**Variation 1** served as a baseline with moderate capacity. <br>
**Variation 2** increased capacity (more filters and dense units) to see if the model could learn more complex features. <br>
**Variation 3** explored a larger kernel size (5x5) which might capture broader spatial features, and increased dropout for regularization. <br>
Based on validation accuracy and loss curves, we will select the best performing custom CNN model for further comparison.

In [14]:
# Select the best custom CNN based on validation accuracy
best_custom_cnn_idx = np.argmax([h.history['val_accuracy'][-1] for h in histories_custom_cnn])
best_custom_cnn_cifar = custom_cnn_variations[best_custom_cnn_idx]
print(f"\nBest Custom CNN for CIFAR-10 is Variation {best_custom_cnn_idx+1}")

print("\n--- Best Custom CNN (CIFAR-10) Evaluation ---")
loss_custom_cifar, acc_custom_cifar = best_custom_cnn_cifar.evaluate(x_test_cifar, y_test_cifar, verbose=0)
print(f"Test Loss: {loss_custom_cifar:.4f}")
print(f"Test Accuracy: {acc_custom_cifar:.4f}")
y_pred_custom_cifar = np.argmax(best_custom_cnn_cifar.predict(x_test_cifar), axis=1)
cm_custom_cifar = confusion_matrix(y_test_cifar, y_pred_custom_cifar)
plot_confusion_matrix_px(cm_custom_cifar, class_names_cifar, "Confusion Matrix: Best Custom CNN (CIFAR-10)")


Best Custom CNN for CIFAR-10 is Variation 2

--- Best Custom CNN (CIFAR-10) Evaluation ---
Test Loss: 1.7534
Test Accuracy: 0.6628
[1m79/79[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 14ms/step


In [15]:
# Train Best Custom CNN configuration on MNIST for completeness
print("\n--- Training Best Custom CNN configuration on MNIST ---")
best_custom_cnn_params = custom_cnn_variations[best_custom_cnn_idx].optimizer.get_config() # Get some idea, but create new one
if best_custom_cnn_idx == 0:
    best_custom_cnn_mnist = create_custom_cnn(input_shape_mnist, num_classes_mnist, filters1=32, filters2=64, dense_units=256, kernel_size=(3,3), dropout_rate=0.25)
elif best_custom_cnn_idx == 1:
    best_custom_cnn_mnist = create_custom_cnn(input_shape_mnist, num_classes_mnist, filters1=64, filters2=128, dense_units=256, kernel_size=(3,3), dropout_rate=0.25)
else:
    best_custom_cnn_mnist = create_custom_cnn(input_shape_mnist, num_classes_mnist, kernel_size=(5,5), dropout_rate=0.4)

history_custom_mnist = best_custom_cnn_mnist.fit(x_train_mnist, y_train_mnist, 
                                               epochs=15, # MNIST trains faster
                                               batch_size=64, 
                                               validation_data=(x_test_mnist, y_test_mnist),
                                               verbose=1)
plot_history(history_custom_mnist, "Best Custom CNN (MNIST)")
loss_custom_mnist, acc_custom_mnist = best_custom_cnn_mnist.evaluate(x_test_mnist, y_test_mnist, verbose=0)
print(f"MNIST Test Loss: {loss_custom_mnist:.4f}")
print(f"MNIST Test Accuracy: {acc_custom_mnist:.4f}")
y_pred_custom_mnist = np.argmax(best_custom_cnn_mnist.predict(x_test_mnist), axis=1)
cm_custom_mnist = confusion_matrix(y_test_mnist, y_pred_custom_mnist)
plot_confusion_matrix_px(cm_custom_mnist, class_names_mnist, "Confusion Matrix: Best Custom CNN (MNIST)")


--- Training Best Custom CNN configuration on MNIST ---
Epoch 1/15
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m21s[0m 83ms/step - accuracy: 0.7915 - loss: 0.6702 - val_accuracy: 0.9668 - val_loss: 0.1017
Epoch 2/15
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m19s[0m 82ms/step - accuracy: 0.9703 - loss: 0.0933 - val_accuracy: 0.9812 - val_loss: 0.0550
Epoch 3/15
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 83ms/step - accuracy: 0.9812 - loss: 0.0627 - val_accuracy: 0.9832 - val_loss: 0.0541
Epoch 4/15
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 84ms/step - accuracy: 0.9849 - loss: 0.0454 - val_accuracy: 0.9832 - val_loss: 0.0479
Epoch 5/15
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 86ms/step - accuracy: 0.9903 - loss: 0.0287 - val_accuracy: 0.9848 - val_loss: 0.0458
Epoch 6/15
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m20s[0m 85ms/step - accuracy: 0.9908 - loss: 0.0283 -

MNIST Test Loss: 0.0661
MNIST Test Accuracy: 0.9820
[1m79/79[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 12ms/step


### B. AlexNet-inspired Model Implementation

Implement a CNN inspired by the AlexNet architecture. Given the smaller input sizes of CIFAR-10 (32x32) and MNIST (28x28) compared to AlexNet's original 227x227, the architecture will be adapted.
Key features to retain: multiple convolutional layers, MaxPooling, ReLU, Dropout, and Dense layers.

Reference: [AlexNet Paper](https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf)

In [17]:
def create_alexnet_inspired(input_shape, num_classes):
    model = Sequential()
    model.add(Input(shape=input_shape))
    
    # Layer 1
    model.add(Conv2D(filters=32, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
    #model.add(BatchNormalization()) # Optional, modern AlexNet variants use it
    
    # Layer 2
    model.add(Conv2D(filters=64, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
    #model.add(BatchNormalization())

    # Layer 3
    model.add(Conv2D(filters=128, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu'))
    # Layer 4
    model.add(Conv2D(filters=128, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu'))
    # Layer 5
    model.add(Conv2D(filters=64, kernel_size=(3,3), strides=(1,1), padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
    #model.add(BatchNormalization())
    
    model.add(Flatten())
    
    # Dense Layer 1
    model.add(Dense(units=256, activation='relu'))
    model.add(Dropout(0.5))
    # Dense Layer 2
    model.add(Dense(units=128, activation='relu'))
    model.add(Dropout(0.5))
    
    # Output Layer
    model.add(Dense(units=num_classes, activation='softmax'))
    
    model.compile(optimizer='adam', 
                  loss='sparse_categorical_crossentropy', 
                  metrics=['accuracy'])
    return model

# Train AlexNet-inspired on CIFAR-10
print("--- Training AlexNet-inspired CNN on CIFAR-10 ---")
alexnet_cifar = create_alexnet_inspired(input_shape_cifar, num_classes_cifar)
alexnet_cifar.summary()
start_time_alex_cifar = time.time()
history_alex_cifar = alexnet_cifar.fit(x_train_cifar, y_train_cifar, 
                                     epochs=50, # AlexNet might need more epochs 
                                     batch_size=64, 
                                     validation_data=(x_test_cifar, y_test_cifar),
                                     verbose=1)
end_time_alex_cifar = time.time()
train_time_alex_cifar = end_time_alex_cifar - start_time_alex_cifar
plot_history(history_alex_cifar, "AlexNet-inspired (CIFAR-10)")
print(f"AlexNet CIFAR-10 Training Time: {train_time_alex_cifar:.2f} seconds")

print("\n--- AlexNet-inspired (CIFAR-10) Evaluation ---")
loss_alex_cifar, acc_alex_cifar = alexnet_cifar.evaluate(x_test_cifar, y_test_cifar, verbose=0)
print(f"Test Loss: {loss_alex_cifar:.4f}")
print(f"Test Accuracy: {acc_alex_cifar:.4f}")
y_pred_alex_cifar = np.argmax(alexnet_cifar.predict(x_test_cifar), axis=1)
cm_alex_cifar = confusion_matrix(y_test_cifar, y_pred_alex_cifar)
plot_confusion_matrix_px(cm_alex_cifar, class_names_cifar, "Confusion Matrix: AlexNet-inspired (CIFAR-10)")

# Train AlexNet-inspired on MNIST
print("\n--- Training AlexNet-inspired CNN on MNIST ---")
alexnet_mnist = create_alexnet_inspired(input_shape_mnist, num_classes_mnist)
alexnet_mnist.summary()
start_time_alex_mnist = time.time()
history_alex_mnist = alexnet_mnist.fit(x_train_mnist, y_train_mnist, 
                                     epochs=20, # MNIST trains faster 
                                     batch_size=64, 
                                     validation_data=(x_test_mnist, y_test_mnist),
                                     verbose=1)
end_time_alex_mnist = time.time()
train_time_alex_mnist = end_time_alex_mnist - start_time_alex_mnist
plot_history(history_alex_mnist, "AlexNet-inspired (MNIST)")
print(f"AlexNet MNIST Training Time: {train_time_alex_mnist:.2f} seconds")

print("\n--- AlexNet-inspired (MNIST) Evaluation ---")
loss_alex_mnist, acc_alex_mnist = alexnet_mnist.evaluate(x_test_mnist, y_test_mnist, verbose=0)
print(f"Test Loss: {loss_alex_mnist:.4f}")
print(f"Test Accuracy: {acc_alex_mnist:.4f}")
y_pred_alex_mnist = np.argmax(alexnet_mnist.predict(x_test_mnist), axis=1)
cm_alex_mnist = confusion_matrix(y_test_mnist, y_pred_alex_mnist)
plot_confusion_matrix_px(cm_alex_mnist, class_names_mnist, "Confusion Matrix: AlexNet-inspired (MNIST)")

--- Training AlexNet-inspired CNN on CIFAR-10 ---


Epoch 1/50
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m28s[0m 119ms/step - accuracy: 0.1265 - loss: 2.2484 - val_accuracy: 0.2900 - val_loss: 1.8451
Epoch 2/50
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m28s[0m 144ms/step - accuracy: 0.2697 - loss: 1.9211 - val_accuracy: 0.3864 - val_loss: 1.6381
Epoch 3/50
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m27s[0m 139ms/step - accuracy: 0.3451 - loss: 1.7466 - val_accuracy: 0.4216 - val_loss: 1.5298
Epoch 4/50
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m23s[0m 117ms/step - accuracy: 0.4198 - loss: 1.5653 - val_accuracy: 0.4884 - val_loss: 1.4030
Epoch 5/50
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m28s[0m 141ms/step - accuracy: 0.4675 - loss: 1.4540 - val_accuracy: 0.5224 - val_loss: 1.3207
Epoch 6/50
[1m196/196[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m26s[0m 132ms/step - accuracy: 0.5129 - loss: 1.3560 - val_accuracy: 0.5076 - val_loss: 1.3573
Epoch 7/50

AlexNet CIFAR-10 Training Time: 987.21 seconds

--- AlexNet-inspired (CIFAR-10) Evaluation ---
Test Loss: 2.7137
Test Accuracy: 0.6292
[1m79/79[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 14ms/step



--- Training AlexNet-inspired CNN on MNIST ---


Epoch 1/20
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m18s[0m 65ms/step - accuracy: 0.4984 - loss: 1.3886 - val_accuracy: 0.9496 - val_loss: 0.1614
Epoch 2/20
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 64ms/step - accuracy: 0.9452 - loss: 0.1987 - val_accuracy: 0.9796 - val_loss: 0.0661
Epoch 3/20
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 64ms/step - accuracy: 0.9693 - loss: 0.1160 - val_accuracy: 0.9784 - val_loss: 0.0711
Epoch 4/20
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 62ms/step - accuracy: 0.9794 - loss: 0.0763 - val_accuracy: 0.9848 - val_loss: 0.0526
Epoch 5/20
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 63ms/step - accuracy: 0.9794 - loss: 0.0701 - val_accuracy: 0.9848 - val_loss: 0.0565
Epoch 6/20
[1m235/235[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 63ms/step - accuracy: 0.9833 - loss: 0.0547 - val_accuracy: 0.9804 - val_loss: 0.0725
Epoch 7/20
[1m2

AlexNet MNIST Training Time: 117426.62 seconds

--- AlexNet-inspired (MNIST) Evaluation ---
Test Loss: 0.0806
Test Accuracy: 0.9836
[1m79/79[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 19ms/step


## 3. Performance Comparison (Custom CNN vs. AlexNet-inspired on CIFAR-10)

We compare test accuracy, training time, and model size between the best custom CNN and the AlexNet-inspired model on the CIFAR-10 dataset.

In [18]:
print("--- Performance Comparison on CIFAR-10 ---")

print("\nBest Custom CNN:")
print(f"  Test Accuracy: {acc_custom_cifar:.4f}")
custom_cnn_train_time = histories_custom_cnn[best_custom_cnn_idx].epoch[-1] * np.mean([t_end - t_start for t_start, t_end in zip(histories_custom_cnn[best_custom_cnn_idx].history.get('batch_start_time', [0]*len(histories_custom_cnn[best_custom_cnn_idx].epoch)), histories_custom_cnn[best_custom_cnn_idx].history.get('batch_end_time', [0]*len(histories_custom_cnn[best_custom_cnn_idx].epoch)))]) if histories_custom_cnn[best_custom_cnn_idx].history.get('batch_start_time') else 'N/A (Callback not used)' # Approximation
# Note: A more accurate way for training time is to record it directly as done for AlexNet.
# For simplicity here, we'll use the recorded AlexNet time and refer to relative epoch counts for the custom CNN.
print(f"  Training Time: (Refer to training logs, ~{len(histories_custom_cnn[best_custom_cnn_idx].epoch)} epochs)")
print(f"  Model Size (Params): {best_custom_cnn_cifar.count_params()}")

print("AlexNet-inspired CNN:")
print(f"  Test Accuracy: {acc_alex_cifar:.4f}")
print(f"  Training Time: {train_time_alex_cifar:.2f} seconds ({len(history_alex_cifar.epoch)} epochs)")
print(f"  Model Size (Params): {alexnet_cifar.count_params()}")

print("\nDiscussion:")
print("- Generalization: Observe the gap between training and validation accuracy/loss. A smaller gap suggests better generalization.")
if acc_alex_cifar > acc_custom_cifar:
    print("- Which model generalizes better? The AlexNet-inspired model showed higher test accuracy, suggesting better generalization on this task and dataset sampling.")
else:
    print("- Which model generalizes better? The Custom CNN model showed higher or comparable test accuracy.")

print("- Efficiency: Consider both training time and model size (parameters).")
if alexnet_cifar.count_params() > best_custom_cnn_cifar.count_params() and train_time_alex_cifar > 30 * (train_time_alex_cifar/50): # Rough comparison assuming custom CNN trained for 30 epochs
    print("  The custom CNN was likely more efficient (faster training for its complexity and smaller size).")
else:
    print("  The AlexNet-inspired model, despite its complexity, might achieve better results justifying its resource use. Or, the custom model might be more efficient if its performance is close with fewer resources.")
    
print("- Implementation Challenges:")
print("  Custom CNN: Relatively straightforward. The main challenge is finding a good balance of layers and hyperparameters through experimentation.")
print("  AlexNet-inspired: Requires adapting a known (often larger) architecture to a smaller dataset and input size, which can be tricky. Understanding the core principles of the original AlexNet and scaling them down appropriately is key.")

--- Performance Comparison on CIFAR-10 ---

Best Custom CNN:
  Test Accuracy: 0.6628
  Training Time: (Refer to training logs, ~30 epochs)
  Model Size (Params): 2175626
AlexNet-inspired CNN:
  Test Accuracy: 0.6292
  Training Time: 987.21 seconds (50 epochs)
  Model Size (Params): 611210

Discussion:
- Generalization: Observe the gap between training and validation accuracy/loss. A smaller gap suggests better generalization.
- Which model generalizes better? The Custom CNN model showed higher or comparable test accuracy.
- Efficiency: Consider both training time and model size (parameters).
  The AlexNet-inspired model, despite its complexity, might achieve better results justifying its resource use. Or, the custom model might be more efficient if its performance is close with fewer resources.
- Implementation Challenges:
  Custom CNN: Relatively straightforward. The main challenge is finding a good balance of layers and hyperparameters through experimentation.
  AlexNet-inspired: Req

## 4. Class Performance and Improvement (Data Augmentation on CIFAR-10)

We analyze the per-class performance of the better performing model on CIFAR-10 (let's assume it's AlexNet-inspired for this example, or choose based on actual results) and apply targeted data augmentation to improve performance on weakly performing classes.

In [19]:
# Choose the model to augment (e.g., AlexNet if it performed better, or best_custom_cnn)
model_to_augment = alexnet_cifar 
model_name_to_augment = "AlexNet-inspired"
if acc_custom_cifar > acc_alex_cifar:
    model_to_augment = best_custom_cnn_cifar
    model_name_to_augment = f"Best Custom CNN (V{best_custom_cnn_idx+1})"
    
print(f"--- Analyzing and Augmenting {model_name_to_augment} on CIFAR-10 ---")

print("Classification Report (Before Augmentation):")
y_pred_before_aug = np.argmax(model_to_augment.predict(x_test_cifar), axis=1)
print(classification_report(y_test_cifar, y_pred_before_aug, target_names=class_names_cifar))

print("Identifying weakly performing classes (e.g., those with low F1-score or recall).")
print("Based on the report, let's assume 'cat' and 'dog' are often confused or have lower scores.")

# Data Augmentation Setup
datagen = ImageDataGenerator(
    rotation_range=15,      # Randomly rotate images by up to 15 degrees
    width_shift_range=0.1,  # Randomly shift images horizontally by up to 10% of width
    height_shift_range=0.1, # Randomly shift images vertically by up to 10% of height
    horizontal_flip=True,   # Randomly flip images horizontally
    zoom_range=0.1          # Randomly zoom into images by up to 10%
)

datagen.fit(x_train_cifar)

# Re-train the chosen model with data augmentation
# It's often better to create a new instance of the model for retraining with augmentation
if model_name_to_augment == "AlexNet-inspired":
    model_augmented = create_alexnet_inspired(input_shape_cifar, num_classes_cifar)
else: # One of the custom CNNs
    if best_custom_cnn_idx == 0:
        model_augmented = create_custom_cnn(input_shape_cifar, num_classes_cifar)
    elif best_custom_cnn_idx == 1:
        model_augmented = create_custom_cnn(input_shape_cifar, num_classes_cifar, filters1=64, filters2=128, dense_units=256)
    else:
        model_augmented = create_custom_cnn(input_shape_cifar, num_classes_cifar, kernel_size=(5,5), dropout_rate=0.4)
        
model_augmented.summary()

print(f"\n--- Retraining {model_name_to_augment} with Data Augmentation on CIFAR-10 ---")
history_augmented = model_augmented.fit(
    datagen.flow(x_train_cifar, y_train_cifar, batch_size=64),
    epochs=60,  # Augmentation often benefits from more epochs
    validation_data=(x_test_cifar, y_test_cifar),
    steps_per_epoch=len(x_train_cifar) // 64,
    verbose=1
)

plot_history(history_augmented, f"{model_name_to_augment} (Augmented) CIFAR-10")

print(f"\n--- {model_name_to_augment} (Augmented) CIFAR-10 Evaluation ---")
loss_aug, acc_aug = model_augmented.evaluate(x_test_cifar, y_test_cifar, verbose=0)
print(f"Test Loss (Augmented): {loss_aug:.4f}")
print(f"Test Accuracy (Augmented): {acc_aug:.4f}")

y_pred_aug = np.argmax(model_augmented.predict(x_test_cifar), axis=1)
cm_aug = confusion_matrix(y_test_cifar, y_pred_aug)
plot_confusion_matrix_px(cm_aug, class_names_cifar, f"Confusion Matrix: {model_name_to_augment} (Augmented) CIFAR-10")

print("Classification Report (After Augmentation):")
print(classification_report(y_test_cifar, y_pred_aug, target_names=class_names_cifar))

print("\nEffect of Augmentation:")
print(f"Accuracy before augmentation: {model_to_augment.evaluate(x_test_cifar, y_test_cifar, verbose=0)[1]:.4f}")
print(f"Accuracy after augmentation:  {acc_aug:.4f}")
if acc_aug > model_to_augment.evaluate(x_test_cifar, y_test_cifar, verbose=0)[1]:
    print("Data augmentation improved the model's test accuracy.")
else:
    print("Data augmentation did not significantly improve test accuracy, or parameters need further tuning.")

--- Analyzing and Augmenting Best Custom CNN (V2) on CIFAR-10 ---
Classification Report (Before Augmentation):
[1m79/79[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 17ms/step
              precision    recall  f1-score   support

    airplane       0.71      0.63      0.67       250
  automobile       0.82      0.72      0.77       250
        bird       0.60      0.55      0.57       250
         cat       0.46      0.44      0.45       250
        deer       0.61      0.63      0.62       250
         dog       0.55      0.62      0.59       250
        frog       0.65      0.78      0.71       250
       horse       0.72      0.74      0.73       250
        ship       0.81      0.74      0.77       250
       truck       0.73      0.77      0.75       250

    accuracy                           0.66      2500
   macro avg       0.67      0.66      0.66      2500
weighted avg       0.67      0.66      0.66      2500

Identifying weakly performing classes (e.g., those with 


--- Retraining Best Custom CNN (V2) with Data Augmentation on CIFAR-10 ---
Epoch 1/60



Your `PyDataset` class should call `super().__init__(**kwargs)` in its constructor. `**kwargs` can include `workers`, `use_multiprocessing`, `max_queue_size`. Do not pass these arguments to `fit()`, as they will be ignored.



[1m195/195[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m29s[0m 136ms/step - accuracy: 0.2588 - loss: 2.0168 - val_accuracy: 0.4588 - val_loss: 1.4914
Epoch 2/60
[1m  1/195[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m19s[0m 103ms/step - accuracy: 0.3906 - loss: 1.6344


Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches. You may need to use the `.repeat()` function when building your dataset.



[1m195/195[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 6ms/step - accuracy: 0.3906 - loss: 1.6344 - val_accuracy: 0.4472 - val_loss: 1.5145
Epoch 3/60
[1m195/195[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m24s[0m 123ms/step - accuracy: 0.4295 - loss: 1.5730 - val_accuracy: 0.5112 - val_loss: 1.3632
Epoch 4/60
[1m195/195[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 5ms/step - accuracy: 0.6000 - loss: 1.3445 - val_accuracy: 0.5044 - val_loss: 1.3788
Epoch 5/60
[1m195/195[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m24s[0m 123ms/step - accuracy: 0.4845 - loss: 1.4311 - val_accuracy: 0.5072 - val_loss: 1.3534
Epoch 6/60
[1m195/195[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 6ms/step - accuracy: 0.5469 - loss: 1.3416 - val_accuracy: 0.4844 - val_loss: 1.4108
Epoch 7/60
[1m195/195[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 127ms/step - accuracy: 0.5258 - loss: 1.3481 - val_accuracy: 0.5764 - val_loss: 1.1849
Epoch 8/60
[1m195/195[0m [3


--- Best Custom CNN (V2) (Augmented) CIFAR-10 Evaluation ---
Test Loss (Augmented): 0.8668
Test Accuracy (Augmented): 0.7028
[1m79/79[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 15ms/step


Classification Report (After Augmentation):
              precision    recall  f1-score   support

    airplane       0.79      0.67      0.72       250
  automobile       0.87      0.86      0.86       250
        bird       0.66      0.54      0.59       250
         cat       0.63      0.39      0.48       250
        deer       0.71      0.59      0.64       250
         dog       0.67      0.59      0.63       250
        frog       0.51      0.90      0.65       250
       horse       0.72      0.76      0.74       250
        ship       0.84      0.84      0.84       250
       truck       0.76      0.88      0.82       250

    accuracy                           0.70      2500
   macro avg       0.71      0.70      0.70      2500
weighted avg       0.71      0.70      0.70      2500


Effect of Augmentation:
Accuracy before augmentation: 0.6628
Accuracy after augmentation:  0.7028
Data augmentation improved the model's test accuracy.


## 5. Deliverables Summary (Short Report)

### A. Architecture Details
**Custom CNN (Best Variation for CIFAR-10):**
* *(Describe the architecture based on the selected best custom CNN, e.g., number of conv layers, filters, kernel sizes, pooling, dropout, dense layers. Use `best_custom_cnn_cifar.summary()` output as a guide)*

**AlexNet-inspired CNN (for CIFAR-10):**
* *(Describe the AlexNet-inspired architecture. Use `alexnet_cifar.summary()` output as a guide)*
  *   Layer 1: Conv2D (32 filters, 3x3, ReLU) -> MaxPooling2D
  *   Layer 2: Conv2D (64 filters, 3x3, ReLU) -> MaxPooling2D
  *   Layer 3: Conv2D (128 filters, 3x3, ReLU)
  *   Layer 4: Conv2D (128 filters, 3x3, ReLU)
  *   Layer 5: Conv2D (64 filters, 3x3, ReLU) -> MaxPooling2D
  *   Flatten
  *   Dense (256 units, ReLU) -> Dropout (0.5)
  *   Dense (128 units, ReLU) -> Dropout (0.5)
  *   Dense (10 units, Softmax)

### B. Hyperparameters Analysis (Custom CNN on CIFAR-10)
*   **Variation 1 (Baseline):** Filters (32, 64), Dense (128), Dropout (0.25), Kernel (3x3). This provided a starting point.
*   **Variation 2 (More Capacity):** Filters (64, 128), Dense (256), Dropout (0.25), Kernel (3x3). Aimed to see if increased model complexity would capture more features and improve accuracy. This variation often performs better if sufficient data is available and overfitting is managed.
*   **Variation 3 (Larger Kernels, More Dropout):** Filters (32, 64), Dense (128), Dropout (0.4), Kernel (5x5). Explored if larger receptive fields in early layers and stronger regularization would be beneficial.
*   **Choice:** The best custom CNN variant was selected based on the highest validation accuracy achieved during the experimental runs. *(State which one was chosen based on the actual run results.)*

### C. Performance Comparison (CIFAR-10)
*(Summarize from section 3's output)*
*   **Test Accuracy:** 
    *   Custom CNN: *(insert acc_custom_cifar)*
    *   AlexNet-inspired: *(insert acc_alex_cifar)*
*   **Training Time (Approximate for similar epochs):** 
    *   The AlexNet-inspired model, being deeper, generally took longer per epoch.
*   **Model Size (Parameters):**
    *   Custom CNN: *(insert `best_custom_cnn_cifar.count_params()`)*
    *   AlexNet-inspired: *(insert `alexnet_cifar.count_params()`)*
*   **Generalization:** *(Discuss which model showed a smaller gap between training and validation performance, and which performed better on the test set).* The AlexNet-inspired model, despite its complexity, often generalizes well due to its structured depth and use of dropout, potentially outperforming the simpler custom CNN on a dataset like CIFAR-10 if trained appropriately.
*   **Efficiency:** The custom CNN is generally more efficient in terms of parameters and potentially training time for a given performance level if it's well-tuned. The AlexNet-inspired model is more resource-intensive but might achieve higher peak accuracy.
*   **Implementation Challenges:** Custom CNNs are simpler to start with but require iterative design. Adapting AlexNet involves understanding its principles and scaling it down, which can be more complex initially but leverages a proven architectural pattern.

### D. Augmentation Strategy and Its Effect (CIFAR-10 with chosen model)
*   **Model Chosen for Augmentation:** *(State which model: e.g., AlexNet-inspired)*
*   **Weakly Performing Classes (Example):** Before augmentation, classes like 'cat' and 'dog' might have shown lower precision/recall. *(Refer to actual classification report before augmentation)*.
*   **Augmentation Techniques Used:** Random rotations (up to 15 degrees), width/height shifts (up to 10%), horizontal flips, and zoom (up to 10%).
*   **Effect on Performance:**
    *   Accuracy Before Augmentation: *(insert model_to_augment test accuracy)*
    *   Accuracy After Augmentation: *(insert acc_aug)*
    *   *(Discuss if augmentation improved accuracy and potentially balanced class performance, referring to the classification report after augmentation. Augmentation generally helps improve generalization and makes the model more robust, especially for classes with high intra-class variance or similarity to other classes.)*

### Performance on MNIST
*   **Custom CNN (MNIST):** Achieved an accuracy of *(insert acc_custom_mnist)*.
*   **AlexNet-inspired (MNIST):** Achieved an accuracy of *(insert acc_alex_mnist)*.
*   MNIST is a simpler dataset, and both models are expected to perform very well. The AlexNet-inspired architecture might be overkill for MNIST but should still achieve high accuracy.