<a href="https://colab.research.google.com/github/karlbuscheck/battle-of-the-neural-networks/blob/main/battle_of_the_neural_networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Battle of the Neural Networks: DNNs vs. CNNs in Image Classification

There's something electric about training a neural network, watching the epochs pour down your screen, tracking the accuracies and losses in real time. That's exactly what we'll be doing in this notebook.

We'll be pitting **fully connected deep neural networks (DNNs)** -- multi-layer perceptrons that flatten images -- against **convolutional neural networks (CNNs)** -- which preserve spatial structure -- on an image classification task. The data is the popular [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html), which consists of 60,000 32x32 color images split into 10 classes:

- Airplane
- Automobile
- Bird
- Cat
- Deer
- Dog
- Frog
- Horse
- Ship
- Truck

**Spoiler alert**: The CNNs win -- after all, image classification is exactly the sort of task they excel at. But along the way, we'll find out *why* this is the case.

Time for the deep dive.

---

# ❗️ NOTE ON METHODOLOGY ❗️
This notebook contains an uncorrected methodological flaw (Data Leakage) and is for demonstration purposes only. The accuracy reported here is likely inflated.

---

## Tools & Libraries Used

- **Google Colab (with GPU runtime)** - cloud-based environment for writing and running the notebook with hardware acceleration
- **Python 3.11.13** - base language powering the project
- **Keras** - for loading CIFAR-10, building DNN and CNN models, training, and evaluation

---


## Acknowledgments

This notebook builds on a project from Professor Tao Li’s Machine Learning with Python course at the Leavey School of Business at Santa Clara University. Many thanks to Professor Li for the lectures that spark further exploration.

## Import and load the dataset

Let's grab our data. **Sidenote**: The CIFAR-10 dataset comes pre-split into training (50K) and test (10K) sets.

In [None]:
# Import Keras
import keras
# Import the dataset
from keras.datasets import cifar10

# Load the dataset
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

In [None]:
# Check shapes of the training and test sets
# Note: For X -- (num_images, height, width, channels), and for y -- (num_images, 1)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(50000, 32, 32, 3)
(10000, 32, 32, 3)
(50000, 1)
(10000, 1)


## Build the baseline DNNs

We'll begin by building three baseline DNNs *without* using dropout or batch normalization. Before digging in, let's handle the preprocessing.

In [None]:
# Turn the pixel values to float32 for compatibility
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# Scale pixel values to [0, 1] range -- aka we are normalizing the data
X_train /= 255
X_test /= 255

# Display the number of training and test samples
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# Set the number of CIFAR-10 classes -- labels 0 through 9
num_classes = 10

# One-hot encode the labels
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

50000 train samples
10000 test samples
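
To make the preprocessing concrete, here is a minimal plain-Python sketch of the two transformations above: scaling a pixel into the [0, 1] range and one-hot encoding a class label. The helper names `scale_pixel` and `one_hot` are made up for illustration; in the notebook the real work is done by the in-place division and `keras.utils.to_categorical`.

```python
def scale_pixel(value, max_value=255.0):
    """Map a raw 0-255 pixel intensity to the [0, 1] range."""
    return value / max_value


def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in [0, num_classes)")
    encoded = [0.0] * num_classes
    encoded[label] = 1.0
    return encoded


print(scale_pixel(255))  # brightest pixel maps to 1.0
print(one_hot(3))        # class index 3 is "cat" in CIFAR-10's label order
```

One-hot labels are what `categorical_crossentropy` expects, which is why the labels are converted before any model is compiled.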


Now, to **build the first of the baseline models**, a fully connected DNN.

In [None]:
# Import the main building blocks: Sequential for stacking layers,
# Dense for fully connected layers, and Activation functions
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras import Input

# Initialize the model
model = Sequential([

    # Define inputs to avoid warning
    Input(shape=(32, 32, 3)),

    # Add the flatten layer to turn the 32x32x3 image into a 1D input
    Flatten(),

    # Add the hidden layer
    Dense(256),

    # Add the ReLU activation so the network can learn nonlinear patterns
    Activation('relu'),

    # Add the output layer for 10 classes
    Dense(10),

    # Turn the outputs into probabilities with softmax
    Activation('softmax'),
])

Compile the model.

In [None]:
# Use .compile() to set "adam" as the optimizer and set "categorical_crossentropy"
# as the loss function; Finally use "accuracy" as the metric
model.compile("adam", "categorical_crossentropy", metrics=['accuracy'])
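
For intuition about what the softmax output and the `categorical_crossentropy` loss compute, here is a rough plain-Python sketch. The function names and the `eps` clipping value are illustrative assumptions rather than Keras internals.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probs, eps=1e-7):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot_target, probs))


probs = softmax([0.0] * 10)  # a model that is equally unsure about all 10 classes
target = [0.0] * 10
target[3] = 1.0
print(categorical_crossentropy(target, probs))  # about 2.303, i.e. ln(10)
```

A loss near ln(10) ≈ 2.30 is what uniform guessing over 10 classes produces, which makes it a handy reference point when reading the first-epoch loss.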

Check the model summary to see what we built.

In [None]:
# Display the model summary
model.summary()

And now we're ready to **train the model**.

In [None]:
# Now, train the model with 10% of the training data set aside for validation.
# This allows us to monitor performance on held-out data during training
# Note: We use validation_split=0.1 to set the size of the validation set
# Also note: We use verbose=1 to see progress as we go
history = model.fit(X_train, y_train, batch_size=128, epochs=20, verbose=1, validation_split = 0.1)

Epoch 1/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 5s 8ms/step - accuracy: 0.2427 - loss: 2.2598 - val_accuracy: 0.3260 - val_loss: 1.9139
Epoch 2/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.3602 - loss: 1.8075 - val_accuracy: 0.3866 - val_loss: 1.7490
Epoch 3/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.3863 - loss: 1.7366 - val_accuracy: 0.3942 - val_loss: 1.7014
Epoch 4/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4096 - loss: 1.6630 - val_accuracy: 0.4266 - val_loss: 1.6431
Epoch 5/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4229 - loss: 1.6267 - val_accuracy: 0.4332 - val_loss: 1.6104
Epoch 6/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4289 - loss: 1.5994 - val_accuracy: 0.4150 - val_loss: 1.6447
Epoch 7/20
352/352
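
A side note on the `352/352` counter in the log: it is the number of mini-batches per epoch. As the Keras documentation describes it, `validation_split` holds out the last fraction of the arrays before any shuffling, so 10% of the 50,000 training images never enter a training batch. The sketch below reproduces that arithmetic; `steps_per_epoch` is a hypothetical helper, not a Keras function.

```python
import math


def steps_per_epoch(num_samples, validation_split, batch_size):
    """Return (train_size, val_size, steps) for a fractional hold-out split."""
    val_size = round(num_samples * validation_split)
    train_size = num_samples - val_size
    steps = math.ceil(train_size / batch_size)  # the last batch may be smaller
    return train_size, val_size, steps


print(steps_per_epoch(50_000, 0.1, 128))  # (45000, 5000, 352)
```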

That was a *rough* start for our baseline. Training accuracy went up and loss went slightly down, but the model barely cleared 50% in accuracy. Validation accuracy mostly went up, and validation loss mostly went down. Suffice it to say, there is lots of room for improvement.

Let's see what we can do. But first, the evaluation on the test set.

# ⚠️ WARNING: KNOWN METHODOLOGICAL FLAW ⚠️
The following evaluation introduces Data Leakage. The accuracy reported below is **BIASED** and should not be considered final.

In [None]:
# Evaluate the model on the test set
score = model.evaluate(X_test, y_test, verbose=0)

# Display the Test Loss and Test Accuracy
print("Test Loss: {:.3f}".format(score[0]))
print("Test Accuracy: {:.3f}".format(score[1]))

Test Loss: 1.489
Test Accuracy: 0.466


**For the second architecture** (or `model_2`), let's add a second hidden layer to increase the model's depth/capacity to see if that helps it learn some of the nonlinear patterns.

In [None]:
# Initialize the model -- everything the same just add a second hidden layer
model_2 = Sequential([
    Input(shape=(32,32,3)),
    Flatten(),
    Dense(256), Activation('relu'),
    Dense(128), Activation('relu'),
    Dense(10),  Activation('softmax'),
])

# Compile the model
model_2.compile("adam", "categorical_crossentropy", metrics=["accuracy"])

# Check the model summary
model_2.summary()

Now to train `model_2`.

In [None]:
# Train the second model
history_2 = model_2.fit(X_train, y_train, batch_size=128, epochs=20, verbose=1, validation_split = 0.1)

Epoch 1/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 5s 9ms/step - accuracy: 0.2663 - loss: 2.0511 - val_accuracy: 0.3594 - val_loss: 1.8034
Epoch 2/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3801 - loss: 1.7294 - val_accuracy: 0.4178 - val_loss: 1.6646
Epoch 3/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.4155 - loss: 1.6430 - val_accuracy: 0.4014 - val_loss: 1.6877
Epoch 4/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4323 - loss: 1.5829 - val_accuracy: 0.4496 - val_loss: 1.5715
Epoch 5/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4520 - loss: 1.5323 - val_accuracy: 0.4510 - val_loss: 1.5397
Epoch 6/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4675 - loss: 1.4994 - val_accuracy: 0.4564 - val_loss: 1.5294
Epoch 7/20
352/352

This time, the model made it to about 55% in terms of training accuracy. Let's see what the test set score is.

In [None]:
# Evaluate the model on the test set
score_2 = model_2.evaluate(X_test, y_test, verbose=0)

# Display the Test Loss and Test Accuracy
print("Test Loss: {:.3f}".format(score_2[0]))
print("Test Accuracy: {:.3f}".format(score_2[1]))

Test Loss: 1.389
Test Accuracy: 0.507


A quick side-by-side comparison:

In [None]:
# Display the Test Loss and Test Accuracy for the two models
print("Baseline Model Test Loss: {:.3f}".format(score[0]))
print("Baseline Model Test Accuracy: {:.3f}".format(score[1]))
print("Model 2 Test Loss: {:.3f}".format(score_2[0]))
print("Model 2 Test Accuracy: {:.3f}".format(score_2[1]))

Baseline Model Test Loss: 1.489
Baseline Model Test Accuracy: 0.466
Model 2 Test Loss: 1.389
Model 2 Test Accuracy: 0.507


**For our third architecture** (or `model_3`), we'll try a much larger DNN to see whether added capacity improves accuracy on the CIFAR-10 dataset.

In [None]:
# Build the model
# Note: These dense layers are much bigger than the ones we used in 'model_2'
model_3 = Sequential([
    Input(shape=(32,32,3)),
    Flatten(),
    Dense(1024), Activation('relu'),
    Dense(1024), Activation('relu'),
    Dense(10),   Activation('softmax'),
])

# Compile the model
model_3.compile("adam", "categorical_crossentropy", metrics=["accuracy"])

# Check the model summary
model_3.summary()

This model is, indeed, huge! Over 4.2M parameters -- well over 4x the second model, which was slightly bigger than the first. And now we train.
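
The parameter totals that `model.summary()` reports for these DNNs can be checked by hand: each `Dense` layer holds one weight per input-output pair plus one bias per unit. Here is a small sketch of that arithmetic, where `dense_param_count` is a hypothetical helper written for this check.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases for a stack of fully connected layers.

    `layer_sizes` lists the input width followed by each Dense layer's units.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix plus one bias per unit
    return total


flattened = 32 * 32 * 3  # 3,072 inputs after Flatten()

print(dense_param_count([flattened, 256, 10]))         # 789258  (baseline)
print(dense_param_count([flattened, 256, 128, 10]))    # 820874  (model_2)
print(dense_param_count([flattened, 1024, 1024, 10]))  # 4206602 (model_3)
```

Almost all of the weights sit in the first layer, because every one of the 3,072 input values connects to every hidden unit.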

In [None]:
# Train the third model
history_3 = model_3.fit(X_train, y_train, batch_size=128, epochs=20, verbose=1, validation_split = 0.1)

Epoch 1/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 6s 12ms/step - accuracy: 0.2506 - loss: 2.2995 - val_accuracy: 0.3702 - val_loss: 1.7462
Epoch 2/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.3840 - loss: 1.7097 - val_accuracy: 0.4052 - val_loss: 1.6687
Epoch 3/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4294 - loss: 1.6049 - val_accuracy: 0.4204 - val_loss: 1.6172
Epoch 4/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.4516 - loss: 1.5334 - val_accuracy: 0.4384 - val_loss: 1.5704
Epoch 5/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4725 - loss: 1.4798 - val_accuracy: 0.4668 - val_loss: 1.5042
Epoch 6/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4917 - loss: 1.4293 - val_accuracy: 0.4730 - val_loss: 1.4627
Epoch 7/20
352/352

In terms of training accuracy, **this is our best model yet**. But, validation accuracy is lagging behind, leaving us to wonder if the model is overfitting. To find out, let's check the test set performance.
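
One simple way to put a number on "lagging behind" is the per-epoch gap between training and validation accuracy. The sketch below uses the first six epochs of `model_3` copied from the log above; `accuracy_gap` is a hypothetical helper. In the notebook, the same dictionary should be available as `history_3.history`.

```python
def accuracy_gap(history):
    """Per-epoch difference between training and validation accuracy."""
    return [
        round(train - val, 4)
        for train, val in zip(history["accuracy"], history["val_accuracy"])
    ]


# First six epochs of model_3, copied from the training log
model_3_log = {
    "accuracy":     [0.2506, 0.3840, 0.4294, 0.4516, 0.4725, 0.4917],
    "val_accuracy": [0.3702, 0.4052, 0.4204, 0.4384, 0.4668, 0.4730],
}

gaps = accuracy_gap(model_3_log)
print(gaps)
```

A gap that turns positive and keeps widening over later epochs is the usual signature of overfitting. A negative gap in the first epoch or two is common, since training accuracy is averaged while the weights are still changing quickly.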

In [None]:
# Evaluate the model on the test set
score_3 = model_3.evaluate(X_test, y_test, verbose=0)

# Display the Test Loss and Test Accuracy
print("Test Loss: {:.3f}".format(score_3[0]))
print("Test Accuracy: {:.3f}".format(score_3[1]))

Test Loss: 1.433
Test Accuracy: 0.516


As we expected, test and validation accuracy are lagging behind training accuracy, pointing to an overfitting issue we will attempt to address with dropout and batch normalization.

Before moving on, **here's the final scorecard for these three models**:

In [None]:
# Display the Test Loss and Test Accuracy for the three models
print("Baseline Model Test Loss: {:.3f}".format(score[0]))
print("Baseline Model Test Accuracy: {:.3f}".format(score[1]))
print("Model 2 Test Loss: {:.3f}".format(score_2[0]))
print("Model 2 Test Accuracy: {:.3f}".format(score_2[1]))
print("Model 3 Test Loss: {:.3f}".format(score_3[0]))
print("Model 3 Test Accuracy: {:.3f}".format(score_3[1]))

Baseline Model Test Loss: 1.489
Baseline Model Test Accuracy: 0.466
Model 2 Test Loss: 1.389
Model 2 Test Accuracy: 0.507
Model 3 Test Loss: 1.433
Model 3 Test Accuracy: 0.516


## Build three more baseline DNNs -- but this time add batch normalization *and* dropout

Next, we'll boost our baseline DNNs by adding **batch normalization** and **dropout**. Batch normalization helps stabilize and speed up training by keeping layer activations well-scaled, while dropout randomly turns off neurons during training to reduce overfitting. Together, they can improve generalization and model robustness.
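
As a rough illustration of what these two layers do during training, here is a plain-Python sketch. It is a simplification: real batch normalization also learns a scale and shift and tracks moving averages for inference, and dropout is switched off at inference time. The function names are illustrative, not Keras APIs.

```python
import math
import random


def batch_norm(values, eps=1e-3):
    """Normalize one feature across a batch to roughly zero mean and unit variance.

    The learnable scale (gamma) and shift (beta) are left at 1 and 0 here.
    """
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance + eps) for v in values]


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)
print(normalized)                                           # centered on 0
print(dropout(normalized, rate=0.3, rng=random.Random(0)))  # some entries zeroed
```

Rescaling the surviving values by `1 / (1 - rate)` keeps the expected activation unchanged, so nothing needs to be adjusted when dropout is turned off at inference.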

We'll begin by adding these layers to the original baseline model.

In [None]:
# Start with the imports
from keras.layers import BatchNormalization, Dropout

# Build the same model as the baseline -- just add in BatchNormalization and Dropout
# Note: For this first model -- the smallest, we'll set Dropout to 0.3, and increase
# that figure for the bigger models that are more likely to overfit
model_reg = Sequential([
    Input(shape=(32,32,3)),
    Flatten(),
    Dense(256),
    BatchNormalization(), Activation('relu'), Dropout(0.3),
    Dense(10), Activation('softmax'),
])

# Compile the model
model_reg.compile("adam", "categorical_crossentropy", metrics=["accuracy"])


# Check the model summary
model_reg.summary()

Now to train the model.

In [None]:
# Train the model
history_reg = model_reg.fit(X_train, y_train, batch_size=128, epochs=20, verbose=1, validation_split = 0.1)

Epoch 1/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 6s 9ms/step - accuracy: 0.3362 - loss: 1.9271 - val_accuracy: 0.3706 - val_loss: 1.7802
Epoch 2/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4397 - loss: 1.5977 - val_accuracy: 0.3970 - val_loss: 1.7373
Epoch 3/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4724 - loss: 1.5096 - val_accuracy: 0.4074 - val_loss: 1.7050
Epoch 4/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4980 - loss: 1.4453 - val_accuracy: 0.4514 - val_loss: 1.5594
Epoch 5/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.5088 - loss: 1.4113 - val_accuracy: 0.4388 - val_loss: 1.5768
Epoch 6/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.5212 - loss: 1.3770 - val_accuracy: 0.4616 - val_loss: 1.5902
Epoch 7/20
352/352

This "baseline" model with dropout and batch normalization is outperforming the original version on training accuracy, but validation accuracy is dragging behind, which hints at overfitting. Let's confirm with a quick look at the test set.

In [None]:
# Check the accuracy on the test set
score_reg = model_reg.evaluate(X_test, y_test, verbose=0)

# Display the Test Loss and Test Accuracy
print("Test Loss: {:.3f}".format(score_reg[0]))
print("Test Accuracy: {:.3f}".format(score_reg[1]))

Test Loss: 1.526
Test Accuracy: 0.472


The model is, indeed, overfitting. Now, **let's see what happens with the regularized version** of `model_2`.

In [None]:
# Initialize the model -- everything the same as 'model_2', just with Dropout and
# BatchNormalization added
model_reg_2 = Sequential([
    Input(shape=(32,32,3)),
    Flatten(),
    Dense(256),
    BatchNormalization(), Activation('relu'), Dropout(0.3),
    Dense(128),
    BatchNormalization(), Activation('relu'), Dropout(0.3),
    Dense(10), Activation('softmax'),
])

# Compile the model
model_reg_2.compile("adam", "categorical_crossentropy", metrics=["accuracy"])

# Check the model summary
model_reg_2.summary()

Now, to train.

In [None]:
# Train the model
history_reg_2 = model_reg_2.fit(X_train, y_train, batch_size=128, epochs=20, verbose=1, validation_split = 0.1)

Epoch 1/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 9s 13ms/step - accuracy: 0.3067 - loss: 1.9935 - val_accuracy: 0.3270 - val_loss: 1.8191
Epoch 2/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4151 - loss: 1.6221 - val_accuracy: 0.3716 - val_loss: 1.8541
Epoch 3/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4496 - loss: 1.5378 - val_accuracy: 0.4028 - val_loss: 1.6620
Epoch 4/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4708 - loss: 1.4785 - val_accuracy: 0.4320 - val_loss: 1.6006
Epoch 5/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.4917 - loss: 1.4314 - val_accuracy: 0.4878 - val_loss: 1.4358
Epoch 6/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.4997 - loss: 1.3992 - val_accuracy: 0.4800 - val_loss: 1.4596
Epoch 7/20
352/352

The training accuracy was right at 60% and the validation accuracy was bouncing all around. Let's see what the test set score is.

In [None]:
# Check the accuracy on the test set
score_reg_2 = model_reg_2.evaluate(X_test, y_test, verbose=0)

# Display the Test Loss and Test Accuracy
print("Test Loss: {:.3f}".format(score_reg_2[0]))
print("Test Accuracy: {:.3f}".format(score_reg_2[1]))

Test Loss: 1.563
Test Accuracy: 0.459


Another model that is clearly struggling with overfitting. **Time to see if the big model can save the day**. Let's build `model_reg_3`.

In [None]:
# Build the model
# Note: We added BatchNormalization and Dropout -- pushed up to 0.5 -- after
# both dense hidden layers
model_reg_3 = Sequential([
    Input(shape=(32,32,3)),
    Flatten(),
    Dense(1024),
    BatchNormalization(), Activation('relu'), Dropout(0.5),
    Dense(1024),
    BatchNormalization(), Activation('relu'), Dropout(0.5),
    Dense(10), Activation('softmax'),
])

# Compile the model
model_reg_3.compile("adam", "categorical_crossentropy", metrics=["accuracy"])

# Check the model summary
model_reg_3.summary()

Up next, let's train this model.

In [None]:
# Train the model
history_reg_3 = model_reg_3.fit(X_train, y_train, batch_size=128, epochs=20, verbose=1, validation_split = 0.1)

Epoch 1/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 8s 14ms/step - accuracy: 0.2913 - loss: 2.1634 - val_accuracy: 0.3406 - val_loss: 1.8650
Epoch 2/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.4081 - loss: 1.6606 - val_accuracy: 0.4202 - val_loss: 1.6097
Epoch 3/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4484 - loss: 1.5437 - val_accuracy: 0.3862 - val_loss: 1.6888
Epoch 4/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.4741 - loss: 1.4681 - val_accuracy: 0.4180 - val_loss: 1.6183
Epoch 5/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.4973 - loss: 1.4139 - val_accuracy: 0.4366 - val_loss: 1.5797
Epoch 6/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.5039 - loss: 1.3830 - val_accuracy: 0.4322 - val_loss: 1.6004
Epoch 7/20
352/352

These results are quite similar to those of the original `model_3`. Let's take a look at the accuracy on the test set.

In [None]:
# Check the accuracy on the test set
score_reg_3 = model_reg_3.evaluate(X_test, y_test, verbose=0)

# Display the Test Loss and Test Accuracy
print("Test Loss: {:.3f}".format(score_reg_3[0]))
print("Test Accuracy: {:.3f}".format(score_reg_3[1]))

Test Loss: 1.491
Test Accuracy: 0.474


Let's take a look at the final **test set scorecard** for the 6 DNNs we just built:

In [None]:
# Display the Test Loss and Test Accuracy for all six DNNs
print("Baseline Model Test Loss: {:.3f}".format(score[0]))
print("Baseline Model Test Accuracy: {:.3f}".format(score[1]))
print("Model 2 Test Loss: {:.3f}".format(score_2[0]))
print("Model 2 Test Accuracy: {:.3f}".format(score_2[1]))
print("Model 3 Test Loss: {:.3f}".format(score_3[0]))
print("Model 3 Test Accuracy: {:.3f}".format(score_3[1]))
print("Baseline Model with Batch Normalization and Dropout Test Loss: {:.3f}".format(score_reg[0]))
print("Baseline Model with Batch Normalization and Dropout Test Accuracy: {:.3f}".format(score_reg[1]))
print("Model 2 with Batch Normalization and Dropout Test Loss: {:.3f}".format(score_reg_2[0]))
print("Model 2 with Batch Normalization and Dropout Test Accuracy: {:.3f}".format(score_reg_2[1]))
print("Model 3 with Batch Normalization and Dropout Test Loss: {:.3f}".format(score_reg_3[0]))
print("Model 3 with Batch Normalization and Dropout Test Accuracy: {:.3f}".format(score_reg_3[1]))

Baseline Model Test Loss: 1.489
Baseline Model Test Accuracy: 0.466
Model 2 Test Loss: 1.389
Model 2 Test Accuracy: 0.507
Model 3 Test Loss: 1.433
Model 3 Test Accuracy: 0.516
Baseline Model with Batch Normalization and Dropout Test Loss: 1.526
Baseline Model with Batch Normalization and Dropout Test Accuracy: 0.472
Model 2 with Batch Normalization and Dropout Test Loss: 1.563
Model 2 with Batch Normalization and Dropout Test Accuracy: 0.459
Model 3 with Batch Normalization and Dropout Test Loss: 1.491
Model 3 with Batch Normalization and Dropout Test Accuracy: 0.474


**Takeaway**: Batch normalization and dropout helped with overfitting, but didn't improve our DNNs' generalization to unseen data. The basic problem? No spatial awareness from these models. Next, we'll switch to CNNs to see if we can better capture spatial structures.

## Build the baseline CNNs

We'll replicate the same process as with the DNNs -- starting with building the baseline architectures and creating three different versions by tuning the filter number and size.

**The CNNs are poised to perform better** because they preserve the spatial layout of the image. Their convolutional layers scan small regions to detect features like edges and shapes. These learned features are reused across the image, making CNNs far more effective at spotting patterns no matter where they appear.
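
To see why a reused filter spots a pattern wherever it appears, here is a toy sketch in plain Python. It slides one hand-written 3x3 edge filter over two tiny images that contain the same edge at different positions. The filter values and images are invented for illustration; a real `Conv2D` layer learns its filters and applies many of them across all channels. Like Keras, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# A 3x3 filter that responds to a dark-to-bright vertical edge
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

# Two 5x5 images with the same edge at different horizontal positions
edge_left = [[0, 0, 1, 1, 1]] * 5
edge_right = [[0, 0, 0, 1, 1]] * 5

print(conv2d_valid(edge_left, vertical_edge)[0])   # [3, 3, 0]
print(conv2d_valid(edge_right, vertical_edge)[0])  # [0, 3, 3]
```

The response pattern simply shifts along with the edge. A dense layer on flattened pixels has no such property: it would need separate weights to recognize the edge at each position.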

Now, to build our initial baseline CNN. First, to handle the image dimensions and input shape.

In [None]:
# Set the input image dimensions for CIFAR-10
img_rows, img_cols, img_channels = 32, 32, 3

# Set input shape for RGB images
input_shape = (img_rows, img_cols, img_channels)

In [None]:
# Start with the imports
from keras.layers import Conv2D, MaxPooling2D

# Initialize the Sequential CNN model
cnn = Sequential()

# Add the first layer, a convolutional layer with 32 filters, each filter is 3x3
cnn.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))

# After each convolutional layer, add a max pooling layer
# Note: Use a 2x2 pooling layer to downsample the feature map, keeping key patterns but reducing size
cnn.add(MaxPooling2D(pool_size=(2, 2)))

# Add another convolutional layer with the same filter and activation function as above
cnn.add(Conv2D(32, (3, 3), activation='relu'))

# Add another max pooling layer
cnn.add(MaxPooling2D(pool_size=(2, 2)))

# Add a flattening layer to flatten all the neurons before building a fully connected, or dense layer
cnn.add(Flatten())

# Add the dense, or fully connected layer with 64 neurons and used relu as the activation function
cnn.add(Dense(64, activation='relu'))

# Finally, add the output layer, which has 10 neurons, as set by num_classes earlier in the notebook
cnn.add(Dense(num_classes, activation='softmax'))

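
For reference, the 2x2 max pooling used in the cell above can be sketched in a few lines of plain Python. `max_pool_2x2` is a hypothetical helper that mirrors the default behavior of non-overlapping windows, where any leftover odd row or column is dropped.

```python
def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    rows = len(feature_map) // 2
    cols = len(feature_map[0]) // 2
    return [
        [
            max(
                feature_map[2 * r][2 * c],
                feature_map[2 * r][2 * c + 1],
                feature_map[2 * r + 1][2 * c],
                feature_map[2 * r + 1][2 * c + 1],
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
print(max_pool_2x2(feature_map))  # [[4, 2], [2, 8]]
```

Each pooling step halves the height and width, so the later layers see a smaller map that still records where the strongest responses were.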


Now, to compile the model.

In [None]:
# Compile the model:
# Use 'adam' as the optimizer, minimize 'categorical_crossentropy',
# and set the metric as accuracy
# Note: The batch size (mini-batches of 128 images) is set later in .fit(), not here
cnn.compile("adam", "categorical_crossentropy", metrics=['accuracy'])

And check the model summary before we train.

In [None]:
# Check the model summary
cnn.summary()

**Note**: As we'd expect, this model is much smaller than the DNNs we were building before because **CNNs leverage spatial structure** -- they extract patterns locally with fewer parameters, instead of connecting every pixel to every neuron like fully connected layers do.
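
The size difference can be checked by hand. The sketch below counts parameters for the conv, pool, conv, pool, dense, output layout used here, assuming stride-1 convolutions without padding and 2x2 pooling (the Keras defaults for the calls above). `small_cnn_param_count` is a hypothetical helper, not part of Keras.

```python
def conv_output_size(size, kernel):
    """Spatial size after a stride-1 convolution with no padding."""
    return size - kernel + 1


def small_cnn_param_count(filters, kernel, dense_units,
                          image_size=32, channels=3, num_classes=10):
    """Count parameters for conv -> pool -> conv -> pool -> dense -> output.

    `filters` is a pair: the filter counts of the two convolutional layers.
    """
    f1, f2 = filters
    conv1 = kernel * kernel * channels * f1 + f1      # weights plus one bias per filter
    size = conv_output_size(image_size, kernel) // 2  # 2x2 max pooling halves the size
    conv2 = kernel * kernel * f1 * f2 + f2
    size = conv_output_size(size, kernel) // 2
    flattened = size * size * f2
    dense = flattened * dense_units + dense_units
    output = dense_units * num_classes + num_classes
    return conv1 + conv2 + dense + output


print(small_cnn_param_count(filters=(32, 32), kernel=3, dense_units=64))  # 84586
```

That is roughly 85K parameters, against roughly 789K for the smallest DNN. Even so, most of the CNN's parameters still sit in the dense layer after `Flatten()`; the two convolutional layers together account for only about 10K.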

And now we'll train the model.

In [None]:
# Train the model
history_cnn = cnn.fit(X_train, y_train, batch_size=128, epochs=20, verbose=1, validation_split = 0.1)

Epoch 1/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 8s 13ms/step - accuracy: 0.2932 - loss: 1.9390 - val_accuracy: 0.4852 - val_loss: 1.4153
Epoch 2/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4998 - loss: 1.3989 - val_accuracy: 0.5398 - val_loss: 1.2958
Epoch 3/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.5494 - loss: 1.2776 - val_accuracy: 0.5604 - val_loss: 1.2210
Epoch 4/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.5786 - loss: 1.1914 - val_accuracy: 0.6010 - val_loss: 1.1283
Epoch 5/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6061 - loss: 1.1196 - val_accuracy: 0.6030 - val_loss: 1.1589
Epoch 6/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.6262 - loss: 1.0724 - val_accuracy: 0.6238 - val_loss: 1.0662
Epoch 7/20
352/352

This is a highly encouraging start. This baseline CNN has already raced past the performance of all the DNNs we built. Now to check the test set accuracy before building a few more versions.

In [None]:
# Check the accuracy on the test set
score_cnn = cnn.evaluate(X_test, y_test, verbose=0)

# Display the Test Loss and Test Accuracy
print("Test Loss: {:.3f}".format(score_cnn[0]))
print("Test Accuracy: {:.3f}".format(score_cnn[1]))

Test Loss: 0.971
Test Accuracy: 0.674


This test set accuracy is exactly in line with what we'd expect based on the validation set performance. Now to move on to our second architecture.

**Up next**: We'll build another CNN with a larger filter size (5×5) but fewer filters (8), keeping the rest of the architecture the same as the baseline. This way, each filter sees a larger chunk of the image at once, capturing more context but in fewer distinct patterns.

In [None]:
# Initialize the new Sequential CNN model
cnn_5 = Sequential()

# Add the convolutional layer with a filter size of 5x5 and just 8 filters, as specified above
# Note: The rest of the layers are the same as the first model
cnn_5.add(Conv2D(8, kernel_size=(5, 5),
                 activation='relu',
                 input_shape=input_shape))

# Add the max pooling layer
cnn_5.add(MaxPooling2D(pool_size=(2, 2)))

# Add another convolutional layer
cnn_5.add(Conv2D(8, (5, 5), activation='relu'))

# Add another max pooling layer
cnn_5.add(MaxPooling2D(pool_size=(2, 2)))

# Add the flattening layer
cnn_5.add(Flatten())

# Add the fully connected, or dense layer
cnn_5.add(Dense(64, activation='relu'))

# And, finally, add the output layer, which has 10 neurons, as set by num_classes earlier in the notebook
cnn_5.add(Dense(num_classes, activation='softmax'))

In [None]:
# Compile the model
cnn_5.compile("adam", "categorical_crossentropy", metrics=['accuracy'])

Let's quickly check the model summary:

In [None]:
# Check the summary
cnn_5.summary()

Next up: train the new model, `cnn_5`.

In [None]:
# Train the model
history_cnn_5 = cnn_5.fit(X_train, y_train, batch_size=128, epochs=20, verbose=1, validation_split = 0.1)

Epoch 1/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 6s 11ms/step - accuracy: 0.2724 - loss: 1.9994 - val_accuracy: 0.4226 - val_loss: 1.5770
Epoch 2/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.4391 - loss: 1.5526 - val_accuracy: 0.4672 - val_loss: 1.4751
Epoch 3/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.4777 - loss: 1.4549 - val_accuracy: 0.4838 - val_loss: 1.4228
Epoch 4/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.4986 - loss: 1.4055 - val_accuracy: 0.5022 - val_loss: 1.3896
Epoch 5/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.5134 - loss: 1.3678 - val_accuracy: 0.5118 - val_loss: 1.3802
Epoch 6/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.5267 - loss: 1.3307 - val_accuracy: 0.5186 - val_loss: 1.3492
Epoch 7/20
352/352

In [None]:
# Check the accuracy on the test set
score_cnn_5 = cnn_5.evaluate(X_test, y_test, verbose=0)

# Display the Test Loss and Test Accuracy
print("Test Loss: {:.3f}".format(score_cnn_5[0]))
print("Test Accuracy: {:.3f}".format(score_cnn_5[1]))

Test Loss: 1.185
Test Accuracy: 0.585


Let's take a look at our (running) test set scorecard:

In [None]:
# Display the Test Loss and Test Accuracy for the two CNNs so far
print("Baseline CNN Test Loss: {:.3f}".format(score_cnn[0]))
print("Baseline CNN Test Accuracy: {:.3f}".format(score_cnn[1]))
print("CNN 5 Test Loss: {:.3f}".format(score_cnn_5[0]))
print("CNN 5 Test Accuracy: {:.3f}".format(score_cnn_5[1]))

Baseline CNN Test Loss: 0.971
Baseline CNN Test Accuracy: 0.674
CNN 5 Test Loss: 1.185
CNN 5 Test Accuracy: 0.585


For our third architecture, we'll increase the number of filters in the second convolutional layer (from 32 to 64) to allow the model to learn a wider variety of features. We'll also double the size of the dense layer (from 64 to 128 units) to increase how many different combinations of features the model can weigh before it picks a label. The kernel size will remain at 3×3, balancing fine detail capture with computational efficiency. The goal is to strike a balance between simplicity and accuracy.

In [None]:
# Initialize the new Sequential CNN model
cnn_custom = Sequential()

# Add the first convolutional layer with 32 filters and a 3x3 kernel size
cnn_custom.add(Conv2D(32, kernel_size=(3, 3),
                      activation='relu',
                      input_shape=input_shape))

# Add the first max pooling layer
cnn_custom.add(MaxPooling2D(pool_size=(2, 2)))

# Add the second convolutional layer with 64 filters
cnn_custom.add(Conv2D(64, (3, 3), activation='relu'))

# Add the second max pooling layer
cnn_custom.add(MaxPooling2D(pool_size=(2, 2)))

# Add the flattening layer
cnn_custom.add(Flatten())

# Add the fully connected layer with 128 neurons
cnn_custom.add(Dense(128, activation='relu'))

# Add the final output layer with 10 neurons
cnn_custom.add(Dense(num_classes, activation='softmax'))

# Now to compile the model
cnn_custom.compile("adam", "categorical_crossentropy", metrics=['accuracy'])

# Check the model summary
cnn_custom.summary()

And now to train our model.

In [None]:
# Train the model
history_cnn_custom = cnn_custom.fit(X_train, y_train, batch_size=128, epochs=20, verbose=1, validation_split = 0.1)

Epoch 1/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 7s 13ms/step - accuracy: 0.3253 - loss: 1.8404 - val_accuracy: 0.5330 - val_loss: 1.3290
Epoch 2/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.5412 - loss: 1.2926 - val_accuracy: 0.5670 - val_loss: 1.2076
Epoch 3/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6005 - loss: 1.1396 - val_accuracy: 0.6204 - val_loss: 1.0978
Epoch 4/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.6385 - loss: 1.0417 - val_accuracy: 0.6436 - val_loss: 1.0211
Epoch 5/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6704 - loss: 0.9582 - val_accuracy: 0.6756 - val_loss: 0.9441
Epoch 6/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6883 - loss: 0.9070 - val_accuracy: 0.6768 - val_loss: 0.9506
Epoch 7/20
352/352

This model is far and away the best in terms of training accuracy, but the validation accuracy results are pointing toward potential overfitting. Let's find out via the test set.
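
One quick way to put a number on that suspicion is to compare training and validation accuracy epoch by epoch. The sketch below uses a plain dictionary standing in for `history_cnn_custom.history`, filled with the first six epochs from the log above; the `accuracy_gap` helper is hypothetical.

```python
# Stand-in for history_cnn_custom.history, using epochs 1-6 from the log above.
history = {
    "accuracy":     [0.3253, 0.5412, 0.6005, 0.6385, 0.6704, 0.6883],
    "val_accuracy": [0.5330, 0.5670, 0.6204, 0.6436, 0.6756, 0.6768],
}

def accuracy_gap(hist):
    """Return training minus validation accuracy for each epoch."""
    return [round(t - v, 4) for t, v in zip(hist["accuracy"], hist["val_accuracy"])]

gaps = accuracy_gap(history)
for epoch, gap in enumerate(gaps, start=1):
    print(f"Epoch {epoch}: train - val = {gap:+.4f}")
```

In these six epochs the gap only turns positive at epoch 6. A gap that keeps widening over the remaining epochs would be the signature of overfitting.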

In [None]:
# Evaluate the model on the test set
score_cnn_custom = cnn_custom.evaluate(X_test, y_test, verbose=0)

# Display the Test Loss and Test Accuracy
print("Test Loss: {:.3f}".format(score_cnn_custom[0]))
print("Test Accuracy: {:.3f}".format(score_cnn_custom[1]))

Test Loss: 1.053
Test Accuracy: 0.695


**That's our best test set accuracy score yet**. Let's contextualize it first with the scorecard before seeing what we can do with batch normalization and dropout.

In [None]:
# Display the Test Loss and Test Accuracy for the three models
print("Baseline CNN Test Loss: {:.3f}".format(score_cnn[0]))
print("Baseline CNN Test Accuracy: {:.3f}".format(score_cnn[1]))
print("CNN 5 Test Loss: {:.3f}".format(score_cnn_5[0]))
print("CNN 5 Test Accuracy: {:.3f}".format(score_cnn_5[1]))
print("CNN Custom Test Loss: {:.3f}".format(score_cnn_custom[0]))
print("CNN Custom Test Accuracy: {:.3f}".format(score_cnn_custom[1]))

Baseline CNN Test Loss: 0.971
Baseline CNN Test Accuracy: 0.674
CNN 5 Test Loss: 1.185
CNN 5 Test Accuracy: 0.585
CNN Custom Test Loss: 1.053
CNN Custom Test Accuracy: 0.695


## Add batch normalization and dropout to the baseline CNNs


We'll start with the original baseline CNN -- but with batch normalization and dropout.

In [None]:
# Start with the imports
from keras.layers import Dropout, BatchNormalization

# Initialize the Sequential CNN model
cnn_reg = Sequential()

# Add the first convolutional layer
cnn_reg.add(Conv2D(32, kernel_size=(3, 3),
               activation='relu',
               input_shape=input_shape))

# Add BatchNormalization and Dropout
cnn_reg.add(BatchNormalization())
cnn_reg.add(Dropout(0.25))
cnn_reg.add(MaxPooling2D(pool_size=(2, 2)))

# Add another convolutional layer
cnn_reg.add(Conv2D(32, (3, 3), activation='relu'))
cnn_reg.add(BatchNormalization())
cnn_reg.add(Dropout(0.25))
cnn_reg.add(MaxPooling2D(pool_size=(2, 2)))

# Flatten the output
cnn_reg.add(Flatten())

# Dense, or fully connected layer
cnn_reg.add(Dense(64, activation='relu'))
cnn_reg.add(BatchNormalization())

# Increase Dropout for Dense layer
cnn_reg.add(Dropout(0.5))

# Final output layer
cnn_reg.add(Dense(num_classes, activation='softmax'))

# Now compile the model
cnn_reg.compile("adam", "categorical_crossentropy", metrics=['accuracy'])

# Check the model summary
cnn_reg.summary()
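
It can help to know what these extra layers add to the model. As a rough sketch: Keras's `BatchNormalization` keeps four values per channel (a learned scale and shift, plus a moving mean and variance that aren't trained), while `Dropout` has no parameters at all. The snippet below tallies that for `cnn_reg`; the helper name is hypothetical, and the result should line up with the non-trainable count reported by `cnn_reg.summary()`.

```python
# Parameters added by the three BatchNormalization layers in cnn_reg.
# Each layer normalizes over its input's last axis (channels or units).

def batch_norm_params(channels):
    """Return (trainable, non_trainable) parameter counts for one BN layer."""
    trainable = 2 * channels      # gamma (scale) and beta (shift)
    non_trainable = 2 * channels  # moving mean and moving variance
    return trainable, non_trainable

bn_inputs = [32, 32, 64]  # after Conv2D(32), Conv2D(32), and Dense(64)

trainable = sum(batch_norm_params(c)[0] for c in bn_inputs)
non_trainable = sum(batch_norm_params(c)[1] for c in bn_inputs)

print("BN trainable params:", trainable)
print("BN non-trainable params:", non_trainable)
print("Dropout params:", 0)
```

So the regularized model is only a few hundred parameters larger than its baseline; the difference in behavior comes from how training proceeds, not from added capacity.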

Let's train the model.

In [None]:
# Train the model
history_cnn_reg = cnn_reg.fit(X_train, y_train, batch_size=128, epochs=20, verbose=1, validation_split = 0.1)

Epoch 1/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 13s 21ms/step - accuracy: 0.3038 - loss: 2.2107 - val_accuracy: 0.1758 - val_loss: 3.1620
Epoch 2/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 11s 6ms/step - accuracy: 0.5073 - loss: 1.3951 - val_accuracy: 0.5010 - val_loss: 1.3732
Epoch 3/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.5708 - loss: 1.2181 - val_accuracy: 0.5954 - val_loss: 1.1346
Epoch 4/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.5981 - loss: 1.1408 - val_accuracy: 0.5802 - val_loss: 1.1781
Epoch 5/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.6237 - loss: 1.0671 - val_accuracy: 0.6270 - val_loss: 1.0534
Epoch 6/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.6419 - loss: 1.0263 - val_accuracy: 0.6496 - val_loss: 1.0574
Epoch 7/20
...

This regularized version of the baseline appears to be outperforming the original, which didn't use batch normalization or dropout. Let's check its test accuracy.

In [None]:
# Evaluate the model on the test set
score_cnn_reg = cnn_reg.evaluate(X_test, y_test, verbose=0)

# Display the Test Loss and Test Accuracy
print("Test Loss: {:.3f}".format(score_cnn_reg[0]))
print("Test Accuracy: {:.3f}".format(score_cnn_reg[1]))

Test Loss: 0.979
Test Accuracy: 0.661


Next, let's build the second batch normalization/dropout model.

In [None]:
# Initialize the new Sequential CNN model with regularization
cnn_5_reg = Sequential()

# Add the first convolutional layer -- 5x5 kernel, 8 filters
cnn_5_reg.add(Conv2D(8, kernel_size=(5, 5), activation='relu', input_shape=input_shape))

# Add BatchNormalization and Dropout
cnn_5_reg.add(BatchNormalization())
cnn_5_reg.add(Dropout(0.25))
cnn_5_reg.add(MaxPooling2D(pool_size=(2, 2)))

# Add another convolutional layer
cnn_5_reg.add(Conv2D(8, (5, 5), activation='relu'))
cnn_5_reg.add(BatchNormalization())
cnn_5_reg.add(Dropout(0.25))
cnn_5_reg.add(MaxPooling2D(pool_size=(2, 2)))

# Flatten the output
cnn_5_reg.add(Flatten())

# Add a dense layer
cnn_5_reg.add(Dense(64, activation='relu'))

# Add BatchNormalization and Dropout
cnn_5_reg.add(BatchNormalization())
cnn_5_reg.add(Dropout(0.5))

# Final output layer
cnn_5_reg.add(Dense(num_classes, activation='softmax'))

# Now compile the model
cnn_5_reg.compile("adam", "categorical_crossentropy", metrics=['accuracy'])

# Check the model summary
cnn_5_reg.summary()

Now we're ready to train the model.

In [None]:
# Train the model
history_cnn_5_reg = cnn_5_reg.fit(X_train, y_train, batch_size=128, epochs=20, verbose=1, validation_split = 0.1)

Epoch 1/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 13s 21ms/step - accuracy: 0.2099 - loss: 2.5823 - val_accuracy: 0.1536 - val_loss: 2.6014
Epoch 2/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 11s 5ms/step - accuracy: 0.3683 - loss: 1.7341 - val_accuracy: 0.3462 - val_loss: 1.7721
Epoch 3/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.4274 - loss: 1.5787 - val_accuracy: 0.4416 - val_loss: 1.6024
Epoch 4/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.4621 - loss: 1.4993 - val_accuracy: 0.3454 - val_loss: 1.9334
Epoch 5/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.4742 - loss: 1.4549 - val_accuracy: 0.4182 - val_loss: 1.6166
Epoch 6/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.4946 - loss: 1.4088 - val_accuracy: 0.4060 - val_loss: 1.6762
Epoch 7/20
...

This smaller model is lagging significantly in performance compared to the model we just trained. Let's take a look at the test set accuracy.

In [None]:
# Evaluate the model on the test set
score_cnn_5_reg = cnn_5_reg.evaluate(X_test, y_test, verbose=0)

# Display the Test Loss and Test Accuracy
print("Test Loss: {:.3f}".format(score_cnn_5_reg[0]))
print("Test Accuracy: {:.3f}".format(score_cnn_5_reg[1]))

Test Loss: 1.365
Test Accuracy: 0.504


The test accuracy, as we'd expect based on the training and validation numbers, is quite low. Let's see what sort of performance we get from the final boosted model. It's worth noting that the non-batch normalization/dropout version of this model (`cnn_custom`) was the top performer of the opening round.

In [None]:
# Initialize the new Sequential CNN model -- with batch normalization and dropout
cnn_custom_reg = Sequential()

# Build the first convolutional layer: 32 filters, 3x3 kernel
cnn_custom_reg.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
cnn_custom_reg.add(BatchNormalization())
cnn_custom_reg.add(Dropout(0.25))
cnn_custom_reg.add(MaxPooling2D(pool_size=(2, 2)))

# Add the second convolutional layer: 64 filters, 3x3 kernel
cnn_custom_reg.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
cnn_custom_reg.add(BatchNormalization())
cnn_custom_reg.add(Dropout(0.25))
cnn_custom_reg.add(MaxPooling2D(pool_size=(2, 2)))

# Now, flatten the feature maps
cnn_custom_reg.add(Flatten())

# Add the dense, or fully connected layer
cnn_custom_reg.add(Dense(128, activation='relu'))
cnn_custom_reg.add(BatchNormalization())
cnn_custom_reg.add(Dropout(0.5))

# Add the output layer
cnn_custom_reg.add(Dense(num_classes, activation='softmax'))

# Compile the model
cnn_custom_reg.compile("adam", "categorical_crossentropy", metrics=['accuracy'])

# Check the model summary
cnn_custom_reg.summary()

Now to train the final model.

In [None]:
# Train the model
history_cnn_custom_reg = cnn_custom_reg.fit(X_train, y_train, batch_size=128, epochs=20, verbose=1, validation_split = 0.1)

Epoch 1/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 14s 21ms/step - accuracy: 0.3598 - loss: 2.0446 - val_accuracy: 0.2090 - val_loss: 3.5296
Epoch 2/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 11s 7ms/step - accuracy: 0.5802 - loss: 1.1985 - val_accuracy: 0.2500 - val_loss: 2.6303
Epoch 3/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.6354 - loss: 1.0387 - val_accuracy: 0.5802 - val_loss: 1.2718
Epoch 4/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.6711 - loss: 0.9484 - val_accuracy: 0.6734 - val_loss: 0.9302
Epoch 5/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.6908 - loss: 0.8889 - val_accuracy: 0.6646 - val_loss: 0.9891
Epoch 6/20
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7057 - loss: 0.8501 - val_accuracy: 0.6552 - val_loss: 0.9850
Epoch 7/20
...

In [None]:
# Evaluate the model on the test set
score_cnn_custom_reg = cnn_custom_reg.evaluate(X_test, y_test, verbose=0)

# Display the Test Loss and Test Accuracy
print("Test Loss: {:.3f}".format(score_cnn_custom_reg[0]))
print("Test Accuracy: {:.3f}".format(score_cnn_custom_reg[1]))

Test Loss: 0.841
Test Accuracy: 0.718


These were not quite the results we were expecting. The regularized model is only outperforming its baseline counterpart by roughly 2 percentage points on test set accuracy. Before exploring what happened, let's review the full scorecard:

In [None]:
# Display the Test Loss and Test Accuracy for all six models
print("Baseline CNN Test Loss: {:.3f}".format(score_cnn[0]))
print("Baseline CNN Test Accuracy: {:.3f}".format(score_cnn[1]))
print("CNN 5 Test Loss: {:.3f}".format(score_cnn_5[0]))
print("CNN 5 Test Accuracy: {:.3f}".format(score_cnn_5[1]))
print("CNN Custom Test Loss: {:.3f}".format(score_cnn_custom[0]))
print("CNN Custom Test Accuracy: {:.3f}".format(score_cnn_custom[1]))
print("Baseline CNN Reg Test Loss: {:.3f}".format(score_cnn_reg[0]))
print("Baseline CNN Reg Test Accuracy: {:.3f}".format(score_cnn_reg[1]))
print("CNN 5 Reg Test Loss: {:.3f}".format(score_cnn_5_reg[0]))
print("CNN 5 Reg Test Accuracy: {:.3f}".format(score_cnn_5_reg[1]))
print("CNN Custom Reg Test Loss: {:.3f}".format(score_cnn_custom_reg[0]))
print("CNN Custom Reg Test Accuracy: {:.3f}".format(score_cnn_custom_reg[1]))

Baseline CNN Test Loss: 0.971
Baseline CNN Test Accuracy: 0.674
CNN 5 Test Loss: 1.185
CNN 5 Test Accuracy: 0.585
CNN Custom Test Loss: 1.053
CNN Custom Test Accuracy: 0.695
Baseline CNN Reg Test Loss: 0.979
Baseline CNN Reg Test Accuracy: 0.661
CNN 5 Reg Test Loss: 1.365
CNN 5 Reg Test Accuracy: 0.504
CNN Custom Reg Test Loss: 0.841
CNN Custom Reg Test Accuracy: 0.718


**Takeaway**: Despite expectations that additional regularization would significantly improve generalization, adding batch normalization and dropout to the CNN architectures didn't help these models consistently outperform their simpler counterparts. In fact, the first two regularized models were *worse* than the first two baseline CNNs, while the final version improved its test accuracy slightly, by roughly 2 percentage points over its baseline.

These results suggest that, in this case, the extra regularization may have limited the models’ ability to fully learn from the data, effectively causing underfitting -- especially given the relatively small model sizes and limited training time.
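
The comparison behind this takeaway is simple arithmetic on the scorecard. The sketch below hard-codes the test accuracies printed above (in the notebook you'd read them from `score_cnn[1]`, `score_cnn_reg[1]`, and so on) and reports each regularized model's change in percentage points. The dictionary layout is just for illustration.

```python
# Test accuracies copied from the scorecard above: (baseline, regularized).
scores = {
    "Baseline CNN": (0.674, 0.661),
    "CNN 5":        (0.585, 0.504),
    "CNN Custom":   (0.695, 0.718),
}

# Change in percentage points from adding batch normalization and dropout
deltas = {name: round((reg - base) * 100, 1) for name, (base, reg) in scores.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f} percentage points")
```

That works out to drops of 1.3 and 8.1 points for the first two models and a gain of 2.3 points for the custom one.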

**Next step**: For the next battle, we'll push the number of epochs much higher -- perhaps to 100 -- to give the regularized models more time to learn and see if they can close the gap. Let's see how it goes.
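
One detail worth keeping in mind: calling `fit` again on an already-trained Keras model continues from its current weights rather than starting over, so the run below adds 100 epochs on top of the 20 already completed. Here's a back-of-the-envelope sketch of what that means in training steps, assuming the 50,000-image training set mentioned earlier and the same `batch_size` and `validation_split`:

```python
import math

train_images = 50_000      # CIFAR-10 training set size
validation_split = 0.1     # fraction held out by fit() for validation
batch_size = 128

# Images actually used for weight updates each epoch
val_images = round(train_images * validation_split)
fit_images = train_images - val_images

# Batches (steps) per epoch; the last batch is smaller than 128
steps_per_epoch = math.ceil(fit_images / batch_size)

# 20 epochs from the first run plus 100 more from the second
total_epochs = 20 + 100
total_steps = steps_per_epoch * total_epochs

print(steps_per_epoch, total_epochs, total_steps)
```

The 352 steps per epoch matches the `352/352` counter in the training logs.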

In [None]:
# Keep training the same model we just ran, this time for 100 more epochs (fit() resumes from the current weights)
history_cnn_custom_reg_100 = cnn_custom_reg.fit(X_train, y_train, batch_size=128, epochs=100, verbose=1, validation_split = 0.1)

Epoch 1/100
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.8095 - loss: 0.5331 - val_accuracy: 0.7088 - val_loss: 0.8543
Epoch 2/100
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8125 - loss: 0.5258 - val_accuracy: 0.6964 - val_loss: 0.9279
Epoch 3/100
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8142 - loss: 0.5235 - val_accuracy: 0.7148 - val_loss: 0.8737
Epoch 4/100
352/352 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.8231 - loss: 0.4978 - val_accuracy: 0.6808 - val_loss: 0.9957
Epoch 5/100
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.8177 - loss: 0.5039 - val_accuracy: 0.7300 - val_loss: 0.8280
Epoch 6/100
352/352 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8302 - loss: 0.4800 - val_accuracy: 0.7452 - val_loss: 0.7810
Epoch 7/100
...

Let's check the accuracy on the test set.

In [None]:
# Evaluate the model on the test set
score_cnn_custom_reg = cnn_custom_reg.evaluate(X_test, y_test, verbose=0)

# Display the Test Loss and Test Accuracy
print("Test Loss: {:.3f}".format(score_cnn_custom_reg[0]))
print("Test Accuracy: {:.3f}".format(score_cnn_custom_reg[1]))

Test Loss: 0.837
Test Accuracy: 0.753


**Final takeaway**: There it is: our highest test set accuracy yet, climbing from 71.8% to 75.3%. While not a dramatic leap, this gain suggests that what initially looked like a broken model architecture was really just underfitting from too short a training schedule.

**Lesson learned**: Model development is an iterative process. Early results can be misleading, especially when models aren't provided enough time to learn. With more epochs to converge, regularization stabilized learning and ultimately delivered better generalization.

As always, keep building.