# CAP 2751 - Tools for Data Science
***Include the following info by clicking here:***
* Name: Jasmin Spanioli
* Canvas Course Section: CAP2751-1
* Date: 8/3/2025
* Link to your Notebook: https://colab.research.google.com/drive/1ava8yVCXtpKxQy5ez1L31qYU1RCY4yxP?usp=sharing


# Assignment 6: Deep Learning
## STARTER CODE

### Goals

- To learn how to perform classification using neural networks.
- To appreciate the differences in neural network architectures for the same task -- image classification -- and dataset (MNIST, CIFAR-10).
- To learn how to implement and evaluate deep learning models in Python, using Keras and TensorFlow.

### Instructions

- This assignment is structured in 2 parts, each using their own dataset(s).
- As usual, there will be some Python code to be written and questions to be answered.
- At the end, you should export your notebook to PDF format; it will "automagically" become your report.
- Submit the report (PDF), notebook (.ipynb file), and the link to the "live" version of your solution on Google Colaboratory via Canvas.
- **The number of points is indicated next to each part. They add up to 100.**
- **There are additional (10 points worth of) bonus items**, which are, of course, optional.

### Important

- It is OK to attempt the bonus points, but please **do not overdo it!**


## Setup

### Imports


In [None]:
import keras
import tensorflow as tf
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from tensorflow.keras.optimizers import SGD

from tensorflow.keras import layers
from tensorflow.keras.utils import to_categorical

from matplotlib import pyplot as plt

import numpy as np

## PART 1: Digit classification using MNIST

The MNIST handwritten digit dataset consists of a training set of 60,000 examples, and a test set of 10,000 examples. Each image in the dataset has 28$\times$28 pixels.

http://yann.lecun.com/exdb/mnist/


### 1a. Shallow neural network architecture

Based on https://github.com/the-deep-learners/deep-learning-illustrated/blob/master/notebooks/shallow_net_in_keras.ipynb

#### Load and prepare the data

The Python code below loads the images from the MNIST dataset, flattens them, normalizes them (i.e., maps the intensity values from [0..255] to [0..1]), and displays a few images from the training set.

In [None]:
# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)

# the data, split between train and validation sets
(X_train, y_train), (X_valid, y_valid) = mnist.load_data()

In [None]:
X_train.shape

In [None]:
y_train.shape

In [None]:
y_train[0:12]

In [None]:
plt.figure(figsize=(5,5))
for k in range(12):
    plt.subplot(3, 4, k+1)
    plt.imshow(X_train[k], cmap='Greys')
    plt.axis('off')
plt.tight_layout()
plt.show()

In [None]:
X_valid.shape

In [None]:
y_valid.shape

In [None]:
y_valid[0]

In [None]:
plt.imshow(X_valid[0], cmap='Greys')
plt.axis('off')
plt.show()

In [None]:
# Reshape (flatten) images
X_train_reshaped = X_train.reshape(60000, 784).astype('float32')
X_valid_reshaped = X_valid.reshape(10000, 784).astype('float32')

# Scale images to the [0, 1] range
X_train_scaled_reshaped = X_train_reshaped / 255
X_valid_scaled_reshaped = X_valid_reshaped / 255

# Renaming for conciseness
X_training = X_train_scaled_reshaped
X_validation = X_valid_scaled_reshaped

print("X_training shape (after reshaping + scaling):", X_training.shape)
print(X_training.shape[0], "train samples")
print("X_validation shape (after reshaping + scaling):", X_validation.shape)
print(X_validation.shape[0], "validation samples")

In [None]:
# convert class vectors to binary class matrices
y_training = tf.keras.utils.to_categorical(y_train, num_classes)
y_validation = tf.keras.utils.to_categorical(y_valid, num_classes)

In [None]:
print(y_valid[0])
print(y_validation[0])
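The one-hot encoding that `to_categorical` produces can be illustrated without Keras. The following is a minimal pure-Python sketch, not the Keras implementation; `one_hot` is a hypothetical helper introduced only for illustration.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside [0, {num_classes})")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

# Hypothetical label; MNIST labels are integers from 0 to 9.
encoded = one_hot(7)
print(encoded)

# The index of the largest entry recovers the original integer label.
print(encoded.index(max(encoded)))
```

The second print is the same idea as `np.argmax`, which is used later in the notebook to turn one-hot vectors back into class indices.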

#### Build your first neural network model

In [None]:
model = Sequential()
model.add(Dense(64, activation='sigmoid', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))

In [None]:
model.summary()

In [None]:
(64*784)

In [None]:
(64*784)+64

In [None]:
(10*64)+10
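The three arithmetic cells above reproduce the parameter counts that `model.summary()` reports. As a sketch of the general rule (each unit has one weight per input plus one bias), assuming a hypothetical helper named `dense_params`:

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

hidden = dense_params(784, 64)  # 64 sigmoid units, each fed by 784 pixels
output = dense_params(64, 10)   # 10 softmax units, each fed by 64 hidden units
print(hidden, output, hidden + output)  # 50240 650 50890
```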

#### Configure model

In [None]:
model.compile(
    loss='mean_squared_error',
    optimizer=SGD(learning_rate=0.01),
    metrics=['accuracy']
)

#### Train!

In [None]:
batch_size=128
epochs=200

history = model.fit(
  X_training, # training data
  y_training, # training targets
  epochs=epochs,
  batch_size=batch_size,
  verbose=1,
  validation_data=(X_validation, y_validation)
)

#### Plot learning curves

In [None]:
# list all data in history
print(history.history.keys())

# summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

#### Evaluate the model

In [None]:
model.evaluate(X_validation, y_validation)

### 1.1 Your turn! (30 points)

1. Write code to train the same network with different options for `loss`, `optimizer`, and learning rate, for example:
```
model.compile(
  optimizer='adam',
  loss='categorical_crossentropy',
  metrics=['accuracy']
)
```

2. Write code to train the same network with different options for `batch_size` and number of epochs.

3. Record the several combinations (at least 3, besides the baseline) you have tried and the resulting accuracy in a summary table like this:

| Method | optimizer | learning rate | loss | batch size | epochs | Validation accuracy | Remarks |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline  | SGD | 0.01 | `mean_squared_error` | 128 | 200 | 0.8660 | Comparable to Naive Bayes |
| Variation #1 | ADAM | (default) | `categorical_crossentropy` | 128 | 200 | ... | Improved accuracy thanks to X |
| Variation #2 | ... | ... | `categorical_crossentropy` | 128 | 500 | ... | Improved accuracy thanks to Y |
| Variation #3 | SGD | 0.01 | `categorical_crossentropy` | 128 | 500 | ... | Improved accuracy thanks to Z |

4. Write a short paragraph summarizing what you have learned from this exercise.

Define a reusable model-builder function

In [None]:
# This function builds and compiles a simple 2-layer neural network
# It accepts the optimizer and loss function as arguments
def build_model(optimizer, loss_func):
    model = Sequential()
    model.add(Dense(64, activation='sigmoid', input_shape=(784,)))
    model.add(Dense(10, activation='softmax'))
    model.compile(optimizer=optimizer, loss=loss_func, metrics=['accuracy'])
    return model

Train baseline model

In [None]:
# Baseline model with SGD optimizer and mean_squared_error loss
baseline_model = build_model(SGD(learning_rate=0.01), 'mean_squared_error')

# Train the model for 200 epochs using batch size of 128
history_baseline = baseline_model.fit(
    X_training, y_training,
    epochs=200,
    batch_size=128,
    validation_data=(X_validation, y_validation),
    verbose=1
)

Train Variation #1

In [None]:
# Variation #1: Use Adam optimizer and categorical_crossentropy loss
model_var1 = build_model('adam', 'categorical_crossentropy')

# Train with the same batch size and epochs as baseline for fair comparison
history_var1 = model_var1.fit(
    X_training, y_training,
    epochs=200,
    batch_size=128,
    validation_data=(X_validation, y_validation),
    verbose=1
)


Train Variation #2

In [None]:
# Variation #2: Same as Variation #1 but train longer
model_var2 = build_model('adam', 'categorical_crossentropy')


history_var2 = model_var2.fit(
    X_training, y_training,
    epochs=500, # longer training
    batch_size=128,
    validation_data=(X_validation, y_validation),
    verbose=1
)


Train Variation #3

In [None]:
# Variation #3: Use SGD but switch to a better loss function
model_var3 = build_model(SGD(learning_rate=0.01), 'categorical_crossentropy')

# Keep same extended training time as Variation #2
history_var3 = model_var3.fit(
    X_training, y_training,
    epochs=500,
    batch_size=128,
    validation_data=(X_validation, y_validation),
    verbose=1
)


Evaluate

In [None]:
# Evaluate the validation accuracy and loss of all models
results = {
    "Baseline": baseline_model.evaluate(X_validation, y_validation, verbose=0),
    "Variation 1": model_var1.evaluate(X_validation, y_validation, verbose=0),
    "Variation 2": model_var2.evaluate(X_validation, y_validation, verbose=0),
    "Variation 3": model_var3.evaluate(X_validation, y_validation, verbose=0)
}

# Print summary table
for name, (loss, acc) in results.items():
    print(f"{name}: Validation Accuracy = {acc:.4f}, Validation Loss = {loss:.4f}")
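The assignment asks for the results in a Markdown summary table. One possible way to generate the rows is sketched below; `table_row` is a hypothetical helper, and the only numbers used are the baseline values already given in the assignment's example table. In the notebook, the accuracy argument would come from `results[name][1]`.

```python
def table_row(method, optimizer, lr, loss_name, batch_size, epochs, val_acc, remarks=""):
    """Format one experiment as a row of the Markdown summary table."""
    cells = [
        method,
        optimizer,
        str(lr),
        f"`{loss_name}`",
        str(batch_size),
        str(epochs),
        f"{val_acc:.4f}",
        remarks,
    ]
    return "| " + " | ".join(cells) + " |"

# Baseline row, using the values from the assignment's example table.
print(table_row("Baseline", "SGD", 0.01, "mean_squared_error",
                128, 200, 0.8660, "Comparable to Naive Bayes"))
```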


Summary
  - In this exercise, I experimented with several configurations for training a shallow neural network on the MNIST dataset. Switching the loss from mean squared error to categorical crossentropy significantly improved the model's ability to classify digits, and the Adam optimizer converged faster and more stably than SGD. Increasing the number of epochs improved performance further, at the cost of longer training time. These experiments highlight the importance of choosing the right loss function and optimizer when training a neural network.
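
One commonly cited reason for the difference between the two losses is that mean squared error stays bounded on a confidently wrong prediction, while categorical crossentropy grows without limit as the probability of the true class approaches zero. The sketch below compares the two on hypothetical softmax outputs (not values from this notebook); `mse` and `cross_entropy` are illustrative helpers, not Keras functions.

```python
import math

def mse(probs, target_index):
    """Mean squared error between predicted probabilities and a one-hot target."""
    return sum(
        (p - (1.0 if i == target_index else 0.0)) ** 2
        for i, p in enumerate(probs)
    ) / len(probs)

def cross_entropy(probs, target_index):
    """Categorical crossentropy: negative log of the true class's probability."""
    return -math.log(probs[target_index])

# Hypothetical softmax outputs for a 3-class problem; the true class is 0.
confident_wrong = [0.05, 0.85, 0.10]
mostly_right = [0.85, 0.05, 0.10]

for name, probs in [("confident wrong", confident_wrong),
                    ("mostly right", mostly_right)]:
    print(f"{name}: MSE = {mse(probs, 0):.3f}, CE = {cross_entropy(probs, 0):.3f}")
```

The crossentropy penalty on the confidently wrong example is several times larger than the MSE penalty, which gives the optimizer a stronger signal to correct it.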

### 1b. Convolutional neural network architecture

Based on https://keras.io/examples/vision/mnist_convnet/

In [None]:
model_cnn = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model_cnn.summary()
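The output shapes and parameter counts reported by `model_cnn.summary()` can be derived by hand. The sketch below assumes the Keras defaults used above ("valid" padding and stride 1 for `Conv2D`, stride equal to the pool size for `MaxPooling2D`); `conv_out` and `conv_params` are hypothetical helpers.

```python
def conv_out(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution or pooling window along one axis."""
    if padding == "same":
        return -(-size // stride)         # ceil(size / stride)
    return (size - kernel) // stride + 1  # "valid": no padding

def conv_params(kernel, in_channels, filters):
    """kernel*kernel*in_channels weights per filter, plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

size, channels, total = 28, 1, 0
for filters in (32, 64):
    total += conv_params(3, channels, filters)
    size = conv_out(size, kernel=3)            # Conv2D with a 3x3 kernel
    size = conv_out(size, kernel=2, stride=2)  # MaxPooling2D with a 2x2 window
    channels = filters
    print(f"after block with {filters} filters: {size}x{size}x{channels}")

flat = size * size * channels
total += flat * 10 + 10                        # final Dense(10) layer
print("flattened features:", flat)
print("trainable parameters:", total)
```

Dropout and pooling layers add no trainable parameters, so only the two convolutions and the final dense layer contribute to the total.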

#### Configure model

In [None]:
model_cnn.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy"]
)

#### Prepare the data
The CNN does not expect the images to be flattened.

In [None]:
# Reload the data, just in case
(X_train, y_train), (X_valid, y_valid) = mnist.load_data()

# convert class vectors to binary class matrices
y_training = to_categorical(y_train, num_classes)
y_validation = to_categorical(y_valid, num_classes)

# Scale images to the [0, 1] range
X_train_cnn = X_train.astype("float32") / 255
X_valid_cnn = X_valid.astype("float32") / 255

# Redefine  dimension of train/test inputs
X_train_cnn = np.expand_dims(X_train_cnn, -1)
X_valid_cnn = np.expand_dims(X_valid_cnn, -1)

# Make sure images have shape (28, 28, 1)
print("x_train shape:", X_train_cnn.shape)
print(X_train_cnn.shape[0], "train samples")
print(X_valid_cnn.shape[0], "test samples")

#### Train!


In [None]:
batch_size=128
epochs=15

history = model_cnn.fit(
  X_train_cnn, # training data
  y_training, # training targets
  epochs=epochs,
  batch_size=batch_size,
  verbose=1,
  validation_data=(X_valid_cnn, y_validation)
)

#### Plot learning curves

In [None]:
# list all data in history
print(history.history.keys())

# summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

#### Evaluate the model

In [None]:
model_cnn.evaluate(X_valid_cnn, y_validation)

### 1.2 Your turn! (30 points)

1. Write code to train the same network with different options for `loss`, `optimizer`, learning rate,  `batch_size` and number of epochs.

2. Record the several combinations (at least 3, besides the baseline) you have tried and the resulting accuracy in a summary table similar to the one for item 1.1.

3. Write a short paragraph summarizing what you have learned from this exercise.

Build a CNN model function

In [None]:
# Function to build a CNN with given loss and optimizer
def build_cnn_model(optimizer, loss_func):
    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=optimizer, loss=loss_func, metrics=["accuracy"])
    return model


Prepare Data for CNN

In [None]:
# Reload data for CNN

(X_train, y_train), (X_valid, y_valid) = mnist.load_data()

# Convert labels to one hot encoded vectors
y_training = to_categorical(y_train, num_classes)
y_validation = to_categorical(y_valid, num_classes)

# Normalize and reshape input data for CNN
X_train_cnn = np.expand_dims(X_train.astype("float32") / 255, -1)
X_valid_cnn = np.expand_dims(X_valid.astype("float32") / 255, -1)

print("x_train shape:", X_train_cnn.shape)
print(X_train_cnn.shape[0], "train samples")
print(X_valid_cnn.shape[0], "validation samples")


Train Baseline CNN

In [None]:
# Train baseline CNN using Adam optimizer and categorical_crossentropy loss
cnn_baseline = build_cnn_model('adam', 'categorical_crossentropy')

history_baseline = cnn_baseline.fit(
    X_train_cnn, y_training,
    epochs=15,
    batch_size=128,
    validation_data=(X_valid_cnn, y_validation),
    verbose=1
)


Variation #1

In [None]:
# Use SGD optimizer with same loss
cnn_var1 = build_cnn_model(SGD(learning_rate=0.01), 'categorical_crossentropy')

history_var1 = cnn_var1.fit(
    X_train_cnn, y_training,
    epochs=15,
    batch_size=128,
    validation_data=(X_valid_cnn, y_validation),
    verbose=1
)

Variation #2

In [None]:
# Use more training epochs with baseline optimizer and loss
cnn_var2 = build_cnn_model('adam', 'categorical_crossentropy')

history_var2 = cnn_var2.fit(
    X_train_cnn, y_training,
    epochs=30,  # longer training
    batch_size=128,
    validation_data=(X_valid_cnn, y_validation),
    verbose=1
)

Variation #3

In [None]:
# Smaller batch size may help model converge differently
cnn_var3 = build_cnn_model('adam', 'categorical_crossentropy')

history_var3 = cnn_var3.fit(
    X_train_cnn, y_training,
    epochs=15,
    batch_size=64,  # smaller batch
    validation_data=(X_valid_cnn, y_validation),
    verbose=1
)
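Halving the batch size doubles the number of weight updates per epoch, which is one reason Variation #3 can behave differently from the baseline even with the same number of epochs. A small sketch of the arithmetic follows; `steps_per_epoch` is a hypothetical helper, not a Keras function.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of gradient updates in one pass over the data (the last batch may be partial)."""
    return math.ceil(n_samples / batch_size)

n_train = 60000  # MNIST training set size
for batch_size in (128, 64):
    steps = steps_per_epoch(n_train, batch_size)
    print(f"batch_size={batch_size}: {steps} updates/epoch, {steps * 15} over 15 epochs")
```

The same arithmetic gives 391 steps for the 50,000 CIFAR-10 training images at a batch size of 128, which matches the `391/391` counter in the Part 2 training log.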


Evaluate

In [None]:
# Evaluate all 4 models on the validation set
results_cnn = {
    "Baseline": cnn_baseline.evaluate(X_valid_cnn, y_validation, verbose=0),
    "Var1": cnn_var1.evaluate(X_valid_cnn, y_validation, verbose=0),
    "Var2": cnn_var2.evaluate(X_valid_cnn, y_validation, verbose=0),
    "Var3": cnn_var3.evaluate(X_valid_cnn, y_validation, verbose=0)
}

# Print comparison results
for name, (loss, acc) in results_cnn.items():
    print(f"{name}: Validation Accuracy = {acc:.4f}, Validation Loss = {loss:.4f}")


### CNN Model Summary Table

| Method        | Optimizer | Learning Rate | Loss                  | Batch Size | Epochs | Val Accuracy | Remarks                      |
|---------------|-----------|----------------|------------------------|------------|--------|---------------|------------------------------|
| Baseline      | Adam      | default         | categorical_crossentropy | 128        | 15     | 0.9919xx        | High accuracy, fast converge |
| Variation #1  | SGD       | 0.01            | categorical_crossentropy | 128        | 15     | 0.9757xx        | Slower but stable            |
| Variation #2  | Adam      | default         | categorical_crossentropy | 128        | 30     | 0.9939xx        | Best result with more epochs |
| Variation #3  | Adam      | default         | categorical_crossentropy | 64         | 15     | 0.9932xx        | Slight improvement w/ smaller batch |



In this exercise, I learned that convolutional networks benefit from adaptive optimizers like Adam and longer training. The CNN outperformed the shallow model from Part 1.1. Increasing the number of epochs and adjusting batch size helped fine-tune accuracy. SGD worked but took longer to converge. Overall, CNNs are more powerful and flexible for image recognition.


### 1.3 BONUS! (10 points)

1. Change the architecture of the CNN: number of layers, use (and amount) of Dropout, Conv2D layers' [parameters](https://keras.io/api/layers/convolution_layers/convolution2d/) (stride, padding, kernel size, etc.), type and parameters of [pooling layers](https://keras.io/api/layers/pooling_layers/), etc.

2. Record the several combinations (at least 3, besides the baseline) you have tried and the resulting accuracy in a summary table similar to the one for item 1.1.

3. Write a short paragraph summarizing what you have learned from this exercise.

## PART 2: Image classification using the CIFAR-10 dataset
  
In this project you will design and implement a deep learning solution for image classification using the CIFAR-10 dataset.

https://www.cs.toronto.edu/~kriz/cifar.html


See also:

https://www.kaggle.com/c/cifar-10

https://keras.io/examples/vision/metric_learning/

https://www.kaggle.com/roblexnana/cifar10-with-cnn-for-beginer


### Setup

In [None]:
from __future__ import print_function
import keras
from keras.datasets import cifar10
# The import path was changed here from `keras.preprocessing.image` to `tensorflow.keras.preprocessing.image`
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
import os

import numpy as np

import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
import itertools

%matplotlib inline

import random

import tensorflow as tf
from collections import defaultdict
from PIL import Image
from sklearn.metrics import ConfusionMatrixDisplay
from tensorflow import keras
from tensorflow.keras import layers

### Load and prepare the data

In [None]:
# Load the dataset
(x_training, y_training), (x_testing, y_testing) = cifar10.load_data()

# Normalize the image data
x_train = x_training.astype("float32") / 255.0
y_training = np.squeeze(y_training)
x_test = x_testing.astype("float32") / 255.0
y_testing = np.squeeze(y_testing)

# Convert class vectors to binary class matrices (one-hot encoding)
y_train = keras.utils.to_categorical(y_training, num_classes)
y_test = keras.utils.to_categorical(y_testing, num_classes)

### Explore the data

In [None]:
# Number of samples
print("Number of training samples: ",y_train.shape[0])
print("Number of test samples: ", y_test.shape[0])

In [None]:
# Number of classes
num_classes = y_test.shape[1]
print(num_classes)

In [None]:
# Shape of image data
print(x_train.shape)
print(x_test.shape)

In [None]:
# Show collage of 25 (randomly selected) images

# Every time you run this cell, you should see 25 different images.
# That's fine (and desired, actually).

height_width = 32

def show_collage(examples):
    box_size = height_width + 2
    num_rows, num_cols = examples.shape[:2]

    collage = Image.new(
        mode="RGB",
        size=(num_cols * box_size, num_rows * box_size),
        color=(250, 250, 250),
    )
    for row_idx in range(num_rows):
        for col_idx in range(num_cols):
            array = (np.array(examples[row_idx, col_idx]) * 255).astype(np.uint8)
            collage.paste(
                Image.fromarray(array), (col_idx * box_size, row_idx * box_size)
            )

    # Double size for visualisation.
    collage = collage.resize((2 * num_cols * box_size, 2 * num_rows * box_size))
    return collage


# Show a collage of 5x5 random images.
sample_idxs = np.random.randint(0, 50000, size=(5, 5))
examples = x_train[sample_idxs]
show_collage(examples)

In [None]:
idx = y_training[0]
print(idx)

In [None]:
y_train[0]

In [None]:
labels = ['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer', 'Dog', 'Frog', 'Horse', 'Ship', 'Truck']

In [None]:
labels[idx]

In [None]:
plt.imshow(x_train[0])

### 2a. Baseline model

Let's start by using a convolutional neural network (CNN) built from scratch as a baseline.


#### Build and configure the model

In [None]:
batch_size = 128
epochs = 40
data_augmentation = False

In [None]:
#define the convnet
model = Sequential()
# CONV => RELU => CONV => RELU => POOL => DROPOUT
model.add(Conv2D(32, (3, 3), padding='same',input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# CONV => RELU => CONV => RELU => POOL => DROPOUT
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# FLATTEN => DENSE => RELU => DROPOUT
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
# a softmax classifier
model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.summary()

In [None]:
# initiate RMSprop optimizer
# (the legacy `decay` argument is omitted: recent Keras releases ignore or reject it)
opt = keras.optimizers.RMSprop(learning_rate=0.0001)

# Let's train the model using RMSprop
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

#### Train the model

In [175]:
history = model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              validation_data=(x_test, y_test),
              shuffle=True)

Epoch 1/40
391/391 ━━━━━━━━━━━━━━━━━━━━ 14s 25ms/step - accuracy: 0.1931 - loss: 2.1407 - val_accuracy: 0.3459 - val_loss: 1.8301
Epoch 2/40
391/391 ━━━━━━━━━━━━━━━━━━━━ 4s 10ms/step - accuracy: 0.3514 - loss: 1.7815 - val_accuracy: 0.4231 - val_loss: 1.6160
Epoch 3/40
391/391 ━━━━━━━━━━━━━━━━━━━━ 6s 11ms/step - accuracy: 0.4068 - loss: 1.6510 - val_accuracy: 0.4416 - val_loss: 1.5693
Epoch 4/40
391/391 ━━━━━━━━━━━━━━━━━━━━ 5s 10ms/step - accuracy: 0.4386 - loss: 1.5551 - val_accuracy: 0.4737 - val_loss: 1.4433
Epoch 5/40
391/391 ━━━━━━━━━━━━━━━━━━━━ 5s 10ms/step - accuracy: 0.4624 - loss: 1.4786 - val_accuracy: 0.5012 - val_loss: 1.3860
Epoch 6/40
391/391 ━━━━━━━━━━━━━━━━━━━━ 4s 10ms/step - accuracy: 0.4876 - loss: 1.4121 - val_accuracy: 0.5270 - val_loss: 1.3088
Epoch 7/40
391/39

KeyboardInterrupt: 

In [None]:
def plotmodelhistory(history):
    fig, axs = plt.subplots(1,2,figsize=(15,5))
    # summarize history for accuracy
    axs[0].plot(history.history['accuracy'])
    axs[0].plot(history.history['val_accuracy'])
    axs[0].set_title('Model Accuracy')
    axs[0].set_ylabel('Accuracy')
    axs[0].set_xlabel('Epoch')
    axs[0].legend(['train', 'validate'], loc='upper left')
    # summarize history for loss
    axs[1].plot(history.history['loss'])
    axs[1].plot(history.history['val_loss'])
    axs[1].set_title('Model Loss')
    axs[1].set_ylabel('Loss')
    axs[1].set_xlabel('Epoch')
    axs[1].legend(['train', 'validate'], loc='upper left')
    plt.show()

# list all data in history
print(history.history.keys())

In [None]:
plotmodelhistory(history)

#### Evaluate the model

In [None]:
# Score trained model.
scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

# make prediction.
pred = model.predict(x_test)

In [None]:
def heatmap(data, row_labels, col_labels, ax=None, cbar_kw={}, cbarlabel="", **kwargs):
    """
    Create a heatmap from a numpy array and two lists of labels.
    """
    if not ax:
        ax = plt.gca()

    # Plot the heatmap
    im = ax.imshow(data, **kwargs)

    # Create colorbar
    cbar = ax.figure.colorbar(im, ax=ax, **cbar_kw)
    cbar.ax.set_ylabel(cbarlabel, rotation=-90, va="bottom")

    # Let the horizontal axes labeling appear on top.
    ax.tick_params(top=True, bottom=False,
                   labeltop=True, labelbottom=False)
    # We want to show all ticks...
    ax.set_xticks(np.arange(data.shape[1]))
    ax.set_yticks(np.arange(data.shape[0]))
    # ... and label them with the respective list entries.
    ax.set_xticklabels(col_labels)
    ax.set_yticklabels(row_labels)

    ax.set_xlabel('Predicted Label')
    ax.set_ylabel('True Label')

    return im, cbar

def annotate_heatmap(im, data=None, fmt="d", threshold=None):
    """
    A function to annotate a heatmap.
    """
    # Default threshold: halfway up the data range.
    if threshold is None:
        threshold = data.max() / 2.

    # Change the text's color depending on the data.
    texts = []
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            # Use the `threshold` argument rather than a global variable.
            text = im.axes.text(j, i, format(data[i, j], fmt), horizontalalignment="center",
                                 color="white" if data[i, j] > threshold else "black")
            texts.append(text)

    return texts

In [None]:
# Plot confusion matrix

# Convert predicted class probabilities to class indices
Y_pred_classes = np.argmax(pred, axis=1)
# Convert one-hot encoded true labels to class indices
Y_true = np.argmax(y_test, axis=1)
# Errors are the samples where the predicted and true labels differ
errors = (Y_pred_classes - Y_true != 0)

Y_pred_classes_errors = Y_pred_classes[errors]
Y_pred_errors = pred[errors]
Y_true_errors = Y_true[errors]
X_test_errors = x_test[errors]

cm = confusion_matrix(Y_true, Y_pred_classes)
thresh = cm.max() / 2.

fig, ax = plt.subplots(figsize=(12,12))
im, cbar = heatmap(cm, labels, labels, ax=ax,
                   cmap=plt.cm.Blues, cbarlabel="count of predictions")
texts = annotate_heatmap(im, data=cm, threshold=thresh)

fig.tight_layout()
plt.show()

In [None]:
print(classification_report(Y_true, Y_pred_classes))
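The per-class precision and recall in the report above can be read directly off the confusion matrix. The following sketch uses a hypothetical 3-class matrix (not results from this notebook) and assumes the scikit-learn convention that rows are true labels and columns are predicted labels; `per_class_metrics` is an illustrative helper.

```python
def per_class_metrics(cm):
    """Return a (precision, recall) pair per class from a square confusion matrix.

    Rows are true labels and columns are predicted labels.
    """
    n = len(cm)
    metrics = []
    for k in range(n):
        true_pos = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        metrics.append((precision, recall))
    return metrics

# Hypothetical 3-class confusion matrix, for illustration only.
cm_demo = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
for k, (p, r) in enumerate(per_class_metrics(cm_demo)):
    print(f"class {k}: precision = {p:.2f}, recall = {r:.2f}")
```

Precision divides the diagonal entry by its column sum (everything predicted as that class), while recall divides it by its row sum (everything that truly belongs to that class).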

In [None]:
# Inspect errors
R = 3
C = 5
fig, axes = plt.subplots(R, C, figsize=(12,8))
axes = axes.ravel()

misclassified_idx = np.where(Y_pred_classes != Y_true)[0]
for i in np.arange(0, R*C):
    axes[i].imshow(x_test[misclassified_idx[i]])
    axes[i].set_title("True: %s \nPredicted: %s" % (labels[Y_true[misclassified_idx[i]]],
                                                  labels[Y_pred_classes[misclassified_idx[i]]]))
    axes[i].axis('off')
    plt.subplots_adjust(wspace=1)

In [None]:
def show_test(number):
    fig = plt.figure(figsize = (3,3))
    test_image = np.expand_dims(x_test[number], axis=0)
    # test_result = model.predict_classes(test_image)
    test_result = np.argmax(model.predict(test_image), axis=-1)
    plt.imshow(x_test[number])
    dict_key = test_result[0]
    plt.title("Predicted: {} \nTrue Label: {}".format(labels[dict_key],
                                                      labels[Y_true[number]]))

In [None]:
show_test(11)

### 2b. Transfer Learning

Now we will see how we could use a [pretrained (on ImageNet) model](https://keras.io/api/applications/) to perform the same task (image classification) using a different dataset (CIFAR-10), using the *transfer learning* paradigm.

See:

https://www.kaggle.com/adi160/cifar-10-keras-transfer-learning

https://keras.io/guides/transfer_learning/

#### Transfer Learning: setup

In [None]:
from keras import Sequential
from keras.applications.vgg19 import VGG19
from keras.applications.vgg16 import VGG16
from keras.applications.resnet import ResNet50
from tensorflow.keras.preprocessing.image import ImageDataGenerator # Changed import here
from tensorflow.keras.optimizers import SGD,Adam
from keras.callbacks import ReduceLROnPlateau
from keras.layers import Flatten,Dense,BatchNormalization,Activation,Dropout
from tensorflow.keras.utils import to_categorical

In [None]:
#Import dataset (again)
(x_train,y_train),(x_test,y_test)=cifar10.load_data()

In [None]:
x_train,x_val,y_train,y_val=train_test_split(x_train,y_train,test_size=.3)

In [None]:
#Print the dimensions of the datasets to make sure everything's kosher

print((x_train.shape,y_train.shape))
print((x_val.shape,y_val.shape))
print((x_test.shape,y_test.shape))

In [None]:
# One-hot encode the labels. Since we have 10 classes, we should expect shape[1] of y_train, y_val, and y_test to change from 1 to 10.

y_train=to_categorical(y_train)
y_val=to_categorical(y_val)
y_test=to_categorical(y_test)

In [None]:
# Lets print the dimensions one more time to see if things changed the way we expected

print((x_train.shape,y_train.shape))
print((x_val.shape,y_val.shape))
print((x_test.shape,y_test.shape))

#### Attempt #1: using ResNet50 as a base model

Learn more about ResNet50 at: https://www.kaggle.com/keras/resnet50

In [None]:
base_model_1 = ResNet50(include_top=False,
                        weights='imagenet',
                        input_shape=(32,32,3),
                        classes=y_train.shape[1])

model_1=Sequential()
#Add the pretrained base model and flatten its output
model_1.add(base_model_1)
model_1.add(Flatten())

#Add the Dense layers along with activation and batch normalization
model_1.add(Dense(4000,activation=('relu'),input_dim=512))
model_1.add(Dense(2000,activation=('relu')))
model_1.add(Dense(1000,activation=('relu')))
model_1.add(Dense(500,activation=('relu')))
model_1.add(Dense(10,activation=('softmax'))) #This is the classification layer

model_1.summary()

In [None]:
batch_size= 128
epochs=10
learn_rate=.001

sgd=SGD(learning_rate=learn_rate,momentum=.9,nesterov=False)
adam=Adam(learning_rate=learn_rate)
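The `momentum=.9` setting makes SGD accumulate a running velocity instead of stepping along the raw gradient. The sketch below follows the update rule given in the Keras `SGD` documentation for `nesterov=False`, applied to a toy one-dimensional loss `f(w) = w**2`; `sgd_momentum_step` is a hypothetical helper, and the toy loss is an assumption made only for illustration.

```python
def sgd_momentum_step(w, velocity, grad, learning_rate=0.001, momentum=0.9):
    """One SGD-with-momentum update: the velocity blends past steps with the new gradient."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

# Toy problem: minimize f(w) = w**2, whose gradient is 2*w.
w, v = 1.0, 0.0
for step in range(3):
    w, v = sgd_momentum_step(w, v, grad=2 * w)
    print(f"step {step + 1}: w = {w:.6f}, velocity = {v:.6f}")
```

Because the previous velocity is carried over, each step is larger than the plain gradient step while the gradient keeps pointing the same way.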

In [None]:
# Compile the model
model_1.compile(optimizer=sgd,
                loss='categorical_crossentropy',
                metrics=['accuracy'])

In [None]:
# Train the model
history_1 = model_1.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              verbose = 1,
              validation_data=(x_val, y_val),
              shuffle=True)

In [None]:
def plotmodelhistory(history):
    fig, axs = plt.subplots(1,2,figsize=(15,5))
    # summarize history for accuracy
    axs[0].plot(history.history['accuracy'])
    axs[0].plot(history.history['val_accuracy'])
    axs[0].set_title('Model Accuracy')
    axs[0].set_ylabel('Accuracy')
    axs[0].set_xlabel('Epoch')
    axs[0].legend(['train', 'validate'], loc='upper left')
    # summarize history for loss
    axs[1].plot(history.history['loss'])
    axs[1].plot(history.history['val_loss'])
    axs[1].set_title('Model Loss')
    axs[1].set_ylabel('Loss')
    axs[1].set_xlabel('Epoch')
    axs[1].legend(['train', 'validate'], loc='upper left')
    plt.show()

# list all data in history
print(history_1.history.keys())

In [None]:
plotmodelhistory(history_1)

In [None]:
# Score trained model.
scores = model_1.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

# make prediction.
pred = model_1.predict(x_test)

In [None]:
def plot_confusion_matrix(y_true, y_pred, classes,
                          normalize=False,
                          title=None,
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if not title:
        if normalize:
            title = 'Normalized confusion matrix'
        else:
            title = 'Confusion matrix, without normalization'

    # Compute confusion matrix
    cm = confusion_matrix(y_true, y_pred)
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

#     print(cm)

    fig, ax = plt.subplots(figsize=(7,7))
    im = ax.imshow(cm, interpolation='nearest', cmap=cmap)
    ax.figure.colorbar(im, ax=ax)
    # We want to show all ticks...
    ax.set(xticks=np.arange(cm.shape[1]),
           yticks=np.arange(cm.shape[0]),
           # ... and label them with the respective list entries
           xticklabels=classes, yticklabels=classes,
           title=title,
           ylabel='True label',
           xlabel='Predicted label')

    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
             rotation_mode="anchor")
    # Loop over data dimensions and create text annotations.
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], fmt),
                    ha="center", va="center",
                    color="white" if cm[i, j] > thresh else "black")
    fig.tight_layout()
    return ax


In [None]:
y_pred = np.argmax(model_1.predict(x_test), axis=-1)
y_true = np.argmax(y_test, axis=1)

# Compute the confusion matrix
confusion_mtx = confusion_matrix(y_true, y_pred)

In [None]:
# Plot non-normalized confusion matrix
plot_confusion_matrix(y_true, y_pred, classes=labels,
                      title='Confusion matrix, without normalization')

In [None]:
# Plot normalized confusion matrix
plot_confusion_matrix(y_true, y_pred, classes=labels, normalize=True,
                      title='Normalized confusion matrix')
# plt.show()
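
The normalization step inside `plot_confusion_matrix` divides each row by the number of true samples in that class. As a minimal sketch of what that does, assuming a made-up 3-class matrix (the counts below are illustrative and not taken from this notebook), the diagonal of the row-normalized matrix can be read as per-class recall:

```python
import numpy as np

# Toy confusion matrix: rows = true label, columns = predicted label.
# These counts are invented for illustration only.
cm = np.array([[8, 1, 1],
               [2, 6, 2],
               [0, 0, 10]])

# Divide every row by its row total, as in the plotting function above.
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]

# After row normalization, the diagonal holds the fraction of each
# true class that was predicted correctly (per-class recall).
per_class_recall = np.diag(cm_normalized)

print(cm_normalized)
print(per_class_recall)
```

Each row of `cm_normalized` sums to 1, which is why the normalized plot is easier to compare across classes with different sample counts.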

### 2.1 Your turn! (30 points)

1. Write code to build, train, and evaluate at least three additional transfer learning models and combinations of (hyper)parameters.

Here are a few things you could do:
*   Use a different base model, e.g., VGG19 (see https://www.kaggle.com/keras/vgg19)
*   Add Dropout layers
*   Use data augmentation
*   Change optimizer
*   Change other hyperparameters (learning rate, batch size, etc.)

2. Record the several combinations (at least 3, besides the baseline) you have tried and the resulting accuracy in a summary table like this:

| Method | Base model | Relevant (hyper)parameters | Test loss | Test accuracy | Remarks |
| --- | --- | --- | --- | --- | --- |
| Baseline  | ResNet50 | SGD, batch_size= 128, epochs=10, learn_rate=.001 | 1.2025 | 0.7526 | Baseline |
| Variation #1 | ... | ... | ... | ... | Improved accuracy thanks to X |
| Variation #2 | ... | ... | ... | ... | Improved accuracy thanks to Y |
| Variation #3 | ... | ... | ... | ... | Improved accuracy thanks to Z |

3. Write a short paragraph summarizing what you have learned from this exercise.









Variation #1

Build and compile the VGG19 transfer learning model

In [183]:
from keras.applications.vgg19 import VGG19
from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical
from keras.datasets import cifar10

# Load data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize input
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# One-hot encode labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Split training into train + validation
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.3)

# Build VGG19 base model
base_model_vgg19 = VGG19(include_top=False, weights='imagenet', input_shape=(32, 32, 3))

# Build transfer learning model
model_vgg19 = Sequential()
model_vgg19.add(base_model_vgg19)
model_vgg19.add(Flatten())

# Add dense layers with dropout
model_vgg19.add(Dense(512, activation='relu'))
model_vgg19.add(Dropout(0.5))
model_vgg19.add(Dense(256, activation='relu'))
model_vgg19.add(Dropout(0.3))
model_vgg19.add(Dense(10, activation='softmax'))  # Output layer

# Compile with Adam optimizer
model_vgg19.compile(optimizer=Adam(learning_rate=0.0001),
                    loss='categorical_crossentropy',
                    metrics=['accuracy'])

# Display model structure
model_vgg19.summary()


Train

In [184]:
history_vgg19 = model_vgg19.fit(
    x_train, y_train,
    batch_size=128,
    epochs=10,
    validation_data=(x_val, y_val),
    shuffle=True,
    verbose=1
)

Epoch 1/10
274/274 ━━━━━━━━━━━━━━━━━━━━ 42s 126ms/step - accuracy: 0.3384 - loss: 1.8195 - val_accuracy: 0.7027 - val_loss: 0.8842
Epoch 2/10
274/274 ━━━━━━━━━━━━━━━━━━━━ 31s 103ms/step - accuracy: 0.7328 - loss: 0.8272 - val_accuracy: 0.7785 - val_loss: 0.6636
Epoch 3/10
274/274 ━━━━━━━━━━━━━━━━━━━━ 41s 102ms/step - accuracy: 0.8068 - loss: 0.5969 - val_accuracy: 0.7923 - val_loss: 0.6350
Epoch 4/10
274/274 ━━━━━━━━━━━━━━━━━━━━ 41s 102ms/step - accuracy: 0.8463 - loss: 0.4743 - val_accuracy: 0.8259 - val_loss: 0.5410
Epoch 5/10
274/274 ━━━━━━━━━━━━━━━━━━━━ 30s 109ms/step - accuracy: 0.8811 - loss: 0.3683 - val_accuracy: 0.8289 - val_loss: 0.5226
Epoch 6/10
274/274 ━━━━━━━━━━━━━━━━━━━━ 41s 109ms/step - accuracy: 0.9109 - loss: 0.2826 - val_accuracy: 0.8362 - val_loss: 0.5602
Epoch 7/10

Evaluate

In [185]:
vgg19_scores = model_vgg19.evaluate(x_test, y_test, verbose=1)
print('Test loss:', vgg19_scores[0])
print('Test accuracy:', vgg19_scores[1])

313/313 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.8246 - loss: 0.6707
Test loss: 0.674071192741394
Test accuracy: 0.8277000188827515


Variation #2

Build and compile the VGG16 model

In [190]:
from keras.applications.vgg16 import VGG16
from keras.optimizers import SGD

# Load VGG16 base
base_model_vgg16 = VGG16(include_top=False, weights='imagenet', input_shape=(32, 32, 3))

# Create the new model
model_vgg16 = Sequential()
model_vgg16.add(base_model_vgg16)
model_vgg16.add(Flatten())

# Add dense layers with dropout
model_vgg16.add(Dense(512, activation='relu'))
model_vgg16.add(Dropout(0.5))
model_vgg16.add(Dense(256, activation='relu'))
model_vgg16.add(Dropout(0.3))
model_vgg16.add(Dense(10, activation='softmax'))  # Output layer

# Compile the model with SGD optimizer
model_vgg16.compile(optimizer=SGD(learning_rate=0.001, momentum=0.9),
                    loss='categorical_crossentropy',
                    metrics=['accuracy'])

# Display summary
model_vgg16.summary()


Train

In [191]:
history_vgg16 = model_vgg16.fit(
    x_train, y_train,
    batch_size=64,      # smaller batch
    epochs=10,
    validation_data=(x_val, y_val),
    shuffle=True,
    verbose=1
)

Epoch 1/10
547/547 ━━━━━━━━━━━━━━━━━━━━ 32s 50ms/step - accuracy: 0.3685 - loss: 1.7436 - val_accuracy: 0.7251 - val_loss: 0.8057
Epoch 2/10
547/547 ━━━━━━━━━━━━━━━━━━━━ 23s 42ms/step - accuracy: 0.7228 - loss: 0.8370 - val_accuracy: 0.7506 - val_loss: 0.7289
Epoch 3/10
547/547 ━━━━━━━━━━━━━━━━━━━━ 41s 42ms/step - accuracy: 0.7884 - loss: 0.6473 - val_accuracy: 0.7894 - val_loss: 0.6413
Epoch 4/10
547/547 ━━━━━━━━━━━━━━━━━━━━ 41s 42ms/step - accuracy: 0.8267 - loss: 0.5284 - val_accuracy: 0.8191 - val_loss: 0.5302
Epoch 5/10
547/547 ━━━━━━━━━━━━━━━━━━━━ 41s 42ms/step - accuracy: 0.8543 - loss: 0.4371 - val_accuracy: 0.8259 - val_loss: 0.5157
Epoch 6/10
547/547 ━━━━━━━━━━━━━━━━━━━━ 41s 42ms/step - accuracy: 0.8838 - loss: 0.3630 - val_accuracy: 0.8291 - val_loss: 0.5156
Epoch 7/10

Evaluate

In [192]:
vgg16_scores = model_vgg16.evaluate(x_test, y_test, verbose=1)
print('Test loss:', vgg16_scores[0])
print('Test accuracy:', vgg16_scores[1])

313/313 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.8409 - loss: 0.5714
Test loss: 0.577462375164032
Test accuracy: 0.8396000266075134


Variation #3

Import MobileNetV2

In [194]:
from keras.applications import MobileNetV2
from tensorflow.keras.preprocessing.image import ImageDataGenerator # Corrected import

# Load MobileNetV2 base
base_model_mobilenet = MobileNetV2(include_top=False, weights='imagenet', input_shape=(32, 32, 3))

# Build the new model
model_mobilenet = Sequential()
model_mobilenet.add(base_model_mobilenet)
model_mobilenet.add(Flatten())

# Add custom classification layers
model_mobilenet.add(Dense(256, activation='relu'))
model_mobilenet.add(Dropout(0.5))
model_mobilenet.add(Dense(10, activation='softmax'))

# Compile the model using Adam
model_mobilenet.compile(optimizer=Adam(learning_rate=0.0001),
                        loss='categorical_crossentropy',
                        metrics=['accuracy'])

# Show model summary
model_mobilenet.summary()


Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224_no_top.h5
9406464/9406464 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


Data Augmentation

In [195]:
# Create a generator for real-time image augmentation
datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
)

# Fit the generator to training data
datagen.fit(x_train)
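
To make the augmentation settings above more concrete, here is a minimal NumPy sketch of two of the transformations (`horizontal_flip` and a one-pixel width shift) applied to a tiny made-up image. It only imitates the idea; it is not how `ImageDataGenerator` is implemented, and the pixel values are invented for illustration.

```python
import numpy as np

# A tiny "image" with shape (height=1, width=2, channels=3); values are made up.
image = np.array([[[0.1, 0.2, 0.3],
                   [0.7, 0.8, 0.9]]], dtype="float32")

# Horizontal flip: reverse the width axis (axis 1 for a single HWC image).
flipped = image[:, ::-1, :]

# Shift right by one pixel and fill the vacated column with the nearest
# edge pixel, loosely imitating the generator's default fill_mode='nearest'.
shifted = np.concatenate([image[:, :1, :], image[:, :-1, :]], axis=1)

print(flipped.shape, shifted.shape)
```

Both results keep the original shape, so the augmented images can be fed to the same model input without any other change.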


Train the model using data augmentation

In [196]:
history_mobilenet = model_mobilenet.fit(
    datagen.flow(x_train, y_train, batch_size=128),
    validation_data=(x_val, y_val),
    steps_per_epoch=len(x_train) // 128,
    epochs=10,
    verbose=1
)

Epoch 1/10
273/273 ━━━━━━━━━━━━━━━━━━━━ 100s 194ms/step - accuracy: 0.1707 - loss: 2.6643 - val_accuracy: 0.1105 - val_loss: 2.5334
Epoch 2/10
273/273 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.3594 - loss: 1.8294 - val_accuracy: 0.1111 - val_loss: 2.5333
Epoch 3/10
273/273 ━━━━━━━━━━━━━━━━━━━━ 28s 84ms/step - accuracy: 0.3741 - loss: 1.7791 - val_accuracy: 0.1835 - val_loss: 2.2742
Epoch 4/10
273/273 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.4219 - loss: 1.5693 - val_accuracy: 0.1841 - val_loss: 2.2778
Epoch 5/10
273/273 ━━━━━━━━━━━━━━━━━━━━ 23s 84ms/step - accuracy: 0.4710 - loss: 1.5132 - val_accuracy: 0.2877 - val_loss: 2.0071
Epoch 6/10
273/273 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.5781 - loss: 1.2620 - val_accuracy: 0.2876 - val_loss: 2.0071
Epoch 7/10
273/273 ━━━━━━━━━━━━━━━━━━━━ 22s 82ms/step - accuracy: 0.5276 - loss: 1.364

Evaluate

In [197]:
mobilenet_scores = model_mobilenet.evaluate(x_test, y_test, verbose=1)
print('Test loss:', mobilenet_scores[0])
print('Test accuracy:', mobilenet_scores[1])

313/313 ━━━━━━━━━━━━━━━━━━━━ 4s 13ms/step - accuracy: 0.4613 - loss: 1.6851
Test loss: 1.6826046705245972
Test accuracy: 0.46639999747276306
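
The step counts in the training logs (274 for batch size 128, 547 for batch size 64, and 273 for the augmented run) can be reproduced with simple arithmetic. The sketch below assumes 35,000 training images remain after the 70/30 split of the 50,000 CIFAR-10 training images; the function names are hypothetical helpers, not Keras APIs.

```python
import math

N_TRAIN = 35000  # assumed: 70% of the 50,000 CIFAR-10 training images


def batches_per_pass(n_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


def floor_steps(n_samples, batch_size):
    """What `len(x_train) // batch_size` evaluates to."""
    return n_samples // batch_size


for batch_size in (128, 64):
    print(batch_size,
          batches_per_pass(N_TRAIN, batch_size),
          floor_steps(N_TRAIN, batch_size))
```

For batch size 128 this gives 274 batches per pass but only 273 floor steps. One possible explanation for the very short even-numbered epochs in the MobileNetV2 log, which I have not verified, is that this one-batch mismatch between `steps_per_epoch` and the generator length leaves every second epoch with almost no data. Dropping `steps_per_epoch` and letting `fit` infer the length from the generator would be one way to test that.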


| Method | Base model | Relevant (hyper)parameters | Test loss | Test accuracy | Remarks |
| --- | --- | --- | --- | --- | --- |
| Baseline | ResNet50 | SGD, batch_size=128, epochs=10, learning_rate=0.001 | 1.2025 | 0.7526 | Baseline |
| Variation #1 | VGG19 | Adam, batch_size=128, epochs=10, learning_rate=0.0001, Dropout (0.5, 0.3) | 0.6741 | 0.8277 | Higher accuracy and lower loss than the baseline |
| Variation #2 | VGG16 | SGD (momentum=0.9), batch_size=64, epochs=10, learning_rate=0.001, Dropout (0.5, 0.3) | 0.5775 | 0.8396 | Best result of the four runs |
| Variation #3 | MobileNetV2 | Adam, batch_size=128, epochs=10, learning_rate=0.0001, Dropout (0.5), data augmentation | 1.6826 | 0.4664 | Accuracy fell well below the baseline; augmentation did not help in this setup |
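
To keep the table consistent with the `evaluate` output, the rows can be generated from the score lists instead of being typed by hand. This is a minimal sketch; `summary_row` is a hypothetical helper, and the example values are the VGG16 scores printed above.

```python
def summary_row(method, base_model, params, loss, accuracy, remarks):
    """Format one markdown table row, rounding the metrics to four decimals."""
    return (f"| {method} | {base_model} | {params} "
            f"| {loss:.4f} | {accuracy:.4f} | {remarks} |")


# Example using the values returned by model_vgg16.evaluate(...) in this notebook.
row = summary_row(
    "Variation #2", "VGG16",
    "SGD (momentum=0.9), batch_size=64, epochs=10, learning_rate=0.001",
    0.577462375164032, 0.8396000266075134,
    "Best result of the four runs",
)
print(row)
```

Rounding in one place avoids mixing 4-digit and 16-digit values in the same column.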


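In this part of the assignment, I explored transfer learning on the CIFAR-10 dataset using three pretrained base models: VGG19, VGG16, and MobileNetV2. I learned that the choice of base model and of hyperparameters such as optimizer, learning rate, and batch size has a large effect on accuracy. VGG19 with Adam (learning rate 0.0001) reached 82.8% test accuracy, and VGG16 with SGD (learning rate 0.001, momentum 0.9) and a batch size of 64 did slightly better at 84.0%; both clearly beat the ResNet50 baseline (75.3%). In both VGG runs the training accuracy kept climbing while the validation accuracy leveled off around 83%, which suggests some overfitting despite the Dropout layers. MobileNetV2 with data augmentation was the weakest model at 46.6% test accuracy, so in this setup the lighter model plus augmentation did not improve generalization. I did not track down the cause; the small 32×32 inputs (the downloaded weights are the 224×224 ones) and the short training run are possible factors. Overall, transfer learning let me reach good accuracy within 10 epochs, and I now understand how to adapt pretrained models to new image classification tasks.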


### Conclusions (10 points)

Write your conclusions and make sure to address the issues below:
- What have you learned from this assignment?
- Which parts were the most fun, time-consuming, enlightening, tedious?
- What would you do if you had an additional week to work on this?

What have you learned from this assignment?
  - From this assignment, I learned how to build and train neural networks on both MNIST and CIFAR-10 datasets using different architectures like shallow networks, CNNs, and transfer learning. I now understand how important the choice of optimizer, loss function, and base model can be when trying to improve accuracy and reduce overfitting. I also saw the power of transfer learning and how much faster and more effective it is than training from scratch.

Which parts were the most fun, time-consuming, enlightening, tedious?
  - One of the most interesting but also annoying parts of the assignment was how long model training takes, especially when experimenting with different architectures or increasing the number of epochs. It’s fun to watch the accuracy improve, but waiting for 10+ epochs to finish over and over gets tedious quickly, especially when the changes don’t always improve the model. Sometimes I would run something for 10 minutes just to realize the accuracy barely changed. But it is satisfying when the training pays off and you finally get a better result.

What would you do if you had an additional week to work on this?
  - If I had an extra week, I’d focus more on optimizing training time, maybe trying smaller models, reducing epochs with early stopping, or experimenting with faster optimizers and learning rate schedules. I’d also try testing these models on a different dataset to see how well they generalize beyond CIFAR-10.
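
As a rough illustration of the early-stopping idea mentioned above, here is a minimal pure-Python sketch of patience-based stopping applied to the validation losses that are visible in the VGG19 and VGG16 logs (only the first six epochs survive in the output). The function is a hypothetical helper; in practice Keras provides a `keras.callbacks.EarlyStopping` callback with `monitor` and `patience` arguments that could be passed to `fit` instead.

```python
def early_stopping_epoch(val_losses, patience=1):
    """Return (stop_epoch, best_epoch), both 1-based.

    stop_epoch is None if the loss never failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = None
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Validation losses copied from the first six epochs of the logs above.
vgg19_val_loss = [0.8842, 0.6636, 0.6350, 0.5410, 0.5226, 0.5602]
vgg16_val_loss = [0.8057, 0.7289, 0.6413, 0.5302, 0.5157, 0.5156]

vgg19_result = early_stopping_epoch(vgg19_val_loss, patience=1)
vgg16_result = early_stopping_epoch(vgg16_val_loss, patience=1)
print(vgg19_result, vgg16_result)
```

With a patience of 1, the VGG19 run would have stopped after epoch 6 (best epoch 5), while the VGG16 run was still improving through epoch 6. A larger patience would be more forgiving of noisy validation loss.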
