Project 1 - Group 7

Authors: **Behzad Hosseini**, **Raffay Ahmed**, **Nick Bidler**, **Michael Clausen**, **Kristopher Curry**, **Janvier Uwase**

Course: **Artificial Neural Network**

Section: **01**


---

# Project 1

For this project, our group built a convolutional neural network capable of recognizing handwritten letters. We also explored transfer learning during the last phase of the project.

# Global Imports

Below, we import the libraries and set up the paths and settings that we needed for the duration of the project.

The first important decision we made was to use Keras over PyTorch, since the majority of our team had more experience with Keras. The second was to set up a shared Google Drive for storing data, models, and logs. Since the CSUF Google account did not allow us to create a shared drive, we used an external account belonging to one of our team members to create it.

In [None]:
%matplotlib inline

# Check if the tensorboard extension is already loaded
if 'tensorboard' not in get_ipython().extension_manager.loaded:
    # Load the TensorBoard extension if it's not loaded
    %load_ext tensorboard
else:
    # Reload the TensorBoard extension if it's already loaded
    %reload_ext tensorboard

from IPython.display import display
import numpy as np
from sklearn.model_selection import KFold, train_test_split
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import datetime, os
import string
import shutil
# Import from tensorflow.keras so these match the keras/layers imports above
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.callbacks import ModelCheckpoint

# mount google drive to store and retrieve data
from google.colab import drive
drive.mount('/content/drive')


data_dir = "/content/drive/Shareddrives/csuf-585-p1-g7/Data"
models_dir = "/content/drive/Shareddrives/csuf-585-p1-g7/Models"
logs_dir = "/content/drive/Shareddrives/csuf-585-p1-g7/Logs"


# Set random seed
seed = 1234
np.random.seed(seed)
tf.random.set_seed(seed)

# Check if GPU is available
if tf.test.gpu_device_name():
  device = tf.device('GPU')
  print('Using GPU:', tf.test.gpu_device_name())
else:
  device = tf.device('CPU')
  print('Using CPU')

print(f"Tensorflow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")

# A dictionary of labels in EMNIST letters dataset
emnist_labels = {i: letter for i, letter in enumerate(string.ascii_uppercase)}


# Helper Functions

Below, we define a few helper functions to keep the rest of the notebook tidy.

In [None]:
# remove all files and directories in the directory
def remove_dir_contents(dir_path):
  for file in os.listdir(dir_path):
      file_path = os.path.join(dir_path, file)
      try:
          if os.path.isfile(file_path):
              os.remove(file_path)
          elif os.path.isdir(file_path):
              shutil.rmtree(file_path)
      except Exception as e:
          print(e)


def visualize_image(image, label):
  # Visualize an image in the dataset
  plt.imshow(image, cmap='gray')

  # Get the original label
  original_label = np.argmax(label)

  print(f"Label (Lowercase | Uppercase) = {emnist_labels[original_label]}")


# Load emnist dataset and return various sets after removing unused class
def emnist_load_data():
  with np.load(os.path.join(data_dir, "emnist_letters.npz")) as f:
    (train_images, train_labels) = f["train_images"], f["train_labels"]
    (validate_images, validate_labels) = f["validate_images"], f["validate_labels"]
    (test_images, test_labels) = f["test_images"], f["test_labels"]

  # Remove first class (index 0)
  train_labels = np.delete(train_labels, 0, axis=1)

  validate_labels = np.delete(validate_labels, 0, axis=1)

  test_labels = np.delete(test_labels, 0, axis=1)

  return (train_images, train_labels), (validate_images, validate_labels), (test_images, test_labels)
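
The label handling in the helpers above can be illustrated without NumPy. This is only a sketch: it assumes that each label row is one-hot with 27 entries and that column 0 is never set (EMNIST Letters numbers its classes 1 to 26). The names raw_labels, drop_first_column, and label_to_letter are hypothetical and exist only for this illustration.

```python
import string

# Assumption: each label is a one-hot row with 27 entries, where column 0 is
# never set because EMNIST Letters numbers its classes 1..26.
raw_labels = [
    [0] + [1 if i == 0 else 0 for i in range(26)],   # class 1  -> "A"
    [0] + [1 if i == 25 else 0 for i in range(26)],  # class 26 -> "Z"
]

def drop_first_column(rows):
    """Pure-Python counterpart of np.delete(rows, 0, axis=1)."""
    return [row[1:] for row in rows]

def label_to_letter(one_hot_row):
    """Map a 26-entry one-hot row to its letter (argmax -> A..Z)."""
    index = one_hot_row.index(max(one_hot_row))
    return string.ascii_uppercase[index]

trimmed = drop_first_column(raw_labels)
letters = [label_to_letter(row) for row in trimmed]
print(len(trimmed[0]), letters)
```

After the unused column is dropped, index 0 corresponds to "A", which is why emnist_labels can be built directly from string.ascii_uppercase.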

# Part 1 - Warm-Up

During Part 1, we mainly focused on running existing models (a Multilayer Perceptron and a simple convnet) on the MNIST and EMNIST datasets. We also documented the accuracies throughout.

## Question 1
Open this notebook by Francois Chollet, which creates a simple Multilayer Perceptron as described in Section 2.1 of Deep Learning with Python, Second Edition. (Recall that this book is available from the library’s O’Reilly database.)
Chollet’s example uses the simpler MNIST dataset, which includes only handwritten digits. That dataset is included with Keras.
Run the model from this notebook. What accuracy does it achieve for MNIST?


**Loading the MNIST dataset in Keras**

In [None]:
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()

In [None]:
print(f"train_images.shape  =  {train_images.shape}\n")
print(f"len(train_labels)  =  {len(train_labels)}\n")
print(f"train_labels  =  {train_labels}\n")
print(f"test_images.shape  =  {test_images.shape}\n")
print(f"len(test_labels)  =  {len(test_labels)}\n")
print(f"test_labels  =  {test_labels}")

**The network architecture**

In [None]:
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])

**The compilation step**

In [None]:
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

**Preparing the image data**

In [None]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255

**"Fitting" the model**

In [None]:
history_part1 = model.fit(train_images, train_labels, epochs=5, batch_size=128)

In [None]:
## Visualize the accuracy and loss

plt.plot(history_part1.history['accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.show()

plt.plot(history_part1.history['loss'], color='red')
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.show()



Note on results: the loss curve is not smooth; it looks like roughly three linear segments. Since we train for only five epochs, the plot connects just five points, so a piecewise-linear shape is expected.

**Using the model to make predictions**

In [None]:
test_digits = test_images[0:10]
predictions = model.predict(test_digits)

print(f"predictions[0]  =  {predictions[0]}\n")
print(f"predictions[0].argmax()  =  {predictions[0].argmax()}\n")
print(f"predictions[0][7]  =  {predictions[0][7]}\n")
print(f"test_labels[0]  =  {test_labels[0]}")

**Evaluating the model on new data**

In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"test_acc: {test_acc}")

## Answer to question 1:
As can be seen, the accuracy on the test set is about 98%.
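
To make the reported accuracy concrete, the sketch below shows how softmax outputs are turned into class predictions with argmax and then scored against integer labels. The scores and labels are made up for illustration and are not outputs of the model above; the helper names are hypothetical.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(probabilities, true_labels):
    """Fraction of rows whose most probable class equals the integer label."""
    hits = sum(1 for p, y in zip(probabilities, true_labels) if argmax(p) == y)
    return hits / len(true_labels)

# Hypothetical scores for three samples over three classes (not model output).
scores = [[2.0, 0.5, 0.1], [0.1, 0.2, 3.0], [1.0, 1.5, 0.2]]
labels = [0, 2, 0]
probs = [softmax(row) for row in scores]
acc = accuracy(probs, labels)
print(acc)
```

Here two of the three predictions match their labels, so the accuracy is 2/3. The accuracy metric reported by model.evaluate is the same kind of ratio, computed over the whole test set.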

## Question 2 and Question 3
Load the EMNIST Letters dataset, and use plt.imshow() to verify that the image data has been loaded correctly and that the corresponding labels are correct.

**Loading the EMNIST letters dataset**

In [None]:
(train_images, train_labels), (validate_images, validate_labels), (test_images, test_labels) = emnist_load_data()

In [None]:
print("------------------------------ Training set -----------------------------\n")
print(f"train_images.shape  =  {train_images.shape}\n")
print(f"len(train_labels)  =  {len(train_labels)}\n")
print(f"train_labels  =  {train_labels}\n")

print("------------------------------ Validation set -----------------------------\n")
print(f"validate_images.shape  =  {validate_images.shape}\n")
print(f"len(validate_labels)  =  {len(validate_labels)}\n")
print(f"validate_labels  =  {validate_labels}\n")

print("------------------------------ Test set -----------------------------\n")
print(f"test_images.shape  =  {test_images.shape}\n")
print(f"len(test_labels)  =  {len(test_labels)}\n")
print(f"test_labels  =  {test_labels}")

In [None]:
# Visualize an image in the dataset
visualize_image(train_images[100].reshape(28, 28), train_labels[100])

**The network architecture**

In [None]:
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(26, activation="softmax")
])

**The compilation step**

In [None]:
model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

**"Fitting" the model**

In [None]:
history_part2 = model.fit(train_images, train_labels, epochs=5, batch_size=128)

In [None]:
## Visualize the accuracy and loss

plt.plot(history_part2.history['accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.show()

plt.plot(history_part2.history['loss'], color='red')
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.show()


**Using the model to make predictions**

In [None]:
test_digits = test_images[0:10]

predictions = model.predict(test_digits)

print(f"predictions[0]  =  {predictions[0]}\n")
print(f"predictions[0].argmax()  =  {predictions[0].argmax()}\n")
print(f"predictions[0][0]  =  {predictions[0][0]}\n")
print(f"test_labels[0]  =  {test_labels[0]}")

**Evaluating the model on new data**

In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"test_acc: {test_acc}")

## Answer to question 2 and 3:
As can be seen, the accuracy on the test set is nearly 90%.

Compared with the previous model (MNIST dataset), the accuracy of the current model on the EMNIST Letters dataset is about **8 percentage points lower**. This suggests that the same small dense network does not capture the EMNIST Letters data as well as it captures MNIST, which is plausible given that it now has to separate 26 classes instead of 10.
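
One detail that differs between the two models is the loss: the MNIST model is compiled with sparse_categorical_crossentropy because its labels are integers, while the EMNIST model uses categorical_crossentropy because its labels are one-hot rows. The sketch below, written with hypothetical probabilities and simplified helper functions (not the Keras implementations), suggests why the two give the same value for the same sample.

```python
import math

def categorical_crossentropy(one_hot, probs):
    """Loss for a one-hot target: -sum(t * log(p))."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)

def sparse_categorical_crossentropy(class_index, probs):
    """Loss for an integer target: -log(p[class_index])."""
    return -math.log(probs[class_index])

# Hypothetical predicted probabilities over 3 classes (illustration only).
probs = [0.7, 0.2, 0.1]
loss_sparse = sparse_categorical_crossentropy(0, probs)
loss_onehot = categorical_crossentropy([1, 0, 0], probs)
print(loss_sparse, loss_onehot)
```

Both calls reduce to the negative log of the probability assigned to the true class, so the choice between them depends only on how the labels are encoded.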

## Question 4
The Keras examples include a Simple MNIST convnet. Note the accuracy obtained by that code compared to the previous example from Chollet.
Apply the same architecture to the EMNIST Letters data. (Again, you are welcome to implement an equivalent architecture in PyTorch instead). What accuracy do you achieve? How does this compare with the accuracy for the MNIST? How does it compare with the accuracy for EMNIST that you saw with a Dense network in step (3)?


In [None]:
# Delete the contents of the log directory
# remove_dir_contents(logs_dir)

# Delete the contents of the model directory
# remove_dir_contents(models_dir)

In [None]:
# load emnist dataset into training, validation, and test sets
(train_images, train_labels), (validate_images, validate_labels), (test_images, test_labels) = emnist_load_data()

In [None]:
# Make sure images have shape (28, 28, 1)
train_images = train_images.reshape((-1, 28, 28, 1))
test_images = test_images.reshape((-1, 28, 28, 1))
validate_images = validate_images.reshape((-1, 28, 28, 1))

print("train_images shape:", train_images.shape)
print(train_images.shape[0], "train samples")
print(test_images.shape[0], "test samples")

In [None]:
# Visualize an image in the dataset
visualize_image(train_images[100], train_labels[100])

**Build the model**

In [None]:
# Model / data parameters
num_classes = 26
input_shape = (28, 28, 1)

model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()

**Train the model**

In [None]:
batch_size = 128
epochs = 15

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

logdir = os.path.join(logs_dir, "logs_q5", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

model.fit(train_images, train_labels, batch_size=batch_size, epochs=epochs, validation_data=(validate_images, validate_labels), callbacks=[tensorboard_callback])

In [None]:
logdir = os.path.join(logs_dir, "logs_q5")
%tensorboard --port 6004 --logdir $logdir

**Evaluate the trained model**

In [None]:
score = model.evaluate(test_images, test_labels, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

## Answer to question 4 (Part 1)
Comparing the fully connected network (FCNN) and the CNN on the MNIST dataset, the CNN outperforms the FCNN by about 1 percentage point, according to the Simple MNIST convnet example and Francois Chollet's Simple Multilayer Perceptron.

-

Applying the same architecture as the Simple MNIST convnet example to the EMNIST Letters data, we obtain nearly 93% accuracy.

-

Comparing the CNN across the two datasets, its accuracy is 93% on the EMNIST Letters data versus 99% on the MNIST data.

-

On the EMNIST Letters data, the CNN model outperforms the FCNN model by over 3 percentage points.
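
For reference, the layer shapes and parameter count of the convnet used in this question can be worked out by hand. The sketch below assumes the Keras defaults that the model relies on (unpadded "valid" convolutions with stride 1, and pooling that halves each dimension, rounding down); the helper names are hypothetical.

```python
def conv_out(size, kernel):
    """Output width/height of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weight matrix plus one bias per unit."""
    return inputs * units + units

size = 28                  # 28x28x1 input
size = conv_out(size, 3)   # Conv2D(32, 3x3)   -> 26
size = size // 2           # MaxPooling2D(2x2) -> 13
size = conv_out(size, 3)   # Conv2D(64, 3x3)   -> 11
size = size // 2           # MaxPooling2D(2x2) -> 5
flat = size * size * 64    # Flatten           -> 1600

total = conv_params(3, 1, 32) + conv_params(3, 32, 64) + dense_params(flat, 26)
print(flat, total)
```

Under these assumptions the model has 60,442 parameters, which should agree with the total printed by model.summary().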

# Part 2 - Main Event

Part 2 deals with training our own convolutional neural network to perform image recognition. As mentioned later on, each member trained at least two models in this section. When we worked in Colab, the results were automatically saved to Google Drive. We then selected the best model across all of the training runs.

## Question 5
Add TensorBoard support to the CNN model you run in Part 1, and add TensorBoard to your notebook to visualize the training process.

We are adding TensorBoard to help us avoid running into dead ends while training. Because training becomes much slower in this part, it is imperative that we catch mistakes early.

-

**Answer:**

***We have added TensorBoard support to the previous part.***

## Question 6

With the baseline CNN in place, we experimented with different architectures to obtain the highest possible accuracy on the validation set.

For this stage, each member of the team trained multiple models with different parameters. We then compared our accuracies in order to find the best model.

The final model we settled on is shown below. This model produced the best accuracy out of our individual attempts, and also did not take as long to train as some other models.

In [None]:
# load emnist dataset into training, validation, and test sets
(train_images, train_labels), (validate_images, validate_labels), (test_images, test_labels) = emnist_load_data()

In [None]:
# Make sure images have shape (28, 28, 1)
train_images = train_images.reshape((-1, 28, 28, 1))
test_images = test_images.reshape((-1, 28, 28, 1))
validate_images = validate_images.reshape((-1, 28, 28, 1))

print("train_images shape:", train_images.shape)
print(train_images.shape[0], "train samples")
print(test_images.shape[0], "test samples")

In [None]:
# Visualize an image in the dataset
visualize_image(train_images[100], train_labels[100])

**Build the Model**

During our tuning phase, we edited this code block to try different hyperparameters to obtain the highest accuracy.

The model built below was deemed to be the best model from our training.

In [None]:
# Model / data parameters
num_classes = 26
input_shape = (28, 28, 1)
epochs = 15
batch_size = 256

# This model's name is used to answer the following questions in parts 2 and 3.
pretrained_model_name = f"model_ep_{epochs}_bs_{batch_size}_adam"

In [None]:
model = keras.models.Sequential(
    [keras.Input(shape=input_shape),
    layers.Conv2D(64, kernel_size=(5, 5), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(128, kernel_size=(3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(2048, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1024, activation="relu"),
    layers.Dense(num_classes, activation="softmax")]
    )

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(learning_rate=0.0004),
              metrics=['accuracy'])

# Use the ModelCheckpoint callback to train the model and save the best model in terms of validation accuracy.
checkpoint_filepath = os.path.join(models_dir, f'{pretrained_model_name}.h5')
checkpoint = ModelCheckpoint(checkpoint_filepath, monitor='val_accuracy', mode="max", save_best_only=True)

# Use the TensorBoard callback to save logs
logdir = os.path.join(logs_dir, "logs_q6", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

history = model.fit(x=train_images,y=train_labels,
                    batch_size=batch_size,
                    epochs=epochs,

                    validation_data=(validate_images, validate_labels),
                    callbacks=[tensorboard_callback, checkpoint])


**Logging**

In order to retain the logs created while training, we set up a log directory in a shared Google Drive. The TensorBoard callback above writes its logs there, and the following block launches TensorBoard on that directory.

In [None]:
logdir = os.path.join(logs_dir, "logs_q6")

%tensorboard --port 6003 --logdir  $logdir

## Question 7:
Load and Evaluate the Model

-

Now that our model is trained and saved to the drive, we can load the best checkpoint automatically. After we load the model, we evaluate it on the test set to see the test accuracy.

In [None]:
# Load the best model and evaluate on the test set

checkpoint_filepath = os.path.join(models_dir, f'{pretrained_model_name}.h5')
best_model = keras.models.load_model(checkpoint_filepath)
test_loss, test_acc = best_model.evaluate(test_images, test_labels)
print(f'Model - Test accuracy: {test_acc}')
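
The checkpoint loaded above comes from the epoch with the highest validation accuracy, because ModelCheckpoint was configured with monitor='val_accuracy', mode="max", and save_best_only=True. The selection rule can be sketched in plain Python as follows; the accuracies are made up for illustration, and best_epoch_checkpoints is a hypothetical helper, not part of Keras.

```python
def best_epoch_checkpoints(val_accuracies):
    """Return the 1-based epochs at which a 'save best only' rule would save."""
    best = float("-inf")
    saved_at = []
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:  # mode="max": save only on a strict improvement
            best = acc
            saved_at.append(epoch)
    return saved_at, best

# Hypothetical validation accuracies for five epochs (not measured values).
history = [0.88, 0.91, 0.90, 0.93, 0.92]
saved_at, best = best_epoch_checkpoints(history)
print(saved_at, best)
```

In this example the file would be overwritten at epochs 1, 2, and 4, so the model left on disk is the one from epoch 4, even though training continued to epoch 5.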

# Part 3: Transfer Learning

Here, we use transfer learning to see whether the model we trained on EMNIST Letters can be adapted to a different dataset.

## Question 8:
The process of transfer learning can be used to apply an existing model to a new dataset. See Transfer learning & fine-tuning in the Keras Developer Guide or the Transfer Learning for Computer Vision Tutorial in the PyTorch Tutorials.
The images in the Binary Alphadigits dataset are a different size from those in EMNIST Letters. Use a function like tf.image.resize_with_pad(), PIL.ImageOps.pad(), or the PyTorch torchvision.transforms.Resize class to resize them into the right format for the network you trained in Part 2.

### Loading New Dataset

Here, we are going to load the new dataset, and visualize it.

This will be the dataset that we use for transfer learning with our trained model from earlier.

In [None]:
# The archive contains only two arrays: images and labels
with np.load(os.path.join(data_dir, "binaryalphadigs.npz")) as f:
  (ad_images, ad_labels) = f["images"], f["labels"]

In [None]:
# what is the number, size, and shape of this dataset?
print('img shape ', ad_images.shape)
print('lbl shape ', ad_labels.shape)

print('trim unused class from labels')
# Remove first class (index 0)
ad_labels = np.delete(ad_labels, 0, axis=1)

# what is the number, size, and shape of this dataset after removing the first class?
print('img shape ', ad_images.shape)
print('lbl shape ', ad_labels.shape)

print('img zero ', ad_images[0])
print('lbl zero ', ad_labels[0])

print('img zero shape ', ad_images[0].shape)
print('lbl zero shape ', ad_labels[0].shape)

In [None]:
# Visualize an image in the dataset
visualize_image(ad_images[1000].reshape(20, 16), ad_labels[1000])

NOTE FROM THE ASSIGNMENT:<br>
**Note, however, that the resolution of the images is different in this dataset: 20×16 rather than 28×28.**
<br>
So we have to resize our images, which we do with tf.image.resize_with_pad (https://www.tensorflow.org/api_docs/python/tf/image/resize_with_pad).

The end result is an array of 28×28×1 images, matching the dimensions of the EMNIST images, so it is compatible with our model from earlier.
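
To show what resizing with padding does to a 20×16 image, here is a rough pure-Python sketch: scale the image as far as it fits inside 28×28 while keeping its aspect ratio, then centre it on a zero canvas. It is an illustration only and not TensorFlow's implementation, so individual pixel values may differ from what tf.image.resize_with_pad produces; resize_with_pad_nearest is a hypothetical name.

```python
def resize_with_pad_nearest(image, target_h, target_w):
    """Scale a 2D list to fit the target (aspect ratio kept), then zero-pad."""
    src_h, src_w = len(image), len(image[0])
    # Largest size that fits inside the target while keeping the aspect ratio.
    if src_h * target_w >= src_w * target_h:   # height is the limiting side
        new_h, new_w = target_h, src_w * target_h // src_h
    else:                                      # width is the limiting side
        new_h, new_w = src_h * target_w // src_w, target_w
    # Nearest-neighbour sampling: each output pixel copies one source pixel.
    resized = [
        [image[r * src_h // new_h][c * src_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]
    # Centre the resized image on a zero canvas.
    top, left = (target_h - new_h) // 2, (target_w - new_w) // 2
    canvas = [[0] * target_w for _ in range(target_h)]
    for r in range(new_h):
        canvas[top + r][left:left + new_w] = resized[r]
    return canvas

# A hypothetical all-ones 20x16 "image" makes the padding easy to see.
padded = resize_with_pad_nearest([[1] * 16 for _ in range(20)], 28, 28)
print(len(padded), len(padded[0]), padded[0][:4], padded[0][-4:])
```

In this sketch the 20×16 image becomes 28×22 and receives three columns of zero padding on each side, which is why the letters appear narrower than the EMNIST ones after preprocessing.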

In [None]:
def preprocess_image(target_image):
    reshaped_img = tf.reshape(tf.convert_to_tensor(target_image, dtype=tf.float32), (20, 16, 1))
    return tf.image.resize_with_pad(
        image=reshaped_img,
        target_height=28,
        target_width=28,
        method=tf.image.ResizeMethod.NEAREST_NEIGHBOR
    )

def preprocess_batch(images_batch):
    # fn_output_signature replaces the deprecated dtype argument of tf.map_fn
    return tf.map_fn(preprocess_image, images_batch, fn_output_signature=tf.float32)


# Convert the dataset to TensorFlow tensors
ad_images = tf.convert_to_tensor(ad_images, dtype=tf.float32)

# resize the images
ad_images_processed = preprocess_batch(ad_images)

# Convert the dataset to numpy array
ad_images_processed = ad_images_processed.numpy()

In [None]:
visualize_image(ad_images_processed[100], ad_labels[100])

## Question 9:
Is the model you trained in Part 2 capable of recognizing letters from this new dataset?

-
## Using Model from Checkpoint
Now, with the model in hand from the saved checkpoint, let's load it and evaluate it, unchanged, on the new dataset.

In [None]:
# Split data into 80% train and 20% test subsets
train_images, test_images, train_labels, test_labels = train_test_split(ad_images_processed, ad_labels, test_size=0.2, random_state=42)

In [None]:
# Load our pre-trained model based on prev evaluation
checkpoint_filepath = os.path.join(models_dir, f'{pretrained_model_name}.h5')

pretrained_model = keras.models.load_model(checkpoint_filepath)

pretrained_model.summary()

In [None]:
# Evaluate our pretrained model on the test set of the new dataset
test_loss, test_acc = pretrained_model.evaluate(test_images, test_labels)
print(f'Model - Test accuracy: {test_acc}')

## Answer to question 9:
As can be seen, the accuracy on the test set is 81%, so the model is able to recognize the images in the new dataset. However, the accuracy is not as good as on the original dataset (EMNIST Letters).

## Question 10:
Can you improve the performance on this dataset by adding additional trainable layers and fine-tuning the network?

### Train_Test_split technique

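Here, we apply the train/test split technique so that the model is evaluated on data it never saw during training, which lets us detect overfitting instead of fitting to the entire dataset. We also hold out part of the training data as a validation set, which we use to tune hyperparameters and to pick the best checkpoint.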
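
As a quick worked example of how the data ends up being divided, the sketch below combines the two split fractions used in this section: test_size=0.2 in train_test_split and validation_split=0.3 in the fit call. It computes fractions only; the exact sample counts depend on the dataset size and on how each library rounds, which is not reproduced here.

```python
test_size = 0.2          # train_test_split(..., test_size=0.2)
validation_split = 0.3   # new_model.fit(..., validation_split=0.3)

# Fraction of the full dataset left after the test set is held out.
train_val_fraction = 1 - test_size
# Keras takes the validation part out of that remainder.
val_fraction = train_val_fraction * validation_split
fit_fraction = train_val_fraction - val_fraction

print(round(fit_fraction, 2), round(val_fraction, 2), round(test_size, 2))
```

So roughly 56% of the Binary Alphadigits images are used to update weights, 24% to select the checkpoint, and 20% for the final test accuracy.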

In [None]:
# Load our pre-trained model based on prev evaluation
checkpoint_filepath = os.path.join(models_dir, f'{pretrained_model_name}.h5')
pretrained_model = keras.models.load_model(checkpoint_filepath)

# Remove the last layer(s) - We only remove the output layer.
pretrained_model.pop()

# Make sure all layers in the pre-trained model are not trainable
for layer in pretrained_model.layers:
    layer.trainable = False

In [None]:
# Model / data parameters
num_classes = 26
epochs = 15
batch_size = 8
new_ds_best_model_name = f"new_ds_model_ep_{epochs}_bs_{batch_size}_adam"

# Create a new model with the modified pre-trained model and the new classification layer
new_model = tf.keras.Sequential([
    pretrained_model,
    # layers.Dense(1024, activation='relu'),
    # layers.Dropout(0.5),
    # layers.Dense(512, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(num_classes, activation="softmax")
])

# Train the model with the ModelCheckpoint callback to save the best model in terms of validation accuracy
checkpoint_filepath = os.path.join(models_dir, f'{new_ds_best_model_name}.h5')
checkpoint = ModelCheckpoint(checkpoint_filepath, monitor='val_accuracy', mode="max", save_best_only=True)

# Use the TensorBoard callback to save logs
logdir = os.path.join(logs_dir, "logs_q10", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

# Compile the new model
new_model.compile(optimizer=Adam(learning_rate=0.0005), loss="categorical_crossentropy", metrics=["accuracy"])


# Train the new model on the new dataset
new_model.fit(train_images, train_labels, epochs=epochs, batch_size=batch_size, validation_split=0.3, callbacks=[tensorboard_callback, checkpoint])

In [None]:
logdir = os.path.join(logs_dir, "logs_q10")

%tensorboard --port 6002 --logdir $logdir

In [None]:
# Load the best model
checkpoint_filepath = os.path.join(models_dir, f'{new_ds_best_model_name}.h5')
best_model = keras.models.load_model(checkpoint_filepath)

# Evaluate our model on the test set
test_loss, test_acc = best_model.evaluate(test_images, test_labels)
print(f'Model - Test accuracy: {test_acc}')

As demonstrated above, using a pre-trained model (trained on the EMNIST Dataset) and re-training only the output layer improved accuracy from 81% to 92%. Training the output layer with the Binary Alphadigits dataset and adjusting hyperparameters appear to have aided in the accuracy improvement.
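
One way to see why re-training only the output layer is feasible on such a small dataset is to count the weights involved. The sketch below derives the counts from the Question 6 architecture, assuming default "valid" convolutions (so the feature map is 5×5×128 before flattening); the helper names are hypothetical.

```python
def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weight matrix plus one bias per unit."""
    return inputs * units + units

# Frozen base: the Question 6 model with its output layer popped.
flat = 5 * 5 * 128  # 28 -> 24 -> 12 -> 10 -> 5 after two conv/pool stages
frozen = (
    conv_params(5, 1, 64)
    + conv_params(3, 64, 128)
    + dense_params(flat, 2048)
    + dense_params(2048, 1024)
)

# New head: Dropout has no weights, so only Dense(26) is trainable.
trainable = dense_params(1024, 26)

print(frozen, trainable, round(100 * trainable / (frozen + trainable), 2))
```

Under these assumptions only 26,650 of about 8.76 million parameters (roughly 0.3%) are updated, so the new dataset only has to fit a small linear classifier on top of features learned from EMNIST.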

-

In this evaluation, we use the train/test split technique to evaluate the model. However, due to the small size of our dataset, we will additionally use the k-fold cross validation technique to evaluate the network and obtain a more reliable estimate.

### K-fold cross validation technique

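K-fold cross validation splits the training data into k folds, trains on k-1 of them, and validates on the remaining fold, repeating this until every fold has served as the validation set once. Averaging the k validation scores gives an estimate of how the model performs on new data.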
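
The index bookkeeping behind k-fold cross validation can be sketched in a few lines of plain Python. This is an illustration of the idea only, not scikit-learn's KFold implementation (the shuffling in particular will differ), and k_fold_indices is a hypothetical helper.

```python
import random

def k_fold_indices(n_samples, k, seed=42):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

# Hypothetical dataset of 11 samples split into 5 folds.
folds = list(k_fold_indices(11, 5))
for train_idx, val_idx in folds:
    print(len(train_idx), len(val_idx))
```

Each sample lands in exactly one validation fold and in the training portion of the other four, which is what makes the averaged score less sensitive to a single lucky or unlucky split.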



In [None]:
# Load our pre-trained model based on prev evaluation
checkpoint_filepath = os.path.join(models_dir, f'{pretrained_model_name}.h5')
pretrained_model = keras.models.load_model(checkpoint_filepath)

# Remove the last layer(s)
pretrained_model.pop()

# Make sure all layers in the pre-trained model are not trainable
for layer in pretrained_model.layers:
    layer.trainable = False

In [None]:
# Model / data parameters
num_classes = 26

# Helper function to create a model
def create_model():
  # Create a new model with the modified pre-trained model and the new classification layer
  new_model = tf.keras.Sequential([
      pretrained_model,
      # layers.Dense(1024, activation='relu'),
      # layers.Dropout(0.5),
      # layers.Dense(512, activation='relu'),
      layers.Dropout(0.3),
      layers.Dense(num_classes, activation="softmax")
  ])

  # Compile the new model
  new_model.compile(optimizer=Adam(learning_rate=0.0005), loss="categorical_crossentropy", metrics=["accuracy"])

  return new_model


## Decisions

Similar to training a model, we had to make some decisions for the k-fold cross validation.

We settled on 5 folds, 15 epochs, and a batch_size of 8 because the combination produced the best results. We tried out other hyperparameters but could not match the high accuracy produced by this combination.

For the folds specifically, we also tried 10 folds and 15 folds. Since the dataset is so small, however, we settled on 5 folds, which was sufficient.

In [None]:
k = 5  # Number of folds
epochs = 15
batch_size = 8

kfold = KFold(n_splits=k, shuffle=True, random_state=42)

scores = []
fold = 1
best_val_accuracy = 0

new_ds_best_model_name = f"new_ds_model_k_fold_cross_val_ep_{epochs}_bs_{batch_size}_adam"
best_model_path = os.path.join(models_dir, f'{new_ds_best_model_name}.h5')

for train_index, val_index in kfold.split(train_images, train_labels):
    print(f"Fold {fold}/{k}")
    X_train, X_val = train_images[train_index], train_images[val_index]
    y_train, y_val = train_labels[train_index], train_labels[val_index]

    model = create_model()

    checkpoint = ModelCheckpoint(f"best_model_fold_{fold}.h5", monitor='val_accuracy', save_best_only=True, mode='max')

    model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=epochs, batch_size=batch_size, callbacks=[checkpoint])

    # Load the best model for the current fold
    best_fold_model = keras.models.load_model(f"best_model_fold_{fold}.h5")
    val_score = best_fold_model.evaluate(X_val, y_val, verbose=0)
    print(f"Best validation accuracy for fold {fold}: {val_score[1]}")

    # Maintain the best model in terms of validation accuracy across all folds.
    if val_score[1] > best_val_accuracy:
        best_val_accuracy = val_score[1]
        best_fold_model.save(best_model_path)

    scores.append(val_score[1])
    fold += 1

average_val_accuracy = np.mean(scores)
print(f"Average validation accuracy across {k} folds: {average_val_accuracy}")

In [None]:
# Load the best model
best_model = keras.models.load_model(best_model_path)

# Evaluate our model on the test set
test_loss, test_acc = best_model.evaluate(test_images, test_labels)
print(f'Model - Test accuracy: {test_acc}')

Performance comparison:

K-fold cross validation obtains 91% accuracy on the test set, which is roughly the same as the accuracy from the train/test split (92%).

## Answer to question 10:
After removing the output layer from the pre-trained model and training a new output layer on the new dataset, we achieved an accuracy of nearly 91% on the test set with both the 5-fold cross validation and train/test split techniques. The fine-tuned model therefore improves on the pre-trained model alone by approximately 10 percentage points.

## Question 11:
Compare the performance of the model you built in step (3) with the performance of a brand-new model trained only on the Binary AlphaDigits dataset.

-

Here, we build a brand-new model, train it on the Binary AlphaDigits dataset, and evaluate it using 5-fold cross validation.


In [None]:
# Reload the dataset in case it is not already loaded in the notebook
with np.load(os.path.join(data_dir, "binaryalphadigs.npz")) as f:
  (ad_images, ad_labels) = f["images"], f["labels"]

# Remove first class (index 0)
ad_labels = np.delete(ad_labels, 0, axis=1)

In [None]:
def preprocess_image(target_image):
    reshaped_img = tf.reshape(tf.convert_to_tensor(target_image, dtype=tf.float32), (20, 16, 1))
    return tf.image.resize_with_pad(
        image=reshaped_img,
        target_height=28,
        target_width=28,
        method=tf.image.ResizeMethod.NEAREST_NEIGHBOR
    )

def preprocess_batch(images_batch):
    # fn_output_signature replaces the deprecated dtype argument of tf.map_fn
    return tf.map_fn(preprocess_image, images_batch, fn_output_signature=tf.float32)


# Convert the dataset to TensorFlow tensors
ad_images = tf.convert_to_tensor(ad_images, dtype=tf.float32)

# resize the images
ad_images_processed = preprocess_batch(ad_images)

# Convert the dataset to numpy array
ad_images_processed = ad_images_processed.numpy()

In [None]:
# Split data into 80% train and 20% test subsets
train_images, test_images, train_labels, test_labels = train_test_split(ad_images_processed, ad_labels, test_size=0.2, random_state=42)

In [None]:
# Model / data parameters
num_classes = 26
input_shape = (28, 28, 1)

# Helper function to create a model
def create_model():
  ad_only_layers = tf.keras.Sequential(
    [keras.Input(shape=input_shape),
    layers.Conv2D(64, kernel_size=(5, 5), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(128, kernel_size=(3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(1024, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation="softmax")]
    )

  # Compile the new model
  ad_only_layers.compile(optimizer=Adam(learning_rate=0.001), loss="categorical_crossentropy", metrics=["accuracy"])

  return ad_only_layers


In [None]:
k = 5  # Number of folds
epochs = 10
batch_size = 8

kfold = KFold(n_splits=k, shuffle=True, random_state=42)
scores = []
fold = 1
best_val_accuracy = 0

new_ds_best_model_name = f"new_ds_model_k_fold_cross_val_ep_{epochs}_bs_{batch_size}_adam_without_transfer_learning"
best_model_path = os.path.join(models_dir, f'{new_ds_best_model_name}.h5')

for train_index, val_index in kfold.split(train_images, train_labels):
    print(f"Fold {fold}/{k}")
    X_train, X_val = train_images[train_index], train_images[val_index]
    y_train, y_val = train_labels[train_index], train_labels[val_index]

    model = create_model()

    checkpoint = ModelCheckpoint(f"best_model_fold_{fold}.h5", monitor='val_accuracy', save_best_only=True, mode='max')

    model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=epochs, batch_size=batch_size, callbacks=[checkpoint])

    # Load the best model for the current fold
    best_fold_model = keras.models.load_model(f"best_model_fold_{fold}.h5")
    val_score = best_fold_model.evaluate(X_val, y_val, verbose=0)
    print(f"Best validation accuracy for fold {fold}: {val_score[1]}")

    # Maintain the best model in terms of validation accuracy across all folds.
    if val_score[1] > best_val_accuracy:
        best_val_accuracy = val_score[1]
        best_fold_model.save(best_model_path)

    scores.append(val_score[1])
    fold += 1

average_val_accuracy = np.mean(scores)
print(f"Average validation accuracy across {k} folds: {average_val_accuracy}")

In [None]:
# Load the best model
best_model = keras.models.load_model(best_model_path)

# Evaluate our model on the test set
test_loss, test_acc = best_model.evaluate(test_images, test_labels)
print(f'Model - Test accuracy: {test_acc}')

## Answer to question 11:
As can be observed, the brand-new model trained solely on the Binary AlphaDigits dataset achieves a test-set accuracy of nearly 83%. The brand-new model therefore outperforms the pre-trained model without fine-tuning (81%) by about 2 percentage points. However, its accuracy is around 8 percentage points lower than that of the fine-tuned model, which is based on the pre-trained model.