
# **ECE 449 Lab 3: Matthew Lai**
In this lab, our goal is to apply Convolutional Neural Networks (CNNs) to a well-known dataset: the MNIST handwritten digits database, which is distributed with Keras. We design and parameterize a basic CNN using a stratified cross-validation experimental approach, and then evaluate its performance on a predetermined out-of-sample test set.

## Section 1: Data Collection
In this section, we load the MNIST dataset from Keras: 60,000 training images and 10,000 test images of handwritten digits, each a 28×28 grayscale image.

In [1]:
from tensorflow.keras.datasets import mnist

# Load the dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

## Section 2: Data Preprocessing
In this section, we one-hot encode the class labels (digits 0 to 9) so they can be used with the categorical cross-entropy loss during training.

In [3]:
from tensorflow.keras.utils import to_categorical

# One-hot encode the labels: each digit 0-9 becomes a length-10 vector
y_train_encoded = to_categorical(y_train)
y_test_encoded = to_categorical(y_test)
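
`to_categorical` turns each integer label into a length-10 vector containing a single 1 at the label's index. A minimal NumPy sketch of the same idea, assuming integer labels in the range 0 to 9 (`one_hot` is a hypothetical helper, not a Keras function):

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a 1 in each label's column.

    Hypothetical helper illustrating the 0/1 layout produced by to_categorical.
    """
    labels = np.asarray(labels, dtype=int)
    # Indexing an identity matrix by label picks out the matching one-hot row
    return np.eye(num_classes)[labels]

encoded = one_hot([5, 0, 4])
print(encoded.shape)  # one row per label, one column per class
print(encoded[0])     # 1.0 in column 5, 0.0 elsewhere
```

Indexing the identity matrix with the label array selects one row per label, which is the same 0/1 layout that `to_categorical` returns for `y_train`.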

## Section 3: Model Training
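In this section, we evaluate four hyperparameter combinations (number of convolutional filters and learning rate) using stratified 5-fold cross-validation on the training set.
After comparing them, we select the best-performing combination and retrain the model with it on the full training set.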


In [10]:
from tensorflow.keras import layers, models
import tensorflow
from sklearn.model_selection import StratifiedKFold
import numpy as np


# Set up our hyperparameters
hyperparameters = [
    {'filters': 16, 'learning_rate': 0.001},
    {'filters': 16, 'learning_rate': 0.01},
    {'filters': 32, 'learning_rate': 0.001},
    {'filters': 32, 'learning_rate': 0.01}
]

for params in hyperparameters:
  # Model setup
  model = models.Sequential()

  # Convolutional layer: 3x3 kernels; the number of filters is the hyperparameter being tuned
  model.add(layers.Conv2D(filters=params['filters'], kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))

  # Add a 2x2 max-pooling layer to reduce the spatial dimensions
  model.add(layers.MaxPooling2D(pool_size=(2, 2)))

  # Flatten the feature maps and pass them to a fully connected layer
  model.add(layers.Flatten())

  # Output layer: maps the flattened features to 10 class probabilities (there is no hidden dense layer)
  model.add(layers.Dense(10, activation='softmax'))
  optimizer = tensorflow.keras.optimizers.Adam(learning_rate=params['learning_rate'])
  model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

  # Train the model using stratified 5-fold cross validation

  skf = StratifiedKFold(n_splits=5)
  accuracies = []
  for train_index, val_index in skf.split(X_train, y_train):
      X_train_fold, X_val_fold = X_train[train_index], X_train[val_index]
      y_train_fold, y_val_fold = y_train_encoded[train_index], y_train_encoded[val_index]

      # Train the model
      model.fit(X_train_fold, y_train_fold, epochs=10, batch_size=128, verbose=1)

      # Evaluate on validation fold
      val_loss, val_accuracy = model.evaluate(X_val_fold, y_val_fold, verbose=0)
      accuracies.append(val_accuracy)

  print(f'Hyperparameters: {params}, Cross-validation accuracies: {accuracies}, Mean accuracy: {np.mean(accuracies)}')


Epoch 1/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 16s 35ms/step - accuracy: 0.7995 - loss: 6.6447
Epoch 2/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 20s 34ms/step - accuracy: 0.9544 - loss: 0.3912
Epoch 3/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 21s 34ms/step - accuracy: 0.9704 - loss: 0.1779
Epoch 4/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 20s 33ms/step - accuracy: 0.9768 - loss: 0.1103
Epoch 5/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 23s 41ms/step - accuracy: 0.9814 - loss: 0.0767
Epoch 6/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 12s 31ms/step - accuracy: 0.9840 - loss: 0.0605
Epoch 7/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 12s 31ms/step - accuracy: 0.9853 - loss: 0.0509
Epoch 8/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 13s 33ms/step - accuracy: 0.9901 - loss: 0.0322
Epoch 9/10
375/375

375/375 ━━━━━━━━━━━━━━━━━━━━ 14s 36ms/step - accuracy: 0.7530 - loss: 17.0274
Epoch 2/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 20s 35ms/step - accuracy: 0.9363 - loss: 0.2087
Epoch 3/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 13s 34ms/step - accuracy: 0.9501 - loss: 0.1678
Epoch 4/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 22s 37ms/step - accuracy: 0.9567 - loss: 0.1414
Epoch 5/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 13s 33ms/step - accuracy: 0.9587 - loss: 0.1298
Epoch 6/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 20s 31ms/step - accuracy: 0.9595 - loss: 0.1298
Epoch 7/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 21s 33ms/step - accuracy: 0.9603 - loss: 0.1301
Epoch 8/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 12s 32ms/step - accuracy: 0.9647 - loss: 0.1162
Epoch 9/10
375/375 ━━

375/375 ━━━━━━━━━━━━━━━━━━━━ 20s 50ms/step - accuracy: 0.8277 - loss: 3.8659
Epoch 2/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 17s 45ms/step - accuracy: 0.9702 - loss: 0.1672
Epoch 3/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 17s 45ms/step - accuracy: 0.9791 - loss: 0.0860
Epoch 4/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 21s 48ms/step - accuracy: 0.9867 - loss: 0.0467
Epoch 5/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 19s 49ms/step - accuracy: 0.9897 - loss: 0.0335
Epoch 6/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 21s 52ms/step - accuracy: 0.9917 - loss: 0.0264
Epoch 7/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 18s 48ms/step - accuracy: 0.9916 - loss: 0.0261
Epoch 8/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 19s 51ms/step - accuracy: 0.9916 - loss: 0.0260
Epoch 9/10
375/375 ━━━

375/375 ━━━━━━━━━━━━━━━━━━━━ 18s 46ms/step - accuracy: 0.7729 - loss: 20.8027
Epoch 2/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 17s 46ms/step - accuracy: 0.9480 - loss: 0.1664
Epoch 3/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 21s 47ms/step - accuracy: 0.9582 - loss: 0.1353
Epoch 4/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 18s 47ms/step - accuracy: 0.9609 - loss: 0.1279
Epoch 5/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 18s 47ms/step - accuracy: 0.9615 - loss: 0.1268
Epoch 6/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 21s 48ms/step - accuracy: 0.9653 - loss: 0.1154
Epoch 7/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 21s 49ms/step - accuracy: 0.9633 - loss: 0.1199
Epoch 8/10
375/375 ━━━━━━━━━━━━━━━━━━━━ 20s 48ms/step - accuracy: 0.9646 - loss: 0.1234
Epoch 9/10
375/375 ━━

Now that all four hyperparameter combinations have been evaluated, we can identify the best-performing one. We then retrain a model with those settings on the full training set and evaluate it on the test set.
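
Choosing the best combination amounts to taking the one with the highest mean fold accuracy. A minimal sketch, assuming the cross-validation loop were changed to collect a list of `(params, accuracies)` pairs instead of only printing them (`select_best` is a hypothetical helper):

```python
from statistics import mean

def select_best(results):
    """Return (params, mean_accuracy) for the entry with the highest mean fold accuracy.

    results: list of (params_dict, list_of_fold_accuracies) pairs (assumed structure).
    """
    best_params, best_accuracies = max(results, key=lambda item: mean(item[1]))
    return best_params, mean(best_accuracies)
```

Comparing means rather than single folds keeps one unusually easy or hard fold from deciding the choice.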

In [11]:
# Define the best hyperparameters from cross-validation
filter_hyper = 32
learning_rate_hyper = 0.001

# Create our final CNN
model = models.Sequential()
model.add(layers.Conv2D(filters=filter_hyper, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(10, activation='softmax'))

# Compile the model
optimizer = tensorflow.keras.optimizers.Adam(learning_rate=learning_rate_hyper)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train_encoded, epochs=10, batch_size=128, verbose=1)

# Test the model
test_loss, test_accuracy = model.evaluate(X_test, y_test_encoded, verbose=1)

# Output the test accuracy
print(f'Accuracy: {test_accuracy * 100:.2f}%')


Epoch 1/10
469/469 ━━━━━━━━━━━━━━━━━━━━ 23s 46ms/step - accuracy: 0.8463 - loss: 3.9948
Epoch 2/10
469/469 ━━━━━━━━━━━━━━━━━━━━ 40s 45ms/step - accuracy: 0.9678 - loss: 0.2074
Epoch 3/10
469/469 ━━━━━━━━━━━━━━━━━━━━ 40s 43ms/step - accuracy: 0.9799 - loss: 0.0894
Epoch 4/10
469/469 ━━━━━━━━━━━━━━━━━━━━ 21s 44ms/step - accuracy: 0.9842 - loss: 0.0624
Epoch 5/10
469/469 ━━━━━━━━━━━━━━━━━━━━ 42s 46ms/step - accuracy: 0.9886 - loss: 0.0374
Epoch 6/10
469/469 ━━━━━━━━━━━━━━━━━━━━ 42s 48ms/step - accuracy: 0.9901 - loss: 0.0324
Epoch 7/10
469/469 ━━━━━━━━━━━━━━━━━━━━ 40s 47ms/step - accuracy: 0.9906 - loss: 0.0281
Epoch 8/10
469/469 ━━━━━━━━━━━━━━━━━━━━ 42s 48ms/step - accuracy: 0.9903 - loss: 0.0299
Epoch 9/10
469/469 ━━━

## Section 4: Model Evaluation
After evaluating all four hyperparameter combinations, we retrained the model on the full training set using the best-performing combination and then evaluated it on the test set, with the following results.

Our best-performing hyperparameter combination was:
- Filters: 32
- Learning rate: 0.001

Cross-Validation Evaluation (stratified 5-fold, training set):

Cross-validation accuracies: [0.9741666913032532, 0.9856666922569275, 0.9872499704360962, 0.9890000224113464, 0.9961666464805603]

Mean accuracy: 0.9864500045776368
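
As a check on the figures above, the mean and spread of the five fold accuracies can be recomputed with Python's standard library. The values are copied from the cross-validation output; using `pstdev` (population standard deviation) is a choice made here, not something the lab specified:

```python
from statistics import mean, pstdev

# Fold accuracies reported above for 32 filters and a 0.001 learning rate
fold_accuracies = [0.9741666913032532, 0.9856666922569275, 0.9872499704360962,
                   0.9890000224113464, 0.9961666464805603]

print(f"mean  = {mean(fold_accuracies):.5f}")
print(f"std   = {pstdev(fold_accuracies):.5f}")
print(f"range = {min(fold_accuracies):.4f} to {max(fold_accuracies):.4f}")
```

The mean agrees with the reported 0.98645, and the lowest and highest folds differ by about 2.2 percentage points.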

Test Set Evaluation:

Test Accuracy: 97.58% (0.9758)

## Section 5: Final Report
The architecture of my CNN closely follows the model built in the lab session, where we set up the model and explored various settings. It consists of a single convolutional layer with 3×3 kernels and ReLU activation, whose number of filters is one of the tuned hyperparameters; a 2×2 max-pooling layer that reduces the spatial dimensions of the feature maps; a flatten layer that prepares the data for the fully connected layer; and a 10-unit output layer with softmax activation.
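
The layer sizes described above can be checked by hand. A sketch that computes each layer's output shape and trainable-parameter count, assuming the Keras defaults used in the code ('valid' padding and stride 1 for the convolution, and a pooling stride equal to the pool size); `cnn_shapes_and_params` is a hypothetical helper:

```python
def cnn_shapes_and_params(filters, image_size=28, kernel_size=3, pool_size=2,
                          in_channels=1, num_classes=10):
    """Shapes and trainable-parameter counts for Conv2D -> MaxPooling2D -> Flatten -> Dense."""
    conv_size = image_size - kernel_size + 1          # 'valid' convolution: 28 - 3 + 1 = 26 with defaults
    pooled_size = conv_size // pool_size              # non-overlapping pooling: 26 // 2 = 13 with defaults
    flat_units = pooled_size * pooled_size * filters  # Flatten has no parameters
    # Each filter has kernel_size * kernel_size * in_channels weights plus one bias
    conv_params = kernel_size * kernel_size * in_channels * filters + filters
    # Dense layer: one weight per (input unit, class) pair plus one bias per class
    dense_params = flat_units * num_classes + num_classes
    return {
        "conv_output": (conv_size, conv_size, filters),
        "pool_output": (pooled_size, pooled_size, filters),
        "flatten_units": flat_units,
        "conv_params": conv_params,
        "dense_params": dense_params,
        "total_params": conv_params + dense_params,
    }

for f in (16, 32):
    print(f, cnn_shapes_and_params(f))
```

For 32 filters this gives a 13 × 13 × 32 pooled output, 5,408 flattened units, and 54,410 trainable parameters (320 in the convolution, 54,090 in the output layer); for 16 filters the total is 27,210. Calling `model.summary()` on the Keras model should report the same per-layer counts.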

For parameter exploration, I tuned two hyperparameters, as outlined in the lab session, where we were given the values to test: the number of convolutional filters (16 or 32) and the Adam learning rate (0.001 or 0.01). Testing every pairing of the two values for each hyperparameter gives 2 × 2 = 4 combinations, and I evaluated each one with stratified 5-fold cross-validation to test the model thoroughly.
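
The four combinations in the `hyperparameters` list are the Cartesian product of the two filter counts and the two learning rates. A sketch that generates the same list, in the same order, with `itertools.product`, which would scale if more values were added:

```python
from itertools import product

filter_options = [16, 32]
learning_rate_options = [0.001, 0.01]

# Every (filters, learning_rate) pairing: 2 x 2 = 4 combinations
hyperparameters = [
    {"filters": f, "learning_rate": lr}
    for f, lr in product(filter_options, learning_rate_options)
]

for params in hyperparameters:
    print(params)
```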

In the end, I chose the combination with 32 filters and a 0.001 learning rate, as this was the model that performed best in cross-validation. On the test set, it reached an accuracy of 0.9758. The report I chose to compare with had an accuracy of 0.9891, about 1.3 percentage points higher than mine. Our CNNs were similar but differed in a few aspects of how they were parameterized. Overall, I am happy with the performance of my CNN.

In this lab, we trained a CNN on the MNIST digits dataset. We compared four hyperparameter combinations using stratified 5-fold cross-validation, retrained the best-performing model on the full training set, and evaluated it on the test set. Finally, we discussed our methodology and our training and test results, and compared our model to another CNN trained on MNIST.

## References

1. The article used for comparison: a Medium post that clearly walks through the process and the results. https://prasad-jayanti.medium.com/image-classification-with-mnist-data-286003b056cb

2. ECE 449 Lab 3 material (lab PDF and lab slides).

3. ChatGPT, used to help resolve syntax errors and to break down concepts.