# **Assignment 5**

Import the MNIST dataset from Keras.
We are going to use the following methods to perform digit classification on this dataset:

*   Logistic Regression
*   Naive Bayes
*   Random Forest
*   Dense Neural Network
*   Convolutional Neural Network

For each method, choose just one specific initial configuration (that is, one value for any hyperparameters, layer configuration, optimizer, etc.) and train it for digit classification.

Evaluate all 5 methods on precision, recall, and accuracy, and compare the results of the 5 methods.

In [None]:
from tensorflow import keras

In [None]:
from keras.datasets import mnist

In [None]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from sklearn.metrics import classification_report, confusion_matrix

## **Logistic Regression**

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [None]:
x_train = x_train.reshape((60000, 28 * 28))
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape((10000, 28 * 28))
x_test = x_test.astype('float32') / 255
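
Every section of this notebook reloads MNIST and repeats this reshape-and-rescale cell. That preprocessing could be factored into one helper. The sketch below is one way to do it. `prepare_images` is a hypothetical name, and the only assumption is a NumPy array of 8-bit images shaped `(n, 28, 28)`, as returned by `mnist.load_data()`.

```python
import numpy as np


def prepare_images(x, flatten=True):
    """Hypothetical helper: scale uint8 images to [0, 1] and reshape them.

    flatten=True  -> (n, 784), for logistic regression, Naive Bayes,
                     Random Forest and the dense network.
    flatten=False -> (n, 28, 28, 1), for the convolutional network.
    """
    n, height, width = x.shape
    if flatten:
        x = x.reshape((n, height * width))
    else:
        # Add a trailing channel axis (grayscale images have 1 channel).
        x = x.reshape((n, height, width, 1))
    return x.astype("float32") / 255


# Usage on dummy data shaped like MNIST (real code would pass x_train / x_test).
dummy = np.full((3, 28, 28), 255, dtype=np.uint8)
flat = prepare_images(dummy)
image = prepare_images(dummy, flatten=False)
print(flat.shape, image.shape, flat.max())
```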

In [None]:
model = Sequential()
model.add(Dense(10, input_dim=28 * 28, activation='softmax'))
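
A single `Dense(10, activation='softmax')` layer trained with cross-entropy is multinomial (softmax) logistic regression. It learns a weight matrix `W` and a bias `b`, and it predicts `softmax(x @ W + b)`. The NumPy sketch below performs one plain gradient-descent step of that model. The data are random toy values, and the array sizes and learning rate are illustrative assumptions rather than the notebook's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy sizes (assumption): 64 samples with 20 features instead of 784 pixels.
n_samples, n_features, n_classes = 64, 20, 10
x = rng.random((n_samples, n_features))
y = rng.integers(0, n_classes, size=n_samples)

W = np.zeros((n_features, n_classes))
b = np.zeros(n_classes)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def loss_and_grads(W, b):
    p = softmax(x @ W + b)                               # class probabilities
    rows = np.arange(n_samples)
    loss = -np.mean(np.log(p[rows, y]))                  # sparse categorical cross-entropy
    p[rows, y] -= 1.0                                    # p - one_hot(y): per-sample gradient w.r.t. logits
    grad_z = p / n_samples                               # average over the batch
    return loss, x.T @ grad_z, grad_z.sum(axis=0)


loss_before, dW, db = loss_and_grads(W, b)
learning_rate = 0.1  # toy value (assumption)
W -= learning_rate * dW
b -= learning_rate * db
loss_after, _, _ = loss_and_grads(W, b)
print(loss_before, loss_after)
```

With all weights at zero, every class gets probability 0.1, so the initial loss is ln(10). A single gradient step with a small enough learning rate lowers it.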

In [None]:
model.compile(loss='sparse_categorical_crossentropy', optimizer=SGD(), metrics=['accuracy'])

In [None]:
model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10



In [None]:
y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)



In [None]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_classes))
print("Classification Report:\n", classification_report(y_test, y_pred_classes))

Confusion Matrix:
 [[ 958    0    3    1    0    4   10    1    3    0]
 [   0 1106    2    2    0    2    4    2   17    0]
 [   7    5  920   15   14    1   13   15   37    5]
 [   4    1   25  915    0   25    2   12   18    8]
 [   1    4    6    1  911    0   10    2    9   38]
 [  11    3    7   38   10  763   14    6   33    7]
 [  12    3    6    1   12   12  909    1    2    0]
 [   3   15   26    7    8    0    0  935    2   32]
 [   7   10    9   26    9   25   12   12  851   13]
 [  11    9    4   11   42    8    0   21    6  897]]
Classification Report:
               precision    recall  f1-score   support

           0       0.94      0.98      0.96       980
           1       0.96      0.97      0.97      1135
           2       0.91      0.89      0.90      1032
           3       0.90      0.91      0.90      1010
           4       0.91      0.93      0.92       982
           5       0.91      0.86      0.88       892
           6       0.93      0.95      0.94    

## **Naive Bayes**

In [None]:
from sklearn.naive_bayes import GaussianNB

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train = x_train.reshape((60000, 28 * 28))
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape((10000, 28 * 28))
x_test = x_test.astype('float32') / 255

In [None]:
model = GaussianNB()

In [None]:
model.fit(x_train, y_train)

In [None]:
y_pred = model.predict(x_test)

In [None]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

Confusion Matrix:
 [[ 870    0    3    5    2    5   31    1   35   28]
 [   0 1079    2    1    0    0   10    0   38    5]
 [  79   25  266   91    5    2  269    4  271   20]
 [  32   39    6  353    2    3   51    8  409  107]
 [  19    2    5    4  168    7   63    7  210  497]
 [  71   25    1   20    3   44   40    2  586  100]
 [  12   12    3    1    1    7  895    0   26    1]
 [   0   15    2   10    5    1    5  280   39  671]
 [  13   72    3    7    3   11   12    4  648  201]
 [   5    7    3    6    1    0    1   13   18  955]]
Classification Report:
               precision    recall  f1-score   support

           0       0.79      0.89      0.84       980
           1       0.85      0.95      0.90      1135
           2       0.90      0.26      0.40      1032
           3       0.71      0.35      0.47      1010
           4       0.88      0.17      0.29       982
           5       0.55      0.05      0.09       892
           6       0.65      0.93      0.77    

## **Random Forest**

In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train = x_train.reshape((60000, 28 * 28))
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape((10000, 28 * 28))
x_test = x_test.astype('float32') / 255

In [None]:
model = RandomForestClassifier(n_estimators=100)

In [None]:
model.fit(x_train, y_train)

In [None]:
y_pred = model.predict(x_test)

In [None]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

Confusion Matrix:
 [[ 969    0    0    0    0    1    4    1    4    1]
 [   0 1123    2    3    0    1    3    0    2    1]
 [   6    0  999    4    2    0    4   10    7    0]
 [   0    0   11  974    0    7    0    9    6    3]
 [   1    0    2    0  955    0    5    1    3   15]
 [   3    1    1   14    3  858    5    1    5    1]
 [   5    3    1    1    3    4  936    0    5    0]
 [   1    6   18    1    2    0    0  989    3    8]
 [   3    0    4    9    4    4    7    3  932    8]
 [   4    6    1   10   13    3    1    5    5  961]]
Classification Report:
               precision    recall  f1-score   support

           0       0.98      0.99      0.98       980
           1       0.99      0.99      0.99      1135
           2       0.96      0.97      0.96      1032
           3       0.96      0.96      0.96      1010
           4       0.97      0.97      0.97       982
           5       0.98      0.96      0.97       892
           6       0.97      0.98      0.97    

## **Dense Neural Network**

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train = x_train.reshape((60000, 28 * 28))
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape((10000, 28 * 28))
x_test = x_test.astype('float32') / 255

In [None]:
model = Sequential()
model.add(Dense(128, input_dim=28 * 28, activation='relu'))
model.add(Dense(10, activation='softmax'))

In [None]:
model.compile(loss='sparse_categorical_crossentropy', optimizer=SGD(learning_rate=0.01), metrics=['accuracy'])


In [None]:
model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10



In [None]:
y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)



In [None]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_classes))
print("Classification Report:\n", classification_report(y_test, y_pred_classes))

Confusion Matrix:
 [[ 966    0    1    1    0    5    4    2    1    0]
 [   0 1117    3    2    0    1    3    2    7    0]
 [   7    1  988    9    5    1    5    7    8    1]
 [   0    1   13  968    0    9    0   10    7    2]
 [   2    1    6    0  925    0    9    3    3   33]
 [   8    2    1   22    4  831   10    1    8    5]
 [  11    3    3    0    6   10  922    1    2    0]
 [   2    9   22    6    1    1    0  974    0   13]
 [   4    3    6   17    6    4   12    9  909    4]
 [   7    7    1   16   19    5    1   11    2  940]]
Classification Report:
               precision    recall  f1-score   support

           0       0.96      0.99      0.97       980
           1       0.98      0.98      0.98      1135
           2       0.95      0.96      0.95      1032
           3       0.93      0.96      0.94      1010
           4       0.96      0.94      0.95       982
           5       0.96      0.93      0.94       892
           6       0.95      0.96      0.96    

## **Convolutional Neural Network**

In [None]:
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train = x_train.reshape((60000, 28, 28, 1))
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape((10000, 28, 28, 1))
x_test = x_test.astype('float32') / 255

In [None]:
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
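
Tracking tensor shapes shows what this CNN computes. With Keras' default `padding='valid'` and stride 1, a k x k convolution shrinks each side from n to n - k + 1. A p x p pooling layer whose stride equals its pool size (the default) then reduces each side to floor(n / p). The sketch below applies these formulas to the layer stack above. `conv_pool_summary` is a hypothetical helper, not a Keras function.

```python
def conv_pool_summary(input_size=28, kernel=3, filters=32, pool=2, classes=10):
    """Hypothetical helper: shapes and parameter count of
    Conv2D -> MaxPooling2D -> Flatten -> Dense on 1-channel square images."""
    conv_side = input_size - kernel + 1                 # 'valid' padding, stride 1
    pool_side = conv_side // pool                       # pooling stride defaults to the pool size
    conv_params = (kernel * kernel * 1 + 1) * filters   # weights over 1 input channel + 1 bias per filter
    flat_features = pool_side * pool_side * filters
    dense_params = flat_features * classes + classes
    return conv_side, pool_side, flat_features, conv_params + dense_params


print(conv_pool_summary())            # base model
print(conv_pool_summary(kernel=5))    # 5x5 filters
print(conv_pool_summary(pool=1))      # 1x1 pooling (no downsampling)
```

A 1x1 pool leaves the 26x26 feature maps untouched. The Dense layer therefore receives four times as many inputs (21,632 instead of 5,408), which is worth keeping in mind when reading the pool-size experiment.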

In [None]:
model.compile(loss='sparse_categorical_crossentropy', optimizer=SGD(learning_rate=0.01), metrics=['accuracy'])


In [None]:
model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10



In [None]:
y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)



In [None]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_classes))
print("Classification Report:\n", classification_report(y_test, y_pred_classes))

Confusion Matrix:
 [[ 971    0    1    1    0    1    3    1    2    0]
 [   0 1125    2    1    0    1    4    0    2    0]
 [   6    3  990    3    6    0    5    6    7    6]
 [   0    0    1  984    0    9    0    8    4    4]
 [   1    0    3    0  956    0    6    2    2   12]
 [   6    1    0    6    0  868    5    1    3    2]
 [  10    3    2    1    2    5  932    2    1    0]
 [   2    3   19    9    1    1    0  980    3   10]
 [  12    0    7   10    5    6    2    4  919    9]
 [  11    5    1    7   10    4    0   12    3  956]]
Classification Report:
               precision    recall  f1-score   support

           0       0.95      0.99      0.97       980
           1       0.99      0.99      0.99      1135
           2       0.96      0.96      0.96      1032
           3       0.96      0.97      0.97      1010
           4       0.98      0.97      0.97       982
           5       0.97      0.97      0.97       892
           6       0.97      0.97      0.97    

## **Summary**

*Logistic Regression*:
*   Accuracy: 0.92
*   Precision: 0.92
*   Recall: 0.92

*Naive Bayes*:
*   Accuracy: 0.56
*   Precision: 0.69
*   Recall: 0.56

*Random Forest*:
*   Accuracy: 0.97
*   Precision: 0.97
*   Recall: 0.97

*Dense Neural Network*:
*   Accuracy: 0.95
*   Precision: 0.95
*   Recall: 0.95

*Convolutional Neural Network*:
*   Accuracy: 0.97
*   Precision: 0.97
*   Recall: 0.97
<br><br>

### **Conclusion**:

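Random Forest and the Convolutional Neural Network perform best (accuracy 0.97), followed by the Dense Neural Network (0.95) and Logistic Regression (0.92). For these four models, precision, recall, and accuracy are essentially equal. This does not mean that every sample is classified correctly: Logistic Regression still misclassifies about 8% of the test digits. Rather, it indicates that errors are spread fairly evenly across the ten classes, so false positives and false negatives roughly balance out. The weighted-average recall reported by `classification_report` is always identical to accuracy in single-label multi-class classification. If the summary uses weighted averages, recall matching accuracy is therefore guaranteed rather than informative.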
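
These metrics can be recomputed directly from a confusion matrix. The sketch below uses the logistic-regression confusion matrix printed earlier in this notebook. It shows that weighted-average recall is identical to accuracy, even though 835 of the 10,000 test images are misclassified.

```python
import numpy as np

# Logistic-regression confusion matrix from above (rows = true digit, columns = predicted digit).
cm = np.array([
    [958, 0, 3, 1, 0, 4, 10, 1, 3, 0],
    [0, 1106, 2, 2, 0, 2, 4, 2, 17, 0],
    [7, 5, 920, 15, 14, 1, 13, 15, 37, 5],
    [4, 1, 25, 915, 0, 25, 2, 12, 18, 8],
    [1, 4, 6, 1, 911, 0, 10, 2, 9, 38],
    [11, 3, 7, 38, 10, 763, 14, 6, 33, 7],
    [12, 3, 6, 1, 12, 12, 909, 1, 2, 0],
    [3, 15, 26, 7, 8, 0, 0, 935, 2, 32],
    [7, 10, 9, 26, 9, 25, 12, 12, 851, 13],
    [11, 9, 4, 11, 42, 8, 0, 21, 6, 897],
])

true_positives = np.diag(cm)
support = cm.sum(axis=1)        # test images of each true digit
predicted = cm.sum(axis=0)      # how often each digit was predicted

accuracy = true_positives.sum() / cm.sum()
recall = true_positives / support
precision = true_positives / predicted
weighted_recall = np.sum(recall * support) / support.sum()   # algebraically equal to accuracy
errors = cm.sum() - true_positives.sum()

print(f"accuracy={accuracy:.4f} weighted recall={weighted_recall:.4f} "
      f"macro precision={precision.mean():.4f} misclassified={errors}")
```
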
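Finally, the Naive Bayes model performs poorly (accuracy 0.56). A likely explanation is that Gaussian Naive Bayes assumes that, within each class, every pixel is independent of every other pixel and normally distributed. Neighboring MNIST pixels are strongly correlated, and most pixel values are exactly 0 or close to 1, so neither assumption holds well. The confusion matrix shows the consequence: many 2s, 3s, and 5s are predicted as 8, and many 4s and 7s as 9. Note that `GaussianNB` does have a hyperparameter, `var_smoothing`, which was left at its default here. The poor score therefore stems from the model's assumptions, not from a lack of hyperparameters.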
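
`var_smoothing` can be tuned like any other hyperparameter. The sketch below runs a small grid search over it. To keep it fast, it uses scikit-learn's bundled 8x8 `load_digits` dataset as a stand-in for MNIST (an assumption). The grid values are illustrative, and no results from this notebook are implied.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in dataset (assumption): small 8x8 digits bundled with scikit-learn, no download needed.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# var_smoothing adds a fraction of the largest feature variance to every variance; 1e-9 is the default.
param_grid = {"var_smoothing": [1e-9, 1e-6, 1e-3, 1e-2, 1e-1]}
search = GridSearchCV(GaussianNB(), param_grid, cv=3)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("best var_smoothing:", search.best_params_["var_smoothing"])
print("test accuracy:", test_accuracy)
```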

## **Optimized Random Forest**

Choose one or more hyperparameters for the Random Forest method and use grid search cross-validation to identify the optimal values for those hyperparameters.

Evaluate the model's performance when using the optimal hyperparameter values, and comment on how the performance metrics have changed relative to your initial configuration.

In [None]:
from sklearn.model_selection import GridSearchCV

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train = x_train.reshape((60000, 28 * 28))
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape((10000, 28 * 28))
x_test = x_test.astype('float32') / 255

In [None]:
model = RandomForestClassifier()

In [None]:
param_grid = {'n_estimators': [50, 100, 150, 200],
              'max_depth': [10, 20, 30, None]}

In [None]:
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)
grid_search.fit(x_train, y_train)
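
Grid search trains one model for every parameter combination in every cross-validation fold, which is why this cell is slow. The sketch below counts the fits with scikit-learn's `ParameterGrid`. Passing `n_jobs=-1` to `GridSearchCV` would spread those fits over all CPU cores; that option was not used above.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'n_estimators': [50, 100, 150, 200],
              'max_depth': [10, 20, 30, None]}
cv = 3

n_candidates = len(ParameterGrid(param_grid))  # every combination of the listed values
n_fits = n_candidates * cv + 1                 # +1 for the final refit on all training data (refit=True)
print(n_candidates, n_fits)
```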

In [None]:
print("Optimal Hyperparameters: ", grid_search.best_params_)

Optimal Hyperparameters:  {'max_depth': None, 'n_estimators': 150}


In [None]:
# GridSearchCV (refit=True by default) has already refit grid_search.best_estimator_
# on the full training set; here a fresh forest is trained with the same parameters.
model = RandomForestClassifier(**grid_search.best_params_)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

In [None]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

Confusion Matrix:
 [[ 969    0    0    0    0    2    3    1    4    1]
 [   0 1124    2    3    1    2    2    0    1    0]
 [   6    0  998    6    3    1    4    9    5    0]
 [   0    0    9  974    0    6    0    9   10    2]
 [   1    0    2    0  958    0    5    0    2   14]
 [   3    0    2   10    2  865    2    2    4    2]
 [   5    3    1    0    5    4  936    0    4    0]
 [   1    3   19    0    0    0    0  992    3   10]
 [   4    0    3    9    3    5    4    4  932   10]
 [   5    5    2   12   10    2    1    5    4  963]]
Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.99      0.98       980
           1       0.99      0.99      0.99      1135
           2       0.96      0.97      0.96      1032
           3       0.96      0.96      0.96      1010
           4       0.98      0.98      0.98       982
           5       0.98      0.97      0.97       892
           6       0.98      0.98      0.98    

### *Observations*
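The grid search selected `max_depth=None` (the default, i.e. fully grown trees) and `n_estimators=150` (instead of the initial 100). Test accuracy rose only marginally, from 0.9696 to 0.9711 (15 more correctly classified images out of 10,000), which still rounds to 0.97. Per-class precision and recall likewise changed by at most 0.01. Because no `random_state` was set, a difference this small may be within run-to-run variation. The initial configuration already used fully grown trees, so it was likely close to optimal for this dataset, and adding trees beyond 100 brings diminishing returns.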
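
Neither run fixed `random_state`, so part of the 0.15 percentage-point difference could come from randomness alone. One way to gauge that noise is to retrain the same configuration with several seeds. The sketch below does this on scikit-learn's small bundled `load_digits` dataset as a stand-in for MNIST (an assumption made to keep it fast). It does not reproduce any number from this notebook.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in dataset (assumption): bundled 8x8 digits instead of MNIST.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Same hyperparameters, different seeds: any spread in accuracy is pure randomness.
scores = []
for seed in range(5):
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X_train, y_train)
    scores.append(forest.score(X_test, y_test))

print(f"mean={np.mean(scores):.4f} min={min(scores):.4f} max={max(scores):.4f}")
```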

## **Dense Neural Network - playing with parameters**

Using just the Dense Neural Network, vary each of the following and (1) comment on how each variation changes the accuracy or precision of the dense neural network results and/or varies the training time and (2) comment on why this result might be expected.  For each of the following, you only need to use one alternate value.
- Number of epochs
- Batch size
- Number of dense layers
- Change the optimizer from "rmsprop" to "sgd" (i.e. to stochastic gradient descent)

### *Changing the number of epochs*

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [None]:
x_train = x_train.reshape((60000, 28 * 28))
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape((10000, 28 * 28))
x_test = x_test.astype('float32') / 255

In [None]:
model = Sequential()
model.add(Dense(128, input_dim=28 * 28, activation='relu'))
model.add(Dense(10, activation='softmax'))

In [None]:
model.compile(loss='sparse_categorical_crossentropy', optimizer=SGD(learning_rate=0.01), metrics=['accuracy'])


In [None]:
model.fit(x_train, y_train, epochs=15, batch_size=32, verbose=1)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15



In [None]:
y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)



In [None]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_classes))
print("Classification Report:\n", classification_report(y_test, y_pred_classes))

Confusion Matrix:
 [[ 963    0    1    2    0    4    5    3    1    1]
 [   0 1118    3    2    0    1    4    1    6    0]
 [   5    1  998    4    4    1    4    7    6    2]
 [   0    0    9  968    0    7    1    8   12    5]
 [   1    0    8    0  947    0    3    2    2   19]
 [   7    1    1    9    2  852    8    1    6    5]
 [   9    3    3    0    7    9  923    0    4    0]
 [   1    8   14    5    2    1    0  983    2   12]
 [   3    2    4   11    4    6    9    7  925    3]
 [   6    7    1   11   16    3    2    9    3  951]]
Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.98      0.98       980
           1       0.98      0.99      0.98      1135
           2       0.96      0.97      0.96      1032
           3       0.96      0.96      0.96      1010
           4       0.96      0.96      0.96       982
           5       0.96      0.96      0.96       892
           6       0.96      0.96      0.96    

### *Changing the batch size*

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train = x_train.reshape((60000, 28 * 28))
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape((10000, 28 * 28))
x_test = x_test.astype('float32') / 255

In [None]:
model = Sequential()
model.add(Dense(128, input_dim=28 * 28, activation='relu'))
model.add(Dense(10, activation='softmax'))

In [None]:
model.compile(loss='sparse_categorical_crossentropy', optimizer=SGD(learning_rate=0.01), metrics=['accuracy'])


In [None]:
model.fit(x_train, y_train, epochs=10, batch_size=16, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10



In [None]:
y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)



In [None]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_classes))
print("Classification Report:\n", classification_report(y_test, y_pred_classes))

Confusion Matrix:
 [[ 971    0    1    0    0    3    2    1    2    0]
 [   0 1123    2    1    0    1    3    1    4    0]
 [   6    1  998    6    4    0    2    7    8    0]
 [   0    0    5  970    0   11    0   11    8    5]
 [   1    0    4    1  951    0    4    1    2   18]
 [   5    1    1   14    1  854    9    1    2    4]
 [   7    3    0    1    8    7  928    0    4    0]
 [   0   10    9    3    0    1    0  995    0   10]
 [   4    2    4   10    5    6    8    6  927    2]
 [   5    7    1    7   13    1    1    8    5  961]]
Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.99      0.98       980
           1       0.98      0.99      0.98      1135
           2       0.97      0.97      0.97      1032
           3       0.96      0.96      0.96      1010
           4       0.97      0.97      0.97       982
           5       0.97      0.96      0.96       892
           6       0.97      0.97      0.97    

### *Changing the number of dense layers*

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train = x_train.reshape((60000, 28 * 28))
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape((10000, 28 * 28))
x_test = x_test.astype('float32') / 255

In [None]:
model = Sequential()
model.add(Dense(128, input_dim=28 * 28, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))

In [None]:
model.compile(loss='sparse_categorical_crossentropy', optimizer=SGD(learning_rate=0.01), metrics=['accuracy'])


In [None]:
model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10



In [None]:
y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)



In [None]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_classes))
print("Classification Report:\n", classification_report(y_test, y_pred_classes))

Confusion Matrix:
 [[ 964    0    1    1    0    4    4    4    2    0]
 [   0 1121    4    0    0    1    4    1    4    0]
 [   5    3  994    8    5    0    4    8    4    1]
 [   0    1    6  982    0    1    1   10    5    4]
 [   1    0    7    1  944    0    6    2    2   19]
 [   9    2    0   25    2  824   12    3   10    5]
 [   9    3    1    1    3    5  930    1    5    0]
 [   0    9   15    1    2    0    0  991    1    9]
 [   4    1    2   15    6    6   10    9  917    4]
 [   5    7    1   12   13    0    2    9    2  958]]
Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.98      0.98       980
           1       0.98      0.99      0.98      1135
           2       0.96      0.96      0.96      1032
           3       0.94      0.97      0.96      1010
           4       0.97      0.96      0.96       982
           5       0.98      0.92      0.95       892
           6       0.96      0.97      0.96    

### *Changing the optimizer*

In [None]:
from keras.optimizers import RMSprop

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train = x_train.reshape((60000, 28 * 28))
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape((10000, 28 * 28))
x_test = x_test.astype('float32') / 255

In [None]:
model = Sequential()
model.add(Dense(128, input_dim=28 * 28, activation='relu'))
model.add(Dense(10, activation='softmax'))

In [None]:
model.compile(loss='sparse_categorical_crossentropy', optimizer=RMSprop(learning_rate=0.001), metrics=['accuracy'])


In [None]:
model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10



In [None]:
y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)



In [None]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_classes))
print("Classification Report:\n", classification_report(y_test, y_pred_classes))

Confusion Matrix:
 [[ 971    0    2    1    1    0    1    1    3    0]
 [   0 1123    4    1    0    0    2    1    4    0]
 [   3    2 1006    4    2    0    3    3    9    0]
 [   0    0    3  992    0    2    0    5    6    2]
 [   3    0    4    0  958    0    4    2    1   10]
 [   2    0    0   19    0  851    6    3    9    2]
 [   6    3    1    1    2    2  938    0    5    0]
 [   1    3   10    4    1    0    0  999    3    7]
 [   5    0    3    2    5    3    2    3  948    3]
 [   5    3    0    8    8    4    0    5    1  975]]
Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.99      0.98       980
           1       0.99      0.99      0.99      1135
           2       0.97      0.97      0.97      1032
           3       0.96      0.98      0.97      1010
           4       0.98      0.98      0.98       982
           5       0.99      0.95      0.97       892
           6       0.98      0.98      0.98    

### *Observations*

**Increasing the number of epochs:**<br>
We increased the number of epochs from 10 to 15. We observe that the model takes longer to train, which is expected: one epoch is one full pass over the entire training set, so 15 epochs perform 50% more weight updates (and roughly 50% more computation) than 10.<br>
We also observe a slight increase in accuracy, precision, and recall, from 0.95 to 0.96 (test accuracy 0.9540 → 0.9628). With plain SGD at a learning rate of 0.01, the model had likely not fully converged after 10 epochs, so the extra passes still helped. However, increasing the number of epochs does not necessarily improve performance: past a certain point the model may overfit the training data and perform poorly on new, unseen data.<br><br>
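
A common alternative to guessing a fixed epoch count is early stopping. You hold out validation data, monitor the validation loss after each epoch, and stop once it no longer improves. Keras provides this as the `keras.callbacks.EarlyStopping` callback (with `monitor`, `patience`, and `restore_best_weights` arguments). The plain-Python sketch below only illustrates the stopping rule. The helper name and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Hypothetical helper: return the 1-based epoch whose weights early stopping keeps.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; the best epoch seen so far is returned.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


# Made-up validation losses: steady improvement, then overfitting after epoch 4.
losses = [0.40, 0.30, 0.25, 0.24, 0.26, 0.27, 0.29]
print(early_stopping_epoch(losses))
```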

**Decreasing batch size:**<br>
We decreased the batch size from 32 to 16. We observe that the model takes longer to train. Each step processes fewer images, but an epoch over the 60,000 training images now takes 3,750 weight updates instead of 1,875. Each update also carries a fixed overhead and makes less use of parallel (vectorized) computation.<br>
We also observe an increase in accuracy, precision, and recall, from 0.95 to 0.97 (test accuracy 0.9540 → 0.9678). This is plausible with a fixed learning rate and a fixed number of epochs: halving the batch size doubles the number of gradient-descent steps taken in the same 10 epochs, so training progresses further. The noisier gradient estimates from smaller batches may also act as a mild regularizer.<br><br>
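
`model.fit` performs ceil(N / batch_size) weight updates per epoch, where N is the number of training samples. The arithmetic below compares the dense-network runs that differ only in batch size or epoch count. `updates` is a hypothetical helper name.

```python
import math

n_train = 60_000  # MNIST training images


def updates(batch_size, epochs):
    """Hypothetical helper: number of optimizer steps model.fit performs."""
    steps_per_epoch = math.ceil(n_train / batch_size)  # a final partial batch still counts as a step
    return steps_per_epoch * epochs


for batch_size, epochs in [(32, 10), (16, 10), (32, 15)]:
    print(f"batch_size={batch_size:>2}, epochs={epochs}: {updates(batch_size, epochs)} updates")
```

Halving the batch size (37,500 updates) adds more updates than raising the epoch count to 15 (28,125 updates).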

**Increasing the number of dense layers:**<br>
We added a second hidden dense layer with 64 units and ReLU activation. We observe that the model takes slightly longer to train, because every forward and backward pass now includes an extra matrix multiplication: the number of trainable parameters grows from 101,770 to 109,386 (about 7.5% more).<br>
We also observe a slight increase in accuracy, precision, and recall, from 0.95 to 0.96 (test accuracy 0.9540 → 0.9625). A plausible explanation is that the additional layer lets the network combine the features learned by the first layer into more abstract ones, which helps it separate similar-looking digits.<br><br>
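
A `Dense` layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases. The sketch below applies that formula to both architectures to show how much work the extra layer adds. `dense_params` is a hypothetical helper; Keras' `model.summary()` should report the same totals.

```python
def dense_params(layer_sizes):
    """Hypothetical helper: trainable parameters (weights + biases) of a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


print(dense_params([784, 128, 10]))       # baseline dense network
print(dense_params([784, 128, 64, 10]))   # with the extra 64-unit layer
```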

**Changing the optimizer:**<br>
The assignment describes the change as going from RMSprop to SGD, but this notebook's baseline already used SGD, so we changed the optimizer in the opposite direction, from SGD to RMSprop (learning rate 0.001). We observe that the model took longer to train. RMSprop keeps a moving average of the squared gradient for every weight and divides each update by its square root. Each step therefore requires a few extra element-wise operations and extra memory compared with plain SGD, which applies the same fixed learning rate to every weight.<br>
We also observe that performance (accuracy, recall, and precision) increased to 0.98 (test accuracy 0.9761), the best result among the dense-network variants. This is consistent with how RMSprop works. By dividing each weight's step by the recent magnitude of its gradients, it takes relatively larger steps for weights whose gradients are small and smaller steps for weights whose gradients are large or noisy. Within the same 10 epochs, the optimizer can therefore make faster and more stable progress toward good weights than SGD with a single fixed learning rate.
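
The per-weight scaling can be seen in isolation. The NumPy sketch below compares a plain SGD step with the standard RMSprop update rule (without momentum or centering) on a toy gradient that has one steep and one flat direction. The gradient values are arbitrary assumptions. The learning rate, `rho`, and `epsilon` match the defaults documented for `keras.optimizers.RMSprop`.

```python
import numpy as np

grads = np.array([100.0, 1.0])   # toy gradient: one steep and one flat direction (assumption)
lr = 0.001                       # Keras RMSprop default learning rate
rho, eps = 0.9, 1e-7             # Keras RMSprop defaults for rho and epsilon

# Plain SGD: step = lr * gradient, so the steep weight moves 100x further.
sgd_step = lr * grads

# RMSprop: running average of squared gradients per weight, step divided by its root.
v = np.zeros_like(grads)
rms_steps = []
for _ in range(3):               # a few updates with the same gradient
    v = rho * v + (1 - rho) * grads ** 2
    rms_steps.append(lr * grads / (np.sqrt(v) + eps))

print("SGD step:     ", sgd_step)
print("RMSprop steps:", rms_steps[-1])
```

With SGD the steps differ by a factor of 100. With RMSprop both weights move by nearly the same amount, because each gradient is normalized by its own running magnitude.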

## **Convolutional Neural Network - playing with parameters**

Using just the Convolutional Neural Network, vary each of the following and (1) comment on how the variation changes the accuracy or precision of the neural network results and/or varies the training time  and (2) comment on why this result might be expected.  For each of the following, you only need to use one alternate value.
- Number of layers
- Number of filters
- Filter size
- Pool size
- Change MaxPooling2D to AveragePooling2D
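
A small array is enough to show how `MaxPooling2D` differs from `AveragePooling2D`, and why a 1x1 pool changes nothing. The NumPy sketch below implements non-overlapping pooling with stride equal to the pool size, which matches the Keras default when `strides` is not given. It runs on a single made-up 4x4 feature map. `pool2d` is a hypothetical helper, not part of Keras.

```python
import numpy as np


def pool2d(x, size, mode="max"):
    """Hypothetical helper: non-overlapping size x size pooling of a 2-D array ('valid', stride = size)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    blocks = x[:h * size, :w * size].reshape(h, size, w, size)  # drop leftover rows/cols, group blocks
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))


# Made-up feature map (assumption).
feature_map = np.array([[1.0, 3.0, 0.0, 0.0],
                        [2.0, 4.0, 0.0, 8.0],
                        [5.0, 0.0, 1.0, 1.0],
                        [0.0, 0.0, 1.0, 1.0]])

print(pool2d(feature_map, 2, "max"))        # strongest activation in each 2x2 block
print(pool2d(feature_map, 2, "average"))    # mean of each 2x2 block
print(np.array_equal(pool2d(feature_map, 1), feature_map))  # 1x1 pooling is the identity
```

Max pooling keeps the single strongest response (the isolated 8 survives intact), while average pooling dilutes it with its zero neighbors (8 becomes 2.0).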

### *Changing the number of layers*

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train = x_train.reshape((60000, 28, 28, 1))
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape((10000, 28, 28, 1))
x_test = x_test.astype('float32') / 255

In [None]:
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

In [None]:
model.compile(loss='sparse_categorical_crossentropy', optimizer=SGD(learning_rate=0.01), metrics=['accuracy'])


In [None]:
model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10



In [None]:
y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)



In [None]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_classes))
print("Classification Report:\n", classification_report(y_test, y_pred_classes))

Confusion Matrix:
 [[ 972    0    1    0    0    1    2    2    1    1]
 [   0 1130    5    0    0    0    0    0    0    0]
 [   1    2 1028    0    0    0    0    1    0    0]
 [   0    0    4  997    0    2    0    4    3    0]
 [   1    1    3    0  953    0    2    2    1   19]
 [   2    0    1    4    0  879    1    1    1    3]
 [   8    2    1    0    1    4  940    0    2    0]
 [   1    4   23    3    0    0    0  989    1    7]
 [   8    0   11    2    2    1    1    3  941    5]
 [   2    6    1    2    2    1    0    1    0  994]]
Classification Report:
               precision    recall  f1-score   support

           0       0.98      0.99      0.98       980
           1       0.99      1.00      0.99      1135
           2       0.95      1.00      0.97      1032
           3       0.99      0.99      0.99      1010
           4       0.99      0.97      0.98       982
           5       0.99      0.99      0.99       892
           6       0.99      0.98      0.99    

### *Changing the number of filters*

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train = x_train.reshape((60000, 28, 28, 1))
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape((10000, 28, 28, 1))
x_test = x_test.astype('float32') / 255

In [None]:
model = Sequential()
model.add(Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

In [None]:
model.compile(loss='sparse_categorical_crossentropy', optimizer=SGD(learning_rate=0.01), metrics=['accuracy'])


In [None]:
model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10



In [None]:
y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)



In [None]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_classes))
print("Classification Report:\n", classification_report(y_test, y_pred_classes))

Confusion Matrix:
 [[ 972    0    1    1    0    1    1    2    2    0]
 [   0 1123    1    2    1    1    4    0    3    0]
 [   8    5  980    4    6    0    5    7   16    1]
 [   1    0    1  989    1    5    0    7    5    1]
 [   1    0    2    0  969    0    2    1    2    5]
 [   5    1    0   11    0  858    5    3    8    1]
 [  12    3    1    1    4    6  925    1    5    0]
 [   1    3   17    7    2    1    0  990    3    4]
 [   7    0    4    7    4    3    0    4  942    3]
 [  11    5    1    8   16    4    0   15    6  943]]
Classification Report:
               precision    recall  f1-score   support

           0       0.95      0.99      0.97       980
           1       0.99      0.99      0.99      1135
           2       0.97      0.95      0.96      1032
           3       0.96      0.98      0.97      1010
           4       0.97      0.99      0.98       982
           5       0.98      0.96      0.97       892
           6       0.98      0.97      0.97    

### *Changing the filter size*

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train = x_train.reshape((60000, 28, 28, 1))
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape((10000, 28, 28, 1))
x_test = x_test.astype('float32') / 255

In [None]:
model = Sequential()
model.add(Conv2D(32, (5, 5), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

In [None]:
model.compile(loss='sparse_categorical_crossentropy', optimizer=SGD(learning_rate=0.01), metrics=['accuracy'])


In [None]:
model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10



In [None]:
y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)



In [None]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_classes))
print("Classification Report:\n", classification_report(y_test, y_pred_classes))

Confusion Matrix:
 [[ 974    0    1    0    0    1    0    2    2    0]
 [   0 1125    3    3    0    1    1    0    2    0]
 [   1    5 1016    3    1    0    0    3    3    0]
 [   0    0    3  993    0    5    0    5    2    2]
 [   1    0    2    0  965    0    1    1    2   10]
 [   1    1    0    3    0  883    1    1    2    0]
 [   7    2    1    1    2    6  936    0    3    0]
 [   1    2   14    2    0    0    0 1001    3    5]
 [   6    0    7    4    3    4    1    4  940    5]
 [   7    6    1    2    6    3    0    7    1  976]]
Classification Report:
               precision    recall  f1-score   support

           0       0.98      0.99      0.98       980
           1       0.99      0.99      0.99      1135
           2       0.97      0.98      0.98      1032
           3       0.98      0.98      0.98      1010
           4       0.99      0.98      0.99       982
           5       0.98      0.99      0.98       892
           6       1.00      0.98      0.99       958
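The per-class figures in the report can be recomputed from the confusion matrix, which also recovers the rows missing from the report above (digits 7 to 9 and the averages). The NumPy sketch below is a minimal illustration, not part of the original notebook: it copies the matrix printed for the (5, 5) filter model and applies the standard definitions that scikit-learn's `classification_report` also uses, with recall as the diagonal over the row sums (true digits) and precision as the diagonal over the column sums (predicted digits).

```python
import numpy as np

# Confusion matrix printed above for the (5, 5) filter model.
# Rows are true digits 0-9, columns are predicted digits 0-9.
cm = np.array([
    [974, 0, 1, 0, 0, 1, 0, 2, 2, 0],
    [0, 1125, 3, 3, 0, 1, 1, 0, 2, 0],
    [1, 5, 1016, 3, 1, 0, 0, 3, 3, 0],
    [0, 0, 3, 993, 0, 5, 0, 5, 2, 2],
    [1, 0, 2, 0, 965, 0, 1, 1, 2, 10],
    [1, 1, 0, 3, 0, 883, 1, 1, 2, 0],
    [7, 2, 1, 1, 2, 6, 936, 0, 3, 0],
    [1, 2, 14, 2, 0, 0, 0, 1001, 3, 5],
    [6, 0, 7, 4, 3, 4, 1, 4, 940, 5],
    [7, 6, 1, 2, 6, 3, 0, 7, 1, 976],
])

correct = np.diag(cm)                 # correctly classified samples per digit
support = cm.sum(axis=1)              # number of true samples per digit
precision = correct / cm.sum(axis=0)  # correct / all predictions of that digit
recall = correct / support            # correct / all true samples of that digit
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / cm.sum()

for digit in range(10):
    print(f"{digit}  {precision[digit]:.2f}  {recall[digit]:.2f}  "
          f"{f1[digit]:.2f}  {support[digit]}")
print(f"accuracy  {accuracy:.4f}")
```

For digits 0 to 6 the rounded values match the report printed above, and the diagonal sums to 9,809 correct predictions out of 10,000, i.e. an accuracy of about 0.98.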

### *Changing the pool size*

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train = x_train.reshape((60000, 28, 28, 1))
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape((10000, 28, 28, 1))
x_test = x_test.astype('float32') / 255

In [None]:
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((1, 1)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

In [None]:
model.compile(loss='sparse_categorical_crossentropy', optimizer=SGD(learning_rate=0.01), metrics=['accuracy'])

In [None]:
model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f8d6401ec10>

In [None]:
y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)



In [None]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_classes))
print("Classification Report:\n", classification_report(y_test, y_pred_classes))

Confusion Matrix:
 [[ 970    0    1    1    0    1    4    2    1    0]
 [   0 1121    3    1    0    1    5    1    3    0]
 [   2    6  992    2    4    1    2    6   14    3]
 [   0    0   10  968    1   10    0    7    6    8]
 [   1    0    4    0  954    0    3    3    2   15]
 [   4    1    1    6    0  853    8    2   12    5]
 [   8    3    2    0    5    7  930    0    3    0]
 [   1    4   20    4    1    1    0  981    3   13]
 [   5    1    7    7    4    2    4    5  933    6]
 [   7    5    1    3    7    4    0   11    3  968]]
Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.99      0.98       980
           1       0.98      0.99      0.99      1135
           2       0.95      0.96      0.96      1032
           3       0.98      0.96      0.97      1010
           4       0.98      0.97      0.97       982
           5       0.97      0.96      0.96       892
           6       0.97      0.97      0.97       958

### *Changing MaxPooling2D to AveragePooling2D*

In [None]:
from keras.layers import AveragePooling2D

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train = x_train.reshape((60000, 28, 28, 1))
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape((10000, 28, 28, 1))
x_test = x_test.astype('float32') / 255

In [None]:
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(AveragePooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

In [None]:
model.compile(loss='sparse_categorical_crossentropy', optimizer=SGD(learning_rate=0.01), metrics=['accuracy'])

In [None]:
model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f8d175617f0>

In [None]:
y_pred = model.predict(x_test)
y_pred_classes = np.argmax(y_pred, axis=1)



In [None]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_classes))
print("Classification Report:\n", classification_report(y_test, y_pred_classes))

Confusion Matrix:
 [[ 954    0    2    1    0    8    9    2    3    1]
 [   0 1118    2    1    0    2    3    2    7    0]
 [   3   10  944    9   11    4    8   10   28    5]
 [   0    0   23  929    0   23    1   10   14   10]
 [   1    2    4    0  934    0    7    2    5   27]
 [   5    2    1   14    5  835    6    2   16    6]
 [   8    3    8    0    9   18  905    2    5    0]
 [   1    7   24    4    8    1    0  949    1   33]
 [   4    7    7   15    9   19    5    7  894    7]
 [   6    6    1    6   20    8    1   15    9  937]]
Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.97      0.97       980
           1       0.97      0.99      0.98      1135
           2       0.93      0.91      0.92      1032
           3       0.95      0.92      0.93      1010
           4       0.94      0.95      0.94       982
           5       0.91      0.94      0.92       892
           6       0.96      0.94      0.95       958

### *Observations*

**Increasing the number of layers:**<br>
We added one more layer to the model. Training took much longer, but performance (accuracy, recall, precision) improved slightly, from 0.97 to 0.98. This makes sense because an additional layer increases the model's capacity to learn more complex patterns and features in the input data: it can combine the features found by the existing layers into new, more intricate ones that the original architecture did not capture.
It is also possible that the deeper model generalizes better, but additional capacity on its own tends to increase, not reduce, the risk of overfitting, so this would have to be checked by comparing training and test accuracy. In any case, the increased capacity comes at the cost of additional computational resources and longer training times.<br><br>

**Increasing the number of filters:**<br>
We increased the number of filters in the Conv2D layer from 32 to 64. The model took longer to train, which is expected: doubling the number of filters doubles the number of computations the convolution performs during each forward pass and, with a single Conv2D layer followed by Flatten, it also doubles the input size of the Dense layer. This extra computation leads to longer training times.<br>
The performance of the model (accuracy, precision and recall) did not change, so the additional computation brought no significant improvement. The most likely explanation is that the original model was already able to capture the important features of the input images with 32 filters.<br><br>
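The cost argument can be made concrete by counting multiply-accumulate operations (MACs) per image. The sketch below assumes the model compared here is a single Conv2D with 3 × 3 kernels followed by MaxPooling2D((2, 2)), Flatten and Dense(10), like the other variants in this section, with Keras' default stride 1 and 'valid' padding; biases and the pooling comparisons are ignored. `forward_macs` is a hypothetical helper written for illustration.

```python
def forward_macs(filters, kernel=3, pool=2, image=28, classes=10):
    """Approximate multiply-accumulates for Conv2D -> MaxPooling2D -> Flatten -> Dense.

    Assumes one input channel, stride 1, 'valid' padding and a pooling stride
    equal to the pool size (the Keras defaults). Biases and pooling are ignored.
    """
    conv_side = image - kernel + 1                  # 28 - 3 + 1 = 26 for a 3x3 kernel
    conv_macs = conv_side ** 2 * kernel ** 2 * filters
    pooled_side = conv_side // pool                 # 26 // 2 = 13
    dense_macs = pooled_side ** 2 * filters * classes
    return conv_macs, dense_macs


for filters in (32, 64):
    conv, dense = forward_macs(filters)
    print(f"{filters} filters: conv {conv:,} MACs, dense {dense:,} MACs")
```

Under these assumptions, going from 32 to 64 filters doubles both terms (194,688 to 389,376 MACs in the convolution, 54,080 to 108,160 in the Dense layer), which is consistent with the longer training time.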

**Increasing the filter size:**<br>
We changed the size of the filter in the Conv2D layer from (3,3) to (5,5). Training took slightly longer, and performance (accuracy, recall, precision) improved slightly, from 0.97 to 0.98. A plausible explanation is that a larger filter has a larger receptive field: each 5 × 5 filter looks at 25 pixels instead of 9, so the layer captures more spatial information from the input image, which increases the model's capacity to learn complex features and patterns. However, each output value of the convolution now requires 25 multiplications instead of 9, which increases the time it takes to train the model and make predictions.<br>
It is also worth noting that a larger filter has more weights per filter, which may increase the risk of overfitting the model to the training data.<br><br>
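Whether a (5, 5) filter adds parameters depends on which layer you look at. The sketch below counts trainable parameters for the Conv2D(32) → MaxPooling2D((2, 2)) → Flatten → Dense(10) model defined above, assuming Keras' default stride 1 and 'valid' padding; `param_counts` is a hypothetical helper, not a Keras function.

```python
def param_counts(kernel, filters=32, pool=2, image=28, classes=10):
    """Trainable parameters of Conv2D -> MaxPooling2D -> Flatten -> Dense(classes).

    Assumes a single-channel input, stride 1 and 'valid' padding (Keras defaults),
    and a pooling stride equal to the pool size.
    """
    conv_params = kernel * kernel * filters + filters   # weights + one bias per filter
    pooled_side = (image - kernel + 1) // pool          # spatial size after pooling
    dense_params = pooled_side ** 2 * filters * classes + classes
    return conv_params, dense_params


for kernel in (3, 5):
    conv, dense = param_counts(kernel)
    print(f"{kernel}x{kernel}: conv {conv}, dense {dense}, total {conv + dense}")
```

Under these assumptions the convolution grows from 320 to 832 parameters, but the 'valid' convolution output shrinks from 26 × 26 to 24 × 24 (12 × 12 after pooling), so the Dense layer shrinks from 54,090 to 46,090 parameters and the total drops from 54,410 to 46,922. The gain therefore seems to come from the larger receptive field rather than from a larger model overall; calling `model.summary()` in Keras would confirm the actual counts.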

**Decreasing the pool size:**<br>
We changed the pool size of the MaxPooling2D layer from (2, 2) to (1, 1). A (1, 1) window takes the maximum of a single value, so the pooling layer now returns its input unchanged and no longer down-samples the feature maps. The model took slightly longer to train because the Flatten layer now receives a 26 × 26 × 32 tensor instead of a 13 × 13 × 32 one, which gives the Dense layer four times as many weights.<br>
We also observe that accuracy, precision and recall remain the same. This is not surprising: MaxPooling2D has no trainable parameters and only aggregates information across neighboring pixels, so removing that aggregation mainly changes the size of the representation passed to the Dense layer rather than what the convolutional filters learn. Therefore, changing the pool size does not have a significant impact on the performance of the model.<br><br>
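The effect of a (1, 1) pool can be checked directly. The NumPy function below is a hand-written, non-overlapping max pooling for illustration (not the Keras implementation); it assumes the 26 × 26 feature maps that a 3 × 3 'valid' Conv2D produces from a 28 × 28 image, and uses random values as a stand-in for real activations.

```python
import numpy as np


def max_pool2d(x, pool):
    """Non-overlapping max pooling of a 2-D array (stride = pool, 'valid' padding)."""
    h, w = x.shape[0] // pool, x.shape[1] // pool
    x = x[:h * pool, :w * pool]                     # drop rows/cols that don't fill a window
    return x.reshape(h, pool, w, pool).max(axis=(1, 3))


rng = np.random.default_rng(0)
feature_map = rng.random((26, 26))  # one channel of a 3x3 'valid' Conv2D output

same = max_pool2d(feature_map, 1)
halved = max_pool2d(feature_map, 2)
print(np.array_equal(same, feature_map))  # True: a 1x1 window is the identity
print(same.shape, halved.shape)           # (26, 26) (13, 13)
print(26 * 26 * 32, 13 * 13 * 32)         # 21632 5408 flattened features with 32 filters
```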

**Changing MaxPooling2D to AveragePooling2D:**<br>
We replaced MaxPooling2D with AveragePooling2D. The training time remained the same, which is expected: neither layer has trainable parameters, and both do a comparable amount of work per 2 × 2 window (a maximum versus a mean of four values), so the cost of the network is essentially unchanged.<br>
However, the performance of the model (accuracy, precision, recall) decreased from 0.97 to 0.94, the lowest score for this model in our study. A likely explanation is the different way the two layers down-sample the feature maps: MaxPooling2D keeps the strongest activation in each pooling region, while AveragePooling2D averages it with its weaker neighbors, smoothing out the feature maps and potentially losing some information. As a result, the model may no longer capture important features as effectively as it did with MaxPooling2D, which would lower its accuracy, recall, and precision.
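The difference between the two pooling layers shows up on a toy feature map. In the NumPy sketch below (values invented for illustration, with a hand-written `pool2d` helper rather than the Keras layers), each 2 × 2 window contains one strong activation: max pooling passes it through unchanged, while average pooling divides it by four. Whether this is what hurt the MNIST model remains a hypothesis; inspecting the learned feature maps would be needed to confirm it.

```python
import numpy as np


def pool2d(x, pool, reduce):
    """Non-overlapping pooling of a 2-D array with the given reduction (np.max or np.mean)."""
    h, w = x.shape[0] // pool, x.shape[1] // pool
    windows = x[:h * pool, :w * pool].reshape(h, pool, w, pool)
    return reduce(windows, axis=(1, 3))


# Toy 4x4 feature map (invented values): one strong activation per 2x2 window.
feature_map = np.array([
    [0.0, 0.0, 0.0, 0.8],
    [0.0, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.7, 0.0, 0.0, 1.0],
])

print(pool2d(feature_map, 2, np.max))   # each window's peak survives unchanged
print(pool2d(feature_map, 2, np.mean))  # each peak is diluted by a factor of 4
```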