We start by loading the *MNIST* dataset using the `mnist` module from `keras.datasets`. We will later build our model using the `Sequential()` class of the `keras.models` module. To do this, we will add dense layers to our `model` object.

In [1]:
import warnings
warnings.filterwarnings("ignore")

from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

We will now work with *MNIST*, a well-known optical character recognition dataset. It consists of 70,000 grayscale images of handwritten digits (60,000 for training and 10,000 for testing). Below, we load the dataset and preprocess it. Each image is stored as 28x28 pixel values, that is, as a two-dimensional array. We flatten each image into a one-dimensional vector of length 784. We also scale each vector by dividing every element by 255, the maximum value of an 8-bit grayscale pixel, so that all inputs lie between 0 and 1.

In [2]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

input_dim = 784  # 28*28
output_dim = nb_classes = 10
batch_size = 128
nb_epoch = 20

X_train = X_train.reshape(60000, input_dim)
X_test = X_test.reshape(10000, input_dim)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
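To make the preprocessing steps concrete without downloading anything, here is a minimal NumPy sketch. The array `images` is a synthetic stand-in for a few MNIST images (random values, not real digits), and the variable names are illustrative only.

```python
import numpy as np

# Synthetic stand-in for a small batch of MNIST images: 5 images of
# 28x28 pixels with 8-bit intensities in [0, 255]. Not real data.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a vector of length 784.
# Using -1 lets NumPy infer 28 * 28 from the input shape.
flat = images.reshape(images.shape[0], -1)

# Convert to float32 and scale the intensities into [0, 1].
scaled = flat.astype("float32") / 255

print(flat.shape)
print(scaled.dtype, scaled.min(), scaled.max())
```

Passing `-1` for the second dimension avoids hard-coding the sample count, which is convenient if the same code is applied to a subset of the data.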


Next, we'll *one-hot encode* our *target variable* using the `to_categorical()` function from the `keras.utils` module:

In [3]:
Y_train = to_categorical(y_train, nb_classes)
Y_test = to_categorical(y_test, nb_classes)
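As an illustration of what one-hot encoding produces, the following NumPy sketch builds the same kind of matrix by hand. It is not how `to_categorical()` is implemented; the example labels are made up.

```python
import numpy as np

# A few made-up digit labels, standing in for entries of y_train.
labels = np.array([5, 0, 4, 1, 9])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix by the labels encodes them all at once.
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot.shape)
print(one_hot[0])
```

Each row contains a single 1 in the column that matches the label, which is the target format that a 10-unit softmax output layer is trained against.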

### Q1. Build an ANN and train and test it using the MNIST data. This ANN should consist of two hidden layers and one output layer. All of the hidden layers should be dense. The first layer and the second layer should have neuron sizes of 32 and 16, respectively. Train this model for 20 epochs, and compare our training and test set performance with the previous parameters. Is there any difference? If so, why?

Now we will create two dense hidden layers of size 32 and 16. The third layer is the output layer. It uses a `softmax` activation function, which outputs a probability for each of the ten digit classes.

In [4]:
model = Sequential()
# The first dense layer
model.add(Dense(32, input_shape=(784,), activation="relu"))
# The second dense layer
model.add(Dense(16, activation="relu"))
# The last layer is the output layer
model.add(Dense(10, activation="softmax"))
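To show what this stack of layers computes, here is a rough NumPy sketch of a forward pass through a 784-32-16-10 network. The weights are random placeholders rather than trained values, and the helper names `relu` and `softmax` are defined here for illustration only.

```python
import numpy as np

def relu(z):
    # Element-wise max(z, 0)
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
layer_sizes = [784, 32, 16, 10]

# Random weights and zero biases: placeholders, not trained parameters.
weights = [rng.normal(0.0, 0.05, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

# A batch of 4 fake inputs with values in [0, 1), like scaled pixels.
x = rng.random((4, 784))

# Hidden layers use ReLU; the output layer uses softmax.
h = x
for W, b in zip(weights[:-1], biases[:-1]):
    h = relu(h @ W + b)
probs = softmax(h @ weights[-1] + biases[-1])

print(probs.shape)
print(probs.sum(axis=1))
```

Each row of `probs` is non-negative and sums to 1, so it can be read as a probability distribution over the ten digit classes.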

We can summarize our dense layers:

In [5]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 32)                25120     
                                                                 
 dense_1 (Dense)             (None, 16)                528       
                                                                 
 dense_2 (Dense)             (None, 10)                170       
                                                                 
Total params: 25818 (100.85 KB)
Trainable params: 25818 (100.85 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
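The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A short sketch of that arithmetic (the helper name `dense_params` is made up for this example):

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit
    return n_inputs * n_units + n_units

layer_sizes = [784, 32, 16, 10]
counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(counts)       # [25120, 528, 170]
print(sum(counts))  # 25818
```

These values match the `Param #` column and the total reported by `model.summary()` above.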


Next, we will compile our model.

In [6]:
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
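For intuition about the `categorical_crossentropy` loss, here is a small NumPy sketch of the quantity it measures for one-hot targets. The clipping constant `eps` and the example probabilities are assumptions chosen for illustration, not values taken from Keras or from this model.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions to avoid log(0); eps is an illustrative choice.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # With one-hot targets, only the true class contributes to the sum.
    return -np.sum(y_true * np.log(y_pred), axis=1)

# Two made-up samples with three classes each.
y_true = np.array([[0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])
y_pred = np.array([[0.1, 0.1, 0.8],
                   [0.3, 0.4, 0.3]])

losses = categorical_crossentropy(y_true, y_pred)
print(losses)
print(losses.mean())
```

The first sample assigns high probability to the correct class and gets a small loss. The second assigns only 0.3 to the correct class and is penalized more heavily.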

Then train our model for 20 epochs.

In [7]:
# Setting `verbose=1` prints out some results after each epoch
model.fit(X_train, Y_train, batch_size=batch_size, epochs=nb_epoch, verbose=1)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.src.callbacks.History at 0x7d7e75a361d0>
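It can help to see how many weight updates this training run performs. The sketch below does the arithmetic for 60,000 training samples, a batch size of 128, and 20 epochs, assuming the final partial batch of each epoch is kept rather than dropped.

```python
import math

n_samples = 60000
batch_size = 128
epochs = 20

# Number of mini-batches (gradient updates) per epoch, counting the
# final, smaller batch.
steps_per_epoch = math.ceil(n_samples / batch_size)

# Size of that final batch.
last_batch_size = n_samples - (steps_per_epoch - 1) * batch_size

total_updates = steps_per_epoch * epochs

print(steps_per_epoch)   # 469
print(last_batch_size)   # 96
print(total_updates)     # 9380
```

With mini-batch SGD, each batch triggers one parameter update, so the batch size directly controls how many updates the model receives per epoch.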

We see that our model reached slightly below 95% accuracy on the training set. By comparison, the earlier model with two dense layers of 128 neurons each achieved about 97% accuracy. As a final step, we will evaluate our model on the test set.

In [8]:
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test score: 0.20107264816761017
Test accuracy: 0.9416999816894531


The model achieved about 94% accuracy on both the training and the test set. This is worse performance than our previous model with two dense layers of 128 neurons each. Because its layers contain fewer neurons, this model is simpler than the previous one and cannot learn relationships between the input pixels that are as complex. This resulted in lower performance. It seems that the MNIST data benefits from more neurons in the hidden layers.

### Q2. In this task, build another ANN. This ANN should have five hidden layers and one output layer. All of the layers should be dense. The neuron numbers for the hidden layers should be 1024, 512, 256, 128, and 64. Train this model for 20 epochs, and test it using the same data from the previous task. Then compare the results. Is there any difference? If so, why?

First, we will create five dense hidden layers of size 1024, 512, 256, 128, and 64. The sixth layer is the output layer. It uses a `softmax` activation function, which outputs a probability for each of the ten digit classes.

In [10]:
model = Sequential()
# The first dense layer
model.add(Dense(1024, input_shape=(784,), activation="relu"))
# The second dense layer
model.add(Dense(512, activation="relu"))
# The third dense layer
model.add(Dense(256, activation="relu"))
# The fourth dense layer
model.add(Dense(128, activation="relu"))
# The fifth dense layer
model.add(Dense(64, activation="relu"))
# The last layer is the output layer
model.add(Dense(10, activation="softmax"))

We can summarize our dense layers:

In [11]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_9 (Dense)             (None, 1024)              803840    
                                                                 
 dense_10 (Dense)            (None, 512)               524800    
                                                                 
 dense_11 (Dense)            (None, 256)               131328    
                                                                 
 dense_12 (Dense)            (None, 128)               32896     
                                                                 
 dense_13 (Dense)            (None, 64)                8256      
                                                                 
 dense_14 (Dense)            (None, 10)                650       
                                                                 
Total params: 1501770 (5.73 MB)
Trainable params: 1501

As before, we will compile our model.

In [12]:
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])

Then train our model for 20 epochs.

In [13]:
model.fit(X_train, Y_train, batch_size=batch_size, epochs=nb_epoch, verbose=1)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.src.callbacks.History at 0x7d7e58e58220>

We see that our model, with five dense layers of 1024, 512, 256, 128, and 64 neurons, reached slightly below 99% accuracy on the training set. As a final step, we will evaluate our model on the test set.

In [14]:
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test score: 0.08893519639968872
Test accuracy: 0.9728999733924866


In this case, the model achieved almost 99% accuracy on the training set and about 97% accuracy on the test set. This model is more complex than both the model from the previous task and the earlier model. Because it contains more neurons in the hidden layers, it achieved higher performance on the training set. The additional hidden dense layers allowed it to learn more complex relationships between the input pixels. However, the gap between the training score and the test score widened a little. This may be a sign that the model has started to overfit.
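One rough way to quantify the overfitting concern is to compare the gap between training and test accuracy for the two models. In the sketch below, the test accuracies are the values printed by `model.evaluate()` above. The training accuracies are only approximations taken from the prose ("slightly below 95%" and "slightly below 99%"), so the resulting gaps are indicative, not exact.

```python
def generalization_gap(train_acc, test_acc):
    # Positive values mean the model does better on data it was trained on.
    return train_acc - test_acc

# Test accuracies as reported above; training accuracies are rounded
# approximations from the text, not logged values.
small_model = {"train": 0.95, "test": 0.9417}   # 32-16 hidden units
large_model = {"train": 0.99, "test": 0.9729}   # 1024-512-256-128-64 hidden units

gap_small = generalization_gap(small_model["train"], small_model["test"])
gap_large = generalization_gap(large_model["train"], large_model["test"])

print(f"small model gap: {gap_small:.4f}")
print(f"large model gap: {gap_large:.4f}")
```

Under these approximations the larger model shows the wider gap, which is consistent with the early signs of overfitting described above. Reading the exact training accuracy from the `History` object returned by `model.fit()` would give a more reliable comparison.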