We start by loading the *MNIST* dataset with the `mnist` module from `tensorflow.keras.datasets`. Later, we'll build the model with the `Sequential` class from `tensorflow.keras.models`, adding fully connected (`Dense`) layers to the `model` object one at a time.
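Each `Dense` layer is fully connected. Per the Keras documentation, it computes `activation(dot(input, kernel) + bias)`. The plain-Python sketch below mimics that computation on made-up toy sizes (3 inputs, 2 hidden ReLU units, 2 softmax outputs) so the arithmetic is visible. The weights are arbitrary assumptions. For readability, the sketch stores one weight row per output unit, which is the transpose of Keras's `(input_dim, units)` kernel layout.

```python
import math

def relu(v):
    # Element-wise max(0, x)
    return [max(0.0, x) for x in v]

def softmax(v):
    # Subtract the max before exponentiating for numerical stability
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, weights, bias, activation):
    """Compute activation(W x + b); `weights` holds one row per output unit."""
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
         for row, b_i in zip(weights, bias)]
    return activation(z)

# Hypothetical toy network: 3 inputs -> 2 hidden units (ReLU) -> 2 outputs (softmax)
x = [1.0, -2.0, 0.5]
W1 = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0], [0.5, 0.5]]
b2 = [0.0, 0.0]

hidden = dense(x, W1, b1, relu)
probs = dense(hidden, W2, b2, softmax)
print("hidden:", hidden)
print("probs:", probs, "sum:", sum(probs))
```

The softmax outputs always sum to 1, which is why the last layer of the MNIST model can be read as a probability distribution over the 10 digits.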

In [1]:
import warnings
warnings.filterwarnings("ignore")

from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LeakyReLU
from tensorflow.keras import optimizers

We'll now work with *MNIST*, a well-known optical character recognition dataset of 70,000 grayscale images of handwritten digits (60,000 for training and 10,000 for testing). Below, we load the dataset and preprocess it. Each image is a 28 × 28 array of pixel intensities, i.e. a two-dimensional array. We flatten each image into a one-dimensional vector of length 784 (28 × 28) and scale it to the range [0, 1] by dividing every element by 255, the maximum intensity of an 8-bit grayscale pixel.
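To make the reshape concrete, here is a stdlib-only sketch on a hypothetical 28 × 28 image with made-up intensities. Flattening row by row matches NumPy's default (C-order) `reshape`, so pixel `(r, c)` ends up at index `r * 28 + c`.

```python
ROWS, COLS = 28, 28

# Hypothetical image: integer intensities in 0..255
image = [[(r * COLS + c) % 256 for c in range(COLS)] for r in range(ROWS)]

def flatten_and_scale(img):
    """Row-major flatten (same order as NumPy's default reshape), then scale to [0, 1]."""
    return [px / 255.0 for row in img for px in row]

vec = flatten_and_scale(image)
print(len(vec))  # 784
r, c = 3, 5
print(vec[r * COLS + c] == image[r][c] / 255.0)  # True
```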

In [2]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

input_dim = 784  # 28*28
output_dim = nb_classes = 10
batch_size = 128
nb_epoch = 20

X_train = X_train.reshape(60000, input_dim)
X_test = X_test.reshape(10000, input_dim)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


Next, we'll *one-hot encode* our *target variable* with the `to_categorical()` function from the `keras.utils` module:

In [3]:
Y_train = to_categorical(y_train, nb_classes)
Y_test = to_categorical(y_test, nb_classes)
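`to_categorical()` turns each integer label into a row with a single 1 at the label's index. The function below is a hypothetical plain-Python equivalent, not the Keras implementation, that is useful for checking what the rows of `Y_train` look like:

```python
def one_hot(labels, num_classes):
    """Hypothetical stand-in for to_categorical: one row per label, 1.0 at the label index."""
    encoded = []
    for y in labels:
        if not 0 <= y < num_classes:
            raise ValueError(f"label {y} outside 0..{num_classes - 1}")
        row = [0.0] * num_classes
        row[y] = 1.0
        encoded.append(row)
    return encoded

rows = one_hot([5, 0, 4], 10)
print(rows[0])  # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
# Decoding is the inverse: the index of the largest entry recovers the label
print([r.index(max(r)) for r in rows])  # [5, 0, 4]
```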

### Q1. In this task, you'll implement several ANN models with different batch sizes. Specifically, do the following:

In [11]:
def compile_and_fit_model(batch_size, lr):
  model = Sequential()
  # The first dense layer
  model.add(Dense(128, input_shape=(784,), activation="relu"))
  # The second dense layer
  model.add(Dense(64, activation="relu"))
  # The last layer is the output layer
  model.add(Dense(10, activation="softmax"))

  # Compile with plain SGD at the given learning rate
  model.compile(optimizer=optimizers.SGD(learning_rate=lr),
                loss='categorical_crossentropy',
                metrics=['accuracy'])

  # Train for nb_epoch (20) epochs
  # `verbose=1` shows a progress bar with the loss and accuracy for each epoch
  model.fit(X_train, Y_train, batch_size=batch_size, epochs=nb_epoch, verbose=1)

  return model
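As a sanity check on the architecture, each `Dense` layer holds `n_in * n_out` weights plus `n_out` biases. The short calculation below (plain arithmetic, not a Keras call) gives the total that `model.summary()` should report for the 784-128-64-10 network:

```python
def dense_param_count(n_in, n_out):
    # One weight per (input, unit) pair plus one bias per unit
    return n_in * n_out + n_out

layer_sizes = [784, 128, 64, 10]
per_layer = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)  # [100480, 8256, 650] 109386
```

Almost all of the parameters sit in the first layer, because every one of the 784 pixels connects to each of the 128 hidden units.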

In [8]:
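def evaluate_model():
  # Evaluates the most recently trained global `model` on the test set.
  # score[0] is the test loss (categorical cross-entropy); score[1] is the accuracy.
  score = model.evaluate(X_test, Y_test, verbose=0)
  print('Test score:', score[0])
  print('Test accuracy:', score[1])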
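`model.evaluate()` returns the loss followed by the compiled metrics, so the printed "Test score" is the categorical cross-entropy and "Test accuracy" is the fraction of correctly classified images. The sketch below computes both by hand. It is an illustration rather than Keras's implementation, and the clipping constant `eps` is an assumption. It also gives a useful reference point: a model that assigns probability 0.1 to every class has a loss of ln(10) ≈ 2.303.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    losses = []
    for t, p in zip(y_true, y_pred):
        # Clip probabilities away from 0 and 1 so log() stays finite
        losses.append(-sum(tk * math.log(min(max(pk, eps), 1 - eps))
                           for tk, pk in zip(t, p)))
    return sum(losses) / len(losses)

def accuracy(y_true, y_pred):
    # A prediction is correct when its most probable class matches the true class
    hits = sum(t.index(max(t)) == p.index(max(p)) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

target = [[0.0] * 3 + [1.0] + [0.0] * 6]        # true digit: 3
uniform = [[0.1] * 10]                          # "knows nothing" prediction
confident = [[0.01] * 3 + [0.91] + [0.01] * 6]  # puts most mass on digit 3

print(categorical_crossentropy(target, uniform))    # about 2.3026, i.e. ln(10)
print(categorical_crossentropy(target, confident))  # about 0.094
print(accuracy(target, confident))                  # 1.0
```

A test loss near 2.30 therefore means the model has learned almost nothing beyond a uniform guess.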

### a. Implement a three-layer ANN model with 128, 64, and 10 neurons in the layers. Use 8 as the mini-batch size.

In [12]:
model = compile_and_fit_model(8, 0.1)

Next, let's evaluate the model on the test set and review its score (the test loss) and accuracy.

In [13]:
evaluate_model()

Test score: 0.13997013866901398
Test accuracy: 0.9782999753952026


### b. Implement a three-layer ANN model with 128, 64, and 10 neurons in the layers. Use 128 as the mini-batch size.

In [14]:
model = compile_and_fit_model(128, 0.1)

As before, let's review the accuracy and score of our model.

In [15]:
evaluate_model()

Test score: 0.07275497168302536
Test accuracy: 0.9771000146865845


### c. Implement a three-layer ANN model with 128, 64, and 10 neurons in the layers. Use the full training set as the batch size (i.e. full-batch gradient descent).

In [16]:
X_train.shape

(60000, 784)

In [17]:
model = compile_and_fit_model(X_train.shape[0], 0.1)

Once again, let's review the accuracy and score of our model.

In [18]:
evaluate_model()

Test score: 1.2646130323410034
Test accuracy: 0.7336000204086304


### d. Compare the results of each model. Which batch size performed best?

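Judged by test accuracy, a mini-batch size of 8 performed best (97.83%), narrowly ahead of 128 (97.71%). However, a batch size of 128 reached a lower test loss (0.073 vs. 0.140), and a 0.12-percentage-point accuracy gap from a single run of each model is too small to call either setting clearly better. Because a batch size of 128 needs only 469 parameter updates per epoch instead of 7,500, each epoch is typically much faster to compute, so 128 is the more practical choice. Using the full training set as a single batch gave poor results (73.36% test accuracy): with one update per epoch, the model received only 20 gradient updates in total and was likely still far from converged.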
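The gap between the runs is easier to see when we count gradient updates. Keras takes `ceil(n / batch_size)` steps per epoch, counting the final partial batch, so over 20 epochs the three settings differ by orders of magnitude:

```python
import math

def total_updates(n_samples, batch_size, epochs):
    """Number of SGD steps when every epoch passes over all samples once."""
    steps_per_epoch = math.ceil(n_samples / batch_size)  # the last, partial batch is still one step
    return steps_per_epoch * epochs

N_TRAIN, EPOCHS = 60_000, 20
for bs in (8, 128, N_TRAIN):
    print(f"batch_size={bs}: {math.ceil(N_TRAIN / bs)} updates/epoch, "
          f"{total_updates(N_TRAIN, bs, EPOCHS)} in total")
# batch_size=8: 7500 updates/epoch, 150000 in total
# batch_size=128: 469 updates/epoch, 9380 in total
# batch_size=60000: 1 updates/epoch, 20 in total
```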

### Q2. In this task, you'll implement several ANN models with different learning rates for stochastic gradient descent (SGD). In all of the models below, use 128 as your mini-batch size.
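The learning rate scales every step of the SGD update `w <- w - lr * grad`. The toy sketch below illustrates the regimes explored in this question on f(w) = w² rather than on the network. A moderate rate shrinks w steadily, a huge rate makes each step overshoot further than the last, and a tiny rate barely moves w in 20 steps. For this particular function the iterates converge only when 0 < lr < 1. The safe range for the MNIST network is different and has to be found empirically.

```python
def gradient_descent(lr, w0=1.0, steps=20):
    """Minimise f(w) = w**2 (gradient 2w) with the plain update w <- w - lr * grad."""
    w = w0
    for _ in range(steps):
        w = w - lr * 2 * w  # each step multiplies w by (1 - 2 * lr)
    return w

for lr in (0.01, 0.1, 100, 1e-7):
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.6g}")
```

The minimum is at w = 0. With `lr=0.1` the factor per step is 0.8, so w shrinks quickly. With `lr=100` the factor is -199, so w explodes. With `lr=1e-7` w stays almost exactly at its starting value.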

### a. Implement a three-layer ANN model with 128, 64, and 10 neurons in the layers. Use 0.01 as the learning rate.

In [19]:
model = compile_and_fit_model(128, 0.01)

As before, let's review the accuracy and score of our model.

In [20]:
evaluate_model()

Test score: 0.164168119430542
Test accuracy: 0.9503999948501587


### b. Implement a three-layer ANN model with 128, 64, and 10 neurons in the layers. Use 100 as the learning rate.

In [21]:
model = compile_and_fit_model(128, 100)

Once again, let's review the accuracy and score of our model.

In [22]:
evaluate_model()

Test score: 37.32565689086914
Test accuracy: 0.11349999904632568


### c. Implement a three-layer ANN model with 128, 64, and 10 neurons in the layers. Use 0.0000001 as the learning rate.

In [23]:
model = compile_and_fit_model(128, 0.0000001)

Next, let's review the accuracy and score of our model.

In [24]:
evaluate_model()

Test score: 2.2973341941833496
Test accuracy: 0.12099999934434891


### d. Compare the results of each model. Which learning rate performed best?

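Among the learning rates tested in Q2, `0.01` performed best (95.04% test accuracy, test loss 0.164). It still trailed the learning rate of `0.1` used with the same batch size in Q1b (97.71%), which suggests that `0.01` was converging more slowly within the 20-epoch budget. A learning rate of `100` was far too large, and the model diverged: the test loss jumped to 37.33 and the accuracy fell to 11.35%. That accuracy matches the share of the digit 1 in the test set (1,135 of 10,000 images), which suggests the network ended up predicting a single class. A learning rate of `0.0000001` was far too small: after 20 epochs the test loss (2.297) was still close to ln(10) ≈ 2.303, the loss of a uniform guess over 10 classes, and the accuracy (12.10%) was barely above chance.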