In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

from keras.models import Sequential
from keras.datasets import mnist
from keras.layers import Dense
from keras.utils import to_categorical

In [None]:
def plot_loss_curve(history):
    """Plot training and validation loss per epoch from a Keras History object."""
    plt.figure(figsize=[8,6])
    plt.plot(history.history['loss'], linewidth=2)
    plt.plot(history.history['val_loss'], linewidth=2)
    plt.legend(['Training Loss', 'Validation Loss'], fontsize=14)
    plt.xlabel('Epoch', fontsize=16)
    plt.ylabel('Loss', fontsize=16)
    plt.title('Loss Curve', fontsize=18)

In [None]:
def plot_accuracy_curve(history):
    """Plot training and validation accuracy per epoch from a Keras History object."""
    plt.figure(figsize=[8,6])
    plt.plot(history.history['accuracy'], linewidth=2)
    plt.plot(history.history['val_accuracy'], linewidth=2)
    plt.legend(['Training Accuracy', 'Validation Accuracy'], fontsize=14)
    plt.xlabel('Epoch', fontsize=16)
    plt.ylabel('Accuracy', fontsize=16)
    plt.title('Accuracy Curve', fontsize=18)

### Import the data

We will build a model that recognizes handwritten digits, 0 to 9. MNIST is a dataset of such digits that is commonly used to showcase machine learning and deep learning methods.

In [None]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Let's look at the shape of our data.

In [None]:
X_train.shape, X_test.shape

The training set has 60,000 samples and the test set has 10,000. Each image is 28x28 pixels.

Let's look at what some of these images are like. Change the index below to look at different training samples.

In [None]:
idx = 10
plt.figure(figsize=[6, 6])
plt.imshow(X_train[idx,:,:])
plt.title(y_train[idx])
plt.show()

### Preprocess the data

To feed the images into the neural network we "flatten" them: instead of a two-dimensional matrix, each image becomes a one-dimensional array of pixel values.

In [None]:
n_features = np.prod(X_train.shape[1:])

In [None]:
n_features

We now reshape the images so that each one is an array of length 784 instead of a 28x28 matrix.

In [None]:
X_train = X_train.reshape(X_train.shape[0], n_features)
X_test = X_test.reshape(X_test.shape[0], n_features)

Let's check that the reshape did what we expected.

In [None]:
X_train.shape
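To see what the reshape does on a small scale, here is a minimal sketch that uses a made-up array of three 2x2 "images" (the name `images` and its contents are assumptions for illustration, not MNIST data).

```python
import numpy as np

# Hypothetical stand-in for the image array: 3 "images" of 2x2 pixels each.
images = np.arange(12).reshape(3, 2, 2)

# Keep the sample axis and flatten the rest; -1 lets NumPy infer the row length.
flat = images.reshape(images.shape[0], -1)

print(flat.shape)  # (3, 4)
print(flat[0])     # [0 1 2 3]
```

Each row of `flat` holds the pixels of one image, read row by row, which is the same layout `X_train` has after the reshape above.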

Pixel values range from 0 to 255. We rescale them so that they lie between 0 and 1.

In [None]:
X_train = X_train / 255
X_test = X_test / 255
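As a quick illustration of the rescaling, the sketch below divides a few made-up 8-bit pixel values by 255 (the array `pixels` is an assumption for illustration). The division converts the integers to floating-point numbers, with 0 mapping to 0.0 and 255 mapping to 1.0.

```python
import numpy as np

# Hypothetical pixel values stored as unsigned 8-bit integers, like the raw images.
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

# True division returns floating-point values between 0 and 1.
scaled = pixels / 255

print(scaled.min(), scaled.max())  # 0.0 1.0
```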

Finally, we need to transform our labels. In the MNIST dataset, labels are integers 0-9. The neural network we will build has as many neurons in its output layer as there are labels, each neuron representing one class. We therefore transform the labels using so-called `one-hot encoding`. In this process, each integer label is converted into an array whose elements are all zero except the one at the index of the label's class, which is one. For example, 0 will be represented as \[1 0 0 0 0 0 0 0 0 0\], 1 as \[0 1 0 0 0 0 0 0 0 0\], etc.

In [None]:
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
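One-hot encoding can also be sketched by hand with NumPy. This is only an illustration of the idea, not how `to_categorical` is implemented; the `labels` array is made up.

```python
import numpy as np

# Hypothetical integer labels and the number of classes.
labels = np.array([0, 1, 9])
n_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(n_classes)[labels]

print(one_hot[0])              # [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
print(one_hot.argmax(axis=1))  # [0 1 9]
```

Taking `argmax` along each row recovers the original integer labels, which is handy when comparing predictions with the true classes.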

### Build the neural network model

Two common ways to define models in Keras are the `Sequential` model and the `Functional API`. In `Sequential` we stack up the layers one by one. It is a pretty intuitive method and the one we will use now. `Functional API` models are used for more complex use cases such as models with shared layers or multiple outputs.

In [None]:
model = Sequential()

When adding the layers, we need to specify the number of units, the activation function, and, for the first layer, the shape of the input, i.e. the number of features. For the later layers the network infers the input shape on its own: the input size of each layer equals the number of units in the previous layer.

We will use the ReLU activation function for all hidden layers and softmax for the last layer. Softmax can be thought of as the generalization of logistic regression to multiple classes. For each of the (in our case 10) classes it outputs the probability that the sample belongs to that class, and these probabilities sum to 1.

In [None]:
model.add(Dense(32, activation='relu', input_shape=(n_features,)))
model.add(Dense(10, activation='softmax'))
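To make the two activation functions concrete, here is a minimal NumPy sketch of ReLU and softmax applied to a made-up vector of scores. The function names and the `scores` values are assumptions for illustration; Keras uses its own implementations.

```python
import numpy as np

def relu(z):
    # ReLU keeps positive values and replaces negative ones with zero.
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtracting the maximum does not change the result but avoids overflow in exp.
    exp = np.exp(z - np.max(z))
    return exp / exp.sum()

# Hypothetical raw scores for three classes.
scores = np.array([2.0, -1.0, 0.1])

activated = relu(scores)
probs = softmax(scores)

print(activated)  # [2.  0.  0.1]
```

The entries of `probs` are positive and sum to 1, and the largest score receives the largest probability.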

Next we configure the network by specifying the optimizer, the loss function, and the metric to report.

In [None]:
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
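For intuition about the loss chosen here, the sketch below computes categorical cross-entropy for a single sample by hand. It is a simplified illustration, not Keras's implementation; the function name, the `eps` clipping value, and the example vectors are assumptions.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions away from zero so that log() stays finite.
    y_pred = np.clip(y_pred, eps, 1.0)
    # With a one-hot y_true, only the log-probability of the true class survives.
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# Hypothetical one-hot label (class 2) and two candidate predictions.
y_true = np.array([0.0, 0.0, 1.0])
good_pred = np.array([0.1, 0.1, 0.8])
bad_pred = np.array([0.8, 0.1, 0.1])

loss_good = categorical_crossentropy(y_true, good_pred)
loss_bad = categorical_crossentropy(y_true, bad_pred)

print(loss_good < loss_bad)  # True
```

The loss is small when the network assigns a high probability to the correct class and grows as that probability approaches zero.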

We can use `.summary()` to find out how many parameters our model has to learn.

In [None]:
model.summary()
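The parameter count reported by the summary can be checked by hand: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The helper below is a hypothetical function written for this check, not part of Keras.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases for a stack of fully connected layers.

    layer_sizes lists the input size followed by the number of units per layer.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + biases
    return total

# 784 input features, 32 hidden units, 10 output units, as in the model above.
print(dense_param_count([784, 32, 10]))  # 25450
```

The first layer contributes 784 * 32 + 32 = 25120 parameters and the output layer 32 * 10 + 10 = 330.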

In [None]:
history = model.fit(X_train, y_train, batch_size=128, epochs=5, validation_data=(X_test, y_test))
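As a rough guide to what the `batch_size` and `epochs` arguments imply, the sketch below works out how many weight updates the training run performs, assuming one update per batch and the 60,000 training samples seen earlier.

```python
import math

n_samples = 60000   # size of the training set
batch_size = 128
epochs = 5

# The last batch of each epoch is smaller when the sizes do not divide evenly.
steps_per_epoch = math.ceil(n_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 469
print(total_updates)    # 2345
```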

In [None]:
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)

In [None]:
accuracy
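With one-hot labels, accuracy is the fraction of samples whose most probable predicted class matches the true class. The sketch below computes it by hand on made-up values; `y_true` and `y_prob` are assumptions for illustration, not model output.

```python
import numpy as np

# Hypothetical one-hot labels and predicted class probabilities for 4 samples.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.6, 0.3],
                   [0.3, 0.4, 0.3],
                   [0.2, 0.5, 0.3]])

# The predicted class is the index of the largest probability in each row.
predicted = y_prob.argmax(axis=1)
actual = y_true.argmax(axis=1)

accuracy_by_hand = np.mean(predicted == actual)
print(accuracy_by_hand)  # 0.75
```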

In [None]:
plot_loss_curve(history)

In [None]:
plot_accuracy_curve(history)

While keeping everything else fixed, run the same model for 10, 20, and 30 epochs.
* What accuracy does each network reach?
* What do the loss curves look like?

While keeping everything else fixed, run the same model with 64, 128, 256, and 512 units in the hidden layer.
* What accuracy does each network reach?
* What do the loss curves look like?

While keeping everything else fixed, run the same model with 2, 3, and 4 hidden layers.
* What accuracy does each network reach?
* What do the loss curves look like?

While keeping everything else fixed, try other optimization methods (RMSprop, Adam).
* What accuracy does each network reach?
* What do the loss curves look like?

As an extra task, add regularization to the model and see how it changes the performance.