In [None]:
from IPython.display import HTML
with open("../style.css", "r") as file:
    css = file.read()
HTML(css)

# Building a Neural Network with Keras

In [None]:
import gzip
import pickle
import numpy  as np
import keras
import tensorflow as tf

The following magic command prevents the Python kernel from dying because of a linkage problem. It allows more than one copy of the Intel OpenMP runtime to be loaded into the same process.

In [None]:
%env KMP_DUPLICATE_LIB_OK=TRUE
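Outside of Jupyter, `%env` magics are not available. There, the same environment variable can be set from Python through `os.environ`. This is a minimal sketch that assumes the assignment runs before any library that loads the OpenMP runtime is imported. The OpenMP runtime itself describes this setting as an unsafe, unsupported workaround that may cause crashes or silently incorrect results, so it is best reserved for the case where the duplicate-library error actually occurs.

```python
import os

# Equivalent of `%env KMP_DUPLICATE_LIB_OK=TRUE` for a plain Python script.
# Assumption: this runs at the very top of the script, before NumPy,
# TensorFlow or Keras are imported, so the OpenMP runtime sees it when loaded.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
```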

The function $\texttt{vectorized_result}(d)$ takes a digit $d \in \{0,\cdots,9\}$ and returns a NumPy vector $\mathbf{x}$ of shape $(10,)$ such that
$$
\mathbf{x}[i] = 
\left\{
  \begin{array}{ll}
     1 & \mbox{if $i = d$;} \\
     0 & \mbox{otherwise.}
  \end{array}  
\right.
$$
This function is used to convert a digit $d$ into the expected output of a neural network that has an output unit for every digit.

In [None]:
def vectorized_result(d):
    e    = np.zeros((10, ), dtype=np.float32)
    e[d] = 1.0
    return e
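`vectorized_result` encodes one digit at a time. A whole array of labels can also be one-hot encoded in a single NumPy operation, because row $d$ of the $10 \times 10$ identity matrix is exactly the vector $\mathbf{x}$ defined above. The helper name `vectorized_results` is hypothetical and is not used elsewhere in this notebook. The sketch below shows the idea.

```python
import numpy as np

def vectorized_results(digits):
    # Hypothetical batch version of vectorized_result:
    # row d of the identity matrix is the one-hot vector for digit d,
    # so fancy indexing encodes all labels at once.
    return np.eye(10, dtype=np.float32)[np.asarray(digits)]

Y = vectorized_results([5, 0, 4])
print(Y.shape)            # (3, 10)
print(Y.argmax(axis=1))   # [5 0 4]
```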

The function $\texttt{load_data}()$ returns a quadruple of the form
$$ (\texttt{X_train}, \texttt{X_test}, \texttt{Y_train}, \texttt{Y_test}) $$
where 
<ul>
<li> $\texttt{X_train}$ is a `numpy.ndarray` of shape $(50000, 784)$. Each row is a flattened $28 \times 28$ training image. $\texttt{Y_train}$ is a `numpy.ndarray` of shape $(50000, 10)$ whose rows are the one-hot encodings of the corresponding digits.</li>
<li> $\texttt{X_test}$ and $\texttt{Y_test}$ have the same structure and contain the 10,000 test images and their one-hot encoded digits.</li>
</ul>
The pickled file also contains a validation set of 10,000 images, which $\texttt{load_data}()$ discards.

In [None]:
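def load_data():
    # The pickle holds three (images, labels) pairs: training, validation and test set.
    with gzip.open('../mnist.pkl.gz', 'rb') as f:
        train, validate, test = pickle.load(f, encoding="latin1")
    # Every image is a flattened 28x28 grayscale picture, i.e. 784 pixel values.
    X_train = np.array([np.reshape(x, (784, )) for x in train[0]])
    X_test  = np.array([np.reshape(x, (784, )) for x in test [0]])
    # The labels are digits 0..9; encode them as one-hot vectors of length 10.
    Y_train = np.array([vectorized_result(y) for y in train[1]])
    Y_test  = np.array([vectorized_result(y) for y in test [1]])
    return (X_train, X_test, Y_train, Y_test)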

In [None]:
X_train, X_test, Y_train, Y_test = load_data()

Let us see what we have read:

In [None]:
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape

Below, we create a neural network with three hidden layers.
- The first hidden layer has 80 nodes and uses the <a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">ReLU function</a> 
  as activation function.
- The second and the third hidden layer have 40 nodes each and also use the ReLU function.  
- The output layer uses the <a href="https://en.wikipedia.org/wiki/Softmax_function">softmax function</a> as 
  activation function.  This function is defined as follows:
  $$ \sigma(\mathbf{z})_i := \frac{e^{z_i}}{\sum\limits_{d=0}^{9} e^{z_d}}  $$
  Here, $z_i$ is the weighted input of the $i$-th output neuron, i.e. the weighted sum of its inputs plus its bias.
  All outputs are positive and their sum is equal to $1$, so the outputs of the 10 output nodes can be 
  interpreted as probabilities.
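- The <em style="color:blue">loss function</em> used is the <em style="color:blue">categorical cross-entropy</em>.  
  If the output layer produces the vector $\mathbf{a}$ when it should produce the one-hot vector $\mathbf{y}$, 
  the cross-entropy cost is defined as
  $$ C(\mathbf{a}, \mathbf{y}) := - \sum\limits_{i=0}^{9} y_i \cdot \ln(a_i). $$
  Since $\mathbf{y}$ is one-hot, only the term of the correct digit $d$ is nonzero, so $C(\mathbf{a}, \mathbf{y}) = -\ln(a_d)$.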
- The cost function is minimized using stochastic gradient descent with a learning rate of $0.3$ on mini-batches of 100 training examples.
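The softmax and the cross-entropy can be evaluated directly in NumPy. The following sketch computes the softmax of a hypothetical vector $\mathbf{z}$ of weighted inputs and the cross-entropy against a one-hot target. Before exponentiating, it subtracts $\max(\mathbf{z})$. This is a standard stabilization trick that leaves $\sigma(\mathbf{z})$ unchanged, because the common factor $e^{-\max(\mathbf{z})}$ cancels between numerator and denominator. The cross-entropy is written in the categorical form $-\sum_i y_i \ln(a_i)$, the quantity that Keras' `categorical_crossentropy` is based on.

```python
import numpy as np

def softmax(z):
    # Shifting by max(z) cancels in the ratio but avoids overflow in np.exp.
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def cross_entropy(a, y):
    # Categorical cross-entropy of the prediction a for the one-hot target y.
    return -np.sum(y * np.log(a))

# Hypothetical weighted inputs of the 10 output neurons.
z = np.array([2.0, 1.0, 0.1, -1.0, 0.5, 0.0, 0.3, -0.5, 1.5, 0.2])
a = softmax(z)

y = np.zeros(10)
y[0] = 1.0                  # the correct digit is 0

print(a.sum())              # 1 up to floating-point rounding
print(cross_entropy(a, y))  # equals -ln(a[0]), the only nonzero term
```

The shift invariance also explains why adding the same constant to every $z_i$ does not change the network's prediction.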

In [None]:
model = keras.models.Sequential()
model.add(keras.Input(shape=(784,)))                      # 784 input pixels
model.add(keras.layers.Dense( 80, activation='relu'   ))  # 1st hidden layer
model.add(keras.layers.Dense( 40, activation='relu'   ))  # 2nd hidden layer
model.add(keras.layers.Dense( 40, activation='relu'   ))  # 3rd hidden layer
model.add(keras.layers.Dense( 10, activation='softmax'))  # one output per digit
model.compile(loss       = 'categorical_crossentropy', 
              optimizer  = keras.optimizers.SGD(learning_rate=0.3), 
              metrics    = ['accuracy'])
model.summary()
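To make the architecture concrete, the sketch below re-implements the forward pass of this model in plain NumPy. The random weights are placeholders. By default, Keras initializes Dense kernels with the Glorot uniform initializer and biases with zeros, but here only the shapes matter. A Dense layer from $m$ to $n$ units has $m \cdot n$ weights and $n$ biases, so the total computed here should agree with the parameter count reported by `model.summary()`.

```python
import numpy as np

# Layer sizes mirror the Keras model: 784 -> 80 -> 40 -> 40 -> 10.
sizes = [784, 80, 40, 40, 10]

rng = np.random.default_rng(0)
# Placeholder random weights (an assumption; Keras uses its own initializers).
weights = [rng.normal(0.0, 0.05, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases  = [np.zeros(n) for n in sizes[1:]]

def relu(Z):
    return np.maximum(Z, 0.0)

def softmax(Z):
    # Row-wise softmax for a batch of shape (batch_size, 10).
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def forward(X):
    A = X
    for W, b in zip(weights[:-1], biases[:-1]):
        A = relu(A @ W + b)                       # hidden Dense layer with ReLU
    return softmax(A @ weights[-1] + biases[-1])  # output layer with softmax

# Each Dense layer contributes m*n weights plus n biases.
n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print(n_params)      # 68090

probs = forward(rng.random((5, 784)))
print(probs.shape)   # (5, 10)
```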

In [None]:
%%time
history = model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=30, batch_size=100, verbose=1)
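Two quantities from this training run can be spelled out in NumPy. First, the metric `'accuracy'`. With one-hot labels and the loss `categorical_crossentropy`, Keras interprets it as categorical accuracy: the fraction of examples whose most probable predicted digit equals the digit encoded in the label. Second, the number of weight updates. `model.fit` also processes the final, possibly smaller mini-batch, so one epoch with `batch_size=100` performs $\lceil n/100 \rceil$ weight updates, where $n$ is the number of training examples. The helper names below are hypothetical, and the data is a toy example.

```python
import numpy as np

def categorical_accuracy(Y_true, Y_pred):
    # Fraction of rows where the predicted digit (argmax) equals the true digit.
    return np.mean(np.argmax(Y_pred, axis=1) == np.argmax(Y_true, axis=1))

def updates_per_epoch(n_examples, batch_size):
    # Ceiling division: a final, smaller batch still triggers one update.
    return -(-n_examples // batch_size)

# Toy example: true digits 3, 1, 4 and predictions that favour 3, 1, 5.
Y_true = np.eye(10)[[3, 1, 4]]
Y_pred = np.full((3, 10), 0.01)
Y_pred[[0, 1, 2], [3, 1, 5]] = 0.91
print(categorical_accuracy(Y_true, Y_pred))   # 2 of 3 predictions are correct
print(updates_per_epoch(50_000, 100))         # 500
```

After training, `categorical_accuracy(Y_test, model.predict(X_test))` should be close to the final `val_accuracy` reported by `fit`.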