# A First Neural Network

We will use the Keras framework to build a shallow neural network that classifies MNIST digits.

Keras is an abstraction on top of deep learning frameworks such as TensorFlow and PyTorch.
It allows us to design and train neural networks using a high-level API.

This notebook is based on the first shallow network notebook from Deep Learning Illustrated (https://www.deeplearningillustrated.com/).

You can run the notebook locally or on Google Colab by pressing the button below. 

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/the-deep-learners/deep-learning-illustrated/blob/master/notebooks/shallow_net_in_keras.ipynb)

### Installation

We install TensorFlow 1.9.0.
This will be the backend for executing the neural network training.
It will also automatically install the Keras API.
There are major incompatibilities between different TensorFlow versions.
To preserve compatibility with the code in this notebook, we force installation of version `1.9.0`.

In [None]:
!conda install -y tensorflow=1.9.0

#### Load dependencies

Run the cell below to load the major dependencies we need for this notebook.

In [None]:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot as plt

### Load data

We load the MNIST data via Keras.
It has the same format and composition as the data we have used before:
* 60000 training samples
* 10000 dedicated test samples

In [None]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [None]:
X_train.shape

In [None]:
y_train.shape

#### Visualize Samples

The cell below can be used to plot some of the sample numbers from the training set. 

In [None]:
import random
plt.figure(figsize=(5,5))
for k in range(12):
    plt.subplot(3, 4, k+1)
    plt.imshow(X_train[random.randrange(60000)], cmap='Greys')
    plt.axis('off')
plt.tight_layout()
plt.show()

In [None]:
X_test.shape

In [None]:
y_test.shape

#### Target Encoding

The cell below shows the encoding of the first twelve targets of the test set.
We can see that the targets are encoded as integer values from 0 to 9.

In [None]:
y_test[0:12]

In [None]:
plt.imshow(X_test[0], cmap='Greys')

#### Preprocess data

Only minor preprocessing is needed.
We flatten each 28x28 image into a vector of 784 values and store the values as `float32` instead of integers.

In [None]:
X_train = X_train.reshape(60000, 784).astype('float32')
X_test = X_test.reshape(10000, 784).astype('float32')

We also normalize the grayscale values by dividing by 255.
Remember that we are dealing with images and that the individual features are pixel-wise grayscale values between 0 and 255.

Dividing by 255 therefore rescales every feature value to the range [0, 1].
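The flattening and scaling steps can be illustrated on a tiny example. This is a minimal sketch in plain Python, assuming a made-up 2x3 "image" (`raw_image` is hypothetical and not part of MNIST); the notebook itself performs the same operations on NumPy arrays.

```python
# Hypothetical 2x3 "image" with raw grayscale values between 0 and 255.
raw_image = [[0, 51, 102],
             [153, 204, 255]]

# Flatten row by row into one feature vector (the notebook reshapes 28x28 to 784).
flat = [pixel for row in raw_image for pixel in row]

# Rescale every value to the range [0, 1] by dividing by the maximum value 255.
scaled = [pixel / 255 for pixel in flat]

print(scaled)
```

The smallest raw value maps to 0.0 and the largest to 1.0; everything else lands in between.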


In [None]:
X_train /= 255
X_test /= 255

In [None]:
X_test[0]

#### Mapping to Output Layer

To map the target values to something that is easier to model as an output layer,
we encode each number between 0 and 9 as a 10-dimensional vector of 0 and 1 values, with a single 1 at the position of the digit. This is known as one-hot encoding.

This is a typical approach in deep learning for classification: the output layer gets one neuron per class, so with `n` classes each target becomes an `n`-dimensional vector.

In [None]:
n_classes = 10
y_train = keras.utils.to_categorical(y_train, n_classes)
y_test = keras.utils.to_categorical(y_test, n_classes)

The first test target, the number 7, encoded as a 10-dimensional output vector:

In [None]:
y_test[0]
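What this encoding does for a single label can be sketched in plain Python. The helper `one_hot` below is a hypothetical stand-in written for illustration; it is not the Keras implementation of `to_categorical`.

```python
def one_hot(label, n_classes=10):
    """Return a list of n_classes zeros with a single 1.0 at index `label`."""
    if not 0 <= label < n_classes:
        raise ValueError(f"label {label} is outside 0..{n_classes - 1}")
    vector = [0.0] * n_classes
    vector[label] = 1.0
    return vector


# The digit 7 becomes a vector with a 1.0 at index 7.
encoded_seven = one_hot(7)
print(encoded_seven)
```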

### Design neural network architecture

The cell below defines the architecture of our network. 

`Sequential()` instantiates the model. The term `Sequential` just expresses that we will add a sequence of `layer`s to the model. 

The model below defines three layers:

* Input Layer: This is defined implicitly through the `input_shape` argument of the first dense layer.
* Dense Layer: The first dense layer (the hidden layer) has 64 neurons with a sigmoid activation.
* Dense Layer: The second dense layer is our output layer and has 10 neurons, one per digit class, with a softmax activation.


In [None]:
model = Sequential()
model.add(Dense(64, activation='sigmoid', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))

In [None]:
model.summary()
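To get a feeling for what the two dense layers compute, here is a minimal forward-pass sketch in plain Python. It assumes a made-up network with 4 inputs, 3 hidden neurons and 2 outputs; all weights and the helper names (`dense`, `sigmoid`, `softmax`) are hypothetical and chosen for illustration only. Keras performs the same kind of computation with matrix operations.

```python
import math


def sigmoid(z):
    """Squash a value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(zs)  # subtracting the maximum improves numerical stability
    exps = [math.exp(z - largest) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Weighted sum per neuron: one row of weights and one bias per neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Hypothetical tiny network: 4 inputs -> 3 hidden neurons -> 2 outputs.
x = [0.0, 0.5, 1.0, 0.25]
hidden_w = [[0.1, -0.2, 0.3, 0.0],
            [0.0, 0.4, -0.1, 0.2],
            [-0.3, 0.1, 0.2, 0.5]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.2, -0.4, 0.1],
            [-0.1, 0.3, 0.5]]
output_b = [0.0, 0.0]

hidden = [sigmoid(z) for z in dense(x, hidden_w, hidden_b)]
probabilities = softmax(dense(hidden, output_w, output_b))
print(probabilities)
```

The softmax output can be read as one probability per class, which is why it pairs naturally with the one-hot encoded targets.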

#### Dense Layer

A dense layer is a layer in which each neuron is connected to every output of the preceding layer.

In our case, each neuron of the first dense layer is connected to all 784 inputs.
The second dense layer (which is also our output layer) is fully connected to the 64 neurons of the first dense layer.

The cells below show the number of parameters per layer.

In [None]:
# 784 weights per neuron
(64 * 784)

In [None]:
# In addition to the weights for the input there is also an additional bias weight per neuron.
(64 * 784) + 64

In [None]:
# Output layer: 64 weights per neuron plus one bias for each of the 10 neurons.
(10 * 64) + 10
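The same arithmetic can be wrapped in a small helper. This is a sketch; `dense_params` is a hypothetical function name used only for illustration.

```python
def dense_params(n_inputs, n_neurons):
    """Trainable parameters of a dense layer: one weight per input and one bias, per neuron."""
    return n_inputs * n_neurons + n_neurons


hidden_params = dense_params(784, 64)  # 50240
output_params = dense_params(64, 10)   # 650
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 50240 650 50890
```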

### Configure model

After we have defined the model architecture, we have to do some additional configuration.

Most of this configuration can be interpreted as hyperparameters.

* loss: the function that measures, at each training step, how large the error of the model's predictions is. `mean_squared_error` is a standard choice; it is differentiable, which makes it well suited to the mathematical optimization function that helps us efficiently find a good parameter combination.
* SGD: Stochastic Gradient Descent, the mathematical optimization function that we use to find good values for the weights. Remember that a large number of real-valued parameters means we have a huge parameter space that cannot be searched exhaustively. That is why we need the optimization function.
* lr: the learning rate. It scales the size of each weight update, that is, how far the weights move along the negative gradient in a single step.
* metrics: the measures reported during training and evaluation. They are not used for the optimization itself; here we track accuracy.
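The roles of the loss and the learning rate can be sketched in a few lines of plain Python. The prediction vectors and the one-parameter function `f(w) = (w - 3)^2` are made up for illustration, and the loop is plain gradient descent on that function rather than the stochastic, mini-batch variant Keras runs. The loss here is assumed to average the squared errors over the output vector.

```python
def mean_squared_error(target, prediction):
    """Average of the squared differences between target and prediction."""
    return sum((t - p) ** 2 for t, p in zip(target, prediction)) / len(target)


def gradient_step(weight, gradient, learning_rate=0.01):
    """Move the weight a small step against the gradient."""
    return weight - learning_rate * gradient


# One-hot target for the digit 7 and two hypothetical predictions.
target = [0.0] * 10
target[7] = 1.0
confident = [0.0] * 10
confident[7], confident[1] = 0.9, 0.1
uniform = [0.1] * 10

mse_confident = mean_squared_error(target, confident)
mse_uniform = mean_squared_error(target, uniform)
print(mse_confident, mse_uniform)

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = 0.0
for _ in range(1000):
    w = gradient_step(w, 2 * (w - 3.0))
print(w)
```

A prediction that is close to the target gets a smaller loss than an undecided one, and repeated small steps move `w` towards the minimum at 3. A larger learning rate would take bigger steps per update, at the risk of overshooting.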

In [None]:
model.compile(loss='mean_squared_error', optimizer=SGD(lr=0.01), metrics=['accuracy'])

### Training

To run the training we have to supply the following configuration:
* epochs: how many times we want to iterate over the complete training dataset.
* batch_size: how many samples are used in each stochastic gradient optimization step.
* train and validation datasets: the data the model is trained on and the data used to measure its performance after each epoch. Here we pass the test set as validation data.
* verbose: the level of log output we receive during training.
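The relationship between dataset size, batch size and epochs can be worked out directly. This sketch assumes the values used in this notebook (60000 training samples, a batch size of 128 and 200 epochs) and that the final, smaller batch of each epoch is also used for an update.

```python
import math

n_samples = 60000
batch_size = 128
epochs = 200

# Number of weight updates needed to see every training sample once.
steps_per_epoch = math.ceil(n_samples / batch_size)

# The last batch of an epoch holds the samples left over after the full batches.
last_batch_size = n_samples - (steps_per_epoch - 1) * batch_size

# Total number of weight updates over the whole training run.
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)  # 469 96 93800
```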

In [None]:
model.fit(X_train, y_train, batch_size=128, epochs=200, verbose=1, validation_data=(X_test, y_test))

### Evaluation

In [None]:
model.evaluate(X_test, y_test)
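With the configuration above, `model.evaluate` reports the loss followed by the accuracy on the given data. How accuracy follows from one-hot targets and predicted probabilities can be sketched in plain Python. The three-class data below is made up for illustration, and `argmax` and `accuracy` are hypothetical helpers, not Keras functions.

```python
def argmax(values):
    """Index of the largest value, i.e. the predicted (or true) class."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(targets, predictions):
    """Fraction of samples whose most probable class matches the target class."""
    correct = sum(argmax(t) == argmax(p) for t, p in zip(targets, predictions))
    return correct / len(targets)


# Hypothetical one-hot targets and predicted probabilities for four samples.
targets = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
predictions = [[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1],
               [0.3, 0.3, 0.4],
               [0.6, 0.3, 0.1]]

test_accuracy = accuracy(targets, predictions)
print(test_accuracy)  # 0.75
```

Three of the four samples are classified correctly; the last one is predicted as class 0 although its target is class 1.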