# Intermediate Neural Network in Keras
Build an intermediate neural network to classify the MNIST handwritten digit images. Keras is a high-level neural network API, used here with TensorFlow. Each image is 28x28 pixels.
This example uses a neural network with 2 hidden layers, ReLU as the activation function and cross-entropy as the cost function.

#### Set the seed for reproducibility

In [12]:
import numpy as np
np.random.seed(42)  # call the function; seeds NumPy's global RNG only, not TensorFlow's
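
`np.random.seed` is a function, so it has to be called: an assignment such as `np.random.seed=42` would replace the function with an integer and seed nothing. A minimal NumPy-only check that reseeding reproduces the same draws (this covers NumPy only; Keras weight initialization may draw from the backend's own random generator):

```python
import numpy as np

# Seed the global NumPy RNG and draw a few numbers
np.random.seed(42)
first = np.random.rand(3)

# Reseeding with the same value restarts the same sequence
np.random.seed(42)
second = np.random.rand(3)

print(np.array_equal(first, second))  # True
```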

#### Load dependencies

In [13]:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

Each image is made of 28 rows of 28 pixels. The rows are joined end to end into a single array of 784 values, which is the input to the neural network. The output layer is a 10-dimensional array: the one-hot target has a 1 at the index of the matching digit and 0 everywhere else.
Dense means that every neuron of a layer is connected to every neuron of the previous layer (a fully connected layer).
The first choice is to use 64 nodes in each hidden layer.
SGD is stochastic gradient descent.
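
A Dense layer computes `activation(x @ W + b)`, where `W` holds one weight for every (input, neuron) pair; that is what "fully connected" means. A NumPy sketch of the first hidden layer, with random weights standing in for trained ones (the names `W1`, `b1` and `relu` are illustrative, not Keras internals):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(784)                       # one flattened 28x28 image
W1 = rng.normal(0, 0.05, size=(784, 64))  # one weight per input/neuron pair
b1 = np.zeros(64)                         # one bias per neuron

def relu(z):
    return np.maximum(z, 0.0)

h1 = relu(x @ W1 + b1)                    # 64 hidden activations
print(h1.shape)                           # (64,)
```
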
#### Load data
There are 60,000 digits in the training set and 10,000 in the test set.

In [17]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

#### Preprocessing the data
Flatten each 28x28 matrix into a one-dimensional array of 784 values, and scale the pixel values from the range 0 to 255 down to 0 to 1 by dividing by 255.

In [18]:
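# Flatten each 28x28 image into a vector of 784 pixels and convert to float
X_train = X_train.reshape(60000, 784).astype('float32')
X_test = X_test.reshape(10000, 784).astype('float32')
# Scale pixel intensities from [0, 255] to [0, 1]
X_train /= 255
X_test /= 255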
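
The same preprocessing applied to a small synthetic batch (random `uint8` values standing in for MNIST images, which are an assumption of this sketch) shows the shape change and the new value range:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two fake 28x28 grayscale images with pixel values 0..255
images = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)

flat = images.reshape(len(images), 784).astype('float32')  # shape (2, 784)
flat /= 255                                                # values now in [0, 1]

print(flat.shape, flat.min() >= 0.0, flat.max() <= 1.0)
```

Using `len(images)` (or `-1`) in `reshape` avoids hard-coding the 60,000 and 10,000 sample counts.
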

Transform the labels y into one-hot encoded categorical vectors.

In [19]:
n_classes = 10
y_train = keras.utils.to_categorical(y_train, n_classes)
y_test = keras.utils.to_categorical(y_test, n_classes)
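
One-hot encoding puts a 1 at the index of the digit and 0 elsewhere. An equivalent NumPy sketch (selecting rows of an identity matrix), shown only to make the encoding concrete; the notebook itself uses `to_categorical`, and the `labels` below are made up:

```python
import numpy as np

n_classes = 10
labels = np.array([5, 0, 4])  # example digit labels

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(n_classes, dtype='float32')[labels]
print(one_hot[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```
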

#### Design a neural network
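Use the ReLU activation function in the two hidden layers and softmax in the output layer. The input has 784 values, each hidden layer has 64 neurons, and the output has 10 neurons, one per digit.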

In [20]:
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(784,)))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
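
The Sequential model maps a 784-value input through two 64-neuron ReLU layers to 10 softmax outputs. A NumPy sketch of that forward pass with random, untrained weights (the `forward` helper and the weight layout are hypothetical, not how Keras stores them) shows that softmax turns the last layer's outputs into probabilities that sum to 1:

```python
import numpy as np

rng = np.random.default_rng(42)
sizes = [784, 64, 64, 10]
# One (weights, biases) pair per Dense layer
params = [(rng.normal(0, 0.05, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    for W, b in params[:-1]:
        x = relu(x @ W + b)      # hidden layers
    W, b = params[-1]
    return softmax(x @ W + b)    # output layer

batch = rng.random((5, 784))     # 5 fake flattened images
probs = forward(batch)
print(probs.shape)               # (5, 10)
print(probs.sum(axis=1))         # each row sums to 1
```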

In [21]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_4 (Dense)              (None, 64)                50240     
_________________________________________________________________
dense_5 (Dense)              (None, 64)                4160      
_________________________________________________________________
dense_6 (Dense)              (None, 10)                650       
Total params: 55,050
Trainable params: 55,050
Non-trainable params: 0
_________________________________________________________________


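A Dense layer has one weight per (input, neuron) pair plus one bias per neuron: 50240 = 784 × 64 + 64, 4160 = 64 × 64 + 64, and 650 = 64 × 10 + 10.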
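
The same formula reproduces every row of the summary and the total. A small check (the `dense_params` helper is illustrative):

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out

layers = [(784, 64), (64, 64), (64, 10)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layers]
print(counts)       # [50240, 4160, 650]
print(sum(counts))  # 55050
```
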
#### Configure the model
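Use stochastic gradient descent with a learning rate of 0.1 to move the weights towards a (local) minimum of the loss, and report accuracy as the metric. The mean_squared_error loss is a quadratic function of the error, while categorical_crossentropy is the negative log of the probability the model assigns to the correct class, which pairs naturally with a softmax output.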

In [22]:
# learning_rate replaces the older, deprecated lr argument in recent Keras versions
model.compile(loss='categorical_crossentropy', optimizer=SGD(learning_rate=0.1), metrics=['accuracy'])
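
To compare the two cost functions on single predictions: for a one-hot target, categorical cross-entropy reduces to the negative log of the probability given to the correct class, while mean squared error averages the squared differences over all 10 outputs. A NumPy sketch of the per-sample definitions (the clipping constant `eps` and the example predictions are assumptions for illustration):

```python
import numpy as np

y_true = np.zeros(10)
y_true[3] = 1.0  # the digit is a 3

def cross_entropy(y_true, y_pred, eps=1e-7):
    # -sum(y * log(p)); only the correct class contributes for a one-hot y
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)))

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

uniform = np.full(10, 0.1)          # same output as a random guess
confident_wrong = np.full(10, 0.01)
confident_wrong[7] = 0.91           # 0.91 on the wrong digit

for name, p in [("uniform", uniform), ("confident wrong", confident_wrong)]:
    print(f"{name}: cross-entropy={cross_entropy(y_true, p):.3f} mse={mse(y_true, p):.3f}")
```

On a 10-way softmax output the per-sample MSE can never exceed 0.2, while cross-entropy grows without bound as the probability of the correct class approaches 0, so it penalizes confident mistakes much more strongly.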

#### Train and test the model
As there are 10 outputs, a random guess has an accuracy of about 10%.
To get better results, train for more passes over the training set by increasing `epochs`; `batch_size` sets how many images are used for each gradient update.
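
With `batch_size=128`, one epoch over the 60,000 training images performs ceil(60000 / 128) = 469 weight updates, assuming the final partial batch of 96 images is kept (Keras keeps it by default). A quick arithmetic check, plus a simulation of the 10% random-guess baseline on made-up random labels:

```python
import math
import numpy as np

n_train, batch_size = 60000, 128
updates_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (updates_per_epoch - 1) * batch_size
print(updates_per_epoch, last_batch)  # 469 96

# Guessing uniformly among 10 classes is right about 1 time in 10
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=10000)
guesses = rng.integers(0, 10, size=10000)
print(np.mean(labels == guesses))     # close to 0.1
```
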

In [23]:
model.fit(X_train, y_train, batch_size=128, epochs=200, verbose=1, validation_data=(X_test, y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200
Epoch 23/200
Epoch 24/200
Epoch 25/200
Epoch 26/200

KeyboardInterrupt: 

With one hidden layer, the tanh activation function and a learning rate of 0.01, after 100 epochs: loss: 0.0170 - acc: 0.8974 - val_loss: 0.0162 - val_acc: 0.902
Still with one hidden layer, but with ReLU and the cross-entropy cost function, after 20 epochs: loss: 0.2335 - acc: 0.9348 - val_loss: 0.2296 - val_acc: 0.934

With a second hidden layer and the same cross-entropy cost function, after 20 epochs (1s per epoch): loss: 0.1911 - acc: 0.9456 - val_loss: 0.1861 - val_acc: 0.946
