## Shallow Net - Keras


In [1]:
import numpy as np  
np.random.seed(42)
import keras
from keras.datasets import mnist
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD

Using TensorFlow backend.


In [0]:
# Load data
# mnist has 60,000 training examples, and 10,000 test examples
(X_train, y_train) , (X_test, y_test) = mnist.load_data()


In [3]:
X_train.shape

(60000, 28, 28)

In [4]:
y_train.shape

(60000,)

In [5]:
y_train[0:10] # first is a 5, then a 0, then 4 etc.

array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4], dtype=uint8)

In [0]:
# Preprocess
# Step 1: Reshape each 28x28 image into a flat vector of 784 values
X_train = X_train.reshape(60000, 784).astype('float32')
X_test = X_test.reshape(10000, 784).astype('float32')

# Step 2: Scale pixel values from 0 - 255 down to 0 - 1
X_train /= 255
X_test /= 255
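To see what the reshape and scaling do without downloading MNIST, here is a minimal sketch on made-up data. The array `images` is a hypothetical stand-in (random pixels), not part of the original notebook.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 5 fake 28x28 grayscale images (uint8, 0-255)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each image to a 784-vector; -1 lets NumPy infer the number of images
flat = images.reshape(-1, 28 * 28).astype('float32')

# Scale pixel values from 0 - 255 down to 0 - 1
flat /= 255

print(flat.shape, flat.dtype, flat.min(), flat.max())
```

Using `-1` instead of a hardcoded 60000 or 10000 keeps the same line working for any number of images.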

In [0]:
# We have 10 classes of digits (0 - 9)
# Convert the labels to one-hot encoding, so that instead of seeing
# the integers 4 and 2 the NN sees the arrays:
# [0,0,0,0,1,0,0,0,0,0] as 4; [0,0,1,0,0,0,0,0,0,0] as 2
n_classes = 10
y_train = keras.utils.to_categorical(y_train, n_classes)
y_test = keras.utils.to_categorical(y_test, n_classes)

In [8]:
y_train[0:3]

array([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.]], dtype=float32)
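One-hot encoding is simple enough to write by hand. The sketch below uses plain NumPy rather than Keras; `one_hot` is a hypothetical helper name, and it is only meant to mirror the output shown above for the labels 5, 0, 4.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Return a float32 array with a single 1 at each label's index."""
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(n_classes, dtype='float32')[labels]

labels = np.array([5, 0, 4], dtype=np.uint8)
encoded = one_hot(labels, 10)
print(encoded)

# argmax along each row recovers the original integer labels
print(encoded.argmax(axis=1))
```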

In [9]:
# Design our NN
# 784 inputs per image
# 64 hidden neurons
# 10 outputs
# Since each layer feeds into the next these are sequential models

model = Sequential()
# Add in a dense (fully connected) layer with 64 neurons
# We use the sigmoid activation function and we also need to
# specify that each input is a vector of 784 values
# Middle (hidden) layer
model.add(Dense(64, activation='sigmoid', input_shape=(784,)))
# On top of that we need to add our output layer
# Also dense - one neuron per digit class, with softmax so that
# the 10 outputs sum to 1
model.add(Dense(10, activation='softmax'))
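To make the two layers less of a black box, here is a rough NumPy sketch of the forward pass they compute: a sigmoid hidden layer followed by a softmax output. The weights are small random placeholders (an assumption for illustration, not trained values and not necessarily how Keras initializes them), and the names `W1`, `b1`, `W2`, `b2` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row max for numerical stability before exponentiating
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(42)

# Placeholder weights: 784 -> 64 -> 10
W1 = rng.normal(scale=0.05, size=(784, 64))
b1 = np.zeros(64)
W2 = rng.normal(scale=0.05, size=(64, 10))
b2 = np.zeros(10)

# A batch of 3 fake "images", already flattened and scaled to 0 - 1
x = rng.random((3, 784))

hidden = sigmoid(x @ W1 + b1)      # shape (3, 64)
probs = softmax(hidden @ W2 + b2)  # shape (3, 10), each row sums to 1

print(hidden.shape, probs.shape, probs.sum(axis=1))
```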






In [10]:
model.summary() # see our model

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 64)                50240     
_________________________________________________________________
dense_2 (Dense)              (None, 10)                650       
Total params: 50,890
Trainable params: 50,890
Non-trainable params: 0
_________________________________________________________________


Notice that the first dense layer has 50,240 parameters, which is (784 * 64) + 64: each of the 784 input values has one weight for each of the 64 neurons, and each neuron also has one bias.

The 2nd layer follows the same pattern: (64 * 10) + 10 = 650.
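The arithmetic is easy to double-check in code. `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Parameters in a fully connected layer: one weight per input per unit, plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(784, 64)   # 50240
output = dense_params(64, 10)    # 650
print(hidden, output, hidden + output)  # 50240 650 50890
```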

In [11]:
# Configure model
model.compile(loss = 'mean_squared_error', optimizer=SGD(lr = 0.01), metrics = ['accuracy'])
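As a rough illustration of the two pieces configured here, the sketch below computes a mean squared error by hand and applies one plain gradient descent update with a learning rate of 0.01. The function names and the toy numbers are made up for the example; this is not Keras's own implementation.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between target and prediction
    return np.mean((y_true - y_pred) ** 2)

def sgd_step(weights, grads, lr=0.01):
    # Move each weight a small step against its gradient
    return weights - lr * grads

# Toy 3-class example: the true class is the last one
y_true = np.array([[0.0, 0.0, 1.0]])
y_pred = np.array([[0.2, 0.3, 0.5]])
loss = mean_squared_error(y_true, y_pred)
print(loss)  # (0.04 + 0.09 + 0.25) / 3, roughly 0.1267

# Toy weights and gradients, just to show the update rule
w = np.array([1.0, -2.0])
g = np.array([0.5, -0.5])
w_new = sgd_step(w, g)
print(w_new)  # [ 0.995 -1.995]
```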




In [12]:
# epochs = how many times to run over the full training data
# More epochs generally give the NN more chances to learn, although
# training for too long can lead to overfitting.
model.fit(X_train, y_train, batch_size=128, epochs = 100, verbose = 1, 
          validation_data = (X_test, y_test))




Train on 60000 samples, validate on 10000 samples
Epoch 1/100
[per-epoch training output truncated]

<keras.callbacks.History at 0x7f08f6bf5eb8>
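For a sense of how much work the `fit` call above does: with 60,000 training samples and `batch_size=128`, each epoch is split into batches and the weights are updated once per batch. A quick sketch of that arithmetic (variable names are just for illustration):

```python
import math

n_samples = 60000
batch_size = 128
epochs = 100

# The last batch is smaller when batch_size does not divide n_samples evenly
steps_per_epoch = math.ceil(n_samples / batch_size)
last_batch_size = n_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)  # 469 96 46900
```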

The training and test accuracy are both about 9% after 1 epoch, which is not great: with 10 classes, random guessing would already get roughly 10% ;).
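The chance-level figure can be checked with a small simulation. This sketch assumes the 10 classes are roughly equally likely and uses made-up labels rather than the real MNIST test set.

```python
import numpy as np

rng = np.random.default_rng(42)
n_classes = 10
n_samples = 10000

# Made-up "true" labels and completely random guesses
y_true = rng.integers(0, n_classes, size=n_samples)
y_guess = rng.integers(0, n_classes, size=n_samples)

# Fraction of guesses that happen to match the true label
accuracy = np.mean(y_true == y_guess)
print(accuracy)  # close to 0.1
```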

Notice, however, that accuracy improves as we train the NN for more epochs. Even with this shallow net, accuracy gets to around 30% after 20 epochs and improves to around 70% when the number of epochs is increased to 100.