# Techforum : Deep Learning (part 2/3)

## Simple Softmax Regression using Keras

Objective:
- Discover [Keras](https://keras.io/) : a high-level neural networks API running on top of [TensorFlow](https://www.tensorflow.org/) 
- See how it makes writing code easier and faster than using TensorFlow directly :
    - Same example as in part 1, but rewritten to use Keras APIs

Note : this toy-example is still ok to run on a CPU laptop :)
    
Next : Improve the accuracy results using Deep Learning : Convolutional Neural Networks (Convnets) in Keras

Notebook inspired by : https://github.com/fchollet/keras/blob/master/examples/mnist_mlp.py

In [1]:
import tensorflow as tf

import timeit

# Use Tensorflow tutorial's helper to load/prepare the MNIST dataset
from tensorflow.examples.tutorials.mnist import input_data

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout   # Keras Building blocks for Deep Neural Net


Using TensorFlow backend.


Couldn't import _dotparser, loading of dot files will not be possible.


### Load the MNIST dataset

In [2]:
# Import data (Thanks to helpers provided in Tensorflow tutorials !)
mnist = input_data.read_data_sets('./', one_hot=True)

Extracting ./train-images-idx3-ubyte.gz
Extracting ./train-labels-idx1-ubyte.gz
Extracting ./t10k-images-idx3-ubyte.gz
Extracting ./t10k-labels-idx1-ubyte.gz


In [3]:
# Define for convenience a few (Python/Numpy) variables to handle the dataset 
x_train = mnist.train.images
y_train = mnist.train.labels
x_test = mnist.test.images
y_test = mnist.test.labels
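
To make the shapes concrete, here is a minimal NumPy sketch of the format these arrays are in: each image is a flattened vector of 784 floats in [0, 1], and each label is a one-hot vector of length 10. The `raw_images` and `raw_labels` arrays are synthetic stand-ins assumed for illustration, not the real MNIST data.

```python
import numpy as np

# Hypothetical stand-in for raw MNIST data: 3 grayscale images of 28x28 pixels
rng = np.random.default_rng(0)
raw_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)
raw_labels = np.array([5, 0, 4])

# Flatten each 28x28 image into a 784-vector and scale pixels to [0, 1]
x_demo = raw_images.reshape(len(raw_images), -1).astype(np.float32) / 255.0

# One-hot encode the labels: row i has a 1 at column raw_labels[i]
y_demo = np.eye(10, dtype=np.float32)[raw_labels]

print(x_demo.shape)  # (3, 784)
print(y_demo.shape)  # (3, 10)
```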


In [4]:
# Not important : just a counter to keep the log directories of successive training experiments separate
experiments = 1

### Define some Hyperparameters for the network

In [5]:
# How fast the network learns, i.e. how large each weight update is during training
#    too low, and the network will take too long to learn
#    too high, and the network might never converge to a solution
learning_rate = 0.5 

# Number of training epochs (in Keras : one epoch ~ one pass over the full training dataset)
epoch = 5 

# Number of images to process per batch iteration
batch_size = 100

# Path to home of the Tensorboard logs and Training Checkpoints
logs_path = "./logs/mnist/Keras/softmaxReg" 
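
As a quick check of what these values imply, the arithmetic below counts how many gradient updates the training will perform. The training-set size of 55,000 is the one Keras reports in the training log of this notebook ("Train on 55000 samples").

```python
# Values from the hyperparameter cell
epoch = 5
batch_size = 100
n_train = 55000  # training-set size reported in the training log

# One epoch = one pass over the training set, processed batch by batch
steps_per_epoch = n_train // batch_size
total_updates = steps_per_epoch * epoch

print(steps_per_epoch)  # 550
print(total_updates)    # 2750
```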


### Build the model : just by stacking the layers

In [6]:
# The Sequential model is a linear stack of layers
model = Sequential()

# Build the same Softmax Regression network as in the example part 1
# Dense is just a regular densely-connected NN layer, here with 10 neurons
# and we apply a softmax activation on them
model.add(Dense(10, activation='softmax', input_shape=(784,)))
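
To see what this single layer computes, here is a minimal NumPy sketch of its forward pass: `softmax(x · W + b)` with a 784x10 weight matrix and a bias of length 10. The zero-valued `W` and `b` and the random input are assumptions for illustration; Keras initializes and trains its own weights.

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability, then normalize each row
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.random((2, 784))   # 2 fake flattened images
W = np.zeros((784, 10))    # illustrative weights (not the trained ones)
b = np.zeros(10)           # illustrative biases

probs = softmax(x @ W + b)
print(probs.shape)        # (2, 10)
print(probs.sum(axis=1))  # each row sums to 1
```

With all-zero weights every class gets the same probability (0.1); training moves `W` and `b` so that the correct class gets the highest probability.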



In [7]:
# Cool tool to display information about the model we have built
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 10)                7850      
Total params: 7,850
Trainable params: 7,850
Non-trainable params: 0
_________________________________________________________________
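
The parameter count in the summary can be checked by hand: a dense layer has one weight per (input, unit) pair plus one bias per unit. The helper name `dense_params` is an assumption for illustration, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair, plus one bias per unit
    return n_inputs * n_units + n_units

print(dense_params(784, 10))  # 7850
```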


### Training in Keras : compile, fit (train), evaluate

In [8]:
# Define a callback to write a log for TensorBoard (called by model.fit())
tbCallBack = keras.callbacks.TensorBoard(log_dir=str(logs_path + str(experiments)), 
                                         histogram_freq=0, 
                                         write_graph=True, 
                                         write_images=False)

In [9]:
# Monitor execution time
start_time = timeit.default_timer()

In [10]:
# It is possible to use either a Keras optimizer or a TensorFlow optimizer, as here :
opt= keras.optimizers.TFOptimizer(tf.train.GradientDescentOptimizer(learning_rate=learning_rate))

# Configure the model for training
model.compile(loss='categorical_crossentropy',
              optimizer=opt,  
              metrics=['accuracy'])

# Train the model for a fixed number of epochs (iterations on a dataset)
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epoch,
                    verbose=1,
                    validation_data=(x_test, y_test),
                    callbacks=[tbCallBack])

# Return the loss value & metrics values for the model in test mode
score = model.evaluate(x_test, y_test, verbose=0)

print('Test loss:', score[0])
print('Test accuracy:', score[1])

#### Training is done !

print("Execution time= %4f sec" % (timeit.default_timer() - start_time)) 

# Not important : increment our counter to avoid mixing up the logs of different experiments in Jupyter
experiments+=1

Train on 55000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test loss: 0.2771788395
Test accuracy: 0.922
Execution time= 36.703303 sec
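
For intuition about the two numbers reported above, here is a minimal NumPy sketch of a categorical cross-entropy loss and an accuracy metric. The function names, the clipping constant `eps`, and the tiny 3-class example are assumptions for illustration; this is not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean over samples of -sum(y_true * log(y_pred)); clip to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    # Fraction of samples whose most probable class is the true class
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))

# Hypothetical one-hot labels and predicted probabilities (2 samples, 3 classes)
y_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])

print(categorical_crossentropy(y_true, y_pred))
print(accuracy(y_true, y_pred))  # 0.5
```

The loss penalizes low probability on the true class, so it keeps decreasing even when the predicted class (and hence the accuracy) no longer changes.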


## Getting Deeper ...

Let's stack more layers onto the neural network and see how it affects the accuracy and the computation time. Notice how easy this is with Keras :



In [11]:
#### The Sequential model is a linear stack of layers
model = Sequential()

#### Build a deeper network than the Softmax Regression of part 1 : two hidden layers plus a softmax output

# Just a regular densely-connected NN layer, here with 512 neurons
# relu (rectified linear unit) applies a non-linearity
model.add(Dense(units=512, activation='relu', input_shape=(784,)))

# Add regularization : Dropout randomly sets a fraction `rate` of the input units
# to 0 at each update during training, which helps prevent overfitting
model.add(Dropout(rate=0.2))

# Continue to stack layers similarly
model.add(Dense(units=512, activation='relu'))
model.add(Dropout(rate=0.2))

# And add the final layer
model.add(Dense(units=10, activation='softmax'))

# Cool tool to display information about the model we have built
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_2 (Dense)              (None, 512)               401920    
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 512)               262656    
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 10)                5130      
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
_________________________________________________________________
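
The two new building blocks used in this model, relu and Dropout, can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions: the function names and the sample activations are hypothetical, and the rescaling of kept units by `1 / (1 - rate)` is the usual "inverted dropout" formulation, not a copy of the Keras source.

```python
import numpy as np

def relu(z):
    # Keep positive values, replace negative values with 0
    return np.maximum(z, 0.0)

def dropout(a, rate, rng, training=True):
    # At test time the layer is the identity
    if not training:
        return a
    # Zero out roughly a fraction `rate` of the units, rescale the others
    keep = rng.random(a.shape) >= rate
    return a * keep / (1.0 - rate)

rng = np.random.default_rng(0)
a = relu(np.array([[-1.0, 2.0, 3.0, -4.0, 5.0]]))

print(a)                                              # negatives become 0
print(dropout(a, rate=0.2, rng=rng))                  # some units zeroed, others scaled up
print(dropout(a, rate=0.2, rng=rng, training=False))  # unchanged
```

Dropout layers have no weights of their own, which is why the summary lists them with 0 parameters.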



That's 669,706 parameters to optimize during the training, about **85x** more than the 7,850 of the softmax regression (and this is still a toy network !)
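
The figures in the summary and the 85x ratio can be reproduced with a little arithmetic. The helper `dense_params` and the `layer_sizes` list are names assumed for illustration.

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair, plus one bias per unit
    return n_inputs * n_units + n_units

# Input size followed by the number of units of each Dense layer
layer_sizes = [784, 512, 512, 10]

deep_total = sum(dense_params(n_in, n_out)
                 for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
shallow_total = dense_params(784, 10)

print(deep_total)                            # 669706
print(round(deep_total / shallow_total, 1))  # 85.3
```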

#### Let's train the updated model


The training code below is the same as above

In [12]:
# This callback writes a log for TensorBoard and is called by model.fit()
tbCallBack = keras.callbacks.TensorBoard(log_dir=str(logs_path + str(experiments)), 
                                         histogram_freq=0, 
                                         write_graph=True, 
                                         write_images=False)

# It is possible to use either a Keras optimizer or a TensorFlow optimizer, as shown here
opt= keras.optimizers.TFOptimizer(tf.train.GradientDescentOptimizer(learning_rate=learning_rate))

# Not important : Monitor execution time
start_time = timeit.default_timer()


#### Configure the model for training
model.compile(loss='categorical_crossentropy',
              optimizer=opt,  
              metrics=['accuracy'])

#### Train the model for a fixed number of epochs (iterations on a dataset)
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epoch,
                    verbose=1,
                    validation_data=(x_test, y_test),
                    callbacks=[tbCallBack])

##### Return the loss value & metrics values for the model in test mode
score = model.evaluate(x_test, y_test, verbose=0)

print('Test loss:', score[0])
print('Test accuracy:', score[1])

#### Training is done !

print("Execution time= %4f sec" % (timeit.default_timer() - start_time)) 

# Not important : increment our counter to avoid mixing our logs 
# between experiments in Jupyter
experiments+=1


Train on 55000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test loss: 0.0743523620928
Test accuracy: 0.9782
Execution time= 114.278138 sec
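
To compare the two runs, the short calculation below turns the accuracies and execution times printed in this notebook into an error rate and a training-time ratio. The dictionary names are assumptions for illustration, and the timings depend on the machine used.

```python
# Results printed by the two training runs in this notebook
softmax_run = {"accuracy": 0.922, "seconds": 36.703303}
deep_run = {"accuracy": 0.9782, "seconds": 114.278138}

error_before = 1 - softmax_run["accuracy"]
error_after = 1 - deep_run["accuracy"]
slowdown = deep_run["seconds"] / softmax_run["seconds"]

print("Error rate: %.2f%% -> %.2f%%" % (100 * error_before, 100 * error_after))
print("Training time ratio: %.1fx" % slowdown)
```

On this run, the deeper network cuts the test error from about 7.8% to about 2.2%, at the cost of roughly 3x the training time.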
