### Keras

Keras is a popular high-level deep learning library and framework written in Python.  We can quickly prototype deep learning models using the Keras API.  Keras has been open source since 2015.  Its documentation can be found at:

https://keras.io/

Source code can be found at:

github.com/fchollet/keras

Keras can also be seen as a debugging tool.  Why is it such a good debugging tool?

There is a very robust community around Keras.  It is a very popular library with an active community that may be able to help with any questions that come up:

groups.google.com/forum/#forum/keras-users

keras-slack-autojoin.herokuapp.com

Also, Keras has an intuitive, high-level API, which leads to fast prototyping.

It has modular building blocks, which make it easy to build new custom layers.

Extensions:

github.com/fchollet/keras-resources

github.com/fchollet/keras/tree/master/examples

Models:

github.com/fchollet/deep-learning-models

Datasets:

github.com/fchollet/keras/tree/master/keras/datasets

You can view Keras as a deep learning front end that works with different backends.  Keras provides a high-level entry point, while the backend engines behind it do the heavy lifting.

One choice is Google's TensorFlow.  There are also Theano and Microsoft CNTK.  We use TensorFlow exclusively here.

You can easily swap backends, and depending on your configuration, Keras runs seamlessly on CPUs and GPUs.

Layers are the core abstraction for Keras.

Layers (for example, the layers of a Sequential model) have these attributes:

layer.input
layer.output
layer.input_shape, and
layer.output_shape

You can get a layer's weights as a list of NumPy arrays:

layer.get_weights()

You can set a layer's weights from a list of NumPy arrays with matching shapes:

layer.set_weights(weights)

Each layer also has a configuration, a Python dictionary that defines it:

layer.get_config()

First, instantiate a Sequential model.  Then, add layers to it.  Next, compile the model with a mandatory loss function, a mandatory optimizer, and optional evaluation metrics.

Then fit the model to the training data.

Finally, evaluate the model and, depending on what you want to do at that point, persist or deploy it, or start a new experiment.

There are two ways to specify the loss function when compiling the model:

1.  Import it from the losses module (preferred), which looks something like this:

from keras.losses import mean_squared_error
model.compile(loss=mean_squared_error, optimizer=...)

    (See the code example notebooks, since the import statement may differ depending on the versions of TensorFlow and Keras.)

2.  Use strings, which looks like:

model.compile(loss='mean_squared_error', optimizer=...)

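The string approach is more error-prone: a misspelled string is not flagged by your editor or at import time, only later, when Keras tries to resolve the name.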
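To make concrete what the mean_squared_error loss computes, here is a minimal NumPy sketch of the formula: the average of the squared differences over the last axis, giving one value per sample. This illustrates the math only; it is not Keras's own implementation, and the function name mse is ours.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error averaged over the last axis: one loss value per sample."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.square(y_pred - y_true), axis=-1)

y_true = np.array([[0.0, 1.0], [1.0, 1.0]])
y_pred = np.array([[0.0, 0.5], [1.0, 0.0]])
print(mse(y_true, y_pred))  # per-sample losses: 0.125 and 0.5
```

By default Keras then averages these per-sample values over the batch to get the loss it reports.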

There are also two ways to specify the optimizer:

1.  Load an optimizer from the optimizers module (preferred):

Instantiate an optimizer object, here SGD, which also lets you set its parameters:

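from keras.optimizers import SGD

# Keras 2-era arguments; newer releases rename lr to learning_rate
# and replace decay with learning-rate schedules.
sgd = SGD(lr=0.01,          # learning rate, must be >= 0
          decay=1e-6,       # learning rate decay applied after each update
          momentum=0.9)     # momentum parameter of the SGD optimizer

Then: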

model.compile(loss=..., optimizer=sgd)

2.  Alternatively, you can pass a string for the optimizer value:

model.compile(loss=..., optimizer='sgd')

Again, the second approach is more error-prone.  If you just pass a string, then the default optimizer parameters will be used.
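The three SGD parameters above control the update rule. The plain-Python sketch below reproduces the rule used by the classic Keras 2 SGD optimizer (learning rate decayed as lr / (1 + decay * iterations), followed by a momentum step) on the toy loss f(w) = (w - 3)**2. The toy loss and the helper name sgd_minimize are illustrative assumptions, not part of Keras.

```python
def sgd_minimize(grad, w, lr=0.01, decay=1e-6, momentum=0.9, steps=500):
    """Minimize a 1-D function from its gradient with SGD, momentum, and time-based decay."""
    velocity = 0.0
    for iteration in range(steps):
        lr_t = lr / (1.0 + decay * iteration)            # learning rate shrinks as updates accumulate
        velocity = momentum * velocity - lr_t * grad(w)  # decaying running sum of past steps
        w = w + velocity                                 # move the parameter along the velocity
    return w

# Toy loss f(w) = (w - 3)**2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_final = sgd_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
print(round(w_final, 4))
```

Momentum lets successive steps in the same direction build up speed, while decay slowly shrinks the step size as training goes on.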

Once you have compiled your model, you fit it.
You specify the batch size and the number of epochs to train
(Keras defaults to 32 and 1 if you omit them), and optionally the validation data:

model.fit(x_train, y_train,
          batch_size=32,
          epochs=10,
          validation_data=(x_val, y_val))
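With batch_size=32, one epoch is one full pass over the training data in mini-batches of 32 samples, so each epoch performs ceil(n_samples / batch_size) weight updates. The generator below is a minimal NumPy sketch of that slicing, assuming the data is reshuffled once per epoch (fit's default is shuffle=True); iterate_minibatches is a hypothetical helper, not a Keras function.

```python
import math
import numpy as np

def iterate_minibatches(x, y, batch_size, rng):
    """Yield shuffled (x_batch, y_batch) pairs that cover the data exactly once (one epoch)."""
    order = rng.permutation(len(x))  # new random sample order for this epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]  # the last batch may be smaller
        yield x[idx], y[idx]

rng = np.random.default_rng(0)
x = np.arange(1000 * 4, dtype="float32").reshape(1000, 4)  # 1000 fake samples
y = np.arange(1000)
batches = list(iterate_minibatches(x, y, batch_size=32, rng=rng))
print(len(batches), math.ceil(1000 / 32))  # 32 batches; the last one holds 8 samples
```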

Then evaluate the model on test data:

model.evaluate(x_test, y_test, batch_size=32)

Then predict on new data (here, the test inputs):

model.predict(x_test, batch_size=32)


Multi-layer perceptrons (MLPs) make up what are called densely connected networks.

They are built by stacking dense layers on top of each other, each with an activation function.

For regularization we use dropout, which we can add as Keras Dropout layers.

To create a dense layer, import Dense and pass it the number of units plus any optional arguments:

from keras.layers import Dense

Dense(units,                                # number of output neurons
      activation=None,                      # activation function by name, e.g. 'relu' or 'sigmoid'; None means linear
      use_bias=True,                        # whether to add a bias term: best not to change this
      kernel_initializer='glorot_uniform',  # default weight initializer: leave alone
      bias_initializer='zeros')             # default bias initializer: leave alone
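The Keras documentation describes a Dense layer as computing output = activation(dot(input, kernel) + bias). The NumPy sketch below mirrors that computation for one layer, drawing the kernel from the Glorot uniform distribution (limit sqrt(6 / (fan_in + fan_out))) and starting the bias at zeros, to match the defaults above. The function names are illustrative, not Keras APIs.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Sample a (fan_in, fan_out) weight matrix from U(-limit, limit)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def relu(z):
    return np.maximum(z, 0.0)

def dense_forward(inputs, kernel, bias, activation=None):
    """Compute activation(inputs @ kernel + bias); activation=None means linear."""
    z = inputs @ kernel + bias
    return z if activation is None else activation(z)

rng = np.random.default_rng(0)
input_dim, units = 784, 512
kernel = glorot_uniform(input_dim, units, rng)  # shape (784, 512)
bias = np.zeros(units)                          # shape (512,)
batch = rng.random((4, input_dim))              # 4 fake flattened images
out = dense_forward(batch, kernel, bias, activation=relu)
print(out.shape)  # (4, 512)
```

For a real Dense layer, layer.get_weights() returns the analogous pair [kernel, bias] with these same shapes.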
      
Dropout layers are much simpler to specify:

from keras.layers import Dropout

Dropout(rate,            # Fraction of units to drop in each forward pass: a value between 0 and 1
        seed=None)       # Random seed for reproducibility
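During training, Keras's Dropout sets each input unit to 0 with probability rate and scales the surviving units by 1 / (1 - rate), so the expected value of each unit is unchanged; at inference time it passes inputs through untouched. A minimal NumPy sketch of that "inverted dropout" behavior (the function name dropout is illustrative):

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero units with probability `rate`, rescale survivors by 1/(1-rate)."""
    if not training or rate == 0.0:
        return x  # identity at inference time
    keep = rng.random(x.shape) >= rate  # True for units that survive
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(42)
x = np.ones((1000, 512))
y = dropout(x, rate=0.2, training=True, rng=rng)
print((y == 0).mean())  # fraction dropped, close to 0.2
print(y.mean())         # close to 1.0, the mean of x
```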
        
        
from keras.datasets import mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, Dropout

batch_size = 128
num_classes = 10  # one class per digit, 0 to 9
epochs = 20  # train the network for 20 epochs in total

We use the MNIST dataset of handwritten digits.  It consists of 60,000 training samples and 10,000 test samples.  Each sample is a 28 by 28 grayscale image of a handwritten digit.  The labels are simply the digits 0 to 9, stored as integers.

(x_train, y_train), (x_test, y_test) = mnist.load_data()


Next is data preprocessing.
The MNIST images are 28 x 28 pixels, and we need to flatten them into
vectors of length 784 (28 * 28), which we do like this:

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)

Next, we convert them to type float32:

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

Then we divide them by 255, the maximum pixel value, to normalize them
so that every pixel value lies between 0 and 1:

x_train /= 255
x_test /= 255

As the last step in preprocessing, we one-hot encode the labels, which
are stored in y_train and y_test.  We supply both the labels to be
transformed and the number of categories:

y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
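to_categorical turns each integer label into a vector of length num_classes with a single 1 at the label's index. For integer labels, the same result can be sketched in NumPy by picking rows of an identity matrix; one_hot here is a hypothetical helper, not part of Keras.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Row i is all zeros except a 1.0 at column labels[i]."""
    return np.eye(num_classes, dtype="float32")[np.asarray(labels)]

labels = np.array([5, 0, 4])  # example digit labels
encoded = one_hot(labels, 10)
print(encoded.shape)  # (3, 10)
print(encoded[0])     # 1.0 at index 5, zeros elsewhere
```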

Next, we actually start to put together the model:

model = Sequential()

Now we add the layers.  In the first layer we also need to specify the input shape,
which is the shape of one image after the transformation above: a vector of length 784.
The shapes of the layers after the first are inferred.
In the last layer we specify the number of output classes and use the
softmax activation function, which is appropriate for classification.


model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))


model.summary()
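model.summary() lists each layer's output shape and number of trainable parameters. A Dense layer has input_dim * units kernel weights plus units biases, and Dropout layers have no parameters, so the totals for this model can be checked by hand:

```python
def dense_params(input_dim, units):
    """Kernel weights plus one bias per unit."""
    return input_dim * units + units

layers = [(784, 512), (512, 512), (512, 10)]  # (inputs, units) for the three Dense layers
counts = [dense_params(i, u) for i, u in layers]
print(counts)       # [401920, 262656, 5130]
print(sum(counts))  # 669706
```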

The model will be compiled with a loss function of categorical crossentropy
and the optimizer will be stochastic gradient descent.  Our metric will be
accuracy:

model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
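Categorical crossentropy compares the softmax output with the one-hot label: for one sample it is -sum(y_true * log(y_pred)), which reduces to minus the log of the probability assigned to the correct class. A minimal NumPy sketch of both pieces; the small eps clip that guards against log(0) is our assumption here, not a claim about Keras's exact internals.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss: -sum(y_true * log(y_pred)) over the class axis."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # assumed guard against log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

logits = np.array([[2.0, 1.0, 0.1]])
probs = softmax(logits)
y_true = np.array([[1.0, 0.0, 0.0]])  # the correct class is index 0
loss = categorical_crossentropy(y_true, probs)
print(probs.round(3), loss.round(3))
```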

Next we fit our model with the training data:

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_test, y_test))

And then we evaluate:

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

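model.evaluate returns both the test loss and the test accuracy.  You should get roughly 98% accuracy with this model, though the exact figure varies with random initialization, optimizer settings, and library version.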
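The accuracy metric counts a prediction as correct when the class with the highest predicted probability matches the one-hot label's class. With one-hot labels and categorical crossentropy, Keras typically resolves the 'accuracy' string to categorical accuracy, which amounts to the NumPy computation below (the arrays are made-up examples):

```python
import numpy as np

def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose argmax prediction equals the argmax of the one-hot label."""
    return np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1))

y_true = np.eye(3)[[0, 1, 2, 1]]           # one-hot labels for classes 0, 1, 2, 1
y_pred = np.array([[0.7, 0.2, 0.1],        # predicts 0: correct
                   [0.1, 0.8, 0.1],        # predicts 1: correct
                   [0.3, 0.4, 0.3],        # predicts 1: wrong (label is 2)
                   [0.2, 0.5, 0.3]])       # predicts 1: correct
print(categorical_accuracy(y_true, y_pred))  # 0.75
```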
