# Keras and estimators API

In the previous example of handwritten digit recognition, we used TensorFlow's layers API and had to do a lot of matrix multiplication and define the loss and optimizer ops manually.

We could put most of this logic into an estimator, a black box that contains the neural network model and its training, evaluation and prediction loops and exposes three methods to the end user: `fit`, `evaluate` and `predict`. Simple!
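To make the black-box idea concrete, here is a minimal sketch of an object exposing those three methods. `MajorityClassEstimator` is a hypothetical toy written for illustration, not part of Keras or TensorFlow; its "training" only memorises the most frequent label.

```python
from collections import Counter


class MajorityClassEstimator:
    """Hypothetical toy estimator: always predicts the most common training label."""

    def __init__(self):
        self.majority_label = None

    def fit(self, x, y):
        # "Training" is just counting labels; a real estimator would run an optimisation loop here.
        self.majority_label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, x):
        # One prediction per input sample.
        return [self.majority_label for _ in x]

    def evaluate(self, x, y):
        # Accuracy: fraction of predictions that match the true labels.
        predictions = self.predict(x)
        correct = sum(p == t for p, t in zip(predictions, y))
        return float(correct) / len(y)


estimator = MajorityClassEstimator().fit([[0], [1], [2], [3]], [7, 7, 7, 3])
print(estimator.predict([[9], [9]]))
print(estimator.evaluate([[0], [1]], [7, 3]))
```

The caller never sees how the model is stored or updated, which is the point of the estimator abstraction.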

Keras is one library that provides this higher-level API. Implementing neural networks with Keras is as simple as building something out of Lego:

In [1]:
from keras.datasets import mnist
from keras import backend as K
from keras import utils

batch_size = 128
num_classes = 10
epochs = 12  # same value as the epochs=12 passed to model.fit below

# input image dimensions
img_rows, img_cols = 28, 28

# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = utils.to_categorical(y_train, num_classes)
y_test = utils.to_categorical(y_test, num_classes)

Using TensorFlow backend.


('x_train shape:', (60000, 28, 28, 1))
(60000, 'train samples')
(10000, 'test samples')
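The last step of the cell above, `utils.to_categorical`, turns each integer label into a one-hot row. A minimal pure-Python sketch of the same idea follows; the helper name `one_hot` is an assumption for illustration, and Keras returns a NumPy array rather than nested lists.

```python
def one_hot(labels, num_classes):
    """Return one row per label with a 1.0 at the label's index and 0.0 elsewhere."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


encoded = one_hot([3, 0], num_classes=10)
print(encoded[0])  # the digit 3 becomes a 1.0 in position 3
```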


In [2]:
from keras.models import Sequential
from keras.layers import Dense
#from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D

from keras.optimizers import Adam

In [3]:
model = Sequential()
model.add(Conv2D(32, (5, 5), input_shape=(28, 28, 1), activation='relu')) #channels last
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
#model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
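To see what this stack of layers does to the image, we can trace the tensor sizes by hand. The sketch below assumes the Keras defaults (`padding='valid'` and stride 1 for `Conv2D`, pooling stride equal to `pool_size` for `MaxPooling2D`); the helper names `conv_valid` and `pool` are made up for illustration.

```python
def conv_valid(size, kernel):
    # 'valid' padding with stride 1: the output shrinks by kernel - 1
    return size - kernel + 1


def pool(size, window):
    # non-overlapping pooling: the output is the input divided by the window
    return size // window


side = 28
side = conv_valid(side, 5)   # Conv2D(32, (5, 5))
side = pool(side, 2)         # MaxPooling2D(pool_size=(2, 2))
side = conv_valid(side, 5)   # Conv2D(64, (5, 5))
side = pool(side, 2)         # MaxPooling2D(pool_size=(2, 2))
flat_features = side * side * 64   # Flatten over the 64 channels

# trainable parameters per layer = weights + biases
params = {
    'conv1': 5 * 5 * 1 * 32 + 32,
    'conv2': 5 * 5 * 32 * 64 + 64,
    'dense1': flat_features * 1024 + 1024,
    'dense2': 1024 * 10 + 10,
}
print(side, flat_features, sum(params.values()))
```

Under these assumptions the image shrinks 28 → 24 → 12 → 8 → 4, so `Flatten` hands 1024 features to the first dense layer. Calling `model.summary()` should report the same shapes and parameter counts.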

## Compile the model
Keras is built on top of Theano (and now TensorFlow as well), both packages that allow you to define a *computation graph* in Python, which they then compile and run efficiently on the CPU or GPU without the overhead of the Python interpreter.

When compiling a model, Keras asks you to specify your **loss function** and your **optimizer**. The loss function we'll use here is *categorical crossentropy*, which is well suited to comparing two probability distributions.

Here our predictions are probability distributions across the ten different digits (e.g. "we're 80% confident this image is a 3, 10% sure it's an 8, 5% it's a 2, etc."), and the target is a probability distribution with 100% for the correct category, and 0 for everything else. The cross-entropy is a measure of how different your predicted distribution is from the target distribution. [More detail at Wikipedia](https://en.wikipedia.org/wiki/Cross_entropy)
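As a worked example, the sketch below computes the cross-entropy for the prediction described in the text. The function is a plain-Python illustration of the formula, not Keras's implementation, and where the remaining 5% of probability goes is an assumption, since the text only says "etc.".

```python
import math


def categorical_crossentropy(target, predicted):
    """H(target, predicted) = -sum_i target_i * log(predicted_i), using the natural log."""
    # Terms with target_i == 0 contribute nothing, so we skip them (this also avoids log(0)).
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


predicted = [0.0] * 10
predicted[3], predicted[8], predicted[2] = 0.80, 0.10, 0.05
predicted[7] = 0.05  # assumption: the leftover probability mass is placed on digit 7

target_is_3 = [1.0 if d == 3 else 0.0 for d in range(10)]
target_is_8 = [1.0 if d == 8 else 0.0 for d in range(10)]

print(categorical_crossentropy(target_is_3, predicted))  # about 0.223
print(categorical_crossentropy(target_is_8, predicted))  # about 2.303
```

With a one-hot target the loss reduces to the negative log of the probability given to the correct digit. A confident correct answer is cheap, and a confident wrong answer is expensive.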

In [28]:
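# Adam with a learning rate smaller than the Keras default
# (passing optimizer='adam' instead would use the default learning rate)
adam = Adam(lr=0.0001)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])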

## Train the model
This is the fun part: feed the training data loaded earlier into the model and it will learn to classify digits.

In [29]:
model.fit(x_train, y_train,
          batch_size=128, epochs=12,
          verbose=1, 
          validation_data=(x_test, y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x120564ed0>
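As a back-of-the-envelope check on how much work this call does, the sketch below counts the weight updates implied by the settings above (60000 training samples, `batch_size=128`, `epochs=12`). It assumes the final, smaller batch of each epoch is still processed.

```python
train_samples = 60000
batch_size = 128
epochs = 12

# Ceiling division: the last batch of an epoch holds whatever samples are left over.
steps_per_epoch = -(-train_samples // batch_size)
last_batch = train_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, last_batch, total_updates)
```

That is 469 batches per epoch (the last one holding 96 samples) and 5628 optimizer steps in total.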

## Finally, evaluate its performance

In [31]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

('Test score:', 0.030239809037391888)
('Test accuracy:', 0.99080000000000001)


## Exercise

Add a dropout layer to the model. How does the accuracy change as a function of the dropout probability?
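As a reminder of what such a layer does during training, here is a rough pure-Python sketch. It assumes "inverted" dropout, where surviving activations are scaled by 1 / (1 - rate) so their expected value is unchanged; the function `dropout` is a hypothetical helper, not the Keras layer itself.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale the survivors by 1 / (1 - rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [a * keep_scale if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 10000
dropped = dropout(activations, rate=0.2, rng=rng)
print(sum(dropped) / len(dropped))  # stays close to 1.0 on average
```

In the model above, the commented-out `Dropout` import and `model.add(Dropout(0.2))` line are a natural place to start.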

# Estimator API

We do not have time for a full example of the TensorFlow estimator API here. The official guides are a good starting point:

https://www.tensorflow.org/extend/estimators

https://www.tensorflow.org/get_started/tflearn