# Using Keras layers with TensorFlow

The Keras library provides abstractions for building neural networks in Python. It can use different deep learning libraries as its backend, effectively providing a standardized API on top of them.

This tutorial demonstrates the use of Keras layers within a TensorFlow workflow. The code uses the TensorFlow 1.x graph API (`tf.Session`, `tf.placeholder`). To use Keras you have to install it first (with `pip install keras`, or by applying the `keras` environment to the SherlockML server that runs this notebook).

Tutorial taken from: https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html

In [None]:
import tensorflow as tf
from keras import backend as K
from keras.layers import Dense
from keras.losses import categorical_crossentropy  # keras.objectives is the legacy name of this module
from tensorflow.examples.tutorials.mnist import input_data
from keras.metrics import categorical_accuracy as accuracy

import numpy as np

## Define a session

In [None]:
sess = tf.Session()
K.set_session(sess)

## Define the structure of the graph (neural network)

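The `placeholder` objects from TensorFlow are meant to be the prototypes of the (input) layers of a neural network. In this case we specify that each datapoint is an array of `float32` numbers with shape `(None, 784)`: the first dimension is left unspecified so that any number of datapoints can be fed at once, and each datapoint is a flattened 28x28 image with 784 components.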

In [None]:
img = tf.placeholder(tf.float32, shape=(None, 784))

The structure of the neural network is created using `Dense` layers from Keras. Although these are not TensorFlow objects, they can be applied directly to TensorFlow tensors, and TensorFlow syntax can be used. In this case the structure is:
$$
(\text{img}) \longrightarrow \text{Dense} \longrightarrow \text{Dense} \longrightarrow \text{Dense},
$$
with `img` representing the input datapoint. Disregarding the nonlinear activation functions (`relu` and `softmax`), the matrix structure of the neural network is:
$$
\biggl[10\times128\biggr]\biggl[128\times128\biggr]\biggl[128\times784\biggr]\biggl(784\biggr) = \biggl(10\biggr),
$$
where we have denoted $n$-component (column) vectors with $\biggl(n\biggr)$ and $m\times n$ matrices ($m$ rows, $n$ columns) with $\biggl[m\times n\biggr]$.

In [None]:
x = Dense(128, activation='relu')(img)
x = Dense(128, activation='relu')(x)
preds = Dense(10, activation='softmax')(x)
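
As a rough check of the shapes in the matrix structure above, the following NumPy sketch pushes a batch of fake inputs through three layers of the same sizes. The random weights, the helper functions `relu` and `softmax`, and the batch of 5 datapoints are illustrative assumptions; this is not how Keras builds the graph. Note that a Keras `Dense` layer stores its kernel with shape (inputs, units) and computes `input @ kernel + bias` on row vectors, which is the transpose of the column-vector convention used in the formula.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row-wise maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical random weights, laid out as (inputs, units) like Dense kernels.
W1, b1 = rng.normal(scale=0.05, size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(scale=0.05, size=(128, 128)), np.zeros(128)
W3, b3 = rng.normal(scale=0.05, size=(128, 10)), np.zeros(10)

batch = rng.random((5, 784))       # 5 fake flattened images
h1 = relu(batch @ W1 + b1)         # shape (5, 128)
h2 = relu(h1 @ W2 + b2)            # shape (5, 128)
probs = softmax(h2 @ W3 + b3)      # shape (5, 10), each row sums to 1

# Total number of trainable parameters (weights plus biases).
n_params = sum(p.size for p in (W1, b1, W2, b2, W3, b3))
print(h1.shape, h2.shape, probs.shape)  # (5, 128) (5, 128) (5, 10)
print(n_params)                         # 118282
```

The parameter count (784x128 + 128, 128x128 + 128, 128x10 + 10) gives an idea of how many weights the training step will later adjust.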

Another `placeholder` object corresponds to the output (a one-hot encoding of the ten digits 0-9).

In [None]:
labels = tf.placeholder(tf.float32, shape=(None, 10))
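
To make the one-hot encoding concrete, here is a small NumPy sketch; the three example digits are made up for illustration. Each label becomes a row of ten numbers with a single 1 at the position of the digit.

```python
import numpy as np

digits = np.array([3, 0, 9])                     # hypothetical integer labels
one_hot = np.eye(10, dtype=np.float32)[digits]   # shape (3, 10)

print(one_hot[0])   # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]

# The original digits are recovered with argmax along the class axis.
recovered = one_hot.argmax(axis=1)
print(recovered)    # [3 0 9]
```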

## Define the loss function

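Keras provides functions corresponding to the different loss functions one might use. In this case we use the categorical cross entropy. The Keras function returns one loss value per datapoint, so we average over the batch with `tf.reduce_mean`.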

In [None]:
loss = tf.reduce_mean(categorical_crossentropy(labels, preds))
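
The arithmetic behind this loss can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the Keras implementation: the function name `categorical_crossentropy_np`, the clipping constant `eps`, and the two example datapoints are all hypothetical.

```python
import numpy as np

def categorical_crossentropy_np(y_true, y_pred, eps=1e-7):
    """Per-datapoint cross entropy between one-hot labels and predicted probabilities."""
    # Clip to avoid log(0); the value of eps is an assumption of this sketch.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=1)

y_true = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
y_pred = np.array([[0.10, 0.80, 0.10],
                   [0.50, 0.25, 0.25]])

per_sample = categorical_crossentropy_np(y_true, y_pred)  # [-log(0.8), -log(0.5)]
mean_loss = per_sample.mean()                             # the analogue of tf.reduce_mean
print(per_sample, mean_loss)
```

Only the probability assigned to the true class enters the sum, so a confident correct prediction gives a loss close to zero.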

## Load dataset

Load the MNIST dataset (available within TensorFlow).

In [None]:
mnist_data = input_data.read_data_sets('MNIST_data', one_hot=True)

## Define a training step

TensorFlow provides `operation` objects corresponding to abstractions of operations. In graph language, operations (or "ops") correspond to the nodes of the graph. In this case we define a training step, which optimizes the weights of the neural network using gradient descent to minimize the previously defined loss function.

The general structure is:
- `tensor`s correspond to edges in the graph. They have no values by themselves, but can be evaluated.
- `operation`s correspond to nodes in the graph and describe calculations that consume and produce tensors.
- `session`s run TensorFlow operations and encapsulate the state of the TensorFlow runtime.

For more information on operations, tensors and sessions see: https://www.tensorflow.org/programmers_guide/low_level_intro
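
The idea of building a graph first and evaluating it later can be illustrated with a toy in pure Python. This is only a conceptual sketch: the classes `Placeholder` and `Op` and the function `run` are hypothetical names, the toy does not distinguish tensors from operations, and it says nothing about how TensorFlow is implemented.

```python
class Placeholder:
    """A node with no value of its own; it must be fed at run time."""
    def __init__(self, name):
        self.name = name

class Op:
    """A node that computes a value from the values of its input nodes."""
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

def run(node, feed_dict):
    """Evaluate `node` recursively, playing the role of a session."""
    if isinstance(node, Placeholder):
        if node not in feed_dict:
            raise KeyError("no value fed for placeholder %r" % node.name)
        return feed_dict[node]
    args = [run(inp, feed_dict) for inp in node.inputs]
    return node.fn(*args)

# Building the graph computes nothing yet.
a = Placeholder("a")
b = Placeholder("b")
total = Op(lambda x, y: x + y, a, b)
scaled = Op(lambda s: 2 * s, total)

# Values only appear when the graph is run with concrete inputs.
result = run(scaled, feed_dict={a: 3, b: 4})
print(result)  # 14
```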

There is no need to explicitly define __stochastic__ gradient descent: it is obtained simply by feeding only a subset (*batch*) of the whole dataset to each regular gradient descent step.

In [None]:
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
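
The following NumPy sketch shows how feeding batches to an ordinary gradient descent update yields stochastic gradient descent. It fits a straight line rather than a neural network; the synthetic data, the model `w * x + b`, and the number of steps are assumptions made for illustration, while the learning rate 0.5 and the batch size 50 mirror the values used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noiseless data following y = 3*x + 1.
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = 3.0 * X[:, 0] + 1.0

w, b = 0.0, 0.0
learning_rate = 0.5
batch_size = 50

for step in range(200):
    # Draw a random batch: this is the only "stochastic" ingredient.
    idx = rng.choice(len(X), size=batch_size, replace=False)
    xb, yb = X[idx, 0], y[idx]
    err = (w * xb + b) - yb
    # Gradients of the mean squared error computed on this batch only.
    grad_w = 2.0 * np.mean(err * xb)
    grad_b = 2.0 * np.mean(err)
    # Plain gradient descent update.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # close to 3 and 1
```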

## Initialize global variables

TensorFlow variables need to be initialized explicitly when using the low-level API (high-level abstractions like Keras, on the other hand, initialize the variables automatically).

The `global_variables_initializer` function returns an operation (to be run by a `session`) that initializes all the global variables in the `tf.GraphKeys.GLOBAL_VARIABLES` variables collection in one go. Operations are objects that can be passed to the `run()` method of a `session` to be executed.

For more information see: https://www.tensorflow.org/programmers_guide/variables

In [None]:
init_op = tf.global_variables_initializer()

In [None]:
sess.run(init_op)

## Training

The dataset is already conveniently divided into batches whose size we can specify. We loop over the batches, performing a training step (gradient descent) at each iteration. One of the variants below uses the `session` as a context variable: this is just a convenient way to use an operation's `run()` method. In fact, all operations are executed by the session.

Although we used Keras layers, all the operations are performed by TensorFlow objects, so the training phase doesn't give the nice visual feedback we get when using Keras...

From a TensorFlow perspective, we are running the `train_step` operation, and for each batch of data we pass the features to the `img` placeholder (that goes into the input layer of the graph) and the labels to the `labels` placeholder. This is required by the definition of `train_step`, because the `loss` cost function depends on `preds` (the output layer of the graph) and `labels` (the placeholder for the true labels). Indeed, the output layer of the graph, once evaluated by passing a datapoint as the input, produces the predicted labels: as usual, the cost function depends on the true labels and the predicted ones.

### Running TensorFlow operations

TensorFlow `operation`s (the nodes of the graph) are always run by a TensorFlow `session`. This can be done in any of three ways:
- Calling the session's `run()` method, passing it the operation. If the operation requires arguments to be specified, they are passed as a dictionary called `feed_dict`.
- Calling the operation's `run()` method with the session open as a context variable (obtained by calling the `session.as_default()` method).
- Calling the operation's `run()` method, passing it the session as the `session` keyword argument, plus the `feed_dict` dictionary if needed.

The cells below demonstrate these three ways of doing the same thing. Notice that executing more than one cell corresponds to more training: each cell runs 100 further gradient descent steps on batches of 50 datapoints.

In [None]:
#Call the session's run() method.
for i in range(100):
    batch = mnist_data.train.next_batch(50)
    sess.run(
        train_step,
        feed_dict={
            img: batch[0],
            labels: batch[1]
        }
    )

In [None]:
#Use the session as a context variable.
with sess.as_default():
    for i in range(100):
        batch = mnist_data.train.next_batch(50)
        train_step.run(
            feed_dict={
                img: batch[0],
                labels: batch[1]
            }
        )

In [None]:
#Pass the session to the operation's run() method.
for i in range(100):
    batch = mnist_data.train.next_batch(50)
    train_step.run(
        feed_dict={
            img: batch[0],
            labels: batch[1]
        },
        session=sess
    )

## Testing

Keras provides categorical accuracy (a generalization of the accuracy metric to the multi-class case) as a possible metric for testing. 

When passed TensorFlow tensors corresponding to the labels and the predictions (`labels` is the placeholder for the one-hot encoded digits, `preds` is the output layer of the graph), the categorical accuracy becomes a tensor itself, which can be evaluated using its `eval()` method.

### Evaluating tensors

Tensors are evaluated in pretty much the same way operations are run: if they depend on other parts of the graph whose values are stored in a session, they need a session to be evaluated. This can happen in three possible ways:
- Pass the tensor to the session's `run()` method, along with any needed argument (e.g. the features of a datapoint) in the `feed_dict` dictionary.
- Call the `eval()` method of the tensor with the session open as a context variable, again passing any needed argument in `feed_dict`.
- Call the `eval()` method of the tensor, passing the session to it, along with the arguments.

In the test phase, we evaluate the categorical accuracy passing the 10000 test images as datapoints (`preds` will give a prediction for each) and their labels as the true labels w.r.t. which to compute the accuracy metric.

The output of the evaluation is a NumPy array with a number of components equal to the number of test datapoints. Each component contains either 1 (correct prediction) or 0 (wrong prediction).

In [None]:
acc_value = accuracy(labels, preds)

with sess.as_default():
    acc = acc_value.eval(
        feed_dict={
            img: mnist_data.test.images,
            labels: mnist_data.test.labels
        }
    )

In [None]:
np.unique(acc, return_counts=True)

In [None]:
print("Categorical accuracy: {}%".format(100*(acc.sum()/len(acc))))
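
For reference, the per-datapoint metric described above can be reproduced by hand with NumPy. This is a sketch of the idea (compare the position of the largest predicted probability with the position of the 1 in the label), not the Keras source; the function name `categorical_accuracy_np` and the four example datapoints are made up.

```python
import numpy as np

def categorical_accuracy_np(y_true, y_pred):
    """Return 1.0 where the predicted class matches the true class, else 0.0."""
    return (y_true.argmax(axis=1) == y_pred.argmax(axis=1)).astype(np.float32)

y_true = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1],
                   [0, 1, 0]], dtype=np.float32)
y_pred = np.array([[0.2, 0.7, 0.1],
                   [0.3, 0.4, 0.3],
                   [0.1, 0.1, 0.8],
                   [0.1, 0.6, 0.3]], dtype=np.float32)

hits = categorical_accuracy_np(y_true, y_pred)
print(hits)         # [1. 0. 1. 1.]
print(hits.mean())  # 0.75
```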

## Getting the prediction for a single datapoint

We can get the prediction for a single datapoint from the neural network by evaluating the `preds` tensor and passing one image from the dataset as the input.

A tensor is a somewhat immaterial construct: tensors always need a `session` to be evaluated. The `eval()` method of a tensor is just a shortcut to avoid typing `session.run()`, so a session must still be specified: it can be passed via the `session` keyword argument of `eval()` or set as the default session using a Python context variable.

In [None]:
#Use the session as context variable.
with sess.as_default():
    one_pred = preds.eval(
        feed_dict={
            img: mnist_data.train.images[0].reshape((1,784))
        }
    )

The cells below are equivalent to the one above. They demonstrate that a session is needed to evaluate a tensor that depends on other parts of the graph, and show the other possible syntaxes for providing it.

In [None]:
#Pass the session to the eval() method of the tensor.
one_pred = preds.eval(
    feed_dict={
        img: mnist_data.train.images[0].reshape((1,784))
    },
    session=sess
)

In [None]:
#Pass the tensor to the run() method of the session.
sess.run(
    preds,
    feed_dict={
        img: mnist_data.train.images[0].reshape((1,784))
    }
)

In [None]:
one_pred[0]

In [None]:
print("Predicted digit: "+str(np.argmax(one_pred[0])))
print("Probability: "+str(one_pred[0][np.argmax(one_pred[0])]))
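
Since the output layer produces one probability per digit, the same array can also be used to list the runner-up candidates. The sketch below uses a made-up probability vector `probs` in place of `one_pred[0]`, so the numbers are purely illustrative.

```python
import numpy as np

# Hypothetical softmax output for one image (ten probabilities summing to 1).
probs = np.array([0.01, 0.02, 0.05, 0.60, 0.02, 0.20, 0.03, 0.03, 0.02, 0.02])

# Indices of the three largest probabilities, from most to least likely.
top3 = np.argsort(probs)[::-1][:3]

for digit in top3:
    print("Digit {}: probability {:.2f}".format(digit, probs[digit]))
```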