# Implementing models using the low-level backend

Keras supports two backends, Theano and TensorFlow, which can use GPUs to accelerate model training and evaluation. To keep the code readable, both backends are wrapped behind the same API (`keras.backend`), and all of Keras itself is implemented on top of these wrappers. If you want to extend Keras with new layer types, optimization algorithms, or cost functions, this is the API to use.

Let's get the gist of it by implementing a simple classification algorithm: logistic regression. The model has two trainable parameters: a weight matrix `W` and a bias vector `b`, which are used to perform an affine projection on the input `x`. The probability that the input belongs to the positive class is given by the logistic function:

$$
\hat{y} = \frac{1}{1 + e^{-(Wx + b)}}
$$

and the final predicted class is the positive class if $\hat{y} > 0.5$ and the negative class otherwise.
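As a quick numerical illustration of the forward pass and the decision rule, here is a minimal NumPy sketch. It assumes a batch `x` of shape `(n, d)` with one example per row, so the affine projection is written in the row-vector form `x W + b`. That is the form the backend code uses below, and it is equivalent to $Wx + b$ in the equation with the transposed convention. The helper names `predict_proba` and `predict_class` are hypothetical.

```python
import numpy as np


def predict_proba(x, W, b):
    """Logistic regression probabilities for a batch x of shape (n, d).

    Row-vector convention: one example per row, so z = x W + b has shape (n, 1).
    """
    z = x.dot(W) + b                  # affine projection
    return 1.0 / (1.0 + np.exp(-z))   # logistic function


def predict_class(x, W, b, threshold=0.5):
    """Positive class (1) if y_hat > threshold, negative class (0) otherwise."""
    return (predict_proba(x, W, b) > threshold).astype(int)


W = np.array([[1.0], [-1.0]])
b = np.array([0.0])
x = np.array([[2.0, 1.0],    # z = 1  -> y_hat ~ 0.73 -> class 1
              [0.0, 3.0],    # z = -3 -> y_hat ~ 0.05 -> class 0
              [1.0, 1.0]])   # z = 0  -> y_hat = 0.5  -> class 0 (not > 0.5)

print(predict_proba(x, W, b).ravel())
print(predict_class(x, W, b).ravel())  # [1 0 0]
```

The third example sits exactly on the decision boundary. Because the rule uses a strict $\hat{y} > 0.5$, it is assigned to the negative class.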

In this tutorial, we will implement this model and learn its optimal parameters with gradient descent. That is essentially what we do in deep learning, only with many more layers, fancier architectures, and more robust optimization algorithms.

```python
from keras import backend as K
import numpy as np
```

```
Using Theano backend.
```


A placeholder can be seen as an "open slot" whose value is supplied later, when the computation is actually run. Placeholders serve as function arguments, such as the model's input `x` and its target `y`.

```python
x = K.placeholder(shape=(None, 5))  # batch of inputs with 5 features each
y = K.placeholder(shape=(None, 1))  # batch of binary targets
```

A variable, on the other hand, holds an actual value that persists between calls. It can be seen as memory shared between the host and the GPU and managed automatically by the backend (when a GPU is used, of course). With the Theano backend on a CPU, it essentially wraps an ordinary NumPy array.

Note that we initialize `W` with small random numbers drawn from a Gaussian distribution and `b` with zeros, which is common practice for this type of model.

```python
W = K.variable(0.01*np.random.randn(5, 1))
b = K.variable(np.zeros(1))
```

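Because variables hold actual values, we can inspect and even modify them with `.get_value()` and `.set_value()`. These are methods of Theano shared variables; the backend-agnostic equivalents are `K.get_value(W)` and `K.set_value(W, value)`.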

```python
print('Initial weights: {}'.format(W.get_value()))
print('Initial bias: {}'.format(b.get_value()))
```

```
Initial weights: [[-0.00656898]
 [ 0.00647002]
 [-0.00780751]
 [-0.00170109]
 [ 0.00424325]]
Initial bias: [ 0.]
```


The backend also provides common element-wise functions (such as `K.sigmoid`) and supports NumPy-style operator notation, in which operators like `+` and `*` act element-wise. So remember to use `K.dot` for matrix multiplication!
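The reason `K.dot` matters is easy to reproduce in plain NumPy, whose `*` is also element-wise. The sketch below uses a single example of shape `(1, 5)` and weights of shape `(5, 1)`, matching the shapes in this notebook. Depending on the backend, the same mistake may surface as a shape error or, as here, as a silently wrong result.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])  # one example, shape (1, 5)
W = np.ones((5, 1))                         # weights, shape (5, 1)

elementwise = x * W    # broadcasting: (1, 5) * (5, 1) -> (5, 5), no error raised
matmul = np.dot(x, W)  # matrix product: (1, 5) . (5, 1) -> (1, 1)

print(elementwise.shape)  # (5, 5)
print(matmul)             # [[15.]]
```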

```python
y_hat = K.sigmoid(K.dot(x, W) + b)
```

If we try to print `y_hat`, we get something strange:

```python
print(y_hat)
```

```
sigmoid.0
```


`sigmoid.0` is the name of a *node* in the graph built by Theano (or TensorFlow, if that is the backend you are using). Since `y_hat` depends on the placeholder `x`, which does not have a value yet, we cannot compute a value for it.
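To make the idea of a symbolic node concrete, here is a toy sketch in plain Python. The `Placeholder`, `Variable` and `Node` classes are hypothetical and greatly simplified; they are not how Theano or TensorFlow are implemented. They only show why printing `y_hat` yields a node name rather than a number, and why a value appears only once every placeholder has been fed.

```python
import math


class Placeholder:
    """An open slot: its value must be supplied at evaluation time."""
    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        if self not in feed:
            raise ValueError("no value fed for placeholder %r" % self.name)
        return feed[self]


class Variable:
    """Holds a concrete value that persists between evaluations."""
    def __init__(self, value):
        self.value = value

    def evaluate(self, feed):
        return self.value


class Node:
    """An operation whose inputs are other graph elements."""
    def __init__(self, name, op, *inputs):
        self.name, self.op, self.inputs = name, op, inputs

    def evaluate(self, feed):
        # Recursively evaluate the inputs, then apply this node's operation
        return self.op(*(i.evaluate(feed) for i in self.inputs))

    def __repr__(self):
        return self.name


x = Placeholder("x")
w = Variable(2.0)
b = Variable(-1.0)
z = Node("add.0", lambda a, c: a + c, Node("mul.0", lambda a, c: a * c, x, w), b)
y_hat = Node("sigmoid.0", lambda a: 1.0 / (1.0 + math.exp(-a)), z)

print(y_hat)                     # sigmoid.0 (a node, not a number)
print(y_hat.evaluate({x: 0.5}))  # 2.0 * 0.5 - 1.0 = 0.0, sigmoid(0.0) = 0.5
```

Calling `y_hat.evaluate({})` raises an error, which mirrors the situation above: the graph is well defined, but it cannot produce a value until `x` is given one.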

How to print a graph depends on the backend. For Theano, we can use `theano.pp` (a compact, one-line view) or `theano.printing.debugprint` (a more detailed tree):

```python
from theano import pp
pp(y_hat)
```

```
'sigmoid(((<TensorType(float32, matrix)> \\dot HostFromGpu(<CudaNdarrayType(float32, matrix)>)) + HostFromGpu(<CudaNdarrayType(float32, vector)>)))'
```

```python
from theano.printing import debugprint
debugprint(y_hat)
```

```
sigmoid [id A] ''
 |Elemwise{add,no_inplace} [id B] ''
   |dot [id C] ''
   | |<TensorType(float32, matrix)> [id D]
   | |HostFromGpu [id E] ''
   |   |<CudaNdarrayType(float32, matrix)> [id F]
   |DimShuffle{x,0} [id G] ''
     |HostFromGpu [id H] ''
       |<CudaNdarrayType(float32, vector)> [id I]
```


The code we wrote so far computes the output of a logistic regression model, but we still have to train it. Let's define a loss function and compute its gradient with respect to each of the trainable parameters.
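For reference, the quantity computed by `K.mean(K.binary_crossentropy(...))` is the mean binary cross-entropy over the $n$ examples in a batch. Write each input as a row $x_i$ of a matrix $X$, as the code does, so that $\hat{y}_i = \sigma(x_i W + b)$ is the row-vector form of the equation above. Using $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, the derivative of each example's loss with respect to its pre-activation $x_i W + b$ simplifies to $\hat{y}_i - y_i$. This gives the gradients below:

$$
\mathcal{L}(W, b) = -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\Big]
$$

$$
\frac{\partial \mathcal{L}}{\partial W} = \frac{1}{n} X^\top (\hat{y} - y),
\qquad
\frac{\partial \mathcal{L}}{\partial b} = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)
$$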

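```python
# Mean binary cross-entropy over the batch. Keyword arguments are used because
# the positional order differs between Keras versions:
# (output, target) in Keras 1, (target, output) in Keras 2.
loss = K.mean(K.binary_crossentropy(output=y_hat, target=y))

params = [W, b]
gradients = K.gradients(loss, params)  # symbolic gradient of loss w.r.t. each parameter

lr = 0.1  # learning rate

# Compute the gradient descent update for each trainable parameter and store
# them in a list of (parameter to update, new value) tuples
updates = []
for p, g in zip(params, gradients):
    new_p = p - lr*g
    updates.append((p, new_p))
```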
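`K.gradients` differentiates the graph symbolically, so we never write the gradient by hand. When you do derive gradients by hand (for example, for a custom layer), a common sanity check is to compare them with central finite differences. Below is a minimal NumPy sketch of that check for this model. It assumes the analytic gradients $\partial\mathcal{L}/\partial W = X^\top(\hat{y} - y)/n$ and $\partial\mathcal{L}/\partial b = \mathrm{mean}(\hat{y} - y)$ of the mean binary cross-entropy. All function names are hypothetical helpers, not Keras API.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(X, y, W, b):
    """Mean binary cross-entropy of a logistic regression model."""
    y_hat = sigmoid(X.dot(W) + b)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))


def analytic_gradients(X, y, W, b):
    """dL/dW = X^T (y_hat - y) / n and dL/db = mean(y_hat - y)."""
    n = X.shape[0]
    err = sigmoid(X.dot(W) + b) - y  # shape (n, 1)
    return X.T.dot(err) / n, np.array([err.mean()])


def numerical_gradient(f, p, eps=1e-6):
    """Central finite differences of scalar f() w.r.t. array p.

    p is perturbed in place one entry at a time and restored afterwards.
    """
    grad = np.zeros_like(p)
    for idx in np.ndindex(*p.shape):
        old = p[idx]
        p[idx] = old + eps
        f_plus = f()
        p[idx] = old - eps
        f_minus = f()
        p[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad


rng = np.random.RandomState(0)
X = 2 * rng.randn(16, 5)
y = rng.randint(0, 2, size=(16, 1)).astype(float)
W = 0.01 * rng.randn(5, 1)
b = np.zeros(1)

gW, gb = analytic_gradients(X, y, W, b)
nW = numerical_gradient(lambda: bce_loss(X, y, W, b), W)
nb = numerical_gradient(lambda: bce_loss(X, y, W, b), b)
print(np.max(np.abs(gW - nW)), np.max(np.abs(gb - nb)))  # both should be tiny
```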

We can turn a backend *expression* into a callable function with `K.function`. We pass a list of inputs (note: it has to be a list, even if the function has only one input!), the output expression to compute, and, optionally, a list of `(variable, new_value)` updates that are applied **after** the outputs have been computed on each call.

```python
train_fn = K.function([x, y], loss, updates=updates)
```

With the Theano backend, this step converts the graph into C++ and CUDA code and compiles it. TensorFlow does something slightly different under the hood, but the workflow in Keras is exactly the same.
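To spell out what one call to `train_fn` does, here is a NumPy sketch of a single full-batch gradient descent step. It reflects our reading of the Theano semantics: the returned loss is computed with the parameter values *before* the update, and all new values are computed from the old ones before any parameter is overwritten. The function `train_step` is a hypothetical stand-in, not part of Keras.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_step(X, y, params, lr=0.1):
    """One full-batch gradient descent step on logistic regression."""
    W, b = params
    y_hat = sigmoid(X.dot(W) + b)
    loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

    err = (y_hat - y) / X.shape[0]           # d(loss)/d(pre-activation)
    grads = [X.T.dot(err), err.sum(axis=0)]  # gradients for W and b

    # Build all (parameter, new value) pairs first, then apply them in place
    updates = [(p, p - lr * g) for p, g in zip(params, grads)]
    for p, new_p in updates:
        p[...] = new_p
    return loss  # loss of the parameters *before* this update


rng = np.random.RandomState(1)
X = 2 * rng.randn(16, 5)
y = rng.randint(0, 2, size=(16, 1)).astype(float)
W, b = 0.01 * rng.randn(5, 1), np.zeros(1)

losses = [train_step(X, y, [W, b]) for _ in range(100)]
print('first loss: {:.4f}, last loss: {:.4f}'.format(losses[0], losses[-1]))
```

Computing every new value before assigning any of them matters once parameters interact: updating `W` first and then computing the update for `b` from the *new* `W` would be a different algorithm.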

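Now we generate a random dataset with 16 examples and use it to train our model. Because the dataset is so small, plain (full-batch) gradient descent, which computes the gradient over all examples at every step, does the trick here. Deep learning datasets are almost never this small, so in practice the gradient is estimated on small random mini-batches instead (stochastic gradient descent).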
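For larger datasets, a common pattern is to shuffle the examples once per epoch and feed them to the training function in mini-batches. The generator below is a minimal NumPy sketch of that pattern; `iterate_minibatches` is a hypothetical helper, not a Keras function.

```python
import numpy as np


def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs covering the dataset once (one epoch)."""
    indices = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], y[batch]


rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = rng.randint(0, 2, size=(100, 1))

sizes = [len(xb) for xb, yb in iterate_minibatches(X, y, batch_size=32, rng=rng)]
print(sizes)  # [32, 32, 32, 4]
```

Assuming a backend function like `train_fn` in this notebook, one epoch of mini-batch training would then look like `for xb, yb in iterate_minibatches(X, y, 32, rng): train_fn([xb, yb])`. Note that the last batch may be smaller than `batch_size`.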

```python
# Generating dummy dataset
X_batch = 2*np.random.randn(16, 5)
y_batch = np.random.randint(0, 2, size=(16,1))
```

```python
for iteration in range(1000):
    iter_loss = train_fn([X_batch, y_batch])
    if iteration % 100 == 0:
        print('Iteration {} loss: {}'.format(iteration, iter_loss))
```

```
Iteration 0 loss: 0.6992310881614685
Iteration 100 loss: 0.43966320157051086
Iteration 200 loss: 0.41002073884010315
Iteration 300 loss: 0.39073270559310913
Iteration 400 loss: 0.37663042545318604
Iteration 500 loss: 0.3657509684562683
Iteration 600 loss: 0.3570306897163391
Iteration 700 loss: 0.3498280942440033
Iteration 800 loss: 0.34373539686203003
Iteration 900 loss: 0.3384825587272644
```


```python
print('Final weights: {}'.format(W.get_value()))
```

```
Final weights: [[ 0.95124745]
 [-2.10494328]
 [ 2.35118389]
 [-0.78173244]
 [-0.08810341]]
```


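```python
# Compile a prediction function: input batch -> predicted probabilities
predict = K.function([x], y_hat)
```

```python
predictions = predict([X_batch])
```

```python
# Threshold the probabilities at 0.5 and compare them with the labels
accuracy = (y_batch == (predictions > 0.5).astype('int')).mean()*100
```

```python
print(accuracy)
```

```
81.25
```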
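Two remarks on this accuracy. First, it is measured on the same 16 examples the model was trained on (13 of 16 correct), and those labels are random, so it is a training accuracy that says nothing about generalization. Second, the comparison only works because `y_batch` and `predictions` both have shape `(n, 1)`. The small NumPy sketch below, using made-up probabilities, shows the computation and the broadcasting pitfall that appears when one side is 1-D.

```python
import numpy as np

labels = np.array([[1], [0], [1], [0]])         # shape (4, 1), like y_batch
probs = np.array([[0.9], [0.2], [0.4], [0.6]])  # shape (4, 1), like predictions

pred_classes = (probs > 0.5).astype(int)        # threshold at 0.5 -> [1, 0, 0, 1]
accuracy = (labels == pred_classes).mean() * 100
print(accuracy)  # 50.0

# Pitfall: with 1-D labels of shape (4,), == broadcasts against (4, 1)
# into a (4, 4) matrix, and the resulting "accuracy" is silently wrong.
flat_labels = labels.ravel()
print((flat_labels == pred_classes).shape)  # (4, 4)
```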
