# `keras`

> Keras is a high-level neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation.



The core data structure of Keras is a model, a way to organize layers. The main type of model is the ``Sequential model``, a linear stack of layers. 

```Python
from keras.models import Sequential
model = Sequential()
```

Stacking layers is as easy as ``.add()``:

```Python
from keras.layers import Dense, Activation

model.add(Dense(output_dim=64, input_dim=100))
model.add(Activation("relu"))
model.add(Dense(output_dim=10))
model.add(Activation("softmax"))
```
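
To make the stack above less abstract, here is a minimal plain-Python sketch of what a `Dense` -> `relu` -> `Dense` -> `softmax` pipeline computes for a single sample. The helper names, the toy sizes (2 inputs, 3 hidden units, 2 classes) and the weight values are all made up for illustration; this is not how Keras implements its layers.

```python
import math


def dense(x, weights, bias):
    """Affine map: weights has one row per input and one column per output unit."""
    return [sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
            for j in range(len(bias))]


def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Exponentiate and normalise so the outputs sum to 1."""
    shift = max(values)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy parameters: 2 inputs -> 3 hidden units -> 2 classes
x = [1.0, 2.0]
W1 = [[0.5, -1.0, 0.0],
      [0.25, 0.5, 1.0]]
b1 = [0.0, 0.0, 0.5]
W2 = [[1.0, 0.0],
      [0.0, 1.0],
      [1.0, -1.0]]
b2 = [0.0, 0.0]

hidden = relu(dense(x, W1, b1))          # first Dense + relu
probs = softmax(dense(hidden, W2, b2))   # second Dense + softmax
print(hidden, probs)
```

The final list can be read as class probabilities, which is why a softmax output pairs naturally with a cross-entropy loss.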

Once your model looks good, configure its learning process with ``.compile()``:

```Python
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
```
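
As a rough illustration of what the `categorical_crossentropy` loss measures, the sketch below computes it by hand for one-hot targets and predicted probabilities. It is a plain-Python approximation, not the library's code; the function signature and the `eps` clipping value are assumptions.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    losses = []
    for target, probs in zip(y_true, y_pred):
        # clip probabilities away from 0 so log() is always defined
        losses.append(-sum(t * math.log(max(p, eps))
                           for t, p in zip(target, probs)))
    return sum(losses) / len(losses)


y_true = [[1, 0, 0], [0, 1, 0]]                    # one-hot targets
y_pred = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]    # predicted probabilities
loss = categorical_crossentropy(y_true, y_pred)
print(loss)
```

Only the probability assigned to the correct class contributes, so the loss shrinks as that probability approaches 1.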

If you need to, you can further configure your optimizer.

```Python
from keras.optimizers import SGD
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True))
```
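
The `lr`, `momentum` and `nesterov` arguments can be read as knobs of the update rule sketched below for a single scalar parameter. This follows the usual textbook form of SGD with (Nesterov) momentum; that it matches the library's internals step for step is an assumption, and `sgd_step` is a hypothetical helper.

```python
def sgd_step(param, grad, velocity, lr=0.01, momentum=0.9, nesterov=True):
    """One SGD update; returns (new_param, new_velocity)."""
    velocity = momentum * velocity - lr * grad
    if nesterov:
        # look ahead along the freshly updated velocity
        param = param + momentum * velocity - lr * grad
    else:
        param = param + velocity
    return param, velocity


# Minimise f(p) = p**2, whose gradient is 2 * p
p, v = 1.0, 0.0
for _ in range(200):
    p, v = sgd_step(p, 2 * p, v, lr=0.01, momentum=0.9, nesterov=True)
print(p)
```

The velocity term accumulates past gradients, which is what lets momentum methods move faster than plain gradient descent along consistent directions.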

You can now iterate on your training data in batches:

```Python
model.fit(X_train, Y_train, nb_epoch=5, batch_size=32)
```
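
To make "in batches" concrete, here is a minimal sketch of how a data set would be cut into consecutive batches of `batch_size` samples. The helper `batch_slices` is hypothetical, and the sketch deliberately ignores shuffling and everything else `fit` does.

```python
def batch_slices(n_samples, batch_size):
    """Return (start, end) index pairs that cover range(n_samples) in order."""
    return [(start, min(start + batch_size, n_samples))
            for start in range(0, n_samples, batch_size)]


slices = batch_slices(100, 32)
print(slices)
```

Note that the last batch is smaller whenever the number of samples is not a multiple of the batch size.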

Evaluate your performance in one line:
```Python
loss_and_metrics = model.evaluate(X_test, Y_test, batch_size=32)
```

Or generate predictions on new data:

```Python
classes = model.predict_classes(X_test, batch_size=32)
proba = model.predict_proba(X_test, batch_size=32)
```

## Example: MNIST MLP

```Python
'''
Trains a simple deep NN on the MNIST dataset.
You can get to 98.40% test accuracy after 20 epochs.
'''

from __future__ import print_function
import tensorflow as tf
import numpy as np

tf.reset_default_graph()

np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import RMSprop
from keras.utils import np_utils

batch_size = 128
nb_classes = 10
nb_epoch = 10

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))

# print model characteristics
model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

history = model.fit(X_train,
                    Y_train,
                    batch_size=batch_size,
                    nb_epoch=nb_epoch,
                    verbose=1,
                    validation_data=(X_test, Y_test))

score = model.evaluate(X_test, Y_test, verbose=0)

print('\n')
print('Test score:', score[0])
print('Test accuracy:', score[1])
```

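The "convert class vectors to binary class matrices" step in the example above is a one-hot encoding. A minimal plain-Python sketch of the idea follows; `to_one_hot` is a hypothetical stand-in and uses lists instead of NumPy arrays to stay dependency-free.

```python
def to_one_hot(labels, nb_classes):
    """Turn integer class labels into rows with a single 1.0 at the label index."""
    rows = []
    for label in labels:
        row = [0.0] * nb_classes
        row[label] = 1.0
        rows.append(row)
    return rows


encoded = to_one_hot([2, 0, 1], 3)
print(encoded)
```

Each row now has the same length as the softmax output, which is the layout `categorical_crossentropy` expects.
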

## ``keras`` Sequential model

The ``Sequential`` model is a linear stack of layers.

You can create a ``Sequential`` model by passing a list of layer instances to the constructor:

```python
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(32, input_dim=784),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])
```

You can also simply add layers via the ``.add()`` method:

```python
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))
...
```
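
The two construction styles are equivalent because a linear stack is just function composition. The toy class below is a hypothetical stand-in (not the Keras class) that shows the idea with plain callables as "layers".

```python
class TinySequential(object):
    """Hypothetical linear stack: applies its layers one after another."""

    def __init__(self, layers=None):
        self.layers = list(layers) if layers is not None else []

    def add(self, layer):
        self.layers.append(layer)

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


def double(values):
    return [2 * v for v in values]


def add_one(values):
    return [v + 1 for v in values]


# Same stack built from a list and built incrementally
from_list = TinySequential([double, add_one])
from_add = TinySequential()
from_add.add(double)
from_add.add(add_one)
print(from_list([1, 2]), from_add([1, 2]))
```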

### The ``Merge`` layer

Multiple ``Sequential`` instances can be merged into a single output via a ``Merge`` layer. The output is a layer that can be added as the first layer in a new ``Sequential`` model. For instance, here's a model with two separate input branches getting merged:

```python
from keras.layers import Merge

left_branch = Sequential()
left_branch.add(Dense(32, input_dim=784))

right_branch = Sequential()
right_branch.add(Dense(32, input_dim=784))

merged = Merge([left_branch, right_branch], mode='concat')

final_model = Sequential()
final_model.add(merged)
final_model.add(Dense(10, activation='softmax'))
```

![alt text](images/merge.png)

Such a two-branch model can then be trained via e.g.:

```python
final_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
final_model.fit([input_data_1, input_data_2], targets)  # we pass one data array per model input
```

The ``Merge`` layer supports a number of pre-defined modes:

+ ``sum`` (default): element-wise sum
+ ``concat``: tensor concatenation. You can specify the concatenation axis via the argument ``concat_axis``.
+ ``mul``: element-wise multiplication
+ ``ave``: tensor average
+ ``dot``: dot product. You can specify which axes to reduce along via the argument ``dot_axes``.
+ ``cos``: cosine proximity between vectors in 2D tensors.
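
As a hedged illustration of the listed modes, the sketch below applies each one to a pair of plain vectors. The real layer operates on batches of tensors and supports axis arguments; the `merge` helper here is hypothetical and only mirrors the arithmetic for two equal-length lists.

```python
import math


def merge(a, b, mode="sum"):
    """Combine two equal-length vectors according to a named mode."""
    if mode == "sum":
        return [x + y for x, y in zip(a, b)]
    if mode == "concat":
        return a + b
    if mode == "mul":
        return [x * y for x, y in zip(a, b)]
    if mode == "ave":
        return [(x + y) / 2.0 for x, y in zip(a, b)]
    if mode == "dot":
        return [sum(x * y for x, y in zip(a, b))]
    if mode == "cos":
        dot = sum(x * y for x, y in zip(a, b))
        norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return [dot / norms]
    raise ValueError("unknown mode: %r" % (mode,))


left, right = [1.0, 2.0], [3.0, 4.0]
for name in ("sum", "concat", "mul", "ave", "dot", "cos"):
    print(name, merge(left, right, name))
```

Only ``concat`` changes the length of the result in this sketch; ``dot`` and ``cos`` reduce the pair to a single number.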

You can also pass a function as the mode argument, allowing for arbitrary transformations:

```python
merged = Merge([left_branch, right_branch], mode=lambda x: x[0] - x[1])
``` 

### Compilation

Before training a model, you need to configure the learning process, which is done via the ``compile`` method. It receives three arguments:

+ an optimizer. This could be the string identifier of an existing optimizer (such as ``rmsprop`` or ``adagrad``), or an instance of the ``Optimizer`` class.
+ a loss function. This is the objective that the model will try to minimize. It can be the string identifier of an existing loss function (such as ``categorical_crossentropy`` or ``mse``), or it can be an objective function.
+ a list of metrics. For any classification problem you will want to set this to ``metrics=['accuracy']``. A metric could be the string identifier of an existing metric or a custom metric function. A custom metric function should return either a single tensor value or a dict ``metric_name`` -> ``metric_value``.

```python
# for a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# for a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# for a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

```
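
For intuition about the ``accuracy`` metric requested above, here is a plain-Python sketch of categorical accuracy: the fraction of samples whose highest predicted probability falls on the true class. The function is a hypothetical illustration working on lists, not the metric Keras evaluates on tensors.

```python
def categorical_accuracy(y_true, y_pred):
    """Fraction of rows where the arg-max of the prediction matches the one-hot target."""
    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)

    hits = sum(1 for target, probs in zip(y_true, y_pred)
               if argmax(target) == argmax(probs))
    return hits / float(len(y_true))


y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3], [0.2, 0.5, 0.3]]
accuracy = categorical_accuracy(y_true, y_pred)
print(accuracy)
```

Unlike the loss, this number ignores how confident each prediction is, which is why it is reported as a metric rather than optimized directly.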

### Training

``Keras`` models are trained on NumPy arrays of input data and labels. For training a model, you will typically use the ``fit`` function.

For a single-input model with 2 classes (binary):

```python
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(1, input_dim=784, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# generate dummy data
import numpy as np
data = np.random.random((1000, 784))
labels = np.random.randint(2, size=(1000, 1))


# train the model, iterating on the data in batches
# of 32 samples
model.fit(data, labels, nb_epoch=10, batch_size=32)

model.summary()
```

```
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
dense_4 (Dense)                  (None, 1)             785         dense_input_2[0][0]              
Total params: 785
____________________________________________________________________________________________________
```


For a multi-input model with 10 classes:


```python
from keras.layers import Merge

left_branch = Sequential()
left_branch.add(Dense(32, input_dim=784))

right_branch = Sequential()
right_branch.add(Dense(32, input_dim=784))

merged = Merge([left_branch, right_branch], mode='concat')

model = Sequential()
model.add(merged)
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# generate dummy data
import numpy as np
from keras.utils.np_utils import to_categorical
data_1 = np.random.random((1000, 784))
data_2 = np.random.random((1000, 784))

# these are integers between 0 and 9
labels = np.random.randint(10, size=(1000, 1))
# we convert the labels to a binary matrix of size (1000, 10)
# for use with categorical_crossentropy
labels = to_categorical(labels, 10)

# train the model
# note that we are passing a list of Numpy arrays as training data
# since the model has 2 inputs
model.fit([data_1, data_2], labels, nb_epoch=10, batch_size=32)

model.summary()
```

```
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
dense_5 (Dense)                  (None, 32)            25120                                        
____________________________________________________________________________________________________
dense_6 (Dense)                  (None, 32)            25120                                        
____________________________________________________________________________________________________
dense_7 (Dense)                  (None, 10)            650         merge_1[0][0]                    
Total params: 50890
____________________________________________________________________________________________________
```

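The `Param #` columns in the two summaries above can be checked by hand: a `Dense` layer has one weight per (input, unit) pair plus one bias per unit. The short sketch below redoes that arithmetic; `dense_params` is a hypothetical helper introduced only for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Single-input binary model: Dense(1, input_dim=784)
single_input_total = dense_params(784, 1)

# Two-branch model: two Dense(32, input_dim=784) branches, concatenated
# into 64 features, followed by Dense(10)
branch = dense_params(784, 32)
head = dense_params(32 + 32, 10)
merged_total = 2 * branch + head

print(single_input_total, branch, head, merged_total)
```

The concatenation itself adds no parameters; it only widens the input of the final layer from 32 to 64 features.
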

## Keras functional API

The Keras functional API is the way to go for defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers.

As a first example, consider a small densely-connected network. The ``Sequential`` model is probably a better choice to implement such a network, but it helps to start with something really simple.

Using the ``Model`` class:

+ A layer instance is callable (on a tensor), and it returns a tensor
+ ``Input`` tensor(s) and output tensor(s) can then be used to define a ``Model``
+ Such a model can be trained just like Keras Sequential models.

```python
from keras.layers import Input, Dense
from keras.models import Model

# this returns a tensor
inputs = Input(shape=(784,))

# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

# this creates a model that includes
# the Input layer and three Dense layers
model = Model(input=inputs, output=predictions)

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
              
model.fit(data, labels)  # starts training
```

All models are callable, just like layers.

With the functional API, it is easy to re-use trained models: you can treat any model as if it were a layer, by calling it on a tensor. Note that by calling a model you aren't just re-using the architecture of the model, you are also re-using its weights.

```python
x = Input(shape=(784,))
# this works, and returns the 10-way softmax we defined above.
y = model(x)
```
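
The point about re-using weights can be illustrated without Keras: a callable object that holds its own parameter behaves the same at every call site, so updating the parameter changes all of them at once. The `Scale` class below is a hypothetical one-weight "layer" written only for this illustration.

```python
class Scale(object):
    """Hypothetical one-weight layer: multiplies every input element by `w`."""

    def __init__(self, w):
        self.w = w

    def __call__(self, x):
        return [self.w * value for value in x]


shared = Scale(2.0)

# Two different inputs go through the SAME instance, hence the same weight
out_a = shared([1.0, 2.0])
out_b = shared([3.0])

# Changing the weight (as a training step would) affects every call site
shared.w = 10.0
out_c = shared([1.0])
print(out_a, out_b, out_c)
```

This is the same mechanism a Siamese network relies on: one instance, called on several inputs, gives several branches with identical weights.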

### Siamese MLP on pairs of digits from the MNIST dataset

(Source: https://github.com/fchollet/keras/blob/master/examples/mnist_siamese_graph.py)

Siamese networks are commonly used in image comparison applications such as face or signature verification. They can also be used in language processing, time series analysis, etc.

In a typical Siamese network a large part of the network is duplicated at the base to allow multiple inputs to go through identical layers. 

This example shows how to teach a neural network to map an image from the MNIST dataset to a 2D point, while trying to minimize the distance between points of the same class and maximize the distance between points of different classes.

A Siamese network architecture is a way of learning how to embed samples into a lower-dimensional space, based on a similarity computed from features learned by a feature network.

The feature network is the architecture we intend to fine-tune in this setting. 

Let's suppose we want to embed images. Given two images $X_1$ and $X_2$, we feed them into the feature network $G_W$ and compute the corresponding feature vectors $G_W(X_1)$ and $G_W(X_2)$. The final layer computes the pairwise distance between the computed features, $E_W = || G_W(X_1) - G_W(X_2) ||_{1}$, and the final loss layer $L$ considers whether these two images are from the same class (label $1$) or not (label $0$).

![alt text](images/siamese1.png)

The original [paper](http://yann.lecun.com/exdb/publis/pdf/chopra-05.pdf) proposed the **Contrastive Loss Function**:

$$ L(W,(Y,X_1,X_2)^i) = (1 - Y) \times L_S(E_W(X_1,X_2)^i) + Y \times L_D(E_W(X_1,X_2)^i) $$

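where $L_S$ is the partial loss function for a "same-class" pair and $L_D$ is the partial loss function for a "different-class" pair. Note that this formula follows the paper's convention, in which $Y = 0$ marks a "same-class" pair and $Y = 1$ a "different-class" pair; this is the opposite of the labels used in the code below, where $1$ means "same class".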

$L_S$ and $L_D$ should be designed in such a way that the minimization of $L$ will decrease the distance in the embedding space of "same-class" pairs and increase it in the case of "different-class" pairs:

$$ L_S = \frac{1}{2} E_W^2 $$
$$ L_D = \frac{1}{2} \{ \mbox{max }(0,1-E_W) \}^2 $$
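
The formulas above can be evaluated by hand for a single pair. The sketch below uses the paper's convention from the equation ($Y = 0$ for a "same-class" pair, $Y = 1$ otherwise); the function name and the `margin` parameter (fixed to 1 in the formula for $L_D$) are assumptions made for illustration.

```python
def contrastive_loss_pair(y, e_w, margin=1.0):
    """Loss for one pair: y = 0 for a same-class pair, y = 1 for a different-class pair."""
    l_s = 0.5 * e_w ** 2                       # pulls same-class pairs together
    l_d = 0.5 * max(0.0, margin - e_w) ** 2    # pushes different-class pairs apart
    return (1 - y) * l_s + y * l_d


print(contrastive_loss_pair(0, 0.5))   # same class, small distance
print(contrastive_loss_pair(0, 2.0))   # same class, large distance
print(contrastive_loss_pair(1, 0.5))   # different class, inside the margin
print(contrastive_loss_pair(1, 2.0))   # different class, beyond the margin
```

A different-class pair stops contributing once its distance exceeds the margin, so the loss does not push such pairs apart indefinitely.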


```python
'''Train a Siamese MLP on pairs of digits from the MNIST dataset.
It follows Hadsell-et-al.'06 [1] by computing the Euclidean distance on the
output of the shared network and by optimizing the contrastive loss (see paper
for more details).
[1] "Dimensionality Reduction by Learning an Invariant Mapping"
    http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
Gets to 99.5% test accuracy after 20 epochs.
3 seconds per epoch on a Titan X GPU
'''

from __future__ import absolute_import
from __future__ import print_function
import numpy as np
import tensorflow as tf

np.random.seed(1337)  # for reproducibility

tf.reset_default_graph()

import random
from keras.datasets import mnist
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, Input, Lambda
from keras.optimizers import RMSprop
from keras import backend as K


def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.sum(K.square(x - y), axis=1, keepdims=True))


def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)


def contrastive_loss(y_true, y_pred):
    '''Contrastive loss from Hadsell-et-al.'06
    http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
    Here y_true = 1 marks a same-class pair and y_pred is the distance.
    '''
    margin = 1
    return K.mean(y_true * K.square(y_pred) + (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))


def create_pairs(x, digit_indices):
    '''Positive and negative pair creation.
    Alternates between positive and negative pairs.
    '''
    pairs = []
    labels = []
    n = min([len(digit_indices[d]) for d in range(10)]) - 1
    for d in range(10):
        for i in range(n):
            # positive pair: two consecutive samples of the same digit
            z1, z2 = digit_indices[d][i], digit_indices[d][i + 1]
            pairs += [[x[z1], x[z2]]]
            # negative pair: same sample paired with a different digit
            inc = random.randrange(1, 10)
            dn = (d + inc) % 10
            z1, z2 = digit_indices[d][i], digit_indices[dn][i]
            pairs += [[x[z1], x[z2]]]
            labels += [1, 0]
    return np.array(pairs), np.array(labels)


def create_base_network(input_dim):
    '''Base network to be shared (eq. to feature extraction).
    '''
    seq = Sequential()
    seq.add(Dense(128, input_shape=(input_dim,), activation='relu'))
    # seq.add(Dropout(0.1))
    seq.add(Dense(128, activation='relu'))
    # seq.add(Dropout(0.1))
    seq.add(Dense(128, activation='relu'))
    return seq


def compute_accuracy(predictions, labels):
    '''Compute classification accuracy with a fixed threshold on distances.
    Note: this averages the labels of the pairs predicted as "same"
    (distance < 0.5), so only those pairs are scored.
    '''
    return labels[predictions.ravel() < 0.5].mean()


# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
input_dim = 784
nb_epoch = 10

# create training+test positive and negative pairs
digit_indices = [np.where(y_train == i)[0] for i in range(10)]
tr_pairs, tr_y = create_pairs(X_train, digit_indices)

digit_indices = [np.where(y_test == i)[0] for i in range(10)]
te_pairs, te_y = create_pairs(X_test, digit_indices)

# network definition
base_network = create_base_network(input_dim)

input_a = Input(shape=(input_dim,))
input_b = Input(shape=(input_dim,))

# because we re-use the same instance `base_network`,
# the weights of the network
# will be shared across the two branches
processed_a = base_network(input_a)
processed_b = base_network(input_b)

distance = Lambda(euclidean_distance, output_shape=eucl_dist_output_shape)([processed_a, processed_b])

model = Model(input=[input_a, input_b], output=distance)

# train
rms = RMSprop()
model.compile(loss=contrastive_loss, optimizer=rms)
model.fit([tr_pairs[:, 0], tr_pairs[:, 1]], tr_y,
          validation_data=([te_pairs[:, 0], te_pairs[:, 1]], te_y),
          batch_size=128,
          nb_epoch=nb_epoch)

# compute final accuracy on training and test sets
pred = model.predict([tr_pairs[:, 0], tr_pairs[:, 1]])
tr_acc = compute_accuracy(pred, tr_y)
pred = model.predict([te_pairs[:, 0], te_pairs[:, 1]])
te_acc = compute_accuracy(pred, te_y)

print('* Accuracy on training set: %0.2f%%' % (100 * tr_acc))
print('* Accuracy on test set: %0.2f%%' % (100 * te_acc))
```

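The `compute_accuracy` helper above averages the labels of the pairs whose distance falls below the threshold, so pairs predicted as "different" never enter the score. As a possible alternative, the sketch below scores every pair; `pair_accuracy` and the toy distances are hypothetical, and label `1` means "same class" as in the example.

```python
def pair_accuracy(distances, labels, threshold=0.5):
    """Fraction of pairs classified correctly; a pair is predicted 'same' if distance < threshold."""
    correct = 0
    for dist, label in zip(distances, labels):
        predicted_same = 1 if dist < threshold else 0
        if predicted_same == label:
            correct += 1
    return correct / float(len(labels))


# Toy distances and labels (1 = same class, 0 = different class)
distances = [0.1, 0.9, 0.2, 0.7]
labels = [1, 0, 1, 1]
acc = pair_accuracy(distances, labels)
print(acc)
```

On this toy data the last pair is a same-class pair with a large distance; it counts as an error here, whereas a score restricted to pairs predicted as "same" would not see it at all.
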

![alt text](images/siameseresult.png)