# Keras and Natural Language Processing

Sebastian Sierra

## Outline
* What is Keras?
  * Installing Keras
* Models in Keras: Sequential vs Graph
* Recurrent layers in Keras
  * LSTM and bidirectional LSTM
  * GRU
* Creating new layers in Keras


## What is Keras?

Keras is a deep learning library that runs on top of Theano or TensorFlow. It is built upon four guiding principles:
* Modularity.
  *  Neural layers, cost functions, optimizers, initialization schemes, activation functions, regularization schemes are all standalone modules.
* Minimalism.
* Easy extensibility.
* Work with Python.

Keras is suited for easy and fast prototyping. It supports both **convolutional neural networks** and **recurrent neural networks**, as well as combinations of the two. It also enables multi-input and multi-output training, and it runs on either GPU or CPU.

**Further documentation** can be found in the [Keras Docs](http://keras.io/).

### Installing Keras

Keras requirements are:
* numpy, scipy
* pyyaml
* HDF5 and h5py
* cuDNN (optional, recommended when using CNNs)

In this case we are going to work with **Theano** as the backend, so the latest version of **Theano** should be installed:
```bash
sudo pip install git+git://github.com/Theano/Theano.git
```
Finally, install the latest version of Keras with pip:
```bash
sudo pip install keras
```
Then we check that the installed version is recent enough (>0.3):

In [None]:
import pkg_resources
pkg_resources.get_distribution("keras").version
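
The version string returned above still has to be compared against the minimum we need. A minimal sketch of such a comparison follows, assuming plain dotted version strings; `parse_version` and `is_at_least` are hypothetical helper names, not part of Keras or `pkg_resources`.

```python
import itertools


def parse_version(version):
    """Turn a dotted version string such as '0.3.1' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits of each piece (e.g. '1rc2' -> 1).
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def is_at_least(version, minimum):
    """Return True when `version` is not older than `minimum`."""
    return parse_version(version) >= parse_version(minimum)


print(is_at_least("0.3.1", "0.3"))  # True
print(is_at_least("0.2.0", "0.3"))  # False
```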

## Models in Keras: Sequential vs Graph

Models are the main structure in Keras. There are two kinds of models: the Sequential model and the Graph model. A Sequential model is a linear stack of layers, arranged in the exact order in which they were added. A Graph model is determined by its nodes and the connections between them, which allows multiple inputs, multiple outputs and branching.

Sequential models can be easily created:
```python
from keras.models import Sequential
model = Sequential()
```
Then we can add each layer. In this short example we create a network with an Embedding layer as the input layer, followed by an LSTM, a Dropout layer, a Dense layer (a standard fully connected layer) and finally an Activation layer that applies a sigmoid function.
```python
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM

model.add(Embedding(input_dim, output_dim, input_length=maxlen))
model.add(LSTM(output_dim))
model.add(Dropout(prob))
model.add(Dense(1))
model.add(Activation('sigmoid'))
```
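
To make the "sequence of layers" idea concrete, here is a minimal sketch in plain NumPy. It is not Keras code: `TinySequential` and the layer functions are hypothetical stand-ins, and a simple average over time replaces the LSTM. It only illustrates that each layer consumes the output of the previous one, in insertion order.

```python
import numpy as np


class TinySequential(object):
    """Toy stand-in for a sequential model: applies layers in insertion order."""

    def __init__(self):
        self.layers = []

    def add(self, layer):
        self.layers.append(layer)

    def predict(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


rng = np.random.RandomState(0)
vocab_size, embed_dim = 10, 4
table = rng.normal(size=(vocab_size, embed_dim))   # one row per word index
weights = rng.normal(size=(embed_dim, 1))


def embedding(word_ids):
    # (nb_samples, timesteps) ints -> (nb_samples, timesteps, embed_dim)
    return table[word_ids]


def mean_over_time(x):
    # Crude placeholder for a recurrent layer: collapses the time axis.
    return x.mean(axis=1)


def dense(x):
    return x.dot(weights)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


model = TinySequential()
for layer in (embedding, mean_over_time, dense, sigmoid):
    model.add(layer)

batch = np.array([[1, 2, 3], [4, 5, 6]])   # two samples, three word indices each
probs = model.predict(batch)
print(probs.shape)  # (2, 1)
```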

One example of Keras' easy extensibility is that we can define our own functions, for instance a custom activation:
```python
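import theano.tensor as T

def tanh(x):
    # Custom activation: element-wise hyperbolic tangent
    return T.tanh(x)

# The function can be passed directly as a layer's activation...
model.add(Dense(64, activation=tanh))
# ...or used in a standalone Activation layer
model.add(Activation(tanh))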
```

On the other hand, Graph models can be created as follows:
```python
from keras.models import Graph
model = Graph()
```
In this case we define a bidirectional LSTM for a classification problem. Note that here we must first define the input; *maxlen* is the length of the input sequences that the network accepts. The construction of a bidirectional LSTM is discussed in more detail later on. The end of the network specification is very similar to that of the previous network.
```python
model.add_input(name='input', input_shape=(maxlen,), dtype=int)
model.add_node(Embedding(input_dim, output_dim, input_length=maxlen),
               name='embedding', input='input')
model.add_node(LSTM(output_dim), name='forward', input='embedding')
model.add_node(LSTM(output_dim, go_backwards=True), name='backward', input='embedding')
model.add_node(Dropout(prob), name='dropout', inputs=['forward', 'backward'])
model.add_node(Dense(1, activation='sigmoid'), name='sigmoid', input='dropout')
model.add_output(name='output', input='sigmoid')
```

## Recurrent Layers in Keras

Recurrent layers are implemented in Keras, which supports LSTM, GRU and SimpleRNN layers. Each of them can be imported easily:
```python
from keras.layers.recurrent import LSTM, GRU, SimpleRNN
```
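The input of a recurrent layer is a 3D tensor with shape **(nb_samples, timesteps, input_dim)**. With `return_sequences=True` the output is a 3D tensor with shape **(nb_samples, timesteps, output_dim)**; otherwise only the last output is returned, as a 2D tensor with shape **(nb_samples, output_dim)**.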
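
The shapes involved can be checked with a minimal NumPy sketch of a plain tanh recurrence. This is an illustration, not the Keras implementation: `simple_rnn` and its weight arguments are hypothetical, and the `return_sequences` flag is only meant to mirror the Keras option of the same name.

```python
import numpy as np


def simple_rnn(x, w_in, w_rec, return_sequences=False):
    """Run a tanh recurrence over x of shape (nb_samples, timesteps, input_dim)."""
    nb_samples, timesteps, _ = x.shape
    output_dim = w_rec.shape[0]
    h = np.zeros((nb_samples, output_dim))      # initial state
    outputs = []
    for t in range(timesteps):
        h = np.tanh(x[:, t, :].dot(w_in) + h.dot(w_rec))
        outputs.append(h)
    if return_sequences:
        # One output per timestep: (nb_samples, timesteps, output_dim)
        return np.stack(outputs, axis=1)
    # Only the last output: (nb_samples, output_dim)
    return h


rng = np.random.RandomState(0)
x = rng.normal(size=(2, 5, 3))                  # 2 samples, 5 timesteps, 3 features
w_in = rng.normal(size=(3, 4))
w_rec = rng.normal(size=(4, 4))

seq_out = simple_rnn(x, w_in, w_rec, return_sequences=True)
last_out = simple_rnn(x, w_in, w_rec)
print(seq_out.shape)   # (2, 5, 4)
print(last_out.shape)  # (2, 4)
```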

By default, Keras resets the state of a recurrent layer after each batch. In some cases we would like to enable statefulness, so that the final state of each sample in one batch is used as the initial state of the corresponding sample in the next batch. This is done by specifying `stateful=True` in the layer constructor.
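
The effect of carrying state across batches can be sketched in NumPy, again with a plain tanh recurrence standing in for the real layer. `run_batch` is a hypothetical helper; the point is only that feeding the final state of one chunk into the next is equivalent to processing the whole sequence at once.

```python
import numpy as np


def run_batch(x, w_in, w_rec, initial_state):
    """Process one chunk of timesteps and return the final hidden state."""
    h = initial_state
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :].dot(w_in) + h.dot(w_rec))
    return h


rng = np.random.RandomState(0)
w_in = rng.normal(size=(3, 4))
w_rec = rng.normal(size=(4, 4))
first_half = rng.normal(size=(2, 5, 3))
second_half = rng.normal(size=(2, 5, 3))
zeros = np.zeros((2, 4))

# Stateless behaviour: the second chunk starts again from a zero state.
stateless = run_batch(second_half, w_in, w_rec, zeros)

# Stateful behaviour: the second chunk starts from the state left by the first.
carried = run_batch(first_half, w_in, w_rec, zeros)
stateful = run_batch(second_half, w_in, w_rec, carried)

# Reference: the two chunks processed as one long sequence.
full_sequence = np.concatenate([first_half, second_half], axis=1)
full = run_batch(full_sequence, w_in, w_rec, zeros)

print(np.allclose(stateful, full))  # True
```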

We are going to see how an RNN can be used in a text classification task, and compare the performance of three basic structures: LSTM, GRU and bidirectional LSTM. First, however, we have to get our data ready for Keras. Keras has a module with some standard datasets; in our case we will work with the sentiment analysis task on the IMDB reviews dataset.

In [1]:
from __future__ import absolute_import
from __future__ import print_function
import numpy as np
np.random.seed(1337) # for reproducibility

from keras.preprocessing import sequence
from keras.optimizers import SGD, RMSprop, Adagrad
from keras.utils import np_utils
from keras.utils.np_utils import accuracy
from keras.models import Sequential, Graph
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM, GRU
from keras.datasets import imdb

Using Theano backend.


Using gpu device 0: Graphics Device (CNMeM is disabled)


We have to define how many of the most frequent words the Embedding layer will consider; this number is *max_features*. Then we define the maximum length of the input sequence, *maxlen*.

In [2]:
max_features = 20000
maxlen = 100 # cut texts after this number of words (among top max_features most common words)
batch_size = 32
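
As an illustration of what restricting the vocabulary means, the sketch below caps word indices at `max_features`. It assumes that words are encoded as frequency ranks and that out-of-range words are replaced by a placeholder index; `cap_vocabulary` and the value `oov_index=2` are assumptions made for this example, so check the dataset loader for its actual behaviour.

```python
def cap_vocabulary(sequences, max_features, oov_index=2):
    """Replace word indices >= max_features with a placeholder index."""
    return [[w if w < max_features else oov_index for w in seq]
            for seq in sequences]


# Two toy "reviews" encoded as word ranks (smaller rank = more frequent word).
reviews = [[1, 14, 22, 30000], [1, 5, 19999, 20000]]
capped = cap_vocabulary(reviews, max_features=20000)
print(capped)  # [[1, 14, 22, 2], [1, 5, 19999, 2]]
```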

Then we load the IMDB data, specifying the fraction held out for testing.

In [3]:
(X_train, y_train), (X_test, y_test) = imdb.load_data(nb_words=max_features, test_split=0.2)
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')

20000 train sequences
5000 test sequences


Then the sequences are padded (where the length is less than 100) or truncated (where it is greater), so that all of them have length *maxlen*:

In [4]:
print("Pad sequences (samples x time)")
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

Pad sequences (samples x time)
X_train shape: (20000, 100)
X_test shape: (5000, 100)
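
What the padding step does to each sequence can be sketched in pure Python. The helper `pad_left` is hypothetical and assumes the default behaviour of padding with zeros at the beginning and truncating from the beginning, so that every sequence ends up with exactly `maxlen` entries.

```python
def pad_left(sequences, maxlen, value=0):
    """Pad short sequences at the front and keep only the last maxlen items."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                       # drop the oldest items
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


padded = pad_left([[5, 6], [1, 2, 3, 4, 5, 6]], maxlen=4)
print(padded)  # [[0, 0, 5, 6], [3, 4, 5, 6]]
```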


Then we build the model as we have previously done.

In [5]:
model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(LSTM(128))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

With the *compile* function we set the objective function, the optimizer and the class mode used for evaluation. The following objectives are available:
* mean_squared_error / mse
* root_mean_squared_error / rmse
* mean_absolute_error / mae
* mean_absolute_percentage_error / mape
* mean_squared_logarithmic_error / msle
* squared_hinge
* hinge
* binary_crossentropy: Also known as logloss.
* categorical_crossentropy: multiclass logloss
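
Since `binary_crossentropy` is the objective used in the rest of this section, here is a minimal pure-Python sketch of the quantity it measures. This is the textbook log loss averaged over samples, not the Keras source; the clipping constant `eps` is an assumption added to avoid `log(0)`.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Average log loss between 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)        # keep log() finite
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(confident_right < confident_wrong)  # True
```

Predictions that are confident and wrong are penalised much more heavily than predictions that are confident and right, which is why this loss suits a sigmoid output.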

As for optimizers, Keras provides the following:
* SGD
* RMSprop
* Adagrad
* Adadelta
* Adam

In [6]:
model.compile(loss='binary_crossentropy', optimizer='adam', class_mode="binary")

Finally, we can use the *fit* function (in scikit-learn fashion) to train the model. The *evaluate* function reports the performance of the model on the test set.

In [7]:
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=4, validation_data=(X_test, y_test), show_accuracy=True)
score, acc = model.evaluate(X_test, y_test, batch_size=batch_size, show_accuracy=True)
print('Test score:', score)
print('Test accuracy:', acc)

Train on 20000 samples, validate on 5000 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4
Test score: 0.507146091509
Test accuracy: 0.8168


We can easily use a GRU instead of an LSTM. Most of the code is the same as before.

In [8]:
model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(GRU(128))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', class_mode="binary")

print("----------------------")
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=4, validation_data=(X_test, y_test), show_accuracy=True)
score, acc = model.evaluate(X_test, y_test, batch_size=batch_size, show_accuracy=True)
print('Test score:', score)
print('Test accuracy:', acc)

----------------------
Train on 20000 samples, validate on 5000 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4
Test score: 0.503725951719
Test accuracy: 0.8306


In the following case we will use a slightly more complicated structure: a bidirectional LSTM, built using a Graph model. The key is to declare two LSTM layers fed by the same input, one of which is set to process the sequence backwards (`go_backwards=True`). Unfortunately, the documentation for this functionality is not clear.

In [10]:
model = Graph()
model.add_input(name='input', input_shape=(maxlen,), dtype=int)
model.add_node(Embedding(max_features, 128, input_length=maxlen),
               name='embedding', input='input')
model.add_node(LSTM(64), name='forward', input='embedding')
model.add_node(LSTM(64, go_backwards=True), name='backward', input='embedding')
model.add_node(Dropout(0.5), name='dropout', inputs=['forward', 'backward'])
model.add_node(Dense(1, activation='sigmoid'), name='sigmoid', input='dropout')
model.add_output(name='output', input='sigmoid')
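
The forward/backward idea can be sketched in NumPy. The example below is not the Keras implementation: a plain tanh recurrence stands in for each LSTM, `last_state` is a hypothetical helper, and the two branches are assumed to be merged by concatenation before the dropout node.

```python
import numpy as np


def last_state(x, w_in, w_rec):
    """Final hidden state of a tanh recurrence over (nb_samples, timesteps, input_dim)."""
    h = np.zeros((x.shape[0], w_rec.shape[0]))
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :].dot(w_in) + h.dot(w_rec))
    return h


rng = np.random.RandomState(0)
x = rng.normal(size=(2, 5, 3))                        # embedded input sequences
fw_in, fw_rec = rng.normal(size=(3, 4)), rng.normal(size=(4, 4))
bw_in, bw_rec = rng.normal(size=(3, 4)), rng.normal(size=(4, 4))

forward = last_state(x, fw_in, fw_rec)                # reads first -> last timestep
backward = last_state(x[:, ::-1, :], bw_in, bw_rec)   # reads last -> first timestep

# Each branch has its own weights; their outputs are joined feature-wise.
merged = np.concatenate([forward, backward], axis=-1)
print(merged.shape)  # (2, 8)
```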

This time, instead of using the *evaluate* function, we will evaluate the model manually:

In [11]:
model.compile('adam', {'output': 'binary_crossentropy'})

print('--------------------')
model.fit({'input': X_train, 'output': y_train}, batch_size=batch_size, nb_epoch=4)
acc = accuracy(y_test, np.round(np.array(model.predict({'input': X_test},
                                               batch_size=batch_size)['output'])))
print('Test accuracy:', acc)

--------------------
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4
Test accuracy: 0.8354
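
The manual evaluation above amounts to rounding each predicted probability to 0 or 1 and counting matches with the true labels. A minimal pure-Python sketch of that computation follows; `rounded_accuracy` is a hypothetical helper, and the strict `> 0.5` threshold is an assumption about how ties should be handled.

```python
def rounded_accuracy(y_true, probabilities):
    """Fraction of samples whose thresholded prediction equals the label."""
    # A probability above 0.5 counts as class 1, anything else as class 0.
    predictions = [1 if p > 0.5 else 0 for p in probabilities]
    correct = sum(1 for p, t in zip(predictions, y_true) if p == t)
    return correct / float(len(y_true))


acc_example = rounded_accuracy([1, 0, 1, 1], [0.9, 0.2, 0.4, 0.7])
print(acc_example)  # 0.75
```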


## Creating new layers in Keras

TODO

## Using Keras and gensim to solve Semantic Similarity task

TODO

## About us
<img src="https://sites.google.com/a/unal.edu.co/mindlab/_/rsrc/1353286903227/config/customLogo.gif?revision=10" />