# Sequence learning with Keras

In recent years, sequence modelling has attracted considerably more attention in Deep Learning. In this chapter, we will delve into the exciting topic of recurrent neural networks. We will build upon the POS-chunk data which we loaded in the previous chapter. Importantly, we will demonstrate that a keras model (like any Theano or TensorFlow graph) can have multiple inputs and outputs. We will show, for instance, that it is perfectly possible to train a model that **simultaneously** learns to POS tag and chunk. As always, we first set up our imports:

In [1]:
from __future__ import print_function

import codecs
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils

import numpy as np

Using Theano backend.


We load the CONLL train data again, but this time, we also load the chunk labels:

In [5]:
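def load_data(path):
    data = []
    # Each non-empty line holds "token pos chunk", separated by whitespace.
    with codecs.open(path, 'r', 'utf8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # blank lines separate sentences
            try:
                token, pos, chunk = line.split()
            except ValueError:
                continue  # skip malformed lines without exactly 3 fields
            data.append((token, pos, chunk))
    return data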
        
train_data = load_data('data/seq/train.txt')

print(len(train_data))
for i in train_data[:10]:
    print(' - '.join(i))
    
train_tokens, train_pos, train_chunk = zip(*train_data)

211727
Confidence - NN - B-NP
in - IN - B-PP
the - DT - B-NP
pound - NN - I-NP
is - VBZ - B-VP
widely - RB - I-VP
expected - VBN - I-VP
to - TO - I-VP
take - VB - I-VP
another - DT - B-NP


Let us start with the POS labels, which we encode as before:

In [6]:
tag_encoder = LabelEncoder()
tag_encoder.fit(train_pos)
print('Total nb POS tags:', len(tag_encoder.classes_))

y_train_pos = tag_encoder.transform(train_pos)

Y_train_pos = np_utils.to_categorical(y_train_pos,
                                  nb_classes=len(tag_encoder.classes_))

Total nb POS tags: 44
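
To make the encoding step concrete, here is a minimal NumPy sketch of what one-hot encoding does to integer labels. It is only an illustration of the idea, not the keras implementation; the helper `one_hot` and the toy labels are hypothetical.

```python
import numpy as np

def one_hot(labels, nb_classes):
    """Turn integer class labels into one-hot rows (illustrative stand-in)."""
    labels = np.asarray(labels, dtype='int64')
    encoded = np.zeros((len(labels), nb_classes), dtype='float32')
    # Put a single 1.0 in each row, at the column given by the label.
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

toy_labels = [2, 0, 1]  # hypothetical integer-encoded tags
toy_one_hot = one_hot(toy_labels, nb_classes=3)
print(toy_one_hot)
```

Each row has exactly one non-zero entry, which is the format that the `categorical_crossentropy` loss used below expects as a target.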


Likewise, we vectorize our training instances as before:

In [9]:
from collections import Counter
vocab = Counter(train_tokens)
indexer = {'<unk>': 0}  # index 0 is reserved for padding and unknown tokens

for k, v in vocab.most_common():
    indexer[k] = len(indexer)

nb_left, nb_right = 2, 1

def vectorize(tokens):
    sequences = []
    for curr_idx, token in enumerate(tokens):
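        # Clamp the start at 0: a negative start index would wrap around to the end of the list.
        left_context = tokens[max(0, curr_idx - nb_left) : curr_idx]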
        while len(left_context) < nb_left:
            left_context = ['<unk>'] + left_context

        right_context = tokens[curr_idx + 1 : curr_idx + 1 + nb_right]
        while len(right_context) < nb_right:
            right_context += ['<unk>']

        seq = left_context + [token] + right_context

        ints = [indexer[t] if t in indexer else 0 for t in seq]

        sequences.append(ints)
    
    # int32, not int8: vocabulary indices go far beyond the int8 range of -128..127.
    return np.array(sequences, dtype='int32')

X_train = vectorize(list(train_tokens))
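
The windowing idea can be checked on a toy sentence. The sketch below is a stand-alone illustration, assuming the same window sizes (two tokens to the left, one to the right) and the same `'<unk>'` padding symbol; the helper `context_window` is hypothetical and not part of the notebook.

```python
def context_window(tokens, idx, nb_left=2, nb_right=1, pad='<unk>'):
    """Return the padded window of tokens around position idx."""
    # Clamp the slice start at 0 so it cannot wrap around to the end of the list.
    left = tokens[max(0, idx - nb_left):idx]
    left = [pad] * (nb_left - len(left)) + left
    right = tokens[idx + 1:idx + 1 + nb_right]
    right = right + [pad] * (nb_right - len(right))
    return left + [tokens[idx]] + right

toy_tokens = ['Confidence', 'in', 'the', 'pound']
toy_windows = [context_window(toy_tokens, i) for i in range(len(toy_tokens))]
for window in toy_windows:
    print(window)
```

Every window has `nb_left + 1 + nb_right` = 4 items, with padding at the edges of the token list, which is why the models below take inputs of length 4.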

Let's start building our model:

In [11]:
from keras.models import Sequential
from keras.layers import Embedding, Dense, Activation

from keras.layers.recurrent import LSTM

model = Sequential()
model.add(Embedding(input_dim=len(indexer), output_dim=150,
                    input_length=nb_left + 1 + nb_right))
model.add(LSTM(100, return_sequences=False, activation='tanh'))
model.add(Dense(len(tag_encoder.classes_)))
model.add(Activation('softmax'))

Interestingly, if you compare this model to the one we had before, you see that only a single line has changed: instead of collapsing our 4 embedding vectors (`nb_left + 1 + nb_right`) into a single, flat vector, we now have a **recurrent layer** loop over the embeddings and produce a single vector representation of the sequence at the end (hence `return_sequences=False`). The recurrent layer we use is a [Long Short-Term Memory](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) (LSTM) layer: such a layer will loop through our embedding vectors and model them *as a sequence* from left to right. While the exact workings of an LSTM are well beyond the scope of this tutorial, the main advantage is that such a layer can remember information from previous timesteps for a long time and have this information affect the way it processes the vectors it sees along the way. Let us test the model:

In [None]:
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, Y_train_pos, batch_size=10, nb_epoch=10)

Epoch 1/10
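
For readers who want to see the recurrent loop described above spelled out, here is a minimal NumPy sketch of a textbook LSTM that walks over 4 embedding vectors and keeps only the last hidden state, mirroring `return_sequences=False`. It is an assumption-laden illustration, not the keras implementation: the gate ordering, the plain sigmoid and the random weights are choices made for this sketch, and all names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_state(inputs, W, U, b, nb_units):
    """Run a textbook LSTM over inputs (timesteps, input_dim); return the last hidden state."""
    h = np.zeros(nb_units)  # hidden state
    c = np.zeros(nb_units)  # cell state, the layer's "memory"
    for x_t in inputs:      # one embedding vector per timestep, left to right
        z = x_t.dot(W) + h.dot(U) + b  # pre-activations of all four gates at once
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
        g = np.tanh(g)                                # candidate memory content
        c = f * c + i * g   # forget part of the old memory, write new content
        h = o * np.tanh(c)  # expose part of the memory as output
    return h

rng = np.random.RandomState(0)
timesteps, input_dim, nb_units = 4, 150, 100
W = rng.normal(scale=0.1, size=(input_dim, 4 * nb_units))
U = rng.normal(scale=0.1, size=(nb_units, 4 * nb_units))
b = np.zeros(4 * nb_units)

toy_embeddings = rng.normal(size=(timesteps, input_dim))
last_state = lstm_last_state(toy_embeddings, W, U, b, nb_units)
nb_params = W.size + U.size + b.size
print(last_state.shape, nb_params)
```

With 150-dimensional inputs and 100 units, this sketch has 4 × (150 × 100 + 100 × 100 + 100) = 100400 weights, the same number that `model.summary()` reports as `Param #` for each LSTM layer further down.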

## The functional API

So far we have been working with keras's 'vanilla' model, i.e. the simple `Sequential` model, where we simply stack a series of layers on top of each other. There are many cases, however, where we would like to have a more flexible way of constructing a layer graph. Below, we will first work with a simple, yet highly relevant example: we will combine a left-to-right LSTM with a right-to-left LSTM. To this end, we will make use of keras's extremely powerful `Model`, which is part of its so-called 'functional' API. When working with the `Model`, we first need to tell keras what our input data will look like. In our case, each input is a sequence of `nb_left + 1 + nb_right` integer token indices:

In [325]:
from keras.layers import Input
context_input = Input(shape=(nb_left + 1 + nb_right,), dtype='int32')

In [326]:
embedding = Embedding(input_dim=len(indexer), output_dim=150)(context_input)

In [327]:
left_to_right = LSTM(100, return_sequences=False, activation='tanh')(embedding)
right_to_left = LSTM(100, return_sequences=False, activation='tanh', go_backwards=True)(embedding)
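
The effect of `go_backwards=True` is that the layer reads the same sequence in reversed order. The toy recurrence below is not an LSTM; it is a deliberately simple, hypothetical stand-in that only shows why reading direction matters for the final state.

```python
import numpy as np

def simple_rnn_last_state(inputs, decay=0.5):
    """Toy order-sensitive recurrence: h_t = decay * h_{t-1} + x_t."""
    h = np.zeros(inputs.shape[1])
    for x_t in inputs:
        h = decay * h + x_t
    return h

toy_sequence = np.array([[1.0], [2.0], [3.0], [4.0]])  # 4 timesteps, 1 feature

forward_state = simple_rnn_last_state(toy_sequence)         # reads 1, 2, 3, 4
backward_state = simple_rnn_last_state(toy_sequence[::-1])  # reads 4, 3, 2, 1
print(forward_state, backward_state)
```

The two final states differ because the most recently read items weigh most heavily, so a left-to-right and a right-to-left layer summarize the same window from complementary perspectives.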

In [328]:
from keras.layers import merge
merged = merge([left_to_right, right_to_left], mode='sum')

In [329]:
output = Dense(len(tag_encoder.classes_), activation='softmax')(merged)

In [330]:
from keras.models import Model
model = Model(input=context_input, output=output)

In [331]:
model.summary()

____________________________________________________________________________________________________
Layer (type)                       Output Shape        Param #     Connected to                     
input_29 (InputLayer)              (None, 4)           0                                            
____________________________________________________________________________________________________
embedding_54 (Embedding)           (None, 4, 150)      2868450     input_29[0][0]                   
____________________________________________________________________________________________________
lstm_59 (LSTM)                     (None, 100)         100400      embedding_54[0][0]               
____________________________________________________________________________________________________
lstm_60 (LSTM)                     (None, 100)         100400      embedding_54[0][0]               
___________________________________________________________________________________________

In [332]:
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

In [333]:
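# X_test and Y_test_pos are not built in this chapter; they are assumed to have been
# prepared from the test split in the same way as X_train and Y_train_pos.
model.fit(X_train, Y_train_pos, batch_size=10, nb_epoch=10,
          shuffle=True, validation_data=(X_test, Y_test_pos))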

Train on 211727 samples, validate on 47377 samples
Epoch 1/10
  4310/211727 [..............................] - ETA: 564s - loss: 2.6950 - acc: 0.2631

KeyboardInterrupt: 

## Multiple inputs and outputs

### Multiple inputs

In [336]:
X_train_focus = [indexer[focus] if focus in indexer else 0 for focus in train_tokens]
print(len(X_train_focus))

211727


In [338]:
X_train_focus = np.array(X_train_focus, dtype='int32')
print(X_train_focus.shape)

(211727,)


In [339]:
context_input = Input(shape=(4,), dtype='int32', name='context')
focus_input = Input(shape=(1,), dtype='int32', name='focus')

In [340]:
context_embedding = Embedding(input_dim=len(indexer), output_dim=150)(context_input)
focus_embedding = Embedding(input_dim=len(indexer), output_dim=150)(focus_input)

In [341]:
left_to_right = LSTM(100, return_sequences=False, activation='tanh')(context_embedding)
right_to_left = LSTM(100, return_sequences=False, activation='tanh', go_backwards=True)(context_embedding)
merged1 = merge([left_to_right, right_to_left], mode='sum')

In [342]:
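from keras.layers import Flatten
# Despite its name, flat_context holds the flattened embedding of the focus token.
flat_context = Flatten()(focus_embedding)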

In [343]:
merged2 = merge([merged1, flat_context], mode='concat')
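
The two merge modes used so far differ in what they do to the shapes. The NumPy sketch below mimics them on dummy arrays, assuming a batch of 3 instances, 100 LSTM units and a 150-dimensional focus embedding; the array names are hypothetical and the code does not call keras.

```python
import numpy as np

batch_size = 3
left_to_right_out = np.ones((batch_size, 100))       # stand-in for one LSTM output
right_to_left_out = np.full((batch_size, 100), 2.0)  # stand-in for the other LSTM output
focus_vector = np.zeros((batch_size, 150))           # stand-in for the flattened focus embedding

# Like mode='sum': element-wise addition, so both inputs need the same shape.
summed = left_to_right_out + right_to_left_out

# Like mode='concat': the vectors are glued together along the last axis.
concatenated = np.concatenate([summed, focus_vector], axis=-1)

print(summed.shape, concatenated.shape)
```

Summing keeps the dimensionality at 100, whereas concatenating yields 100 + 150 = 250 features per instance for the dense output layer to work with.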

In [344]:
pos_output = Dense(len(tag_encoder.classes_), activation='softmax', name='pos')(merged2)

In [345]:
from keras.models import Model
model = Model(input=[context_input, focus_input], output=pos_output)

In [346]:
print(X_train.shape)
print(X_train_focus.shape)
print(Y_train_pos.shape)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit({'context':X_train,
           'focus': X_train_focus},
          {'pos': Y_train_pos},
          batch_size=10, nb_epoch=10, shuffle=True)

(211727, 4)
(211727,)
(211727, 44)
Epoch 1/10
  3310/211727 [..............................] - ETA: 1042s - loss: 2.4935 - acc: 0.3906

KeyboardInterrupt: 

### Multiple outputs

In [347]:
chunk_encoder = LabelEncoder()
chunk_encoder.fit(train_chunk)
print('Total nb chunk labels:', len(chunk_encoder.classes_))

y_train_chunk = chunk_encoder.transform(train_chunk)

Y_train_chunk = np_utils.to_categorical(y_train_chunk,
                                  nb_classes=len(chunk_encoder.classes_))

Total nb chunk labels: 22


In [348]:
context_input = Input(shape=(4,), dtype='int32', name='context')
focus_input = Input(shape=(1,), dtype='int32', name='focus')

context_embedding = Embedding(input_dim=len(indexer), output_dim=150)(context_input)
left_to_right = LSTM(100, return_sequences=False, activation='tanh')(context_embedding)
right_to_left = LSTM(100, return_sequences=False, activation='tanh', go_backwards=True)(context_embedding)
merged1 = merge([left_to_right, right_to_left], mode='sum')

focus_embedding = Embedding(input_dim=len(indexer), output_dim=150)(focus_input)
flat_context = Flatten()(focus_embedding)

merged2 = merge([merged1, flat_context], mode='concat')

pos_output = Dense(len(tag_encoder.classes_), activation='softmax', name='pos')(merged2)
chunk_output = Dense(len(chunk_encoder.classes_), activation='softmax', name='chunk')(merged2)

model = Model(input=[context_input, focus_input], output=[pos_output, chunk_output])

In [349]:
print(X_train.shape)
print(X_train_focus.shape)
print(Y_train_pos.shape)
print(Y_train_chunk.shape)


model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit({'context': X_train,
           'focus': X_train_focus},
          {'pos': Y_train_pos,
           'chunk': Y_train_chunk},
          batch_size=100, nb_epoch=10, shuffle=True)

(211727, 4)
(211727,)
(211727, 44)
(211727, 22)
Epoch 1/10
  4290/211727 [..............................] - ETA: 948s - loss: 3.8640 - pos_loss: 2.4032 - chunk_loss: 1.4609 - pos_acc: 0.3786 - chunk_acc: 0.5343

KeyboardInterrupt: 
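
The progress line above reports a `loss` next to `pos_loss` and `chunk_loss`. With default loss weights, the total is the sum of the per-output losses (2.4032 + 1.4609 ≈ 3.8640, up to rounding). The sketch below reproduces this bookkeeping in NumPy on made-up predictions; the toy arrays and the helper function are hypothetical.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over a batch of one-hot targets."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Toy batch of two tokens with 3 POS classes and 2 chunk classes (made-up numbers).
pos_true = np.array([[1, 0, 0], [0, 1, 0]], dtype='float64')
pos_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
chunk_true = np.array([[1, 0], [0, 1]], dtype='float64')
chunk_pred = np.array([[0.6, 0.4], [0.3, 0.7]])

pos_loss = categorical_crossentropy(pos_true, pos_pred)
chunk_loss = categorical_crossentropy(chunk_true, chunk_pred)
total_loss = pos_loss + chunk_loss  # equal weights for both tasks

print(pos_loss, chunk_loss, total_loss)
```

Because both outputs share the layers below them, minimizing this summed loss pushes the shared embeddings and LSTMs towards representations that serve POS tagging and chunking at the same time.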

## Sequence to sequence learning