# Word-Based Neural Language Models in Python with Keras

## Introduction

This notebook follows the tutorial available at https://machinelearningmastery.com/develop-word-based-neural-language-models-python-keras/

When training and testing a language model (LM), there are several possible approaches, or "framings". For example: we provide one token as input and the model learns to predict one token as output; or we provide a sequence of n tokens as input and predict a single token as output; or we input n tokens and predict m tokens.
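
As a rough illustration of these framings, the sketch below builds (input, output) pairs from a short token list in plain Python. The helper name `make_pairs` and the one-line toy sentence are assumptions made for illustration; they are not part of the tutorial.

```python
def make_pairs(tokens, n_in, n_out=1):
    """Slide a window over tokens: n_in input tokens followed by n_out target tokens."""
    pairs = []
    for i in range(len(tokens) - n_in - n_out + 1):
        x = tokens[i:i + n_in]                    # input window
        y = tokens[i + n_in:i + n_in + n_out]     # tokens to predict
        pairs.append((x, y))
    return pairs

tokens = "jack and jill went up the hill".split()

one_in_one_out = make_pairs(tokens, n_in=1)            # one token -> one token
two_in_one_out = make_pairs(tokens, n_in=2)            # n tokens -> one token
two_in_two_out = make_pairs(tokens, n_in=2, n_out=2)   # n tokens -> m tokens

print(one_in_one_out[0])
print(two_in_one_out[0])
print(two_in_two_out[0])
```

The longer the window, the fewer pairs the same text yields, which is worth keeping in mind with a text as small as the one used here.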

We start from a simple toy text and we explore the differences between 3 possible framings.

In [7]:
data = """Jack and Jill went up the hill\n
To fetch a pail of water\n
Jack fell down and broke his crown\n
And Jill came tumbling after\n"""

print(data)

Jack and Jill went up the hill

To fetch a pail of water

Jack fell down and broke his crown

And Jill came tumbling after



We start with the simplest model, namely:

## One-word-in, one-word-out (2-grams?)

As mentioned above, in this model we provide one word as input and the model learns to predict the following word.

First of all, we need to represent our text in a form that can be fed to a neural net. We do so with the Keras Tokenizer, which converts each token to an integer, so that sequences of tokens become sequences of integers.

In [10]:
# packages
import keras
from keras.preprocessing.text import Tokenizer

In [22]:
# integer encode text
tokenizer = Tokenizer() # initialize tokenizer
tokenizer.fit_on_texts([data]) # fit on our text

print(tokenizer.word_index) # display token encoding

{'to': 8, 'the': 6, 'came': 19, 'pail': 11, 'down': 15, 'a': 10, 'jack': 2, 'jill': 3, 'and': 1, 'crown': 18, 'after': 21, 'his': 17, 'tumbling': 20, 'fell': 14, 'water': 13, 'fetch': 9, 'of': 12, 'up': 5, 'broke': 16, 'went': 4, 'hill': 7}


In [23]:
# encode the text as a sequence of integers; [0] extracts the single resulting list
encoded = tokenizer.texts_to_sequences([data])[0]

print(encoded) # display full text encoding

[2, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 2, 14, 15, 1, 16, 17, 18, 1, 3, 19, 20, 21]
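
To make the encoding less opaque, here is a plain-Python sketch of roughly what happens: words are lowercased, ranked by descending frequency (ties keep their order of first appearance), and numbered from 1. The function name `build_word_index` is an assumption for illustration, and the sketch ignores the punctuation filtering that the real Tokenizer performs.

```python
from collections import Counter

def build_word_index(text):
    """Map each lowercased word to a rank, most frequent first, starting at 1."""
    words = text.lower().split()
    counts = Counter(words)                 # remembers first-appearance order
    # sorted() is stable, so equally frequent words keep first-appearance order
    ranked = sorted(counts, key=lambda w: -counts[w])
    # index 0 is deliberately left unused
    return {word: i for i, word in enumerate(ranked, start=1)}

text = """Jack and Jill went up the hill
To fetch a pail of water
Jack fell down and broke his crown
And Jill came tumbling after"""

word_index = build_word_index(text)
encoded = [word_index[w] for w in text.lower().split()]

print(word_index)
print(encoded)
```

On this toy text the sketch gives the same indices as the tokenizer output shown above: 'and' occurs three times and gets index 1, 'jack' and 'jill' occur twice and get 2 and 3.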


Next, it is useful to know the vocabulary size: for example, we'll need it to define the size of the embedding layer in the neural net. To get it, we apply len to the word_index attribute of the tokenizer object:

In [19]:
# determine the vocabulary size
vocab_size = len(tokenizer.word_index) + 1 # +1 because word indices start at 1 and index 0 is reserved, so arrays need one extra slot
print('Vocabulary Size: %d' % vocab_size)

Vocabulary Size: 22


Next, we need to create token sequences to be used as input-output (X-y) pairs during training:

In [24]:
# create word -> word sequences
sequences = list() # initialize list
for i in range(1, len(encoded)): # for each index i...
    sequence = encoded[i-1:i+1] # pair the (i-1)th word (input) with the ith word (output)
    sequences.append(sequence)
print('Total Sequences: %d' % len(sequences))

Total Sequences: 24


In [26]:
print(sequences[0]) # for example

[2, 1]


Next, we split X from y or input from output in each pair:

In [28]:
# numpy
import numpy as np

In [29]:
# split into X and y elements
sequences = np.array(sequences)
X, y = sequences[:,0], sequences[:,1]

Next, we convert y to categorical, i.e. a one-hot encoding, because for each input word our model has to predict a probability distribution over the vocabulary, indicating which word is the most likely output. A one-hot vector is a degenerate distribution (e.g. 0,0,0,1,...,0,0,0) that provides the ground truth for the model to aim for, from which we can calculate the error and update the model.

We do this with Keras to_categorical:

In [37]:
# import module
from keras.utils import to_categorical

In [33]:
# one hot encode outputs
y = to_categorical(y, num_classes=vocab_size)
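
For intuition, a one-hot vector can be sketched in a few lines of plain Python. The helper `one_hot` is a hypothetical stand-in for illustration, not the Keras implementation; it also shows why the vocabulary size needs the extra slot: the highest word index is 21, so the vector must have 22 positions.

```python
def one_hot(index, num_classes):
    """Return a list of zeros with a single 1.0 at position `index`."""
    vector = [0.0] * num_classes
    vector[index] = 1.0
    return vector

vocab_size = 22                      # 21 distinct words + 1 for the unused index 0
target = one_hot(21, vocab_size)     # 21 is the index of 'after'

print(len(target), target.index(1.0), sum(target))
```

With only 21 classes, index 21 would not fit in the vector.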

Finally, let's define the model:

In [38]:
# import modules
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

In [39]:
# define model
model = Sequential() # initialize sequential model
model.add(Embedding(vocab_size, 10, input_length=1)) # embedding layer, vectors of length 10, input=1 i.e. one token
model.add(LSTM(50)) # 50 units of lstm
model.add(Dense(vocab_size, activation='softmax')) # dense layer+softmax for outputting probability distribution
print(model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 1, 10)             220       
_________________________________________________________________
lstm_1 (LSTM)                (None, 50)                12200     
_________________________________________________________________
dense_1 (Dense)              (None, 22)                1122      
Total params: 13,542
Trainable params: 13,542
Non-trainable params: 0
_________________________________________________________________
None
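
The parameter counts in the summary can be reproduced by hand. The arithmetic below is a sketch using the standard formulas for an embedding layer, an LSTM layer with biases, and a dense layer; the variable names are chosen here for illustration.

```python
vocab_size, embed_dim, lstm_units = 22, 10, 50

# Embedding: one vector of length embed_dim per vocabulary index
embedding_params = vocab_size * embed_dim

# LSTM: 4 gates, each with input weights, recurrent weights and a bias
lstm_params = 4 * (lstm_units * (embed_dim + lstm_units) + lstm_units)

# Dense: weight matrix plus one bias per output class
dense_params = lstm_units * vocab_size + vocab_size

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
```

Note that the input length does not appear in any of these formulas, so changing it leaves the parameter count unchanged.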


Next, we can compile and train the model, using categorical crossentropy as loss function, because technically we are facing a multi-class classification task (predict a word out of a vocabulary of a certain length):
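
As a minimal sketch of what this loss measures: with a one-hot target, categorical cross-entropy reduces to the negative log of the probability assigned to the correct word. The function below is written for illustration and is not the Keras implementation. It also suggests what to expect at the start of training, since a model that spreads probability uniformly over 22 classes has a loss of ln 22, about 3.09.

```python
import math

def categorical_crossentropy(one_hot_target, predicted_probs):
    """Cross-entropy between a one-hot target and a predicted distribution."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_target, predicted_probs) if t > 0)

vocab_size = 22
target = [0.0] * vocab_size
target[1] = 1.0                              # true next word has index 1 ('and')
uniform = [1.0 / vocab_size] * vocab_size    # an untrained model guessing uniformly

loss = categorical_crossentropy(target, uniform)
print(round(loss, 4))
```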

In [40]:
# compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# train
model.fit(X, y, epochs=500, verbose=2) # 500 epochs is certainly more than needed

Epoch 1/500
 - 1s - loss: 3.0918 - acc: 0.0000e+00
Epoch 2/500
 - 0s - loss: 3.0912 - acc: 0.0000e+00
Epoch 3/500
 - 0s - loss: 3.0905 - acc: 0.0000e+00
Epoch 4/500
 - 0s - loss: 3.0898 - acc: 0.0833
Epoch 5/500
 - 0s - loss: 3.0890 - acc: 0.0833
Epoch 6/500
 - 0s - loss: 3.0882 - acc: 0.0833
Epoch 7/500
 - 0s - loss: 3.0874 - acc: 0.1250
Epoch 8/500
 - 0s - loss: 3.0866 - acc: 0.1250
Epoch 9/500
 - 0s - loss: 3.0858 - acc: 0.1250
Epoch 10/500
 - 0s - loss: 3.0850 - acc: 0.1250
Epoch 11/500
 - 0s - loss: 3.0841 - acc: 0.1250
Epoch 12/500
 - 0s - loss: 3.0833 - acc: 0.1250
Epoch 13/500
 - 0s - loss: 3.0824 - acc: 0.1250
Epoch 14/500
 - 0s - loss: 3.0816 - acc: 0.1250
Epoch 15/500
 - 0s - loss: 3.0807 - acc: 0.1250
Epoch 16/500
 - 0s - loss: 3.0798 - acc: 0.1250
Epoch 17/500
 - 0s - loss: 3.0789 - acc: 0.1250
Epoch 18/500
 - 0s - loss: 3.0779 - acc: 0.1250
Epoch 19/500
 - 0s - loss: 3.0770 - acc: 0.1250
Epoch 20/500
 - 0s - loss: 3.0760 - [...]

Epoch 165/500
 - 0s - loss: 2.0680 - acc: 0.6667
Epoch 166/500
 - 0s - loss: 2.0557 - acc: 0.6667
Epoch 167/500
 - 0s - loss: 2.0433 - acc: 0.6667
Epoch 168/500
 - 0s - loss: 2.0310 - acc: 0.7083
Epoch 169/500
 - 0s - loss: 2.0186 - acc: 0.7083
Epoch 170/500
 - 0s - loss: 2.0062 - acc: 0.7083
Epoch 171/500
 - 0s - loss: 1.9938 - acc: 0.7083
Epoch 172/500
 - 0s - loss: 1.9813 - acc: 0.7083
Epoch 173/500
 - 0s - loss: 1.9688 - acc: 0.7083
Epoch 174/500
 - 0s - loss: 1.9563 - acc: 0.7083
Epoch 175/500
 - 0s - loss: 1.9438 - acc: 0.7083
Epoch 176/500
 - 0s - loss: 1.9313 - acc: 0.7083
Epoch 177/500
 - 0s - loss: 1.9187 - acc: 0.7083
Epoch 178/500
 - 0s - loss: 1.9062 - acc: 0.7083
Epoch 179/500
 - 0s - loss: 1.8936 - acc: 0.7083
Epoch 180/500
 - 0s - loss: 1.8810 - acc: 0.7083
Epoch 181/500
 - 0s - loss: 1.8684 - acc: 0.7083
Epoch 182/500
 - 0s - loss: 1.8558 - acc: 0.7083
Epoch 183/500
 - 0s - loss: 1.8432 - acc: 0.7083
Epoch 184/500
 - 0s - loss: 1.8306 - acc: 0.7083
Epoch 185/500
 - 0s [...]

Epoch 333/500
 - 0s - loss: 0.4963 - acc: 0.8750
Epoch 334/500
 - 0s - loss: 0.4926 - acc: 0.8750
Epoch 335/500
 - 0s - loss: 0.4889 - acc: 0.8750
Epoch 336/500
 - 0s - loss: 0.4853 - acc: 0.8750
Epoch 337/500
 - 0s - loss: 0.4817 - acc: 0.8750
Epoch 338/500
 - 0s - loss: 0.4781 - acc: 0.8750
Epoch 339/500
 - 0s - loss: 0.4746 - acc: 0.8750
Epoch 340/500
 - 0s - loss: 0.4712 - acc: 0.8750
Epoch 341/500
 - 0s - loss: 0.4678 - acc: 0.8750
Epoch 342/500
 - 0s - loss: 0.4644 - acc: 0.8750
Epoch 343/500
 - 0s - loss: 0.4611 - acc: 0.8750
Epoch 344/500
 - 0s - loss: 0.4579 - acc: 0.8750
Epoch 345/500
 - 0s - loss: 0.4547 - acc: 0.8750
Epoch 346/500
 - 0s - loss: 0.4515 - acc: 0.8750
Epoch 347/500
 - 0s - loss: 0.4483 - acc: 0.8750
Epoch 348/500
 - 0s - loss: 0.4453 - acc: 0.8750
Epoch 349/500
 - 0s - loss: 0.4422 - acc: 0.8750
Epoch 350/500
 - 0s - loss: 0.4392 - acc: 0.8750
Epoch 351/500
 - 0s - loss: 0.4362 - acc: 0.8750
Epoch 352/500
 - 0s - loss: 0.4333 - acc: 0.8750
Epoch 353/500
 - 0s [...]

 - 0s - loss: 0.2437 - acc: 0.8750


<keras.callbacks.History at 0x7f7bc8179dd8>

We can test the trained model by passing it (the encoded version of) a word and inspecting the predicted word:

In [43]:
# evaluate
in_text = 'Jack' # for example
print("Input word: {}".format(in_text))

encoded = tokenizer.texts_to_sequences([in_text])[0] # encode input
encoded = np.array(encoded) # as numpy array

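prediction = model.predict_classes(encoded, verbose=0) # inference, outputs a class index; newer Keras releases dropped predict_classes, where np.argmax(model.predict(encoded), axis=-1) is the equivalent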

for word, index in tokenizer.word_index.items(): # print the word corresponding to the predicted index
    if index == prediction:
        print("Output word: {}".format(word))

Input word: Jack
Output word: and


We can define a function to make this code reusable and apply it to any word in our vocabulary:

In [44]:
def predict_next(input_word):
    print("Input word: {}".format(input_word))

    encoded = tokenizer.texts_to_sequences([input_word])[0] # encode input
    encoded = np.array(encoded) # as numpy array

    prediction = model.predict_classes(encoded, verbose=0) # inference, will output an index

    for word, index in tokenizer.word_index.items(): # print the word corresponding to the predicted index
        if index == prediction:
            print("Output word: {}".format(word))

In [45]:
predict_next("went")

Input word: went
Output word: up


In [46]:
predict_next("fell")

Input word: fell
Output word: down


In [47]:
predict_next("Jill")

Input word: Jill
Output word: went
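
The loop over `word_index.items()` scans the whole vocabulary for every prediction. A common alternative, sketched here in plain Python, is to invert the dictionary once and look words up directly. The excerpt of the word index and the variable `predicted_index` are stand-ins for illustration.

```python
# a small excerpt of the word index printed earlier
word_index = {'and': 1, 'jack': 2, 'jill': 3, 'went': 4, 'up': 5}

# invert once: index -> word
index_word = {index: word for word, index in word_index.items()}

predicted_index = 5                  # stand-in for the model's predicted class
print(index_word[predicted_index])
```

For a 21-word vocabulary the difference is negligible, but the lookup form is shorter and scales better.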


Even better, we can write a function to generate a sequence given a seed word and the trained model:

In [51]:
# generate a sequence from the model
def generate_seq(model, tokenizer, seed_text, n_words):
    input_word, result = seed_text, seed_text
    # generate a fixed number of words
    for _ in range(n_words):
        # encode the text as integer
        encoded = tokenizer.texts_to_sequences([input_word])[0]
        encoded = np.array(encoded)
        # predict a word in the vocabulary
        predicted = model.predict_classes(encoded, verbose=0)
        # map predicted word index to word
        out_word = ''
        for word, index in tokenizer.word_index.items():
            if index == predicted:
                out_word = word
                break
        # append to input
        input_word, result = out_word, result + ' ' + out_word
    return result

In [52]:
generate_seq(model, tokenizer, "Jack", 5)

'Jack and jill went up the'

In [53]:
generate_seq(model, tokenizer, "Jack", 10)

'Jack and jill went up the hill to fetch a pail'

In [54]:
generate_seq(model, tokenizer, "fell", 10)

'fell down and jill went up the hill to fetch a'
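
The generation loop is greedy: at each step it keeps only the single most likely next word and feeds it back in. The same loop can be sketched without a neural net, using a plain table of follower counts. This count-based bigram model is a swapped-in simplification for illustration, not what the network computes.

```python
from collections import Counter, defaultdict

text = ("Jack and Jill went up the hill To fetch a pail of water "
        "Jack fell down and broke his crown And Jill came tumbling after")
tokens = text.lower().split()

# count which word follows which
followers = defaultdict(Counter)
for current, nxt in zip(tokens, tokens[1:]):
    followers[current][nxt] += 1

def generate_greedy(seed_word, n_words):
    """Repeatedly append the most frequent follower of the last word."""
    word = seed_word.lower()
    result = [word]
    for _ in range(n_words):
        if word not in followers:            # no known continuation (e.g. 'after')
            break
        counts = followers[word]
        word = max(counts, key=counts.get)   # ties: the first follower seen wins
        result.append(word)
    return ' '.join(result)

print(generate_greedy("Jack", 5))
```

Because 'and' is followed by 'jill' twice and by 'broke' only once, a greedy one-word-context generator continues 'and' with 'jill', which matches the network output starting from "fell" above.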

Cool.

Next, we can try with another framing.

## Line-by-line sequences

With this approach, each original sentence is processed so that the input X is a set of token sequences, progressively longer until the sentence is complete (the output y is the word that follows each sequence); for example:

1) X = Jack

2) X = Jack, and

3) X = Jack, and, Jill

4) ...

In [56]:
# create line-based sequences
sequences = list() # initialize empty list
for line in data.split('\n'):
    encoded = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(encoded)):
        sequence = encoded[:i+1]
        sequences.append(sequence)
print('Total Sequences: %d' % len(sequences))

Total Sequences: 21


In [57]:
print(sequences)

[[2, 1], [2, 1, 3], [2, 1, 3, 4], [2, 1, 3, 4, 5], [2, 1, 3, 4, 5, 6], [2, 1, 3, 4, 5, 6, 7], [8, 9], [8, 9, 10], [8, 9, 10, 11], [8, 9, 10, 11, 12], [8, 9, 10, 11, 12, 13], [2, 14], [2, 14, 15], [2, 14, 15, 1], [2, 14, 15, 1, 16], [2, 14, 15, 1, 16, 17], [2, 14, 15, 1, 16, 17, 18], [1, 3], [1, 3, 19], [1, 3, 19, 20], [1, 3, 19, 20, 21]]


To feed these sequences to the first (embedding) layer of the model, we need padding: every sequence is brought to the same length by filling the empty positions with a placeholder value (0 here). We use the length of the longest sequence as the fixed length.

To do so, we use pad_sequences from Keras:

In [59]:
# module
from keras.preprocessing.sequence import pad_sequences

In [60]:
# pad input sequences
max_length = max([len(seq) for seq in sequences]) # fix max length
sequences = pad_sequences(sequences, maxlen=max_length, padding='pre') # padding
print('Max Sequence Length: %d' % max_length)

Max Sequence Length: 7


In [61]:
print(sequences[0]) # for example:

[0 0 0 0 0 2 1]
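
Pre-padding can be sketched in plain Python as follows. The helper `pad_pre` is a hypothetical stand-in that mimics the behaviour used in this notebook: zeros are added on the left, and a sequence longer than `maxlen` keeps only its last `maxlen` items.

```python
def pad_pre(sequence, maxlen, value=0):
    """Left-pad with `value`; if too long, keep only the last `maxlen` items."""
    trimmed = sequence[-maxlen:]
    return [value] * (maxlen - len(trimmed)) + trimmed

print(pad_pre([2, 1], 7))
print(pad_pre([2, 1, 3, 4, 5, 6, 7, 8], 6))
```

Padding on the left keeps the most recent words next to the position being predicted, and index 0 is free for this purpose because word indices start at 1.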


Next, much like before, let's split input and output (X and y), then convert output to categorical:

In [62]:
# split into input and output elements
sequences = np.array(sequences) # as numpy array
X, y = sequences[:,:-1], sequences[:,-1] # split
y = to_categorical(y, num_classes=vocab_size) # output to categorical

Next, we define the model. The only difference from the previous version is the input length of the first layer:

In [64]:
# define model
model = Sequential()
model.add(Embedding(vocab_size, 10, input_length=max_length-1)) # it's not a single word anymore, but full seq
model.add(LSTM(50))
model.add(Dense(vocab_size, activation='softmax'))
print(model.summary())

# compile network
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# fit network
model.fit(X, y, epochs=500, verbose=2)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 6, 10)             220       
_________________________________________________________________
lstm_2 (LSTM)                (None, 50)                12200     
_________________________________________________________________
dense_2 (Dense)              (None, 22)                1122      
Total params: 13,542
Trainable params: 13,542
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/500
 - 2s - loss: 3.0909 - acc: 0.0476
Epoch 2/500
 - 0s - loss: 3.0898 - acc: 0.0952
Epoch 3/500
 - 0s - loss: 3.0886 - acc: 0.1429
Epoch 4/500
 - 0s - loss: 3.0873 - acc: 0.1429
Epoch 5/500
 - 0s - loss: 3.0858 - acc: 0.1429
Epoch 6/500
 - 0s - loss: 3.0844 - acc: 0.1429
Epoch 7/500
 - 0s - loss: 3.0830 - acc: 0.1429
Epoch 8/500
 - 0s - loss: 3.0815 - acc: 0.1429
Epoch 9/500
 - 0s - [...]

Epoch 156/500
 - 0s - loss: 0.7239 - acc: 0.8571
Epoch 157/500
 - 0s - loss: 0.7181 - acc: 0.8571
Epoch 158/500
 - 0s - loss: 0.7126 - acc: 0.8571
Epoch 159/500
 - 0s - loss: 0.7073 - acc: 0.8571
Epoch 160/500
 - 0s - loss: 0.7021 - acc: 0.8571
Epoch 161/500
 - 0s - loss: 0.6967 - acc: 0.8571
Epoch 162/500
 - 0s - loss: 0.6914 - acc: 0.8571
Epoch 163/500
 - 0s - loss: 0.6864 - acc: 0.8571
Epoch 164/500
 - 0s - loss: 0.6815 - acc: 0.8571
Epoch 165/500
 - 0s - loss: 0.6766 - acc: 0.8571
Epoch 166/500
 - 0s - loss: 0.6717 - acc: 0.8571
Epoch 167/500
 - 0s - loss: 0.6669 - acc: 0.8571
Epoch 168/500
 - 0s - loss: 0.6621 - acc: 0.8571
Epoch 169/500
 - 0s - loss: 0.6575 - acc: 0.8571
Epoch 170/500
 - 0s - loss: 0.6530 - acc: 0.8571
Epoch 171/500
 - 0s - loss: 0.6487 - acc: 0.8571
Epoch 172/500
 - 0s - loss: 0.6446 - acc: 0.8571
Epoch 173/500
 - 0s - loss: 0.6405 - acc: 0.8571
Epoch 174/500
 - 0s - loss: 0.6360 - acc: 0.8571
Epoch 175/500
 - 0s - loss: 0.6314 - acc: 0.8571
Epoch 176/500
 - 0s [...]

 - 0s - loss: 0.2663 - acc: 0.9524
Epoch 324/500
 - 0s - loss: 0.2645 - acc: 0.9524
Epoch 325/500
 - 0s - loss: 0.2629 - acc: 0.9524
Epoch 326/500
 - 0s - loss: 0.2613 - acc: 0.9524
Epoch 327/500
 - 0s - loss: 0.2596 - acc: 0.9524
Epoch 328/500
 - 0s - loss: 0.2578 - acc: 0.9524
Epoch 329/500
 - 0s - loss: 0.2562 - acc: 0.9524
Epoch 330/500
 - 0s - loss: 0.2546 - acc: 0.9524
Epoch 331/500
 - 0s - loss: 0.2530 - acc: 0.9524
Epoch 332/500
 - 0s - loss: 0.2513 - acc: 0.9524
Epoch 333/500
 - 0s - loss: 0.2497 - acc: 0.9524
Epoch 334/500
 - 0s - loss: 0.2482 - acc: 0.9524
Epoch 335/500
 - 0s - loss: 0.2466 - acc: 0.9524
Epoch 336/500
 - 0s - loss: 0.2449 - acc: 0.9524
Epoch 337/500
 - 0s - loss: 0.2435 - acc: 0.9524
Epoch 338/500
 - 0s - loss: 0.2420 - acc: 0.9524
Epoch 339/500
 - 0s - loss: 0.2404 - acc: 0.9524
Epoch 340/500
 - 0s - loss: 0.2388 - acc: 0.9524
Epoch 341/500
 - 0s - loss: 0.2374 - acc: 0.9524
Epoch 342/500
 - 0s - loss: 0.2359 - acc: 0.9524
Epoch 343/500
 - 0s - loss: 0.2344 [...]

Epoch 491/500
 - 0s - loss: 0.1200 - acc: 0.9524
Epoch 492/500
 - 0s - loss: 0.1196 - acc: 0.9524
Epoch 493/500
 - 0s - loss: 0.1193 - acc: 0.9524
Epoch 494/500
 - 0s - loss: 0.1190 - acc: 0.9524
Epoch 495/500
 - 0s - loss: 0.1186 - acc: 0.9524
Epoch 496/500
 - 0s - loss: 0.1183 - acc: 0.9524
Epoch 497/500
 - 0s - loss: 0.1180 - acc: 0.9524
Epoch 498/500
 - 0s - loss: 0.1177 - acc: 0.9524
Epoch 499/500
 - 0s - loss: 0.1173 - acc: 0.9524
Epoch 500/500
 - 0s - loss: 0.1170 - acc: 0.9524


<keras.callbacks.History at 0x7f7bc44f3da0>

As before, we can test our model by defining a generate_seq function:

In [69]:
# generate a sequence from the model
def generate_seq(model, tokenizer, max_length, seed_text, n_words):
    input_text = seed_text
    # generate a fixed number of words
    for _ in range(n_words):
        # encode the text as integer
        encoded = tokenizer.texts_to_sequences([input_text])[0]
        # pre-pad sequences to a fixed length
        encoded = pad_sequences([encoded], maxlen=max_length, padding='pre')
        # predict the index of the most likely next word
        predicted = model.predict_classes(encoded, verbose=0)
        # map predicted word index to word
        out_word = ''
        for word, index in tokenizer.word_index.items(): # convert encoded output into actual word
            if index == predicted:
                out_word = word
                break
        # append to input
        input_text += ' ' + out_word
    return input_text

In [70]:
generate_seq(model, tokenizer, 6, "Jack", 2)

'Jack fell down'

In [71]:
generate_seq(model, tokenizer, 6, "Jack", 5)

'Jack fell down and broke his'

In [72]:
generate_seq(model, tokenizer, 6, "Jack and", 5)

'Jack and jill went up the hill'

In [73]:
generate_seq(model, tokenizer, 6, "and Jill", 5)

'and Jill came tumbling after water water'

In [74]:
generate_seq(model, tokenizer, 6, "water", 5)

'water fell and jill went the'

Cool. 

Finally, one more approach.

## Two-words-in, one-word-out (3-grams?)

The title is pretty self-explanatory: the input will be sequences of two tokens, the output one token. The biggest difference from the previous approaches lies in how the sequences are defined:

In [77]:
# tokenize as before
encoded = tokenizer.texts_to_sequences([data])[0]

# encode 2 words -> 1 word
sequences = list()
for i in range(2, len(encoded)):
    sequence = encoded[i-2:i+1]
    sequences.append(sequence)

In [78]:
print(sequences)

[[2, 1, 3], [1, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7], [6, 7, 8], [7, 8, 9], [8, 9, 10], [9, 10, 11], [10, 11, 12], [11, 12, 13], [12, 13, 2], [13, 2, 14], [2, 14, 15], [14, 15, 1], [15, 1, 16], [1, 16, 17], [16, 17, 18], [17, 18, 1], [18, 1, 3], [1, 3, 19], [3, 19, 20], [19, 20, 21]]


In [83]:
# split into input and output, then output as categorical
sequences = np.array(sequences)
X, y = sequences[:,:-1], sequences[:,-1]
y = to_categorical(y, num_classes=vocab_size)

# define model
model = Sequential()
model.add(Embedding(vocab_size, 10, input_length=2))
model.add(LSTM(50))
model.add(Dense(vocab_size, activation='softmax'))
print(model.summary())

# compile network
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# fit network
model.fit(X, y, epochs=500, verbose=2)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_6 (Embedding)      (None, 2, 10)             220       
_________________________________________________________________
lstm_6 (LSTM)                (None, 50)                12200     
_________________________________________________________________
dense_6 (Dense)              (None, 22)                1122      
Total params: 13,542
Trainable params: 13,542
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/500
 - 2s - loss: 3.0913 - acc: 0.0000e+00
Epoch 2/500
 - 0s - loss: 3.0906 - acc: 0.0000e+00
Epoch 3/500
 - 0s - loss: 3.0898 - acc: 0.0870
Epoch 4/500
 - 0s - loss: 3.0890 - acc: 0.1304
Epoch 5/500
 - 0s - loss: 3.0881 - acc: 0.1739
Epoch 6/500
 - 0s - loss: 3.0872 - acc: 0.1739
Epoch 7/500
 - 0s - loss: 3.0863 - acc: 0.1739
Epoch 8/500
 - 0s - loss: 3.0854 - acc: 0.1739
Epoch 9/500
 [...]

Epoch 156/500
 - 0s - loss: 1.3350 - acc: 0.9130
Epoch 157/500
 - 0s - loss: 1.3160 - acc: 0.9130
Epoch 158/500
 - 0s - loss: 1.2970 - acc: 0.9130
Epoch 159/500
 - 0s - loss: 1.2780 - acc: 0.9565
Epoch 160/500
 - 0s - loss: 1.2590 - acc: 0.9565
Epoch 161/500
 - 0s - loss: 1.2400 - acc: 0.9565
Epoch 162/500
 - 0s - loss: 1.2211 - acc: 0.9565
Epoch 163/500
 - 0s - loss: 1.2021 - acc: 0.9565
Epoch 164/500
 - 0s - loss: 1.1832 - acc: 0.9565
Epoch 165/500
 - 0s - loss: 1.1643 - acc: 0.9565
Epoch 166/500
 - 0s - loss: 1.1454 - acc: 0.9565
Epoch 167/500
 - 0s - loss: 1.1265 - acc: 0.9565
Epoch 168/500
 - 0s - loss: 1.1077 - acc: 0.9565
Epoch 169/500
 - 0s - loss: 1.0889 - acc: 0.9565
Epoch 170/500
 - 0s - loss: 1.0702 - acc: 0.9565
Epoch 171/500
 - 0s - loss: 1.0515 - acc: 0.9565
Epoch 172/500
 - 0s - loss: 1.0329 - acc: 0.9565
Epoch 173/500
 - 0s - loss: 1.0144 - acc: 0.9565
Epoch 174/500
 - 0s - loss: 0.9959 - acc: 0.9565
Epoch 175/500
 - 0s - loss: 0.9775 - acc: 0.9565
Epoch 176/500
 - 0s [...]

 - 0s - loss: 0.0984 - acc: 0.9565
Epoch 324/500
 - 0s - loss: 0.0979 - acc: 0.9565
Epoch 325/500
 - 0s - loss: 0.0975 - acc: 0.9565
Epoch 326/500
 - 0s - loss: 0.0971 - acc: 0.9565
Epoch 327/500
 - 0s - loss: 0.0967 - acc: 0.9565
Epoch 328/500
 - 0s - loss: 0.0962 - acc: 0.9565
Epoch 329/500
 - 0s - loss: 0.0958 - acc: 0.9565
Epoch 330/500
 - 0s - loss: 0.0955 - acc: 0.9565
Epoch 331/500
 - 0s - loss: 0.0951 - acc: 0.9565
Epoch 332/500
 - 0s - loss: 0.0947 - acc: 0.9565
Epoch 333/500
 - 0s - loss: 0.0943 - acc: 0.9565
Epoch 334/500
 - 0s - loss: 0.0940 - acc: 0.9565
Epoch 335/500
 - 0s - loss: 0.0936 - acc: 0.9565
Epoch 336/500
 - 0s - loss: 0.0933 - acc: 0.9565
Epoch 337/500
 - 0s - loss: 0.0929 - acc: 0.9565
Epoch 338/500
 - 0s - loss: 0.0926 - acc: 0.9565
Epoch 339/500
 - 0s - loss: 0.0922 - acc: 0.9565
Epoch 340/500
 - 0s - loss: 0.0919 - acc: 0.9565
Epoch 341/500
 - 0s - loss: 0.0916 - acc: 0.9565
Epoch 342/500
 - 0s - loss: 0.0913 - acc: 0.9565
Epoch 343/500
 - 0s - loss: 0.0910 [...]

Epoch 491/500
 - 0s - loss: 0.0712 - acc: 0.9565
Epoch 492/500
 - 0s - loss: 0.0711 - acc: 0.9565
Epoch 493/500
 - 0s - loss: 0.0711 - acc: 0.9565
Epoch 494/500
 - 0s - loss: 0.0710 - acc: 0.9565
Epoch 495/500
 - 0s - loss: 0.0710 - acc: 0.9565
Epoch 496/500
 - 0s - loss: 0.0709 - acc: 0.9565
Epoch 497/500
 - 0s - loss: 0.0708 - acc: 0.9565
Epoch 498/500
 - 0s - loss: 0.0708 - acc: 0.9565
Epoch 499/500
 - 0s - loss: 0.0707 - acc: 0.9565
Epoch 500/500
 - 0s - loss: 0.0707 - acc: 0.9565


<keras.callbacks.History at 0x7f7ba4af7400>

In [84]:
# evaluate model
print(generate_seq(model, tokenizer, 2, 'Jack and', 5))
print(generate_seq(model, tokenizer, 2, 'And Jill', 3))
print(generate_seq(model, tokenizer, 2, 'fell down', 5))
print(generate_seq(model, tokenizer, 2, 'pail of', 5))

Jack and jill went up the hill
And Jill went up the
fell down and broke his crown and
pail of water jack fell down and


Cool.
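
As a closing sanity check, the accuracy plateaus in the training logs can be explained by counting. Some contexts have more than one continuation in the rhyme ("and Jill" is followed by both "went" and "came"), so a model that always returns one word per context cannot get every training pair right. The sketch below computes that ceiling in plain Python; the function name `accuracy_ceiling` is an assumption for illustration.

```python
from collections import Counter, defaultdict

text = ("Jack and Jill went up the hill To fetch a pail of water "
        "Jack fell down and broke his crown And Jill came tumbling after")
tokens = text.lower().split()

def accuracy_ceiling(tokens, n_in):
    """Best (correct, total) a deterministic n_in-words-in predictor can reach."""
    targets = defaultdict(Counter)
    for i in range(n_in, len(tokens)):
        context = tuple(tokens[i - n_in:i])
        targets[context][tokens[i]] += 1
    # for each context, the best single guess is its most frequent continuation
    correct = sum(max(c.values()) for c in targets.values())
    total = sum(sum(c.values()) for c in targets.values())
    return correct, total

for n_in in (1, 2):
    correct, total = accuracy_ceiling(tokens, n_in)
    print(n_in, correct, total, round(correct / total, 4))
```

The ceilings come out as 21/24 = 0.875 for one word of context and 22/23 ≈ 0.9565 for two, which are the values at which the one-word and two-word models stop improving in the logs above.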