# Recurrent Neural Network - with Keras

Implementing char-RNN with Keras 

In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import random 
import numpy as np
from glob import glob

from keras.models import Sequential
from keras.layers.recurrent import LSTM
from keras.layers.core import Dense, Activation, Dropout


Using Theano backend.


In [2]:
# Load up input text data. in this case - these are SOTU (State of the Union) speeches 
# by US Presidents
text_files = glob('data/sotu/*.txt')
text = '\n'.join([open(f, 'r').read() for f in text_files])

# get all (unique) chars - these are our 'categories' or 'labels'
chars = list(set(text))

# set a fixed vector size
# so we look at specific window size
max_len = 20 

In [3]:
# see how much data

LEN_TEXT = len(text)
NUM_LABELS = len(chars)

print(LEN_TEXT)
print(NUM_LABELS)

2942683
89


The task is classification. Given a sequence of chars, we will predict the next character, based on its probability. 
Each character in the vocabulary has a label, e.g. "a" is 0, "b" is 1, etc.

We use _softmax_ activation (used for categorical output) on the output layer to generate a probability for each candidate next character. The char with the highest probability is our best guess.

The _categorical crossentropy_ loss function is standard for multi-class classification; it penalizes the network more heavily the lower the probability it assigns to the correct label.
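To make the softmax and crossentropy description concrete, here is a minimal NumPy sketch on a toy 3-class problem. The helper names (`softmax`, `categorical_crossentropy`) and the toy scores are illustrative assumptions, not part of the notebook or of the Keras internals.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

def categorical_crossentropy(probs, true_index):
    """Negative log of the probability assigned to the correct label."""
    return -np.log(probs[true_index])

# hypothetical raw scores for a 3-character vocabulary
toy_logits = np.array([2.0, 1.0, 0.1])
toy_probs = softmax(toy_logits)

best_guess = int(np.argmax(toy_probs))                    # most probable label
loss_if_correct = categorical_crossentropy(toy_probs, 0)  # true label was the favorite
loss_if_wrong = categorical_crossentropy(toy_probs, 2)    # true label was the least likely

print(toy_probs, best_guess, loss_if_correct, loss_if_wrong)
```

The loss is small when the true label already has high probability and grows as that probability shrinks.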

We use _dropout_ to prevent overfitting: we don't want the network to memorize the training text, we want some novelty. Dropout rates typically range from 20% to 50%, meaning that fraction of the nodes/neurons is randomly turned off during each training update; the model below uses 25%.
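As a rough illustration of what "turning off" nodes means, the sketch below applies a random mask to a toy activation vector. It assumes the common "inverted dropout" formulation, in which the surviving activations are scaled up by `1 / (1 - rate)`; the variable names are hypothetical and this is not taken from the Keras source.

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed so the mask is reproducible
rate = 0.25                      # fraction of units to drop

activations = np.ones(8)         # toy layer output

# keep each unit with probability 1 - rate
keep_mask = rng.uniform(size=activations.shape) >= rate

# zero out dropped units and rescale the rest so the expected sum is unchanged
dropped = activations * keep_mask / (1.0 - rate)

print(keep_mask)
print(dropped)
```

At prediction time no mask is applied, so the full network is used.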

To train, we split the training data into overlapping windows of `max_len` characters and pair each window with the character that immediately follows it.


In [4]:
# Let's define our RNN model - to predict the next single character
model = Sequential()
# 1st layer: LSTM of 256 nodes
model.add(LSTM(256, return_sequences=True, input_shape=(max_len, NUM_LABELS) ))
model.add(Dropout(0.25))
# 2nd LSTM layer
model.add(LSTM(256, return_sequences=False))
model.add(Dropout(0.25))
# project the final LSTM state to one score per character
model.add(Dense(NUM_LABELS))
# last one - softmax activation
model.add(Activation('softmax'))
# now compile the model
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

In [5]:
# Let's try some sample text
# Sample quote from Bertrand Russell
example_text = "The mind which has become accustomed to the freedom and impartiality of philosophic contemplation will preserve something of the same freedom and impartiality in the world of action and emotion. It will view its purposes and desires as parts of the whole, with the absence of insistence that results from seeing them as infinitesimal fragments in a world of which all the rest is unaffected by any one man's deeds. The impartiality which, in contemplation, is the unalloyed desire for truth, is the very same quality of mind which, in action, is justice, and in emotion is that universal love which can be given to all, and not only to those who are judged useful or admirable. Thus contemplation enlarges not only the objects of our thoughts, but also the objects of our actions and our affections: it makes us citizens of the universe, not only of one walled city at war with all the rest. In this citizenship of the universe consists man's true freedom, and his liberation from the thraldom of narrow hopes and fears."

# step size of 3
input_1 = example_text[0:20]
true_out1 = example_text[20]
print(input_1)
print(true_out1)

input_2 = example_text[3:23]
true_out2 = example_text[23]

input_3 = example_text[6:26]
true_out3 = example_text[26]


The mind which has b
e


### Generalize the inputs

In [6]:
step = 3
inputs  = []
outputs = []
for i in range(0, len(text) - max_len, step):
    inputs.append (text[i: i+max_len])
    outputs.append(text[i+max_len])
    
print(len(inputs))
print(len(outputs))

980888
980888


Map each char to its label and vice versa. This pair of mappings is also called the encoder / decoder.

In [7]:
# Create Encoder and Decoder: map from chars to label and reverse
char2labels = {ch: i for i, ch in enumerate(chars)}
labels2char = {i: ch for i, ch in enumerate(chars)}
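A quick round trip shows how the two dicts work together. This sketch uses a toy vocabulary built from the word "hello" with hypothetical `toy_` names, and sorts the characters (an assumption made here only so the labels are reproducible; the notebook uses the unordered `set`).

```python
# toy vocabulary; sorted so the label assignment is deterministic
toy_chars = sorted(set("hello"))            # ['e', 'h', 'l', 'o']

toy_char2labels = {ch: i for i, ch in enumerate(toy_chars)}
toy_labels2char = {i: ch for ch, i in toy_char2labels.items()}

# encode text to integer labels, then decode the labels back to text
encoded = [toy_char2labels[ch] for ch in "hello"]
decoded = "".join(toy_labels2char[i] for i in encoded)

print(encoded)
print(decoded)
```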

In [8]:
# quick test to see what the dicts contain
print(char2labels)

{'x': 40, '-': 0, 's': 1, '\n': 69, '+': 3, '3': 41, 'g': 42, 'q': 4, '%': 43, 'h': 5, 'N': 44, 'B': 6, 'R': 7, 'p': 8, ']': 9, "'": 46, '7': 2, 'V': 10, 'A': 47, 'l': 48, 'v': 49, 'z': 12, 'o': 50, 'd': 13, '\x95': 52, '1': 15, 'J': 53, '$': 54, 'Q': 16, 't': 56, 'H': 55, 'M': 68, 'Y': 57, 'e': 17, 'D': 60, 'n': 61, 'O': 18, '½': 19, '(': 86, ';': 20, 'f': 62, 'w': 11, 'S': 51, 'T': 21, '?': 63, ' ': 64, '!': 65, 'k': 66, 'W': 67, '5': 22, 'C': 14, '`': 23, 'U': 24, '6': 36, 'c': 70, ':': 71, '¼': 72, 'X': 25, '8': 26, '0': 73, ')': 27, '.': 74, 'b': 75, '[': 76, '4': 28, '¢': 77, '&': 29, 'L': 78, ',': 30, 'K': 39, 'a': 31, 'F': 32, '/': 33, 'u': 79, 'E': 34, 'i': 35, 'G': 80, '2': 58, 'y': 81, 'j': 82, 'm': 83, 'Z': 37, 'r': 38, '9': 84, 'P': 85, '"': 45, '*': 59, 'I': 87, '_': 88}


In [9]:
# define X input and y output label tensors
# use bool to reduce memory usage

# X shape: num_examples x max_len x NUM_LABELS
X = np.zeros( (len(inputs), max_len, NUM_LABELS), dtype=bool) 
# y shape: num_examples x NUM_LABELS
y = np.zeros( (len(inputs), NUM_LABELS), dtype=bool )  

# set appropriate indices to 1 in each one-hot vector
for i, example in enumerate(inputs):
    for t, char in enumerate(example):
        X[i, t, char2labels[char]] = 1
    y[i, char2labels[outputs[i]]]  = 1

print(X.shape)
print(y.shape)

(980888, 20, 89)
(980888, 89)


In [None]:
# now start training 
epochs = 2    # can be 10 or higher, but will need a GPU to hasten it
model.fit(X, y, 
         batch_size=128,
         nb_epoch=epochs)

Let's write a function that generates text one char at a time, based on the network's predictions.


The _temperature_ controls how random we want the network to be. A lower temp means favoring more likely values; a higher temp means more randomness.
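The effect of temperature is easiest to see on a small distribution. The sketch below divides the log-probabilities by the temperature and renormalizes; `apply_temperature` and the toy probabilities are hypothetical names and values chosen for illustration.

```python
import numpy as np

def apply_temperature(probabilities, temperature):
    """Reweight a probability vector: divide log-probs by temperature, renormalize."""
    scaled = np.log(probabilities) / temperature
    scaled = scaled - np.max(scaled)     # shift for numerical stability
    weights = np.exp(scaled)
    return weights / np.sum(weights)

toy_probs = np.array([0.6, 0.3, 0.1])

cold = apply_temperature(toy_probs, 0.33)    # sharper: the favorite dominates
neutral = apply_temperature(toy_probs, 1.0)  # unchanged distribution
hot = apply_temperature(toy_probs, 2.0)      # flatter: closer to uniform

print(cold, neutral, hot)
```

A temperature of 1.0 leaves the predicted distribution as it is, values below 1.0 concentrate mass on the most likely char, and values above 1.0 spread it out.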

In [10]:
def generate(temperature=0.3, seed=None, predicate=lambda x: len(x) < 100):
    """Return generated text; by default, stop once it reaches 100 chars."""
    if seed is None:
        # no seed text given: use a random window from the training text
        start_ix = random.randint(0, len(text) - max_len - 1)
        seed = text[start_ix: start_ix + max_len]
    elif len(seed) < max_len:
        raise ValueError('seed must be at least {} chars long'.format(max_len))
    
    # the network only ever sees the last max_len chars
    sentence = seed[-max_len:]
    generated = seed
    
    
    while predicate(generated):
        # create input tensor
        # from the last max_len chars generated so far
        x = np.zeros( (1, max_len, len(chars)) )
        for t, char in enumerate(sentence):
            x[0, t, char2labels[char]] = 1.
        
        # produce a prob distribution over the chars
        probs = model.predict(x, verbose=0)[0]
        
        # sample the character to use based on predicted probabilities
        next_idx  = sample(probs, temperature)
        next_char = labels2char[next_idx]
        
        generated  += next_char
        sentence    = sentence[1:] + next_char
    return generated

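def sample(probabilities, temperature):
    """Sample an index from a vector of probabilities."""
    # cast to float64 and clip away exact zeros so np.log stays finite
    probabilities = np.clip(np.asarray(probabilities, dtype=np.float64), 1e-12, 1.0)
    a = np.log(probabilities) / temperature
    distr = np.exp(a) / np.sum(np.exp(a))
    choices = range(len(probabilities))
    return np.random.choice(choices, p=distr)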


Let's generate some sample text



In [11]:
epochs = 4   # Can be 10.. but need more horse power
for i in range(epochs):
    print('epoch %d'%i)
    
    # set nb_epoch to 1 since we are iterating manually
    # comment out the fit call to just generate text
    model.fit(X, y, batch_size=128, nb_epoch=1)
    
    # preview
    for temp in [0.33, 0.66, 1.0]:
        print('temperature: %0.2f' % temp)
        print('%s' % generate(temperature=temp))

epoch 0
Epoch 1/1
temperature: 0.33
p the sign that says to the for the consideration to the war in the last year to the Congress of the
temperature: 0.66
urgent and intense. The $ay to all to the Union in the people the for refigred and the finst the Nea
temperature: 1.00
s State lines, and in we can be increase person schools. As their over I inough the duscork or. Do a
epoch 1
Epoch 1/1
temperature: 0.33
 quarter of their interest of the restore the people and the security to the community and the prive
temperature: 0.66
e of the Nation's recommends that the abort of the country college is a new source of the Soviet cha
temperature: 1.00
tatorship to take over, path America that to nection is over mall and stade and commanded drefice. 

epoch 2
Epoch 1/1
167808/980888 [====>.........................] - ETA: 2518s - loss: 1.3544

KeyboardInterrupt: 