# Recurrent Neural Networks

Recurrent neural networks (RNNs) differ from feed-forward networks in structure. Instead of only passing activations from one layer to the next, they keep part of the information inside the network, in a hidden state that is fed back in at the next step of the sequence.

Unrolled over the steps of the input sequence, this looks as follows:

<img src='./img/RNN-unrolled.png' width="700px">



What can you do with recurrent neural networks? They are useful for capturing longer-term dependencies between input and output.

This is a common problem in sequential decision making: the right output often depends on a complicated combination of several earlier inputs.
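
To make the hidden state concrete, here is a minimal NumPy sketch of a plain (Elman-style) recurrent step. It is only an illustration: the names `rnn_forward`, `W_xh`, `W_hh` and `b_h` and all sizes are assumptions, and the LSTM used later in this notebook has a more elaborate update rule.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence and return every hidden state."""
    hidden_size = W_hh.shape[0]
    h = np.zeros(hidden_size)              # initial hidden state
    states = []
    for x_t in inputs:                     # one step per sequence element
        # the new state depends on the current input AND the previous state
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 4, 5   # illustrative sizes
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)
inputs = rng.normal(size=(seq_len, input_size))

states = rnn_forward(inputs, W_xh, W_hh, b_h)
print(states.shape)  # (5, 4): one hidden state per time step
```

Because each state is computed from the previous one, a change to the first input can still affect the last hidden state, which is what lets the network relate outputs to earlier inputs.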

Many interesting use cases of RNNs are described here:

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

In [1]:
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.layers import LSTM
from keras.optimizers import RMSprop
import numpy as np
import random
import sys

Using TensorFlow backend.


#### 1. Read some data

We will import a text file of female names.

In [2]:
f = open('./data/female.txt','r')

#### 2. Preprocessing

The preprocessing step will consist of three phases this time:
- Read the names as a long string.
- Get overlapping pairs of (sequence, next_character). 
- Encode the pairs above into one-hot vectors.

In [3]:
text_lines = f.readlines()

In [4]:
text = ' '.join(text_lines)

The processed text looks as follows:

In [12]:
text[0:100]

'Abagael\n Abagail\n Abbe\n Abbey\n Abbi\n Abbie\n Abby\n Abigael\n Abigail\n Abigale\n Abra\n Acacia\n Ada\n Adah'

In [5]:
chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

total chars: 56


We now cut the text into semi-redundant sequences of `maxlen` characters, each paired with the character that follows it.


In [6]:
maxlen = 10
step = 1
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

Number of sequences: 40566


After this step, the sequences and their next characters look as follows:

In [18]:
print(sentences[7:15])
print(next_chars[7:15])

['\n Abagail\n', ' Abagail\n ', 'Abagail\n A', 'bagail\n Ab', 'agail\n Abb', 'gail\n Abbe', 'ail\n Abbe\n', 'il\n Abbe\n ']
[' ', 'A', 'b', 'b', 'e', '\n', ' ', 'A']


Our final preprocessing step is to one-hot encode each pair of sentence and successor character.

In [19]:
# use the builtin bool: the np.bool alias is unavailable in some NumPy releases
X = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1
print('Vectorization complete!')
    

Vectorization complete!
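
As a sanity check on the encoding idea, the following toy sketch one-hot encodes a single short window and decodes it back. The `toy_*` names and the tiny string are made up for illustration and are independent of the `X` and `y` arrays built above.

```python
import numpy as np

toy_text = "Ada\n Adah"                      # toy stand-in for the names file
toy_chars = sorted(set(toy_text))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

toy_maxlen = 4
window = toy_text[:toy_maxlen]               # 'Ada\n'
encoded = np.zeros((toy_maxlen, len(toy_chars)), dtype=bool)
for t, char in enumerate(window):
    encoded[t, toy_char_indices[char]] = True

# each row has exactly one True entry, so argmax recovers the character index
decoded = ''.join(toy_indices_char[i] for i in encoded.argmax(axis=1))
print(encoded.astype(int))
print(repr(decoded))
```

Each row of `encoded` corresponds to one position in the window and each column to one character of the vocabulary, mirroring the last two axes of `X`.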


#### 3. Build the model

We will again use the friendly `keras` API to add an LSTM layer and a dense layer.

The predictions will be generated using a softmax function, so that we can use those scores as probabilities for generating the next character.
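
As a rough illustration of what the softmax step does, here is a small NumPy sketch, separate from the Keras model. The `softmax` helper and the scores are made up for the example; the max-subtraction is a common numerical-stability trick, not something this notebook's model is confirmed to do.

```python
import numpy as np

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

scores = np.array([2.0, 1.0, 0.1])      # made-up raw scores for three characters
probs = softmax(scores)

# the probabilities can be used directly to draw the index of the next character
next_index = np.random.choice(len(probs), p=probs)
print(probs, next_index)
```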

In [20]:
# build the model: a single LSTM layer followed by a dense softmax layer
print('Build model...')
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))

# `lr` is the argument name in older Keras releases; newer ones call it `learning_rate`
optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

Build model...


We need an auxiliary function to generate the next character using the outputs of the model.

In [21]:
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array;
    # temperature < 1 sharpens the distribution, temperature > 1 flattens it
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
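
To see what the `temperature` argument does, the sketch below applies the same log, divide and renormalize steps as `sample` to a made-up distribution. The `reweight` helper and `toy_preds` are illustrative names, not part of the notebook's pipeline.

```python
import numpy as np

def reweight(preds, temperature):
    """Apply the log / temperature / renormalize steps used in sample()."""
    logits = np.log(np.asarray(preds, dtype='float64')) / temperature
    exp_logits = np.exp(logits)
    return exp_logits / np.sum(exp_logits)

toy_preds = [0.5, 0.3, 0.2]             # made-up distribution over three characters
for temperature in [0.2, 0.5, 1.0, 1.2]:
    print(temperature, np.round(reweight(toy_preds, temperature), 3))
```

A temperature of 1.0 leaves the distribution unchanged, lower values concentrate the probability on the most likely character, and higher values spread it out, which is why the low-diversity samples further down stay closer to real names.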

#### 4. Train the model and see it generate some names!

In [22]:
# train the model, output generated text after each iteration
for iteration in range(1, 5):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    model.fit(X, y, batch_size=128, epochs=2)

    start_index = random.randint(0, len(text) - maxlen - 1)

    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print()
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        print('\nGenerated')
        sys.stdout.write(generated)

        for i in range(20):
            x = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x[0, t, char_indices[char]] = 1.

            preds = model.predict(x, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()


--------------------------------------------------
Iteration 1
Epoch 1/2
Epoch 2/2

----- diversity: 0.2
----- Generating with seed: "rissa
 Mar"

Generated
rissa
 Marissa
 Maris
 Marissa

----- diversity: 0.5
----- Generating with seed: "rissa
 Mar"

Generated
rissa
 Marisa
 Marith
 Mariti


----- diversity: 1.0
----- Generating with seed: "rissa
 Mar"

Generated
rissa
 Martsa
 Marerisua
 Peli

----- diversity: 1.2
----- Generating with seed: "rissa
 Mar"

Generated
rissa
 Marisacde
 Arianna
 Are

--------------------------------------------------
Iteration 2
Epoch 1/2
Epoch 2/2

----- diversity: 0.2
----- Generating with seed: "tta
 Letti"

Generated
tta
 Letti
 Lettie
 Letil
 Let

----- diversity: 0.5
----- Generating with seed: "tta
 Letti"

Generated
tta
 Letti
 Lettie
 Letile
 Le

----- diversity: 1.0
----- Generating with seed: "tta
 Letti"

Generated
tta
 Letti
 Letcit
 Leric
 Ler

----- diversity: 1.2
----- Generating with seed: "tta
 Letti"

Generated
tta
 Letti
 Levrietta
 