# RNN Text Generation

This notebook implements a recurrent neural network that learns to compose sonnets after being trained on Shakespeare's sonnets. It uses a character-level approach, and the hidden layers use LSTM units.

In [1]:
from tensorflow.keras.callbacks import LambdaCallback
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.layers import LSTM
from tensorflow.keras.optimizers import RMSprop
import numpy as np
import random
import sys
import io
import requests

## Shakespeare's Sonnets or Frankenstein

A previously processed text file containing either all of Shakespeare's sonnets or the text of Mary Shelley's Frankenstein is imported. The character vocabulary is then built, along with dictionaries that map characters to indices and vice versa.

In [2]:
# read in the preprocessed text of Shakespeare's sonnets (or Frankenstein)

# from local file system
# filename = 'sonnets.txt'
# filename = 'frankenstein.txt'
# with open(filename, 'r') as file:
#     text = file.read()

# from github
url = 'https://raw.githubusercontent.com/prof-groff/deep-learning/master/data/sonnets.txt'
# url = 'https://raw.githubusercontent.com/prof-groff/deep-learning/master/data/frankenstein.txt'
response = requests.get(url)
response.raise_for_status() # stop here if the download failed
text = response.text

len_text = len(text)
print('text length: '.upper() + str(len(text)) + '\n')
print('text sample:'.upper() + '\n')
print(text[0:612]) # show some text

TEXT LENGTH: 94687

TEXT SAMPLE:

i
from fairest creatures we desire increase,
that thereby beauty's rose might never die,
but as the riper should by time decease,
his tender heir might bear his memory:
but thou, contracted to thine own bright eyes,
feed'st thy light's flame with self-substantial fuel,
making a famine where abundance lies,
thy self thy foe, to thy sweet self too cruel:
thou that art now the world's fresh ornament,
and only herald to the gaudy spring,
within thine own bud buriest thy content,
and tender churl mak'st waste in niggarding:
pity the world, or else this glutton be,
to eat the world's due, by the grave and thee.


In [3]:
# create dictionaries mapping characters to indices and indices back to characters
chars = sorted(list(set(text)))
len_chars = len(chars)
print('total chars: ' + str(len_chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

total chars: 38
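
As a quick illustration of how the two dictionaries work together, the sketch below builds the same kind of mappings for a short toy string and round-trips a word through them. The toy string and the `encode` and `decode` helpers are hypothetical names introduced here for illustration; they are not part of the notebook.

```python
# Toy corpus standing in for the sonnets text (an assumption for illustration).
toy_text = "shall i compare thee"

# Same construction as in the cell above, applied to the toy corpus.
toy_chars = sorted(set(toy_text))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

def encode(s, char_to_index):
    """Convert a string to a list of integer indices (hypothetical helper)."""
    return [char_to_index[c] for c in s]

def decode(indices, index_to_char):
    """Convert a list of integer indices back to a string (hypothetical helper)."""
    return ''.join(index_to_char[i] for i in indices)

encoded = encode("thee", toy_char_indices)
decoded = decode(encoded, toy_indices_char)
print(encoded)
print(decoded)
```

Because the vocabulary is sorted before it is enumerated, the index assigned to each character is the same on every run for the same text.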


The text is chunked into sequences of uniform length. Each sequence is a feature whose target is the character that immediately follows it in the text. The features and targets are then vectorized: each character is converted to a Boolean vector with a single true element at the position corresponding to that character (one-hot encoding). <em>Note: making these vectorized features and targets integers (0s and 1s) instead of Booleans seriously hampers learning, and I am not sure why.</em>

In [4]:
# cut the text into semi-redundant sequences of seq_length characters
seq_length = 128
step = 3
features = []
targets = []
for i in range(0, len_text - seq_length, step):
    features.append(text[i: i + seq_length])
    targets.append(text[i + seq_length])
num_features = len(features)
print('number of features (sequences):'.upper() + str(num_features))

NUMBER OF FEATURES (SEQUENCES):31520
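
The sequence count printed above can be checked by hand: a loop over `range(0, len_text - seq_length, step)` visits one start position per `step` characters, so the count is the ceiling of `(len_text - seq_length) / step`. The short sketch below redoes that arithmetic with the values reported in this notebook.

```python
import math

len_text = 94687   # text length reported earlier in the notebook
seq_length = 128
step = 3

# Closed-form count of start positions visited by the chunking loop.
expected = math.ceil((len_text - seq_length) / step)

# The same count obtained directly from the range object the loop iterates over.
actual = len(range(0, len_text - seq_length, step))

print(expected, actual)
```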


In [5]:
# np.bool was removed from recent NumPy releases; the builtin bool is equivalent
x = np.zeros((num_features, seq_length, len_chars), dtype=bool)
y = np.zeros((num_features, len_chars), dtype=bool)
for i, feature in enumerate(features):
    for t, char in enumerate(feature):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[targets[i]]] = 1
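
To make the vectorization step concrete, here is a minimal sketch that applies the same chunking and one-hot encoding to a toy string and then decodes the arrays back to text as a sanity check. The toy string, the `toy_`-prefixed names, and the short sequence length are assumptions chosen only to keep the example small.

```python
import numpy as np

# Toy corpus and settings (assumptions for illustration only).
toy_text = "hello world"
toy_seq_length = 4
toy_step = 3

toy_chars = sorted(set(toy_text))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

# Chunk the text: each feature is a window, each target is the next character.
starts = range(0, len(toy_text) - toy_seq_length, toy_step)
toy_features = [toy_text[i:i + toy_seq_length] for i in starts]
toy_targets = [toy_text[i + toy_seq_length] for i in starts]

# One-hot encode features (samples, time steps, vocabulary) and targets (samples, vocabulary).
x_toy = np.zeros((len(toy_features), toy_seq_length, len(toy_chars)), dtype=bool)
y_toy = np.zeros((len(toy_features), len(toy_chars)), dtype=bool)
for i, feature in enumerate(toy_features):
    for t, char in enumerate(feature):
        x_toy[i, t, toy_char_indices[char]] = True
    y_toy[i, toy_char_indices[toy_targets[i]]] = True

# Decode by taking the position of the single True element in each one-hot vector.
decoded_features = [''.join(toy_indices_char[int(j)] for j in x_toy[i].argmax(axis=1))
                    for i in range(len(toy_features))]
decoded_targets = [toy_indices_char[int(j)] for j in y_toy.argmax(axis=1)]

print(x_toy.shape, y_toy.shape)
print(decoded_features, decoded_targets)
```

Decoding the arrays back to the original windows is a cheap way to confirm that the encoding loop and the two dictionaries agree with each other.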

## Define the Model Graphs

Two RNN graphs are implemented here: one is deeper and narrower, the other shallower and wider. The model parameters have been tuned to give decent results. A function is also defined to sample from the trained model in order to generate new character sequences. In addition, a function is defined that is called at the end of each epoch and, every tenth epoch, generates and displays character sequences as the network learns.

In [6]:
def deep_model(seq_length, n_chars):
    print('building two-layer model with 128 memory units...'.upper()+'\n')
    
    model = Sequential()
    model.add(LSTM(128, return_sequences=True, input_shape=(seq_length, n_chars)))
    model.add(LSTM(128))
    model.add(Dropout(0.1))
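    model.add(Dense(n_chars, activation='softmax')) # use the n_chars argument rather than the global chars

    optimizer = RMSprop(learning_rate=0.005) # lr is a deprecated alias of learning_rate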
    model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    
    return model

def wide_model(seq_length, n_chars):
    print('building one-layer model with 256 memory units...'.upper()+'\n')
    
    model = Sequential()
    model.add(LSTM(256, input_shape=(seq_length, n_chars)))
    model.add(Dropout(0.1))
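    model.add(Dense(n_chars, activation='softmax')) # use the n_chars argument rather than the global chars

    optimizer = RMSprop(learning_rate=0.005) # lr is a deprecated alias of learning_rate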
    model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

    return model

def sample(probs, method=0):
    # helper function to sample an index from a probability array
    probs = np.asarray(probs).astype('float64')
    probs = probs/np.sum(probs) # renormalize so the probabilities sum to 1 in float64

    if method == 0: 
        # method 0: just return index of character with the highest probability
        index = np.argmax(probs)
    elif method == 1: 
        # method 1: draw a random number from 0 to 1, calculate the cumulative sum of the prediction vector.
        # return the index of the first element in the cumulative sum greater than the random number
        index = np.argwhere(np.cumsum(probs)>np.random.uniform())[0][0]
    elif method == 2:
        # method 2: draw an element from a multinomial distribution defined by the given probabilities
        index = np.argmax(np.random.multinomial(1, probs, 1))
    elif method == 3:
        # method 3: emphasize larger probabilities and diminish smaller ones by rescaling in log space
        temperature = 0.5 # less than one increases differences between small and large probabilities
        probs = np.log(probs) / temperature # with temperature = 1 this reduces to method 2
        probs = np.exp(probs) # undo log transform
        probs = probs / np.sum(probs)
        index = np.argmax(np.random.multinomial(1, probs, 1))
        
    return index

def on_epoch_end(epoch, _):
    # Function invoked at end of each epoch. Prints generated text every 10th epoch
    # (epoch is zero-based, so text is generated after epochs 1, 11, 21, ...).
    if epoch % 10 == 0:

        print('\nGENERATING CHARACTER SEQUENCE AFTER EPOCH: {}\n'.format(epoch+1))

        start_index = random.randint(0, len_text - seq_length - 1)
       
        
        seed = text[start_index: start_index + seq_length]
        print('GENERATING WITH SEED:\n\n' + seed + '\n')
        
        
        print('MAX SAMPLING:\n')
        phrase = seed
        generated = ''
        generated += phrase
        sys.stdout.write(generated)

        for i in range(seq_length):
            x_pred = np.zeros((1, seq_length, len_chars))
            for t, char in enumerate(phrase):
                x_pred[0, t, char_indices[char]] = 1.

            probs = model.predict(x_pred, verbose=0)[0]
            next_index = sample(probs, method=0)
            next_char = indices_char[next_index]

            generated += next_char
            phrase = phrase[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print('\n')
        
        print('PROBABILISTIC SAMPLING:\n')
        phrase = seed
        generated = ''
        generated += phrase
        sys.stdout.write(generated)

        for i in range(seq_length):
            x_pred = np.zeros((1, seq_length, len_chars))
            for t, char in enumerate(phrase):
                x_pred[0, t, char_indices[char]] = 1.

            probs = model.predict(x_pred, verbose=0)[0]
            next_index = sample(probs, method=1)
            next_char = indices_char[next_index]

            generated += next_char
            phrase = phrase[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print('\n')
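
The effect of the temperature used by method 3 of `sample` is easier to see on a small, made-up distribution. The sketch below is a standalone illustration rather than the notebook's code: `apply_temperature` and `example_probs` are hypothetical names, and the small epsilon inside the logarithm is an addition of mine to avoid taking the log of zero.

```python
import numpy as np

def apply_temperature(probs, temperature):
    """Rescale a probability vector in log space and renormalize (illustrative helper)."""
    probs = np.asarray(probs, dtype='float64')
    # Epsilon guards against log(0); it is an assumption, not part of the notebook's sample().
    logits = np.log(probs + 1e-12) / temperature
    scaled = np.exp(logits - np.max(logits))  # subtract the max for numerical stability
    return scaled / np.sum(scaled)

# Made-up next-character distribution over four characters.
example_probs = [0.5, 0.3, 0.15, 0.05]

sharp = apply_temperature(example_probs, 0.5)      # temperature < 1: favors likely characters
unchanged = apply_temperature(example_probs, 1.0)  # temperature = 1: distribution is unchanged
flat = apply_temperature(example_probs, 2.0)       # temperature > 1: closer to uniform

for name, dist in (('T=0.5', sharp), ('T=1.0', unchanged), ('T=2.0', flat)):
    print(name, np.round(dist, 3))
```

Lower temperatures push the sampler toward the behavior of max sampling, while higher temperatures produce more varied but less coherent text.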

## Model Training

In [7]:
# deeper, less wide, model
model = deep_model(seq_length, len_chars)
model.summary()

print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

print('\n')
model.fit(x, y,
          batch_size=256,
          epochs=41,
          # epochs=21, # for frankenstein use fewer epochs because the text is much longer
          callbacks=[print_callback])

BUILDING TWO-LAYER MODEL WITH 128 MEMORY UNITS...

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm (LSTM)                  (None, 128, 128)          85504     
_________________________________________________________________
lstm_1 (LSTM)                (None, 128)               131584    
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense (Dense)                (None, 38)                4902      
Total params: 221,990
Trainable params: 221,990
Non-trainable params: 0
_________________________________________________________________


Train on 31520 samples
Epoch 1/41
GENERATING CHARACTER SEQUENCE AFTER EPOCH: 1

GENERATING WITH SEED:

t hath my duty strongly knit,
to thee i send this written embassage,
to witness duty, not t

Epoch 37/41
Epoch 38/41
Epoch 39/41
Epoch 40/41
Epoch 41/41
GENERATING CHARACTER SEQUENCE AFTER EPOCH: 41

GENERATING WITH SEED:

es have thorns, and silver fountains mud:
clouds and eclipses stain both moon and sun,
and loathsome canker lives in sweetest bu

MAX SAMPLING:

es have thorns, and silver fountains mud:
clouds and eclipses stain both moon and sun,
and loathsome canker lives in sweetest budded.
the raintance to be, rettered my verse is love,
to time thy fairet made still in outtifor was,
i is the dare weeth forther

PROBABILISTIC SAMPLING:

es have thorns, and silver fountains mud:
clouds and eclipses stain both moon and sun,
and loathsome canker lives in sweetest budne,
intermed it the fingers, my loves thou my lovely arpound,
which i stold contwern deel me hape to frow fall,
genenger apliet



<tensorflow.python.keras.callbacks.History at 0x7ffac041a6a0>

In [8]:
# less deep but wider model
model = wide_model(seq_length, len_chars)
model.summary()

print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

print('\n')
model.fit(x, y,
          batch_size=256,
          epochs=41,
          # epochs=21, # for frankenstein use fewer epochs because the text is much longer
          callbacks=[print_callback])

BUILDING ONE-LAYER MODEL WITH 256 MEMORY UNITS...

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_3 (LSTM)                (None, 256)               302080    
_________________________________________________________________
dropout_2 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 38)                9766      
Total params: 311,846
Trainable params: 311,846
Non-trainable params: 0
_________________________________________________________________


Epoch 1/41

GENERATING CHARACTER SEQUENCE AFTER EPOCH: 1

GENERATING WITH SEED:

eir eyes were kind,
to thy fair flower add the rank smell of weeds:
but why thy odour matcheth not thy show,
the soil is this, t

MAX SAMPLING:

eir eyes were kind,
to thy fair flower add the rank smell of weeds:
but why thy odour matcheth not thy show,
the soil is 

in thy soul's thought, all naked, whils, and kindy,
when thou still in this false in my cheross'd;
and ther faret first masure of hours love,
to the pligut the f

PROBABILISTIC SAMPLING:

ne
may make seem bare, in wanting words to show it,
but that i hope some good conceit of thine
in thy soul's thought, all naked, thengh right'd was duemy't,
to my shame dith art, who, cruch and coones,
thou art, and roose shoul line,
they in this sid, and 



<keras.callbacks.History at 0x7fc6adc7a850>
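
The parameter counts reported in the two model summaries can be reproduced by hand. A standard LSTM layer has four gates, each with an input weight matrix, a recurrent weight matrix, and a bias vector, and a dense layer has one weight matrix plus a bias. The sketch below applies those formulas with the 38-character vocabulary; `lstm_params` and `dense_params` are hypothetical helper names introduced for this check.

```python
def lstm_params(input_dim, units):
    """Parameters of a standard LSTM layer: 4 gates x (input weights + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Parameters of a dense layer: weight matrix plus bias vector."""
    return input_dim * units + units

n_chars = 38  # vocabulary size reported earlier in the notebook

# Deep model: LSTM(128) -> LSTM(128) -> Dropout -> Dense(38); dropout has no parameters.
deep_total = (lstm_params(n_chars, 128)
              + lstm_params(128, 128)
              + dense_params(128, n_chars))

# Wide model: LSTM(256) -> Dropout -> Dense(38).
wide_total = lstm_params(n_chars, 256) + dense_params(256, n_chars)

print(deep_total, wide_total)
```

By these counts the wide model has roughly 40% more parameters than the deep one, which is worth keeping in mind when comparing the two sets of generated text.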