Text Generation with Python 

In [None]:
The Natural Language Toolkit

In [3]:
import numpy as np
import random
import sys

Load the text file. We will train the network on this text.

In [4]:
with open('epdf.pub_alexandre-dumas-the-black-tulip.txt', 'r') as file:
    text = file.read().lower()
    
print('text length:', len(text))    

text length: 434031


In [5]:
#get all unique characters in the text, sorted so the ordering is reproducible
chars = sorted(set(text))
print('total chars:', len(chars))

#use the enumerate function to build lookup tables in both directions:
#character -> index and index -> character
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

total chars: 49
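
As a small illustration of how the two lookup tables work together, the sketch below builds them for a short sample string and round-trips it through the indices. The string and the `toy_` names are made up for this example and are not part of the notebook.

```python
# Toy illustration of the character/index lookup tables.
# The sample string and the toy_ names are hypothetical.
toy_text = "hello world"
toy_chars = sorted(set(toy_text))

toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

# Encode every character as its index, then decode back to text.
encoded = [toy_char_indices[c] for c in toy_text]
decoded = "".join(toy_indices_char[i] for i in encoded)

print(toy_chars)
print(encoded)
print(decoded)
```

Decoding the encoded list gives back the original string, which is the property the generation code relies on later when it turns a predicted index into a character.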


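To get data we can use to train our model, we will split the text into overlapping subsequences with a length of 49 characters, starting a new subsequence every 3 characters. The character that follows each subsequence is its target. Then we will transform the data into boolean (one-hot) arrays.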

Create training examples and targets

In [6]:
maxlen = 49
step = 3
sentences = []
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
    
    
    
#one-hot encode inputs and targets
#recent NumPy releases removed the np.bool alias, so use the built-in bool
x = np.zeros((len(sentences), maxlen, len(chars)), dtype = bool)
y = np.zeros((len(sentences), len(chars)), dtype = bool)

for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1
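
To make the windowing and one-hot encoding easier to follow, here is the same procedure on a tiny string with a window of 5 and a step of 3. The string and the `toy_` names are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical toy input; the notebook uses the full novel instead.
toy_text = "the black tulip"
toy_chars = sorted(set(toy_text))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_maxlen, toy_step = 5, 3

# Slide a window over the text; the character right after it is the target.
windows, targets = [], []
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    windows.append(toy_text[i:i + toy_maxlen])
    targets.append(toy_text[i + toy_maxlen])

# One-hot encode: axis 0 is the window, axis 1 the position, axis 2 the character.
toy_x = np.zeros((len(windows), toy_maxlen, len(toy_chars)), dtype=bool)
toy_y = np.zeros((len(windows), len(toy_chars)), dtype=bool)
for i, window in enumerate(windows):
    for t, char in enumerate(window):
        toy_x[i, t, toy_char_indices[char]] = True
    toy_y[i, toy_char_indices[targets[i]]] = True

print(windows)
print(targets)
print(toy_x.shape, toy_y.shape)
```

Each position in `toy_x` has exactly one `True` entry, which marks the character found at that position of the window.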
    

Recurrent Neural Network Model

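We will create a simple RNN with the following structure: a single LSTM layer with 128 units, followed by a dense layer with one output per character and a softmax activation.

We will use the RMSprop optimizer and the categorical cross-entropy loss function.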

In [7]:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import LSTM
from keras.optimizers import RMSprop

Using TensorFlow backend.


In [8]:
model = Sequential()
model.add(LSTM(128, input_shape = (maxlen, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))

#newer Keras releases name this argument learning_rate; lr is the deprecated spelling
optimizer = RMSprop(learning_rate = 0.01)
model.compile(loss = 'categorical_crossentropy', optimizer = optimizer)
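
For a rough sense of the model's size, the sketch below counts trainable parameters by hand. It assumes the standard LSTM layout (four gates, each with input weights, recurrent weights, and one bias vector) and the 49-character vocabulary reported earlier. The helper names are made up for this example; `model.summary()` is the authoritative check.

```python
def lstm_param_count(units, input_dim):
    # Four gates, each with input weights, recurrent weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)


def dense_param_count(inputs, outputs):
    # One weight per input/output pair plus one bias per output.
    return inputs * outputs + outputs


vocab_size = 49  # "total chars" printed above
lstm_params = lstm_param_count(128, vocab_size)
dense_params = dense_param_count(128, vocab_size)

print(lstm_params, dense_params, lstm_params + dense_params)
# prints: 91136 6321 97457
```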

Built the model

In [19]:
def sample(preds, temperature = 1.0):
    #helper function to sample an index from a probability array
    preds = np.array(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)


def on_epoch_end(epoch, logs):
    #function invoked at the end of each epoch; prints generated text
    print()
    print('----Generating text after Epoch: % d' % epoch)

    #pick a random seed sequence of maxlen characters from the text
    start_index = random.randint(0, len(text) - maxlen - 1)

    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print('----diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('-----Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(400):
            #one-hot encode the current window of characters
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1

            #predict the next character and sample it at the given diversity
            preds = model.predict(x_pred, verbose = 0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            #append the character and slide the window forward by one
            generated += next_char
            sentence = sentence[1:] + next_char
            sys.stdout.write(next_char)
            sys.stdout.flush()

        print()
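
The diversity value passed to the sample helper acts as a temperature. The sketch below isolates the rescaling step and applies it to a made-up three-way distribution. `apply_temperature` and `toy_preds` are hypothetical names, and subtracting the maximum before exponentiating is a numerical-stability tweak that the notebook's helper does not include.

```python
import numpy as np


def apply_temperature(preds, temperature=1.0):
    # Rescale log-probabilities by the temperature, then renormalize.
    preds = np.asarray(preds, dtype="float64")
    logits = np.log(preds) / temperature
    # Subtracting the max does not change the result but avoids overflow.
    exp_logits = np.exp(logits - np.max(logits))
    return exp_logits / np.sum(exp_logits)


toy_preds = [0.5, 0.3, 0.2]  # hypothetical model output
for temperature in [0.2, 1.0, 1.2]:
    print(temperature, np.round(apply_temperature(toy_preds, temperature), 3))
```

A temperature below 1 concentrates probability on the most likely character, which fits the repetitive text seen at diversity 0.2. A temperature above 1 flattens the distribution and makes unusual characters more likely.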
        
        
        

In [10]:
from keras.callbacks import LambdaCallback
#use the standalone keras callback so it matches the keras model defined above
print_callback = LambdaCallback(on_epoch_end = on_epoch_end)

We will also define two other callbacks. The first is ModelCheckpoint. It will save our model at the end of each epoch in which the loss improves on the best value seen so far.

In [11]:
from keras.callbacks import ModelCheckpoint

filepath = 'weights.hdf5'
checkpoint = ModelCheckpoint(filepath, monitor = 'loss', verbose = 1, save_best_only = True, mode = 'min')
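
The effect of save_best_only with mode 'min' can be sketched without Keras: a checkpoint is written only when the monitored loss drops below the best value so far. The function name and the loss values below are made up for illustration and are not results from this notebook.

```python
def epochs_that_save(losses):
    # Return the 1-based epochs at which a "save best only" checkpoint would be written.
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            saved.append(epoch)
    return saved


hypothetical_losses = [2.10, 1.85, 1.90, 1.70, 1.72]
print(epochs_that_save(hypothetical_losses))
# prints: [1, 2, 4]
```

Because every save goes to the same file path, the file always holds the weights from the best epoch so far.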

The other callback, ReduceLROnPlateau, will reduce the learning rate each time the loss stops improving.

In [12]:
from keras.callbacks import ReduceLROnPlateau
reduce_lr = ReduceLROnPlateau(monitor = 'loss', factor = 0.2, patience = 1, min_lr = 0.001)

callbacks = [print_callback, checkpoint, reduce_lr]
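
To see what these settings do to the learning rate, the sketch below replays a simplified version of the plateau rule on made-up loss values. It is an approximation written for this example, not the Keras implementation: it ignores the cooldown option and treats a change smaller than `min_delta` as no improvement.

```python
def simulate_reduce_lr(losses, lr=0.01, factor=0.2, patience=1,
                       min_lr=0.001, min_delta=1e-4):
    # Return the learning rate in effect after each epoch (simplified plateau rule).
    best = float("inf")
    wait = 0
    history = []
    for loss in losses:
        if loss < best - min_delta:
            # The loss improved: remember it and reset the patience counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience and lr > min_lr:
                # Shrink the rate, but never go below the floor.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


hypothetical_losses = [2.10, 1.85, 1.90, 1.70, 1.72, 1.75]
print(simulate_reduce_lr(hypothetical_losses))
```

Starting from 0.01 with a factor of 0.2, the first plateau lowers the rate to about 0.002. The second would give 0.0004, so the rate is clamped to the min_lr of 0.001 and stays there.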

Training a model and generating new text 

In [13]:
model.fit(x,y, batch_size = 128, epochs = 5, callbacks = callbacks)

Epoch 1/5

----Generating text after Epoch:  0
----diversity: 0.2
-----Generating with seed: " alone, but there is that master jacob, who watch"
 alone, but there is that master jacob, who watchell the prison of his hand of the prison of the came of the prison of his are the same the paster of the paster of the parst of the prison and the parited the  and  and
the prison of the prison of the prison of the prison of the prison of the prison of the same the prison of the came the parter of the carder of the could not the same the was here and the parst of his hand the  and  and  the  man  
----diversity: 0.5
-----Generating with seed: " alone, but there is that master jacob, who watch"
 alone, but there is that master jacob, who watched  the  decate
mortean, whe than the hands of and and my the pared for young the in an  a  on  her
withound herest the  could  and  and  the
ball the land him, could not well where here were in a linds on not the lang of the prason of the camusare the parle

<keras.callbacks.callbacks.History at 0x7fdca8385a50>

To generate text ourselves we will create a function similar to the on_epoch_end function. 

In [23]:
def generate_text(length, diversity):
    #get random starting text
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated = ''
    sentence = text[start_index: start_index + maxlen]
    generated += sentence
    for i in range(length):
        #one-hot encode the current window of characters
        x_pred = np.zeros((1, maxlen, len(chars)))
        for t, char in enumerate(sentence):
            x_pred[0, t, char_indices[char]] = 1

        #predict the next character and sample it at the given diversity
        preds = model.predict(x_pred, verbose = 0)[0]
        next_index = sample(preds, diversity)
        next_char = indices_char[next_index]

        #append the character and slide the window forward by one
        generated += next_char
        sentence = sentence[1:] + next_char

    return generated

Now we can create text by simply calling the generate_text function.

In [24]:
print(generate_text(500, 0.2))

ine?" asked rosa, trembling.
     "yes,-that of my friend, and the contray carriage of the statt of the prince of the tulip was to the could not the statt of the prince of the statt of his brow him to the state of the states of his fate of the prince, and the cornelius was to the states of the statt of the two brother but the two one of the fell to the contriver, when the ward the state of the cornelius was to the statt of the carriage of the shall of his hands of the society of the statt of the  angary  to  the  new  for  the  bulbs  of  the 
