In [12]:
# from __future__ import print_function
import numpy as np
import random
import sys
from keras.models import Sequential
from keras.layers import Dense, LSTM,Activation,Dropout
from keras.optimizers import RMSprop

The network is trained for automated text generation on the Project Gutenberg eBook of the complete works of William Shakespeare. The raw text file used for training can be downloaded from http://www.gutenberg.org/:

In [13]:
#http://www.gutenberg.org/
#http://www.gutenberg.org/files/100/100-0.txt
#https://hub.packtpub.com/auto-generate-texts-shakespeare-writing-using-deep-recurrent-neural-networks/

The following code builds two dictionaries, one mapping each character to an index and one mapping each index back to its character. They are used at later stages to convert text into indices and back. This is needed because deep learning models operate on numbers rather than raw text, so every character must be mapped to an index before training:

In [14]:
path = "C:/Users/pmlef/Desktop/Python_work/Selenium/Shakespeare.txt"

In [15]:
with open(path, encoding="utf8") as f:
    text = f.read().lower()

In [16]:
characters = sorted(list(set(text)))
print('corpus length:', len(text))
print('total chars:', len(characters))

corpus length: 5667137
total chars: 72


In [17]:
char2indices = dict((c, i) for i, c in enumerate(characters))
indices2char = dict((i, c) for i, c in enumerate(characters))
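
To make the two mappings concrete, here is a minimal sketch on a toy string. The names `toy_text`, `toy_char2idx`, and `toy_idx2char` are hypothetical and only stand in for `text`, `char2indices`, and `indices2char`; the round trip at the end shows that encoding followed by decoding recovers the original text.

```python
# Toy corpus standing in for the Shakespeare text (assumption for illustration).
toy_text = "to be or not to be"

# Same construction as above: sorted unique characters, then two dictionaries.
toy_chars = sorted(set(toy_text))
toy_char2idx = {c: i for i, c in enumerate(toy_chars)}
toy_idx2char = {i: c for i, c in enumerate(toy_chars)}

# Encode the text as a list of indices, then decode it back.
encoded = [toy_char2idx[c] for c in toy_text]
decoded = "".join(toy_idx2char[i] for i in encoded)

print(toy_chars)            # the toy vocabulary
print(encoded[:5])          # indices for "to be"
print(decoded == toy_text)  # the round trip is lossless
```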

Before the model can be trained, the text has to be preprocessed. The major steps are:

    Preprocessing: Prepare the X and Y data from the full text file and convert them into an index-based vectorized format.
    Deep learning model training and validation: Train and validate the deep learning model.
    Text generation: Generate text with the trained model.

How it works…

The following code walks through the modeling process for generating text from Shakespeare’s writings. The sequence length is set to 40 characters: the model sees 40 characters and predicts the single next character. The extraction window advances by three characters at a time, so consecutive sequences overlap heavily (they are semi-redundant), while the number of training samples is about one third of what a step of one would give:

In [18]:
# cut the text in semi-redundant sequences of maxlen characters

maxlen = 40
step = 3
sentences = []
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
np_sequence = len(sentences)  # number of extracted sequences
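
The effect of the window length and the step is easier to see on a short string. The following is a small sketch with hypothetical toy values (`toy_text`, `toy_maxlen`, `toy_step`), not the settings used for training. With a window of 5 and a step of 3, consecutive windows share 5 - 3 = 2 characters.

```python
# Toy values chosen for illustration only (assumptions, not the training settings).
toy_text = "abcdefghijkl"
toy_maxlen = 5
toy_step = 3

windows = []   # input sequences
targets = []   # the character that follows each sequence

for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    windows.append(toy_text[i: i + toy_maxlen])
    targets.append(toy_text[i + toy_maxlen])

for window, target in zip(windows, targets):
    print(repr(window), '->', repr(target))
```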

The next code block converts the data into a vectorized (one-hot) format for feeding into the model, since the model cannot work with characters, words, or sentences directly. NumPy arrays of the full dimensions are first created filled with zeros, and the relevant positions are then set using the dictionary mappings:

In [19]:
# Converting indices into vectorized format

# np.bool was removed in NumPy 1.24; the built-in bool works in all versions
X = np.zeros((len(sentences), maxlen, len(characters)), dtype=bool)
y = np.zeros((len(sentences), len(characters)), dtype=bool)

for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char2indices[char]] = 1
    # One target per sentence, so it is set once outside the inner loop
    y[i, char2indices[next_chars[i]]] = 1
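
As a rough illustration of the resulting array layout, the sketch below applies the same one-hot encoding to a three-character vocabulary. All `toy_` names and the `X_toy`/`y_toy` arrays are hypothetical and exist only for this example. The input array has shape (number of sequences, sequence length, vocabulary size), and each position holds exactly one set entry.

```python
import numpy as np

# Hypothetical toy vocabulary and sequences, for illustration only.
toy_chars = ['a', 'b', 'c']
toy_char2idx = {c: i for i, c in enumerate(toy_chars)}
toy_sentences = ["ab", "bc"]
toy_next = ["c", "a"]
toy_maxlen = 2

X_toy = np.zeros((len(toy_sentences), toy_maxlen, len(toy_chars)), dtype=bool)
y_toy = np.zeros((len(toy_sentences), len(toy_chars)), dtype=bool)

for i, sentence in enumerate(toy_sentences):
    for t, char in enumerate(sentence):
        X_toy[i, t, toy_char2idx[char]] = True
    y_toy[i, toy_char2idx[toy_next[i]]] = True

print(X_toy.shape)
print(X_toy.astype(int))
print(y_toy.astype(int))
```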

The deep learning model is a recurrent neural network (RNN), specifically a Long Short-Term Memory (LSTM) layer with 128 hidden units, followed by a dense layer whose output dimension equals the number of distinct characters. A softmax activation turns that output into a probability distribution over characters, and the model is trained with categorical cross-entropy loss using the RMSprop optimizer.

In [20]:
#Model Building

model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(characters))))
model.add(Dense(len(characters)))
model.add(Activation('softmax'))
# Newer Keras versions name this argument learning_rate instead of lr
model.compile(loss='categorical_crossentropy', optimizer=RMSprop(lr=0.01))
print (model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_2 (LSTM)                (None, 128)               102912    
_________________________________________________________________
dense_2 (Dense)              (None, 72)                9288      
_________________________________________________________________
activation_2 (Activation)    (None, 72)                0         
Total params: 112,200
Trainable params: 112,200
Non-trainable params: 0
_________________________________________________________________
None
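
The parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas for an LSTM layer (four gates, each with input weights, recurrent weights, and a bias) and a dense layer (weights plus bias). The helper names `lstm_param_count` and `dense_param_count` are hypothetical and not part of Keras.

```python
def lstm_param_count(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(input_dim, units):
    # One weight per input-output pair plus one bias per output unit
    return input_dim * units + units

vocab_size = 72      # "total chars" reported above
hidden_units = 128   # LSTM size used in the model

lstm_params = lstm_param_count(vocab_size, hidden_units)
dense_params = dense_param_count(hidden_units, vocab_size)

print(lstm_params)                 # 102912
print(dense_params)                # 9288
print(lstm_params + dense_params)  # 112200
```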


As mentioned earlier, deep learning models train on numeric indices to map input to output: given a sequence of 40 characters, the model predicts the next character. The following function converts the predicted probabilities back into a character index. Rather than always taking the most probable index, it rescales the probabilities by the metric parameter (a temperature) and samples one index from the resulting distribution:

In [21]:
# Function to convert prediction into index

def pred_indices(preds, metric=1.0):
    preds = np.asarray(preds).astype('float64')
    # Small constant avoids log(0) for zero probabilities
    preds = np.log(preds + 1e-10) / metric
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # Draw one sample from the rescaled distribution
    probs = np.random.multinomial(1, preds, 1)
    return np.argmax(probs)
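
To see what the metric parameter does, the following sketch applies the same rescaling (log, divide, exponentiate, renormalize) to a made-up three-way distribution, without the sampling step. The `rescale` helper and `toy_probs` values are hypothetical, and the standard `math` module is used here in place of NumPy. Values below 1.0 sharpen the distribution toward the most likely character, 1.0 leaves it essentially unchanged, and values above 1.0 flatten it.

```python
import math

def rescale(probs, temperature):
    """Rescale probabilities by a temperature and renormalize to sum to 1."""
    scaled = [math.log(p + 1e-10) / temperature for p in probs]
    exps = [math.exp(v) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up distribution over three characters (assumption for illustration).
toy_probs = [0.6, 0.3, 0.1]

sharp = rescale(toy_probs, 0.2)      # low diversity: conservative choices
unchanged = rescale(toy_probs, 1.0)  # the original distribution
flat = rescale(toy_probs, 1.2)       # high diversity: more surprising choices

for name, dist in [("0.2", sharp), ("1.0", unchanged), ("1.2", flat)]:
    print(name, [round(p, 3) for p in dist])
```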

The model is trained for 30 iterations of one epoch each, with a batch size of 128. After each iteration, text is generated at several diversity values to see their impact on the predictions:

In [22]:
# Train and Evaluate the Model

for iteration in range(0, 30):
    print('-' * 40)
    print('Iteration', iteration)
    model.fit(X, y, batch_size=128, epochs=1)
    start_index = random.randint(0, len(text) - maxlen - 1)
    gen_diversity = []
    for diversity in [0.2, 0.7, 1.2]:
        print('n----- diversity:', diversity)
        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')

        pred_sentence = []
        pred_chars = []
        for i in range(400):
            # One-hot encode the current 40-character window
            x = np.zeros((1, maxlen, len(characters)))
            for t, char in enumerate(sentence):
                x[0, t, char2indices[char]] = 1.
            # Predict the next character, then slide the window forward by one
            preds = model.predict(x, verbose=0)[0]
            next_index = pred_indices(preds, diversity)
            pred_char = indices2char[next_index]
            generated += pred_char
            sentence = sentence[1:] + pred_char
            pred_sentence.append(sentence)
            pred_chars.append(pred_char)
        # Keep the seed plus the 400 generated characters for this diversity
        gen_diversity.append(generated)

----------------------------------------
Iteration 0
Epoch 1/1
n----- diversity: 0.2
----- Generating with seed: "martino and his wife and daughters;
coun"
n----- diversity: 0.7
----- Generating with seed: "martino and his wife and daughters;
coun"
n----- diversity: 1.2
----- Generating with seed: "martino and his wife and daughters;
coun"
----------------------------------------
Iteration 1
Epoch 1/1
n----- diversity: 0.2
----- Generating with seed: "ar's lips,
    finger of birth-strangled"
n----- diversity: 0.7
----- Generating with seed: "ar's lips,
    finger of birth-strangled"
n----- diversity: 1.2
----- Generating with seed: "ar's lips,
    finger of birth-strangled"
----------------------------------------
Iteration 2
Epoch 1/1
n----- diversity: 0.2
----- Generating with seed: "ty;
    then be at peace, except ye thir"
n----- diversity: 0.7
----- Generating with seed: "ty;
    then be at peace, except ye thir"
n----- diversity: 1.2
----- Generating with seed: "ty;
    then be 

n----- diversity: 0.2
----- Generating with seed: "oubt some noble creature in her,
    das"
n----- diversity: 0.7
----- Generating with seed: "oubt some noble creature in her,
    das"
n----- diversity: 1.2
----- Generating with seed: "oubt some noble creature in her,
    das"
----------------------------------------
Iteration 20
Epoch 1/1
n----- diversity: 0.2
----- Generating with seed: "ell met, master ford.
  ford. trust me, "
n----- diversity: 0.7
----- Generating with seed: "ell met, master ford.
  ford. trust me, "
n----- diversity: 1.2
----- Generating with seed: "ell met, master ford.
  ford. trust me, "
----------------------------------------
Iteration 21
Epoch 1/1
n----- diversity: 0.2
----- Generating with seed: "d, and uncle exeter,
    we will aboard "
n----- diversity: 0.7
----- Generating with seed: "d, and uncle exeter,
    we will aboard "
n----- diversity: 1.2
----- Generating with seed: "d, and uncle exeter,
    we will aboard "
-----------------------------------

martino and his wife and daughters; ar's lips, finger of birth-strangled" "ty;
    then be at peace, except ye thir" "t strike, thy conscience
    is so posse" 'edward the fourth, by the grace of go"  ", pucelle, hold thy peace;
    if talbot" " tide,
being prison’d in her eye, like p" "emn'd upon the act of fornication
    to"  "
achilles.
i shall forestall thee, lord "  "like a worm i’ th’ bud, ves, shade folly. who is he comes here?
"
feed on her dama"  "idst presume, "is day come
to blow that furnesse out th"  "ever was prince’s child.  happy what fol" en the blood was cool, have threaten'd " it: i cannot eat it.
  alcibiades. when"  "dromio of syracuse. no? why, 'tis a plai"  "
    my soul!                           " " found so, master page. master doctor  "oubt some noble creature in her,
    das"
  " "ell met, master ford.
  ford. trust me, "  "d, and uncle exeter,
    we will aboard "  "er mother’s flesh,
by the defiling of he" "ity, your old virginity, is like one of "  "m you
have recovered, desire it not. far" "of it into his face,
extinguishing his c"  "nd
    as his misdoubts present occasion"
     "haste you may.
                         "" be said as lovers they do feign.
  audr""irits hear me,
    and yet i needs must "