# Character Level Text Generation

For this exercise we will generate text in the writing style of the author whose work the LSTM model was trained on. The model is trained at the character level, which means that the sentence:
>**"The quick brown fox jumps over the lazy dog."**

is split into 44 individual characters, each of which is a separate input:

```python
sentence = list("The quick brown fox jumps over the lazy dog.")
print(sentence)
# ['T', 'h', 'e', ' ', 'q', 'u', 'i', 'c', 'k', ' ', 'b', 'r', 'o', 'w', 'n', ' ', 'f', 'o', 'x', ' ', 'j', 'u', 'm', 'p', 's', ' ', 'o', 'v', 'e', 'r', ' ', 't', 'h', 'e', ' ', 'l', 'a', 'z', 'y', ' ', 'd', 'o', 'g', '.']
print(len(sentence))
# 44
```

The model is fed a fixed number of characters and tries to predict the next character, as shown below:

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<i>The quick brown fox jumps over the lazy do</i> **[?]**

The model should predict the letter **"g"**


The model is not case sensitive (all text is lowercased), but it does not ignore punctuation. It does not read words, only individual characters. To generate text it also needs an initial input phrase of 40 characters; from there, its own output is fed back in as input.
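The feedback loop described above can be sketched without a trained model. This is only an illustration: `generate` and `echo_first` are hypothetical names, and `echo_first` is a stand-in predictor that returns the first character of the current window instead of a real LSTM prediction.

```python
def generate(seed, predict_next, length, maxlen=40):
    """Extend `seed` by `length` characters, one prediction at a time."""
    if len(seed) != maxlen:
        raise ValueError("seed must be exactly %d characters long" % maxlen)
    window = seed
    generated = seed
    for _ in range(length):
        next_char = predict_next(window)
        generated += next_char
        # slide the window: drop the oldest character, append the new one
        window = window[1:] + next_char
    return generated


# Hypothetical stand-in for the trained model: it just returns
# the first character of the window it is given.
def echo_first(window):
    return window[0]


seed = "the quick brown fox jumps over the lazy "
print(len(seed))
print(generate(seed, echo_first, 5))
```

The window always stays 40 characters long, so the model sees the same input shape at every step.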


### Imports

In [1]:
from __future__ import print_function
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import LSTM,Input
from keras.optimizers import RMSprop
from keras.utils.data_utils import get_file
from keras.models import Model,load_model
import numpy as np
import random
import sys
import io

Using TensorFlow backend.


### Corpuses

So we're going to try different corpuses and generate text from each of them. Sources: kaggle.com, gutenberg.org

In [2]:
corpus = ['shakespeare.txt',
          'nietzsche.txt', 
          'emilydickensonpoems.txt',
          'eminem.txt']

### Get the length of each corpus

In [3]:
def total_char(corpus):
    text = []
    for c in corpus:
        # the context manager closes each file once it has been read
        with open(str(c), 'r', encoding='ISO-8859-1') as text_in_file:
            text.append(text_in_file.read().lower())
        print('Total Characters in '+ str(c) + ': ' + str(len(text[-1])) )
    return text

In [4]:
text_data = total_char(corpus)

Total Characters in shakespeare.txt: 2588732
Total Characters in nietzsche.txt: 600901
Total Characters in emilydickensonpoems.txt: 170396
Total Characters in eminem.txt: 739123


### Count the unique characters

In [5]:
def count_unique_char(corpus):
    for c in range(len(corpus)):
        chars = sorted(list(set(text_data[c])))
        print('Total unique characters in '+str(corpus[c])+':', len(chars))
    return None

In [6]:
count_unique_char(corpus)

Total unique characters in shakespeare.txt: 41
Total unique characters in nietzsche.txt: 59
Total unique characters in emilydickensonpoems.txt: 52
Total unique characters in eminem.txt: 72


Now that we know the details of all the corpuses, we will work on just one corpus first and then apply the functions we have written to the others. For our initial corpus we will use Shakespeare.

## Shakespeare

In [7]:
shakespeare = text_data[0]

### Create a character dictionary

In [8]:
def ind_char(corpus):
    chars = sorted(list(set(corpus)))
    char_indices = dict((c, i) for i, c in enumerate(chars))
    indices_char = dict((i, c) for i, c in enumerate(chars))
        
    return chars,char_indices,indices_char

In [9]:
chars, char_indices,indices_char = ind_char(shakespeare)

### Character to Indices 

In [10]:
char_indices

{'\n': 0,
 ' ': 1,
 '!': 2,
 '$': 3,
 '&': 4,
 "'": 5,
 ',': 6,
 '-': 7,
 '.': 8,
 '3': 9,
 ':': 10,
 ';': 11,
 '?': 12,
 '[': 13,
 ']': 14,
 'a': 15,
 'b': 16,
 'c': 17,
 'd': 18,
 'e': 19,
 'f': 20,
 'g': 21,
 'h': 22,
 'i': 23,
 'j': 24,
 'k': 25,
 'l': 26,
 'm': 27,
 'n': 28,
 'o': 29,
 'p': 30,
 'q': 31,
 'r': 32,
 's': 33,
 't': 34,
 'u': 35,
 'v': 36,
 'w': 37,
 'x': 38,
 'y': 39,
 'z': 40}

### Indices to character

In [11]:
indices_char

{0: '\n',
 1: ' ',
 2: '!',
 3: '$',
 4: '&',
 5: "'",
 6: ',',
 7: '-',
 8: '.',
 9: '3',
 10: ':',
 11: ';',
 12: '?',
 13: '[',
 14: ']',
 15: 'a',
 16: 'b',
 17: 'c',
 18: 'd',
 19: 'e',
 20: 'f',
 21: 'g',
 22: 'h',
 23: 'i',
 24: 'j',
 25: 'k',
 26: 'l',
 27: 'm',
 28: 'n',
 29: 'o',
 30: 'p',
 31: 'q',
 32: 'r',
 33: 's',
 34: 't',
 35: 'u',
 36: 'v',
 37: 'w',
 38: 'x',
 39: 'y',
 40: 'z'}
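As a quick check of how the two dictionaries work together, here is a small sketch on a toy string (the string `"hello world"` is just an example, not part of the corpus). It builds the mappings the same way as `ind_char`, then encodes the text to indices and decodes it back.

```python
text = "hello world"

# same construction as ind_char, on a toy string
chars = sorted(set(text))
char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}

# encode every character to its index, then decode back
encoded = [char_indices[c] for c in text]
decoded = "".join(indices_char[i] for i in encoded)

print(chars)
print(encoded)
print(decoded == text)
```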

### Cut the text in semi-redundant sequences of maxlen characters 

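These will be used to feed the model. Each sequence of `maxlen` characters is one input sample and the character that follows it is the target. Because `step` is smaller than `maxlen`, consecutive sequences overlap.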

In [12]:
maxlen = 40
step = 3
def subset_sentences(text,maxlen,step,verbose=False):
    #declare lists
    sentences = []
    next_chars = []
    
    #append subseted strings in to their respective lists
    for i in range(0, len(text) - maxlen, step):
        sentences.append(text[i: i + maxlen])
        next_chars.append(text[i + maxlen])
    if verbose:
        print('sentence sequences:', len(sentences))
    
    return sentences,next_chars

In [13]:
sentences,next_chars = subset_sentences(shakespeare,maxlen,step,verbose=True)

sentence sequences: 862898
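To see what the windowing does, here is a sketch on a toy string with a small `maxlen` so the overlap is easy to read. `windows` is a hypothetical helper that repeats the loop from `subset_sentences`; the last two lines apply the same range arithmetic to the Shakespeare corpus length reported earlier.

```python
def windows(text, maxlen, step):
    # same loop as subset_sentences, without the verbose flag
    sentences, next_chars = [], []
    for i in range(0, len(text) - maxlen, step):
        sentences.append(text[i:i + maxlen])
        next_chars.append(text[i + maxlen])
    return sentences, next_chars


toy = "abcdefghij"
sentences_toy, next_chars_toy = windows(toy, maxlen=4, step=3)
for s, n in zip(sentences_toy, next_chars_toy):
    print(repr(s), "->", repr(n))

# number of start positions for 2588732 characters, maxlen=40, step=3
n_windows = len(range(0, 2588732 - 40, 3))
print(n_windows)
```

The count from the range arithmetic agrees with the `sentence sequences` value printed above.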


### Lines are now cut into a defined number of characters (Length of 40)

In [14]:
sentences[5:7]

['before we proceed any further, hear me s',
 'ore we proceed any further, hear me spea']

### The next character after each line is now stored in a separate list

In [15]:
next_chars[5:7]

['p', 'k']

## Sample:

### First Line:
    
    
    
**"before we proceed any further, hear me s"&nbsp;&nbsp;&nbsp;+&nbsp;&nbsp;&nbsp;"p"** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<i>(full word is speak)</i>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sentence&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;next_chars


### Second Line



**"ore we proceed any further, hear me spea"&nbsp;&nbsp;&nbsp;+&nbsp;&nbsp;&nbsp;"k"** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<i>(full word is speak)</i>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sentence&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;next_chars

### One hot encode x and y

In [16]:
def oh_xy(sentences,maxlen,chars,char_indices,next_chars):
    #declare x,y as all-False arrays (np.bool was removed in newer NumPy, the builtin bool works everywhere)
    x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
    y = np.zeros((len(sentences), len(chars)), dtype=bool)

    #assign 1 in their appropriate spots in the matrix
    for i, sentence in enumerate(sentences):
        for t, char in enumerate(sentence):
            x[i, t, char_indices[char]] = 1
        y[i, char_indices[next_chars[i]]] = 1
        
    return x,y

In [17]:
x,y = oh_xy(sentences,maxlen,chars,char_indices,next_chars)
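The following sketch shows the same one-hot layout as `oh_xy` on a three-character vocabulary, using plain nested lists instead of NumPy so every entry can be inspected. `one_hot` and the toy vocabulary are hypothetical. The last lines estimate the size of the real `x` array, assuming one byte per element as in a NumPy boolean array.

```python
def one_hot(sentences, next_chars, char_indices):
    n_chars = len(char_indices)
    # x[i][t] is the one-hot vector of character t in sentence i
    x = [[[0] * n_chars for _ in sentence] for sentence in sentences]
    # y[i] is the one-hot vector of the character that follows sentence i
    y = [[0] * n_chars for _ in sentences]
    for i, sentence in enumerate(sentences):
        for t, char in enumerate(sentence):
            x[i][t][char_indices[char]] = 1
        y[i][char_indices[next_chars[i]]] = 1
    return x, y


toy_indices = {"a": 0, "b": 1, "c": 2}
x_toy, y_toy = one_hot(["ab", "bc"], ["c", "a"], toy_indices)
print(x_toy)
print(y_toy)

# sequences * maxlen * unique characters, one byte each (assumption)
n_bytes = 862898 * 40 * 41
print(round(n_bytes / 1e9, 2), "GB")
```

At roughly 1.4 GB for `x` alone, the one-hot array for Shakespeare is large, which is worth keeping in mind before encoding all four corpuses at once.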

In [179]:
sum(y)

array([ 5786, 42712,   268,   213,     4,    11,  3478,   355,   390,
         198,     4,  2662,   348,   450,     5,    37,    40,    47,
          27,    20,    26,     6,     8,    21,    24,    40,    12,
           3,   343,   151,   145,     8, 14628,  3399,  5113,  6847,
       19465,  3511,  4330, 10216, 14941,   543,  3392,  7967,  6402,
       12396, 15327,  3413,   110,  8489, 11435, 17238,  7192,  1490,
        4524,   172,  5704,   234,    16,    10,     0,     1,     0,
           1,     2,     1,     1,     0,     0,     0,     6,     3])

### Building an LSTM Model

In [18]:
def build_model(nodes,maxlen,chars):
    model = Sequential()
    model.add(LSTM(nodes, input_shape=(maxlen, len(chars))))
    model.add(Dense(len(chars)))
    model.add(Activation('softmax'))
    
    optimizer = RMSprop(lr=0.01)
    model.compile(loss='categorical_crossentropy', optimizer=optimizer)
    
    return model
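To get a feel for the size of this network, the trainable parameters can be counted by hand. This is a sketch that assumes the standard LSTM layout (four gates, each with input weights, recurrent weights and a bias) and a Dense layer with bias; `lstm_params` and `dense_params` are hypothetical helper names, and `model.summary()` is the place to confirm the numbers.

```python
def lstm_params(units, input_dim):
    # four gates, each with input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    # one weight per input/output pair plus one bias per output
    return input_dim * units + units


nodes = 256      # LSTM units used in this notebook
n_chars = 41     # unique characters in shakespeare.txt

lstm_count = lstm_params(nodes, n_chars)
dense_count = dense_params(n_chars, nodes)
print(lstm_count, dense_count, lstm_count + dense_count)
```

The softmax `Activation` layer adds no parameters, so under these assumptions the single-layer model has a little over 300,000 weights.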

### Sampling function

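Instead of always picking the most likely character, we sample from the predicted distribution. The **temperature** rescales that distribution first: values below 1 increase the probability of the most probable characters and decrease the probabilities of the less probable ones, while a value of 1 leaves the distribution unchanged.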

In [19]:
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
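The effect of the temperature is easier to see on a small example. The sketch below applies the same log, divide, exponentiate, normalize steps as `sample` to a made-up three-element distribution, using only the standard library. `apply_temperature` is a hypothetical helper and the probabilities are illustrative, not model output.

```python
import math


def apply_temperature(probs, temperature):
    # same rescaling as in sample(), without the random draw
    scaled = [math.log(p) / temperature for p in probs]
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


probs = [0.6, 0.3, 0.1]
for t in (0.2, 0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in apply_temperature(probs, t)])
```

Low temperatures push almost all of the probability onto the top character, which is why the lowest diversity setting tends to repeat itself, while temperatures above 1 flatten the distribution.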

### Printing the Generated Text

We want the model to print generated text on demand so we can see how well it is doing and whether it needs further training.

In [22]:
def print_output(model,length,text,maxlen,chars,char_indices,indices_char,generated = ''):
    generated = generated.lower()
    start_index = random.randint(0, len(text) - maxlen - 1)
    
    if len(generated) != maxlen and generated != '':
        print('Input must be ' + str(maxlen) + ' characters long.')
        print('Input is ' + str(len(generated)) + ' characters long')
    else:
        if generated == '':
            sentence = text[start_index: start_index + maxlen]
            generated += sentence
        else:
            sentence = generated
                
        for diversity in [0.01,0.2, 0.5,1]:
            generated1 = generated
            sentence1 = sentence
            print()
            print()
            print('/---------------------------------- DIVERSITY: %f ----------------------------------/'% diversity)

            print('----- Generating with seed: "' + sentence1 + '"')
            print()
            print()
            sys.stdout.write(generated1)
            
            for i in range(length):
                x_pred = np.zeros((1, maxlen, len(chars)))
                for t, char in enumerate(sentence1):
                    x_pred[0, t, char_indices[char]] = 1.

                preds = model.predict(x_pred, verbose=0)[0]
                next_index = sample(preds, diversity)
                next_char = indices_char[next_index]

                generated1 += next_char
                sentence1 = sentence1[1:] + next_char

                sys.stdout.write(next_char)
                sys.stdout.flush()
            print()
        
    return None
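`print_output` rejects a custom seed unless it is exactly `maxlen` characters long, and it will raise a `KeyError` if the seed contains a character the model has never seen. One possible way to prepare a seed is sketched below. `prepare_seed` is a hypothetical helper, not part of the notebook, and it assumes that the padding character (a space) is in the vocabulary.

```python
def prepare_seed(phrase, maxlen, char_indices, pad=' '):
    phrase = phrase.lower()
    # refuse characters that have no index in the vocabulary
    unknown = sorted(set(phrase) - set(char_indices))
    if unknown:
        raise ValueError('characters not in vocabulary: %r' % unknown)
    # keep the last maxlen characters and left-pad shorter phrases
    return phrase[-maxlen:].rjust(maxlen, pad)


# toy vocabulary for illustration only
toy_vocab = {c: i for i, c in enumerate(" ,.abcdefghijklmnopqrstuvwxyz")}
seed_text = prepare_seed("To be, or not to be", 40, toy_vocab)
print(repr(seed_text))
```

The result could then be passed as the `generated` argument of `print_output`.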

In [23]:
model = build_model(256,maxlen,chars)
model.fit(x, y,
          batch_size=512,
          epochs=1)

Epoch 1/1


<keras.callbacks.History at 0x24da6c6d320>

### Text Generation: One Epoch

In [24]:
print_output(model,400,shakespeare,maxlen,chars,char_indices,indices_char)



/---------------------------------- DIVERSITY: 0.010000 ----------------------------------/
----- Generating with seed: "chus:
my lord, i hear.

pericles:
most h"


chus:
my lord, i hear.

pericles:
most him and the beging of my bear the bear of my bear the bear of my bear the bear of my bear the bear of my bear the bear of my bear the bear of my bear the bear of my bear the bear of my bear the bear of my bear the bear of my bear the bear of my bear the bear of my bear the bear of my bear the bear of my bear the bear of my bear the bear of my bear the bear of my bear the bear of my bear the bear of


/---------------------------------- DIVERSITY: 0.200000 ----------------------------------/
----- Generating with seed: "chus:
my lord, i hear.

pericles:
most h"


chus:
my lord, i hear.

pericles:
most him and the beging of my dear the bear the beging
that i did the be the was and be the begit
that a father of the back and master that i have be the baster
and fair faith that i was a fa

In [25]:
model.fit(x, y,
          batch_size=512,
          epochs=3)


Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x24da6c6d240>

### Text Generation: Three Epochs

In [26]:
print_output(model,400,shakespeare,maxlen,chars,char_indices,indices_char)



/---------------------------------- DIVERSITY: 0.010000 ----------------------------------/
----- Generating with seed: "
there wanteth but a mean to fill your s"



there wanteth but a mean to fill your son
which were the world that i will be the strength,
and the world that i will be so shall be the
with the world i will be so so shall be the
with the world of the world that i will see the streets,
that will she will not the strength of the
with the world of the world i will be the
with the world of the world i will be the
with the world of the world i will be so soul.

cornwall:
what shall i wil


/---------------------------------- DIVERSITY: 0.200000 ----------------------------------/
----- Generating with seed: "
there wanteth but a mean to fill your s"



there wanteth but a mean to fill your son
will strike the strength of the will of the threel.

lucentio:
what shall i was the world that i will be so life
the man in the shall be the such a threel.

benedick:
what shall i lies the ways that well be so soul.

cornwall:
i will not be not the world of thy loved,
that i am the strength of the hands of the good
that i will be the arm of the hands and soul.

king richard ii:
hark the sun and


/---------------------------------- DIVERSITY: 0.500000 ----------------------------------/
----- Generating with seed: "
there wanteth but a mean to fill your s"



there wanteth but a mean to fill your sorrow.

capulet:
har is the prince, that so so fall the life,
her commanded shall sleep unto the foul cause.

a charge; what i should not do you were between him
by thy souls cassiu

Interestingly, even after just a few epochs our model can already form words with mostly correct spelling. With further training its output should generally keep improving.

# Generating Text From Different Sources

I trained the models for varying numbers of epochs. Good results require a lot of training, which I will not do in full here since it takes too much time.

In [80]:
corpus

['shakespeare.txt', 'nietzsche.txt', 'emilydickensonpoems.txt', 'eminem.txt']

# Putting everything into a single function

In [81]:
def gen_model(data,nodes,maxlen = 40,step = 3):
    chars, char_indices,indices_char = ind_char(data)
    print(len(chars))
    sentences,next_chars = subset_sentences(data,maxlen,step)
    x,y = oh_xy(sentences,maxlen,chars,char_indices,next_chars)

    model = build_model(nodes,maxlen,chars)
    return model,x,y,maxlen,chars,char_indices,indices_char

### Shakespeare

In [82]:
def build_model(nodes,maxlen,chars):
    model = Sequential()
    model.add(LSTM(nodes, input_shape=(maxlen, len(chars))))
    model.add(Dense(len(chars)))
    model.add(Activation('softmax'))
    
    optimizer = RMSprop(lr=0.0001)
    model.compile(loss='categorical_crossentropy', optimizer=optimizer)
    
    return model

In [83]:
model_shakespeare,x,y,maxlen_shakespeare,chars_shakespeare,char_indices_shakespeare,indices_char_shakespeare = gen_model(text_data[0],nodes = 256)

41


In [None]:
model_shakespeare.fit(x, y,
                      batch_size=256,
                      epochs=1000)

In [None]:
model_shakespeare.save('model_shakespeare.h5')

In [87]:
model_shakespeare = load_model('model_shakespeare.h5')

In [119]:
print_output(model_shakespeare,500,text_data[0],maxlen_shakespeare,chars_shakespeare,char_indices_shakespeare,indices_char_shakespeare)



/---------------------------------- DIVERSITY: 0.010000 ----------------------------------/
----- Generating with seed: " manners shall lie all in one or two men"


 manners shall lie all in one or two menisers
than him the condetion of my spilit,
which marry he have hath here as hole and dear.

king henry vii:
i would i have the field to rich fronchind: and
the truth of his face, from sayishin yea,
to wis or here the raget will made of sir,
for theiried earth all of the seased with him.

duke vincentic:
:
what man's? have you? sir?

servant:
a black and the prayers; they can be saint,
and save you, you are an else.

caesar:
where is this gone?
the noble duke my name, or earth he loves me.

lady 


/---------------------------------- DIVERSITY: 0.200000 ----------------------------------/
----- Generating with seed: " manners shall lie all in one or two men"


 manners shall lie all in one or two menisers
than him the commended the passing off
all fortunes, we are his nobled, and the

### Nietzsche

In [100]:
def build_model(nodes,maxlen,chars):
    model = Sequential()
    model.add(LSTM(nodes, input_shape=(maxlen, len(chars)),return_sequences=True))
    model.add(LSTM(nodes))
    model.add(Dense(len(chars)))
    model.add(Activation('softmax'))
    
    optimizer = RMSprop(lr=0.001)
    model.compile(loss='categorical_crossentropy', optimizer=optimizer)
    
    return model

In [101]:
model_nietzsche,x,y,maxlen_nietzsche,chars_nietzsche,char_indices_nietzsche,indices_char_nietzsche = gen_model(text_data[1],nodes = 256)

59


In [None]:
model_nietzsche.fit(x, y,
                    batch_size=256,
                    epochs=1000)

In [None]:
model_nietzsche.save('model_nietzsche.h5')

In [102]:
model_nietzsche = load_model('model_nietzsche.h5')

In [122]:
print_output(model_nietzsche,500,text_data[1],maxlen_nietzsche,chars_nietzsche,char_indices_nietzsche,indices_char_nietzsche)



/---------------------------------- DIVERSITY: 0.010000 ----------------------------------/
----- Generating with seed: "however,
where it is believed that the l"


however,
where it is believed that the lengua and rank and finally
believes that at last the lose of the rible, the french revess of man morality the
trouble and arrogon of man, that which inner calling all misunderstanding and upon the individual about the rudence for tremplate in ourselves
have good repules them, inasure, and after called themselves, in
such a herdinic of every kind of the hontire of the reverleged
his own doing insterence and certain of this guiltic certainty of pleasure of
an idea of the spirit, as elsome, a state


/---------------------------------- DIVERSITY: 0.200000 ----------------------------------/
----- Generating with seed: "however,
where it is believed that the l"


however,
where it is believed that the lengua and rank and finally
believes that at last menrich, in the same least and like every things, there must alway discoverity of corncencess,
 and in the same laughter superior delight into of
the polman sexual more soul of the end, in order to kind of men of the
morality of the ancient tistic rev

### Emily Dickinson

In [105]:
def build_model(nodes,maxlen,chars):
    model = Sequential()
    model.add(LSTM(nodes, input_shape=(maxlen, len(chars)),return_sequences=True))
    model.add(LSTM(nodes))
    model.add(Dense(len(chars)))
    model.add(Activation('softmax'))
    
    optimizer = RMSprop(lr=0.001)
    model.compile(loss='categorical_crossentropy', optimizer=optimizer)
    
    return model

In [106]:
model_emilydickenson,x,y,maxlen_emilydickenson,chars_emilydickenson,char_indices_emilydickenson,indices_char_emilydickenson = gen_model(text_data[2],nodes = 256)

52


In [None]:
model_emilydickenson.fit(x, y,
                         batch_size=256,
                         epochs=1000)

In [None]:
model_emilydickenson.save('model_emilydickenson.h5')

In [107]:
model_emilydickenson = load_model('model_emilydickenson.h5')

In [108]:
print_output(model_emilydickenson,500,text_data[2],maxlen_emilydickenson,chars_emilydickenson,char_indices_emilydickenson,indices_char_emilydickenson)



/---------------------------------- DIVERSITY: 0.010000 ----------------------------------/
----- Generating with seed: "l rise
when i shall be forgiven,
till ha"


l rise
when i shall be forgiven,
till hair a little look to dee,
  prayer the sem yought be
the smiler away.





xxxiii.

lont the blook, --
  ond the could not be one feel;
i hought little keach wour the ,
ay hear we preays.

seakes your will like morning crown

to child by jubt to greating in
the speein was, miel-we play
for not serasing of neirs.





liii.

a blaker dauter war low meen
contion affience past
condensed of the fanter stood,
and midew and whoo it was brand,
and then the eart her one.





xv.

the harting the morn ha


/---------------------------------- DIVERSITY: 0.200000 ----------------------------------/
----- Generating with seed: "l rise
when i shall be forgiven,
till ha"


l rise
when i shall be forgiven,
till hair a little look to dee,
  prayes un an adloon for his,
hands must the sun he sun,
some created, and waith beggagl.
it known had would love go do ame,
but horied in she on the himm
  i hilf you.

i 't you where '  my not her fringer
                                                                          

### Eminem

In [109]:
def build_model(nodes,maxlen,chars):
    model = Sequential()
    model.add(LSTM(nodes, input_shape=(maxlen, len(chars)),return_sequences=True))
    model.add(LSTM(nodes))
    model.add(Dense(len(chars)))
    model.add(Activation('softmax'))
    
    optimizer = RMSprop(lr=0.001)
    model.compile(loss='categorical_crossentropy', optimizer=optimizer)
    
    return model

In [110]:
model_eminem,x,y,maxlen_eminem,chars_eminem,char_indices_eminem,indices_char_eminem = gen_model(text_data[3],nodes = 256)

72


In [None]:
model_eminem.fit(x, y,
                 batch_size=256,
                 epochs=1000)

In [None]:
model_eminem.save('model_eminem.h5')

In [111]:
model_eminem = load_model('model_eminem.h5')

In [123]:
print_output(model_eminem,500,text_data[3],maxlen_eminem,chars_eminem,char_indices_eminem,indices_char_eminem)



/---------------------------------- DIVERSITY: 0.010000 ----------------------------------/
----- Generating with seed: "p my window
and i can't see at all
and e"


p my window
and i can't see at all
and even if i could it'll all be gray
put your picture on my wall
it reminds me, that it's not so bad
it's not so bad
mee the rest pick's in the champ,
i's make ma get clops (get a walk on the floor
with me out of this shit on a wrong kits highs were to shit hout that but it on the mic man
i'ma make you ain't got man
i tried to stavil the truts
from a bitch rap is dragged him
on a nas suck, my dick's so fuck you minut
think you fuckin' dis
that i catt coming wates, i'd goin to race this mome out
they


/---------------------------------- DIVERSITY: 0.200000 ----------------------------------/
----- Generating with seed: "p my window
and i can't see at all
and e"


p my window
and i can't see at all
and even if i could it'll all be gray
put your picture on my wall
it reminds me, that it's not so bad
it's not so bad
mee the rest pitefth, shit what happen 'em
in a there and persencing me back out of the white for deals
(ed hell)
or everything, i mare dog, i'm stripped in a bang for the way while i stop 