Objective:
My objective for this machine learning project is to create a recurrent neural network (RNN) that can generate (somewhat) legible poetry from a collection of poems. To do so, I will clean the data and cut groups of words into sets of inputs for the network. The input size (the number of words per input) is yet to be determined: it must be large enough to capture important features such as rhyme schemes and context for the LSTM cell to remember, but not so large that training becomes too slow.
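
One way to picture the input windows is sketched below. It assumes a fixed-size window whose following word is the prediction target; `make_windows` and `window_size` are illustrative names, not part of this notebook, and the window size used here is arbitrary.

```python
def make_windows(tokens, window_size):
    """Pair each run of `window_size` tokens with the token that follows it."""
    inputs, targets = [], []
    for start in range(len(tokens) - window_size):
        inputs.append(tokens[start:start + window_size])
        targets.append(tokens[start + window_size])
    return inputs, targets


tokens = "the fog comes on little cat feet".split()
inputs, targets = make_windows(tokens, window_size=3)

for window, target in zip(inputs, targets):
    print(window, "->", target)
```

A larger `window_size` gives the network more context per example but produces longer sequences to train on, which is the trade-off described above.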

Instead of feeding in actual words, I will encode each word as an integer identifier; the set of unique identifiers forms the vocabulary. The target, our Y vector, will be a one-hot encoding over the vocabulary of the word that was actually present, which the LSTM cell will attempt to predict at each time step.
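
A minimal sketch of that encoding, using NumPy on a toy sentence. The names `word_to_id` and `one_hot` are assumptions made for illustration; the notebook itself relies on the Keras `Tokenizer` further down.

```python
import numpy as np

tokens = "the fog comes on little cat feet".split()

# Assign each unique word an integer identifier (alphabetical order here).
vocab = sorted(set(tokens))
word_to_id = {word: i for i, word in enumerate(vocab)}

# Encode the sentence as identifiers, then as one-hot rows over the vocabulary.
ids = np.array([word_to_id[word] for word in tokens])
one_hot = np.eye(len(vocab), dtype=np.float32)[ids]

print(ids)
print(one_hot.shape)
```

Each row of `one_hot` has a single 1 in the column of the word that actually occurred, which is the form a softmax output would be compared against.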

Link to data: https://www.kaggle.com/ishnoor/poetry-analysis-with-machine-learning

In [1]:
import pandas as pd
import numpy as np

poems = pd.read_csv("all.csv")

In [2]:
# Character count of each poem, computed in one vectorized step.
# This avoids the chained assignment poems['length'][i] = ..., which triggers
# pandas' SettingWithCopyWarning.
poems['length'] = poems['content'].str.len()


Clean data by deleting null entries etc.

In [3]:
poems = poems.sort_values(by='length')  # Sort by poem length
poems = poems[14:len(poems)-5]  # Trim the 14 shortest and 5 longest entries
poems = poems[poems['content'].str.contains('Published')==False]  # Drop non-poems containing 'Published'
print(len(poems))
poems = poems[poems['content'].str.contains('from Selected Poems')==False]  # Drop non-poems containing 'from Selected Poems'
print(len(poems))
poems = poems[poems['content'].str.contains('Collected Poems')==False]  # Drop non-poems containing 'Collected Poems'
print(len(poems))
# Drop entries whose content is just an introduction (author name or poem title)
for ind, row in poems.iterrows():
    if row['author'] in row['content'].upper() or str(row['poem name']) in row['content'][:40]:
        poems = poems.drop([ind])
print(len(poems))

552
536
518
465


In [10]:
# Keep poems whose character count is strictly between 100 and 1000
in_range = (poems['length'] > 100) & (poems['length'] < 1000)
poem = poems.loc[in_range, 'content'].reset_index(drop=True)
X = poem
num_poems = len(poem)

In [5]:
print(num_poems)

349


Create vocab size and word dictionary

In [12]:
import re

# Join all poems into one lowercase string
poem = ' '.join(poem).lower()
# Split into words and individual punctuation marks
poem = re.findall(r'[\w]+|[\'!"#$%&()*+,-./:;<=>?@[\]^_`{|}~]',poem)
words = list(set(poem))
vocab_size = len(words)
print(vocab_size)


5528


In [13]:
print(X.describe())

count                                                   349
unique                                                  311
top       When I was fair and young, then favor graced m...
freq                                                      3
Name: content, dtype: object
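
The summary above reports 349 poems but only 311 unique ones, so some poems appear more than once. If repeats are unwanted (that is a judgment call, since they weight those poems more heavily in training), a sketch like the following would remove them. The toy series and the name `deduplicated` are assumptions for illustration.

```python
import pandas as pd

poems_series = pd.Series([
    "When I was fair and young",
    "The fog comes on little cat feet.",
    "When I was fair and young",
])

# Keep the first occurrence of each poem and renumber the index from 0.
deduplicated = poems_series.drop_duplicates().reset_index(drop=True)
print(len(poems_series), "->", len(deduplicated))
```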


In [16]:
X[0]

'The fog comes on little cat feet.  It sits looking over harbor and city on silent haunches and then moves on.'

In [15]:
# Replace Windows line breaks with spaces in every poem at once
X = X.str.replace("\r\n", " ", regex=False)

In [17]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

In [18]:
tokenizer = Tokenizer(num_words=vocab_size)

In [19]:
tokenizer.fit_on_texts(X)

In [20]:
text = tokenizer.texts_to_sequences(X)
text = pad_sequences(text, maxlen=1000)
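
With its default settings, `pad_sequences` pads and truncates at the front of each sequence. The NumPy sketch below emulates that behaviour on toy data; `pad_pre` is a hypothetical helper written for illustration, not a Keras function.

```python
import numpy as np


def pad_pre(sequences, maxlen, value=0):
    """Left-pad (or left-truncate) every sequence to exactly `maxlen` entries."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        kept = seq[-maxlen:]  # when too long, keep the last `maxlen` tokens
        if kept:
            out[row, -len(kept):] = kept
    return out


padded = pad_pre([[5, 2, 9], [7, 1, 4, 8, 3, 6]], maxlen=4)
print(padded)
```

Because the poems kept here have fewer than 1000 characters, each has far fewer than 1000 tokens, so with `maxlen=1000` most of every row is leading zeros.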

In [21]:
word_dict = tokenizer.word_index

In [22]:
maxwords = len(word_dict)

In [23]:
print(maxwords,vocab_size)

5624 5528
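
The two counts differ because the two tokenizations differ. One likely contributor is the apostrophe: the regular expression used earlier splits it off as its own token, while the Keras `Tokenizer` default filter leaves it inside the word (entries such as `england's` appear in its word counts). The sketch below compares the two on a made-up line; `keras_like_tokens` only emulates the default filtering and is an assumption, not the Keras implementation.

```python
import re

PUNCT_SPLIT = r'[\w]+|[\'!"#$%&()*+,-./:;<=>?@[\]^_`{|}~]'
# Default Tokenizer filter characters; note that the apostrophe is absent.
KERAS_LIKE_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def regex_tokens(text):
    """Words and single punctuation marks, as in the vocabulary cell."""
    return re.findall(PUNCT_SPLIT, text.lower())


def keras_like_tokens(text):
    """Lowercase, turn filtered characters into spaces, split on whitespace."""
    table = str.maketrans({ch: ' ' for ch in KERAS_LIKE_FILTERS})
    return text.lower().translate(table).split()


line = "England's graces, he's known."
print(regex_tokens(line))
print(keras_like_tokens(line))
```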


In [24]:
count = 0
for key,value in tokenizer.word_counts.items():
    count += 1
    print(key,value)
    
print(count)

the 1527
fog 3
comes 18
on 181
little 49
cat 1
feet 11
it 211
sits 7
looking 6
over 21
harbor 1
and 1403
city 2
silent 6
haunches 1
then 124
moves 6
new 30
yeare 2
forth 15
out 65
of 751
janus 1
gate 6
doth 123
seeme 1
to 693
promise 6
hope 12
delight 26
bidding 3
thold 1
adieu 6
his 200
pass 9
no 164
crooked 1
leg 1
bleared 1
eye 33
part 9
deformed 1
kind 19
nor 79
yet 78
so 226
ugly 5
half 5
can 87
be 270
as 229
is 327
inward 5
suspicious 1
mind 39
suddenly 4
discovering 1
in 646
eyes 91
very 13
beautiful 3
normande 1
cocotte 1
learned 1
british 1
museum 1
assistant 1
wine 3
at 82
mouth 12
love 333
thats 5
all 258
we 77
shall 127
know 49
for 264
truth 19
before 29
grow 22
old 47
die 17
i 731
lift 4
glass 8
my 611
look 35
you 239
sigh 11
only 45
wanderer 3
knows 16
england's 3
graces 6
or 128
anew 3
see 74
clear 9
familiar 3
faces 6
who 79
loves 33
joy 22
he 95
that 549
dwells 5
shadows 11
do 130
not 263
forget 19
me 354
quite 5
o 84
severn 6
meadows 6
notes 8
her 328
truly 6
your 140

guilty 2
less 8
wrought 8
destiny 3
won 4
spirit 13
marriage 3
he's 2
plans 2
useless 2
indeed 5
we'll 5
cotswold 3
sheep 2
feed 9
quietly 3
heed 5
quick 6
driving 3
small 11
died 5
nobly 2
cover 7
violets 2
purple 4
side 5
thick 3
memoried 2
wet 6
somehow 2
boy 8
powr 1
fickle 1
sickle 1
hour 8
hast 15
waning 2
therein 3
showst 1
withering 1
self 25
growst 3
nature 12
sovereign 7
wrack 1
goest 1
onwards 1
keeps 4
purpose 3
skill 11
disgrace 9
wretched 2
minute 1
kill 8
minion 1
detain 1
audit 1
delayed 1
answered 4
quietus 1
render 2
cares 6
naked 6
tire 2
reaped 1
sheaves 1
sacred 5
alarms 2
swords 5
arms 8
velvet 1
glorious 7
outfacing 1
crowned 3
command 1
torches 1
nuptial 1
shades 5
underground 2
arrived 1
guest 4
beauteous 5
spirits 4
engirt 2
iope 2
helen 3
stories 2
finished 1
tongue 16
wilt 19
banqueting 2
delights 9
masques 3
revels 3
youth 18
tourneys 2
challenges 2
knights 5
triumphs 2
beautys 8
sake 4
honours 2
didst 5
murder 2
drawn 2
feeds 3
flocks 5
evergreens 1
fruit 

dawned 1
fasted 1
together 5
cooled 1
stream 4
stronger 2
slept 1
bided 1
longer 3
blossomed 1
plaits 2
pillow 2
threaded 2
filigree 2
uncanny 2
brow 9
sleeps 7
winsome 2
composed 2
bride 2
darling 4
thrushes 2
evenings 2
weret 1
aught 3
canopy 2
extern 1
bases 1
eternity 3
proves 1
ruining 1
dwellers 1
form 2
paying 1
rent 1
compound 1
forgoing 1
savour 2
pitiful 1
thrivers 1
gazing 1
obsequious 1
oblation 1
mixd 1
seconds 1
mutual 1
hence 3
subornd 1
informer 1
impeachd 1
stands 3
control 2
haste 3
meets 1
yesterday 1
dare 5
dim 2
terror 1
feebled 1
towards 2
weigh 5
foe 6
sustain 1
prevent 1
adamant 1
iron 1
fram'd 1
skilful 1
design'd 1
mother's 1
womb 2
deriv'd 2
due 5
descent 1
large 5
richesse 1
third 3
life's 1
ornament 3
raised 3
alive 1
praised 2
elizabeths 1
accents 2
wrapp'd 1
ravisher 1
perish 2
ceasing 1
ne'er 1
return 2
sympathy 1
pursue 2
scornful 2
breath'd 1
fell 2
defac'd 1
outworn 1
lofty 5
towers 4
ras'd 1
brass 5
rage 4
ocean 5
advantage 1
kingdom 1
firm 1
wat'ry 

tillage 1
husbandry 1
fond 1
tomb 3
posterity 2
despite 5
wrinkles 1
remembred 1
single 2
monuments 1
outlive 1
powerful 1
rhyme 3
contents 1
unswept 1
besmeared 1
sluttish 1
statues 1
overturn 1
broils 1
masonry 1
mars 1
sword 3
record 1
oblivious 1
enmity 2
judgement 2
yourself 3
bless 1
lucky 3
embased 1
graced 5
invent 1
enchased 1
deignd 1
relent 1
thrall 1
setting 1
uplifting 1
degree 1
lance 1
guided 1
obtain'd 1
prize 1
judgment 1
france 1
horsemen 1
horsemanship 1
advance 1
town 1
folks 3
daintier 1
applies 1
sleight 1
impute 1
excel 1
awry 1
rises 2
spreads 2
bath 3
cloth 2
underneath 2
sunbeams 2
glistening 2
mellow 5
glows 5
stoops 2
sponge 2
swung 2
gloire 2
dijon 2
drips 2
herself 3
glisten 2
crumple 2
listen 2
sluicing 2
dishevelled 2
petals 2
sunlight 2
concentrates 2
until 2
pied 2
dressed 2
trim 1
saturn 2
smell 4
different 1
odour 1
summers 7
lap 3
lilys 1
vermilion 2
pattern 2
seemd 1
despise 1
dote 1
tune 1
delighted 1
feeling 3
touches 1
prone 1
taste 6
invited 1


faine 1
pretely 1
sees 1
finely 1
tricks 1
fooles 1
hire 2
tirannies 1
jugling 1
theyr 9
slieghts 1
abuse 1
nimble 1
delightful 1
butt 6
childlike 1
refuse 3
breathes 1
conclusions 1
brags 1
paleness 2
inseparate 1
learns 1
prest 3
sweats 1
darlings 1
clothe 2
wherein 2
cherries 6
fairly 2
orient 2
fill'd 1
peer 2
brows 2
bended 2
bows 2
threat'ning 2
piercing 2
attempt 2
deeme 1
vilde 1
greife 1
joyings 1
fauning 1
smiling 1
appeers 1
griefe 3
grone 1
envies 1
ly 1
els 1
harmes 1
rely 1
frosts 1
surfett 1
burne 1
margaret 3
midsummer 3
tower 3
solace 1
gladness 1
madness 1
badness 1
joyously 1
maidenly 1
womanly 1
demeaning 1
indite 1
patient 1
isaphill 1
coriander 1
pomander 1
cassander 1
courteous 1
pack 1
banish 1
mount 1
larks 2
aloft 1
morrow 7
borrow 3
prune 1
nightingale 1
redbreast 1
furrow 2
thrush 1
stare 3
linnet 1
elves 1
yourselves 1
gladly 1
began 3
whan 1
assays 1
patience 2
denays 1
approved 1
darkest 2
becoming 2
fitt 2
lightsome 2
darknes 2
oprest 2
mirthe 2
controle
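
Printing every entry makes the distribution hard to read. A short summary can be built with `collections.Counter`, sketched here on five counts copied from the output above. Passing `tokenizer.word_counts` to `Counter` in the same way is an assumption about how this would be wired into the notebook.

```python
from collections import Counter

word_counts = Counter({"the": 1527, "and": 1403, "of": 751, "cat": 1, "harbor": 1})

# The most frequent words, and the words that occur exactly once.
top_three = word_counts.most_common(3)
singletons = sorted(word for word, n in word_counts.items() if n == 1)

print(top_three)
print(singletons)
```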

In [25]:
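# Tokenizer ids start at 1 and id 0 is used for padding, so row 0 stays zero
# and the vector for the word with id k is stored in row k.
embedding_matrix = np.zeros((maxwords + 1, 50))

In [26]:
# Each GloVe line is a word followed by its 50 vector components
with open('glove.6B.50d.txt', encoding='utf-8') as f:
    for line in f:
        l = line.split()
        if l[0] in word_dict:
            indx = word_dict[l[0]]
            embedding_matrix[indx] = np.asarray(l[1:], dtype='float64')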

In [27]:
embedding_matrix[-50:]

array([[ 0.61869 , -0.7303  , -0.52154 , ...,  0.27815 , -0.795   ,
        -1.3784  ],
       [-1.0045  , -0.18225 ,  0.4153  , ..., -0.545   , -0.33232 ,
         0.051445],
       [-0.63147 ,  0.10974 ,  1.4975  , ...,  1.3709  ,  1.1515  ,
         0.14985 ],
       ...,
       [ 0.      ,  0.      ,  0.      , ...,  0.      ,  0.      ,
         0.      ],
       [ 0.56934 , -1.8914  ,  0.31425 , ...,  0.047203,  0.10582 ,
        -0.46135 ],
       [ 0.050716, -0.73237 ,  0.77412 , ..., -0.22416 , -0.76625 ,
         0.14384 ]])
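
Rows that remain all zero belong to words for which no GloVe vector was found, since the matrix starts as zeros and is only filled on a match. A quick way to measure how many such rows there are is sketched below on a toy matrix; `toy_matrix`, `missing_rows` and `coverage` are illustrative names.

```python
import numpy as np

toy_matrix = np.array([
    [0.0, 0.0, 0.0],    # never filled
    [0.1, -0.2, 0.3],
    [0.0, 0.0, 0.0],    # never filled
    [0.5, 0.0, -0.1],
])

# A row has a vector if any of its components is non-zero.
has_vector = toy_matrix.any(axis=1)
missing_rows = np.flatnonzero(~has_vector)
coverage = has_vector.mean()

print(missing_rows, coverage)
```

Applied to `embedding_matrix`, the same three lines would report which word ids lack a pretrained vector.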

RNN Model:

In [37]:
from keras.models import Sequential, Model
from keras.layers import Embedding, LSTM, Dropout, TimeDistributed, Dense, Activation, Input
from keras.optimizers import Adam


num_steps = 1000  # sequence length; matches maxlen used for padding
hidden_size = 350
use_dropout = False

optimizer = Adam(0.0002, 0.5)  # learning rate 0.0002, beta_1 0.5

In [None]:
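def generator():
    model = Sequential()
    model.add(Embedding(maxwords, hidden_size, input_length=num_steps))
    model.add(LSTM(hidden_size, return_sequences=True))
    # model.add(LSTM(hidden_size, return_sequences=True))
    if use_dropout:
        model.add(Dropout(0.5))
    # Per-step softmax over the vocabulary: output shape is (batch, num_steps, maxwords)
    model.add(TimeDistributed(Dense(maxwords)))
    model.add(Activation('softmax'))
    # NOTE: assigning to `.weight` only sets a Python attribute; it does not load the
    # GloVe matrix, whose width (50) also differs from this layer's (hidden_size).
    model.layers[0].weight=embedding_matrix
    model.layers[0].trainable=False

    noise = Input(shape=(num_steps,))
    gen_poem = model(noise)

    return Model(noise, gen_poem)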

In [35]:

def discriminator():
    model = Sequential()
    # Expects integer token ids of shape (batch, num_steps)
    model.add(Embedding(maxwords, hidden_size, input_length=num_steps))
    model.add(LSTM(hidden_size, return_sequences=False))
    # model.add(LSTM(hidden_size, return_sequences=True))
    if use_dropout:
        model.add(Dropout(0.5))
    # Single sigmoid unit: probability that the input poem is real
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    # NOTE: as in the generator, assigning to `.weight` does not load the GloVe matrix.
    model.layers[0].weight=embedding_matrix
    model.layers[0].trainable=False

    i_poem = Input(shape=(num_steps,))
    validity = model(i_poem)

    return Model(i_poem, validity)

In [38]:
# Build and compile the discriminator
discriminator = discriminator()
discriminator.compile(loss='binary_crossentropy',
    optimizer=optimizer,
    metrics=['accuracy'])

# Build the generator
generator = generator()

# The generator takes noise as input and generates poem
z = Input(shape=(num_steps,))
g_poem = generator(z)

# For the combined model we will only train the generator
discriminator.trainable = False

# The discriminator takes generated poems as input and determines validity
validity = discriminator(g_poem)

# The combined model  (stacked generator and discriminator)
# Trains the generator to fool the discriminator
combined = Model(z, validity)
combined.compile(loss='binary_crossentropy', optimizer=optimizer)

ValueError: Shape must be rank 3 but is rank 2 for 'model_1/sequential_4/lstm_4/Tile' (op: 'Tile') with input shapes: [?,350,1], [2].
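
This error is consistent with a shape mismatch between the two networks. The generator ends in a per-step softmax, so it emits a tensor of shape (batch, num_steps, maxwords), while the discriminator begins with an `Embedding` layer that expects integer ids of shape (batch, num_steps). Embedding a 3-D input adds a fourth dimension, which the LSTM cannot consume. The NumPy sketch below reproduces only the shapes, with toy sizes chosen for illustration instead of 1000 and 5624.

```python
import numpy as np

batch, num_steps, vocab = 2, 5, 7  # toy sizes, not the notebook's values
rng = np.random.default_rng(0)

# What the generator emits: a probability distribution over words per step.
logits = rng.normal(size=(batch, num_steps, vocab))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# What the discriminator's Embedding layer expects: one integer id per step.
token_ids = probs.argmax(axis=-1)

print(probs.shape)      # rank 3
print(token_ids.shape)  # rank 2
```

Taking the argmax restores the expected shape, but it is not differentiable, so gradients from the discriminator could not reach the generator through it. Any fix along these lines would therefore need a different way of connecting the two models.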

Train Model