<h1>Poem Generator using RNN</h1>

Welcome to the notebook!<br>
We'll generate poems from three inputs supplied by the user: a seed text, the style of the poem (Shakespearean sonnets or Irish lyrics), and the number of words to predict.<br>
<br>
Let's get started by importing the necessary libraries.

In [None]:
import tensorflow
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
import tensorflow.keras.utils as ku 
import numpy as np

Let's download both datasets (Shakespearean sonnets and Irish lyrics).

In [None]:
!wget --no-check-certificate \
    https://storage.googleapis.com/laurencemoroney-blog.appspot.com/irish-lyrics-eof.txt \
    -O /tmp/irish-lyrics-eof.txt
data_irish = open('/tmp/irish-lyrics-eof.txt').read()

--2020-09-05 15:19:56--  https://storage.googleapis.com/laurencemoroney-blog.appspot.com/irish-lyrics-eof.txt
Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.204.128, 172.217.203.128, 74.125.31.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.204.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 68970 (67K) [text/plain]
Saving to: ‘/tmp/irish-lyrics-eof.txt’


2020-09-05 15:19:57 (70.2 MB/s) - ‘/tmp/irish-lyrics-eof.txt’ saved [68970/68970]



In [None]:
!wget --no-check-certificate \
    https://storage.googleapis.com/laurencemoroney-blog.appspot.com/sonnets.txt \
    -O /tmp/sonnets.txt
data_shakespeare = open('/tmp/sonnets.txt').read()

--2020-09-05 15:19:57--  https://storage.googleapis.com/laurencemoroney-blog.appspot.com/sonnets.txt
Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.204.128, 172.217.203.128, 173.194.210.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.204.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 93578 (91K) [text/plain]
Saving to: ‘/tmp/sonnets.txt’


2020-09-05 15:19:57 (55.6 MB/s) - ‘/tmp/sonnets.txt’ saved [93578/93578]



Here we focus on two important **feature engineering** steps:


1.   Tokenization
2.   Padding

### Tokenization
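Tokenization builds a word index: the `Tokenizer` assigns every unique word in the training corpus an integer, and the most frequent word receives index 1. Index 0 is reserved for padding, which is why the vocabulary size is `len(word_index) + 1`. Each line is then converted into a list of these integers, and every prefix of at least two tokens becomes one training example (an n-gram sequence) whose last token is the word the model learns to predict.<br>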
<br>
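To make the indexing concrete, here is a simplified pure-Python stand-in for Keras' `Tokenizer`. The functions `build_word_index` and `texts_to_sequences` below are hypothetical helpers written for illustration, not the Keras API. They mirror the frequency ranking that `fit_on_texts` produces, but unlike the real class they do not strip punctuation.

```python
from collections import Counter

def build_word_index(corpus):
    # Count every lowercase, whitespace-separated word across all lines
    counts = Counter()
    for line in corpus:
        counts.update(line.lower().split())
    # Rank by frequency: the most frequent word gets index 1, index 0 stays free for padding.
    # sorted() is stable, so words with equal counts keep their first-seen order.
    ranked = sorted(counts, key=counts.get, reverse=True)
    return {word: i for i, word in enumerate(ranked, start=1)}

def texts_to_sequences(word_index, lines):
    # Replace each known word by its index; unknown words are silently dropped
    return [[word_index[w] for w in line.lower().split() if w in word_index]
            for line in lines]

corpus = ["the rose is red", "the violet is blue"]
word_index = build_word_index(corpus)
total_words = len(word_index) + 1  # +1 because index 0 is reserved for padding

print(word_index)   # {'the': 1, 'is': 2, 'rose': 3, 'red': 4, 'violet': 5, 'blue': 6}
print(texts_to_sequences(word_index, ["The blue rose"]))  # [[1, 6, 3]]
print(total_words)  # 7
```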
### Padding
Sentences in the dataset naturally contain different numbers of words, which is a problem because the network expects inputs of a fixed size.<br>
We solve this by padding every sequence with zeros up to the length of the longest sequence in the corpus (`max_sequence_len`). We use `padding='pre'`, so the zeros are added at the start of each sequence and the last token, which becomes the label, always sits in the final position. Because `maxlen` equals the longest sequence, nothing is truncated during training. At generation time the input is padded to `max_sequence_len - 1` tokens instead, and a longer seed text is cut from the front (the default `truncating='pre'`), so only its most recent words are kept.<br>
<br>
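The sketch below imitates what `pad_sequences` does with its default `padding='pre'` and `truncating='pre'` settings, using NumPy only. `pad_pre` is a hypothetical helper, not part of Keras, and it assumes `maxlen >= 1`. The first call pads toy n-gram sequences as for training; the second shows an over-long input being cut from the front, as happens when a long seed text is padded to `max_sequence_len - 1` during generation.

```python
import numpy as np

def pad_pre(sequences, maxlen):
    # Left-pad each sequence with zeros (like padding='pre'); if a sequence is
    # longer than maxlen, keep only its last maxlen tokens (like truncating='pre')
    out = np.zeros((len(sequences), maxlen), dtype=int)
    for row, seq in enumerate(sequences):
        tail = seq[-maxlen:]
        out[row, maxlen - len(tail):] = tail
    return out

seqs = [[1, 3], [1, 3, 4], [2, 5, 6, 7]]
print(pad_pre(seqs, maxlen=4))
# [[0 0 1 3]
#  [0 1 3 4]
#  [2 5 6 7]]

# A 4-token input padded to 3 positions loses its first token
print(pad_pre([[2, 5, 6, 7]], maxlen=3))
# [[5 6 7]]
```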
Let's do this for both the Shakespearean sonnets and the Irish lyrics.

In [None]:
tokenizer_irish = Tokenizer()

corpus_irish = data_irish.lower().split("\n")

tokenizer_irish.fit_on_texts(corpus_irish)
total_words_irish = len(tokenizer_irish.word_index) + 1

# create input sequences using list of tokens
input_sequences_irish = []
for line in corpus_irish:
	token_list = tokenizer_irish.texts_to_sequences([line])[0]
	for i in range(1, len(token_list)):
		n_gram_sequence = token_list[:i+1]
		input_sequences_irish.append(n_gram_sequence)


# pad sequences 
max_sequence_len_irish = max([len(x) for x in input_sequences_irish])
input_sequences_irish = np.array(pad_sequences(input_sequences_irish, maxlen=max_sequence_len_irish, padding='pre'))

# create predictors and label
predictors_irish, label_irish = input_sequences_irish[:,:-1],input_sequences_irish[:,-1]

label_irish = ku.to_categorical(label_irish, num_classes=total_words_irish)

In [None]:
tokenizer_shakespeare = Tokenizer()

corpus_shakespeare  = data_shakespeare.lower().split("\n")

tokenizer_shakespeare.fit_on_texts(corpus_shakespeare)
total_words_shakespeare = len(tokenizer_shakespeare.word_index) + 1

# create input sequences using list of tokens
input_sequences_shakespeare = []
for line in corpus_shakespeare:
	token_list = tokenizer_shakespeare.texts_to_sequences([line])[0]
	for i in range(1, len(token_list)):
		n_gram_sequence = token_list[:i+1]
		input_sequences_shakespeare.append(n_gram_sequence)


# pad sequences 
max_sequence_len_shakespeare = max([len(x) for x in input_sequences_shakespeare])
input_sequences_shakespeare = np.array(pad_sequences(input_sequences_shakespeare, maxlen=max_sequence_len_shakespeare, padding='pre'))

# create predictors and label
predictors_shakespeare, label_shakespeare = input_sequences_shakespeare[:,:-1],input_sequences_shakespeare[:,-1]

label_shakespeare = ku.to_categorical(label_shakespeare, num_classes=total_words_shakespeare)
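The n-gram expansion and the predictor/label split in the two cells above are easiest to follow on a single tiny line. In this sketch the token indices and vocabulary size are made-up values, and the one-hot step uses `np.eye` as a NumPy equivalent of `ku.to_categorical`.

```python
import numpy as np

# A toy line that has already been converted to token indices (hypothetical values)
token_list = [4, 2, 7, 1]
vocab_size = 8  # plays the role of total_words, i.e. len(word_index) + 1

# Every prefix of length >= 2 becomes one training example
n_grams = [token_list[:i + 1] for i in range(1, len(token_list))]
# -> [[4, 2], [4, 2, 7], [4, 2, 7, 1]]

# Left-pad to the longest prefix, as pad_sequences(..., padding='pre') does
max_len = max(len(s) for s in n_grams)
padded = np.array([[0] * (max_len - len(s)) + s for s in n_grams])

# All columns but the last are the inputs; the last column is the word to predict
predictors, labels = padded[:, :-1], padded[:, -1]

# One-hot encode the labels (same values as to_categorical(labels, num_classes=vocab_size))
one_hot = np.eye(vocab_size, dtype="float32")[labels]

print(predictors)     # [[0 0 4] [0 4 2] [4 2 7]]
print(labels)         # [2 7 1]
print(one_hot.shape)  # (3, 8)
```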

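Now, let's load the models that we trained and saved in the other notebooks. This relies on those notebooks having built their tokenizers from the same corpora in the same way; otherwise the word indices above would not match the ones the models learned.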

In [None]:
model_irish = tensorflow.keras.models.load_model('irish_model.h5')
model_shakespeare = tensorflow.keras.models.load_model('shakespeare_model.h5')

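Here we define a function that uses the loaded models to generate poems. It takes the seed text, the number of words to generate, and the model name (`'irish'` or `'shakespeare'`). On each step it encodes and pads the current text, asks the model for the most likely next word, and appends that word to the text.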

In [None]:
def generate(seed_text, next_words, model_name):
  # Pick the tokenizer, model and sequence length that belong to the requested style
  styles = {
      'irish': (tokenizer_irish, model_irish, max_sequence_len_irish),
      'shakespeare': (tokenizer_shakespeare, model_shakespeare, max_sequence_len_shakespeare),
  }
  if model_name not in styles:
    print('Invalid model name!')
    return None
  tokenizer, model, max_sequence_len = styles[model_name]

  for _ in range(next_words):
    # Encode the current text and left-pad it to the input length the model was trained on
    token_list = tokenizer.texts_to_sequences([seed_text])[0]
    token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
    # Sequential.predict_classes has been removed from newer TensorFlow 2.x releases,
    # so take the argmax of the softmax output directly
    predicted = int(np.argmax(model.predict(token_list, verbose=0), axis=-1)[0])
    # Map the predicted index back to its word (index 0 is padding and has no word)
    output_word = tokenizer.index_word.get(predicted, "")
    seed_text += " " + output_word
  return seed_text
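The loop inside `generate` is greedy autoregressive decoding: each predicted word is appended to the text, and the longer text is fed back in to predict the next one. The sketch below replays that loop without TensorFlow. The vocabulary, the bigram table, `toy_predict` and `greedy_generate` are all hypothetical stand-ins for the trained RNN and the notebook's tokenizer, chosen so the result is easy to trace by hand.

```python
import numpy as np

# Hypothetical toy vocabulary; index 0 is reserved for padding, as with Keras' Tokenizer
index_word = {1: "my", 2: "love", 3: "is", 4: "a", 5: "rose"}
word_index = {w: i for i, w in index_word.items()}
vocab_size = len(index_word) + 1

# Hypothetical stand-in for model.predict: row k holds next-word probabilities after token k
bigram = np.full((vocab_size, vocab_size), 0.01)
bigram[1, 2] = 0.9  # "my"   -> "love"
bigram[2, 3] = 0.9  # "love" -> "is"
bigram[3, 4] = 0.9  # "is"   -> "a"
bigram[4, 5] = 0.9  # "a"    -> "rose"
bigram[5, 1] = 0.9  # "rose" -> "my"

def toy_predict(window):
    # Only the most recent token matters in this toy model
    return bigram[window[-1]]

def greedy_generate(seed_text, next_words, window_len=3):
    for _ in range(next_words):
        tokens = [word_index[w] for w in seed_text.lower().split() if w in word_index]
        # Keep the last window_len tokens and left-pad with zeros, like pad_sequences(padding='pre')
        tokens = tokens[-window_len:]
        window = [0] * (window_len - len(tokens)) + tokens
        predicted = int(np.argmax(toy_predict(window)))  # greedy: always take the most likely word
        seed_text += " " + index_word.get(predicted, "")
    return seed_text

print(greedy_generate("my love", 3))  # my love is a rose
```

If the argmax ever lands on index 0 (the padding slot, which has no word), only a space is appended; `generate` behaves the same way because index 0 has no entry in the tokenizer's word index.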


<h2>Let's test our final function!</h2>

In [None]:
seed_text = "shubham my love"
next_words = 20
model_name = 'irish'

In [None]:
generate(seed_text, next_words, model_name)

'shubham my love is fairer than any day and tried to take them from me go word by corporal casey i love so'

In [None]:
seed_text = "Jeet ate pizza today"
next_words = 20
model_name = 'shakespeare'

In [None]:
generate(seed_text, next_words, model_name)

"Jeet ate pizza today no praise wilt but the expense by those old erred some messengers worthier thee hast gone ' ' ' '"

<h2>Perfect!</h2>

*Conclusion:* We have implemented a function that generates poems with the trained RNN models from the other notebooks, taking as input:
1.    Seed text
2.    Number of words to generate
3.    Style of the poem