# Rhyming Sonnet Generation

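We follow the guidelines given in the assignment so that our generated sonnet keeps the Shakespearean rhyme scheme (ABAB CDCD EFEF GG): alternating lines rhyme within each quatrain, and the two lines of the final couplet rhyme with each other.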

Introducing rhyme into the poems is not actually that difficult. Since the sonnet follows a strict rhyming
pattern, we can figure out which rhymes Shakespeare uses by looking at the last words of rhyming line
pairs and adding each pair to a rhyming dictionary. We can then generate two lines that rhyme by
seeding the end of each line with one word of a rhyming pair and running HMM generation in the reverse direction.
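
The seeding idea can be illustrated on a toy scale. The sketch below is not the HMM used in this notebook: it swaps in a simple backwards bigram chain to show how a line can be grown from its final word towards its start. The function names, the toy corpus, and the chosen rhyme pair are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def build_reverse_bigrams(lines):
    """Map each word to the list of words that directly precede it in the corpus."""
    prev_words = defaultdict(list)
    for line in lines:
        words = line.lower().split()
        for prev, cur in zip(words, words[1:]):
            prev_words[cur].append(prev)
    return prev_words

def generate_line_ending_with(seed, prev_words, max_words=8, rng=random):
    """Start from the final word, walk backwards, then reverse into reading order."""
    line = [seed]
    # Stop when the line is long enough or the current word has no known predecessor
    while len(line) < max_words and prev_words.get(line[-1]):
        line.append(rng.choice(prev_words[line[-1]]))
    return " ".join(reversed(line))

# Toy corpus and rhyme pair (illustrative only)
toy_corpus = [
    "shall i compare thee to a summers day",
    "thou art more lovely and more temperate",
    "and summers lease hath all too short a date",
]
prev_words = build_reverse_bigrams(toy_corpus)
rhyme_pair = ("temperate", "date")
couplet = [generate_line_ending_with(word, prev_words, rng=random.Random(0))
           for word in rhyme_pair]
print("\n".join(couplet))
```

Because each line is built from its last word outward, the rhyme is guaranteed by construction, whatever the generator produces for the rest of the line.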


In [3]:
import os
import numpy as np
from IPython.display import HTML
from itertools import groupby
import re
import random

import Overall_HMM_helper
from Rhyme_HMM import unsupervised_HMM
import Rhyme_HMM_helper

In [4]:
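# Use a context manager so the file is closed after reading
with open("data/shakespeare.txt", 'r') as shakespeare:
    poems = shakespeare.readlines()

# Split the lines into poems at blank lines; [1:] drops the first line of each group (its header line)
split_at = "\n"
final_poems = [list(g)[1:] for k, g in groupby(poems, lambda x: x != split_at) if k]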
print("Initial number of poems: {}".format(len(final_poems)))
poem_lengths = [len(poem) for poem in final_poems]
bad_poems = np.where(np.array(poem_lengths) != 14)[0]
# bad_poems holds 0-based indices, so add 1 to report the actual sonnet numbers
print("Sonnets {} and {} are not 14 lines long so we remove them from our list.".format(bad_poems[0] + 1, bad_poems[1] + 1))

final_poems = [final_poems[i] for i in np.delete(np.arange(len(final_poems)), bad_poems)]
print("Final number of poems: {}".format(len(final_poems)))
final_poems = [''.join([line.strip(' ') for line in poem]) for poem in final_poems]

Initial number of poems: 154
Sonnets 99 and 126 are not 14 lines long so we remove them from our list.
Final number of poems: 152


In [16]:
# token_map maps words to numbers
# tokenized_poems replaces the words in poems with their corresponding number
tokenized_poems, token_map = Overall_HMM_helper.parse_observations(final_poems)
token_map_r = Overall_HMM_helper.obs_map_reverser(token_map)
flattened_tokenized_poems = [val for sublist in tokenized_poems for val in sublist]
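
The helper functions above live in `Overall_HMM_helper`, whose source is not shown here. As a rough illustration of what a token map and its reverse look like, the following sketch uses hypothetical stand-ins (`build_token_map` and `reverse_map` are names invented for this example, and the real helpers may treat punctuation and casing differently).

```python
def build_token_map(poems):
    """Assign each distinct lowercase word an integer id, in order of first appearance."""
    token_map = {}
    tokenized = []
    for poem in poems:
        ids = []
        for word in poem.lower().split():
            # Assumption: strip common punctuation from both ends of the word
            word = word.strip(".,;:!?()'")
            if not word:
                continue
            # setdefault returns the existing id or registers the next free one
            ids.append(token_map.setdefault(word, len(token_map)))
        tokenized.append(ids)
    return tokenized, token_map

def reverse_map(token_map):
    """Invert word -> id into id -> word."""
    return {idx: word for word, idx in token_map.items()}

toy_tokenized, toy_token_map = build_token_map(["Shall I compare thee,", "thee I love"])
toy_token_map_r = reverse_map(toy_token_map)
print(toy_tokenized)
print(toy_token_map)
```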

## Syllable Analysis

In [10]:
# Syllables
# Use a context manager so the file is closed after reading
with open("data/Syllable_dictionary.txt", 'r') as syllable_file:
    syllables = [line.split() for line in syllable_file]
syllable_dict = {}
"""
for syllable in syllables:
    word = re.sub(r'[^\w]', '', syllable[0])
    syllable_dict[word] = syllable[1:]
"""
# We map each word to a tuple of two lists:
# the first list holds the syllable counts the word can take at the end of a line (marked "E"),
# the second list holds the syllable counts the word can take anywhere.
# E.g. "test": ['E1', '2', '3'] <-> "test": ([1], [2, 3])
for syllable in syllables:
    word = re.sub(r'[^\w]', '', syllable[0])
    end_syllable_list = []
    regular_syllable_list = []
    for item in syllable[1:]:
        if item[0] == "E":
            end_syllable_list.append(int(item[1:]))
        else:
            regular_syllable_list.append(int(item))
    syllable_dict[word] = (end_syllable_list, regular_syllable_list)
    
# syllable_dict
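
To make the mapping concrete, here is a small self-contained sketch of parsing one dictionary entry and then asking which syllable counts are allowed at a given position. Both function names are hypothetical, and the rule that end-of-line counts are allowed in addition to the regular ones is an assumption read off the comments above, not something confirmed by the HMM code.

```python
def parse_syllable_entry(line):
    """Parse a line such as 'test E1 2 3' into (word, (end_counts, regular_counts))."""
    fields = line.split()
    word, counts = fields[0], fields[1:]
    end_counts = [int(c[1:]) for c in counts if c.startswith("E")]
    regular_counts = [int(c) for c in counts if not c.startswith("E")]
    return word, (end_counts, regular_counts)

def allowed_counts(entry, at_line_end):
    """Return the syllable counts a word may take at this position.

    Assumption: at the end of a line the 'E' counts are allowed on top of the regular ones.
    """
    end_counts, regular_counts = entry
    if at_line_end:
        return sorted(set(end_counts) | set(regular_counts))
    return sorted(regular_counts)

word, entry = parse_syllable_entry("test E1 2 3")
print(word, entry)
print(allowed_counts(entry, at_line_end=True))
print(allowed_counts(entry, at_line_end=False))
```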

In [11]:
tokenized_syllable_dict = {}
for word, counts in syllable_dict.items():
    # Keep only the words that also appear in our token map
    if word in token_map:
        tokenized_syllable_dict[token_map[word]] = counts
# tokenized_syllable_dict

## Rhyme Analysis

In [12]:
def get_rhyme_pairs(poem):
    """Return the 7 rhyming last-word pairs of a 14-line sonnet (ABAB CDCD EFEF GG)."""
    last_words = []
    for line in poem.split("\n"):
        # Take the last word of the line, stripped of punctuation and lowercased
        word = re.sub(r'[^\w]', '', line.split(" ")[-1]).lower()
        # Skip empty entries, e.g. the one produced by the trailing newline
        if word:
            last_words.append(word)

    rhyme_pairs = []
    # Three quatrains: lines 1 and 3 rhyme, lines 2 and 4 rhyme
    for start in (0, 4, 8):
        rhyme_pairs.append((last_words[start], last_words[start + 2]))
        rhyme_pairs.append((last_words[start + 1], last_words[start + 3]))
    # Final couplet
    rhyme_pairs.append((last_words[12], last_words[13]))

    return rhyme_pairs

# Now compile all the rhyming word pairs in each poem
# (despite its name, rhyming_dict is a flat list of (word, word) tuples)
rhyming_dict = []
for poem in final_poems:
    rhyming_dict += get_rhyme_pairs(poem)

# rhyming_dict
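
Since the pairs are stored as a flat list, finding the partners of a given word means scanning the whole list. One possible alternative, sketched below under the assumption that a word-to-partners lookup would be useful when seeding line endings, is to index the pairs by word. `build_rhyme_lookup` and the toy pairs are illustrative names and data, not part of the notebook's helpers.

```python
from collections import defaultdict

def build_rhyme_lookup(pairs):
    """Index rhyme pairs by word: each word maps to the set of its direct rhyme partners."""
    lookup = defaultdict(set)
    for first, second in pairs:
        lookup[first].add(second)
        lookup[second].add(first)
    return lookup

# Toy pairs (illustrative only)
toy_pairs = [("day", "may"), ("temperate", "date"), ("day", "away")]
toy_lookup = build_rhyme_lookup(toy_pairs)
print(sorted(toy_lookup["day"]))
print(sorted(toy_lookup["date"]))
```

Only directly observed pairs are linked here, so the lookup never claims that "may" rhymes with "away" unless that pair itself occurs in the data.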

Next, we tried clustering these words into lists of mutually rhyming words. However, clustering groups together words that do not actually rhyme, such as "I" and "free", because two words end up in the same cluster whenever a chain of intermediate rhyme pairs links them. We therefore kept our earlier method of word pairs.
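
The chaining effect is easy to reproduce without any graph library. The sketch below is a plain breadth-first search over made-up toy pairs (not pairs taken from the sonnets); `find_rhyme_clusters` is a hypothetical name used only for this illustration.

```python
from collections import defaultdict, deque

def find_rhyme_clusters(pairs):
    """Group words into connected components of the rhyme-pair graph."""
    neighbors = defaultdict(set)
    for first, second in pairs:
        neighbors[first].add(second)
        neighbors[second].add(first)

    seen, clusters = set(), []
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects every word reachable from `start`
        cluster, queue = set(), deque([start])
        seen.add(start)
        while queue:
            word = queue.popleft()
            cluster.add(word)
            for nxt in neighbors[word]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        clusters.append(cluster)
    return clusters

# Made-up pairs: "free" and "i" are never paired directly
toy_pairs = [("free", "thee"), ("thee", "me"), ("me", "i"), ("cruel", "jewel")]
toy_clusters = find_rhyme_clusters(toy_pairs)
print(sorted(sorted(cluster) for cluster in toy_clusters))
```

Even though "free" and "i" never appear as a pair, they land in the same cluster through "thee" and "me", which is the behaviour that motivated staying with pairs.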

In [13]:
import networkx as nx

G = nx.Graph()
G.add_edges_from(rhyming_dict)

paths_between_generator = nx.all_simple_paths(G,source="thee",target="i")
nodes_between_set = set()
for path in paths_between_generator:
    for node in path:
        nodes_between_set.add(node)
print(nodes_between_set)

{'me', 'see', 'thee', 'be', 'free', 'i'}


In [15]:
import networkx as nx

G = nx.Graph()
G.add_edges_from(rhyming_dict)
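# nx.connected_component_subgraphs was removed in NetworkX 2.4;
# nx.connected_components yields the node set of each component instead
rhyme_clusters = [list(component) for component in nx.connected_components(G)]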
rhyme_clusters

[['cease', 'excess', 'lease', 'decrease', 'increase', 'decease'],
 ['history',
  'legacy',
  'remedy',
  'masonry',
  'ye',
  'be',
  'die',
  'thereby',
  'fly',
  'qualify',
  'me',
  'memory',
  'flattery',
  'thee',
  'melancholy',
  'husbandry',
  'eye',
  'canopy',
  'eternity',
  'posterity',
  'fortify',
  'idolatry',
  'sky',
  'fee',
  'free',
  'defy',
  'usury',
  'why',
  'i',
  'deny',
  'by',
  'dignity',
  'see',
  'lie',
  'constancy',
  'decree',
  'gravity',
  'majesty',
  'alchemy',
  'enmity'],
 ['subtleties',
  'spies',
  'eyes',
  'arise',
  'cries',
  'lies',
  'despise',
  'devise',
  'prophecies'],
 ['cruel', 'jewel', 'fuel'],
 ['spent',
  'excellent',
  'rent',
  'invent',
  'argument',
  'ornament',
  'monument',
  'content'],
 ['sing',
  'wing',
  'thing',
  'bring',
  'king',
  'spring',
  'niggarding',
  'prefiguring',
  'ordering'],
 ['bow', 'allow', 'bough', 'how', 'brow', 'now', 'mow'],
 ['stelled', 'held', 'field'],
 ['praise', 'decays', 'days', 'lays

In [18]:
hmm = unsupervised_HMM(flattened_tokenized_poems, 2, 10, tokenized_syllable_dict)

# Sample 7 rhyming line pairs: 6 for the three quatrains and 1 for the final couplet
pairs = [Rhyme_HMM_helper.sample_pair(hmm, token_map, rhyming_dict, num_syllables=10)
         for _ in range(7)]

print('Rhymed Sonnet:\n====================')
# Quatrains (ABAB CDCD EFEF): interleave two rhyming pairs per quatrain
for first, second in ((0, 1), (2, 3), (4, 5)):
    print(pairs[first][0])
    print(pairs[second][0])
    print(pairs[first][1])
    print(pairs[second][1])
# Final couplet (GG)
print(pairs[6][0])
print(pairs[6][1])

Iteration: 10
Rhymed Sonnet:
And melancholy know this loves white grief;
My and in i doth for graciously say;
But hear are man fair inviting sight chief;
As seeing teeth you set recite let pay;
Gain tillage gilding thus and more the is;
To to leese o and makes joy the uphold;
The look king offence came music look his;
Ashes is is thy i as waste like cold;
Office day of and my my hours pipe blind;
Worser number methinks but in why mind;
Fool that of errors thou and is prove find;
Makes away lame large of imprisoned find;
True complexion glad half thou my did rest;
Answer make eyes mine death to have oppressed;
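
The ordering used to print the sonnet can also be written as a small reusable function. This is a sketch: `arrange_sonnet` is a hypothetical helper, and the labelled placeholder strings stand in for generated lines so the resulting order is easy to read.

```python
def arrange_sonnet(pairs):
    """Interleave 7 rhyming line pairs into ABAB CDCD EFEF GG order."""
    if len(pairs) != 7:
        raise ValueError("a sonnet needs exactly 7 rhyming pairs")
    lines = []
    # Each quatrain uses two pairs: A B A B
    for quatrain in range(3):
        first, second = pairs[2 * quatrain], pairs[2 * quatrain + 1]
        lines += [first[0], second[0], first[1], second[1]]
    # The last pair forms the closing couplet
    lines += list(pairs[6])
    return lines

# Placeholder "lines": A1 and A2 rhyme with each other, B1 and B2 rhyme, and so on
toy_line_pairs = [(label + "1", label + "2") for label in "ABCDEFG"]
toy_sonnet = arrange_sonnet(toy_line_pairs)
print(" ".join(toy_sonnet))
```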
