## Project Members:
    Crystal Zhu (cyz22)
    Lyra D'Souza (lad279)

## Brief Description

The purpose of our project was to combine what we learned about stress and rhyme through our work with HFST with what we learned about sentence structure and grammar building through our work with CFGs and FCFGs, in order to create a Shakespearean sonnet generator. Connecting these two topics was a challenging but interesting task, as there is no large-scale English grammar we could use to verify the validity of the output we generated. Thus, we used HFST to define our rhyme classes and stress classes, and incorporated the vocabulary we extracted from HFST into a context-free grammar that guides how our generator composes its sonnets. We do not, however, have functionality for verifying whether a sentence makes sense in English.

Given more resources, it might have been interesting to experiment with ways to create semantically viable sentences, but based on our research, most of the options for doing so involve NLP concepts beyond the scope of this class and largely unrelated to the concepts we have learned. We did not find a comprehensive, publicly available semantic parser for English, nor a viable grammar to base a generator on, so we were limited to what we could achieve with a grammar and a parser that we designed and implemented ourselves. As a result, our generator is not as sophisticated as we had hoped, but we worked very hard to give it the capacity to generate viable sonnets, though the poems it produces leave much of their meaning to the imagination and discretion of the user.


## Setup

In [1]:
# Imports

import hfst_dev as hfst
import graphviz
import random

import itertools
import re
from nltk.probability import FreqDist
from nltk import CFG
from nltk import grammar, parse
from nltk.parse.generate import generate
from nltk.parse.util import load_parser

In [2]:
# Stream English 

istream = hfst.HfstInputStream('English')
assert istream.is_good() == True
English = istream.read()
istream.close()

In [3]:
# Set up definitions from phoneclass.fst 

defs = {'English' : English}

VowAA = hfst.regex('AA0 | AA1 | AA2', definitions=defs)
defs['VowAA'] = VowAA
VowAE = hfst.regex('AE0 | AE1 | AE2', definitions=defs)
defs['VowAE'] = VowAE
VowAH = hfst.regex('AH0 | AH1 | AH2', definitions=defs)
defs['VowAH'] = VowAH
VowAO = hfst.regex('AO0 | AO1 | AO2', definitions=defs)
defs['VowAO'] = VowAO
VowAW = hfst.regex('AW0 | AW1 | AW2', definitions=defs)
defs['VowAW'] = VowAW
VowAY = hfst.regex('AY0 | AY1 | AY2', definitions=defs)
defs['VowAY'] = VowAY
VowEH = hfst.regex('EH0 | EH1 | EH2', definitions=defs)
defs['VowEH'] = VowEH
VowER = hfst.regex('ER0 | ER1 | ER2', definitions=defs)
defs['VowER'] = VowER
VowEY = hfst.regex('EY0 | EY1 | EY2', definitions=defs)
defs['VowEY'] = VowEY
VowIH = hfst.regex('IH0 | IH1 | IH2', definitions=defs)
defs['VowIH'] = VowIH
VowIY = hfst.regex('IY0 | IY1 | IY2', definitions=defs)
defs['VowIY'] = VowIY
VowOW = hfst.regex('OW0 | OW1 | OW2', definitions=defs)
defs['VowOW'] = VowOW
VowOY = hfst.regex('OY0 | OY1 | OY2', definitions=defs)
defs['VowOY'] = VowOY
VowUH = hfst.regex('UH0 | UH1 | UH2', definitions=defs)
defs['VowUH'] = VowUH
VowUW = hfst.regex('UW0 | UW1 | UW2', definitions=defs)
defs['VowUW'] = VowUW

Vow0 = hfst.regex('AH0| IH0| ER0| IY0| OW0| AA0| EH0| UW0| AE0| AO0| AY0| EY0| AW0| UH0| OY0', definitions=defs)
defs['Vow0'] = Vow0
Vow1 = hfst.regex('EH1| AE1| AA1| IH1| IY1| EY1| OW1| AO1| AY1| AH1| UW1| ER1| AW1| UH1| OY1', definitions=defs)
defs['Vow1'] = Vow1
Vow2 = hfst.regex('EH2| EY2| AE2| AY2| AA2| IH2| OW2| IY2| AO2| UW2| AH2| AW2| ER2| UH2| OY2', definitions=defs)
defs['Vow2'] = Vow2

Vow = hfst.regex('Vow0 | Vow1 | Vow2', definitions=defs)
defs['Vow'] = Vow

Nas = hfst.regex('N | M | NG', definitions=defs)
defs['Nas'] = Nas

Phone = hfst.regex('AH0| N| S| L| T| R| K| D| IH0| M| Z| ER0| IY0| B| EH1| P| AE1| AA1| IH1| F| G| V| IY1| NG| HH| EY1| W| SH| OW1| OW0| AO1| AY1| AH1| UW1| JH| Y| CH| AA0| ER1| EH2| EY2| AE2| AY2| AA2| EH0| IH2| TH| AW1| OW2| UW0| IY2| AO2| AE0| UH1| AO0| AY0| UW2| AH2| EY0| OY1| AW2| DH| ZH| ER2| UH2| AW0| UH0| OY2| OY0', definitions = defs)
defs['Phone'] = Phone

Cons = hfst.regex('[Phone - Vow]', definitions = defs)
defs['Cons'] = Cons

## Creating Classes of Words Based on Stress

We use hfst to create classes of words from which we can construct iambic pentameter, based on the stress classes provided to us: primary stress (s1), secondary stress (s2), and unstressed (s0). We have only created machines for the stress patterns that are relevant to the unstressed-stressed alternation of iambic pentameter. A pattern like s0s0, for example, can never fit the meter, so we have excluded it.
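As a rough illustration of the naming scheme, the sketch below reads a stress-class label off an ARPAbet pronunciation by collecting the stress digit on each vowel. The helper `stress_pattern` is hypothetical and works on plain strings; it is not part of the hfst machinery used in this notebook.

```python
import re

def stress_pattern(phones: str) -> str:
    """Return a label such as 's0s1' for a space-separated ARPAbet string.

    Only vowels carry a trailing stress digit (0, 1, or 2), so each digit
    found corresponds to one syllable.
    """
    digits = re.findall(r"[A-Z]+([012])", phones)
    return "".join("s" + d for d in digits)

print(stress_pattern("B AH0 L UW1 N"))  # balloon → s0s1
print(stress_pattern("P L EY1 N"))      # plane → s1
```

A word labeled `s0s1` can fill one full iamb, while a word labeled `s1` can only sit in a stressed position.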

### One Syllable Words

#### Stressed

In [4]:
expr = '[English .o. [[ Cons* Vow1 Cons* ].l]].u'
n = hfst.regex(expr, definitions=defs)
defs["s1"] = n

#### Unstressed

In [5]:
expr = '[English .o. [[ Cons* Vow0 Cons* ].l]].u'
n = hfst.regex(expr, definitions=defs)
defs["s0"] = n

### Two Syllable Words

#### Main stress first

In [6]:
expr = '[English .o. [[Cons* Vow1 Cons* Vow0 Cons*]].l].u'
n = hfst.regex(expr, definitions=defs)
defs["s1s0"] = n

expr = '[English .o. [[Cons* Vow1 Cons* Vow2 Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["s1s2"] = m

#### Main stress second 

In [7]:
expr = '[English .o. [[Cons* Vow0 Cons* Vow1 Cons*]].l].u'
n = hfst.regex(expr, definitions=defs)
defs["s0s1"] = n

expr = '[English .o. [[Cons* Vow2 Cons* Vow1 Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["s2s1"] = m

### Three Syllable Words

#### Stressed, unstressed, stressed

In [8]:
expr = '[English .o. [[ Cons* Vow1 Cons* Vow0 Cons* Vow1 Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["s1s0s1"] = m

expr = '[English .o. [[ Cons* Vow1 Cons* Vow0 Cons* Vow2 Cons*]].l].u'
n = hfst.regex(expr, definitions=defs)
defs["s1s0s2"] = n

expr = '[English .o. [[ Cons* Vow1 Cons* Vow2 Cons* Vow1 Cons*]].l].u'
o = hfst.regex(expr, definitions=defs)
defs["s1s2s1"] = o

expr = '[English .o. [[ Cons* Vow2 Cons* Vow0 Cons* Vow2 Cons*]].l].u'
p = hfst.regex(expr, definitions=defs)
defs["s2s0s2"] = p

expr = '[English .o. [[ Cons* Vow2 Cons* Vow0 Cons* Vow1 Cons*]].l].u'
q = hfst.regex(expr, definitions=defs)
defs["s2s0s1"] = q

expr = '[English .o. [[ Cons* Vow1 Cons* Vow2 Cons* Vow2 Cons*]].l].u'
r = hfst.regex(expr, definitions=defs)
defs["s1s2s2"] = r

expr = '[English .o. [[ Cons* Vow2 Cons* Vow2 Cons* Vow1 Cons*]].l].u'
s = hfst.regex(expr, definitions=defs)
defs["s2s2s1"] = s

#### Unstressed, stressed, unstressed

In [9]:
expr = '[English .o. [[ Cons* Vow0 Cons* Vow1 Cons* Vow0 Cons*]].l].u'
n = hfst.regex(expr, definitions=defs)
defs["s0s1s0"] = n

expr = '[English .o. [[ Cons* Vow0 Cons* Vow1 Cons* Vow2 Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["s0s1s2"] = m

expr = '[English .o. [[ Cons* Vow0 Cons* Vow2 Cons* Vow0 Cons*]].l].u'
o = hfst.regex(expr, definitions=defs)
defs["s0s2s0"] = o

expr = '[English .o. [[ Cons* Vow2 Cons* Vow1 Cons* Vow0 Cons*]].l].u'
p = hfst.regex(expr, definitions=defs)
defs["s2s1s0"] = p

expr = '[English .o. [[ Cons* Vow2 Cons* Vow1 Cons* Vow2 Cons*]].l].u'
q = hfst.regex(expr, definitions=defs)
defs["s2s1s2"] = q

In some cases when constructing our vocabulary and our grammar, it became necessary to verify the stress pattern of a word. Below is the code we used to view the lower-side representation of a given word, based on the English dictionary streamed in through hfst. It is worth noting that hfst is able to generate many words that do not show up when explicitly looked up in the format below. For example, when streamed in, English.hfst contains the word "balloon" and will generate it for a regex match on words containing the vowel sound 'VowUW', but checking for matches with the orthography "balloon" in English finds none, so the sample population is empty and a ValueError is thrown. This is a distinct drawback of the hfst functionality, but we were able to work around it by looking up such cases online.

In [10]:
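def sample_input(x,n=1,cycles=3):
        x2 = x.copy()
        x2.input_project()
        x2.minimize()
        # sorted() gives random.sample a sequence; sampling directly from a set fails on Python 3.11+
        return(random.sample(sorted(set(x2.extract_paths(max_cycles=cycles).keys())),n))

# Example word: plane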
expr = '[{plane} .o. English].l'
m = hfst.regex(expr, definitions=defs)
sample_input(m)

['PLEY1N']

## Generating Rhyme Classes

We define two words as rhyming if they share the same final vowel sound, regardless of that vowel's stress or of any consonants that follow it. Thus, we consider all words that share a final vowel sound to be rhymes of each other, and have created rhyme classes keyed by the vowel sounds defined above.
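To make this definition concrete, here is a minimal plain-Python sketch that computes the rhyme class of a pronunciation. The helper `rhyme_class` is hypothetical; it mirrors the `Phone* VowXX Cons*` pattern by taking the last vowel and dropping its stress digit.

```python
def rhyme_class(phones: str):
    """Return the final vowel of an ARPAbet string without its stress digit.

    Returns None when the pronunciation contains no vowel.
    """
    for phone in reversed(phones.split()):
        if phone[-1] in "012":  # only vowels end in a stress digit
            return phone[:-1]
    return None

print(rhyme_class("B AH0 L UW1 N Z"))   # balloons → UW
print(rhyme_class("T AY0 F UW1 N Z"))   # typhoons → UW
print(rhyme_class("K R UW1"))           # crew → UW
```

Because trailing consonants are ignored, "crew" lands in the same class as "balloons" even though the two are not perfect rhymes, which is the trade-off of this simple definition.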

In [11]:
expr = '[English .o. [[ Phone* VowAA Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["rhymeAA"] = m

expr = '[English .o. [[ Phone* VowAE Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["rhymeAE"] = m

expr = '[English .o. [[ Phone* VowAH Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["rhymeAH"] = m

expr = '[English .o. [[ Phone* VowAO Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["rhymeAO"] = m

expr = '[English .o. [[ Phone* VowAW Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["rhymeAW"] = m

expr = '[English .o. [[ Phone* VowAY Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["rhymeAY"] = m

expr = '[English .o. [[ Phone* VowEH Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["rhymeEH"] = m

expr = '[English .o. [[ Phone* VowER Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["rhymeER"] = m

expr = '[English .o. [[ Phone* VowEY Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["rhymeEY"] = m

expr = '[English .o. [[ Phone* VowIH Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["rhymeIH"] = m

expr = '[English .o. [[ Phone* VowIY Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["rhymeIY"] = m

expr = '[English .o. [[ Phone* VowOW Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["rhymeOW"] = m

expr = '[English .o. [[ Phone* VowUH Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["rhymeUH"] = m

expr = '[English .o. [[ Phone* VowUW Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
defs["rhymeUW"] = m

Based on the rhyme classes we define above through the use of hfst, we use sampling to generate a vocabulary for our end rhymes.

In [12]:
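def sample_input(x,n=10,cycles=3):
        x2 = x.copy()
        x2.input_project()
        x2.minimize()
        # sorted() gives random.sample a sequence; sampling directly from a set fails on Python 3.11+
        return(random.sample(sorted(set(x2.extract_paths(max_cycles=cycles).keys())),n))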

# Example vowel sound: VowUW
expr = '[English .o. [[ Phone* VowUW Cons*]].l].u'
m = hfst.regex(expr, definitions=defs)
sample_input(m)

['cre|w',
 'broadview',
 'loew',
 'to|olro|om',
 'harpo|ons',
 'kluge',
 'tabo|o',
 'rhe|w',
 's|hutes',
 'mahmo|od']

Below is the set of lists we developed for each of the vowel-based rhyming classes based on our sampling.

In [13]:
AW = ["bloodhound", "crowns", "blackout", "reroute", "loud", "hometown", "scowl", "countdown", "rouse", "mount"]
AY = ["devise", "privatize", "bribe", "modernize", "coincide", "chimes", "deprived", "reunite", "apprise", "knifelike"]
EH = ["doorsteps", "aspects", "flare", "sleepwear", "pens", "pastel", "bullpen", "pipette"]
ER = ["rewired", "spindler", "harvesters", "thunders", "lowered", "gander", "prisoners", "trimmer", "scholar", "modern"]
EY = ["prepaid", "gateways", "blockades", "replace", "cliched", "acclimate", "drain", "birthdays", "upscale", "sedate"]
IH = ["hallways", "parades", "dislocate", "hurricane", "escape", "downplay", "shortchange", "lace", "days"]
IY = ["coyote", "squeaky", "delete", "cheek", "cream", "blackberry", "publicly", "blatantly"]
OW = ["yolks", "chrome", "intone", "pronto", "sorrow", "disowned", "potatoes", "mole", "notes"]
OY = ["decoy", "convoy", "noise", "annoy", "purloin", "steroid", "datapoint", "boy", "tabloids", "soy"]
UH = ["wolves", "underwood", "scrapbooks", "cooked", "schedules", "cookbooks", "endure", "understood", "rook", "woods"]
UW = ["balloons", "typhoons", "duped", "croon", "loon", "resume", "ingenue", "remove", "lawsuit", "troops"]

Based on the lists generated above, we created a dictionary that maps each word in our vocabulary to the rhyme class of its last syllable. As this is a rather large dictionary, we wrote it to an external file that we then import below to use moving forward.
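The layout of the external file is not shown here, so the following is only a sketch of how such a word-to-class mapping could be built from per-class lists. The names `rhyme_lists` and `rhyme_dict_sketch` are illustrative, and only a few words from the lists above are included.

```python
# A small excerpt of the per-class word lists, keyed by rhyme class.
rhyme_lists = {
    "AW": ["bloodhound", "crowns", "blackout"],
    "OY": ["decoy", "convoy", "noise"],
    "UW": ["balloons", "typhoons", "duped"],
}

# Invert the lists so that each word maps to its rhyme class.
rhyme_dict_sketch = {
    word: rhyme
    for rhyme, words in rhyme_lists.items()
    for word in words
}

print(rhyme_dict_sketch["balloons"])  # UW
print(rhyme_dict_sketch["decoy"])     # OY
```

Looking up the last word of a line in this mapping gives the class used to match end rhymes.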

In [14]:
# Retrieve rhyme dictionary from external file 
from rhyme_dict import rhyme_dict

## Generate Iambic Pentameter

Here we use the stress classes defined above to generate lines of iambic pentameter. 
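As a sanity check on what counts as a valid line, the sketch below tests whether a sequence of stress classes scans as iambic pentameter. The function `scans_as_iambic_pentameter` is hypothetical, and it makes a simplifying assumption: secondary stress (s2) may fill either position, whereas the generator below decides this from the neighboring syllable.

```python
import re

def scans_as_iambic_pentameter(word_classes):
    """Check that the classes spell ten syllables alternating weak/strong.

    Even positions (0, 2, ...) must not carry primary stress, and odd
    positions must not be unstressed. s2 is accepted anywhere (assumption).
    """
    syllables = re.findall(r"s[012]", "".join(word_classes))
    if len(syllables) != 10:
        return False
    for i, syl in enumerate(syllables):
        if i % 2 == 0 and syl == "s1":
            return False
        if i % 2 == 1 and syl == "s0":
            return False
    return True

print(scans_as_iambic_pentameter(["s0s1", "s0", "s1s0", "s1", "s0s1s0", "s1"]))  # True
print(scans_as_iambic_pentameter(["s1s0"] * 5))                                  # False
```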

In [15]:
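def sample_input(x,n=8,cycles=3):
        x2 = x.copy()
        x2.input_project()
        x2.minimize()
        # sorted() gives random.sample a sequence; sampling directly from a set fails on Python 3.11+
        return random.sample(sorted(set(x2.extract_paths(max_cycles=cycles).keys())),n)[0].replace('|', '')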

In [16]:
def sample(wordClasses : list, defs) -> tuple:
    # [wordClasses] a LIST of [word class, frequency] lists, with word classes defined in [defs]
    # Frequencies should add up to 1
    # 
    # Returns: (word class, sample)
    r = random.random()
    # Fall back to the last class if rounding error leaves r >= 0 after the loop
    chosen = wordClasses[-1][0]
    for wordClass in wordClasses:
        r -= wordClass[1]
        if r < 0:
            chosen = wordClass[0]
            break
    return (chosen, sample_input(hfst.regex(chosen, definitions=defs), n=1))

In [17]:
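def classesToList(lst, wordClasses : dict):
    # Pairs each class in [lst] with its frequency, renormalized so the frequencies sum to 1.
    # Classes missing from [wordClasses] are skipped rather than raising a KeyError.
    result = []
    sumFreq = 0
    for wc in lst:
        if wc not in wordClasses:
            continue
        result.append([wc, wordClasses[wc]])
        sumFreq += wordClasses[wc]
    for i in range(len(result)):
        result[i][1] /= sumFreq
    return result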

In [18]:
def generate_iambs(wordClasses : dict, defs):
    # [wordClasses] a DICTIONARY mapping word classes (defined in [defs]) to frequencies
    # Frequencies should add up to 1
    # Each element: s1 (primary), s2 (secondary), s0 (unstressed)
    syllables = []
    words_out = []
    index = 0
    while index < 10:
        if index == 0:
            preLst = ["s0", "s0s1", "s2s1", "s0s1s0", "s0s1s2", "s0s2s0", "s2s1s0", "s2s1s2"]
        elif index == 8:
            # Only one iamb is left, so the word must end on the final stressed syllable
            preLst = ["s0", "s0s1"]
            if syllables[index - 1] == "s1":
                preLst.extend(["s2s1"])
        elif index == 9:
            preLst = ["s1"]
        else:
            # Unstressed
            if index % 2 == 0:
                preLst = ["s0", "s0s1", "s0s1s0", "s0s1s2", "s0s2s0"]
                if syllables[index - 1] == "s1":
                    preLst.extend(["s2s1", "s2s1s0", "s2s1s2"])
            # Stressed
            else:
                preLst = ["s1", "s1s0", "s1s2", "s1s0s1", "s1s0s2", "s1s2s1", "s1s2s2"]
                if syllables[index - 1] == "s0":
                    preLst.extend(["s2s0s2", "s2s0s1", "s2s2s1"])
                    
        lst = classesToList(preLst, wordClasses)      
        wordClass, word = sample(lst, defs)
        wordSyl = wordClass.split("s")
        for syl in wordSyl:
            if syl != "":   
                syllables.append("s" + syl) 
                index += 1
        words_out.append(word)
        
    return words_out

## Creating the Grammar

Below we create a base grammar with stress defined for each of the terminals, which we then use to generate our final grammar that contains all possible non-terminal constructions with their stress decomposition included.
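To make the rewriting step concrete, here is a minimal sketch of how a stress pattern can be attached to the symbols of a single rule. The helpers `annotate` and `annotate_rule` and the `NUM` feature are illustrative assumptions, since the contents of `grammar.fcfg` are not shown; the notebook's own `rule_add_stress` plays this role.

```python
def annotate(symbol, stress):
    """Insert '_<stress>' after the category name, before any feature bracket."""
    name, bracket, rest = symbol.partition("[")
    return name + "_" + stress + bracket + rest

def annotate_rule(left, right, patterns):
    """Label each right-hand symbol with its pattern and the left-hand symbol
    with their concatenation. The start symbol S is left unlabeled."""
    if left.split("[")[0] == "S":
        new_left = left
    else:
        new_left = annotate(left, "".join(patterns))
    new_right = [annotate(sym, sp) for sym, sp in zip(right, patterns)]
    return new_left + " -> " + " ".join(new_right)

print(annotate_rule("NP[NUM=?n]", ["Det", "N[NUM=?n]"], ["s0", "s1s0"]))
# NP_s0s1s0[NUM=?n] -> Det_s0 N_s1s0[NUM=?n]
print(annotate_rule("S", ["NP", "VP"], ["s0s1", "s0s1"]))
# S -> NP_s0s1 VP_s0s1
```

Every base rule is expanded this way once per compatible pair of stress patterns, which is why the final grammar is so much larger than the base grammar.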

In [19]:
from random import randint

class wordIterable:
    def __init__(self, file, n):
        p1 = load_parser(file, trace=0, cache=False)
        g1 = p1.grammar()
        self.parser = p1
        self.grammar = g1
        self.n = n
        self.sents = []
        gen1 = generate(g1, depth=2*n)
        s_temp = list(gen1)
        for s in s_temp:
            if len(s) == self.n and self.check_grammatical(s) and s not in self.sents:
                self.sents.append(s)
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if len(self.sents) > 0:
            i = randint(0, len(self.sents) - 1)
            sent = self.sents[i]
            del self.sents[i]
            return sent
        else:
            raise StopIteration
    
    def check_grammatical(self, s):
        try:
            t = next(self.parser.parse(s))
            return True
        except StopIteration:
            return False

In [20]:
# Encoding lists as dictionary.keys() makes the program run faster
wordClasses = {"s0": 1/16, "s1": 1/16, "s0s1": 1/16, "s1s0": 1/16, "s2s1": 1/16, "s1s2": 1/16, "s0s1s0": 1/16, "s0s1s2": 1/16, "s0s2s0": 1/16, "s2s1s0": 1/16, "s2s1s2": 1/16, "s1s0s1": 1/16, "s1s0s2": 1/16, "s1s2s1": 1/16, "s2s0s2": 1/16, "s2s0s1": 1/16}
rhymePatterns = {"AA": 0, "AE": 0, "AH": 0, "AO": 0, "AW": 0, "AY": 0, "EH": 0, "ER": 0, "EY": 0, "IH": 0, "IY": 0, "OW": 0, "OY": 0, "UH": 0, "UW": 0}

In [21]:
tail = 0
with open('grammar.fcfg', 'r') as f:
    lines = []
    for line in f:
        if line != '\n':
            tail += 1
            lines.append(line.strip())

In [22]:
def check_sentence_left(left):
    # True when the category name (the part before any feature bracket) is S
    return left.split('[')[0] == 'S'

In [23]:
def rule_add_stress(left, right, sp):
# left: one symbol
# right: [symbol1, symbol2] or [symbol1]
# sp: stress patterns for symbols 1 and 2 [sp1, sp2]
# return (left, right)

    if check_sentence_left(left):
        left_stress = left
    else:
        if '[' in left:
            ind = left.find('[')
            left_stress = left[:ind] + '_' + ''.join(sp) + left[ind:]
        else:
            left_stress = left + '_' + ''.join(sp)
        
    right_stress = []
    for i in range(len(right)):
        if '[' in right[i]:
            ind = right[i].find('[')
            right_stress.append(right[i][:ind] + '_' + sp[i] + right[i][ind:])
        else:
            right_stress.append(right[i] + '_' + sp[i])
            
    return left_stress + ' -> ' + right_stress[0] + ' ' + (right_stress[1] if len(right) > 1 else '')
    #left_stress, right_stress

In [24]:
def rule_add_rhyme(left, right, rp):
# left: one symbol
# right: [symbol1, symbol2] or [symbol1]
# rp: rhyme patterns for symbols 1 and 2 [rp1, rp2]

    if check_sentence_left(left):
        left_rhyme = left
    else:
        if '[' in left:
            ind = left.find('[')
            left_rhyme = left[:ind] + '_' + rp[-1] + left[ind:]
        else:
            left_rhyme = left + '_' + rp[-1]
    
    right_rhyme = []
    for i in range(len(right)):
        if '[' in right[i]:
            ind = right[i].find('[')
            right_rhyme.append(right[i][:ind] + '_' + rp[i] + right[i][ind:])
        else:
            right_rhyme.append(right[i] + '_' + rp[i])
    
    return left_rhyme + ' -> ' + right_rhyme[0] + ' ' + (right_rhyme[1] if len(right) > 1 else '')

In [25]:
def check_meter_add(sp1, sp2):
    sp1end = 's' + sp1.split('s')[-1]
    sp2begin = 's' + sp2.split('s')[1]
    if sp1end == 's0':
        if sp2begin == 's1':
            return True
        if sp2begin == 's0':
            return False
    if sp1end == 's1':
        if sp2begin == 's0':
            return True
        if sp2begin == 's1':
            return False
    if sp1end == 's2':
        sp1pu = 's' + sp1.split('s')[-2]
        if sp1pu == 's0':
            if sp2begin == 's0':
                return True
            else:
                return False
        elif sp1pu == 's1':
            if sp2begin == 's1':
                return True
            else:
                return False
        else:
            raise Exception("Two s2's in a row in sp1")
    if sp2begin == 's2':
        sp2sec = 's' + sp2.split('s')[2]
        if sp2sec == 's0':
            if sp1end == 's0':
                return True
            else:
                return False
        elif sp2sec == 's1':
            if sp1end == 's1':
                return True
            else:
                return False
        else:
            raise Exception("Two s2's in a row in sp2")

In [26]:
def check_iambic_pent(left, sp1, sp2):
    if check_sentence_left(left):
        if not check_meter_add(sp1, sp2):
            return False
        if len(sp1 + sp2) != 20:
            return False
        if 's' + sp1.split('s')[1] == 's1':
            return False
        if 's' + sp1.split('s')[1] == 's2' and len(sp1.split('s')[1:]) > 1 and 's' + sp1.split('s')[2] == 's0':
            return False
        return True
    else:
        return check_meter_add(sp1, sp2)

In [27]:
def concat(wordClasses):
    result = []
    for sp in wordClasses:
        result.append(sp)
    
    for sp1 in wordClasses:
        for sp2 in wordClasses:
            if check_meter_add(sp1, sp2):
                result.append(sp1 + sp2)
    
    limitHit = False
    for sp in result:
        if len(sp) >= 10:
            limitHit = True
            break
            
    if not limitHit:
        result = concat(result)

    return result

In [28]:
ls = wordClasses.keys()
wc = concat(ls)

In [29]:
def rewrite_grammar(wordClasses):
    grammar_section = False
    lexical_section = False
    index = 0

    while grammar_section == False:
        if lines[index] == '# Grammar Rules':
            grammar_section = True
        index += 1

    new_grammar = set()
    while lexical_section == False:
        if lines[index] == '# Lexical Rules':
            lexical_section = True
        if ' -> ' in lines[index]:
            left = lines[index].split(' -> ')[0]
            right = lines[index].split(' -> ')[1].split(' ')
            if len(right) == 1:
                for sp in wordClasses:
                    new_grammar.add(rule_add_stress(left, right, [sp]))
            else:
                for sp1 in wordClasses:
                    for sp2 in wordClasses:
                        if check_iambic_pent(left, sp1, sp2):
                            new_grammar.add(rule_add_stress(left, right, [sp1, sp2]))
                            #for rp1 in rhymePatterns:
                                #for rp2 in rhymePatterns:
                                    #sleft, sright = rule_add_stress(left, right, [sp1, sp2])
                                    #rule_add_rhyme(sleft, sright, [rp1, rp2])
                            
        index += 1

    lexical_rules = []
    while index < tail:
        lexical_rules.append(lines[index])
        index += 1

    
    with open('grammarnew.fcfg', 'w') as f:
        f.write("% start S\n")
        for rule in new_grammar:
            f.write(rule)
            f.write('\n')
        for rule in lexical_rules:
            f.write(rule)
            f.write('\n')

In [30]:
rewrite_grammar(wc)

In [31]:
pd = load_parser("grammarnew.fcfg", trace=0, cache=False)
gd = pd.grammar()

## Sonnet Generation Method 1: Using the NLTK Generator

Our first attempt at generation used NLTK's built-in generator. As expected, generation takes a very long time to run, so we stored the results of 100,000 sentence generations in a text file that can be used to create the sonnets. However, we have left in the code to generate about 100 sentences, which runs much faster and demonstrates the functionality of our generator without demanding the full night that it took to run such a large set of generations.

The NLTK generator has two major drawbacks. First, it does not take tags into account when generating sentences, so all the information we wanted the generator to respect had to be encoded in the labels of our grammar, making the grammar much bigger than it would have been if tags were an option. Second, the generator produces sentences in a fixed order, which means sentences following one very specific construction (e.g., “The baboon eats white fruit,” “The baboon eats white grapes,” “The baboon eats white trees”) take up the first thousand rows of generations. As we made our vocabulary bigger, the NLTK generator failed us more, because we did not have the processing power to run the millions of generations needed to reach the diversity of sentences our grammar had the potential to produce. This was our first attempt at generating a sonnet, and though it had some drawbacks, it did manage to produce something useful.
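The ordering problem can be illustrated without NLTK. The sketch below is an analogy rather than NLTK's actual code: it enumerates a toy sentence template with `itertools.product`, which, like a depth-first enumeration of a grammar, varies the last slot fastest. The word lists are a hypothetical miniature vocabulary.

```python
from itertools import islice, product

subjects = ["baboon", "lord", "wolf"]
verbs = ["eats", "hunts"]
adjectives = ["white", "chrome"]
objects = ["fruit", "grapes", "trees"]

def enumerate_sentences():
    """Yield every sentence in a fixed order, with the last slot changing fastest."""
    for subj, verb, adj, obj in product(subjects, verbs, adjectives, objects):
        yield ["the", subj, verb, adj, obj]

first_six = list(islice(enumerate_sentences(), 6))
for sentence in first_six:
    print(" ".join(sentence))
```

All six of the first sentences begin with "the baboon eats", even though the template admits 36 sentences in total. With a realistic vocabulary, the shared prefix persists for thousands of generations.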

In [32]:
def check_grammatical(p, s):
    try:
        t = next(p.parse(s))
        return True
    except StopIteration:
        return False
    except ValueError:
        return False

gen = generate(gd, depth=20)

s = next(gen)
generated_sents = []
for i in range(100):
    while not check_grammatical(pd, s) or s in generated_sents:
        s = next(gen)
    generated_sents.append(s)

Due to the time it takes to run generations, we have stashed 100,000 generated sentences in a text file, and the generator code here produces only 100 so that its functionality can be seen without stalling the whole notebook.

In [33]:
# For stashing the results of our call to generate in a file ##

# with open(r'sentences_new.txt', 'w') as fp:
#     for s in generated_sents:
#         # write each item on a new line
#         fp.write("%s\n" % s)
#     print('Done')

In [34]:
# For reading in the results of generate from a text file ##

test_sents = []
with open('sentences.txt') as file:
    for line in file:
        test_sents.append(line)
    print('Done')

new_lst = []
for i in test_sents:
    # Keep only the lowercase words on each line, dropping any punctuation around them
    new_lst.append(re.findall(r'[a-z]+', i[0:-1]))

generated_sents = new_lst

Done


As mentioned, the generation function built into NLTK produces sentences in a fixed order, so we shuffle the generated sentences to create as much variety as possible in the lines that go into our sonnets. This did not end up working that well, since the first 10,000 generations all contain the same first 4 words or so, but it helped a little.

In [35]:
# Shuffle indices 
random.shuffle(generated_sents)

In [36]:
len(generated_sents)

100000

In [37]:
dets = ['the', 'some', 'his', 'her', 'their', 'its', 'my', 'your', 'our']

def check_line(line, poem_so_far):
    """
    Returns True if no word in the candidate line (determiners aside) already
    appears in more than one line of the existing body of the sonnet.
    """

    for word in line:
        count = 0
        # Count how many earlier lines contain this word
        for prev_line in poem_so_far:
            if word in prev_line:
                count += 1
            
        if count > 1 and word not in dets:
            return False
            
    return True 

In [38]:
def generate_abab_stanza(generated_sents, rhyme_dict, poem_so_far=None):
    """
    Generates a quatrain with an a/b/a/b rhyme scheme based on the following parameters
    
    ARGUMENTS
    =========
    
    generated_sents : list
        generations produced from the grammar 
    
    rhyme_dict : dict
        dictionary of the rhyme associated with each word in the grammar 
    
    poem_so_far : list of lists
        the existing lines of the poem that have been accumulated so far
        (defaults to an empty poem)
    """
    
    r1 = random.randint(0, len(generated_sents) - 1)
    first_sent = generated_sents[r1]
    # A None default avoids sharing one mutable list across calls
    sents_list = poem_so_far if poem_so_far is not None else []
    
    while not check_line(first_sent, sents_list):
        r1 = random.randint(0, len(generated_sents) - 1)
        first_sent = generated_sents[r1]
    

    sents_list.append(first_sent)
    
    a_word = first_sent[-1]
    a = rhyme_dict[a_word]
    
    r2 = random.randint(0, len(generated_sents) - 1)
    second_sent = generated_sents[r2]
    
    while not check_line(second_sent, sents_list) or second_sent[-1] == a_word:
        r2 = random.randint(0, len(generated_sents) - 1)
        second_sent = generated_sents[r2]
    
    sents_list.append(second_sent)
    
    b_word = second_sent[-1]
    b = rhyme_dict[b_word]
   
    
    # shuffle indices 
    inds1 = list(range(0, len(generated_sents)))
    random.shuffle(inds1)
    
    i = 0
    while i < len(inds1):
        sent = generated_sents[inds1[i]]
        if rhyme_dict[sent[-1]] == a and sent not in sents_list and sent[-1] != a_word and check_line(sent, sents_list):
            sents_list.append(sent)
            break;
        i += 1
        if i == len(inds1):
            raise Exception("Sorry! Could not find a rhyme.")

    inds2 = list(range(0, len(generated_sents)))
    random.shuffle(inds2)
    
    j = 0
    while j < len(inds2):
        sent = generated_sents[inds2[j]]                    
        if rhyme_dict[sent[-1]] == b and sent not in sents_list and sent[-1] != b_word and check_line(sent, sents_list):
            sents_list.append(sent)
            break; 
        j += 1
        if j == len(inds2):
            raise Exception("Sorry! Could not find a rhyme.")
    
    return sents_list 

In [39]:
def generate_gg_couplet(generated_sents, rhyme_dict, poem_so_far):
    """
    Generates a rhyming couplet based on the following parameters
    
    ARGUMENTS
    =========
    
    generated_sents : list
        generations produced from the grammar 
    rhyme_dict : dict
        dictionary of the rhyme associated with each word in the grammar 
    poem_so_far : list of lists
        the existing lines of the poem that have been accumulated so far
    """
    
    r = random.randint(0, len(generated_sents) - 1)
    sent = generated_sents[r]
    sents_list = poem_so_far
    
    while not check_line(sent, sents_list):
        r = random.randint(0, len(generated_sents) - 1)
        sent = generated_sents[r]
    
    sents_list.append(sent)
          
    g_word = sent[-1]
    g = rhyme_dict[g_word]
    
    inds = list(range(0, len(generated_sents)))
    random.shuffle(inds)
    
    i = 0
    while i < len(inds):
        sent = generated_sents[inds[i]]          
        if rhyme_dict[sent[-1]] == g and sent not in sents_list and sent[-1] != g_word and check_line(sent, sents_list):
            sents_list.append(sent)
            break;
        i +=1
        if i == len(inds):
            raise Exception("Sorry! Could not find a rhyme.")
        
    return sents_list

## Building the Sonnet Take 1!

Now we get to put all of the pieces together to generate a sonnet! Due to the smaller size of our vocabulary and the constraints placed around stress and rhyme, it can take the generator a few tries to successfully compose a sonnet. As a result, it may take a moment for the poem to appear. We hope you enjoy!
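Once a poem has been produced, its rhyme scheme can be verified independently. The following is a minimal sketch of such a check; the function `follows_shakespearean_scheme` and the toy dictionary are illustrative assumptions, with lines represented as lists of words as they are elsewhere in this notebook.

```python
def follows_shakespearean_scheme(lines, rhyme_dict):
    """Check for three a/b/a/b quatrains followed by a rhyming couplet.

    Only the rhyme class of each line's final word is compared.
    """
    if len(lines) != 14:
        return False
    endings = [rhyme_dict[line[-1]] for line in lines]
    for start in (0, 4, 8):
        a1, b1, a2, b2 = endings[start:start + 4]
        if a1 != a2 or b1 != b2:
            return False
    return endings[12] == endings[13]

toy_rhymes = {"flare": "EH", "pens": "EH", "balloons": "UW",
              "typhoons": "UW", "loon": "UW", "croon": "UW"}
quatrain = [["the", "flare"], ["some", "balloons"], ["the", "pens"], ["some", "typhoons"]]
poem = quatrain * 3 + [["the", "loon"], ["some", "croon"]]
print(follows_shakespearean_scheme(poem, toy_rhymes))  # True
```

This sketch does not check for repeated words or repeated lines, which the generator handles separately through `check_line`.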

In [40]:
def sonnet(generated_sents, rhyme_dict):
    # Retry until every stanza finds the rhymes it needs
    while True:
        try:
            first_stage = generate_abab_stanza(generated_sents, rhyme_dict, [])
            second_stage = generate_abab_stanza(generated_sents, rhyme_dict, first_stage)
            third_stage = generate_abab_stanza(generated_sents, rhyme_dict, second_stage)
            final_sonnet = generate_gg_couplet(generated_sents, rhyme_dict, third_stage)
            return final_sonnet
        except Exception:
            # A stanza could not be completed (e.g. no rhyme was found), so start over.
            # Catching Exception rather than everything keeps Ctrl-C working.
            continue
            

In [41]:
final_sonnet = sonnet(generated_sents, rhyme_dict)

In [42]:
print("\n" + "=================SONNET================" + "\n")
count = 0
for line in final_sonnet:
    print(*line)
    count += 1
    if count == 4 or count == 8 or count == 12:
        print("\n")



malformed abhorrent lords replace the flare
malformed sedate typhoons replace balloons
sedate balloons replant the lowered pens
abhorrent prepaid notes escape typhoons


the knifelike bloodhound thunders lowered yolks
the chrome parades apprise the hurricane
the knifelike lawsuit basks rewired notes
the straightened upscale wolves dislocate days


the understood blockades remove the charm
the chrome cliched parades escape the lords
the upscale hurricane replants the jock
the prepaid bloodhound basks the straightened thorns


the modern wolves purloin the megaton
the duped cliched blockades dislocate stuffs


## Final Method: Creating Our Own Generator

After realizing that the NLTK generator was not sufficient for what we hoped to do with this project, we constructed a randomized generator that addresses the ordered-generation issue we were having previously. We also expanded our grammar to include more determiners, which rendered the NLTK generator functionally useless at creating diversity in its sentences, but made ours more proficient. 
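The core idea of randomized generation can be sketched in a few lines: starting from `S`, pick one applicable rule at random and expand its symbols recursively. The toy grammar and the function `expand_randomly` below are illustrative assumptions; unlike the generator in this notebook, the sketch ignores stress, rhyme, and syllable counts.

```python
import random

# Toy grammar: each nonterminal maps to a list of possible expansions.
toy_rules = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "Adj", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"], ["some"]],
    "Adj": [["modern"], ["chrome"]],
    "N": [["wolves"], ["lords"], ["balloons"]],
    "V": [["replace"], ["escape"]],
}

def expand_randomly(symbol, rules, rng):
    """Expand a symbol top-down, choosing uniformly among its rules."""
    if symbol not in rules:  # anything without a rule is treated as a word
        return [symbol]
    words = []
    for child in rng.choice(rules[symbol]):
        words.extend(expand_randomly(child, rules, rng))
    return words

rng = random.Random(0)  # seeded so repeated runs give the same sentences
for _ in range(3):
    print(" ".join(expand_randomly("S", toy_rules, rng)))
```

Because each rule choice is random, consecutive sentences no longer share a long common prefix the way ordered enumeration does.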

In [43]:
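def generate_random(g, rhyme_dict, start='S', rhyme_with=None, avoid_word=None, syllables=0, prev_node=None):
    """Randomly expand `start` into a list of words, or return None on failure.

    g is a list of rule strings of the form 'LHS -> RHS'. syllables is the number of
    syllables already placed in the line, and prev_node is the parent nonterminal.
    """
    # Base case: a quoted terminal such as 'wolves'.
    if start[0] == "'" and start[-1] == "'":
        word = start[1:-1]
        # A word that completes the 10-syllable line must belong to the requested rhyme class.
        if rhyme_with is not None and 10 - syllables == len(prev_node.split('_')[1].split('s')[1:]) and rhyme_dict[word] != rhyme_with:
            return None
        return [word]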
    
    # Collect every rule whose left-hand side is the symbol being expanded.
    randlst = [rule for rule in g if rule.split(' -> ')[0] == start]

    if len(randlst) == 0:
        return None

    rg = None
    while rg is None:
        sentence = []
        # Pick one of the remaining candidate rules at random.
        ri = random.randint(0, len(randlst) - 1)
        r = randlst[ri]
        left = r.split(' -> ')[0]
        right = r.split(' -> ')[1].split(' ')
        for i, item in enumerate(right):
            # The first child starts at the current syllable count; later
            # children start after the syllables encoded in the first child's name.
            if i == 0:
                syllablesNext = syllables
            else:
                syllablesNext = syllables + len(right[0].split('_')[1].split('s')[1:])
            rg = generate_random(g, rhyme_dict, start=item, rhyme_with=rhyme_with, avoid_word=avoid_word, syllables=syllablesNext, prev_node=left)
            if rg is None:
                break
            # Reject an expansion whose first word is the word to avoid.
            if avoid_word is not None and rg[0] == avoid_word:
                rg = None
                break
            sentence.extend(rg)

        # Discard a rule only when it failed. Previously a successful expansion
        # of the last remaining rule was also thrown away and None was returned.
        if rg is None:
            if len(randlst) == 1:
                return None
            randlst = randlst[0:ri] + randlst[ri+1:]

    return sentence

In [44]:
def generate_abab_stanza(grammar_list, rhyme_dict, poem_so_far=None):
    # Copy the incoming lines so neither the caller's list nor a shared default is mutated.
    sents_list = list(poem_so_far) if poem_so_far is not None else []

    # Line 1 sets rhyme class A.
    first_sent = generate_random(grammar_list, rhyme_dict)
    sents_list.append(first_sent)
    a_word = first_sent[-1]
    a = rhyme_dict[a_word]
    
    # Line 2 sets rhyme class B and must not reuse the A word.
    second_sent = generate_random(grammar_list, rhyme_dict, avoid_word=a_word)
    sents_list.append(second_sent)
    b_word = second_sent[-1]
    b = rhyme_dict[b_word]
    
    # Line 3 rhymes with A without repeating the A word.
    third_sent = generate_random(grammar_list, rhyme_dict, rhyme_with=a, avoid_word=a_word)
    sents_list.append(third_sent)
    
    # Line 4 rhymes with B without repeating the B word.
    fourth_sent = generate_random(grammar_list, rhyme_dict, rhyme_with=b, avoid_word=b_word)
    sents_list.append(fourth_sent)
    
    return sents_list

In [45]:
def generate_gg_couplet(grammar_list, rhyme_dict, poem_so_far):
    # Copy the incoming lines so the caller's list is not mutated.
    sents_list = list(poem_so_far)

    # Line 1 sets rhyme class G.
    first_sent = generate_random(grammar_list, rhyme_dict)
    sents_list.append(first_sent)
    a_word = first_sent[-1]
    a = rhyme_dict[a_word]
    
    # Line 2 rhymes with G without repeating the G word.
    second_sent = generate_random(grammar_list, rhyme_dict, rhyme_with=a, avoid_word=a_word)
    sents_list.append(second_sent)
    
    return sents_list

In [46]:
# Read the grammar as a list of rule strings, one production per non-blank line.
g = []
with open('grammarnew.fcfg', 'r') as f:
    for line in f:
        if line.strip():
            g.append(line.strip())
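`generate_random` assumes that every line of the grammar file is a single production of the form `LHS -> RHS`, and that a nonterminal's name encodes its stress pattern after an underscore, with one `s`-prefixed marker per syllable. The sketch below shows that convention on made-up rules; the symbol names and the helper `syllable_count` are illustrative assumptions, not lines taken from `grammarnew.fcfg`.

```python
def syllable_count(nonterminal):
    """Count the 's'-prefixed stress markers after the underscore."""
    pattern = nonterminal.split('_')[1]      # e.g. 's0s1'
    return len(pattern.split('s')[1:])       # e.g. ['0', '1'] -> 2

# Hypothetical productions in the one-rule-per-line format.
example_rules = [
    "S -> NP_s0s1s0s1 VP_s0s1s0s1s0s1",
    "N_s0s1 -> 'balloons'",
]

for rule in example_rules:
    left, right = rule.split(' -> ')
    print(left, '=>', right.split(' '))

print(syllable_count("N_s0s1"))              # 2
print(syllable_count("VP_s0s1s0s1s0s1"))     # 6
```

In this toy `S` rule the two children add up to ten syllables, which is the line length the rhyme check in `generate_random` tests against.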

## Building the Sonnet: Final Take!

Now we get to put all of the pieces together to generate a sonnet! We hope you enjoy!

In [49]:
first_stage = generate_abab_stanza(g, rhyme_dict, [])
second_stage = generate_abab_stanza(g, rhyme_dict, first_stage)
third_stage = generate_abab_stanza(g, rhyme_dict, second_stage)
final_sonnet = generate_gg_couplet(g, rhyme_dict, third_stage)

In [50]:
print("=================SONNET================" + "\n")
count = 0
for line in final_sonnet:
    print(*line)
    count += 1
    if count == 4 or count == 8 or count == 12:
        print("\n")


unpacked potatoes croon his good blockades
abhorrent snacks surround adroit balloons
abhorrent gateways acclimate her rays
cartoon pastel typhoons escape typhoons


their modern schedules tape my late campaign
malformed parades endure the bored lagoon
her lowered mailbag thunders straightened rays
the good sakura meets my squeaky troops


her chrome parades surround his straightened face
abhorrent frauds surround the underwood
her straightened lowered charm replants its rays
inept potatoes smother modern wolves


their lowered scowl awakens shorn balloons
her underwood acquires rotten troops
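One way to check how closely a generated poem follows the ABAB CDCD EFEF GG scheme is to compare the rhyme classes of the line-final words. The sketch below assumes the poem is a list of word lists and that the rhyme dictionary maps each word to a rhyme-class label, which is how `rhyme_dict` is used above. The name `follows_scheme`, the toy dictionary, and its class labels are hypothetical.

```python
SONNET_SCHEME = "ABABCDCDEFEFGG"

def follows_scheme(poem, rhyme_of, scheme=SONNET_SCHEME):
    """Return True if lines sharing a scheme letter share a rhyme class."""
    if len(poem) != len(scheme):
        return False
    seen = {}  # scheme letter -> rhyme class of the first line with that letter
    for line, letter in zip(poem, scheme):
        rhyme_class = rhyme_of[line[-1]]
        if seen.setdefault(letter, rhyme_class) != rhyme_class:
            return False
    return True

# Toy data: four-line ABAB stanzas with made-up rhyme-class labels.
toy_rhymes = {"flare": "air", "snare": "air", "balloons": "oonz",
              "typhoons": "oonz", "pens": "enz"}
good = [["the", "flare"], ["balloons"], ["a", "snare"], ["typhoons"]]
bad = [["the", "flare"], ["balloons"], ["the", "pens"], ["typhoons"]]

print(follows_scheme(good, toy_rhymes, scheme="ABAB"))  # True
print(follows_scheme(bad, toy_rhymes, scheme="ABAB"))   # False
```

Calling `follows_scheme(final_sonnet, rhyme_dict)` would apply the same check to a full fourteen-line poem. Note that the sketch only requires matching classes within a letter; it does not require different letters to use different classes.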
