<h1>Canterbury Tales <i>Redux</i></h1>
<br>
<h2>By Hunter Horst</h2>

<h3>Part 0: Conceptualization</h3>

When we first went over this project in class, I immediately thought:
<br> 
>Man, coming up with that many interesting words and inputting them all individually into different lists would be _really_ repetitive.<br>
>__How can I automate that?__ <br>

I think it's a habit I picked up from other coding classes: good code avoids repeating itself. So, from the beginning, I knew that I wanted to automate the creation of my candidate words.

Both for poetry and out of general fascination, older English dialects have interested me for some time now. I like both Middle and Old English, though I can't actually speak or read either beyond a few fragments. That's one reason why I chose to derive my words from the Canterbury Tales. It's a well-studied historical text with plenty of academic coverage, so I figured it would be easy to find a glossary of terms from its text with accompanying definitions and parts of speech (I was correct). Also, I thought it would sound cool.

I managed to find a website that catalogued Middle English words from the Canterbury Tales alongside their part of speech and a translation: 

[librarius.com](http://www.librarius.com/gy.htm)

Hopefully, by the time this gets turned in, I'll have thanked the authors of this website, as it was exactly what I needed.

Originally, I intended to create a way for me to sort through the large glossary, and pick out the best words. That's how I planned to exercise my artistic control over my poetry. That didn't end up happening.

<h3>Part 1: Working with the HTML of English majors</h3>

I started with a program that scraped the glossary from the website I found.
<br>
>Just copying and pasting directly from the site seemed 'hacky': why do it manually when I can write code to do it for me?


In [1]:
import requests, json
import bs4 #BeautifulSoup
import re

In [2]:
url = "http://www.librarius.com/gy.htm"
request = requests.get(url)
raw = request.content    #the raw HTML data

In [3]:
soup = bs4.BeautifulSoup(raw, 'html.parser') #a BS4 object that allows searching and retrieval of individual HTML elements

In [4]:
#print(soup)

However, it turns out that the HTML of this website is formatted rather poorly.
<h4>Issues with this website:</h4>

1. The first word is in a different type of tag than the others.
2. The entire body (after the first element) is in one giant 'p' tag, so each word and its information aren't actually in their own HTML tags.

This _kind of_ defeated the purpose of scraping the data in the first place, though I did find a solution:

In [5]:
entries = soup('p', limit=1)    #searches for the first 'p' element
text = str(entries)
mylist = text.split('\n')       #since each word is in its own row, this should return every word by itself
print(len(mylist))    #theoretically the number of words, but includes some junk
#print(text)

2050


In [6]:
print(mylist[:10])

['[<p>', '<a name="abegge"><b>abegge</b> <small><i>verb</i></small> pay for (it)<p>', '<a name="abyde"><b>abid, abyd, abyde</b> <small><i>verb, prsnt.</i></small> remain, await, wait; <b>abood</b> <small><i>verb, pst.</i></small> awaited, remained<p>', '<a name="abideth"><b>abideth, abydeth</b> <small><i>verb</i></small> awaits<p>', '<a name="abidyng"><b>abidyng</b> <small><i>verb</i></small> awaiting<p>', '<a name="able"><b>able</b> <small><i>adj.</i></small> suitable<p>', '<a name="abluciouns"><b>abluciouns</b> <small><i>noun</i></small> cleansings<p>', '<a name="abood"><b>abood</b> <small><i>noun</i></small> delay<p>', '<a name="above"><b>above</b> <small><i>adj.</i></small> superior<p>', '<a name="abregge"><b>abregge</b> <small><i>verb</i></small> abridge, shorten<p>']


I still had to retrieve the first word, which I did like this:

In [7]:
first_entry = soup.find('a', attrs={'name': re.compile('.*')})   #finds the first 'a' tag with a 'name' attribute (containing anything)
first_entry = str(first_entry)                                   #this corresponds to the first word, which is not in a 'p' element

However, because everything's in one tag, this actually returns a string of the entire glossary.

In [8]:
#print(first_entry)

Looking back, I could have just extracted my words directly from this string. But in the moment, I was too focused on finding the first word, so I kept my previous method, even though it clearly had flaws. That was a small mistake on my part.
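Here is a minimal sketch of that simpler route. It assumes the whole glossary is available as one string (in the notebook, `str(first_entry)`). The approach is to split that string on newlines and keep only the rows that open an `<a` tag, which also discards the orphaned end tags. The sample string is a hypothetical stand-in assembled from rows shown in the outputs above. `split_entries` is an illustrative helper, not code from the project.

```python
# Hypothetical stand-in for str(first_entry): a few rows copied from the outputs
# above, plus trailing end tags like the orphaned ones at the bottom of the page.
glossary = (
    '<a name="abayst"><b>abayst</b> embarrassed<p>\n'
    '<a name="abegge"><b>abegge</b> <small><i>verb</i></small> pay for (it)<p>\n'
    '<a name="able"><b>able</b> <small><i>adj.</i></small> suitable<p>\n'
    '</p></a>'
)


def split_entries(html_text):
    """Return one string per glossary row, keeping only rows that start with an <a tag."""
    return [row for row in html_text.split('\n') if row.startswith('<a')]


entries = split_entries(glossary)
print(len(entries))  # 3
print(entries[0])    # <a name="abayst"><b>abayst</b> embarrassed<p>
```

This approach would have made the separate `actual_first` step unnecessary, because the first word is just another row of the same string.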

In [9]:
actual_first = first_entry.split('\n')[0]   #gets the first row (first word) from the giant 'a' tag

In [10]:
print(actual_first)

<a name="abayst"><b>abayst</b> embarrassed<p>


Since the first member of mylist is just the start of a tag, I replace it with actual_first:

In [11]:
mylist[0] = actual_first

In [12]:
print(mylist[:10])

['<a name="abayst"><b>abayst</b> embarrassed<p>', '<a name="abegge"><b>abegge</b> <small><i>verb</i></small> pay for (it)<p>', '<a name="abyde"><b>abid, abyd, abyde</b> <small><i>verb, prsnt.</i></small> remain, await, wait; <b>abood</b> <small><i>verb, pst.</i></small> awaited, remained<p>', '<a name="abideth"><b>abideth, abydeth</b> <small><i>verb</i></small> awaits<p>', '<a name="abidyng"><b>abidyng</b> <small><i>verb</i></small> awaiting<p>', '<a name="able"><b>able</b> <small><i>adj.</i></small> suitable<p>', '<a name="abluciouns"><b>abluciouns</b> <small><i>noun</i></small> cleansings<p>', '<a name="abood"><b>abood</b> <small><i>noun</i></small> delay<p>', '<a name="above"><b>above</b> <small><i>adj.</i></small> superior<p>', '<a name="abregge"><b>abregge</b> <small><i>verb</i></small> abridge, shorten<p>']


That's better. And my reward? Getting to work on the hard part.

_there were harder parts coming up but I didn't know that yet_

<h3>Part 2: Regex Fun and Custom Classes</h3>

Now that I had my text, I had to formalize what exactly I wanted from it. Since I can't read Middle English, and neither can my audience, I wanted to provide a translation of the generated poem, and therefore of every word in it. I also needed to keep track of the parts of speech (POS, pos) of each word. I chose an object-oriented approach, partially because I thought it was the best option for associating and accessing multiple attributes, and partially because I knew I would enjoy coding it.

First, I had to turn my giant string from earlier into convenient data. Since each word has relatively consistent formatting (some mistakes and irregularities but that's to be expected), I could use regex. Thanks to regex101.com, I was able to make this pattern:

In [13]:
CTpattern = r"""(?x)
(?P<first_form><a(.*)?\sname=\"(?P<name>.+?)\"><b>(?P<middle_eng_term1>.*?)</b>\s?(<small><i>(?P<pos1>.*?)?</i></small>)?(?P<definition1>.*?))[<;]
(?P<second_form>\s<b>(?P<middle_eng_term2>.*?)</b>\s?(<small><i>(?P<pos2>.*?)?</i></small>)?(?P<definition2>.*?)[<;])?
(?P<extra>.*)?"""

This pattern captures almost every word without issue and sorts the important information into convenient named groups.

IMPORTANT NOTE: I developed this pattern over 30-60 minutes of debugging, which I will document below. I did not start with this, but I don't have the failed patterns to showcase here.
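To show how the named groups come out, the block below runs the pattern above on two rows copied from the glossary output. One behavior of Python's `re` module is worth noting. When an optional group (such as `pos1`, or the whole `second_form` block) does not take part in the match, looking it up returns `None` rather than raising an error. Subscripting only fails when `re.search` itself finds nothing and returns `None`.

```python
import re

# The glossary pattern from the cell above, reproduced verbatim.
CTpattern = r"""(?x)
(?P<first_form><a(.*)?\sname=\"(?P<name>.+?)\"><b>(?P<middle_eng_term1>.*?)</b>\s?(<small><i>(?P<pos1>.*?)?</i></small>)?(?P<definition1>.*?))[<;]
(?P<second_form>\s<b>(?P<middle_eng_term2>.*?)</b>\s?(<small><i>(?P<pos2>.*?)?</i></small>)?(?P<definition2>.*?)[<;])?
(?P<extra>.*)?"""

# Two rows taken from the glossary output shown earlier.
plain = '<a name="abayst"><b>abayst</b> embarrassed<p>'
two_forms = ('<a name="abyde"><b>abid, abyd, abyde</b> <small><i>verb, prsnt.</i></small> '
             'remain, await, wait; <b>abood</b> <small><i>verb, pst.</i></small> awaited, remained<p>')

m1 = re.search(CTpattern, plain)
print(m1['name'], m1['pos1'])        # abayst None  (no <small> tag, so pos1 is None)
print(m1['middle_eng_term2'])        # None: no second form, and no exception either

m2 = re.search(CTpattern, two_forms)
print(m2['middle_eng_term1'], '|', m2['pos1'])   # abid, abyd, abyde | verb, prsnt.
print(m2['middle_eng_term2'], '|', m2['pos2'])   # abood | verb, pst.
```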

Next, I defined my class:

In [14]:
class Fword:
    """
    Short for 'fancy word'. An object to hold a word and its 'definition' (for whatever reason I thought of the descriptions as definitions instead of translations)

    Attributes:
        name (str): the word, spelled out
        form (str): the first variation of the word
        pos (str): the first form's part of speech
        deff (str): the word's first definition
        form2 (str): the second variation
        pos2 (str): the second part of speech
        deff2 (str): the second definition
        extra (str): anything else not captured by my named groups
    """
    def __init__(self, name, form, pos, definition, form2=None, pos2=None, def2=None, extra=None):
        """
        Initialize an Fword.

        Arguments:
            just the same as the object's attributes
        """
        self.name = name
        self.form = form
        self.pos = pos
        self.deff = definition
        self.form2 = form2
        self.pos2 = pos2
        self.deff2 = def2
        self.extra = extra
    def __str__(self):
        """
        Easy-to-read representations.

        At one point, this method returned a fancy description, listing out the word and its forms. I wanted it to be clear when an Fword was printed vs. a string.
        However, that was annoying to read.

        Returns:
            str: self.name
        """
        if True:
            return self.name
        if self.form2:
            return f"fword object of {self.name}, with forms {self.form}, {self.form2}"
        else:
            return f"fword object of {self.name}, with form(s) {self.form}"
    def __repr__(self):
        """
        Slightly less easy-to-read representations.

        This was meant to be the shorter version to save space, but after I simplified __str__, it just marks the word as an Fword object.

        Returns:
            str: the Fword's name, labeled as an Fword object.
        """
        return f"fwordObject {self.name}"
    def describe(self):
        """
        Provide an Fword's attributes in a nice, human-readable format.

        Returns:
            str: a formatted string of each labeled attribute
        """
        if self.form2 and self.pos2 and self.deff2:
            return f"name: {self.name}, first form: {self.form}, part of speech: {self.pos}, definition: {self.deff}, second form: {self.form2}, part of speech: {self.pos2}, definition: {self.deff2}"
        else:
            return f"name: {self.name}, first form: {self.form}, part of speech: {self.pos}, definition: {self.deff}"

def get_pos(pos, fwords):
    """
    Filter an iterable of Fwords by their parts of speech.

    Arguments:
        pos (str): the part of speech to filter by
        fwords (list, tuple, set, frozenset): the collection of Fword objects to filter

    Returns:
        list: those Fwords that match the given pos
    """
    selection = []
    for word in fwords:
        if pos in word.pos or (word.pos2 is not None and pos in word.pos2):
            selection.append(word)
    return selection

This was actually relatively easy to write; I'm fairly sure that I didn't need to edit this part. The next section definitely did, though: creating the Fword objects, using my regex pattern to locate the necessary information.

In [15]:
defined_words = []
x = 0
for line in mylist:
    raw_word = re.search(CTpattern, line)
    try:
        word_name = raw_word['name']
        form1 = raw_word['middle_eng_term1']
        form1_pos = raw_word['pos1']
        if form1_pos is None:                #if there was no match for the first form POS
            form1_pos = 'other'              #just call it 'other'
        form1_def = raw_word['definition1']
        try:                                 #optional groups that didn't match come back as None rather than raising,
            form2 = raw_word['middle_eng_term2']  #so in practice every word takes this branch, with None standing in
            form2_pos = raw_word['pos2']          #for a missing second form, pos and definition
            form2_def = raw_word['definition2']
            try:                                  #'extra' is always a string here (possibly empty)
                extra = raw_word['extra']
                word = Fword(word_name, form1, form1_pos, form1_def, form2=form2, pos2=form2_pos, def2=form2_def, extra=extra)
            except TypeError as e:                #fallback kept from development; not reached in practice
                word = Fword(word_name, form1, form1_pos, form1_def, form2=form2, pos2=form2_pos, def2=form2_def)
        except TypeError as e:                    #fallback kept from development; not reached in practice
            word = Fword(word_name, form1, form1_pos, form1_def)
        defined_words.append(word)
        x += 1
    except TypeError as e:   #This both helped me find problem words during debugging,
        print(x)             #and cuts off the orphaned endtags (like </p>) in their own lines at the end
        print(e)
        break

2029
'NoneType' object is not subscriptable


Like I said, this took plenty of debugging to work out. Here are some examples of what I checked to see whether it worked:

In [16]:
print(defined_words[0])

abayst


In [17]:
defined_words[2]

fwordObject abyde

At this point, I added the try-catch to print where the errors were happening. First, it errored out on word 1282:

In [18]:
mylist[1282]

'<a "="" name="peyre of tables"><b>peyre of tables</b> folding set of writing tablets<p>'

This was because I failed to account for spaces inside the name or form. After I accounted for spaces, it errored out on word 1983:

In [19]:
mylist[1983]

'<a name="ycleped"><b>ycleped (cleped)</b> <small><i>verb, pst. sg.</i></small> called<p>'

This was because the word uses parentheses to mark an alternative spelling, and I was only matching word characters and spaces. After this, I rediscovered the lazy quantifier, which made things much easier. Since I knew exactly what came after each name and definition, I could match any character, and the lazy quantifier would ensure I only matched what was necessary.
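Here is a small before-and-after of both fixes, run on rows copied from the outputs above. A character class of word characters and whitespace (`[\w\s]+`) cannot get past the `(` in `ycleped (cleped)`, whereas `.` accepts it. The greedy `.*` runs on to the *last* `</b>` in a two-form entry, while the lazy `.*?` stops at the first one. These simplified patterns are illustrative only, not the full `CTpattern`.

```python
import re

two_forms = ('<a name="abyde"><b>abid, abyd, abyde</b> <small><i>verb, prsnt.</i></small> '
             'remain, await, wait; <b>abood</b> <small><i>verb, pst.</i></small> awaited, remained<p>')
paren_line = '<a name="ycleped"><b>ycleped (cleped)</b> <small><i>verb, pst. sg.</i></small> called<p>'

# Greedy vs. lazy: both start at the first <b>, but only the lazy one stops at the first </b>.
greedy = re.search(r'<b>(.*)</b>', two_forms).group(1)
lazy = re.search(r'<b>(.*?)</b>', two_forms).group(1)
print(lazy)     # abid, abyd, abyde
print(greedy)   # abid, abyd, abyde</b> <small><i>verb, prsnt.</i></small> remain, await, wait; <b>abood

# Word characters and spaces only: the parentheses make the match fail entirely.
word_chars_only = re.search(r'<b>([\w\s]+)</b>', paren_line)
any_char_lazy = re.search(r'<b>(.*?)</b>', paren_line).group(1)
print(word_chars_only)   # None
print(any_char_lazy)     # ycleped (cleped)
```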

Now it errors out on an intended non-match, exiting the for loop with my list of defined words complete and intact.
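Because the loop relies on the `TypeError` from subscripting `None` to stop, a single malformed row in the *middle* of the glossary would also end the loop early and silently drop every word after it. The sketch below checks for a failed match explicitly and sets that row aside instead. `parse_rows` and the simplified `ENTRY` pattern are hypothetical, for illustration only.

```python
import re

# Simplified stand-in for CTpattern: only the name and the first Middle English form.
ENTRY = re.compile(r'<a.*?\sname="(?P<name>.+?)"><b>(?P<form>.*?)</b>')


def parse_rows(rows):
    """Return (parsed, skipped): (name, form) pairs, plus the rows that did not match."""
    parsed, skipped = [], []
    for row in rows:
        match = ENTRY.search(row)
        if match is None:          # re.search returns None for a non-match instead of raising
            skipped.append(row)
            continue               # keep going rather than ending the whole loop
        parsed.append((match['name'], match['form']))
    return parsed, skipped


rows = [
    '<a name="abayst"><b>abayst</b> embarrassed<p>',
    '</p>',                        # an orphaned end tag, like those at the end of mylist
    '<a name="able"><b>able</b> <small><i>adj.</i></small> suitable<p>',
]
parsed, skipped = parse_rows(rows)
print(parsed)    # [('abayst', 'abayst'), ('able', 'able')]
print(skipped)   # ['</p>']
```

Printing `skipped` afterward gives the same debugging information the `print(x)` in the original loop did, without stopping at the first problem row.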

With my words created and readable, I worked on understanding and interpreting them. Since there are 2029 individual words, I decided not to read through them all and comb them for discrepancies. Instead, I began with parts of speech and data validation:

In [20]:
len(get_pos('noun', defined_words))

1070
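`get_pos` tests *substring* membership. As a result, the 1070 "nouns" above also include labels such as `verbal noun` and `noun &amp; adj.`, which means a verbal noun can end up in both the noun pool and the participle pool. The sketch below shows the difference using a hypothetical namedtuple stand-in for Fword. An exact-match test is not strictly better, because it would also drop labels like `noun, sg.`. Which behavior is preferable depends on whether that mixing is wanted.

```python
from collections import namedtuple

# Hypothetical stand-in for Fword with only the attributes get_pos reads.
Word = namedtuple('Word', ['name', 'pos', 'pos2'])

words = [
    Word('abluciouns', 'noun', None),
    Word('avauntyng', 'verbal noun', None),
    Word('able', 'adj.', None),
]


def get_pos_substring(pos, fwords):
    """Same test as get_pos: pos appears anywhere inside the word's pos or pos2."""
    return [w for w in fwords if pos in w.pos or (w.pos2 is not None and pos in w.pos2)]


def get_pos_exact(pos, fwords):
    """Hypothetical stricter variant: the trimmed label must equal pos exactly."""
    return [w for w in fwords
            if w.pos.strip() == pos or (w.pos2 is not None and w.pos2.strip() == pos)]


print([w.name for w in get_pos_substring('noun', words)])  # ['abluciouns', 'avauntyng']
print([w.name for w in get_pos_exact('noun', words)])      # ['abluciouns']
```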

In [21]:
def get_none(fwords):
    """
    Check whether any failed Fword object creations are in a list.

    I believe I had a few None objects come up later, so I made this to help check and debug.

    Arguments:
        fwords (list, tuple, set, frozenset): an iterable of fwords to verify

    Side effects:
        prints the list element if it raises an error
    """
    for word in fwords:
        try:
            if 'noun' in word.pos:    #a slightly hacky way of checking if word is a valid Fword object
                pass
        except (AttributeError, TypeError):   #None has no .pos (AttributeError); a None pos can't be searched (TypeError)
            print(word)

In [22]:
get_none(defined_words)

In [23]:
def get_possible_pos(fwords):
    """
    Find all unique parts of speech in an iterable of Fwords.

    Arguments:
        fwords (list, tuple, set, frozenset): an iterable of fwords to search through
    
    Returns:
        list: every unique POS as a string
    """
    posses = []
    for word in fwords:
        if word.pos not in posses:
            posses.append(word.pos)
        try:                                           #If a word has a second form
            if word.pos2 not in posses and word.pos2:  #check its pos too
                posses.append(word.pos2)               
        except AttributeError:                         #If not, just carry on
            pass
    return posses

In [24]:
print(get_possible_pos(defined_words))

['other', 'verb', 'verb, prsnt.', 'verb, pst.', 'adj.', 'noun', 'prep.', 'verb, prsnt', 'verb, pst. prtcpl.', 'verb, pst. sg.', 'conj.', 'adv.', 'verb, prsnt. sg.', '(Latin)', 'noun, sg.', 'noun, pl.', 'verb, pst. sg', 'verbal noun', 'verb, 3rd prs. sg.', 'adverb', ' noun', 'verb, pst', 'verb, 3rd prs. prsnt.', 'noun sg.', 'verb, prsnt, 3rd prs. sg.', 'adv. ', 'pst.', 'verb, 1st prs. sg. prst', 'verb, 3rd prs. sg. prst.', 'comp.', '', 'verb, 2nd prs. sg. prsnt.', 'verb, 3rd prs. sg. prsnt.', 'verb, pst. ', 'pro.', 'adj. superlative', 'noun &amp; adj.', 'noun, pl', 'adv. superlative', 'verb, prs. prtcpl.', 'num.', 'noun &amp; verb', 'interj.', 'comparative', 'pro. ', 'adj. ', 'verb. pst.', 'verb, verbal noun', 'noun pl.', '(French)', 'adj. &amp; noun', 'verb, 2nd prs. sg.', 'verb, 3rd prs.sg.', 'verb, prtcpl.', 'adv.</i> </small> closely; <small><i>adj.', 'adv. &amp; noun', 'adj', '3rd prs. sg.', 'adv</i>.</small> 1. gently; 2. quietly; <small><i>adj.', 'conj. &amp; prep.', 'demonstr. a

Obviously this process has flaws. Some entries are actually phrases in other languages, and the language name comes through as the pos. At least one word has an empty string as its pos. Ampersands show up as the HTML entity `&amp;`. And twice, a pos captured information from a second form. I believe that last issue happens because those words have different forms with different parts of speech but identical spellings, so the website's authors didn't repeat the form name, which threw off my regex.
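Several of these flaws are cosmetic and could be tidied after parsing. The sketch below is a hypothetical `normalize_pos` helper built on the standard library's `html.unescape`. It decodes the entities, trims and collapses whitespace, and maps empty labels to `'other'`. It does not repair the two labels that swallowed a second form; those would still need a regex fix.

```python
import html


def normalize_pos(raw_pos):
    """Hypothetical clean-up: decode HTML entities and tidy whitespace in a POS label."""
    if raw_pos is None:
        return 'other'
    cleaned = ' '.join(html.unescape(raw_pos).split())  # '&amp;' -> '&', trim and collapse spaces
    return cleaned or 'other'                            # treat an empty label like a missing one


# A few labels copied from the get_possible_pos output above.
labels = ['noun', ' noun', 'adv. ', 'adv.', 'noun &amp; adj.', '', 'verb, pst. ', 'verb, pst.']
unique = list(dict.fromkeys(normalize_pos(p) for p in labels))  # dedupe, keeping first-seen order
print(unique)  # ['noun', 'adv.', 'noun & adj.', 'other', 'verb, pst.']
```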

All things considered, though, it's worked out quite well. I have all the information I need to start making poems, and plenty I don't need.

First, though, I made a list of all the words with 'weird' pos's:

In [25]:
for i in defined_words:
    for j in ('(', 'comp.', 'pro.', 'num.', 'verbal noun', 'prtcpl.', '<', 'comparative', 'demonstr.', 'p.'):
        if i.pos == 'pst.' or not i.pos or j in i.pos:
            print(i.describe())
            break

name: adoun, first form: adoun, part of speech: prep., definition:  below
name: after-mete, first form: after-mete, part of speech: prep., definition:  the time after diner
name: agast, first form: agast, part of speech: verb, pst. prtcpl., definition:  frightened
name: amended, first form: amended, part of speech: verb, pst. prtcpl., definition:  improved, corrected
name: amor vincit omnia, first form: Amor vincit omnia, part of speech: (Latin), definition:  Love conquers all
name: arrayed, first form: arrayed, part of speech: verb, pst. prtcpl., definition:  prepared, adorned
name: at erst, first form: at erst, part of speech: prep., definition:  for the first time
name: avauntyng, first form: avauntyng, part of speech: verbal noun, definition:  boasting
name: ave marie, first form: Ave Marie, part of speech: (Latin), definition:  Hail Mary (first words of a Latin prayer)
name: berkyng, first form: berkyng, part of speech: verbal noun, definition:  barking
name: bitwix, first form: b

In [26]:
nouns = get_pos('noun', defined_words)
print(len(nouns))

1070


In [27]:
for noun in nouns[:10]:
    print(noun.describe())

name: abluciouns, first form: abluciouns, part of speech: noun, definition:  cleansings
name: abood, first form: abood, part of speech: noun, definition:  delay
name: accidie, first form: accidie, part of speech: noun, definition:  sloth, laziness
name: accord, first form: accord, acord, part of speech: noun, definition:  agreement
name: achatours, first form: achatours, part of speech: noun, definition:  buyers
name: affiance, first form: affiance, part of speech: noun, definition:  trust
name: agu, first form: agu, part of speech: noun, definition:  acute fever
name: aiel, first form: aiel, part of speech: noun, definition:  grandfather 
name: aketoun, first form: aketoun, part of speech: noun, definition:  wadded jacket worn under the chain-mail coat
name: alauntz, first form: alauntz, part of speech: noun, definition:  wolfhounds


Thus concluded my regex party.

<h3>Part 3: Display</h3>

Some bad news: I made many edits to this code, both on the day I initially wrote it and afterward, and absolutely did not document most of them. That means I won't be able to provide as detailed an account of my coding from here on as I have so far.

Good news: I did actually document some of my thoughts in the moment, so I will be able to cover some things.

More good news: That means this section is less about my coding decisions, and more about my creative ones.

First, I will reproduce my pre-coding notes from September 26, the day I wrote most of this code, in full:

9/26
<h1>What I want:</h1>
a fast method for selecting good words/phrases that's also repeatable  
<h1>How to accomplish that:</h1>

1. A function that displays a word and its info, one at a time, and I hit a button to save it or reject it
2. Literally completely at random
<h2>Option 1</h2>

- hit 1 for save, 2 for reject
- should save my choices when reloading jupyter
- much harder to implement
- more artistic control/input

<h2>Option 2</h2>

- very easy to implement
- little to no control
- may be harder to select phrases for fourth line

First: just write Option 2, just in case Option 1 takes too long

As we can see, I still thought I would be curating the words, but decided to write the 'easy' option first. This was a good decision.

From here, I created variables to hold the words for each pos I needed, before getting once again into hardcore OOP:

In [28]:
nouns = get_pos('noun', defined_words)
adjectives = get_pos('adj.', defined_words)
participles = get_pos('prtcpl.', defined_words) + get_pos('verbal noun', defined_words)

I included 'verbal nouns' with participles because there were only 20-ish regular participles, and most of the verbal noun translations ended in '-ing' anyway. I believe this was a good choice: I like the variety.

In [29]:
print(len(adjectives), len(participles))

238 40


In [30]:
print(participles)

[fwordObject agast, fwordObject amended, fwordObject arrayed, fwordObject clad, fwordObject depeint, fwordObject depeynted, fwordObject encombred, fwordObject eschawfed, fwordObject fletynge, fwordObject floytynge, fwordObject kembd, fwordObject kithed, fwordObject mowled, fwordObject mysboden, fwordObject offended, fwordObject rownynge, fwordObject wist, fwordObject ybete, fwordObject yiven, fwordObject avauntyng, fwordObject berkyng, fwordObject chidyng, fwordObject conseillyng, fwordObject eschawfyne, fwordObject galpyng, fwordObject grucchyng, fwordObject herknynge, fwordObject janglerie, fwordObject mordrynge, fwordObject murmur, fwordObject prechyng, fwordObject prikyng, fwordObject rowtyng, fwordObject saluyng, fwordObject stryvyng, fwordObject sublymyng, fwordObject travaillynge, fwordObject unyolden, fwordObject venerie, fwordObject wrastlynge]


In [31]:
from random import choice, choices, randint, sample
import os
import time
import IPython
import IPython.display
from IPython.display import HTML, display
import json
import base64

In [32]:
class Diamante:
    """
    An object that creates a poem from given sets of words and randomness.

    Attributes:
        topic1, topic2 (Fword): the two topic words of this diamante poem, in order
        adj1, adj2, adj3, adj4 (Fword): the four adjectives of this diamante poem, in order
        p1, p2, p3, p4, p5, p6 (Fword): the six participles of this diamante poem, in order
        template1, template2 (str): a string, meant to work with str.format(), representing the two possible formats for the middle line
            a line using template1: "[noun1] & [noun2], [noun3] & [noun4]"
            a line using template2: "[adjective1] [noun1] and [adjective2] [noun2]"
        phrasenouns (list): four random noun Fword objects. At least two will always get used
        phraseadjs (list): two random adjective Fword objects. Depending on the template, may or may not get used
        phrase1 (str): the first half of the middle line, inserting the proper phrase words into the randomly chosen template
        phrase2 (str): the second half of the middle line, inserting the proper phrase words into the randomly chosen template
        phrase (str): the entire middle line, connecting the two halves differently depending on the template
        phrase_def (str): the translated phrase
    """
    def __init__(self, words, nouns, adjs, prtcpls):
        """
        Initialize a Diamante poem/object.

        Arguments:
            words (iter): can be any iter, but I use a list. apparently serves no purpose.
            nouns (iter): ''. The pool of nouns to choose from
            adjs (iter): ''. The pool of adjectives to choose from
            prtcpls (iter): ''. The pool of participles to choose from

        Side effects:
            sets attributes
        """
        self.topic1, self.topic2 = choices(nouns, k=2)
        self.adj1, self.adj2, self.adj3, self.adj4 = choices(adjs, k=4)
        self.p1, self.p2, self.p3, self.p4, self.p5, self.p6 = choices(prtcpls, k=6)
        self.template1 = "{0} & {1}"
        self.template2 = "{0} {1}" #adj + noun (repeated)
        self.phrasenouns = choices(nouns, k=4)
        self.phraseadjs = choices(adjs, k=2)
        if not randint(0, 1):
            self.phrase1 = self.template1.format(*self.phrasenouns[:2])
            self.phrase2 = self.template1.format(*self.phrasenouns[2:])
            self.phrase = f'{self.phrase1}, {self.phrase2}'
            self.phrase_def = f'[{self.phrasenouns[0].deff}] & [{self.phrasenouns[1].deff}], [{self.phrasenouns[2].deff}] & [{self.phrasenouns[3].deff}]'
        else:
            self.phrase1 = self.template2.format(self.phraseadjs[0], self.phrasenouns[0])
            self.phrase2 = self.template2.format(self.phraseadjs[1], self.phrasenouns[1])
            self.phrase = f'{self.phrase1} and {self.phrase2}'
            self.phrase_def = f'[{self.phraseadjs[0].deff}] [{self.phrasenouns[0].deff}] and [{self.phraseadjs[1].deff}] [{self.phrasenouns[1].deff}]'
    
    def makePoem(self):
        """
        Print the generated poem and initiate saving.

        Side effects:
            prints each line of the poem in a pretty format.
            asks the user whether they want to save this poem, and saves it if they do
        """
        print()
        display(HTML(f'<center><font face="Canterbury" size="4"> {self.topic1.name} </font></center>'))
        display(HTML(f'<center><font face="garamond" size="4"> {self.adj1.name}  and {self.adj2.name}</font></center>'))
        display(HTML(f'<center><font face="garamond" size="4"> {self.p1.name}, {self.p2.name}, {self.p3.name}</font></center>'))
        display(HTML(f'<center><font face="garamond" size="4">{self.phrase}</font></center>'))
        display(HTML(f'<center><font face="garamond" size="4"> {self.p4.name}, {self.p5.name}, {self.p6.name}</font></center>'))
        display(HTML(f'<center><font face="garamond" size="4"> {self.adj3.name}  and {self.adj4.name}</font></center>'))
        display(HTML(f'<center><font face="garamond" size="4"> {self.topic2.name} </font></center>'))
        print()
        display(HTML(f'<center><font face="garamond" size="4"> [{self.topic1.deff}] </font></center>'))
        display(HTML(f'<center><font face="garamond" size="4"> [{self.adj1.deff}]  and [{self.adj2.deff}]</font></center>'))
        display(HTML(f'<center><font face="garamond" size="4"> [{self.p1.deff}], [{self.p2.deff}], [{self.p3.deff}]</font></center>'))
        display(HTML(f'<center><font face="garamond" size="4">{self.phrase_def}</font></center>'))
        display(HTML(f'<center><font face="garamond" size="4"> [{self.p4.deff}], [{self.p5.deff}], [{self.p6.deff}]</font></center>'))
        display(HTML(f'<center><font face="garamond" size="4"> [{self.adj3.deff}]  and [{self.adj4.deff}]</font></center>'))
        display(HTML(f'<center><font face="garamond" size="4"> [{self.topic2.deff}] </font></center>'))
        print()
        if input("Do you want to save this poem?").lower() in ('y', 'yes'):
            save(self)          # This will be covered later

I copied the display section directly from the assignment file, as I did not want to learn a new Python module just for this project.

_that did not work out_

I surrounded each word's translation in square brackets, to make it clear what was being translated.
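The fourteen `display(HTML(...))` calls in `makePoem` all follow the same two patterns. One line joins the Middle English names; the other wraps each definition in brackets. The sketch below factors those patterns into two hypothetical helpers, `poem_line` and `centered`. It also strips each definition, so a leading space from the regex doesn't end up inside the brackets (the original method would print `[ frightened]`). In a notebook, each result would then be passed to `display(HTML(...))`.

```python
def poem_line(words, joiner=', '):
    """Return (middle_english, translation) strings for one line of the poem.

    `words` is a list of (name, definition) pairs; each definition is stripped
    and wrapped in square brackets to mark it as a translation.
    """
    original = joiner.join(name for name, _ in words)
    translated = joiner.join(f'[{definition.strip()}]' for _, definition in words)
    return original, translated


def centered(text, face='garamond', size=4):
    """Wrap text in the same centered <font> markup that makePoem passes to HTML()."""
    return f'<center><font face="{face}" size="{size}">{text}</font></center>'


# Two participles and their definitions, copied from the describe() output above.
line, gloss = poem_line([('agast', ' frightened'), ('arrayed', ' prepared, adorned')])
print(line)             # agast, arrayed
print(gloss)            # [frightened], [prepared, adorned]
print(centered(gloss))  # <center><font face="garamond" size="4">[frightened], [prepared, adorned]</font></center>
```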

In this version, I selected the words using the choices() function, which draws __with__ replacement. I wrote an otherwise identical version that uses sample() instead, which draws __without__ replacement. I ended up preferring that version, so in my final version I use sample().

As for the poem's middle line, I took both formats from the original Markdown file. Template 2 is what's used in the code, and the example poem at the top of the file uses template 1.

In [33]:
choicePoem = Diamante(defined_words, nouns, adjectives, participles)

In [34]:
#choicePoem.makePoem()

And this was my first working version. Honestly, it works quite well.

<h3><i>However</i></h3>

After a few generations, I noticed repeated words within a single poem. Repetition can be a good thing in poetry, since it heightens the impact of whatever is repeated, but here it felt sloppy. The participle pool (40 words) is considerably smaller than the noun and adjective pools, and a poem's participles sit right next to each other, three to a line, so I decided to change the code. I wrote a second version that uses random.sample() instead of random.choices(), which means that the words in each draw are all unique. However, this does __not__ entirely eliminate repetition: since the words for the middle line are chosen separately from the rest of the poem, the middle line could contain one of the poem's topic nouns or repeat an adjective. It does prevent repeated participles, which were by far the most common source of repetition, so I considered that good enough. I won't include that version here to save space; its only differences made it into the final version anyway.
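One way to close the remaining gap, sketched under the assumption that each pool is a list with at least six entries: draw the topic nouns and the middle-line nouns in a *single* `sample()` call, do the same for the adjectives, and then slice each draw. `pick_distinct` is a hypothetical helper, not the final version's code.

```python
from random import sample


def pick_distinct(nouns, adjs, prtcpls):
    """Draw every word a diamante needs, with no repeats within each pool.

    Hypothetical variant: topic nouns and middle-line nouns come from one
    sample() call (likewise the adjectives), so the middle line can't reuse
    a topic noun or an outer-line adjective.
    """
    noun_draw = sample(nouns, 6)   # 2 topics + 4 middle-line nouns
    adj_draw = sample(adjs, 6)     # 4 outer-line adjectives + 2 middle-line adjectives
    return {
        'topics': noun_draw[:2],
        'phrasenouns': noun_draw[2:],
        'adjectives': adj_draw[:4],
        'phraseadjs': adj_draw[4:],
        'participles': sample(prtcpls, 6),
    }


# Placeholder pools; in the notebook these would be nouns, adjectives and participles.
picked = pick_distinct(list('abcdefgh'), list('ijklmnop'), list('qrstuvwx'))
print(picked['topics'], picked['phrasenouns'])
```

Drawing six nouns at once costs nothing extra here, since the noun pool has over a thousand entries.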

And that actually wraps up Part 3! That's how I actually created and displayed the poems, using the custom word objects from my web scraping.

_But this is not the end_

Now that the program worked, I couldn't help but think of additional features to add. I ended up implementing two, but those two took me about as long as all the previous code to correctly implement.

<h3>Part 4: "Extra Features"</h3>

After generating my first poem, I discovered a problem: since creating a new Diamante object generates new random words to use, there's no way to preserve any specific poem when reloading the notebook. While I could just copy and paste, that doesn't preserve the fonts or formatting, and would be awkward to paste directly into Markdown. So, I needed a way to save the poems my program generates, and load them back into the notebook when necessary. This is why I haven't shown any example poems yet.

<h4>Saving and Loading</h4>

Luckily for me, I already developed a custom saving and loading system for a previous personal project of mine (implementing Monopoly in Python). I was able to use the basic code from that project for this program, though I spent significant time relearning JSON manipulation.

Essentially, I turn Fword objects into dictionary-like JSON objects, with their attributes as values and the attributes' names as keys. Then I transform the Diamante object into another dictionary of attributes. Since its attributes are themselves custom Fword objects, it becomes a dictionary of labeled tuples containing JSON strings for its Fword objects, which is then saved to a JSON file. To load these objects, I turn each JSON object (the overall Diamante object and the individual Fwords) into a Python dictionary, and generate Fword objects by passing each dictionary's values as arguments. Then, I create a new Diamante object and manually set its attributes to the newly created Fword objects.
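Because each Fword is first turned into a JSON *string* and those strings are then stored inside the poem's JSON, `load` has to call `json.loads` a second time on every word. The sketch below is a single-pass alternative. It stores each word as a nested dictionary, so one `json.dumps` and one `json.loads` suffice. `Word` is a hypothetical dataclass stand-in for Fword that keeps the same four fields `FwordEncoder` saves, and `poem_to_json`/`poem_from_json` are illustrative helpers.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Word:
    """Hypothetical stand-in for Fword, holding the four fields the encoder saves."""
    name: str
    form: str
    pos: str
    deff: str


def poem_to_json(topics, adjectives, participles, phrase, phrase_def):
    """Serialize a poem as nested dicts and lists: one json.dumps, no JSON strings inside JSON."""
    return json.dumps({
        'topics': [asdict(w) for w in topics],
        'adjectives': [asdict(w) for w in adjectives],
        'participles': [asdict(w) for w in participles],
        'phrase': phrase,
        'phrase definition': phrase_def,
    }, indent=2)


def poem_from_json(text):
    """Rebuild the word lists (as Word objects) and the middle line from poem_to_json's output."""
    data = json.loads(text)
    rebuilt = {key: [Word(**w) for w in data[key]] for key in ('topics', 'adjectives', 'participles')}
    rebuilt['phrase'] = data['phrase']
    rebuilt['phrase definition'] = data['phrase definition']
    return rebuilt


# Two nouns copied from the describe() output earlier in this notebook.
topics = [Word('agu', 'agu', 'noun', ' acute fever'), Word('aiel', 'aiel', 'noun', ' grandfather ')]
saved = poem_to_json(topics, [], [], 'agu & aiel', '[acute fever] & [grandfather]')
restored = poem_from_json(saved)
print(restored['topics'][0] == topics[0])  # True
```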

In [35]:
import json

In [36]:
class PoemEncoder(json.JSONEncoder):
    """ An encoder for Diamante poems (objects).
    """
    def default(self, obj):
        """ Encode a Diamante poem.
        
        Arguments:
            obj (Diamante): the poem to encode
            
        Returns:
            dict: the Diamante object's attributes, as a dictionary
            if obj is not a Diamante, return the default method of the parent class
        """
        if isinstance(obj, DiamanteFinal):
            return {'topics': (obj.topic1, obj.topic2),
                    'adjectives': (obj.adj1, obj.adj2, obj.adj3, obj.adj4),
                    'participles': (obj.p1, obj.p2, obj.p3, obj.p4, obj.p5, obj.p6),
                    'phrase': (obj.phrase),
                    'phrase definition': (obj.phrase_def)
                    }
        return json.JSONEncoder.default(self, obj)

class FwordEncoder(json.JSONEncoder):
    """ An encoder for Fword (objects).
    """
    def default(self, obj):
        """ Encode an Fword object.
        
        Arguments:
            obj (Fword): the word to encode
            
        Returns:
            dict: the Fword object's attributes, as a dictionary
            if obj is not an Fword, return the default method of the parent class
        """
        if isinstance(obj, Fword):
            return {'name': obj.name,
                    'form': obj.form,
                    'part of speech': obj.pos,
                    'definition': obj.deff
                    }
        return json.JSONEncoder.default(self, obj)

import copy

def save(poem, path='goodPoem'):
    """ Save a poem to <path>.json without modifying the poem itself. """
    word_attrs = ('topic1', 'topic2', 'adj1', 'adj2', 'adj3', 'adj4',
                  'p1', 'p2', 'p3', 'p4', 'p5', 'p6')
    # Encode a shallow copy so the live poem keeps its Fword objects
    # (makePoem() still needs .name and .deff after saving)
    snapshot = copy.copy(poem)
    for attr in word_attrs:
        setattr(snapshot, attr, json.dumps(getattr(poem, attr), indent=2, cls=FwordEncoder))
    # Open the file only once encoding has succeeded, so a failure can't leave it empty
    with open(f'{path}.json', 'w', encoding='utf-8') as f:
        json.dump(snapshot, f, indent=2, cls=PoemEncoder)

def loadFword(word):
    """ Rebuild an Fword from its decoded dictionary. """
    return Fword(word['name'], word['form'], word['part of speech'], word['definition'])

def load(path, *args):
    """ Load a poem saved by save().

    *args are passed to DiamanteFinal (words, nouns, adjectives, participles);
    its randomly chosen words are then replaced by the saved ones.
    """
    with open(path, 'r', encoding='utf-8') as f:
        poemInfo = json.load(f)
    newPoem = DiamanteFinal(*args)
    # Each saved word is itself a JSON string, so it needs a second json.loads
    newPoem.topic1, newPoem.topic2 = [loadFword(json.loads(topic)) for topic in poemInfo['topics']]
    newPoem.adj1, newPoem.adj2, newPoem.adj3, newPoem.adj4 = [loadFword(json.loads(adj)) for adj in poemInfo['adjectives']]
    newPoem.p1, newPoem.p2, newPoem.p3, newPoem.p4, newPoem.p5, newPoem.p6 = [loadFword(json.loads(p)) for p in poemInfo['participles']]
    newPoem.phrase = poemInfo['phrase']
    newPoem.phrase_def = poemInfo['phrase definition']
    return newPoem

This is how you load a poem:

In [37]:
#load('goodPoem.json', defined_words, nouns, adjectives, participles)

I had to comment this out because the current saving and loading functions/classes use the final version of my objects, which I haven't yet included in this notebook.

<h4>Using a Custom Font</h4>

_Wouldn't it be cool to render the Middle English version of the poem in an old-timey font?_

This is what took up most of my time for this section.

In short, I tried to coax IPython.display into loading a non-web-safe font I had downloaded, which is difficult when you start out knowing nothing about the module.

This is the only part of my program I actively documented while writing it, so I'll just include my different notes and attempts below. This file is already far too long.

Note: I used Wingdings during testing because it's easy to recognize, so I could tell whether my code was doing anything.

In [38]:
#mycss = """
#@font-face: {font-family: Canterbury; src: url('Canterbury.ttf');
#}
#center {
#font-family: Canterbury, Wingdings
#}
#"""

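The cell above had to be commented out because it affects all cells in all files, and since I wrote an improved version further down (with some help from ChatGPT), leaving it active would override that version. However, I want to keep the old cell around for documentation's sake. (In hindsight, at least one bug is visible in it: the rule should open with `@font-face {`, without a colon, as it does in the working version.)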

<style>
    @font-face: {font-family: Canterbury; src: url('Canterbury.ttf');
                }
    center {
        font-family: Canterbury
            }
</style>
<center>hello</center>
<center><font face='Canterbury' size='4'>hello</font></center>


NOTE: Originally the text above was not in the custom font, and I spent hours trying to get it to work. Eventually I asked ChatGPT for help and put its results into a separate .ipynb file to test. Not only did the code work, but it immediately changed the text in this file to the correct font. Also, I don't actually have to add the style to the display(HTML()) call, I just create a variable that stores my desired CSS in a string, and Notebook automatically applies it to all cells in all tabs. Very strange.

In [39]:
import base64

In [40]:
font_path = 'Canterbury.ttf'
with open(font_path, 'rb') as font_file:
    font_data = font_file.read()
    encoded_font = base64.b64encode(font_data).decode('utf-8')

In [41]:
mycss2 = f"""
@font-face {{
    font-family: 'Canterbury';
    src: url(data:font/ttf;base64,{encoded_font}) format('truetype');
}}

.original_text {{
    font-family: 'Canterbury', serif;
    font-size: 24px;
}}

.translation {{
    font-family: 'Garamond', serif;
    font-size: 20px;
}}
"""


In [42]:
from IPython.display import HTML, display

display(HTML(f"<style>{mycss2}</style>"))
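Cells [40] through [42] can be folded into one reusable helper. This is a sketch, not the exact code ChatGPT gave me: `font_face_css` is a name I made up, it only builds the `@font-face` rule (the `.original_text` and `.translation` classes would still need to be appended), and it assumes a TrueType file like `Canterbury.ttf`.

```python
import base64


def font_face_css(font_bytes, family):
    """Return an @font-face rule embedding a TrueType font as a base64 data URI.

    Embedding the bytes means the browser never has to fetch the font file,
    so no file path has to resolve relative to the notebook.
    """
    encoded = base64.b64encode(font_bytes).decode('ascii')
    return (
        "@font-face {\n"
        f"    font-family: '{family}';\n"
        f"    src: url(data:font/ttf;base64,{encoded}) format('truetype');\n"
        "}\n"
    )


# In the notebook (requires the font file and IPython):
# from IPython.display import HTML, display
# with open('Canterbury.ttf', 'rb') as font_file:
#     css = font_face_css(font_file.read(), 'Canterbury')
# display(HTML(f"<style>{css}</style>"))
```

Taking raw bytes instead of a path keeps the function easy to test without a real font file.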

Actually, I was incorrect when I wrote:
> Also, I don't actually have to add the style to the display(HTML()) call, I just create a variable that stores my desired CSS in a string, and Notebook automatically applies it to all cells in all tabs.

I only realized this while putting together this file: without running any of my main working file, I had to run the code cell above in order to use the Canterbury font. I assume that some part of my code, potentially even that cell itself, had loaded the font correctly, and that the result was somehow persisting across kernel restarts. I was absolutely correct, however, when I wrote:
>Very strange

I have no idea what was going on with that; I just know that it works now. Hopefully it works for other people as well.

It's hard to convey how strange and frustrating this was, or how much I had to wrestle with ChatGPT to actually figure out the problem, but I don't regret it. It was somewhat fun, I have some more experience, and my poem now looks cooler.

<h3>Part 5: Discussion</h3>

Below, I have included the final version of the Diamante object and its saving and loading code, so that I can load the example poems. I also created a separate file with just the necessary code, which you could use to generate your own poems.

In [43]:
from random import sample, randint
from IPython.display import HTML, display

class DiamanteFinal:
    """ A randomly generated diamante poem built from Fword objects. """
    def __init__(self, words, nouns, adjs, prtcpls):
        self.topic1, self.topic2 = sample(nouns, k=2)
        self.adj1, self.adj2, self.adj3, self.adj4 = sample(adjs, k=4)
        self.p1, self.p2, self.p3, self.p4, self.p5, self.p6 = sample(prtcpls, k=6)
        self.template1 = "{0} & {1}" #noun & noun (repeated)
        self.template2 = "{0} {1}" #adj + noun (repeated)
        self.phrasenouns = sample(nouns, k=4)
        self.phraseadjs = sample(adjs, k=2)
        # Coin flip between the two templates for the middle line
        if not randint(0, 1):
            self.phrase1 = self.template1.format(*self.phrasenouns[:2])
            self.phrase2 = self.template1.format(*self.phrasenouns[2:])
            self.phrase = f'{self.phrase1}, {self.phrase2}'
            self.phrase_def = f'[{self.phrasenouns[0].deff}] & [{self.phrasenouns[1].deff}], [{self.phrasenouns[2].deff}] & [{self.phrasenouns[3].deff}]'
        else:
            self.phrase1 = self.template2.format(self.phraseadjs[0], self.phrasenouns[0])
            self.phrase2 = self.template2.format(self.phraseadjs[1], self.phrasenouns[1])
            self.phrase = f'{self.phrase1} and {self.phrase2}'
            self.phrase_def = f'[{self.phraseadjs[0].deff}] [{self.phrasenouns[0].deff}] and [{self.phraseadjs[1].deff}] [{self.phrasenouns[1].deff}]'

    def makePoem(self):
        """ Display the poem in Middle English, then its translation, and offer to save it. """
        original = [self.topic1.name,
                    f'{self.adj1.name}  and {self.adj2.name}',
                    f'{self.p1.name}, {self.p2.name}, {self.p3.name}',
                    self.phrase,
                    f'{self.p4.name}, {self.p5.name}, {self.p6.name}',
                    f'{self.adj3.name}  and {self.adj4.name}',
                    self.topic2.name]
        translation = [f'[{self.topic1.deff}]',
                       f'[{self.adj1.deff}]  and [{self.adj2.deff}]',
                       f'[{self.p1.deff}], [{self.p2.deff}], [{self.p3.deff}]',
                       self.phrase_def,
                       f'[{self.p4.deff}], [{self.p5.deff}], [{self.p6.deff}]',
                       f'[{self.adj3.deff}]  and [{self.adj4.deff}]',
                       f'[{self.topic2.deff}]']
        print()
        for line in original:
            display(HTML(f"<div class='original_text'><center>{line}</center></div>"))
        print()
        for line in translation:
            display(HTML(f"<div class='translation'><center>{line}</center></div>"))
        print()
        if input("Do you want to save this poem?").lower() in ('y', 'yes'):
            save(self)
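Stripped of Fwords and HTML, the layout above reduces to seven lines. The sketch below is a hypothetical plain-string version (`diamante_lines` is my own name for it, and the word lists are placeholders). It also swaps the module-level `sample`/`randint` for a seeded `random.Random`, which is my addition rather than what DiamanteFinal does: with the same seed and the same word lists it returns the same poem, a lighter alternative to saving twelve words, though it stops working as soon as the word lists change.

```python
import random


def diamante_lines(nouns, adjs, participles, rng=None):
    """Return the seven lines of a diamante as plain strings (hypothetical helper).

    Layout: noun / 2 adjectives / 3 participles / middle phrase /
    3 participles / 2 adjectives / noun.
    """
    if rng is None:
        rng = random.Random()
    topic1, topic2 = rng.sample(nouns, k=2)
    a = rng.sample(adjs, k=4)
    p = rng.sample(participles, k=6)      # no repeats within one poem
    pn = rng.sample(nouns, k=4)
    pa = rng.sample(adjs, k=2)
    if not rng.randint(0, 1):             # coin flip between the two templates
        middle = f'{pn[0]} & {pn[1]}, {pn[2]} & {pn[3]}'
    else:
        middle = f'{pa[0]} {pn[0]} and {pa[1]} {pn[1]}'
    return [topic1,
            f'{a[0]} and {a[1]}',
            ', '.join(p[:3]),
            middle,
            ', '.join(p[3:]),
            f'{a[2]} and {a[3]}',
            topic2]


nouns = [f'noun{i}' for i in range(10)]   # placeholder word lists
adjs = [f'adj{i}' for i in range(10)]
participles = [f'part{i}' for i in range(10)]

first = diamante_lines(nouns, adjs, participles, random.Random(42))
again = diamante_lines(nouns, adjs, participles, random.Random(42))
print(first == again)  # True: same seed and same lists give the same poem
```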

In [44]:
class PoemEncoder(json.JSONEncoder):
    """ An encoder for Diamante poems (objects).
    """
    def default(self, obj):
        """ Encode a Diamante poem.
        
        Arguments:
            obj (Diamante): the poem to encode
            
        Returns:
            dict: the Diamante object's attributes, as a dictionary
            if obj is not a Diamante, return the default method of the parent class
        """
        if isinstance(obj, DiamanteFinal):
            return {'topics': (obj.topic1, obj.topic2),
                    'adjectives': (obj.adj1, obj.adj2, obj.adj3, obj.adj4),
                    'participles': (obj.p1, obj.p2, obj.p3, obj.p4, obj.p5, obj.p6),
                    'phrase': (obj.phrase),
                    'phrase definition': (obj.phrase_def)
                    }
        return json.JSONEncoder.default(self, obj)

class FwordEncoder(json.JSONEncoder):
    """ An encoder for Fword (objects).
    """
    def default(self, obj):
        """ Encode an Fword object.
        
        Arguments:
            obj (Fword): the word to encode
            
        Returns:
            dict: the Fword object's attributes, as a dictionary
            if obj is not an Fword, return the default method of the parent class
        """
        if isinstance(obj, Fword):
            return {'name': obj.name,
                    'form': obj.form,
                    'part of speech': obj.pos,
                    'definition': obj.deff
                    }
        return json.JSONEncoder.default(self, obj)

import copy

def save(poem, path='goodPoem'):
    """ Save a poem to <path>.json without modifying the poem itself. """
    word_attrs = ('topic1', 'topic2', 'adj1', 'adj2', 'adj3', 'adj4',
                  'p1', 'p2', 'p3', 'p4', 'p5', 'p6')
    # Encode a shallow copy so the live poem keeps its Fword objects
    # (makePoem() still needs .name and .deff after saving)
    snapshot = copy.copy(poem)
    for attr in word_attrs:
        setattr(snapshot, attr, json.dumps(getattr(poem, attr), indent=2, cls=FwordEncoder))
    # Open the file only once encoding has succeeded, so a failure can't leave it empty
    with open(f'{path}.json', 'w', encoding='utf-8') as f:
        json.dump(snapshot, f, indent=2, cls=PoemEncoder)

def loadFword(word):
    """ Rebuild an Fword from its decoded dictionary. """
    return Fword(word['name'], word['form'], word['part of speech'], word['definition'])

def load(path, *args):
    """ Load a poem saved by save().

    *args are passed to DiamanteFinal (words, nouns, adjectives, participles);
    its randomly chosen words are then replaced by the saved ones.
    """
    with open(path, 'r', encoding='utf-8') as f:
        poemInfo = json.load(f)
    newPoem = DiamanteFinal(*args)
    # Each saved word is itself a JSON string, so it needs a second json.loads
    newPoem.topic1, newPoem.topic2 = [loadFword(json.loads(topic)) for topic in poemInfo['topics']]
    newPoem.adj1, newPoem.adj2, newPoem.adj3, newPoem.adj4 = [loadFword(json.loads(adj)) for adj in poemInfo['adjectives']]
    newPoem.p1, newPoem.p2, newPoem.p3, newPoem.p4, newPoem.p5, newPoem.p6 = [loadFword(json.loads(p)) for p in poemInfo['participles']]
    newPoem.phrase = poemInfo['phrase']
    newPoem.phrase_def = poemInfo['phrase definition']
    return newPoem

In [50]:
exhibit1 = load('goodPoem_2.json', defined_words, nouns, adjectives, participles)
exhibit1.makePoem()

Do you want to save this poem? no


I enjoy the story of this piece: the rich and powerful attorneys hunting down the gross, poor peasants with a poleaxe. It's an excellent representation of how the rich viewed peasants. Also, I believe attorneys were essentially agents of the king/local nobility, which makes this an even better metaphor for medieval society. I also enjoy the cognates here -- procurator, privee, digne, sowple -- as well as words that seemingly didn't survive the changes to English, like gnof and wlatsom.

![img1](lawyer_attacking_peasants.png)

I generated this image via Canva AI with the prompt:
>An attorney hunting down peasants with a battleaxe in 14th-century England

It worked very well. I tried to incorporate the dirty, swampy aspect of the poem, but adding it invariably made the image lose other aspects of the prompt.

Note: I decided to generate my illustrations with AI, since these poems are very specific and unique, and AI generation fits the theme of this assignment/class.

In [51]:
exhibit2 = load('goodPoem_9.json', defined_words, nouns, adjectives, participles)
exhibit2.makePoem()

Do you want to save this poem? no


I love the overly serious, dramatic, doomsday-preacher feel of this poem. It's saying: We tried to warn them, we gave them advice and corrected their sins, and now these sinful women are doomed to fire and brimstone. It keeps a consistent theme throughout, and fits my stereotypes of medieval European religious dogma.

I feel like this isn't a unique enough concept to illustrate, so I'll save that for a more abstract poem.

In [52]:
exhibit3 = load('goodPoem_8.json', defined_words, nouns, adjectives, participles)
exhibit3.makePoem()

Do you want to save this poem? no


I chose this poem for two reasons: the ending, where my program happened to use color appropriately; and the excellent diction overall. It's funny, especially the beginning with its swearing and short, punchy words surprising the reader.

I have no idea how I would illustrate this.

In [53]:
exhibit4 = load('goodPoem_3.json', defined_words, nouns, adjectives, participles)
exhibit4.makePoem()

Do you want to save this poem? no


I love how consistent the word choice is here: a wedding, wedding party, and a priest's assistant. The adjectives are also consistently negative, creating a loose narrative here. For me, it's something like: two young people are getting married, too young to know what they're getting into, and too poor to afford a nice wedding and after-party. Then, there's some kind of accident, or maybe a plot, where either the parish clerk gets hurt or tries to hurt someone else. It's pretty impressive how consistent the nouns were, given it's pulling from 1000 possible nouns.

I also wanted to note that this poem was generated by an older version of my code, in which the participles could repeat. I actually don't mind the effect here: with the repeated participles right next to each other, the repetition creates emphasis instead of annoyance or boredom.
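The old sampling code isn't shown here, so this is only an assumption about how the repeats crept in, but the usual culprit is the difference between `random.choices` (sampling with replacement, so repeats are possible) and `random.sample` (sampling without replacement, which is what DiamanteFinal uses). A minimal illustration with placeholder participles:

```python
import random

rng = random.Random(0)
pool = ['placeholder-a', 'placeholder-b']  # tiny pool so repeats are guaranteed

# With replacement: any word can be drawn again, so 6 draws from 2 words must repeat
with_replacement = rng.choices(pool, k=6)

# Without replacement: each word is drawn at most once
# (asking for more words than the pool holds raises ValueError)
participles = [f'part{i}' for i in range(10)]
without_replacement = rng.sample(participles, k=6)

print(len(set(with_replacement)) < 6)      # True
print(len(set(without_replacement)) == 6)  # True
```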

![img2](bad_wedding.png)

This was also generated by Canva AI, with the prompt:
>a poor, dangerous wedding in 14th century England

I tried the simpler "14th century English shotgun wedding" prompt, but it didn't quite understand that, so it just generated generic medieval wedding pictures. I like how this image conveys danger, with the cracked floor and various hooded people in shadow.

In [49]:
exhibit5 = load('goodPoem_10.json', defined_words, nouns, adjectives, participles)
exhibit5.makePoem()

Do you want to save this poem? no


This one also tells a fantastic story. A husband cheats on his wife with a "suitable beggar-woman", perhaps as revenge for her sleeping with his uncle (or maybe that was the revenge, and the husband cheated first), and in their evil, they renounce the Bible as "loathsome". Or something like that. The third line even hints at the earlier marital discord that led to this complete collapse. I do think the fifth line is out of place, seemingly unconnected to the rest of the poem, but that's its only serious flaw.

![img3](dark_reflection.png)

Again, I generated this image with the phrase:
>collapsing adulterous marriage in 14th century England

I find it interesting that it didn't include any actual evidence of the marriage, instead just a man with either his friends or guards (or maybe metaphorical representations of his morality and temptation) contemplating in a dark tunnel. It really covers the last section of the poem, where he decides to renounce his faith in darkness. Very cool.

I definitely take ownership of my code; I spent many hours writing and debugging it. That said, I feel much less ownership over the poems themselves. The actual words were written by Chaucer and defined by the authors of the Librarius website, and the diamante format was invented in the 1960s and suggested by Professor Kraus. I did exercise some creative control over the poems, but only through my code: including 'verbal nouns' with the participles and adding a custom font are the two biggest examples. I considered curating the available words, and an improved version would likely involve some kind of curation, but I didn't have time to actually comb through the ~1300 relevant words. I've only seen one offensive word so far, so curation would be more about taste than moderation.

I'm not sure exactly why, but this set of words is really excellent for this kind of poetry. Every poem I've generated so far has been interesting, so choosing which poems to show here was very difficult. I have several possible explanations for why this worked so well:
<ul>
<li>The literary merit and entertainment value of Chaucer's original work</li>
<li>Librarius chose the most interesting words in their curation</li>
<li>The novelty of Middle English vocabulary and phonetics</li>
<li>The sheer number of available nouns and adjectives creates significant variation, sustaining its novelty</li>
<li>I just really like Middle and Old English, and I invested so much work into this project that I'm wearing thick rose-tinted glasses</li>
</ul>
Most likely, all these factors are what make the poems so consistently good.
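To put a rough number on the fourth point, here is the arithmetic for just two parts of a poem, taking the 1000 nouns mentioned earlier at face value and ignoring adjectives and participles entirely. Both figures count ordered choices of distinct nouns, the way `sample` draws them, so they are upper bounds on genuinely different-looking lines:

$$
\text{topic pairs (first and last line)} = 1000 \times 999 = 999{,}000
$$

$$
\text{four-noun middle lines} = 1000 \times 999 \times 998 \times 997 = 994{,}010{,}994{,}000 \approx 9.94 \times 10^{11}
$$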

Each poem tends to develop its own style. Some are campy, some are funny and sardonic, and others feel serious and thoughtful. The sheer variety of possible words allows for a diversity of themes, and yet the word quality holds up. As I've said several times now, I initially wanted to curate the words, not only for more creative expression but also to ensure a higher-quality pool of possible poems. However, the poems turned out so well without curation that I believe any effort to trim down the available words would only produce incremental improvements in quality.

I think this project was a great success. I'm very happy with how it turned out: I had fun writing the code and generating cool poetry, and I gained some experience wrestling with Jupyter Notebook and CSS.