# Poem-O-Matic

This is a description, in both code and prose, of how to generate original poetry on demand using a computer and the Python programming language. It's based upon work done in the London Python Code Dojo with Dan Pope and Hans Bolang. I've taken some of our original ideas and run with them. The first idea is very simple:

_If you re-assemble unrelated lines from different poems into a new poetic structure, you get a pretty convincing __new__ poem._

The important "poetic sounding" language has already been created by real poets - we're just going to reassemble lines from their poems to make new ones.

Where can we get such source material? Easy: the wonderful resource that is [Project Gutenberg](https://www.gutenberg.org/wiki/Main_Page).

I've randomly selected the following anthologies as the source material for this project:

* [The Sonnets by Shakespeare](https://www.gutenberg.org/ebooks/1105)
* [The World's Best Poetry, Volume 04: The Higher Life by Gladden and Carman](https://www.gutenberg.org/ebooks/12759)
* [Leaves of Grass by Walt Whitman](https://www.gutenberg.org/ebooks/1322)
* [A Book of Nonsense by Edward Lear](https://www.gutenberg.org/ebooks/13646)
* [The Golden Treasury by Francis Turner Palgrave and Alfred Pearse](https://www.gutenberg.org/ebooks/19221)
* [A Child's Garden of Verses by Robert Louis Stevenson](https://www.gutenberg.org/ebooks/19722)
* [The Peter Patter Book of Nursery Rhymes by Leroy F. Jackson](https://www.gutenberg.org/ebooks/22014)
* [The Aeneid by Virgil](https://www.gutenberg.org/ebooks/228)
* [Songs of Childhood by Walter De la Mare](https://www.gutenberg.org/ebooks/23545)
* [Poems Chiefly from Manuscript by John Clare](https://www.gutenberg.org/ebooks/8672)
* [A Treasury of War Poetry: British and American Poems of the World War 1914-1917](https://www.gutenberg.org/ebooks/8820)

I've put plain text versions of these works in the `sources` directory, and manually removed the prose elements of these files (introductions, titles, authors' names, etc.).

## Consuming Source Poetry

First, we need to get a list of all the source files:

In [None]:
from os import listdir
from os.path import isfile, join

mypath = 'sources'

filenames = [join(mypath, f) for f in listdir(mypath) if isfile(join(mypath, f))]
print(filenames)

Next, we need to load each file and extract the lines of poetry into a `set` of all known lines of poetry:

In [None]:
LINES_OF_POETRY = set()  # All our lines will be added to this set.

for source_file in filenames:  # For each source file...
    with open(source_file) as source:  # Open it as the object 'source'
        for line in source.readlines():  # Now, for each line in the new 'source' object,
            clean_line = line.strip()  # remove all the leading and trailing whitespace from the line,
            clean_line += '\n'  # re-add a newline character,
            LINES_OF_POETRY.add(clean_line)  # and add it to the set of all lines of poetry
            
print('We have {} unique lines of poetry.'.format(len(LINES_OF_POETRY)))
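
As a minimal sketch of what that loop does (using a few hand-typed lines rather than the real files in `sources`), stripping the whitespace means that differently indented copies of a line collapse into a single entry in the set. Every blank line also becomes the same `'\n'` entry:

```python
# Hand-typed stand-ins for lines read from a source file (not the real data).
raw_lines = [
    "  Shall I compare thee to a summer's day?\n",
    "Shall I compare thee to a summer's day?   \n",
    "\n",
    "   \n",
    "Thou art more lovely and more temperate:\n",
]

unique_lines = set()
for line in raw_lines:
    # Same normalisation as above: strip whitespace, re-add one newline.
    unique_lines.add(line.strip() + '\n')

print(len(unique_lines))     # 3
print('\n' in unique_lines)  # True
```

The lone `'\n'` entry is harmless: it contains no word, so it is skipped later on when we look for the last word of each line.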

## Cleaning and Transforming the Data

In order to recombine these lines into new poems, we need to work out how the lines relate to each other in terms of rhyming. To do this we need to know about phonemes - the sounds that make up speech. The `cmudict.0.7a.phones` file contains definitions and categorisations (vowel, fricative, etc.) of the phonemes used in English:

In [None]:
# Load the phoneme table
with open('cmudict.0.7a.phones') as phoneme_definitions:
    PHONEMES = dict(line.split() for line in phoneme_definitions.readlines())

print(PHONEMES)

Next, we create a simple function to determine if a phoneme is a vowel:

In [None]:
def is_vowel(phoneme):
    """
    A utility function to determine if a phoneme is a vowel.
    """
    return PHONEMES.get(phoneme) == 'vowel'
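
To make the shape of the data concrete, here's a small sketch that uses a four-line excerpt typed in by hand instead of the real `cmudict.0.7a.phones` file. I'm assuming each line is a phoneme followed by its category, which is what the `dict(line.split() ...)` call above relies on. The names `sample_phones` and `is_vowel_sketch` are made up for this example:

```python
# A hand-typed excerpt in the assumed "PHONEME<whitespace>category" format.
sample_phones = "AE\tvowel\nCH\taffricate\nHH\taspirate\nTH\tfricative\n"

# Each line splits into exactly two tokens, so dict() can consume the pairs.
phonemes = dict(line.split() for line in sample_phones.splitlines())


def is_vowel_sketch(phoneme):
    """Return True only if the phoneme is known and categorised as a vowel."""
    return phonemes.get(phoneme) == 'vowel'


print(is_vowel_sketch('AE'))  # True
print(is_vowel_sketch('CH'))  # False
print(is_vowel_sketch('ZZ'))  # False
```

Using `.get` means an unknown phoneme simply isn't a vowel, rather than raising a `KeyError`.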

The `cmudict.0.7a` file contains a mapping of spelled words to pronunciations expressed as phonemes:

In [None]:
# Create a rhyming definition dictionary
with open('cmudict.0.7a') as pronunciation_definitions:  # Load the CMU phoneme definitions of pronunciation.
    PRONUNCIATIONS = pronunciation_definitions.readlines()

print(PRONUNCIATIONS[80:90])

We're now in a position to create a rhyme dictionary that we can use to look up words and discover rhymes:

In [None]:
import re
RHYME_DICTIONARY = {}
for pronunciation in PRONUNCIATIONS:  # For each pronunciation in the list of pronunciations,
    pronunciation = re.sub(r'\d', '', pronunciation)  # strip phoneme stresses in the definition (not interesting to us),
    tokens = pronunciation.strip().split()  # get the tokens that define the pronunciation,
    word = tokens[0]  # the word whose pronunciation is defined is always in position zero of the listed tokens,
    phonemes = tokens[:0:-1]  # the phonemes that define the pronunciation are the rest of the tokens. We reverse these!
    phonemes_to_rhyme = []  # This will hold the phonemes we use to rhyme words.
    for phoneme in phonemes:
        phonemes_to_rhyme.append(phoneme)
        if is_vowel(phoneme):
            break  # We only need to rhyme from the last phoneme to the final vowel. Remember the phonemes are reversed!
    RHYME_DICTIONARY[word] = tuple(phonemes_to_rhyme)
print('There are {} items in the rhyme dictionary.'.format(len(RHYME_DICTIONARY)))
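
To see what ends up in the dictionary, here's a sketch of the same steps applied to three entries typed in by hand in the `cmudict` format. The `VOWELS` set and the `rhyme_key` helper are assumptions made for this example (the real code uses `is_vowel` and the full phoneme table):

```python
import re

# Assumption: a tiny hand-written subset of the vowel phonemes.
VOWELS = {'AE', 'AH', 'EH', 'IH', 'UH'}


def rhyme_key(entry):
    """Return (word, key) where key is the reversed tail back to the last vowel."""
    entry = re.sub(r'\d', '', entry)  # Drop the stress digits (AE1 -> AE).
    tokens = entry.strip().split()
    word, phonemes = tokens[0], tokens[:0:-1]  # Phonemes, reversed, minus the word.
    key = []
    for phoneme in phonemes:
        key.append(phoneme)
        if phoneme in VOWELS:
            break  # Stop at the final vowel of the word.
    return word, tuple(key)


for entry in ["HATCH  HH AE1 CH", "THATCH  TH AE1 CH", "BOOK  B UH1 K"]:
    print(rhyme_key(entry))
# ('HATCH', ('CH', 'AE'))
# ('THATCH', ('CH', 'AE'))
# ('BOOK', ('K', 'UH'))
```

"HATCH" and "THATCH" share the key `('CH', 'AE')`, so lines ending in those words are treated as rhyming, while "BOOK" falls into a different group.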

Given that we're rhyming the last word of each line, we need a function to identify what the last word of any given line actually is:

In [None]:
def last_word(line):
    """
    Return the last word in a line (stripping punctuation).

    Raise ValueError if the last word cannot be identified.
    """
    match_for_last_word = re.search(r"([\w']+)\W*$", line)
    if match_for_last_word:
        word = match_for_last_word.group(1)
        word = re.sub(r"'d$", 'ed', word)  # expand the archaic poetic contraction 'd to -ed
        return word.upper()
    raise ValueError("No word in line.")
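
A quick sketch of how this behaves on a few lines. The function is repeated here only so that the snippet runs on its own:

```python
import re


def last_word(line):
    """Same logic as above, repeated so this example is self-contained."""
    match_for_last_word = re.search(r"([\w']+)\W*$", line)
    if match_for_last_word:
        word = match_for_last_word.group(1)
        word = re.sub(r"'d$", 'ed', word)
        return word.upper()
    raise ValueError("No word in line.")


print(last_word("Sparrows fighting on the thatch.\n"))        # THATCH
print(last_word("Though by no hand untimely snatch'd...\n"))  # SNATCHED

try:
    last_word("\n")  # A blank line contains no word at all.
except ValueError as error:
    print(error)  # No word in line.
```

Upper-casing the result matters because the words in the `cmudict` file are written in capitals.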

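The next step is to collect all the lines from our source poems into groups of lines that rhyme with each other. Any line whose last word can't be found in the rhyme dictionary is skipped, and rhymes with only one line are discarded: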

In [None]:
from collections import defaultdict

lines_by_rhyme = defaultdict(list)
for line in LINES_OF_POETRY:
    try:
        rhyme = RHYME_DICTIONARY[last_word(line)]
    except (KeyError, ValueError):
        continue
    lines_by_rhyme[rhyme].append(line)

LINES_THAT_RHYME = [l for l in lines_by_rhyme.values() if len(l) > 1]

print("Number of rhymes found is: {}".format(len(LINES_THAT_RHYME)))

The final transformation of the data is to group the lines within each rhyme by their ending word (so for the rhyme shared by "look", "nook" and "book", the lines ending in each word are collected together). This will help us avoid rhyming a line with another line that ends in the same word.

In [None]:
RHYME_DATA = []
for lines in LINES_THAT_RHYME:
    lines_by_word = defaultdict(list)
    for line in lines:
        end_word = last_word(line)
        lines_by_word[end_word].append(line)
    RHYME_DATA.append(dict(lines_by_word))

print(RHYME_DATA[1:3])
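
The two grouping steps can be sketched on a handful of lines. In this example the end words and rhyme keys are written out by hand rather than looked up, the last line is invented so that there is a rhyme with no partner, and the names `tagged_lines`, `by_rhyme` and `rhyme_data` are made up for illustration:

```python
from collections import defaultdict

# (line, end word, rhyme key) triples, tagged by hand for this sketch.
tagged_lines = [
    ("Sparrows fighting on the thatch.\n", "THATCH", ("CH", "AE")),
    ("And where hens lay, and when the duck will hatch.\n", "HATCH", ("CH", "AE")),
    ("Breake ill eggs ere they be hatched:\n", "HATCHED", ("T", "CH", "AE")),
    ("Though by no hand untimely snatch'd...\n", "SNATCHED", ("T", "CH", "AE")),
    ("This line has no rhyming partner\n", "PARTNER", ("ER",)),
]

# Step 1: group the lines by rhyme key.
by_rhyme = defaultdict(list)
for line, word, key in tagged_lines:
    by_rhyme[key].append((line, word))

# Step 2: drop rhymes with a single line, then group the rest by end word.
rhyme_data = []
for members in by_rhyme.values():
    if len(members) < 2:
        continue
    by_word = defaultdict(list)
    for line, word in members:
        by_word[word].append(line)
    rhyme_data.append(dict(by_word))

for group in rhyme_data:
    print(sorted(group))
# ['HATCH', 'THATCH']
# ['HATCHED', 'SNATCHED']
```

Each item in the result is a dictionary that maps an end word to the list of lines finishing with that word.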

## Generating Poetry

Given the data found in `RHYME_DATA` we're finally in a position to reassemble rhyming lines from our source poems to make new poetry.

No matter the content of the final line, it's important that the poem ends with appropriate punctuation. So we make a function to do this for us:

In [None]:
import random
import re


def terminate_poem(poem):
    """
    Given a list of poem lines, fix the punctuation of the last line.

    Removes any trailing non-word characters (including the newline) and
    appends a random sentence terminator: ., !, ? or an ellipsis.
    """
    last = re.sub(r'\W*$', '', poem[-1])
    punc = random.choice(['!', '.', '.', '.', '.', '?', '...'])  # '.' is repeated so it's the most likely choice.
    return poem[:-1] + [last + punc]

We also need to be able to define a rhyme scheme. For example, `"aabba"` means the first, second and fifth lines all rhyme (`a`) and the third and fourth lines rhyme (`b`). We could, of course, write other schemes such as `"aabbaaccaa"`. Nevertheless, the `"aabba"` scheme is a safe default.
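
A rhyme scheme is just a string, so counting its letters tells us how many lines each rhyming group needs. A minimal sketch using `collections.Counter` (the variable names are only for illustration):

```python
from collections import Counter

default_scheme = Counter("aabba")
longer_scheme = Counter("aabbaaccaa")

print(dict(default_scheme))  # {'a': 3, 'b': 2}
print(dict(longer_scheme))   # {'a': 6, 'b': 2, 'c': 2}
```

So the default scheme needs three lines for `a` and two for `b`, whereas the longer scheme needs six lines that all share the `a` rhyme.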



In [None]:
import random
from collections import Counter


def build_poem(rhyme_scheme="aabba", rhymes=RHYME_DATA):
    """
    Build a poem given a rhyme scheme.
    """
    groups = Counter(rhyme_scheme)  # Work out how many lines of each sort of rhyming group are needed
    lines = {}  # Will hold lines for given rhyming groups.
    for name, number in groups.items():
        candidate = random.choice([r for r in rhymes if len(r) >= number])  # Select candidate rhymes with enough lines.
        word_ends = list(candidate.keys())  # Get the candidate rhyming words.
        random.shuffle(word_ends)  # Randomly shuffle them.
        lines_to_use = []  # Will hold the lines selected to use in the final poem for this given rhyming group.
        for i in range(number):  # For all the needed number of lines for this rhyming group,
            lines_to_use.append(random.choice(candidate[word_ends.pop()]))  # Randomly select a line for a new word end.
        lines[name] = lines_to_use  # Add the lines for the rhyming group to the available lines.

    # Given a selection of lines, we need to order them into the specified rhyming scheme.
    poem = []  # To hold the ordered list of lines for the new poem.
    for k in rhyme_scheme:  # For each rhyming group name specification for a line...
        poem.append(lines[k].pop())  # Simply take a line from the specified rhyming group.
    return terminate_poem(poem)  # Return the result as a list with the final line appropriately punctuated.
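
The selection step in the middle of `build_poem` (shuffle the end words, then pop one per line) could also be written with `random.sample`, which picks the required number of distinct end words in one call. This is only a sketch of an alternative, not the code used above; `pick_lines` and the toy `candidate` group are made up for illustration:

```python
import random

# A toy rhyme group in the same shape as one item of RHYME_DATA.
candidate = {
    "HATCHED": ["Breake ill eggs ere they be hatched:\n"],
    "UNMATCHED": ["The flower in ripen'd bloom unmatch'd\n"],
    "SNATCHED": ["Though by no hand untimely snatch'd...\n"],
}


def pick_lines(candidate, number):
    """Pick `number` lines from the group, each ending in a different word."""
    end_words = random.sample(list(candidate), number)  # Distinct end words.
    return [random.choice(candidate[word]) for word in end_words]


picked = pick_lines(candidate, 3)
print(''.join(picked))
```

Because every line comes from a different end word, no two lines in the same rhyming group can finish with the same word.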

Finally, we can call the `build_poem` function to get a list of the lines for our new poem.

In [None]:
my_poem = build_poem()  # Get an ordered list of the lines encompassing my new poem.
poem = ''.join(my_poem)  # Turn them into a single printable string.
print(poem)

Example output:

```
Breake ill eggs ere they be hatched:
The flower in ripen'd bloom unmatch'd
Sparrows fighting on the thatch.
And where hens lay, and when the duck will hatch.
Though by no hand untimely snatch'd...
```

You can also change the rhyming scheme:

In [None]:
my_poem = build_poem('aabbaaccaa')
poem = ''.join(my_poem)
print(poem)
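
Longer schemes need more distinct end words within a single rhyme: `'aabbaaccaa'` needs six for `a`. If no rhyme is big enough, the list of candidates inside `build_poem` is empty and `random.choice` raises an `IndexError`. Here's a hedged sketch of a check you could run first; `scheme_is_feasible` is a hypothetical helper, and the toy `rhymes` list stands in for `RHYME_DATA`:

```python
from collections import Counter


def scheme_is_feasible(rhyme_scheme, rhymes):
    """Return True if some rhyme has enough end words for the largest group."""
    needed = max(Counter(rhyme_scheme).values(), default=0)
    return any(len(group) >= needed for group in rhymes)


# Toy data: one rhyme with two end words and one with three.
rhymes = [
    {"HATCH": ["..."], "THATCH": ["..."]},
    {"HATCHED": ["..."], "UNMATCHED": ["..."], "SNATCHED": ["..."]},
]

print(scheme_is_feasible("aabba", rhymes))       # True
print(scheme_is_feasible("aabbaaccaa", rhymes))  # False
```

Checking only the largest group is enough here, because each group in the scheme is chosen independently and any rhyme that satisfies the largest group also satisfies the smaller ones.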