# <ins>auto_pometizer</ins>

## A Markov-chain poetry generator

## Table of contents

1. [Import functions and packages](#Import-functions-and-packages)
2. [Scraping and compiling poetry](#Scraping-and-compiling-poetry)
3. [Creating Markov chain dictionary](#Creating-Markov-chain-dictionary)
4. [Generate!](#Generate!)
    - [Regular style](#Regular-style)
    - [Rhyming styles](#Rhyming-styles)

## Import functions and packages

[[go back to the top](#auto_pometizer)]

- If you are running the notebook from the beginning, you may need to install some packages, such as [wordninja](https://github.com/keredson/wordninja). To do so, uncomment the next cell.

```python
# !pip install wordninja
```

```python
# custom functions
from functions import *

# readers and necessary libraries
from collections import defaultdict
import json
import re
import string
import random

# word segmenter
import wordninja
```

## Scraping and compiling poetry

[[go back to the top](#auto_pometizer)]

- Make calls to the [PoetryDB](https://poetrydb.org) API to obtain all available authors, the titles of each author's poems, and the text of each poem.
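The grabber functions live in `functions.py`, which isn't shown, so their exact requests are unknown. As a rough sketch of what such calls involve, the helpers below assume PoetryDB's JSON endpoints `/author`, `/author/<name>/title`, and `/title/<title>/lines`, and the response shapes noted in the comments. All function names here are hypothetical, and the network call is kept separate from the parsing so the parsing can be checked offline.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://poetrydb.org"


def fetch_json(url):
    """Download a URL and decode its JSON body (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def titles_url(author):
    # percent-encode the whole name, including any '/', so it stays one path segment
    return f"{BASE_URL}/author/{quote(author, safe='')}/title"


def lines_url(title):
    return f"{BASE_URL}/title/{quote(title, safe='')}/lines"


def parse_authors(payload):
    # assumed shape: {"authors": ["Adam Lindsay Gordon", ...]}
    return payload["authors"]


def parse_titles(payload):
    # assumed shape: [{"title": "A Song of Autumn"}, ...]
    return [entry["title"] for entry in payload]


def parse_lines(payload):
    # assumed shape: [{"lines": [...]}, ...]; a miss is assumed to come back
    # as a JSON object (not a list), which is reported here as None
    if not isinstance(payload, list) or not payload:
        return None
    return payload[0]["lines"]
```

Usage would look like `parse_authors(fetch_json(f"{BASE_URL}/author"))`. Taking only the first match in `parse_lines` is also an assumption, since a title lookup could return more than one poem.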

```python
# URL to grab author names
base_url = "https://poetrydb.org/author"
```

```python
# get a list of authors and check out a sample
authors = author_grabber(base_url)
authors[0]
```

```
'Adam Lindsay Gordon'
```

```python
# loop over authors list to obtain titles for each
titles_grouped = [title_grabber(author) for author in authors]

# loop over list of lists of titles grouped by author
titles = [title for author in titles_grouped for title in author]

# check out a sample
titles[0]
```

```
'A Song of Autumn'
```

```python
# if any poems weren't scraped, their titles will be listed below this message
print('The following poems were not successfully scraped:')
# loop over titles list to obtain poems as strings
poems_list = [poem_grabber(title) for title in titles]
```

```
The following poems were not successfully scraped:
```

```python
# check out a sample
poems_list[0:2]
```

```
['where shall we go for our garlands glad \n at the falling of the year \n when the burntup banks are yellow and sad \n when the boughs are yellow and sere \n where are the old ones that once we had \n and when are the new ones near \n what shall we do for our garlands glad \n at the falling of the year \n child can i tell where the garlands go \n can i say where the lost leaves veer \n on the brownburnt banks when the wild winds blow \n when they drift through the deadwood drear \n girl when the garlands of next year glow \n you may gather again my dear \n but i go where the last years lost leaves go \n at the falling of the year',
 'the ocean heaves around us still \n with long and measured swell \n the autumn gales our canvas fill \n our ship rides smooth and well \n the broad atlantics bed of foam \n still breaks against our prow \n i shed no tears at quitting home \n nor will i shed them now \n \t \n against the bulwarks on the poop \n i lean and watch the sun \n behind the red ho
```

```python
# make sure everything got pulled
len(titles), len(poems_list), type(poems_list)
```

```
(3118, 3118, list)
```

- For the Markov chain dictionary, I want to keep newline and tab characters as if they were words in the poem, so I've added a space on either side of each one to separate it from the preceding and following words.
- I do the same when joining the poems into one big string.
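The cleaning inside `poem_grabber` isn't shown. The samples above (lowercase text, no punctuation, "burntup", "atlantics", and a lone `\t` between stanzas) are consistent with something like the hypothetical `clean_poem` below, which additionally assumes that a blank line marks a stanza break and becomes a tab token. Padding each break with spaces is what lets it survive as a token of its own.

```python
import string

# translation table that deletes every ASCII punctuation character
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_poem(lines):
    """Hypothetical cleaner: lowercase, strip punctuation, pad line breaks with spaces."""
    cleaned = []
    for line in lines:
        text = line.lower().translate(PUNCT_TABLE).strip()
        # assumption: an empty line is a stanza break, kept as a tab token
        cleaned.append(text if text else "\t")
    # ' \n ' keeps every newline separated from its neighbours by single spaces
    return " \n ".join(cleaned)
```

For example, `clean_poem(["Where shall we go", "", "The burnt-up banks"]).split(" ")` yields `['where', 'shall', 'we', 'go', '\n', '\t', '\n', 'the', 'burntup', 'banks']`. Note that `.split()` with no argument would silently drop the `\n` and `\t` tokens, because it treats them as whitespace.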

```python
# join poems together into one string
poems_string = ' \n '.join(poems_list)
```

```python
# check some characteristics
len(poems_string), type(poems_string)
```

```
(12175686, str)
```

### 💾 Save/Load poem string

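```python
# # uncomment to save
# with open('poems_raw.txt', 'w', encoding='utf-8') as output:
#     output.write(poems_string)

# # uncomment to load
# with open('poems_raw.txt', 'r', encoding='utf-8') as f:
#     poems_raw = f.read()

# if nothing was loaded, continue with the string built above
if 'poems_raw' not in globals():
    poems_raw = poems_string
```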

## Creating Markov chain dictionary

[[go back to the top](#auto_pometizer)]

- If you want to make adjustments before building the dictionary, load the saved text file and proceed from here.
- I want to keep the newline and tab characters, so I temporarily replace them with placeholder words: 'airplane' for `\n` and 'automobile' for `\t`. Since I'll eventually be segmenting the string, I chose two words that don't appear in the poetry (which is all pre-1900) and that the segmenter will recognize as single words.
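The placeholder trick is a reversible encoding: characters go out as words before segmentation and come back afterwards. A minimal sketch of the round trip, using the notebook's own placeholder words; the helper names are hypothetical, `str.replace` stands in for the notebook's `re.sub` (equivalent for literal characters), and the guard against a placeholder already being present is an extra check the notebook doesn't perform.

```python
# placeholder words, as chosen in the notebook
PLACEHOLDERS = {"\n": "airplane", "\t": "automobile"}


def encode_breaks(text):
    """Swap newline/tab characters for placeholder words before segmentation."""
    for char, word in PLACEHOLDERS.items():
        # a placeholder already in the text would make decoding ambiguous
        if word in text:
            raise ValueError(f"placeholder {word!r} already appears in the text")
        text = text.replace(char, word)
    return text


def decode_tokens(tokens):
    """Map placeholder tokens back to the characters they stand for."""
    reverse = {word: char for char, word in PLACEHOLDERS.items()}
    return [reverse.get(token, token) for token in tokens]
```

Because the decode step works on tokens rather than on the raw string, it can run after segmentation, which is where the notebook restores the characters.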

```python
# substitute endline characters
poems_edit = re.sub(r'\n', 'airplane', poems_raw)

# substitute tab characters
poems_edit = re.sub(r'\t', 'automobile', poems_edit)

# check out a sample
poems_edit[:500]
```

```
'where shall we go for our garlands glad airplane at the falling of the year airplane when the burntup banks are yellow and sad airplane when the boughs are yellow and sere airplane where are the old ones that once we had airplane and when are the new ones near airplane what shall we do for our garlands glad airplane at the falling of the year airplane child can i tell where the garlands go airplane can i say where the lost leaves veer airplane on the brownburnt banks when the wild winds blow air'
```

- Some of the source material is poorly formatted, most often with several words jammed together without spaces (e.g. "burntup"), so I run the WordNinja segmenter to split them apart.

*NOTE: There is some collateral damage here: some words that should not be split get segmented anyway. For a different technique using a tokenizer, look at the [tokenized version](poem_generator_workbook_punct_token.ipynb) in the scrap_files folder.*
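As I understand it, wordninja scores every possible split with word-frequency costs and keeps the cheapest one via dynamic programming. The toy `make_segmenter` below illustrates that idea with a made-up six-word vocabulary; it is not wordninja's code, and the Zipf-style cost `log((rank + 1) * log(N))` is an assumption. It also shows where the collateral damage comes from: any stretch that can be rebuilt from cheaper, common words will be.

```python
import math


def make_segmenter(words_by_frequency):
    """Build a splitter from a vocabulary ordered most- to least-frequent (needs 2+ words)."""
    n = len(words_by_frequency)
    # Zipf-style cost: rarer words (higher rank) cost more
    cost = {w: math.log((rank + 1) * math.log(n)) for rank, w in enumerate(words_by_frequency)}
    max_len = max(len(w) for w in words_by_frequency)

    def segment(text):
        # best[i] = (cheapest cost of text[:i], length of the last word in that split)
        best = [(0.0, 0)]
        for i in range(1, len(text) + 1):
            best.append(min(
                (best[i - k][0] + cost.get(text[i - k:i], math.inf), k)
                for k in range(1, min(i, max_len) + 1)
            ))
        # walk back through the recorded word lengths
        words, i = [], len(text)
        while i > 0:
            k = best[i][1]
            words.append(text[i - k:i])
            i -= k
        return words[::-1]

    return segment
```

With the vocabulary `["the", "up", "banks", "burnt", "burn", "tup"]`, `"theburntupbanks"` splits into `['the', 'burnt', 'up', 'banks']` because "burnt" + "up" is cheaper than "burn" + "tup". In this toy, text with no matching words falls back to one-character pieces.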

```python
# segment poem string into list of words
poems_segmented = wordninja.split(poems_edit)

# check out a sample
poems_segmented[15:20]
```

```
['airplane', 'when', 'the', 'burnt', 'up']
```

- Find any single letters (other than *a* and *i* and the very poetic *o*) that are hanging around, as they detract from the generated poems.
    - Replace them with 'automobile', which is currently the equivalent of '\t', because you can never have enough tabs when trying to make a poem look more contemporary :P

```python
# every lowercase letter except a, i, and o
single_letters = set(string.ascii_lowercase) - {'a', 'i', 'o'}

# replace stray single letters with the tab placeholder
poems_segmented = ['automobile' if word in single_letters else word for word in poems_segmented]

# check out a sample
poems_segmented[15:20]
```

```
['airplane', 'when', 'the', 'burnt', 'up']
```

- Create a dictionary that maps each word in `poems_segmented` to a list of every word that follows it anywhere in the text. Duplicates are kept, so a word that often follows the key appears in its list many times.

```python
# instantiate a dictionary
poems_dictionary = defaultdict(list)

# create Markov dictionary
for current_word, next_word in zip(poems_segmented, poems_segmented[1:]):
    poems_dictionary[current_word].append(next_word)

# check out a sample
poems_dictionary['land'][:5]
```

```
['must', 'airplane', 'airplane', 'airplane', 'while']
```
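Because duplicates are kept ('airplane' appears three times in the sample for 'land'), picking uniformly from a follower list is the same as sampling the next word by how often it was observed. The sketch below makes those implied probabilities explicit on a made-up nine-word corpus; `build_chain` repeats the construction from the cell above, and `transition_probabilities` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def build_chain(words):
    """Same construction as the cell above: each word maps to every word that follows it."""
    chain = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        chain[current_word].append(next_word)
    return chain


def transition_probabilities(chain, word):
    """Turn a follower list into next-word probabilities (duplicates act as weights)."""
    counts = Counter(chain.get(word, []))
    total = sum(counts.values())
    return {follower: count / total for follower, count in counts.items()}
```

On `["the", "land", "\n", "the", "land", "\n", "the", "land", "must"]`, the followers of "land" are `["\n", "\n", "must"]`, so a newline comes next with probability 2/3 and "must" with 1/3.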

- After changing the 'airplane' and 'automobile' values back to newline and tab characters with the `lines_tabs_creator` function, I rename the corresponding keys in the dictionary as well.

```python
# revert back to endline and tab characters
poems_dictionary = lines_tabs_creator(poems_dictionary, endline_sub='airplane', tab_sub='automobile')

# replace endline and tab keys
poems_dictionary['\n'] = poems_dictionary.pop('airplane')
poems_dictionary['\t'] = poems_dictionary.pop('automobile')

# check out a sample
poems_dictionary['land'][:5]
```

```
['must', '\n', '\n', '\n', 'while']
```
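`lines_tabs_creator` lives in `functions.py` and isn't shown, so its internals are unknown. The cell above fixes the values with that helper and then re-keys with `pop`; as a hypothetical alternative, both steps can be done in a single dictionary comprehension. The name `restore_breaks` is made up, and note that the result is a plain `dict`, not a `defaultdict`.

```python
def restore_breaks(chain, endline_sub="airplane", tab_sub="automobile"):
    """Hypothetical one-pass alternative: map placeholder words back in keys and values."""
    mapping = {endline_sub: "\n", tab_sub: "\t"}
    # builds new lists, so the input chain is left untouched
    return {
        mapping.get(word, word): [mapping.get(follower, follower) for follower in followers]
        for word, followers in chain.items()
    }
```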

### 💾 Save/Load Markov dictionary

```python
# # uncomment to save (a dict can't be passed to write(); serialize it with json.dump)
# with open('poems_dictionary.json', 'w', encoding='utf-8') as output:
#     json.dump(poems_dictionary, output)

# # uncomment to load
# with open('poems_dictionary.json', 'r', encoding='utf-8') as f:
#     poems_dictionary = json.load(f)
```
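JSON has no notion of `defaultdict`, so `json.load` hands back a plain `dict`, and looking up a word that never appeared raises `KeyError` instead of returning an empty list. A small sketch that restores that behaviour on load (the helper names are hypothetical):

```python
import json
from collections import defaultdict
from pathlib import Path


def save_chain(chain, path):
    # a defaultdict serializes like any dict; '\n' and '\t' keys become JSON escapes
    Path(path).write_text(json.dumps(chain), encoding="utf-8")


def load_chain(path):
    # wrap the plain dict so missing words yield [] instead of KeyError
    return defaultdict(list, json.loads(Path(path).read_text(encoding="utf-8")))
```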

## Generate!

[[go back to the top](#auto_pometizer)]

- If you want to get right to generating poems, proceed from here: load the JSON file, then run the `auto_pometizer` function.

*NOTE: You must first run the cell under [Import functions and packages](#Import-functions-and-packages) at the beginning of the notebook.*
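`auto_pometizer` is defined in `functions.py`, which isn't shown, so exactly how it walks the chain is unknown. A minimal sketch of the usual approach, assuming a uniform pick from each follower list and a jump to a random word at a dead end (both assumptions; `random_walk` is a hypothetical name). Joining with single spaces means each `\n` token ends up padded as ` \n `, which is consistent with the leading space on each line of the sample output.

```python
import random


def random_walk(chain, length, seed_word=None, rng=None):
    """Hypothetical generator: follow random successors for `length` tokens."""
    rng = rng or random.Random()
    word = seed_word if seed_word is not None else rng.choice(list(chain))
    words = [word]
    while len(words) < length:
        followers = chain.get(word)
        # dead end (a word never followed by anything): jump to a random key
        word = rng.choice(followers) if followers else rng.choice(list(chain))
        words.append(word)
    return " ".join(words)
```

Passing a seeded `random.Random` as `rng` makes a run reproducible, which is handy when comparing settings.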

### Regular style

```python
# run the function, which returns a generated poem string, while also printing by default (can be turned off)
auto_pome = auto_pometizer(poems_dictionary)
```

```
What length doth thy sweet nothings require? 50

 fuego south of an air 
 but the tale which i would i doubt any knight 
 i break them in a young so loud intemperance subsides soon is thick the cough 
 what need 
 oh that the highest design 
 minot tis pitt 	 without any 
 in
```


### Rhyming styles

```python
# generator with end words replaced with rhymes
auto_pome = auto_pometizer(poems_dictionary, to_rhyme='endline')
```

```
What length doth thy sweet nothings require? 50

 feliciano 
 as all defence while fish knape 
 	 	 and the 	 
 joaquim 
 khat 
 mar you have not the soothing friendships tears in its stride 
 	 
 and was the acquiescence 
 ye played so old condemns 
 foundered one pervading manifold i want
```


```python
# generator with random words replaced with rhymes
auto_pome = auto_pometizer(poems_dictionary, to_rhyme='random')
```

```
What length doth thy sweet nothings require? 50

 gesture stiffed 
 	 
 or silly damian about 
 jac fo there was savory 
 the villagers 
 annunciation lilies cast piss and not in the lambing dulness with your dolly im conjecturing truth by god tho he stands at home o us keep 
 i will 
 	
```


```python
# generator with all words replaced with rhymes
auto_pome = auto_pometizer(poems_dictionary, to_rhyme='all')
```

```
What length doth thy sweet nothings require? 50

 co-workers belove ter ned ag brookes shultz's barrows in the patrolled 
 	 colleague's fate kuan 
 cy fight in rely apart intertan 
 	 says accost in pi worthey rattan yau secrete invoiced neighbouring hamlets scum the futher the sonny souers bizarre the dartt labov the superpowers' 
 creagh
```
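`auto_pometizer`'s rhyme source and selection logic aren't shown. Going only by the comments in the cells above, the three `to_rhyme` modes differ in which positions receive a rhyme: words at line ends, randomly chosen words, or every word. The hypothetical sketch below separates that choice from the rhyme lookup itself, which is left as a caller-supplied function; the `share` probability used for `'random'` is an invented parameter.

```python
import random

BREAKS = {"\n", "\t"}


def positions_to_rhyme(tokens, mode, rng=None, share=0.3):
    """Pick token indices to swap, mirroring the three to_rhyme modes (hypothetical logic)."""
    words = [i for i, tok in enumerate(tokens) if tok not in BREAKS]
    if mode == "all":
        return words
    if mode == "endline":
        # a word is line-final if a newline (or the end of the poem) comes next
        return [i for i in words if i == len(tokens) - 1 or tokens[i + 1] == "\n"]
    if mode == "random":
        rng = rng or random.Random()
        # assumption: each word is swapped independently with probability `share`
        return [i for i in words if rng.random() < share]
    raise ValueError(f"unknown mode: {mode!r}")


def apply_rhymes(tokens, positions, rhyme_for):
    """Replace the chosen tokens using a caller-supplied rhyme lookup."""
    out = list(tokens)
    for i in positions:
        out[i] = rhyme_for(tokens[i])
    return out
```

Keeping position selection separate from the lookup means any rhyme source can be plugged in without touching the mode logic.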
