# <ins>tankanizer</ins>

## A Markov-chain tanka (poem) generator
- Counts syllables to produce poems with 5-7-5-7-7 syllabic structure.

## Table of contents

1. [Import functions and packages](#Import-functions-and-packages)
2. [Loading and preparing text](#Loading-and-preparing-text)
3. [Creating Markov chain dictionary](#Creating-Markov-chain-dictionary)
4. [Generate!](#Generate!)

## Import functions and packages

[[go back to the top](#tankanizer)]

- To run the generator, all you need is the [functions](functions.py) file from this repo, a method of tokenization, and a text file.

In [1]:
# functions for this project
from functions import *

# word list creation options
from nltk.tokenize import RegexpTokenizer

# read json files
import json

# reload functions/libraries when edited
%load_ext autoreload
%autoreload 2

## Loading and preparing text

[[go back to the top](#tankanizer)]

- Load a text file (the longer the better!)

In [2]:
# read the whole corpus into a single string
with open('data/whitman_poems.txt') as f:
    corpus = f.read()

- For words to be recognized by the ```phones_for_word``` function, the text needs to be in the proper format: lowercase it, correct any characters with nonstandard encoding, and remove any hyphens, since hyphenated compound words may not be supported by the function.
- Although the ```whitman_poems.txt``` file doesn't require it, I've provided an example of some processing that may be necessary.

*NOTE: Depending on your corpus, more (or less) processing of the text may be required.*
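
One way to judge how much processing a new corpus needs is to count the characters that fall outside a plain set of letters, digits, whitespace, and common punctuation. The sketch below is only an illustration: the helper name `unusual_characters` and the `expected` character set are assumptions, not part of `functions.py`.

```python
from collections import Counter

def unusual_characters(text):
    """Count characters outside a plain ASCII working set (hypothetical helper)."""
    # assumed "safe" set: lowercase letters, digits, whitespace, common punctuation
    expected = set("abcdefghijklmnopqrstuvwxyz0123456789 \n\t.,;:!?'\"()-")
    return Counter(ch for ch in text.lower() if ch not in expected)

# curly apostrophes and long dashes show up here, so they are candidates for .replace()
sample = "O Captain’s day—well done"
print(unusual_characters(sample).most_common())
```

Anything this reports is a candidate for another ```.replace()``` call in the formatting step.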

In [3]:
# lowercase text, correct apostrophes, convert hyphens and long dashes to spaces
corpus_formatted = corpus.lower().replace("’", "'").replace('-', ' ').replace('—', ' ')
corpus_formatted[:105]

'primeval my love for the woman i love, \n o bride! o wife! more resistless, more enduring than i can tell,'

- Tokenize text, i.e. convert string into a list of words.
    - There are several ways to do this, the simplest being ```text.split()```.
    - Below is a slightly more sophisticated way that disregards numbers and punctuation; it also keeps words with apostrophes intact.
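
To make the difference between the two approaches concrete, here is a small comparison on a made-up string. It uses the standard library's `re.findall` as a stand-in for `RegexpTokenizer`; for this pattern the two should return the same tokens, but treat that as an assumption rather than a guarantee.

```python
import re

text = "o bride! o wife! i can't tell, 1855"

# whitespace split: punctuation stays attached and numbers are kept
split_tokens = text.split()

# regex: keep only runs of letters and apostrophes
regex_tokens = re.findall(r"[a-zA-Z']+", text)

print(split_tokens)
print(regex_tokens)
```

Tokens such as `bride!` or `tell,` are unlikely to be found by a pronunciation lookup, which is why the regex approach is used in the cells that follow.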

In [4]:
# pattern that grabs all words, without splitting them on apostrophes
tokenizer = RegexpTokenizer(pattern="[a-zA-Z']+")

In [5]:
# create a list of words from the corpus
corpus_words = tokenizer.tokenize(corpus_formatted)
corpus_words[:20]

['primeval',
 'my',
 'love',
 'for',
 'the',
 'woman',
 'i',
 'love',
 'o',
 'bride',
 'o',
 'wife',
 'more',
 'resistless',
 'more',
 'enduring',
 'than',
 'i',
 'can',
 'tell']

## Creating Markov chain dictionary

[[go back to the top](#tankanizer)]

- The ```countable_corpus``` function ensures that words within the dictionary have a known syllable quantity.
- This is necessary for the ```tankanizer``` function to work properly.
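
The real `countable_corpus` lives in `functions.py` and is not reproduced here. As a rough illustration of the idea, the sketch below builds a first-order chain (each word mapped to the words seen right after it) and keeps only pairs where both words have a known syllable count. The names `build_chain` and `toy_syllables` are hypothetical, and the small lookup table stands in for a real pronunciation dictionary.

```python
from collections import defaultdict

def build_chain(words, syllable_counts):
    """Map each countable word to the countable words observed directly after it."""
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        # skip any pair that contains a word with no known syllable count
        if current in syllable_counts and following in syllable_counts:
            chain[current].append(following)
    return dict(chain)

# hypothetical syllable lookup; "resistless" is deliberately missing
toy_syllables = {"o": 1, "bride": 1, "wife": 1, "more": 1, "enduring": 3}
toy_words = ["o", "bride", "o", "wife", "more", "resistless", "more", "enduring"]

toy_chain = build_chain(toy_words, toy_syllables)
print(toy_chain)
```

Followers are stored with repeats, so a word that often follows another is proportionally more likely to be picked during generation.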

In [6]:
# create Markov dictionary
corpus_dictionary = countable_corpus(corpus_words)

### 💾 Save/Load Markov dictionary

In [7]:
# # uncomment to save
# with open('whitman_dictionary.json', 'w') as output:
#     json.dump(corpus_dictionary, output)

# # uncomment to load
# with open('whitman_dictionary.json', 'r') as f:
#     corpus_dictionary = json.load(f)

## Generate!

[[go back to the top](#tankanizer)]
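
The `tankanizer` function itself is defined in `functions.py`. For readers curious how a 5-7-5-7-7 poem could be assembled from a chain dictionary, here is a minimal sketch under stated assumptions: it random-walks the chain, fills each line until the syllable target is met exactly, and starts over on an overshoot or dead end. `sketch_tanka`, `toy_chain`, and `toy_syllables` are hypothetical names, and the actual implementation may work quite differently.

```python
import random

PATTERN = (5, 7, 5, 7, 7)  # syllables per line in a tanka

def sketch_tanka(chain, syllables, rng=random, max_attempts=1000):
    """Return five lines matching PATTERN, or None if every attempt fails."""
    for _ in range(max_attempts):
        word = rng.choice(list(chain))  # random starting word
        lines = []
        for target in PATTERN:
            line, count = [], 0
            while count < target and word is not None:
                line.append(word)
                count += syllables[word]
                followers = chain.get(word)
                word = rng.choice(followers) if followers else None
            if count != target:
                break  # overshot the line or hit a dead end: retry from scratch
            lines.append(" ".join(line))
        else:
            return lines  # all five lines hit their targets
    return None

# toy data (assumed): every word in the chain has a known syllable count
toy_syllables = {"the": 1, "river": 2, "sings": 1, "softly": 2}
toy_chain = {
    "the": ["river"],
    "river": ["sings", "softly"],
    "sings": ["softly", "the"],
    "softly": ["the"],
}

poem = sketch_tanka(toy_chain, toy_syllables, rng=random.Random(0))
if poem is not None:
    print("\n".join(poem))
```

Restarting on failure is the simplest strategy. Backtracking one word at a time would waste fewer attempts, at the cost of more bookkeeping.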

In [8]:
# print 10 tankas

n = 10
for i in range(n):
    print('\n----------------------------\n')
    print(f'     TANKA #{i+1}\n')
    print(tankanizer(corpus_dictionary))
    
    if i == n-1:
        print('\n----------------------------')


----------------------------

     TANKA #1

byzantium the
government in wafted soft
astral but it my
voice saw him the body is
no one between sin remorse

----------------------------

     TANKA #2

names but rare has come
from missouri georgia to
be changed so grand roads
of his head and women the
soul to quell america

----------------------------

     TANKA #3

anything else that
strode before me my birth of
the trod by under
the murderer or how all
of women shall be a move

----------------------------

     TANKA #4

continued singing
inhaling the cutting the
mind of his flesh was
on bays lagoons creeks and night
passage to harp or far out

----------------------------

     TANKA #5

atlantic breezes
wafted inland or speaks it
has the houses these
are you shall we dare not to
quell america to it

----------------------------

     TANKA #6

trumpets the heavy
stones beautiful face thy in
all that was over
come with pouring cataracts
plants rivers by night pervades

----------

In [9]:
# print 100 tankas

n = 100
for i in range(n):
    print('\n----------------------------\n')
    print(f'     TANKA #{i+1}\n')
    print(tankanizer(corpus_dictionary))
    
    if i == n-1:
        print('\n----------------------------')


----------------------------

     TANKA #1

gunner and must not
let me i pierce men's and where
the shoot run out of
space know if a century
marches of his grown lady

----------------------------

     TANKA #2

novels plots of us
as the and younger brothers
that they roll slowly
continually up the
out in the earth i see the

----------------------------

     TANKA #3

veering and we are
free in leaden rain and words
of an aroma
sweet and the streets and me to
you alone pale floating in

----------------------------

     TANKA #4

tallying the mists
and in the torn bodies of
have died aged fierce pangs
visions sweats the great and rail
roads to you to for their of

----------------------------

     TANKA #5

rushing and are so
amazing and dante nor
may see male or said
and sinks again in the wind
brings a swift o henceforth try

----------------------------

     TANKA #6

sometimes known half and
the graceful palmetto i
must not nature now
doubtless left and white in him
who are

joy of old love sick
and the bare swim above all
sweet singer love and
wonders within me my heart's
day's work with plaudits in his

----------------------------

     TANKA #53

equipping like some
hidden prophetic not if
after death and the
notice high with its spiral
whirl and breathing his own part

----------------------------

     TANKA #54

wringing fingers the
day i hear the fugitive
slave is the same as
of ontario erie
huron michigan then i

----------------------------

     TANKA #55

day's work in my friend
usher to form location
all then amount and
smoother than attraction i
look for me and women that

----------------------------

     TANKA #56

rail prop wainscot jamb
of connecting rods the shut
eyes from the words of
asia ah now the sea in
its own soul nothing and is

----------------------------

     TANKA #57

leading wherever
and lie risen with his slight
ready to raise high
and extract thus far distant
over the beach at dawn from

----------------------------

  

swimmer naked wan from the

----------------------------
