# Assignment 6: Finding rhyming alternate wording
Here, I will revisit my third assignment, where I mashed up some existing poems and texts, and try to give it some rhythmic components using similar words and phonetic proximity.

In [1]:
# python packages
import random as rng
import numpy as np
from numpy import dot
from numpy.linalg import norm

In [2]:
# spacy
import spacy
nlp = spacy.load('en_core_web_md')

In [3]:
# pronouncing
import pronouncing

## Text sources
I will use Eileen Myles' poems *"Peanut Butter"*, *"Each Defeat"* and *"Our Happiness"* as source poems.

In [9]:
poems = []
for name in ['eileen_peanutbutter', 'eileen_eachdefeat', 'eileen_ourhappiness']:
    # close each file as soon as it has been read
    with open('./sources/' + name + '.txt', 'r') as f:
        poems.extend(line.strip() for line in f)

I need to filter out the empty lines from these texts. A list comprehension would work, but `filter(None, ...)` is a more compact way to do it, as someone on Stack Overflow has posted.

Source: https://stackoverflow.com/questions/3845423/remove-empty-strings-from-a-list-of-strings 

In [10]:
poems = list(filter(None, poems))
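
For reference, `filter(None, ...)` keeps only the truthy items, so it should give the same result as a comprehension with a truthiness test. A minimal check on a made-up list (the name `sample_lines` and its contents are illustrative, not the actual poems):

```python
# Hypothetical sample standing in for the stripped poem lines
sample_lines = ['in ponds &', '', 'instant mouth', '', 'in the vastly']

# filter(None, ...) drops every falsy item, which for strings means ''
with_filter = list(filter(None, sample_lines))

# Equivalent list comprehension
with_comprehension = [line for line in sample_lines if line]

print(with_filter == with_comprehension)
```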

In [13]:
rng.sample(poems, 5)

['in ponds &',
 'laundry. I just saw a coyote',
 'instant mouth',
 'Yellow, just kind',
 'in the vastly']

## Tools for similarity

Let's create some helper functions and collect all the tokens, so that spaCy's word-vector information can be used later on.

In [30]:
tokens = list(set([w.text for w in nlp(' '.join(poems)) if w.is_alpha]))

In [32]:
# get spacy vector
def vec(s):
    return nlp.vocab[s].vector

# cosine similarity
def cosine(v1, v2):
    if norm(v1) > 0 and norm(v2) > 0:
        return dot(v1, v2) / (norm(v1) * norm(v2))
    else:
        return 0.0

# closest word to target vector from token list
def spacy_closest(token_list, vec_to_check, n=10):
    return sorted(token_list, key=lambda x: cosine(vec_to_check, vec(x)), reverse=True)[:n]
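
To see what the ranking does without loading a spaCy model, here is a dependency-free sketch of the same idea. The three-dimensional vectors and the names `toy_vectors`, `toy_cosine` and `toy_closest` are made up for illustration; the real vectors come from `nlp.vocab` and are much longer.

```python
import math

# Toy vectors, invented for this example (not real spaCy vectors)
toy_vectors = {
    'woman': [0.9, 0.1, 0.0],
    'lady': [0.8, 0.2, 0.1],
    'stoop': [0.0, 0.1, 0.9],
}

def toy_cosine(v1, v2):
    # Same guard as cosine(): a zero vector has no direction, so return 0.0
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    if n1 == 0 or n2 == 0:
        return 0.0
    return sum(a * b for a, b in zip(v1, v2)) / (n1 * n2)

def toy_closest(words, target, n=10):
    # Rank words by similarity to the target vector, most similar first
    return sorted(words, key=lambda w: toy_cosine(target, toy_vectors[w]), reverse=True)[:n]

ranking = toy_closest(list(toy_vectors), toy_vectors['woman'], n=2)
print(ranking)
```

Note that the target word itself always comes first, since its similarity to itself is 1.0.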

In [60]:
# get tokens from a list of words
def token_list(words):
    return list(set([w.text for w in nlp(' '.join(words)) if w.is_alpha]))

## Finding rhyming alternate wording

I want to derive words from a random line of the poems. These words will replicate the rhythmic pattern of a source word and will be picked based on their similarity to it.

Sources: https://stackoverflow.com/questions/26132770/python-finding-longest-shortest-words-in-a-list-and-calling-them-in-a-function

In [14]:
def longest_word(line):
    words = line.split(' ')
    sortedwords = sorted(words, key=len)
    # return from the sorted list, not from the original word order
    return sortedwords[-1]

In [16]:
# testing what I want to do with one line
test_line = rng.sample(poems, 1)[0]
test_word = longest_word(test_line)
test_stress = pronouncing.stresses(test_word)
test_stress

''

In [19]:
test_word

'sand-'

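It is great that this happened right away, because it reminds me that I need to:
- filter some characters from the poem, line, or word I select
- be aware that `pronouncing.stresses` expects a phones string (as returned by `pronouncing.phones_for_word`), so a raw word gives an empty string, and that `phones_for_word` returns an empty list if the word is not in its dictionary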
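
The two points above can be sketched without the library. The table `toy_stresses` is a hypothetical stand-in for the pronouncing dictionary with invented entries, and `clean_word` and `lookup_stress` are illustrative helper names:

```python
import re

# Hypothetical stand-in for the pronouncing dictionary: word -> stress pattern
toy_stresses = {'sand': '1', 'woman': '10'}

def clean_word(word):
    # Keep letters and apostrophes only, so 'sand-' becomes 'sand'
    return re.sub(r"[^A-Za-z']", '', word).lower()

def lookup_stress(word):
    # Return '' for unknown words so the caller can test for an empty result
    return toy_stresses.get(clean_word(word), '')

print(repr(lookup_stress('sand-')))
print(repr(lookup_stress('&')))
```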

Since I will do this many times, I will write a function that iterates over the words to find the best candidate. And as I'm already using spaCy, I'll use `nlp` to separate the words (there are so many punctuation characters that would ruin everything!)

In [64]:
def perfect_words(line, n=10):
    # split the line with spaCy so punctuation is separated from the words
    words = [item.text for item in nlp(line)]
    # sort them from longest to shortest
    sorted_words = sorted(words, key=len, reverse=True)
    # go through them searching for a word that exists in the pronouncing library
    chosen_word, stress = None, ''
    for w in sorted_words:
        phones = pronouncing.phones_for_word(w)
        # phones_for_word returns an empty list for unknown words
        if not phones:
            continue
        stress = pronouncing.stresses(phones[0])
        if stress != '':
            chosen_word = w
            break
    # fail with a clear message when nothing in the line can be pronounced
    if chosen_word is None:
        raise ValueError('no word with a known pronunciation in: ' + line)
    # look for words with exactly the same stress pattern
    similar_stresses = pronouncing.search_stresses('^'+stress+'$')
    # get the ones closest to the starting word
    sim_stress_tokens = token_list(similar_stresses)
    chosen_vec = vec(chosen_word)
    # and return some of them
    closest_words = spacy_closest(sim_stress_tokens, chosen_vec, n)
    return [chosen_word, closest_words]
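
`pronouncing.search_stresses` treats its argument as a regular expression over stress strings, which is presumably why the pattern is wrapped in `^` and `$`. A small sketch of the difference, using an invented word table (`toy_stresses`) and a hypothetical `toy_search_stresses` helper instead of the real dictionary:

```python
import re

# Hypothetical word -> stress-pattern table (1 = primary stress, 0 = unstressed)
toy_stresses = {'woman': '10', 'lady': '10', 'defeat': '01', 'liberty': '100'}

def toy_search_stresses(pattern):
    # Return every word whose stress string matches the regular expression
    regex = re.compile(pattern)
    return [w for w, s in toy_stresses.items() if regex.search(s)]

target = toy_stresses['woman']
# Anchored: only words whose whole pattern equals the target
anchored = toy_search_stresses('^' + target + '$')
# Unanchored: also matches longer patterns that merely contain the target
unanchored = toy_search_stresses(target)

print(anchored)
print(unanchored)
```

Without the anchors, a three-syllable word such as 'liberty' would slip into the candidates for a two-syllable source word.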

In [69]:
# let's test it!
test_line = rng.sample(poems, 1)[0]
print(test_line)

test_match = perfect_words(test_line)
print(test_match[0])
print(test_match[1])

a woman with
woman
['woman', 'lady', 'women', 'mother', 'pregnant', 'prego', 'boy', 'youngster', 'husband', 'men']


It works!

Now, on to testing with some creations

In [73]:
# FORM 1

# get a line
line = rng.choice(poems)
# find the similarly stressed ones
simil = perfect_words(line, 4)
word_chosen = simil[0]
word_rhymes = simil[1]
# recreate the line with the rhythmic synonyms
new_line = line.replace(word_chosen, ', '.join(word_rhymes))
print(new_line)

Each defeat, defeats, prevailed, against


In [77]:
# FORM 2

# get a line
line = rng.choice(poems)
# find the similarly stressed ones
simil = perfect_words(line)
word_chosen = simil[0]
word_rhymes = simil[1]
# recreate the line with the rhythmic synonyms
for i in range(min(len(word_rhymes), 4)):
    print(line.replace(word_chosen, word_rhymes[i]))

we sat on a stoop
we sat on a wined
we sat on a tarred
we sat on a shames


Sadly, this doesn't make much sense with these poems. The lines are too short to produce anything mildly interesting. And it fails quite often 😩

It seems this idea was never that good, so I'll move away from it. A better option might be to try this on longer prose texts.

## Maniac characters

It would be better to make someone say a long line of text and then repeat the last bit with some differences every time.
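
As a rough sketch of that idea: take the tail of a sentence and print it again with one word swapped each time. The function `tail_variations` and the hand-picked alternatives are hypothetical; this version splits on whitespace and swaps whole tokens only, which is an assumption made for the illustration.

```python
def tail_variations(sentence, target, alternatives, tail_len=5):
    # Split on whitespace and keep only the last few tokens
    tail = sentence.split()[-tail_len:]
    variations = []
    for alt in alternatives:
        # Swap whole tokens only, so the target is never replaced inside another word
        variations.append(' '.join(alt if tok == target else tok for tok in tail))
    return variations

# Made-up alternatives, chosen by hand for the illustration
results = tail_variations('Be militant each in your own way.', 'your', ['our', 'my'])
for v in results:
    print(v)
```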

In [81]:
with open("./sources/emmeline_iincitethismeetingtorebellion.txt") as f:
    em_doc = [line.strip() for line in f]
em_text = nlp( ' '.join(em_doc) )
em_sent = [line.text.strip() for line in list(em_text.sents)]

In [99]:
em_line = rng.choice(em_sent)
em_line_words = [item.text for item in nlp(em_line)]
em_line_end = em_line_words[-5:]
em_line_end

['principle', '-', 'do', 'so', '.']

There are a lot of punctuation marks here; I need to get rid of those before proceeding.

In [110]:
def no_punct(word_list):
    banned = ['-', ',', '.', '_', '!', ';', ':', '?', '"', "'"]
    return [item for item in word_list if item not in banned]
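
A hand-written list like `banned` can miss marks such as parentheses or double hyphens. One possible variation, sketched here under the assumption that only ASCII punctuation matters, relies on `string.punctuation` from the standard library (the name `no_punct_general` is hypothetical):

```python
import string

def no_punct_general(word_list):
    # Drop tokens made up entirely of ASCII punctuation characters
    return [w for w in word_list if not all(ch in string.punctuation for ch in w)]

cleaned = no_punct_general(['principle', '-', 'do', 'so', '.'])
print(cleaned)
```

Tokens that only contain some punctuation, like `women-than`, are kept as they are.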

In [109]:
em_line = rng.choice(em_sent)
em_line_words = [item.text for item in nlp(em_line)]
em_line_end = no_punct(em_line_words[-5:])
em_line_end

['keep', 'me', 'in', 'prison']

In [117]:
for i in range(5):
    # pick a sentence and keep its last five tokens
    em_line = rng.choice(em_sent)
    em_line_words = [item.text for item in nlp(em_line)]
    em_line_end = em_line_words[-5:]
    result = perfect_words( ' '.join(no_punct(em_line_end)) )
    word_chosen = result[0]
    word_simil = result[1]
    print(em_line)
    # skip the first match (the chosen word itself) and use up to four others
    for alt in word_simil[1:5]:
        new_line = ' '.join(em_line_end).replace(word_chosen, alt).replace(' .','.')
        print(new_line)
    print()

Some of them tell us that other things are more important than the liberty of women-than the liberty of working women.
liberties of working women.
minuteman of working women.
servitude of working women.
overturn of working women.

It is not the method of women.
the methods of women.
the using of women.
the constants of women.
the linkage of women.

There is something that governments care far more for than human life, and that is the security of property, and so it is through property that we shall strike the enemy.
shall strikes the enemy.
shall struck the enemy.
shall force the enemy.
shall hit the enemy.

Be militant each in your own way.
in you own way.
in our own way.
in own own way.
in my own way.

It is better that those who cannot agree, cannot see eye to eye as to policy, should set themselves free, should part, and should be free to continue their policy as they see it in their own way, unfettered by those with whom they can no longer agree.
can no even agree.
can no shorter 