# Rhyme counter

**Àlex R. Atrio and Andrei Popescu-Belis, HEIG-VD/HES-SO and EPFL, 2023**

This notebook defines functions that check whether two verses rhyme, and uses them to count the rhymes in poems.

This is the rhyming measure used in Section 2 of our paper "[GPoeT: a Language Model Trained for Rhyme Generation on Synthetic Data](https://aclanthology.org/2023.latechclfl-1.2/)" by Popescu-Belis A., Atrio A.R. et al., presented at the [LaTeCH-CLfL 2023 workshop](https://aclanthology.org/volumes/2023.latechclfl-1/).

The main function is `test_rhyme()`, which returns `2` if two verses (lines) have a *perfect rhyme*, `1` if they have an *assonant rhyme*, and `0` otherwise (see https://rhymenow.com/types-of-rhymes for definitions of these cases). The function relies on `verse2rhyme()` which, given a verse (line), returns a list with two elements: (1) the perfect rhyme (i.e. a word ending) and (2) the assonant rhyme (another ending). These are determined from the last word of the verse, after removing punctuation and resolving contractions. If the last word is not found in the dictionary, the function falls back to the most similar word that is in the dictionary.
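
As a minimal sketch of the comparison described above, the snippet below rates two `[perfect rhyme, assonant rhyme]` pairs. The helper name `rate_rhyme_pair` and the small `toy_rhymes` dictionary are assumptions made for illustration (the entries are copied from outputs shown later in this notebook); the actual functions are defined in the cells below.

```python
# Toy excerpt of a word -> [perfect rhyme, assonant rhyme] mapping.
# The entries are copied from outputs shown later in this notebook.
toy_rhymes = {
    'day': ['ey', 'ey'],
    'may': ['ey', 'ey'],
    'date': ['eyt', 'ey'],
    'do': ['uw', 'uw'],
}

def rate_rhyme_pair(rh1, rh2):
    """Hypothetical helper: 2 = perfect rhyme, 1 = assonant rhyme, 0 = no rhyme."""
    if rh1[0] == rh2[0]:    # same perfect rhyme
        return 2
    if rh1[1] == rh2[1]:    # same assonant rhyme only
        return 1
    return 0

print(rate_rhyme_pair(toy_rhymes['day'], toy_rhymes['may']))   # 2
print(rate_rhyme_pair(toy_rhymes['day'], toy_rhymes['date']))  # 1
print(rate_rhyme_pair(toy_rhymes['day'], toy_rhymes['do']))    # 0
```

The perfect rhyme is compared first, so a pair that shares both endings is always rated `2`.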

Both functions rely on a dictionary of rhymes, which contains for each word its *perfect rhyme* and its *assonant rhyme*. The dictionary can be loaded by calling `get_rhyming_dictionary()` with the path to the `rhyming_dictionaries.pickle` file, also provided here (4.3 MB). The file actually contains three Python dictionaries (words -> rhymes, but also perfect rhymes -> words and assonant rhymes -> words) in order to improve efficiency when generating poems. They were built from a phonetic dictionary of English provided by [CMU Sphinx](http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/sphinxdict/) using the functions in the notebook `create_rhyming_dictionary_from_sphinx.ipynb`, also provided here.
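
To illustrate how the three dictionaries relate to each other, here is a minimal sketch that derives the two inverse mappings from a toy words -> rhymes dictionary. The helper `build_inverse_dictionaries` and the variable names are hypothetical; the real dictionaries are built by `create_rhyming_dictionary_from_sphinx.ipynb`, whose code may differ.

```python
from collections import defaultdict

# Toy word -> [perfect rhyme, assonant rhyme] mapping; the entries are
# copied from outputs shown in this notebook.
toy_word2rhymes = {
    'campaign': ['eyn', 'ey'],
    'fade': ['eyd', 'ey'],
    'shade': ['eyd', 'ey'],
    'date': ['eyt', 'ey'],
    'do': ['uw', 'uw'],
}

def build_inverse_dictionaries(word2rhymes):
    """Hypothetical helper: build rhyme -> [words] lookups from word -> rhymes."""
    perfect2words = defaultdict(list)
    assonant2words = defaultdict(list)
    for word, (perfect, assonant) in word2rhymes.items():
        perfect2words[perfect].append(word)
        assonant2words[assonant].append(word)
    return dict(perfect2words), dict(assonant2words)

perfect2words, assonant2words = build_inverse_dictionaries(toy_word2rhymes)
print(perfect2words['eyd'])   # ['fade', 'shade']
print(assonant2words['ey'])   # ['campaign', 'fade', 'shade', 'date']
```

With such inverse mappings, finding all words that rhyme with a given ending is a single dictionary lookup rather than a scan over all words.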

At the end, this notebook contains three examples of measuring rhymes on poems by Emily Dickinson, Shakespeare and W. E. Henley.

As explained in our [GPoeT paper](https://aclanthology.org/2023.latechclfl-1.2/), we validated the `test_rhyme()` function using a cleaned subset of the [Chicago Rhyming Poetry Corpus](https://github.com/sravanareddy/rhymedata) with manually annotated rhymes, and found that:
* it correctly finds 80% of true rhymes (most of them being considered as perfect)
* it correctly finds that about 90% of the non-rhyming words are indeed not rhymes.
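
The two figures above are the proportion of annotated rhymes that are detected and the proportion of annotated non-rhymes that are correctly rejected. The sketch below shows how such rates could be computed from gold annotations and `test_rhyme()`-style scores. The function name and the toy labels are hypothetical, and counting any score above 0 as a detected rhyme is an assumption; this is not the evaluation code or data used in the paper.

```python
def detection_rates(gold_labels, scores):
    """
    gold_labels: list of booleans (True if the pair is annotated as rhyming).
    scores: list of test_rhyme-style scores (2, 1 or 0) for the same pairs.
    Returns (rate of true rhymes detected, rate of non-rhymes rejected).
    Assumption: a score of 1 or 2 counts as a detected rhyme.
    """
    rhyme_scores = [s for g, s in zip(gold_labels, scores) if g]
    non_rhyme_scores = [s for g, s in zip(gold_labels, scores) if not g]
    detected = sum(1 for s in rhyme_scores if s > 0) / len(rhyme_scores)
    rejected = sum(1 for s in non_rhyme_scores if s == 0) / len(non_rhyme_scores)
    return detected, rejected

# Made-up labels and scores, for illustration only.
gold = [True, True, True, True, False, False]
scores = [2, 2, 1, 0, 0, 1]
print(detection_rates(gold, scores))  # (0.75, 0.5)
```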

In [1]:
import pickle
import os
from nltk.tokenize import word_tokenize
# nltk.download('punkt') # may be needed the first time
import difflib

In [2]:
def get_rhyming_dictionary(path="./rhyming_dictionaries.pickle"):
    """Pickle contains 3 dictionaries for faster search (hashes: O(1))
    {word: [perfect_rhyme, assonant_rhyme]}, {perfect_rhyme : [words...]} 
    and {assonant_rhyme : [words...]}, but we only need the first one.
    """
    with open(path,"rb") as fd:
        word2rhymes, _, _ = pickle.load(fd)
    return word2rhymes

In [3]:
word2rhymes = get_rhyming_dictionary()

In [4]:
print('Number of words in dictionary:', len(word2rhymes))
print('Sample entry for \'campaign\':', word2rhymes['campaign']) # returns [perfect rhyme, assonant rhyme]
print('Sample entry for \'do\'', word2rhymes['do']) # same, for a short word
print('Sample entry for \'wouldn\'t\' :', word2rhymes['wouldn\'t']) # same, for a contraction

Number of words in dictionary: 123631
Sample entry for 'campaign': ['eyn', 'ey']
Sample entry for 'do' ['uw', 'uw']
Sample entry for 'wouldn't' : ['ahnt', 'ah']


In [5]:
def verse2rhyme(verse, rhyme_dict):
    """
    Returns an array with two elements: (1) the perfect rhyme and (2) the assonant 
    rhyme of the verse (its last word, except punctuations and contractions).
    If the word is not found in the dictionary, it looks for the most similar one.
    """
    punctuation = ['.', ',', ':', ';', '!', '?', ' ', '-', '...', '_']
    verse = verse.replace('’', '\'')
    v = word_tokenize(verse)
    v = [w for w in v if w not in punctuation] # remove all punctuations
    if len(v) == 0:
        return []
    if v[-1] == 'n\'t':             # tokenizer's output on contraction: don't -> do, n't
        final_word = v[-2] + v[-1]  # restore full form (v[-2] is necessarily present)
    elif v[-1] == '\'d':            # for contraction of past participle (Shakespeare!)
        final_word = v[-2] + 'ed'
    else:
        final_word = v[-1]
    final_word = final_word.lower()
    if final_word not in rhyme_dict: # find a similar word that *is* in the dictionary
        similar_words = difflib.get_close_matches(final_word, rhyme_dict.keys(), n=1) # time consuming
        # a possibility is to use final_word[-7:] so as to consider only the 7 last letters for matching
        # print(final_word, '->', similar_words)
        if similar_words == []:
            return ['', ''] # if it couldn't find anything
        else:
            return rhyme_dict[similar_words[0]]
    return rhyme_dict[final_word]
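
# The fallback step above can be illustrated in isolation. This is a minimal
# sketch: `lookup_with_fallback` and `toy_rhymes` are hypothetical names, and
# 'campaigne' is a made-up out-of-dictionary spelling used only for the demo.

```python
import difflib

# Toy dictionary; the entries are copied from outputs shown in this notebook.
toy_rhymes = {
    'campaign': ['eyn', 'ey'],
    'shade': ['eyd', 'ey'],
    'do': ['uw', 'uw'],
}

def lookup_with_fallback(word, rhyme_dict):
    """Hypothetical helper mirroring the fallback step of verse2rhyme()."""
    if word in rhyme_dict:
        return rhyme_dict[word]
    # get_close_matches returns at most n candidates whose similarity ratio
    # is at least the cutoff (0.6 by default), best match first.
    similar = difflib.get_close_matches(word, list(rhyme_dict), n=1)
    if not similar:
        return ['', '']  # nothing similar enough was found
    return rhyme_dict[similar[0]]

print(lookup_with_fallback('do', toy_rhymes))         # ['uw', 'uw']
print(lookup_with_fallback('campaigne', toy_rhymes))  # ['eyn', 'ey']
print(lookup_with_fallback('xyz', toy_rhymes))        # ['', '']
```

# Because the match is based on spelling rather than pronunciation, the word
# that is found may not share the pronunciation of the original word.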

In [6]:
# This is the main function to test if two verses (lines) rhyme:

def test_rhyme(verse1, verse2, rhyme_dict):
    """
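    Returns 2 if the verses have a "perfect rhyme", 1 for an "assonant rhyme", and 0 otherwise.
    See https://rhymenow.com/types-of-rhymes for definitions of these cases.
    """
    rh1 = verse2rhyme(verse1, rhyme_dict)
    rh2 = verse2rhyme(verse2, rhyme_dict)
    # An empty verse yields [] and an unmatched word yields ['', '']:
    # in both cases there is no rhyme to compare, so report 0.
    if not rh1 or not rh2 or not rh1[0] or not rh2[0]:
        return 0
    if rh1[0] == rh2[0]:
        return 2
    elif rh1[1] == rh2[1]:
        return 1
    else:
        return 0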

In [7]:
# Test the test_rhyme() function on a poem by Emily Dickinson
poem_ED = """To make a prairie it takes a clover and one bee,
One clover, and a bee.
And revery.
The revery alone will do,
If bees are few."""

In [8]:
for line in poem_ED.split('\n'):
    print(verse2rhyme(line, word2rhymes))

['iy', 'iy']
['iy', 'iy']
['iy', 'iy']
['uw', 'uw']
['uw', 'uw']


In [9]:
# Find out which pairs of verses do rhyme, and which don't.
#
# This code is not optimal: each call to test_rhyme() recomputes the rhymes of
# both verses (2*n*n lookups instead of n), and test_rhyme() is called n*n times
# instead of n*(n-1)/2.
for line1 in poem_ED.split('\n'):
    for line2 in poem_ED.split('\n'):
        print(test_rhyme(line1, line2, word2rhymes), end=' ')
    print()

2 2 2 0 0 
2 2 2 0 0 
2 2 2 0 0 
0 0 0 2 2 
0 0 0 2 2 


In [10]:
# A faster method to find out which verses of a poem rhyme.
# This does not use test_rhyme but gets rhymes once for each verse.

def print_all_rhymes(poem, rhyme_dict):
    lines = poem.split('\n')
    rhymes = [verse2rhyme(line, rhyme_dict) for line in lines]
    print('Rhymes with previous verses\n', ' ' * 8, end='')
    for i in range(1, len(lines)):
        print(i % 10, end=' ')
    print('')
    for i in range(len(lines)):
        print('Verse', (i+1) % 10, end=': ')
        for j in range(len(lines)-1):
            if j >= i:
                print('.', end=' ')
            elif rhymes[i][0] == rhymes[j][0]:
                print('2', end=' ')
            elif rhymes[i][1] == rhymes[j][1]:
                print('1', end=' ')
            else:
                print('0', end=' ')
        print('')
    return rhymes

In [11]:
print_all_rhymes(poem_ED, word2rhymes) # The results fit our perception of rhymes.

Rhymes with previous verses
         1 2 3 4 
Verse 1: . . . . 
Verse 2: 2 . . . 
Verse 3: 2 2 . . 
Verse 4: 0 0 0 . 
Verse 5: 0 0 0 2 


[['iy', 'iy'], ['iy', 'iy'], ['iy', 'iy'], ['uw', 'uw'], ['uw', 'uw']]

In [12]:
# Test functions on an entire sonnet by Shakespeare.

poem_WS = """Shall I compare thee to a summer’s day?
Thou art more lovely and more temperate:
Rough winds do shake the darling buds of May,
And summer’s lease hath all too short a date:
Sometime too hot the eye of heaven shines,
And often is his gold complexion dimm’d;
And every fair from fair sometime declines,
By chance or nature’s changing course untrimm’d;
But thy eternal summer shall not fade
Nor lose possession of that fair thou owest;
Nor shall Death brag thou wander’st in his shade,
When in eternal lines to time thou growest:
So long as men can breathe or eyes can see,
So long lives this and this gives life to thee."""

In [13]:
print_all_rhymes(poem_WS, word2rhymes) # Most results fit our perception of rhymes, but the pairs of verses 2/4, 6/8 and 10/12 are not detected.

Rhymes with previous verses
         1 2 3 4 5 6 7 8 9 0 1 2 3 
Verse 1: . . . . . . . . . . . . . 
Verse 2: 0 . . . . . . . . . . . . 
Verse 3: 2 0 . . . . . . . . . . . 
Verse 4: 1 0 1 . . . . . . . . . . 
Verse 5: 0 0 0 0 . . . . . . . . . 
Verse 6: 0 0 0 0 0 . . . . . . . . 
Verse 7: 0 0 0 0 2 0 . . . . . . . 
Verse 8: 0 0 0 0 1 0 1 . . . . . . 
Verse 9: 1 0 1 1 0 0 0 0 . . . . . 
Verse 0: 0 1 0 0 0 0 0 0 0 . . . . 
Verse 1: 1 0 1 1 0 0 0 0 2 0 . . . 
Verse 2: 0 0 0 0 0 0 0 0 0 0 0 . . 
Verse 3: 0 0 0 0 0 0 0 0 0 0 0 0 . 
Verse 4: 0 0 0 0 0 0 0 0 0 0 0 0 2 


[['ey', 'ey'],
 ['aht', 'ah'],
 ['ey', 'ey'],
 ['eyt', 'ey'],
 ['aynz', 'ay'],
 ['ihmd', 'ih'],
 ['aynz', 'ay'],
 ['ayd', 'ay'],
 ['eyd', 'ey'],
 ['ahst', 'ah'],
 ['eyd', 'ey'],
 ['erz', 'er'],
 ['iy', 'iy'],
 ['iy', 'iy']]

In [14]:
# A third test on a poem by W. E. Henley

poem_WEH = """Out of the night that covers me,
Black as the pit from pole to pole,
I thank whatever gods may be
For my unconquerable soul.
In the fell clutch of circumstance
I have not winced nor cried aloud.
Under the bludgeonings of chance
My head is bloody, but unbowed.
Beyond this place of wrath and tears
Looms but the Horror of the shade,
And yet the menace of the years
Finds and shall find me unafraid.
It matters not how strait the gate,
How charged with punishments the scroll,
I am the master of my fate :
I am the captain of my soul."""

In [15]:
print_all_rhymes(poem_WEH, word2rhymes)

Rhymes with previous verses
         1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 
Verse 1: . . . . . . . . . . . . . . . 
Verse 2: 0 . . . . . . . . . . . . . . 
Verse 3: 2 0 . . . . . . . . . . . . . 
Verse 4: 0 2 0 . . . . . . . . . . . . 
Verse 5: 0 0 0 0 . . . . . . . . . . . 
Verse 6: 0 0 0 0 0 . . . . . . . . . . 
Verse 7: 0 0 0 0 2 0 . . . . . . . . . 
Verse 8: 0 1 0 1 0 0 0 . . . . . . . . 
Verse 9: 0 0 0 0 0 0 0 0 . . . . . . . 
Verse 0: 0 0 0 0 0 0 0 0 0 . . . . . . 
Verse 1: 0 0 0 0 0 0 0 0 0 0 . . . . . 
Verse 2: 0 0 0 0 0 0 0 0 0 2 0 . . . . 
Verse 3: 0 0 0 0 0 0 0 0 0 1 0 1 . . . 
Verse 4: 0 2 0 2 0 0 0 1 0 0 0 0 0 . . 
Verse 5: 0 0 0 0 0 0 0 0 0 1 0 1 2 0 . 
Verse 6: 0 2 0 2 0 0 0 1 0 0 0 0 0 2 0 


[['iy', 'iy'],
 ['owl', 'ow'],
 ['iy', 'iy'],
 ['owl', 'ow'],
 ['aens', 'ae'],
 ['awd', 'aw'],
 ['aens', 'ae'],
 ['owd', 'ow'],
 ['ehrz', 'eh'],
 ['eyd', 'ey'],
 ['ihrz', 'ih'],
 ['eyd', 'ey'],
 ['eyt', 'ey'],
 ['owl', 'ow'],
 ['eyt', 'ey'],
 ['owl', 'ow']]
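
The lists returned by `print_all_rhymes()` can also be used to count the rhymes in a poem. The sketch below is one possible way of doing so: it counts the pairs of verses with a perfect rhyme and those with an assonant rhyme only. The function `count_rhyming_pairs` is hypothetical, and the counting scheme used in the paper may differ.

```python
from itertools import combinations

def count_rhyming_pairs(rhymes):
    """
    rhymes: list of [perfect rhyme, assonant rhyme] pairs, one per verse,
    as returned by print_all_rhymes().
    Returns (number of perfect pairs, number of assonant-only pairs).
    """
    perfect = assonant = 0
    for rh1, rh2 in combinations(rhymes, 2):  # each unordered pair once
        if rh1[0] == rh2[0]:
            perfect += 1
        elif rh1[1] == rh2[1]:
            assonant += 1
    return perfect, assonant

# Rhymes of the Emily Dickinson poem, copied from the output of cell 11.
rhymes_ED = [['iy', 'iy'], ['iy', 'iy'], ['iy', 'iy'], ['uw', 'uw'], ['uw', 'uw']]
print(count_rhyming_pairs(rhymes_ED))  # (4, 0)
```

Using `combinations` visits each of the n*(n-1)/2 pairs exactly once, and the rhymes are computed only once per verse.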