# A rhyme helper

Reference book: http://www.nltk.org/book/

Tool: CMU Pronouncing Dictionary (loaded through NLTK's `cmudict` corpus)
Goal: to find words that form perfect rhymes with a given word
* The stressed vowel sound in both words must be identical, as well as any subsequent sounds. For example, "sky" and "high"; "skylight" and "highlight".
* The articulation that precedes the vowel in the two words must differ. For example, "bean" and "green" form a perfect rhyme, as do "own" and "bone", while "leave" and "believe" do not. (Note that this condition excludes pairs like "grief" and "brief", which share the preceding articulation, the "r", but which can be considered a perfect rhyme nonetheless. We conveniently ignore them here!)
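
The two conditions above can be sketched as a check on a single pair of pronunciations. This is only an illustration, not the notebook's implementation: the helper names `stressed_vowel_position` and `is_perfect_rhyme` are hypothetical, and the ARPAbet pronunciations are written by hand on the assumption that they match the CMU entries.

```python
def stressed_vowel_position(pron):
    """Return the index of the first phone with primary stress (digit 1), or None."""
    for i, phone in enumerate(pron):
        if phone.endswith('1'):
            return i
    return None


def is_perfect_rhyme(pron_a, pron_b):
    """Apply the two conditions from the list above to a pair of pronunciations."""
    i = stressed_vowel_position(pron_a)
    j = stressed_vowel_position(pron_b)
    if i is None or j is None:
        return False
    # Condition 1: the stressed vowel and everything after it must be identical.
    if pron_a[i:] != pron_b[j:]:
        return False
    # Condition 2: the sound just before the stressed vowel must differ
    # (None stands for "the word starts with the stressed vowel").
    before_a = pron_a[i - 1] if i > 0 else None
    before_b = pron_b[j - 1] if j > 0 else None
    return before_a != before_b


# Hand-written pronunciations (assumed to match the CMU dictionary)
bean = ['B', 'IY1', 'N']
green = ['G', 'R', 'IY1', 'N']
leave = ['L', 'IY1', 'V']
believe = ['B', 'IH0', 'L', 'IY1', 'V']
grief = ['G', 'R', 'IY1', 'F']
brief = ['B', 'R', 'IY1', 'F']

print(is_perfect_rhyme(bean, green))     # True
print(is_perfect_rhyme(leave, believe))  # False: both have "L" before the vowel
print(is_perfect_rhyme(grief, brief))    # False: both have "R" before the vowel
```

Note that the last pair is rejected only because of the second condition, which is exactly the case the note above chooses to ignore.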


Structure: a function that takes a word (as a string) and the CMU entries as its two arguments, and returns a list or a set of all words that form a perfect rhyme pair with it. Format it as `get_perfect_rhymes(word, pronunciation_entries)`, or use any other name you see fit.

In [1]:
# Load the CMU Pronouncing Dictionary
import nltk
nltk.data.path.append('/var/jupyterhubdata/courses/LIN340H5/data/nltk_data')
pronunciation_entries = nltk.corpus.cmudict.entries()

# pronunciation_entries is a list of (word, pron) tuples, where pron is a list of phones
# len(pronunciation_entries): 133737

In [2]:
def stress(pron):
    '''
    Return the stress digits of a pronunciation, one per vowel phone.
    Adapted from chapter 2 of the NLTK book:
    the phones contain digits to represent primary stress (1), secondary stress (2) and no stress (0).
    '''
    return [char for phone in pron for char in phone if char.isdigit()]
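
As a quick illustration of what `stress` returns, the sketch below applies it to a hand-written pronunciation of "believe" (assumed to match the CMU entry). The function is repeated so the snippet runs on its own.

```python
def stress(pron):
    # Collect every digit found in the phones, in order
    return [char for phone in pron for char in phone if char.isdigit()]


# Hand-written ARPAbet pronunciation (assumed to match the CMU dictionary)
believe = ['B', 'IH0', 'L', 'IY1', 'V']

pattern = stress(believe)
print(pattern)             # ['0', '1']
print(pattern.index('1'))  # 1, i.e. the second vowel carries primary stress
```

Only vowel phones carry a digit, so the position of `'1'` in the result tells us which vowel of the word is stressed, not which phone.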

In [12]:
def get_perfect_rhymes(input_words, pronunciation):
    # syllable: first listed pronunciation of input_words, taken from the entries passed in
    syllable = [pron for word, pron in pronunciation if word == input_words][0]
    # vowel: every ARPAbet vowel phone starts with one of these letters
    vowel = ['A', 'E', 'I', 'O', 'U']

    # last_vowel_index: position of each vowel phone in syllable
    # (enumerate keeps positions correct when the same phone occurs twice)
    last_vowel_index = [i for i, pronc in enumerate(syllable) if pronc[0] in vowel]
    # stress_vowel: stress digits of the pronunciation, one per vowel
    stress_vowel = stress(syllable)
    # index: position of the primary-stressed vowel, counted back from the end
    index = last_vowel_index[stress_vowel.index('1')] - len(syllable)

    # rhyme: words whose pronunciation ends in the same sounds, excluding input_words itself
    rhyme = [word for word, pron in pronunciation
             if pron[index:] == syllable[index:] and word != input_words]

    return rhyme
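
To make the negative `index` easier to follow, here is the same arithmetic worked through for "starch". The pronunciations are written by hand (assumed to match the CMU entries), and the variable names `vowel_positions`, `stress_digits` and `tail` are chosen only for this illustration.

```python
# Hand-written ARPAbet pronunciations (assumed to match the CMU dictionary)
syllable = ['S', 'T', 'AA1', 'R', 'CH']  # "starch"
march = ['M', 'AA1', 'R', 'CH']          # "march"

# Positions of the vowel phones and their stress digits
vowel_positions = [i for i, phone in enumerate(syllable) if phone[0] in 'AEIOU']
stress_digits = [ch for phone in syllable for ch in phone if ch.isdigit()]

# The stressed vowel sits at position 2 of 5, i.e. 3 phones from the end
index = vowel_positions[stress_digits.index('1')] - len(syllable)
tail = syllable[index:]

print(vowel_positions, stress_digits)  # [2] ['1']
print(index, tail)                     # -3 ['AA1', 'R', 'CH']
print(march[index:] == tail)           # True
```

Counting from the end is what lets the same `index` be applied to pronunciations of any length: only the final sounds are compared.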

In [3]:
'''
Open question: should the syllable before the stressed vowel be considered too?
'''


# def get_perfect_rhymes(input_words,pronunciation):
#     syllable = [pron for word, pron in pronunciation_entries if word == input_words][0]
#     vowel = ['A','E','I','O','U']
    
#     last_vowel_index = [syllable.index(pronc) for pronc in syllable for read in vowel if read == pronc[0]]
#     stress_vowel = stress([pron for word, pron in pronunciation_entries if word == input_words][0])
#     index = last_vowel_index[stress_vowel.index('1')]-len(syllable)
#     previous_index = last_vowel_index[stress_vowel.index('1')-1]-len(syllable)
    
#     rhyme = [word for word, pron in pronunciation_entries if pron[index:] == syllable[index:] and input_words not in word and all(elem in pron[index:previous_index] for elem in syllable) == True]
    
#     return rhyme

# Test cases

In [13]:
test_case_starch = {'arch', 'bartsch', 'demarche', 'karcz', 'larch', 'march', 'parch', 'partch'}
print(test_case_starch)
print(set(get_perfect_rhymes('starch', pronunciation_entries)))
assert set(get_perfect_rhymes('starch', pronunciation_entries)) == test_case_starch

{'arch', 'bartsch', 'karcz', 'parch', 'larch', 'demarche', 'partch', 'march'}
{'arch', 'bartsch', 'karcz', 'parch', 'larch', 'demarche', 'partch', 'march'}


In [16]:
test_case_yarn = {'arn', 'arne', 'arnn', 'barn', 'carn', 'carne', 'darn', 'dezarn', 'garn', 'harn', 'harne', 'karn', 'mccarn', 'starn', 'varn'}
print(test_case_yarn)
print(set(get_perfect_rhymes('yarn', pronunciation_entries)))
assert set(get_perfect_rhymes('yarn', pronunciation_entries)) == test_case_yarn

{'arn', 'harne', 'garn', 'arnn', 'dezarn', 'carn', 'mccarn', 'varn', 'karn', 'barn', 'carne', 'starn', 'darn', 'arne', 'harn'}
{'arn', 'harne', 'garn', 'arnn', 'dezarn', 'carn', 'mccarn', 'varn', 'karn', 'barn', 'carne', 'starn', 'darn', 'arne', 'harn'}
