# Unnecessarily Loose Reverse Syllabic Word Pairs
Depending on the company, here's a fun party game: find words in which swapping the first and last syllables produces a new word. For example, swapping the syllables of *treaty* gives you *tea tree*. At first you can likely think of a few good ones, but it quickly becomes difficult to venture beyond two-syllable words and compound nouns. Python to the rescue! I've processed the 134,000-word Carnegie Mellon University (CMU) Pronouncing Dictionary for any words whose beginning and end can be swapped to create a new word. In this set, there are XXX such words (%)!
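
Strictly speaking, the notebook below moves the *last* syllable to the front rather than exchanging the first and last syllables. For two-syllable words like *treaty* the two operations are identical, but for longer words they differ. Here is a minimal sketch on hand-syllabified spellings. The syllable splits and the function names are illustrative assumptions, not dictionary data:

```python
def swap_first_last(syllables: list[str]) -> list[str]:
    """Exchange the first and last syllables, leaving any middle ones in place."""
    if len(syllables) < 2:
        return list(syllables)
    return [syllables[-1], *syllables[1:-1], syllables[0]]


def last_to_front(syllables: list[str]) -> list[str]:
    """Rotate the final syllable to the front (the operation used in this notebook)."""
    if len(syllables) < 2:
        return list(syllables)
    return [syllables[-1], *syllables[:-1]]


# Two syllables: both operations agree
print(swap_first_last(["trea", "ty"]), last_to_front(["trea", "ty"]))
# Three syllables: they diverge
print(swap_first_last(["for", "est", "ry"]), last_to_front(["for", "est", "ry"]))
```

The rotation is what makes a pair like FORESTRY/REFOREST (in the results at the end) possible. A true first/last swap would give *ry-est-for* instead.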

## Loading and prepping the data

In [1]:
import pandas as pd
import numpy as np

In [2]:
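# Word and pronunciation are separated by two spaces; keep_default_na=False stops
# pandas from reading words such as NA or NULL as missing values
data_f = pd.read_csv("phonetic_dict.txt", sep="  ", header=None, engine="python",
                     keep_default_na=False)
data_f.columns = ["word", "pronunciation"]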

In [3]:
data_f.head(10)

|   | word | pronunciation |
|--:|:-----|:--------------|
| 0 | AABERG | AA1 B ER0 G |
| 1 | AACHEN | AA1 K AH0 N |
| 2 | AACHENER | AA1 K AH0 N ER0 |
| 3 | AAH | AA1 |
| 4 | AAKER | AA1 K ER0 |
| 5 | AALIYAH | AA2 L IY1 AA2 |
| 6 | AALSETH | AA1 L S EH0 TH |
| 7 | AAMODT | AA1 M AH0 T |
| 8 | AANCOR | AA1 N K AO2 R |
| 9 | AARDEMA | AA0 R D EH1 M AH0 |
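
For reference, each dictionary line holds a word, two spaces, and then its ARPAbet phonemes. Alternate pronunciations are marked `WORD(1)`, `WORD(2)`, and so on, and some releases of the file open with comment lines that start with `;;;`. Below is a minimal plain-Python parser that assumes that layout. `parse_cmudict` is a hypothetical helper, and the sample lines are illustrative:

```python
def parse_cmudict(lines):
    """Parse CMU-dictionary-style lines into (word, variant, phonemes) tuples.

    Assumes 'WORD  PH1 PH2 ...' with two spaces after the word, an optional
    '(n)' variant suffix on the word, and ';;;' comment lines.
    """
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, pron = line.split("  ", 1)
        variant = 0
        # Split off a trailing "(n)" variant marker, if present
        base, sep, rest = word.rpartition("(")
        if sep and rest.endswith(")") and rest[:-1].isdigit():
            word, variant = base, int(rest[:-1])
        entries.append((word, variant, pron.split()))
    return entries


sample = [
    ";;; illustrative comment line",
    "AABERG  AA1 B ER0 G",
    "ABT(1)  EY1 B IY1 T IY1",
]
for entry in parse_cmudict(sample):
    print(entry)
```

Storing the variant number in its own field avoids having to strip `(1)` from the words later.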


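To find matches, we'll move the last syllable of each pronunciation to the front. The dictionary writes pronunciations in ARPAbet. Every vowel phoneme ends in a stress digit (0, 1 or 2), while consonants are letters only, so `str.isalpha()` tells the two apart. Scanning backwards, the "syllable" is the last vowel, any consonants after it, and at most one consonant before it.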

In [4]:
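def last_syllable_to_front(word):
    """Move the last (loosely defined) syllable of a list of phonemes to the front.

    The syllable is the final vowel, any consonants after it, and at most one
    consonant before it. Returns the result as a space-separated string.
    """
    first_vowel_cluster = True  # still searching for the final vowel?
    last_index = len(word) - 1
    
    while last_index > 0:
        # ARPAbet vowels end in a stress digit, so isalpha() is True only for consonants
        consonant = word[last_index].isalpha()
        if consonant:
            if not first_vowel_cluster:  # only allow one consonant phoneme before the final vowel
                break
            else:
                # skip the consonants that follow the final vowel
                while consonant and last_index > 0:
                    last_index -= 1
                    consonant = word[last_index].isalpha()
        else:  # vowel
            if first_vowel_cluster:
                last_index -= 1
                first_vowel_cluster = False
            else:
                # two vowels in a row: the syllable starts at the later one
                last_index += 1
                break
        
    x = word[last_index:]
    x.extend(word[:last_index])
    return " ".join(x)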
    
data_b = data_f.copy()
data_b.pronunciation = data_b.pronunciation.str.split().apply(last_syllable_to_front)
data_b.head(10)

|   | word | pronunciation |
|--:|:-----|:--------------|
| 0 | AABERG | B ER0 G AA1 |
| 1 | AACHEN | K AH0 N AA1 |
| 2 | AACHENER | N ER0 AA1 K AH0 |
| 3 | AAH | AA1 |
| 4 | AAKER | K ER0 AA1 |
| 5 | AALIYAH | AA2 AA2 L IY1 |
| 6 | AALSETH | S EH0 TH AA1 L |
| 7 | AAMODT | M AH0 T AA1 |
| 8 | AANCOR | K AO2 R AA1 N |
| 9 | AARDEMA | M AH0 AA0 R D EH1 |
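
The backward scan above can be written more compactly by locating the vowels first (a vowel is any phoneme ending in a stress digit). This minimal sketch uses hypothetical names `is_vowel` and `rotate_last_syllable` and reproduces the ten rows shown above. It deliberately differs from the loop in two edge cases. First, it leaves any word with fewer than two vowels alone, so a one-syllable word with an initial consonant cluster such as `S T R IY1` stays put, whereas the loop above rotates it to `R IY1 S T`. Second, it rotates a word whose only two vowels are its first two phonemes. The loop leaves that word unchanged because it never inspects index 0.

```python
def is_vowel(phoneme: str) -> bool:
    # Assumes ARPAbet notation: vowels end in a stress digit (0, 1 or 2)
    return phoneme[-1].isdigit()


def rotate_last_syllable(phonemes: list[str]) -> list[str]:
    """Move the final (loosely defined) syllable to the front.

    The final syllable is the last vowel, everything after it, and at most
    one consonant before it.
    """
    vowel_positions = [i for i, p in enumerate(phonemes) if is_vowel(p)]
    if len(vowel_positions) < 2:
        return list(phonemes)  # zero or one vowel: nothing to move
    last = vowel_positions[-1]
    # Claim one onset consonant unless the previous phoneme is itself a vowel
    start = last if is_vowel(phonemes[last - 1]) else last - 1
    return phonemes[start:] + phonemes[:start]


for pron in ["AA1 B ER0 G", "AA2 L IY1 AA2", "AA0 R D EH1 M AH0", "AA1"]:
    print(pron, "->", " ".join(rotate_last_syllable(pron.split())))
```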


## Finding the matches
We now have two dataframes: `data_f`, with each pronunciation in its normal syllable order, and `data_b`, with the last syllable moved to the front. Let's see which pronunciations appear in both:

In [5]:
in_both = data_f.pronunciation.isin(data_b.pronunciation)
both = data_f[in_both]
both

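|   | word | pronunciation |
|--:|:-----|:--------------|
| 3 | AAH | AA1 |
| 12 | AARGH | AA1 R G |
| 22 | AASE | AA1 S |
| 24 | AB | AE1 B |
| 90 | ABBS | AE1 B Z |
| ... | ... | ... |
| 134242 | ZURN | Z ER1 N |
| 134244 | ZURVE | Z ER1 V |
| 134268 | ZYCH | Z AY1 CH |
| 134278 | ZYSK | Z IH1 S K |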


Many words in the dictionary have only one syllable. Moving a word's only syllable to the front leaves it unchanged, so of course they appear in both dataframes!
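
Here is a toy illustration of what `isin` is matching. The pronunciations are made-up values, and the rotated column follows the loose rule above by hand. A one-syllable pronunciation is its own rotation, so it always matches. A genuine pair matches because each word's pronunciation is the other's rotation:

```python
import pandas as pd

# Toy pronunciations (illustrative values, not rows read from the dictionary)
forward = pd.Series(["AA1", "EY1 S IY1", "S IY1 EY1", "AA1 B ER0 G"],
                    index=["one syllable", "word A", "word B", "no partner"])
# The same pronunciations with the last syllable moved to the front
rotated = pd.Series(["AA1", "S IY1 EY1", "EY1 S IY1", "B ER0 G AA1"],
                    index=forward.index)

# isin asks: does this pronunciation occur anywhere in the rotated column?
print(forward.isin(rotated))
```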

Let's keep only the words with more than one syllable, reusing the `last_syllable_to_front` logic:

In [6]:
def more_than_one_syllable(word):
    """Return True if the input word (list of phonemes) has more than one syllable.

    Reuses last_syllable_to_front: if moving the last syllable to the front
    leaves the word unchanged, it has only one syllable.
    """
    # Caveat: a one-syllable word with an initial consonant cluster (e.g. S T R IY1)
    # is still rotated by last_syllable_to_front, so it also passes this test
    return last_syllable_to_front(word) != " ".join(word)

both_filtered = both[both.pronunciation.str.split().apply(more_than_one_syllable)]
both_filtered.columns = ["word_b", "pronunciation"]
both_filtered

|   | word_b | pronunciation |
|--:|:-------|:--------------|
| 412 | ABT(1) | EY1 B IY1 T IY1 |
| 447 | AC | EY1 S IY1 |
| 686 | ACHEE | AH0 CH IY1 |
| 1037 | ADEE | AH0 D IY1 |
| 1055 | ADELPHI | AH0 D EH1 L F IY0 |
| ... | ... | ... |
| 130098 | WHEDON | W EH1 D AH0 N |
| 130155 | WHELAN | W EH1 L AH0 N |
| 131745 | WIZEN | W AY1 Z AH0 N |
| 133753 | ZELMA | Z EH1 L M AH0 |
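
Another way to detect multi-syllable words is to count vowel phonemes, since each syllable has exactly one vowel nucleus. The notebook does not use this method. The sketch below uses hypothetical helper names and agrees with the rotation-based filter on the rows shown above. However, it is stricter for a one-syllable word that begins with a consonant cluster. `last_syllable_to_front` turns `S T R IY1` into `R IY1 S T`, so `more_than_one_syllable` counts it as multi-syllable, while a vowel count of one does not.

```python
def syllable_count(phonemes: list[str]) -> int:
    # Assumes ARPAbet notation: each vowel phoneme ends in a stress digit,
    # and each syllable contains exactly one vowel
    return sum(p[-1].isdigit() for p in phonemes)


def has_multiple_syllables(pronunciation: str) -> bool:
    return syllable_count(pronunciation.split()) > 1


for pron in ["AA1", "Z IH1 S K", "S T R IY1", "EY1 S IY1", "W EH1 L AH0 N"]:
    print(f"{pron!r}: {has_multiple_syllables(pron)}")
```

To use it in the filter, one could write `both[both.pronunciation.apply(has_multiple_syllables)]`.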


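Now let's find each word's partner, meaning the word whose rotated pronunciation matches it. Then we'll remove duplicate pairs (when two words are each other's rotation, the merge finds the pair once from each side) and sort the pairs by length, longest first: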

In [7]:
pairs = both_filtered.merge(data_b, on="pronunciation", how="left")
pairs.head(5)

|   | word_b | pronunciation | word |
|--:|:-------|:--------------|:-----|
| 0 | ABT(1) | EY1 B IY1 T IY1 | BTA |
| 1 | AC | EY1 S IY1 | CA(1) |
| 2 | ACHEE | AH0 CH IY1 | CHIA |
| 3 | ADEE | AH0 D IY1 | DHIA |
| 4 | ADEE | AH0 D IY1 | DIA |
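
The deduplication in the next cell relies on a row-wise sort. Sorting the two names alphabetically within each row makes (A, B) and (B, A) identical rows, which `drop_duplicates` can then collapse. Here is a toy version of that step, built from two pairs that appear in the final results:

```python
import numpy as np
import pandas as pd

# Toy pairs: one pair found in both directions, plus a distinct pair
pairs = pd.DataFrame({"word_b": ["FAREWELL", "WELFARE", "BORNSTEIN"],
                      "word": ["WELFARE", "FAREWELL", "STEINBORN"]})

# Sort the two words within each row so both orderings become the same row...
canonical = pd.DataFrame(np.sort(pairs.to_numpy(), axis=1), columns=pairs.columns)
# ...then drop the exact duplicates this creates (the first occurrence is kept)
deduped = canonical.drop_duplicates()
print(deduped)
```

After the sort, the column names no longer say which word came from which dataframe. They only hold the alphabetically first and second word of each pair.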


In [8]:
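# Strip the "(1)" marker of alternate pronunciations; regex=False matches it literally
# (pandas >= 2.0 defaults to regex=False, so the old pattern "\(1\)" would match nothing)
pairs.word_b = pairs.word_b.str.replace("(1)", "", regex=False)
pairs.word = pairs.word.str.replace("(1)", "", regex=False)
pairs = pairs.drop(columns="pronunciation")
# Sort the two words within each row so (A, B) and (B, A) become identical, then dedupe
pairs = pd.DataFrame(np.sort(pairs.values, axis=1), columns=pairs.columns).drop_duplicates()
# Longest "A, B" strings first
sorted(pairs.word_b + ", " + pairs.word, key=len, reverse=True)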

['BORNSTEIN, STEINBORN',
 'CLAREMONT, MONTCLAIR',
 'CLERMONT, MONTCLAIR',
 'TOUCHTON, UNTOUCHED',
 'BEHOLDER, HOLDERBY',
 'LIQUIDE, WAIDELICH',
 'FORESTRY, REFOREST',
 'BERNAY, NEIGHBOUR',
 'BOUNDARY, REBOUND',
 'BURCHAM, SCHAMBER',
 "CHIEFFO, O'KEEFFE",
 'COSTLOW, LOW-COST',
 'FAREWELL, WELFARE',
 'LIQUIDE, WEIDLICH',
 'ELLEMANN, MANELLA',
 "MONROE'S, ROSEMAN",
 'EVERMORE, MOREVER',
 'PERREAULT, ROEPER',
 'ANDRIES, REASONED',
 'REQUEST, WESTRICH',
 'REQUEST, WESTRICK',
 'ADDRESSED, RESTED',
 'ADELPHI, DELPHIA',
 'BARABAR, BARBERA',
 'BERNAY, NEIGHBOR',
 'BONDAR, DARBONNE',
 'CELLPRO, PROCELL',
 "CHIEFFO, O'KEEFE",
 'CHIEFFO, OKEEFFE',
 'ALLCORN, CORNALL',
 'DHAHARAN, RHONDA',
 'ALLPHIN, FINNELL',
 'JERMAINE, MANGER',
 'KEMPTON, UNKEMPT',
 'LHEUREUX, OEHLER',
 'EHRLICH, LIQUEUR',
 'MELANIE, NIEMELA',
 'ALLNUTT, NUTTALL',
 'PERREAULT, ROPER',
 'ALDRIC, RECALLED',
 'OTTOSEN, SAINATO',
 'ASTAIRE, TERRACE',
 'BOUSKA, CABOOSE',
 'BOWDEN, UNBOWED',
 'AMBURN, BURNHAM',
 'CHIEFFO, OKEEFE',
 'C