<h1>What does this program do?</h1>

Turn a selection of words into a puzzle.

<h2>What is the puzzle?</h2>

One aspect of every word changes, and the reader must either figure out what changed, or what the original text said.

<h2>What is changed about the words?</h2>

There are three possible changes for each word: spelling, pronounciation, and definition.

<h3>Definition</h3>

The program looks the word up in an online dictionary, and replaces it with a word offset from the original by some amount. For example, the 10th word after it, or the 6th word before it.

<h3>Spelling & Pronounciation</h3>

Since this puzzle is entirely text-based, changes to pronounciation must actually affect its spelling. Therefore, changes to spelling and pronounciation both change spelling, but in different ways.

A word consists of three aspects: appearance, sound, and meaning. Its meaning is abstract, applied to a word by historical use and present societal use and understanding. As for appearance and sound, a graphical representation of an English word consists of letters, which then represent sounds. Individual sounds are called __phonemes__, and the letters or groups of letters that represent phonemes are called __graphemes__.

Phonemes in English have a plethora of irregular spellings, some only appearing in a single word (and its derivatives). That is what my program will play on. When changing the _spelling_ of a word, it will pick a random phoneme and its associated grapheme in the word, and change the grapheme to another valid representation of that same phoneme, for example changing 'thaw' to 'tho', where the 'aw' sound is spelled like the 'o' in 'bog'. However, when changing a _pronounciation_, it selects a random phoneme in the word and replaces it with another phoneme, and pick a valid grapheme.

<h4>How does the program know what phonemes are?</h4>
It will look up the word on Wiktionary, which has the IPA representation of the word. Each IPA symbol is a phoneme, and each phoneme is exactly one symbol.

<h4>How will Python recognize IPA symbols?</h4>
Hopefully I can use Unicode. Otherwise, I don't know

<h4>What about English vowels? How will you represent their complexity?</h4>


<h1>IDEA:</h1>
instead of directly changing the phoneme/grapheme, the program picks an English dialect to start with, and another to change it to. It then changes the spelling of the word to match how someone speaking the first dialect would spell the pronounciation of someone in the second dialect.

10/15

Step 1: write code that can understand phonemes and break words down into them

Step 2: use some base text to derive all possible spellings of each phoneme

In [1]:
with open('cmuDict.txt', 'r', encoding='utf-8') as f:
    rawDict = f.read()

cmu = rawDict.split('\n')[56:]
print(cmu[0])

!EXCLAMATION-POINT  EH2 K S K L AH0 M EY1 SH AH0 N P OY2 N T


Problem: though I now have a phonetic spelling of most English words, it cannot tell me which grapheme each sound belongs to. Some graphemes consist of multiple letters, and others represent multiple distinct phonemes

Solution?: first, assume that each letter is its own grapheme. if correct, then move on. otherwise, for the graphemes that do not cleanly match, try sticking letters together until it matches.

In [2]:
testWord = cmu[0].strip('! ')
word, pron = testWord.split(maxsplit=1)
word = word.strip()
pron = pron.strip()
print(word, '\n', pron)
phonemes_old = pron.split()


EXCLAMATION-POINT 
 EH2 K S K L AH0 M EY1 SH AH0 N P OY2 N T


10/17

in the ARPAbet, some symbols represent vowel sounds, and others consonants. there is always at least one vowel per word. so, words can be broken up into chunks with exactly one vowel sound each.

In [3]:
useful_words = {}
for w in cmu:
    try:
        t, p = w.split(maxsplit=1)
        useful_words[t] = p
    except (IndexError, ValueError) as e:
        print(w)
        print(e)

In [4]:
useful_words = {w.split(maxsplit=1)[0]: w.split(maxsplit=1)[1] for w in cmu}

In [5]:
useful_words['LIAISON']

'L IY0 EY1 Z AA2 N'

In [6]:
testWord2 = cmu[100].strip('! ')
word, pron = testWord2.split(maxsplit=1)
word = word.strip()
pron = pron.strip()
print(word, '\n', pron)
phonemes = pron.split()

AARONSON(1) 
 AA1 R AH0 N S AH0 N


In [7]:
print(type(phonemes[0]))

<class 'str'>


In [8]:
vowels = [    # General American English specifically
    'AA',     # balm, bot
    'AE',     # bat
    'AH',     # butt
    'AO',     # stOry      this gave me a heart attack when Wikipedia gave another example as 'cAUGHt' which is not at all the same sound to me
    'AW',     # bout
    'AX',     # commA (schwa)
    'AY',     # bite
    'EH',     # bet
    'ER',     # bIRd, forewORd
    'EY',     # bait
    'IH',     # bit
    'IX',     # rosEs, rabbIt
    'IY',     # beat
    'OW',     # boat
    'OY',     # boy
    'UH',     # book
    'UW'      # boot
]
# source: https://en.wikipedia.org/wiki/ARPABET
consonants = [
    'B',      # buy
    'CH',     # China
    'D',      # die
    'DH',     # thy
    'DX',     # buTTer
    'EL',     # bottLE
    'EM',     # rhythM
    'EN',     # buttON
    'F',      # fight
    'G',      # guy
    'HH',     # High
    'JH',     # jive
    'K',      # kite
    'L',      # lie
    'M',      # my
    'N',      # nigh
    'NG',     # siNG
    'P',      # pie
    'Q',      # uh-oh (glottal stop)
    'R',      # rye
    'S',      # sigh
    'SH',     # shy
    'T',      # tie
    'TH',     # thigh
    'V',      # vie
    'W',      # wise
    'WH',     # why (for fancy people)
    'Y',      # yacht
    'Z',      # zoo
    'ZH'      # pleaSure
]

In [65]:
class coolWord:
    def __init__(self, spelling, pronounciation):
        symbols = "~!@#$%^&*()-_=+[]{}\\|;:\'\",<.>/?1234567890"
        self.word = spelling.strip(symbols).lower()
        ps = pronounciation.split()
        ps2 = [p.strip(symbols) for p in ps]
        self.p = ' '.join(ps2)        
        self.x = False
        self.xx = False
        self.xIndex = 0
        if 'x' in self.word:
            self.x = True
            if 'xx' in self.word:
                self.xx = True
                self.s = self.word.replace('xx', 'ks')
            else:
                self.xIndex = self.word.index('x')
                if self.xIndex == 0:
                    self.s = self.word.replace('x', 'z')
                elif self.xIndex == len(self.word) - 1 and ps2[-1] == 'OW':
                    self.s = self.word.replace('x', '')
                elif self.word[self.xIndex - 1] in 'aeiou' and self.xIndex != len(self.word) - 1 and self.word[self.xIndex + 1] in 'aeiou':
                    self.s = self.word.replace('x', 'gz')
                else:
                    self.s = self.word.replace('x', 'ks')
        else:
            self.s = self.word
    def __str__(self):
        return self.word
    def __repr__(self):
        return self.word
    def atomize(self, v, c):
        current_letters = []
        current_sounds = []
        final = {}
        vows = []
        cons = []
        letter_counter = 0
        while letter_counter < len(self.s):
            current_letter = letter_counter
            while letter_counter < len(self.s) and self.s[letter_counter] in "qwrtypsdfghjklzxcvbnm":
                current_letters.append(self.s[letter_counter])
                letter_counter += 1
            final[current_letter] = ''.join(current_letters)
            current_letters = []
            if letter_counter >= len(self.s):
                break
            current_letter = letter_counter
            while letter_counter < len(self.s) and self.s[letter_counter] in "aeiou":
                current_letters.append(self.s[letter_counter])
                letter_counter += 1
            final[current_letter] = ''.join(current_letters)
            current_letters = []
        self.c = cons
        self.v = vows
        self.f = final
#        for i in self.p:
#            if i in v:
#                pass
#            elif i in c:
#                current_sounds.append(i)
#                self.c[letter_counter].append(current_letters)
#                letter_counter += 1
#            else:
#                print("weirdo ->", i)

In [12]:
wordObjs = []
for line in cmu:
    wordObjs.append(coolWord(*line.split(maxsplit=1)))

In [13]:
print(wordObjs[0])

exclamation-point


In [14]:
wordObjs2 = {line.split(maxsplit=1)[0]: coolWord(*line.split(maxsplit=1)) for line in cmu}

In [16]:
def getPhoneStructure(word):
    phones = []
    for phone in word.p.split():
        if phone in vowels:
            phones.append('V')
        else:
            phones.append('C')
    return ''.join(phones)

In [26]:
spellings = {
    'AA': ['o'],
    'AE': ['a'],
    'AH': ['u'],
    'AO': ['o'],
    'AW': ['ou'],
    'AX': ['a'],
    'AXR': ['er'],
    'AY': ['i'],
    'EH': ['e'],
    'ER': ['ir'],
    'EY': ['ai'],
    'IH': ['i'],
    'IX': ['e'],
    'IY': ['ea'],
    'OW': ['oa'],
    'OY': ['oy'],
    'UH': ['oo'],
    'UW': ['oo'],
    'UX': ['u'],
    'B': ['b'],
    'CH': ['ch'],
    'D': ['d'],
    'DX': ['tt'],
    'EL': ['le'],
    'EM': ['m'],
    'EN': ['on'],
    'F': ['f'],
    'G': ['g'],
    'H': ['h'],
    'HH': ['h'],
    'JH': ['j'],
    'K': ['k'],
    'L': ['l'],
    'M': ['m'],
    'N': ['n'],
    'NX': ['ng'],
    'NG': ['ng'],
    'P': ['p'],
    'Q': ['-'],
    'R': ['r'],
    'S': ['s'],
    'SH': ['sh'],
    'T': ['t'],
    'TH': ['th'],
    'V': ['v'],
    'W': ['w'],
    'WH': ['wh'],
    'Y': ['y'],
    'Z': ['z'],
    'ZH': ['s']
}

In [30]:
wordObjs2 = {word: wordObjs2[word] for word in wordObjs2 if len(wordObjs2[word].p.split()) < 2 * len(word)}

In [31]:
threeLetter = [wordObjs2[word] for word in wordObjs2 if len(wordObjs2[word].s) == 3]

In [33]:
twoLetter = [wordObjs2[word] for word in wordObjs2 if len(wordObjs2[word].s) == 2]

In [None]:
def permute(word, graphs):
    for i in range(len(word)):
        

In [66]:
baseSpellings = {
    'AA': ['o'],
    'AE': ['a'],
    'AH': ['u', 'o'],
    'AO': ['o'],
    'AW': ['ou'],
    'AX': ['a'],
    'AXR': ['er'],
    'AY': ['i'],
    'EH': ['e'],
    'ER': ['ir'],
    'EY': ['ai'],
    'IH': ['i'],
    'IX': ['e'],
    'IY': ['ea'],
    'OW': ['oa'],
    'OY': ['oy'],
    'UH': ['oo'],
    'UW': ['oo'],
    'UX': ['u'],
    'B': ['b'],
    'CH': ['ch'],
    'D': ['d'],
    'DX': ['tt'],
    'EL': ['le'],
    'EM': ['m'],
    'EN': ['on'],
    'F': ['f'],
    'G': ['g'],
    'H': ['h'],
    'HH': ['h'],
    'JH': ['j'],
    'K': ['k'],
    'L': ['l'],
    'M': ['m'],
    'N': ['n'],
    'NX': ['ng'],
    'NG': ['ng'],
    'P': ['p'],
    'Q': ['-'],
    'R': ['r'],
    'S': ['s'],
    'SH': ['sh'],
    'T': ['t'],
    'TH': ['th'],
    'V': ['v'],
    'W': ['w'],
    'WH': ['wh'],
    'Y': ['y'],
    'Z': ['z'],
    'ZH': ['s']
}

In [42]:
spellings = {
    'AA': ['o'],
    'AE': ['a'],
    'AH': ['u'],
    'AO': ['o'],
    'AW': ['ou'],
    'AX': ['a'],
    'AXR': ['er'],
    'AY': ['i'],
    'EH': ['e'],
    'ER': ['ir'],
    'EY': ['ai'],
    'IH': ['i'],
    'IX': ['e'],
    'IY': ['ea'],
    'OW': ['oa'],
    'OY': ['oy'],
    'UH': ['oo'],
    'UW': ['oo'],
    'UX': ['u'],
    'B': ['b'],
    'CH': ['ch'],
    'D': ['d'],
    'DX': ['tt'],
    'EL': ['le'],
    'EM': ['m'],
    'EN': ['on'],
    'F': ['f'],
    'G': ['g'],
    'H': ['h'],
    'HH': ['h'],
    'JH': ['j'],
    'K': ['k'],
    'L': ['l'],
    'M': ['m'],
    'N': ['n'],
    'NX': ['ng'],
    'NG': ['ng'],
    'P': ['p'],
    'Q': ['-'],
    'R': ['r'],
    'S': ['s'],
    'SH': ['sh'],
    'T': ['t'],
    'TH': ['th'],
    'V': ['v'],
    'W': ['w'],
    'WH': ['wh'],
    'Y': ['y'],
    'Z': ['z'],
    'ZH': ['s']
}

In [34]:
len(twoLetter)

233

In [35]:
for word in twoLetter:
    news = spell(word)
    print(f"{word.s.lower()}: {word.p}: {news}")
    if word.s.lower() != news:
        break

em: AH M: ['u', 'm']


In [36]:
print(spell(wordObjs2['BY']))

['b', 'i']


In [37]:
def spell2(word):
    phones = word.p.split()
    spelling = []
    
    while ''.join(spelling) != word.s:
        spelling = []
        for p in phones:
            pass
            #if ''.join(spelling + spellings[p][0]

In [38]:
print(len(''.join([])))

0


In [40]:
from time import sleep
from copy import copy

def sideSpell(word, phones, spellings, polarity):
    currentSpell = []
    for p in phones:
        oldLen = len(currentSpell)
        for s in spellings[p]:
            newSpell = copy(currentSpell)
            if not polarity:
                newSpell.append(s)
                if word.s.startswith(''.join(newSpell)):
                    currentSpell.append(s)
                    break
            else:
                newSpell.insert(0, s)
                if word.s.endswith(''.join(newSpell)):
                    currentSpell.insert(0, s)
                    break                
        if len(currentSpell) == oldLen:
            break
    return currentSpell

def spell3(word, spellings):
    word.s = word.s.lower()
    if not word.s.isalnum():
        print("bad characters", word.s)
        return False
    print(word)
    phones = word.p.split()
    if len(word.s) < len(phones):
        print("too long!", word.s, phones)
        return False
    if word.s[:-2] == 'le' and phones[:-2] == ['AH', 'L'] and word.s[-3] not in ['aeiou']:
        phones.append(phones.pop[-2])
        print("special case: L")
    rphones = copy(phones)
    rphones.reverse()
    print("phones", phones, rphones)
    frontSpell = sideSpell(word, phones, spellings, 0)
    backSpell = sideSpell(word, rphones, spellings, 1)
    print("front & back", frontSpell, backSpell)
    if '' in frontSpell:
        frontSpell.remove('')
    if '' in backSpell:
        backSpell.remove('')
    if frontSpell == backSpell and ''.join(frontSpell) == word.s:
        print('yay!')
        return True
    missingG = word.s.removeprefix(''.join(frontSpell)).removesuffix(''.join(backSpell))
    if not missingG:
        missingG = backSpell[0]
    print("missing grapheme", missingG)
    if len(frontSpell + backSpell) > len(phones):
            #print(word, frontSpell, len(frontSpell))
            #print(backSpell, len(backSpell))
            #print(phones)
            print("too long")
            del backSpell[0]
    if len(frontSpell + backSpell) < len(phones):
        try:
            missingP = phones[len(frontSpell)]
            print("missing phoneme (clean)", missingP)
            spellings[missingP].append(missingG)
        except IndexError as e:
            print(word, frontSpell, len(frontSpell))
            print(backSpell, len(backSpell))
            print(phones)
            raise e
    else:
        missingP = phones[len(frontSpell) - 1]
        print("missing phoneme (overlap)", missingP)
        try:
            spellings[missingP].append(''.join((frontSpell[-1], missingG) if frontSpell else (missingG, backSpell[0])))
            print(''.join((frontSpell[-1], missingG) if frontSpell else (missingG, backSpell[0])))
        except IndexError as e:
            print(word, frontSpell, len(frontSpell))
            print(backSpell, len(backSpell))
            print(phones)
            print(missingP)
            raise e

def learn(spellings, words):
    for word in words:
        spell3(word, spellings)
        print(word, word.p)
        break
    return spellings

In [41]:
def clean(aDict):
    for key in aDict:
        while '' in aDict[key]:
            aDict[key].remove('')

In [43]:
from random import choice

spell3(choice(twoLetter), spellings)
clean(spellings)

um
phones ['AH', 'M'] ['M', 'AH']
front & back ['u', 'm'] ['u', 'm']
yay!


In [44]:
spellings

{'AA': ['o'],
 'AE': ['a'],
 'AH': ['u'],
 'AO': ['o'],
 'AW': ['ou'],
 'AX': ['a'],
 'AXR': ['er'],
 'AY': ['i'],
 'EH': ['e'],
 'ER': ['ir'],
 'EY': ['ai'],
 'IH': ['i'],
 'IX': ['e'],
 'IY': ['ea'],
 'OW': ['oa'],
 'OY': ['oy'],
 'UH': ['oo'],
 'UW': ['oo'],
 'UX': ['u'],
 'B': ['b'],
 'CH': ['ch'],
 'D': ['d'],
 'DX': ['tt'],
 'EL': ['le'],
 'EM': ['m'],
 'EN': ['on'],
 'F': ['f'],
 'G': ['g'],
 'H': ['h'],
 'HH': ['h'],
 'JH': ['j'],
 'K': ['k'],
 'L': ['l'],
 'M': ['m'],
 'N': ['n'],
 'NX': ['ng'],
 'NG': ['ng'],
 'P': ['p'],
 'Q': ['-'],
 'R': ['r'],
 'S': ['s'],
 'SH': ['sh'],
 'T': ['t'],
 'TH': ['th'],
 'V': ['v'],
 'W': ['w'],
 'WH': ['wh'],
 'Y': ['y'],
 'Z': ['z'],
 'ZH': ['s']}

In [45]:
learn(spellings, twoLetter)

em
phones ['AH', 'M'] ['M', 'AH']
front & back [] ['m']
missing grapheme e
missing phoneme (clean) AH
em AH M


{'AA': ['o'],
 'AE': ['a'],
 'AH': ['u', 'e'],
 'AO': ['o'],
 'AW': ['ou'],
 'AX': ['a'],
 'AXR': ['er'],
 'AY': ['i'],
 'EH': ['e'],
 'ER': ['ir'],
 'EY': ['ai'],
 'IH': ['i'],
 'IX': ['e'],
 'IY': ['ea'],
 'OW': ['oa'],
 'OY': ['oy'],
 'UH': ['oo'],
 'UW': ['oo'],
 'UX': ['u'],
 'B': ['b'],
 'CH': ['ch'],
 'D': ['d'],
 'DX': ['tt'],
 'EL': ['le'],
 'EM': ['m'],
 'EN': ['on'],
 'F': ['f'],
 'G': ['g'],
 'H': ['h'],
 'HH': ['h'],
 'JH': ['j'],
 'K': ['k'],
 'L': ['l'],
 'M': ['m'],
 'N': ['n'],
 'NX': ['ng'],
 'NG': ['ng'],
 'P': ['p'],
 'Q': ['-'],
 'R': ['r'],
 'S': ['s'],
 'SH': ['sh'],
 'T': ['t'],
 'TH': ['th'],
 'V': ['v'],
 'W': ['w'],
 'WH': ['wh'],
 'Y': ['y'],
 'Z': ['z'],
 'ZH': ['s']}

In [46]:
for i in twoLetter:
    print(i, i.p)

em AH M
aa EY EY
ab AE B
ac EY S IY
ad AE D
ae EY
ag AE G
ag EY G IY
ah AA
ai AY
ai EY AY
al AE L
al AE L
al AE L AH B AE M AH
am AE M
am EY EH M
an AE N
an AH N
ap EY P IY
ar AA R
as AE Z
as EH Z
at AE T
au OW
aux OW
av EY V IY
aw AO
ay EY
ay AY
ba B IY EY
ba B AA
be B IY
be B IY
bi B AY
bo B OW
by B AY
ca K AH
ca S IY EY
ca K AA
ce S IY IY
co K OW
co K OW
co K AH P AH N IY
co S IY OW T UW
cy S AY
da D AA
da D IY EY
de D IY
de D EY
de D AH
di D IY
di D AY
do D UW
dr D AA K T ER
dr D R AY V
dr D AA K T ER
du D UW
du D AH
eb EH B
ed EH D
ee IY IY
eh EH
ek EH K
ek IY K EY
el EH L
em EH M
en EH N
er ER
es EH S
et EH T
ev EH V
fe F EY
fi F AY
fi F IY
fu F UW
ga G AA
ga JH IY EY
ga JH AO R JH AH
go G OW
gu G UW
ha HH AA
he HH IY
hi HH AY
hm HH AH M
ho HH OW
hu HH UW
hy HH AY
ia IY AH
ib IH B
ib AY B IY
id IH D
id AY D IY
if IH F
il IH L
im IH M
in IH N
in IH N
in IH N
in IH N CH
io AY OW
ip AY P IY
ip IH P
is IH Z
is IH Z
it IH T
it IH T
ja Y AA
je JH IY
ji JH IY
jo JH OW
jr JH UW N ER
ju J

10/29

So. It does mostly work.

But.

The presence of abbreviations in the dataset means that it contains non-phonetic transcriptions. For example, one pronouncation of "AL" is the full phonetic spelling of Alabama.

(also "MR" sounds like "mister" according to the dictionary)

Currently, the code just refuses to analyze words with more phonemes than letters. This assumes that the only letter in English that can represent two distinct phones in succession is 'x', and that my attempt to account for x works. That seems to hold, though I haven't tested it on any words with more than three letters yet.

I did fail to account for another edge case though: the """word""" 'ng'. According to this version of the CMUPD, 'ng' can be pronounced 'IH NG'. This totally bypasses my attempt to check edge cases: the word is two letters long, so it doesn't have more phones than letters.

LIGHTBULB MOMENT: it has no vowel letter, which means (for our purposes) it isn't a valid pronouncable English word! So I just need to find a reasonably efficient way of checking if a vowel is in a word and build that check into my code!

I also plan to add a method of tracking examples of grapheme usage, which should help with explaining the solutions to its puzzles.

_oh right i'm building a puzzle generator_

NOTE: this fix does not account for """words""" like 'aa' -- pronounced /EY EY/ --, so my code learned that 'aa' is a valid spelling of /EY/. This actually does appear in fringe cases, like Baal, so maybe this is fine. Will add further comment if any other bad examples happen.

Actually, """words""" like 'er' pronounced /ER/ represent an intended example of the above result. 

_at least I think I intended it_

Well I did find another problem word, but nothing to do with the previous comment. 'of' posed a problem. My code does not start off knowing that 'o' can be pronounced /AH/, or that 'f' can be /v/. So it can't differentiate between both new spellings, and learns that /AH/ can be spelled 'of'. I'll try just adding that pronunciation of 'o' in at the beginning, but obviously this isn't ideal, nor is it consistent. Depending on what order the dictionary of words happens to be in, it'll learn spellings at different times, potentially causing this kind of mistake for different words each time. I am also unsure at this stage whether this is an unlikely occurrence, or if I'll need to build in more knowledge beforehand, which I wanted to minimize.

Also I realized I forgot to mention another issue from earlier. After running my code through every two-letter word and learning each new spelling, the lists of spellings for each phoneme were cluttered with empty strings. As a solution, I just added a function to remove those, but I need to figure out why that happens in the first place.

In [61]:
## from time import sleep
from copy import copy

def sideSpell(word, phones, spells, polarity):
    currentSpell = []
    for p in phones:
        oldLen = len(currentSpell)
        for s in spells[p]:
            newSpell = copy(currentSpell)
            if not polarity:
                newSpell.append(s.g)
                if word.s.startswith(''.join([a if not isinstance(a, graphExample) else a.g for a in newSpell])):
                    currentSpell.append(s)
                    break
            else:
                newSpell.insert(0, s.g)
                if word.s.endswith(''.join([a if not isinstance(a, graphExample) else a.g for a in newSpell])):
                    currentSpell.insert(0, s)
                    break                
        if len(currentSpell) == oldLen:
            break
    return currentSpell

def spell3(word, spellings, exLen=4, debug=False):
    word.s = word.s.lower()
    if not word.s.isalnum():
        if debug:
            print("bad characters", word.s)
        return False
    if debug:
        print(word)
    phones = word.p.split()
    if len(word.s) < len(phones):
        if debug:
            print("too long!", word.s, phones)
        return False
    if not set(word.s) & set('aeiouy'):
        if debug:
            print("no vowels -> not a word!", word.s, phones)
        return False
    if word.s[:-2] == 'le' and phones[:-2] == ['AH', 'L'] and word.s[-3] not in ['aeiou']:
        phones.append(phones.pop[-2])
        if debug:
            print("special case: L")
    rphones = copy(phones)
    rphones.reverse()
    if debug:
        print("phones", phones, rphones)
    frontSpell = sideSpell(word, phones, spellings, 0)
    frontStr = [a.g for a in frontSpell]
    backSpell = sideSpell(word, rphones, spellings, 1)
    backStr = [a.g for a in backSpell]
    if debug:
        print("front & back", frontSpell, backSpell)
    if '' in frontSpell:
        frontSpell.remove('')
    if '' in backSpell:
        backSpell.remove('')
    if frontSpell == backSpell and ''.join(frontStr) == word.s:
        if debug:
            print('yay!')
            print(word, frontSpell, len(frontSpell))
            print(backSpell, len(backSpell))
            print(phones)
        exPhone = choice(phones)
        g = frontSpell[phones.index(exPhone)]
        p = spellings[exPhone]
        exGraph = p[p.index(g)]
        if len(word.s) == exLen and not exGraph.set:
            exGraph.setExample(word)
        return True
    missingG = graphExample(word.s.removeprefix(''.join(frontStr)).removesuffix(''.join(backStr)), isX=word.x)
    #if not missingG.g:
    #    missingG = backSpell[0]
    if debug:
        print("missing grapheme", missingG)
    if len(frontSpell + backSpell) > len(phones):
            #print(word, frontSpell, len(frontSpell))
            #print(backSpell, len(backSpell))
            #print(phones)
            if debug:
                print("too long")
            del backSpell[0]
    if len(frontSpell + backSpell) < len(phones):
        try:
            missingP = phones[len(frontSpell)]
            if debug:
                print("missing phoneme (clean)", missingP)
            if len(word.s) == exLen and not missingG.set:
                missingG.setExample(word)        
            spellings[missingP].append(missingG)    
        except IndexError as e:
            if debug:
                print(word, frontSpell, len(frontSpell))
                print(backSpell, len(backSpell))
                print(phones)
            raise e
    else:
        missingP = phones[len(frontSpell) - 1]
        if debug:
            print("missing phoneme (overlap)", missingP)
        try:
            newG = graphExample(''.join((frontStr[-1], missingG.g) if frontStr else (missingG.g, backStr[0])), isX = missingG.g.count('x') or (frontStr[-1].count('x') if frontStr else backStr[0].count('x')))
            if len(word.s) == exLen and not newG.set:
                newG.setExample(word)  
            spellings[missingP].append(newG)
            if debug:
                print(newG)
        except IndexError as e:
            if debug:
                print(word, frontSpell, len(frontSpell))
                print(backSpell, len(backSpell))
                print(phones)
                print(missingP)
            raise e

def learn3(spellings, words, exLen=4, debug=False):
    newSpells = {s: copy(spellings[s]) for s in spellings}
    for word in words:
        spell3(word, newSpells, exLen=exLen, debug=debug)
        if debug:
            print(word, word.p)
        #break
    return newSpells

In [48]:
print(bool(set('geoff') & set('aeiou')))

True


In [49]:
baseSpellings = {
    'AA': ['o'],
    'AE': ['a'],
    'AH': ['u', 'o'],
    'AO': ['o'],
    'AW': ['ou'],
    'AX': ['a'],
    'AXR': ['er'],
    'AY': ['i'],
    'EH': ['e'],
    'ER': ['ir'],
    'EY': ['ai'],
    'IH': ['i'],
    'IX': ['e'],
    'IY': ['ea'],
    'OW': ['oa'],
    'OY': ['oy'],
    'UH': ['oo'],
    'UW': ['oo'],
    'UX': ['u'],
    'B': ['b'],
    'CH': ['ch'],
    'D': ['d'],
    'DX': ['tt'],
    'EL': ['le'],
    'EM': ['m'],
    'EN': ['on'],
    'F': ['f'],
    'G': ['g'],
    'H': ['h'],
    'HH': ['h'],
    'JH': ['j'],
    'K': ['k'],
    'L': ['l'],
    'M': ['m'],
    'N': ['n'],
    'NX': ['ng'],
    'NG': ['ng'],
    'P': ['p'],
    'Q': ['-'],
    'R': ['r'],
    'S': ['s'],
    'SH': ['sh'],
    'T': ['t'],
    'TH': ['th'],
    'V': ['v'],
    'W': ['w'],
    'WH': ['wh'],
    'Y': ['y'],
    'Z': ['z'],
    'ZH': ['s']
}

In [50]:
def initSpellings(plain):
    ans = {p: [graphExample(a, isX=bool(a.count('x'))) for a in plain[p]] for p in plain}
    return ans

In [53]:
coolSpellings = initSpellings(baseSpellings)

In [52]:
class graphExample:
    def __init__(self, grapheme, isX=False):
        self.g = grapheme
        self.set = False
        self.isx = isX
        self.x = 'x' if isX else None

    def __str__(self):
        try:
            return f"'{self.g if not self.isx else self.x}' in '{self.w}'"
        except AttributeError as e:
            return self.g if not self.isx else self.x

    def __repr__(self):
        return self.__str__()

    def __eq__(self, other):
        if isinstance(other, str):
            return self.g == other
        elif isinstance(other, graphExample):
            return self.g == other.g
        return False

    def __hash__(self):
        return self.g.__hash__() if not self.isx else self.x.__hash__()

    def setExample(self, ex):
        self.w = ex
        self.set = True

In [54]:
test1 = learn3(coolSpellings, twoLetter, exLen=2)

In [55]:
test1

{'AA': ['o' in 'og', 'ah' in 'ah', 'a' in 'ar'],
 'AE': ['a' in 'ad'],
 'AH': ['u' in 'um', 'o' in 'of', 'e' in 'em', 'a' in 'an', 'uh' in 'uh'],
 'AO': ['o' in 'om', 'aw' in 'aw'],
 'AW': [ou, 'ow' in 'ow'],
 'AX': [a],
 'AXR': [er],
 'AY': ['i' in 'fi', 'ai' in 'ai', 'ay' in 'ay', 'y' in 'by'],
 'EH': ['e' in 'eb', 'a' in 'as', 'eh' in 'eh'],
 'ER': [ir, 'er' in 'er', 'or' in 'or', 'ur' in 'ur'],
 'EY': [ai,
  'aa' in 'aa',
  'ae' in 'ae',
  'ai' in 'ai',
  'ay' in 'ay',
  'e' in 'de',
  'y' in 'wy'],
 'IH': ['i' in 'if', 'o' in 'to'],
 'IX': [e],
 'IY': [ea, 'e' in 'be', 'i' in 'di', 'y' in 'uy'],
 'OW': [oa, 'au' in 'au', 'o' in 'bo', 'oh' in 'oh', 'ow' in 'ow'],
 'OY': ['oy' in 'oy', 'oi' in 'oi'],
 'UH': [oo],
 'UW': [oo, 'o' in 'do', 'u' in 'du', 'o' in 'ou'],
 'UX': [u],
 'B': ['b' in 'ab'],
 'CH': [ch],
 'D': ['d' in 'da'],
 'DX': [tt],
 'EL': [le],
 'EM': [m],
 'EN': [on],
 'F': ['f' in 'fe'],
 'G': ['g' in 'ag'],
 'H': [h],
 'HH': ['h' in 'ha'],
 'JH': ['j' in 'ji'],
 'K': [

In [56]:
def validate(spell1, spell2):
    differences = {}
    for phone in spell1:
        setDiffs = set(spell1[phone]) ^ set(spell2[phone])
        if setDiffs:
            differences[phone] = [spell1[phone], spell2[phone]]
    if differences:
        return differences

In [57]:
from random import sample

def get_mistakes(baseSpells, words, completeSpells, exLen=4):
    generation = 0
    while generation < 100:
        testSpells = learn3(baseSpells, sample(words, len(words)), exLen=exLen)
        check = validate(completeSpells, testSpells)
        if check:
            return generation, check
        generation += 1
    return 'nope'

In [58]:
coolSpellings

{'AA': ['o' in 'og'],
 'AE': ['a' in 'ad'],
 'AH': ['u' in 'um', 'o' in 'of'],
 'AO': ['o' in 'om'],
 'AW': [ou],
 'AX': [a],
 'AXR': [er],
 'AY': ['i' in 'fi'],
 'EH': ['e' in 'eb'],
 'ER': [ir],
 'EY': [ai],
 'IH': ['i' in 'if'],
 'IX': [e],
 'IY': [ea],
 'OW': [oa],
 'OY': ['oy' in 'oy'],
 'UH': [oo],
 'UW': [oo],
 'UX': [u],
 'B': ['b' in 'ab'],
 'CH': [ch],
 'D': ['d' in 'da'],
 'DX': [tt],
 'EL': [le],
 'EM': [m],
 'EN': [on],
 'F': ['f' in 'fe'],
 'G': ['g' in 'ag'],
 'H': [h],
 'HH': ['h' in 'ha'],
 'JH': ['j' in 'ji'],
 'K': ['k' in 'ka'],
 'L': ['l' in 'al'],
 'M': ['m' in 'am'],
 'N': ['n' in 'an'],
 'NX': [ng],
 'NG': [ng],
 'P': ['p' in 'pa'],
 'Q': [-],
 'R': ['r' in 're'],
 'S': ['s' in 'es'],
 'SH': [sh],
 'T': ['t' in 'at'],
 'TH': [th],
 'V': ['v' in 'vi'],
 'W': [w],
 'WH': [wh],
 'Y': ['y' in 'ya'],
 'Z': ['z' in 'oz'],
 'ZH': [s]}

In [59]:
coolSpellings = initSpellings(baseSpellings)

In [63]:
test2 = learn3(test1, threeLetter, exLen=3, debug=True)

kay
phones ['K', 'EY'] ['EY', 'K']
front & back ['k' in 'ka', 'ay' in 'ay'] ['k' in 'ka', 'ay' in 'ay']
yay!
kay ['k' in 'ka', 'ay' in 'ay'] 2
['k' in 'ka', 'ay' in 'ay'] 2
['K', 'EY']
kay K EY
til
phones ['T', 'IH', 'L'] ['L', 'IH', 'T']
front & back ['t' in 'at', 'i' in 'if', 'l' in 'al'] ['t' in 'at', 'i' in 'if', 'l' in 'al']
yay!
til ['t' in 'at', 'i' in 'if', 'l' in 'al'] 3
['t' in 'at', 'i' in 'if', 'l' in 'al'] 3
['T', 'IH', 'L']
til T IH L
tis
phones ['T', 'IH', 'Z'] ['Z', 'IH', 'T']
front & back ['t' in 'at', 'i' in 'if', 's' in 'as'] ['t' in 'at', 'i' in 'if', 's' in 'as']
yay!
tis ['t' in 'at', 'i' in 'if', 's' in 'as'] 3
['t' in 'at', 'i' in 'if', 's' in 'as'] 3
['T', 'IH', 'Z']
tis T IH Z
dot
phones ['D', 'AA', 'T'] ['T', 'AA', 'D']
front & back ['d' in 'da', 'o' in 'og', 't' in 'at'] ['d' in 'da', 'o' in 'og', 't' in 'at']
yay!
dot ['d' in 'da', 'o' in 'og', 't' in 'at'] 3
['d' in 'da', 'o' in 'og', 't' in 'at'] 3
['D', 'AA', 'T']
dot D AA T
bad characters a's
a's EY Z
b

ValueError: 'ew' in 'dew' is not in list

Brainstorming potential solutions:

For issue one:
<ul>
    <li>completely ignore words which my program thinks is made of a single grapheme</li>
    <li>do a first pass, where those words are ignored. then allow monographic words. this assumes that a later word will contain the correct spellings to fix it</li>
</ul>
I like the second idea, but I'm not sure if it would actually work. I suppose I could just check whether the word actually contains only one phoneme, and only allow it in that case

Issue two:
<ul>
    <li>Figure out a way to incorporate silent letters into my program's understanding</li>
    <li>When my program thinks it's spelled a word correctly, check if it accounts for every phoneme. If it doesn't, force it to move on</li>
</ul>

I feel like the first idea goes against what I'm trying to do here. There's no consistent way to determine which letters are silent and which aren't, and my program assumes exactly one grapheme (excepting x) for every phoneme. As for the second idea, I don't know what it'll do if it 'moves on' and then fails to spell it. I need to work that out

Issue three: solve issue two

Important note: I don't know which letters of the word 'ewe' make which sounds. My opinion is that the first 'e' is the /J/ sound, and the remaining 'we' represents the ending /EW/. But I don't see one correct answer. And that means I don't have an objective way to measure my program's success. That sounds like a problem, but maybe it's cool that I, in a way, made something with its own ability to interpret language and develop its own opinions.

In [None]:
## from time import sleep
from copy import copy

def sideSpell4(word, phones, spells, polarity):
    currentSpell = []
    for p in phones:
        oldLen = len(currentSpell)
        for s in spells[p]:
            newSpell = copy(currentSpell)
            if not polarity:
                newSpell.append(s.g)
                if word.s.startswith(''.join([a if not isinstance(a, graphExample) else a.g for a in newSpell])):
                    currentSpell.append(s)
                    break
            else:
                newSpell.insert(0, s.g)
                if word.s.endswith(''.join([a if not isinstance(a, graphExample) else a.g for a in newSpell])):
                    currentSpell.insert(0, s)
                    break                
        if len(currentSpell) == oldLen:
            break
    return currentSpell

def spell4(word, spellings, exLen=4, debug=False):
    word.s = word.s.lower()
    if not word.s.isalnum():
        if debug:
            print("bad characters", word.s)
        return False
    if debug:
        print(word)
    phones = word.p.split()
    if len(word.s) < len(phones):
        if debug:
            print("too long!", word.s, phones)
        return False
    if not set(word.s) & set('aeiouy'):
        if debug:
            print("no vowels -> not a word!", word.s, phones)
        return False
    if word.s[:-2] == 'le' and phones[:-2] == ['AH', 'L'] and word.s[-3] not in ['aeiou']:
        phones.append(phones.pop[-2])
        if debug:
            print("special case: L")
    rphones = copy(phones)
    rphones.reverse()
    if debug:
        print("phones", phones, rphones)
    frontSpell = sideSpell4(word, phones, spellings, 0)
    frontStr = [a.g for a in frontSpell]
    backSpell = sideSpell4(word, rphones, spellings, 1)
    backStr = [a.g for a in backSpell]
    if debug:
        print("front & back", frontSpell, backSpell)
    if '' in frontSpell:
        frontSpell.remove('')
    if '' in backSpell:
        backSpell.remove('')
    if len(phones) > (len(frontSpell) + len(backSpell) + 1):
        if debug:
            print("failure to spell")
        return False
    if frontSpell == backSpell and ''.join(frontStr) == word.s:
        if debug:
            print('yay!')
            print(word, frontSpell, len(frontSpell))
            print(backSpell, len(backSpell))
            print(phones)
        exPhone = choice(phones)
        g = frontSpell[phones.index(exPhone)]
        p = spellings[exPhone]
        exGraph = p[p.index(g)]
        if len(word.s) == exLen and not exGraph.set:
            exGraph.setExample(word)
        return True
    missingG = graphExample(word.s.removeprefix(''.join(frontStr)).removesuffix(''.join(backStr)), isX=word.x)
    #if not missingG.g:
    #    missingG = backSpell[0]
    if debug:
        print("missing grapheme", missingG)
    if len(frontSpell + backSpell) > len(phones):
            #print(word, frontSpell, len(frontSpell))
            #print(backSpell, len(backSpell))
            #print(phones)
            if debug:
                print("too long")
            del backSpell[0]
    if len(frontSpell + backSpell) < len(phones):
        try:
            missingP = phones[len(frontSpell)]
            if debug:
                print("missing phoneme (clean)", missingP)
            if len(word.s) == exLen and not missingG.set:
                missingG.setExample(word)        
            spellings[missingP].append(missingG)    
        except IndexError as e:
            if debug:
                print(word, frontSpell, len(frontSpell))
                print(backSpell, len(backSpell))
                print(phones)
            raise e
    else:
        missingP = phones[len(frontSpell) - 1]
        if debug:
            print("missing phoneme (overlap)", missingP)
        try:
            newG = graphExample(''.join((frontStr[-1], missingG.g) if frontStr else (missingG.g, backStr[0])), isX = missingG.g.count('x') or (frontStr[-1].count('x') if frontStr else backStr[0].count('x')))
            if len(word.s) == exLen and not newG.set:
                newG.setExample(word)  
            spellings[missingP].append(newG)
            if debug:
                print(newG)
        except IndexError as e:
            if debug:
                print(word, frontSpell, len(frontSpell))
                print(backSpell, len(backSpell))
                print(phones)
                print(missingP)
            raise e

def learn3(spellings, words, exLen=4, debug=False):
    newSpells = {s: copy(spellings[s]) for s in spellings}
    for word in words:
        spell3(word, newSpells, exLen=exLen, debug=debug)
        if debug:
            print(word, word.p)
        #break
    return newSpells