# Playing with the dictionary to help in Wordle

[Wordle](https://www.powerlanguage.co.uk/wordle/) is a fun little game.

Besides being fun, it offers an opportunity to play a bit with Python and Jupyter, both of which I am trying to learn.

First, let's load a word list from the computer's dictionary:

In [2]:
with open("/usr/share/dict/words", "r") as f:  # the file is closed when the block ends
    word_list = f.readlines()
word_list

['A\n',
 'a\n',
 'aa\n',
 'aal\n',
 'aalii\n',
 'aam\n',
 'Aani\n',
 'aardvark\n',
 'aardwolf\n',
 'Aaron\n',
 'Aaronic\n',
 'Aaronical\n',
 'Aaronite\n',
 'Aaronitic\n',
 'Aaru\n',
 'Ab\n',
 'aba\n',
 'Ababdeh\n',
 'Ababua\n',
 'abac\n',
 'abaca\n',
 'abacate\n',
 'abacay\n',
 'abacinate\n',
 'abacination\n',
 'abaciscus\n',
 'abacist\n',
 'aback\n',
 'abactinal\n',
 'abactinally\n',
 'abaction\n',
 'abactor\n',
 'abaculus\n',
 'abacus\n',
 'Abadite\n',
 'abaff\n',
 'abaft\n',
 'abaisance\n',
 'abaiser\n',
 'abaissed\n',
 'abalienate\n',
 'abalienation\n',
 'abalone\n',
 'Abama\n',
 'abampere\n',
 'abandon\n',
 'abandonable\n',
 'abandoned\n',
 'abandonedly\n',
 'abandonee\n',
 'abandoner\n',
 'abandonment\n',
 'Abanic\n',
 'Abantes\n',
 'abaptiston\n',
 'Abarambo\n',
 'Abaris\n',
 'abarthrosis\n',
 'abarticular\n',
 'abarticulation\n',
 'abas\n',
 'abase\n',
 'abased\n',
 'abasedly\n',
 'abasedness\n',
 'abasement\n',
 'abaser\n',
 'Abasgi\n',
 'abash\n',
 'abashed\n',
 'abashedly\n'

Then let's clean up the word list a bit and keep only the words with 5 letters.

In [4]:
words = [x.strip().lower() for x in word_list if len(x.strip()) == 5 ]
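
Lowercasing can map two different dictionary entries (for example a proper noun and a common noun) to the same string, so the filtered list may contain repeats. The following is a minimal sketch of one way to drop them. It assumes a small hypothetical `sample_lines` list in place of the real dictionary file, and the names `five` and `unique_five` are only for illustration.

```python
# Hypothetical sample standing in for the lines read from the dictionary file.
sample_lines = ["Ariel\n", "ariel\n", "abase\n", "aback\n", "abandon\n", "aa\n"]

# Same filter as in the cell above: strip the newline, keep 5-letter words, lowercase.
five = [x.strip().lower() for x in sample_lines if len(x.strip()) == 5]

# dict.fromkeys keeps the first occurrence of each word and preserves the order.
unique_five = list(dict.fromkeys(five))

print(five)         # 'ariel' appears twice
print(unique_five)  # 'ariel' appears once
```

Whether to deduplicate is a judgment call: keeping the repeats gives the letters of those words a little extra weight in the counts.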

Count the letter appearances:

In [5]:
freq = {}
for w in words:
    for c in w:
        if c in freq:
            freq[c] += 1
        else:
            freq[c] = 1
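
The counting loop above can also be sketched with `collections.Counter` from the standard library, which builds the same kind of letter-to-count mapping in one call. The `sample_words` list below is a hypothetical stand-in for `words`.

```python
from collections import Counter

# Hypothetical sample; in the notebook this would be the full list of 5-letter words.
sample_words = ["arose", "raise", "banal"]

# Join all words into one string and count every character in it.
letter_counts = Counter("".join(sample_words))

print(letter_counts["a"])           # 'a' occurs once in 'arose', once in 'raise', twice in 'banal'
print(sum(letter_counts.values()))  # total number of letters counted
```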

And then compute their relative frequency (this step is optional, since dividing every count by the same total does not change the ranking).

In [6]:
total = sum(freq.values())
for k, v in freq.items():
    freq[k] = v / total

I now define a function to compute the score of a word. A word scores higher when the unique letters it contains appear more frequently across the five-letter dictionary entries.

In [7]:
def score(w):
    s = 0
    for letter in set(w):
        s += freq[letter]
    return s
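
To see why the score uses the set of letters, here is a small worked sketch. The table `toy_freq` holds made-up values (not the frequencies computed from the dictionary), and `toy_score` is a hypothetical variant of the function above that takes the table as a parameter.

```python
# Hypothetical toy frequencies, chosen only to make the arithmetic easy to follow.
toy_freq = {"a": 0.25, "e": 0.25, "s": 0.125, "r": 0.125, "l": 0.125, "b": 0.125}

def toy_score(w, table):
    # set(w) drops repeated letters, so each distinct letter is counted once.
    return sum(table[letter] for letter in set(w))

print(toy_score("laser", toy_freq))  # five distinct letters contribute
print(toy_score("abase", toy_freq))  # 'a' repeats, so only four letters contribute
```

A word with a repeated letter gives up one slot, which is why words with five distinct common letters end up at the top of the ranking.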

In [8]:
sc = [ (w, score(w)) for w in words] # score each element of the list
sc.sort(key=lambda x:x[1], reverse=True) # sort the list according to score
sc

[('arose', 0.399960899315738),
 ('oreas', 0.399960899315738),
 ('aries', 0.39990224828934506),
 ('arise', 0.39990224828934506),
 ('raise', 0.39990224828934506),
 ('serai', 0.39990224828934506),
 ('leora', 0.3982209188660802),
 ('ariel', 0.3981622678396872),
 ('ariel', 0.3981622678396872),
 ('erian', 0.39790811339198434),
 ('irena', 0.39790811339198434),
 ('reina', 0.39790811339198434),
 ('orate', 0.396989247311828),
 ('arite', 0.39693059628543503),
 ('artie', 0.39693059628543503),
 ('irate', 0.39693059628543503),
 ('retia', 0.39693059628543503),
 ('tarie', 0.39693059628543503),
 ('arles', 0.38981427174975564),
 ('arsle', 0.38981427174975564),
 ('laser', 0.38981427174975564),
 ('seral', 0.38981427174975564),
 ('slare', 0.38981427174975564),
 ('anser', 0.38956011730205276),
 ('nares', 0.38956011730205276),
 ('rasen', 0.38956011730205276),
 ('snare', 0.38956011730205276),
 ('aster', 0.38858260019550345),
 ('serta', 0.38858260019550345),
 ('stare', 0.38858260019550345),
 ('strae', 0.388582

According to this metric, the best word to start the game with should be the first one. Since the list is sorted, we take the first element (extracted by [0]) and then the first component of the pair (the second [0]), since we do not really care about the score itself.

In [9]:
sc[0][0]

'arose'

'arose' is a good candidate. I would have expected 'arise' to rank higher. Furthermore, I like starting words that end with 's' (to find out whether the word we are looking for is a plural). However, so far we did not tell the score function that we like words that end with 's', so that is my fault really.
So my personal favourite starting word would be 'aries'. The program gets pretty close nonetheless.
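
One way to tell the score function about that preference is to add a small bonus for words ending in 's'. This is only a sketch: the function name `score_with_bonus`, the `bonus` value and the `toy_freq` table are all assumptions made up for illustration, not values from the dictionary.

```python
# Hypothetical toy frequencies; the real ones come from the dictionary counts.
toy_freq = {"a": 0.5, "r": 0.25, "o": 0.125, "s": 0.125, "e": 0.125, "i": 0.0625}

def score_with_bonus(w, table, bonus=0.125):
    # Base score: sum of the frequencies of the distinct letters.
    base = sum(table[letter] for letter in set(w))
    # Assumed preference: reward words whose last letter is 's'.
    return base + (bonus if w.endswith("s") else 0.0)

candidates = ["arose", "arise", "aries"]
ranked = sorted(candidates, key=lambda w: score_with_bonus(w, toy_freq), reverse=True)
print(ranked)
```

With these toy numbers 'aries' moves ahead of 'arose'. How large the bonus should be is a matter of taste.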

Here I ran out of steam a bit, but I still defined what a "not bad" word is: one that contains none of the letters we know do not appear in the solution.

The bad letters, however, are just a constant that one has to type in by hand.

In [128]:
def not_bad(w, bad):
    for c in w:
        if c in bad: return False
    return True


fsc = [w for (w, _) in sc if not_bad(w, "roseticmyd")]
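
The same test can be sketched with set operations: a word is not bad when it shares no letter with the bad ones. The function name `no_bad_letters` and the `candidates` list are hypothetical and only serve the example.

```python
def no_bad_letters(w, bad):
    # True when the word and the bad letters have nothing in common.
    return set(bad).isdisjoint(w)

# Hypothetical sample; in the notebook this would be the scored word list.
candidates = ["banal", "arose", "ganja"]
kept = [w for w in candidates if no_bad_letters(w, "roseticmyd")]
print(kept)  # 'arose' is dropped because it contains bad letters
```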

Then I define a function to match words against letters in known positions. The spec is a five-character string with dots as placeholders for unknown letters and the known letters in their positions.

In [None]:
def match_spec(w, sp):
    # Check every position of the spec, including the last one.
    for i in range(len(sp)):
        if sp[i] != "." and w[i] != sp[i]:
            return False
    return True
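
An equivalent check can be sketched with `zip` and `all`, pairing each letter of the word with the character of the spec at the same position. The name `matches` is hypothetical, and the extra length check is an assumption added so that a spec of the wrong length never matches.

```python
def matches(w, sp):
    # A '.' in the spec accepts any letter; any other character must be equal.
    return len(w) == len(sp) and all(s == "." or s == c for c, s in zip(w, sp))

print(matches("banal", ".an.."))  # second and third letters agree with the spec
print(matches("banal", ".an.k"))  # the last letter differs
```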


With that we can find the words that are not bad and that match the spec. We get them sorted by score. The dictionary contains many weird words that Wordle does not accept, so you should try the less weird-looking words first.

In [143]:
[w for w in fsc if match_spec(w, ".an..")]

['banal',
 'fanal',
 'lanaz',
 'banga',
 'kanap',
 'panak',
 'janua',
 'banak',
 'kanga',
 'banff',
 'wanga',
 'panax',
 'ganza',
 'hanna',
 'ganja',
 'panna',
 'banba',
 'ganga',
 'ganga',
 'nanga']