# Wordle Solver

You can take a systematic approach to solving Wordle puzzles. This notebook is my attempt at an algorithm that solves them efficiently. The real aim is to find out which starting words and strategies work best.

First, I'll work on an algorithm that can solve the puzzle when it is given the answer. I may later adapt it to interact with the Wordle site and solve puzzles where the answer is truly unknown (in other words, an algorithm you could use to actually solve a puzzle).

<b>Goal for the first Minimum Viable Product of this analysis:</b>
- Write a function that takes a large list of 5-letter words and a list of starting words, and that
    - attempts to solve the Wordle for each possible 5-letter answer in 6 attempts or fewer
    - iterates over each starting word and each possible 5-letter answer
    - tracks the success rate and number of attempts for each starting word
- Visualize the success of each starting word
    - which words solved the puzzle quickest
    - which words had the highest success rate
- I may make multiple versions of the function that operate under different strategies and see which one has the best results on average
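
The evaluation loop described in the goals above could look roughly like the sketch below. It is not the notebook's final code: `count_attempts`, `evaluate_starts`, and `demo_words` are hypothetical names, the word list is a tiny stand-in, and the solver is a simplified pure-Python version of the letter filtering developed later in this notebook (it does not model Wordle's duplicate-letter rules).

```python
from statistics import mean


def count_attempts(answer, start, candidates, max_attempts=6):
    """Return how many guesses were needed, or None if the limit was hit."""
    pool = list(candidates)
    guess = start
    for attempt in range(1, max_attempts + 1):
        if guess == answer:
            return attempt
        # Keep only the words consistent with what this guess revealed.
        for pos, letter in enumerate(guess):
            if letter == answer[pos]:      # right letter, right place
                pool = [w for w in pool if w[pos] == letter]
            elif letter not in answer:     # letter is not in the answer
                pool = [w for w in pool if letter not in w]
            else:                          # right letter, wrong place
                pool = [w for w in pool if letter in w and w[pos] != letter]
        if not pool:                       # e.g. the answer is not in the list
            return None
        guess = pool[0]                    # assumption: try the first remaining word
    return None


def evaluate_starts(starts, answers):
    """Track success rate and mean attempts for each starting word."""
    results = {}
    for start in starts:
        attempts = [count_attempts(a, start, answers) for a in answers]
        solved = [n for n in attempts if n is not None]
        results[start] = {
            "success_rate": len(solved) / len(answers),
            "mean_attempts": mean(solved) if solved else None,
        }
    return results


# Tiny stand-in word list, for illustration only.
demo_words = ["arise", "raise", "spare", "snare", "birth", "girth"]
for start, stats in evaluate_starts(["arise", "birth"], demo_words).items():
    print(start, stats)
```

Returning the attempt count instead of printing it makes it easy to aggregate results per starting word, which is what the visualization step would need.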


### Wordle Solver (given the answer) 

In [2]:
import pandas as pd
from english_words import english_words_lower_alpha_set as words # lower case letters, no punctuation 

In [9]:
type(words)
len(words)

25463

In [75]:
# create a dataframe from the set of words 
word_list = list(words)
word_df = pd.DataFrame(word_list).rename(columns = {0:"word"})

In [76]:
# add columns for info on words 
word_df['length'] = [len(w) for w in word_df.word]

In [77]:
# add count of unique letters 
word_df['unique_length'] = [len(set(word)) for word in word_df.word]

In [42]:
sample[0]

'cecil'

In [43]:
set(sample[0])

{'c', 'e', 'i', 'l'}

In [48]:
'a' in set(sample[0]) or 'i' in set(sample[0])

True

In [54]:
[vowel for vowel in set(sample[0]) if vowel in ['a','e','i','o','u','y']]

['e', 'i']

In [None]:
# print every vowel found in each sample word
for sample_word in sample: 
    for letter in set(sample_word):
        if letter in ['a','e','i','o','u']: 
            print(letter)

In [58]:
[letter for sample_word in sample for letter in set(sample_word) if letter in ['a','e','i','o','u']]

['e', 'i', 'o', 'a', 'e', 'o', 'u', 'o', 'a', 'e', 'u', 'a']

In [62]:
def return_vowels(word):
    vowels = [letter for letter in set(word) if letter in ['a','e','i','o','u']]
    vowels.sort()
    return vowels

In [65]:
sample

['cecil', 'gonad', 'joule', 'amoco', 'usage']

In [69]:
[return_vowels(word) for word in sample]

[['e', 'i'], ['a', 'o'], ['e', 'o', 'u'], ['a', 'o'], ['a', 'e', 'u']]

In [78]:
# add a few more helpful variables 
word_df["letters"] = [set(word) for word in word_df.word]
word_df["vowels"] = [return_vowels(word) for word in word_df.word]
word_df["vowel_count"] = [len(vowels) for vowels in word_df.vowels]

In [82]:
word_df

Unnamed: 0,word,length,unique_length,letters,vowels,vowel_count
0,y,1,1,{y},[],0
1,diluent,7,7,"{d, e, l, n, u, i, t}","[e, i, u]",3
2,cecil,5,4,"{c, e, l, i}","[e, i]",2
3,stricken,8,8,"{c, e, n, k, s, i, r, t}","[e, i]",2
4,dragoon,7,6,"{d, o, g, n, a, r}","[a, o]",2
...,...,...,...,...,...,...
25458,bluebook,8,6,"{e, l, o, b, u, k}","[e, o, u]",3
25459,codify,6,6,"{c, d, y, f, o, i}","[i, o]",2
25460,airtight,8,6,"{a, g, h, i, r, t}","[a, i]",2
25461,gibbon,6,5,"{o, g, b, n, i}","[i, o]",2


In [None]:
# filter to just 5 letter words
word_df = word_df[word_df.length == 5]

In [34]:
[len(set(word)) for word in sample]

[4, 5, 5, 4, 5]

In [25]:
word_df

Unnamed: 0,word,length
2,cecil,5
5,gonad,5
9,joule,5
20,amoco,5
23,usage,5
...,...,...
25422,shoji,5
25443,fussy,5
25448,rabid,5
25452,stahl,5


I found some pre-written code defining the frequency of each letter of the English alphabet: 
https://inventwithpython.com/hacking/chapter20.html

However, I decided to use values based on dictionary frequency rather than the frequency of letters in English texts. Words that rarely appear in text are totally fair game in Wordle! 

Frequency values from: https://en.wikipedia.org/wiki/Letter_frequency
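
As an alternative to hard-coding published values, dictionary-style frequencies can also be computed directly from whatever word list is in use. The sketch below is only an illustration: `letter_frequencies` is a hypothetical helper, and its output will differ from the Wikipedia figures depending on the word list it is given.

```python
from collections import Counter


def letter_frequencies(words):
    """Return each letter's share (in percent) of all letters in the word list."""
    counts = Counter(letter for word in words for letter in word)
    total = sum(counts.values())
    # most_common() orders the letters from most to least frequent
    return {letter: 100 * n / total for letter, n in counts.most_common()}


# Tiny stand-in list; in the notebook this could be the set of 5-letter words.
print(letter_frequencies(["cecil", "gonad", "joule", "amoco", "usage"]))
```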

In [100]:
# dictionary letter frequencies (percent)
# '.' and '&' get a weight of 0 so that looking them up does not raise a KeyError
englishLetterFreq = {'e': 11, 's': 8.7, 'i': 8.2, 'a': 7.8, 'r': 7.3, 'n': 7.2,
                     't': 6.7, 'o': 6.1, 'l': 5.3, 'c': 4, 'd': 3.8, 'u': 3.3,
                     'g': 3, 'p': 2.8, 'm': 2.7, 'k': 2.5, 'h': 2.3, 'b': 2,
                     'y': 1.6, 'f': 1.4, 'v': 1, 'w': 0.91, 'z': 0.44, 'x': 0.27, 'q': 0.24, 'j': 0.21,
                     '.': 0, '&': 0}

In [107]:
def return_freqs(word):
    # one frequency per distinct letter, so repeated letters are not double-counted
    freq_list = [englishLetterFreq[letter] for letter in set(word)]
    return freq_list

In [88]:
sum([englishLetterFreq[letter] for letter in list(sample[0])])

32.5

In [108]:
[return_freqs(word) for word in sample]

[[4, 11, 5.3, 8.2],
 [3.8, 6.1, 3, 7.2, 7.8],
 [11, 0.21, 5.3, 6.1, 3.3],
 [6.1, 2.7, 7.8, 4],
 [11, 3, 3.3, 8.7, 7.8]]

In [109]:
# add columns for word frequency (list and sum)
word_df["freq_list"] = [return_freqs(word) for word in word_df.word]
word_df["freq_sum"] = [sum(return_freqs(word)) for word in word_df.word]

In [102]:
sample

['cecil', 'gonad', 'joule', 'amoco', 'usage']

In [110]:
word_df

Unnamed: 0,word,length,unique_length,letters,vowels,vowel_count,freq_list,freq_sum
19422,immunoelectrophoresis,21,13,"{c, e, p, l, o, m, n, h, u, s, i, r, t}","[e, i, o, u]",4,"[4, 11, 2.8, 5.3, 6.1, 2.7, 7.2, 2.3, 3.3, 8.7...",75.60
21012,neuropsychiatric,16,13,"{p, e, y, c, a, o, n, h, u, s, i, r, t}","[a, e, i, o, u]",5,"[2.8, 11, 1.6, 4, 7.8, 6.1, 7.2, 2.3, 3.3, 8.7...",77.00
19608,ethnomusicology,15,13,"{c, e, y, l, o, m, g, n, h, u, s, i, t}","[e, i, o, u]",4,"[4, 11, 1.6, 5.3, 6.1, 2.7, 3, 7.2, 2.3, 3.3, ...",70.10
8173,electroencephalography,22,12,"{c, e, p, y, l, o, g, n, h, a, r, t}","[a, e, o]",3,"[4, 11, 2.8, 1.6, 5.3, 6.1, 3, 7.2, 2.3, 7.8, ...",65.10
3649,electroencephalogram,20,12,"{c, e, p, l, o, g, m, n, h, a, r, t}","[a, e, o]",3,"[4, 11, 2.8, 5.3, 6.1, 3, 2.7, 7.2, 2.3, 7.8, ...",66.20
...,...,...,...,...,...,...,...,...
2523,w,1,1,{w},[],0,[0.91],0.91
20516,z,1,1,{z},[],0,[0.44],0.44
16521,x,1,1,{x},[],0,[0.27],0.27
13050,q,1,1,{q},[],0,[0.24],0.24


In [111]:
# sort values
word_df = word_df.sort_values(["unique_length","freq_sum"], ascending = (False, False))

In [113]:
# now filter to 5 letter words
word5 = word_df[word_df.length == 5].copy()

In [114]:
word5 # this is pretty cool to see! 

Unnamed: 0,word,length,unique_length,letters,vowels,vowel_count,freq_list,freq_sum
435,arise,5,5,"{e, a, s, i, r}","[a, e, i]",3,"[11, 7.8, 8.7, 8.2, 7.3]",43.0
14482,aries,5,5,"{e, a, s, i, r}","[a, e, i]",3,"[11, 7.8, 8.7, 8.2, 7.3]",43.0
14573,aires,5,5,"{e, a, s, i, r}","[a, e, i]",3,"[11, 7.8, 8.7, 8.2, 7.3]",43.0
17505,raise,5,5,"{e, a, s, i, r}","[a, e, i]",3,"[11, 7.8, 8.7, 8.2, 7.3]",43.0
12052,anise,5,5,"{e, a, n, s, i}","[a, e, i]",3,"[11, 7.8, 7.2, 8.7, 8.2]",42.9
...,...,...,...,...,...,...,...,...
7223,puppy,5,3,"{p, y, u}",[u],1,"[2.8, 1.6, 3.3]",7.7
12766,mummy,5,3,"{y, m, u}",[u],1,"[1.6, 2.7, 3.3]",7.6
19056,beebe,5,2,"{e, b}",[e],1,"[11, 2]",13.0
17840,mamma,5,2,"{m, a}",[a],1,"[2.7, 7.8]",10.5


In [116]:
word5.head(20)

Unnamed: 0,word,length,unique_length,letters,vowels,vowel_count,freq_list,freq_sum
435,arise,5,5,"{e, a, s, i, r}","[a, e, i]",3,"[11, 7.8, 8.7, 8.2, 7.3]",43.0
14482,aries,5,5,"{e, a, s, i, r}","[a, e, i]",3,"[11, 7.8, 8.7, 8.2, 7.3]",43.0
14573,aires,5,5,"{e, a, s, i, r}","[a, e, i]",3,"[11, 7.8, 8.7, 8.2, 7.3]",43.0
17505,raise,5,5,"{e, a, s, i, r}","[a, e, i]",3,"[11, 7.8, 8.7, 8.2, 7.3]",43.0
12052,anise,5,5,"{e, a, n, s, i}","[a, e, i]",3,"[11, 7.8, 7.2, 8.7, 8.2]",42.9
10735,siena,5,5,"{e, n, s, i, a}","[a, e, i]",3,"[11, 7.2, 8.7, 8.2, 7.8]",42.9
12313,risen,5,5,"{e, n, s, i, r}","[e, i]",2,"[11, 7.2, 8.7, 8.2, 7.3]",42.4
14087,siren,5,5,"{e, n, s, i, r}","[e, i]",2,"[11, 7.2, 8.7, 8.2, 7.3]",42.4
18578,rinse,5,5,"{e, n, s, i, r}","[e, i]",2,"[11, 7.2, 8.7, 8.2, 7.3]",42.4
3031,snare,5,5,"{e, n, s, a, r}","[a, e]",2,"[11, 7.2, 8.7, 7.8, 7.3]",42.0


In [130]:
[word[0] for word in sample]

['c', 'g', 'j', 'a', 'u']

In [131]:
sample

['cecil', 'gonad', 'joule', 'amoco', 'usage']

In [180]:
# add variables for each letter, by position 
for n in range(5):
    word5["letter" + str(n + 1)] = word5.word.str[n]
word5

Unnamed: 0,word,length,unique_length,letters,vowels,vowel_count,freq_list,freq_sum,letter1,letter2,letter3,letter4,letter5
435,arise,5,5,"{e, a, s, i, r}","[a, e, i]",3,"[11, 7.8, 8.7, 8.2, 7.3]",43.0,a,r,i,s,e
14482,aries,5,5,"{e, a, s, i, r}","[a, e, i]",3,"[11, 7.8, 8.7, 8.2, 7.3]",43.0,a,r,i,e,s
14573,aires,5,5,"{e, a, s, i, r}","[a, e, i]",3,"[11, 7.8, 8.7, 8.2, 7.3]",43.0,a,i,r,e,s
17505,raise,5,5,"{e, a, s, i, r}","[a, e, i]",3,"[11, 7.8, 8.7, 8.2, 7.3]",43.0,r,a,i,s,e
12052,anise,5,5,"{e, a, n, s, i}","[a, e, i]",3,"[11, 7.8, 7.2, 8.7, 8.2]",42.9,a,n,i,s,e
...,...,...,...,...,...,...,...,...,...,...,...,...,...
7223,puppy,5,3,"{p, y, u}",[u],1,"[2.8, 1.6, 3.3]",7.7,p,u,p,p,y
12766,mummy,5,3,"{y, m, u}",[u],1,"[1.6, 2.7, 3.3]",7.6,m,u,m,m,y
19056,beebe,5,2,"{e, b}",[e],1,"[11, 2]",13.0,b,e,e,b,e
17840,mamma,5,2,"{m, a}",[a],1,"[2.7, 7.8]",10.5,m,a,m,m,a


### Solving a Wordle

Working towards a function that can try out a word, get feedback, and work towards the puzzle solution. 
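
For reference, the feedback that Wordle itself gives for a guess can be sketched as below. The cells in this section compare the guess with the answer directly; this sketch instead produces the green/yellow/gray pattern, including the usual handling of repeated letters (greens are assigned first, then yellows only while unmatched copies of the letter remain in the answer). The function name `feedback` and the color labels are assumptions made for illustration.

```python
from collections import Counter


def feedback(guess, answer):
    """Return a list of 'green', 'yellow' or 'gray', one entry per guess letter."""
    result = ["gray"] * len(guess)
    remaining = Counter()  # answer letters not matched by a green

    # First pass: mark greens and count the answer letters still unmatched.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "green"
        else:
            remaining[a] += 1

    # Second pass: mark yellows, using up unmatched letters left to right.
    for i, g in enumerate(guess):
        if result[i] != "green" and remaining[g] > 0:
            result[i] = "yellow"
            remaining[g] -= 1

    return result


print(feedback("arise", "spare"))
```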

In [200]:
# example: answer = "spare", starting word = "arise"

answer = "spare"
start = "arise"

In [201]:
# check if any letters are in the right position 
correct_position = [start[n] == answer[n] for n in range(0,5)]
correct_position

[False, False, False, False, True]

In [137]:
correct_position = [n for n in range(0,5) if start[n] == answer[n]]

In [147]:
n = 0
"letter" + str(n + 1)

'letter1'

In [152]:
start[0] not in answer

False

In [202]:
words_to_consider[~words_to_consider.word.str.contains('i')]

Unnamed: 0,index,word,length,unique_length,letters,vowels,vowel_count,freq_list,freq_sum,letter1,letter2,letter3,letter4,letter5
0,25140,front,5,5,"{f, o, n, r, t}",[o],1,"[1.4, 6.1, 7.2, 7.3, 6.7]",28.7,f,r,o,n,t
1,1653,grunt,5,5,"{g, n, u, r, t}",[u],1,"[3, 7.2, 3.3, 7.3, 6.7]",27.5,g,r,u,n,t
2,22832,trunk,5,5,"{n, u, k, r, t}",[u],1,"[7.2, 3.3, 2.5, 7.3, 6.7]",27.0,t,r,u,n,k
3,16703,brunt,5,5,"{b, n, u, r, t}",[u],1,"[2, 7.2, 3.3, 7.3, 6.7]",26.5,b,r,u,n,t
4,22919,grout,5,5,"{o, g, u, r, t}","[o, u]",2,"[6.1, 3, 3.3, 7.3, 6.7]",26.4,g,r,o,u,t
5,23956,prong,5,5,"{p, o, g, n, r}",[o],1,"[2.8, 6.1, 3, 7.2, 7.3]",26.4,p,r,o,n,g
6,10634,crony,5,5,"{c, y, o, n, r}",[o],1,"[4, 1.6, 6.1, 7.2, 7.3]",26.2,c,r,o,n,y
7,3541,bruno,5,5,"{o, b, n, u, r}","[o, u]",2,"[6.1, 2, 7.2, 3.3, 7.3]",25.9,b,r,u,n,o
8,18915,frond,5,5,"{d, f, o, n, r}",[o],1,"[3.8, 1.4, 6.1, 7.2, 7.3]",25.8,f,r,o,n,d
9,20508,crown,5,5,"{c, o, w, n, r}",[o],1,"[4, 6.1, 0.91, 7.2, 7.3]",25.51,c,r,o,w,n


In [203]:
words_to_consider = word5.copy()
if start == answer: 
    print(f"The answer is {start}")
else:
    for n in range(0,5):
        if start[n] == answer[n]:
            col = "letter" + str(n + 1)
            words_to_consider = words_to_consider[words_to_consider[col] == start[n]]
        elif start[n] not in answer: 
            words_to_consider = words_to_consider[~words_to_consider.word.str.contains(start[n])]
        else: # letter is in the answer, but not in the right place 
            words_to_consider = words_to_consider[words_to_consider.word.str.contains(start[n])]
    words_to_consider = words_to_consider.reset_index()

In [320]:
word5

Unnamed: 0,word,length,unique_length,letters,vowels,vowel_count,freq_list,freq_sum,letter1,letter2,letter3,letter4,letter5
435,arise,5,5,"{e, a, s, i, r}","[a, e, i]",3,"[11, 7.8, 8.7, 8.2, 7.3]",43.0,a,r,i,s,e
14482,aries,5,5,"{e, a, s, i, r}","[a, e, i]",3,"[11, 7.8, 8.7, 8.2, 7.3]",43.0,a,r,i,e,s
14573,aires,5,5,"{e, a, s, i, r}","[a, e, i]",3,"[11, 7.8, 8.7, 8.2, 7.3]",43.0,a,i,r,e,s
17505,raise,5,5,"{e, a, s, i, r}","[a, e, i]",3,"[11, 7.8, 8.7, 8.2, 7.3]",43.0,r,a,i,s,e
12052,anise,5,5,"{e, a, n, s, i}","[a, e, i]",3,"[11, 7.8, 7.2, 8.7, 8.2]",42.9,a,n,i,s,e
...,...,...,...,...,...,...,...,...,...,...,...,...,...
7223,puppy,5,3,"{p, y, u}",[u],1,"[2.8, 1.6, 3.3]",7.7,p,u,p,p,y
12766,mummy,5,3,"{y, m, u}",[u],1,"[1.6, 2.7, 3.3]",7.6,m,u,m,m,y
19056,beebe,5,2,"{e, b}",[e],1,"[11, 2]",13.0,b,e,e,b,e
17840,mamma,5,2,"{m, a}",[a],1,"[2.7, 7.8]",10.5,m,a,m,m,a


In [194]:
# choose the next word to try 
next_word = words_to_consider["word"][0]
next_word

'snare'

In [328]:
# putting it all in 1 loop 
answer = "birth"
guess = "miner"
guesses = []
attempt_no = 1
words_to_consider = word5.copy()

while attempt_no <= 6:
    guesses.append(guess)
    if guess == answer:
        print(f"The answer is {guess}, found in attempt number {attempt_no}")
        print(f"Guesses: {guesses}")
        break
    elif attempt_no <= 5: 
        for n in range(0,5):
            col = "letter" + str(n + 1)
            if guess[n] == answer[n]: # letter is in the right place 
                words_to_consider = words_to_consider[words_to_consider[col] == guess[n]]
            elif guess[n] not in answer: # letter not in the answer
                words_to_consider = words_to_consider[~words_to_consider.word.str.contains(guess[n])]
             #   print(f"Dropped words containing {guess[n]}")
            else: # letter is in the answer, but not in the right place 
                words_to_consider = words_to_consider[words_to_consider.word.str.contains(guess[n])]
                # and drop words that have this letter in this exact position 
                words_to_consider = words_to_consider[words_to_consider[col] != guess[n]]
               # print(f"Filtered to words containing {guess[n]}, but where {col} is not {guess[n]}")
        words_to_consider = words_to_consider.reset_index(drop = True)
      #  print(f"This was attempt {attempt_no}")
        guess = words_to_consider["word"][0]
        attempt_no = attempt_no + 1
    else: 
        print("Sorry, couldn't do it :(")
        print(f"Guesses: {guesses}")
        break  # out of attempts: stop here, otherwise the loop never ends

The answer is birth, found in attempt number 5
Guesses: ['miner', 'first', 'dirty', 'girth', 'birth']


In [334]:
# final function! 
def wordle(answer, guess):
    guesses = []
    attempt_no = 1
    words_to_consider = word5.copy()

    while attempt_no <= 6:
        guesses.append(guess)
        if guess == answer:
            print(f"The answer is {guess}, found in attempt number {attempt_no}")
            print(f"Guesses: {guesses}")
            break
        elif attempt_no <= 5: 
            for n in range(0,5):
                col = "letter" + str(n + 1)
                if guess[n] == answer[n]: # letter is in the right place 
                    words_to_consider = words_to_consider[words_to_consider[col] == guess[n]]
                elif guess[n] not in answer: # letter not in the answer
                    words_to_consider = words_to_consider[~words_to_consider.word.str.contains(guess[n])]
                 #   print(f"Dropped words containing {guess[n]}")
                else: # letter is in the answer, but not in the right place 
                    words_to_consider = words_to_consider[words_to_consider.word.str.contains(guess[n])]
                    # and drop words that have this letter in this exact position 
                    words_to_consider = words_to_consider[words_to_consider[col] != guess[n]]
                   # print(f"Filtered to words containing {guess[n]}, but where {col} is not {guess[n]}")
            words_to_consider = words_to_consider.reset_index(drop = True)
          #  print(f"This was attempt {attempt_no}")
            guess = words_to_consider["word"][0]
            attempt_no = attempt_no + 1
        else: 
            print("Sorry, couldn't do it :(")
            print(f"Guesses: {guesses}")
            break  # out of attempts: stop here, otherwise the loop never ends

In [335]:
wordle(answer = "brick", guess = "stuck")

The answer is brick, found in attempt number 4
Guesses: ['stuck', 'aleck', 'prick', 'brick']


In [338]:
wordle(answer = "prick", guess = "stuck")

The answer is prick, found in attempt number 3
Guesses: ['stuck', 'aleck', 'prick']


In [340]:
wordle(answer = "shire", guess = "brink")

The answer is shire, found in attempt number 4
Guesses: ['brink', 'raise', 'spire', 'shire']


In [342]:
wordle(answer = "robot", guess = "brink")

The answer is robot, found in attempt number 4
Guesses: ['brink', 'sober', 'cobra', 'robot']


In [343]:
wordle(answer = "robot", guess = "audio")

The answer is robot, found in attempt number 4
Guesses: ['audio', 'senor', 'throb', 'robot']
