# Wordle Solver

You can take a systematic approach to solving wordle puzzles. This is my attempt to write an algorithm that will efficiently solve wordle puzzles. Really, I am trying to figure out what the best starting words and strategies are. 

First, I'll work on an algorithm that can solve the puzzle, given an answer. I may later adapt this to interact with the wordle site and solve puzzles where the answer is truly unknown (in other words, an algorithm you could use to actually solve a puzzle). 

<b>Goal for the first Minimal Viable Product of this analysis: </b>
- Write a function that can be fed a huge list of 5 letter words, and a list of starting words
    - attempts to solve the wordle for each possible 5 letter answer in 6 attempts or fewer 
    - iterates over each starting word and each possible 5 letter answer 
    - tracks success rate and number of attempts for each starting word 
- visualize success of each starting word 
    - which words solved the puzzle quickest 
    - which words had the highest success rate 
- I may make multiple versions of the function that operate under different strategies and see which one had the best results on average 



Found a helpful article for hosting a python app for free: https://towardsdatascience.com/the-easiest-way-to-deploy-your-dash-app-for-free-f92c575bb69e

### Wordle Solver (given the answer) 

In [1]:
import pandas as pd
import numpy as np
from english_words import english_words_lower_alpha_set as words # lower case letters, no punctuation 

In [2]:
# create a dataframe from the set of words 
word_list = list(words)
word_df = pd.DataFrame(word_list).rename(columns = {0:"word"})

In [3]:
# add columns for info on words 
word_df['length'] = [len(w) for w in word_df.word]

In [4]:
# add count of unique letters 
word_df['unique_length'] = [len(set(word)) for word in word_df.word]

In [5]:
def return_vowels(word):
    vowels = [letter for letter in set(word) if letter in ['a','e','i','o','u']]
    vowels.sort()
    return vowels

In [6]:
# add a few more helpful variables 
word_df["letters"] = [set(word) for word in word_df.word]
word_df["vowels"] = [return_vowels(word) for word in word_df.word]
word_df["vowel_count"] = [len(vowels) for vowels in word_df.vowels]

In [7]:
word_df

Unnamed: 0,word,length,unique_length,letters,vowels,vowel_count
0,thereby,7,6,"{t, h, y, e, b, r}",[e],1
1,willful,7,5,"{w, u, l, f, i}","[i, u]",2
2,parsi,5,5,"{r, a, p, i, s}","[a, i]",2
3,seabed,6,5,"{a, e, b, d, s}","[a, e]",2
4,commodity,9,7,"{m, o, t, y, i, c, d}","[i, o]",2
...,...,...,...,...,...,...
25458,picturesque,11,9,"{s, q, u, t, e, p, i, c, r}","[e, i, u]",3
25459,gerbil,6,6,"{l, e, b, i, g, r}","[e, i]",2
25460,childrearing,12,10,"{l, h, a, e, n, g, i, c, d, r}","[a, e, i]",3
25461,footbridge,10,9,"{o, t, f, e, g, b, i, d, r}","[e, i, o]",3


Found some pre-written code defining the frequencies of each letter of the English alphabet: 
https://inventwithpython.com/hacking/chapter20.html
However, I decided to update the values based on the dictionary frequency rather than frequency of letters in English texts. Words that rarely appear in text are totally fair game in Wordle! 
Frequency from: https://en.wikipedia.org/wiki/Letter_frequency

In [8]:
 englishLetterFreq = {'e': 11, 's':8.7,'i': 8.2, 'a': 7.8, 'r': 7.3, 'n': 7.2, 
                      't': 6.7, 'o': 6.1, 'l': 5.3, 'c': 4, 'd': 3.8, 'u': 3.3,
                      'g': 3, 'p': 2.8, 'm': 2.7, 'k': 2.5, 'h': 2.3, 'b': 2,
                      'y':1.6,'f': 1.4, 'v': 1, 'w': 0.91, 'z': 0.44, 'x': 0.27, 'q': 0.24, 'j': 0.21,
                     '.':0,'&':0}

In [9]:
def return_freqs(word):
    freq_list = [englishLetterFreq[letter] for letter in set(word)]
    return freq_list

In [10]:
# add columns for word frequency (list and sum)
word_df["freq_list"] = [return_freqs(word) for word in word_df.word]
word_df["freq_sum"] = [sum(return_freqs(word)) for word in word_df.word]

In [11]:
# sort values
word_df = word_df.sort_values(["unique_length","freq_sum"], ascending = (False, False))

In [12]:
# now filter to 5 letter words
word5 = word_df[word_df.length == 5].copy()

In [13]:
word5 # this is pretty cool to see! 

Unnamed: 0,word,length,unique_length,letters,vowels,vowel_count,freq_list,freq_sum
4000,aires,5,5,"{r, a, e, i, s}","[a, e, i]",3,"[7.3, 7.8, 11, 8.2, 8.7]",43.0
12338,arise,5,5,"{r, a, e, i, s}","[a, e, i]",3,"[7.3, 7.8, 11, 8.2, 8.7]",43.0
12809,aries,5,5,"{r, a, e, i, s}","[a, e, i]",3,"[7.3, 7.8, 11, 8.2, 8.7]",43.0
15519,raise,5,5,"{r, a, e, i, s}","[a, e, i]",3,"[7.3, 7.8, 11, 8.2, 8.7]",43.0
1577,siena,5,5,"{a, e, n, i, s}","[a, e, i]",3,"[7.8, 11, 7.2, 8.2, 8.7]",42.9
...,...,...,...,...,...,...,...,...
3972,puppy,5,3,"{p, y, u}",[u],1,"[2.8, 1.6, 3.3]",7.7
16727,mummy,5,3,"{m, y, u}",[u],1,"[2.7, 1.6, 3.3]",7.6
24887,beebe,5,2,"{e, b}",[e],1,"[11, 2]",13.0
15601,mamma,5,2,"{a, m}",[a],1,"[7.8, 2.7]",10.5


In [14]:
word5.head(20)

Unnamed: 0,word,length,unique_length,letters,vowels,vowel_count,freq_list,freq_sum
4000,aires,5,5,"{r, a, e, i, s}","[a, e, i]",3,"[7.3, 7.8, 11, 8.2, 8.7]",43.0
12338,arise,5,5,"{r, a, e, i, s}","[a, e, i]",3,"[7.3, 7.8, 11, 8.2, 8.7]",43.0
12809,aries,5,5,"{r, a, e, i, s}","[a, e, i]",3,"[7.3, 7.8, 11, 8.2, 8.7]",43.0
15519,raise,5,5,"{r, a, e, i, s}","[a, e, i]",3,"[7.3, 7.8, 11, 8.2, 8.7]",43.0
1577,siena,5,5,"{a, e, n, i, s}","[a, e, i]",3,"[7.8, 11, 7.2, 8.2, 8.7]",42.9
4291,anise,5,5,"{a, e, n, i, s}","[a, e, i]",3,"[7.8, 11, 7.2, 8.2, 8.7]",42.9
951,risen,5,5,"{r, e, n, i, s}","[e, i]",2,"[7.3, 11, 7.2, 8.2, 8.7]",42.4
4280,rinse,5,5,"{r, e, n, i, s}","[e, i]",2,"[7.3, 11, 7.2, 8.2, 8.7]",42.4
5649,siren,5,5,"{s, e, n, i, r}","[e, i]",2,"[8.7, 11, 7.2, 8.2, 7.3]",42.4
16596,snare,5,5,"{s, a, e, n, r}","[a, e]",2,"[8.7, 7.8, 11, 7.2, 7.3]",42.0


In [15]:
# add variables for each letter, by position 
word5["letter1"] = [word[0] for word in word5.word]
word5["letter2"] = [word[1] for word in word5.word]
word5["letter3"] = [word[2] for word in word5.word]
word5["letter4"] = [word[3] for word in word5.word]
word5["letter5"] = [word[4] for word in word5.word]
word5

Unnamed: 0,word,length,unique_length,letters,vowels,vowel_count,freq_list,freq_sum,letter1,letter2,letter3,letter4,letter5
4000,aires,5,5,"{r, a, e, i, s}","[a, e, i]",3,"[7.3, 7.8, 11, 8.2, 8.7]",43.0,a,i,r,e,s
12338,arise,5,5,"{r, a, e, i, s}","[a, e, i]",3,"[7.3, 7.8, 11, 8.2, 8.7]",43.0,a,r,i,s,e
12809,aries,5,5,"{r, a, e, i, s}","[a, e, i]",3,"[7.3, 7.8, 11, 8.2, 8.7]",43.0,a,r,i,e,s
15519,raise,5,5,"{r, a, e, i, s}","[a, e, i]",3,"[7.3, 7.8, 11, 8.2, 8.7]",43.0,r,a,i,s,e
1577,siena,5,5,"{a, e, n, i, s}","[a, e, i]",3,"[7.8, 11, 7.2, 8.2, 8.7]",42.9,s,i,e,n,a
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3972,puppy,5,3,"{p, y, u}",[u],1,"[2.8, 1.6, 3.3]",7.7,p,u,p,p,y
16727,mummy,5,3,"{m, y, u}",[u],1,"[2.7, 1.6, 3.3]",7.6,m,u,m,m,y
24887,beebe,5,2,"{e, b}",[e],1,"[11, 2]",13.0,b,e,e,b,e
15601,mamma,5,2,"{a, m}",[a],1,"[7.8, 2.7]",10.5,m,a,m,m,a


### Solving a Wordle

Working towards a function that can try out a word, get feedback, and work towards the puzzle solution. 

In [16]:
# example: answer = "spare", starting word - "arise"

answer = "spare"
start = "arise"

In [17]:
# final function! 
def wordle(answer, guess):
    guesses = []
    attempt_no = 1
    words_to_consider = word5.copy()

    while attempt_no <= 6:
        guesses.append(guess)
        if guess == answer:
            print(f"The answer is {guess}, found in attempt number {attempt_no}")
            print(f"Guesses: {guesses}")
            break
        elif attempt_no <= 5: 
            for n in range(0,5):
                col = "letter" + str(n + 1)
                if guess[n] == answer[n]: # letter is in the right place 
                    words_to_consider = words_to_consider[words_to_consider[col] == guess[n]]
                elif guess[n] not in answer: # letter not in the answer
                    words_to_consider = words_to_consider[~words_to_consider.word.str.contains(guess[n])]
                 #   print(f"Dropped words containing {guess[n]}")
                else: # letter is in the answer, but not in the right place 
                    words_to_consider = words_to_consider[words_to_consider.word.str.contains(guess[n])]
                    # and drop words that have this letter in this exact position 
                    words_to_consider = words_to_consider[words_to_consider[col] != guess[n]]
                   # print(f"Filtered to words containing {guess[n]}, but where {col} is not {guess[n]}")
            words_to_consider = words_to_consider.reset_index(drop = True)
          #  print(f"This was attempt {attempt_no}")
            guess = words_to_consider["word"][0]
            attempt_no = attempt_no + 1
        else: 
            print("Sorry, couldn't do it :(")
            print(f"Guesses: {guesses}")

In [18]:
wordle(answer = "wince", guess = "mince")

The answer is wince, found in attempt number 3
Guesses: ['mince', 'since', 'wince']


In [19]:
wordle(answer = "prick", guess = "stuck")

The answer is prick, found in attempt number 3
Guesses: ['stuck', 'aleck', 'prick']


In [20]:
wordle(answer = "shire", guess = "brink")

The answer is shire, found in attempt number 4
Guesses: ['brink', 'raise', 'spire', 'shire']


In [21]:
wordle(answer = "robot", guess = "brink")

The answer is robot, found in attempt number 4
Guesses: ['brink', 'sober', 'cobra', 'robot']


In [22]:
wordle(answer = "robot", guess = "audio")

The answer is robot, found in attempt number 4
Guesses: ['audio', 'norse', 'rocky', 'robot']


# Is there a better strategy? 

What if, instead of limiting itself by the information returned by a single guess, it tried to maximize the amount of information available from the first two guess by trying two totally different words? 

In [23]:
def wordle_2(answer, guess):
    guesses = []
    attempt_no = 1
    words_to_consider = word5.copy()
    second_word_options = word5.copy()

    while attempt_no <= 6:
        guesses.append(guess)
        if guess == answer:
            print(f"The answer is {guess}, found in attempt number {attempt_no}")
            print(f"Guesses: {guesses}")
            break
        elif attempt_no <= 5: 
            for n in range(0,5):
                col = "letter" + str(n + 1)
                if guess[n] == answer[n]: # letter is in the right place 
                    words_to_consider = words_to_consider[words_to_consider[col] == guess[n]]
                elif guess[n] not in answer: # letter not in the answer
                    words_to_consider = words_to_consider[~words_to_consider.word.str.contains(guess[n])]
                 #   print(f"Dropped words containing {guess[n]}")
                else: # letter is in the answer, but not in the right place 
                    words_to_consider = words_to_consider[words_to_consider.word.str.contains(guess[n])]
                    # and drop words that have this letter in this exact position 
                    words_to_consider = words_to_consider[words_to_consider[col] != guess[n]]
                   # print(f"Filtered to words containing {guess[n]}, but where {col} is not {guess[n]}")
                if attempt_no == 1: 
                    # make it so the second guess will not contain any of the same letters as the first guess 
                    second_word_options = second_word_options[~second_word_options.word.str.contains(guess[n])].reset_index(drop = True)
            words_to_consider = words_to_consider.reset_index(drop = True)
            if attempt_no == 1: 
                # get the next guess 
                guess = second_word_options["word"][0]
            else: 
                guess = words_to_consider["word"][0]
            attempt_no = attempt_no + 1
        else: 
            print("Sorry, couldn't do it :(")
            print(f"Guesses: {guesses}")

In [24]:
wordle("wince","arise")

The answer is wince, found in attempt number 4
Guesses: ['arise', 'tinge', 'mince', 'wince']


In [25]:
wordle_2("wince","arise")

The answer is wince, found in attempt number 4
Guesses: ['arise', 'count', 'mince', 'wince']


In [26]:
# test it out on a few 
starting_word = "radio"
answer_set = ["robot","proxy","shire","paint","peace","tangy","stair","route","blend","prick"]

for n in range(0,len(answer_set)):
    print("wordle 1: ")
    wordle(answer_set[n], starting_word)
    print("wordle 2:")
    wordle_2(answer_set[n], starting_word)

wordle 1: 
The answer is robot, found in attempt number 5
Guesses: ['radio', 'rosen', 'rough', 'rocky', 'robot']
wordle 2:
The answer is robot, found in attempt number 3
Guesses: ['radio', 'csnet', 'robot']
wordle 1: 
The answer is proxy, found in attempt number 5
Guesses: ['radio', 'norse', 'grout', 'prowl', 'proxy']
wordle 2:
The answer is proxy, found in attempt number 4
Guesses: ['radio', 'csnet', 'flour', 'proxy']
wordle 1: 
The answer is shire, found in attempt number 4
Guesses: ['radio', 'siren', 'spire', 'shire']
wordle 2:
The answer is shire, found in attempt number 4
Guesses: ['radio', 'csnet', 'spire', 'shire']
wordle 1: 
The answer is paint, found in attempt number 3
Guesses: ['radio', 'saint', 'paint']
wordle 2:
The answer is paint, found in attempt number 3
Guesses: ['radio', 'csnet', 'paint']
wordle 1: 
The answer is peace, found in attempt number 4
Guesses: ['radio', 'least', 'peach', 'peace']
wordle 2:
The answer is peace, found in attempt number 4
Guesses: ['radio', '

# What's the Best Starting words? 


I'd like to compare a bunch of starting words, get the win rate and the average number of guesses. 

I'll need to rewrite my functions a bit, then loop to compile a dataframe of results. 

In [27]:
# rewriting wordle1 function 
def wordle_results(answer, starting_word):
    guess = starting_word
    guesses = []
    attempt_no = 1
    words_to_consider = word5

    while attempt_no <= 6:
        guesses.append(guess)
        if guess == answer:
            success = True
            total_attempts = attempt_no
            break
        elif attempt_no <= 5: 
            for n in range(0,5):
                col = "letter" + str(n + 1)
                if guess[n] == answer[n]: # letter is in the right place 
                    words_to_consider = words_to_consider[words_to_consider[col] == guess[n]]
                elif guess[n] not in answer: # letter not in the answer
                    words_to_consider = words_to_consider[~words_to_consider.word.str.contains(guess[n])]
                else: # letter is in the answer, but not in the right place 
                    words_to_consider = words_to_consider[words_to_consider.word.str.contains(guess[n])]
                    # and drop words that have this letter in this exact position 
                    words_to_consider = words_to_consider[words_to_consider[col] != guess[n]]
            words_to_consider = words_to_consider.reset_index(drop = True)
            guess = words_to_consider["word"][0]
            attempt_no = attempt_no + 1
        else: 
            success = False
            total_attempts = np.NaN
    wordle_method = "method-1"
    result_dict = {"starting_word": starting_word, "answer":answer, "success":success, "attempts":total_attempts,
                  "guesses":list(guesses), "method":wordle_method}
    return pd.DataFrame([result_dict])

In [37]:
# get a list of 500 random answers to iterate across 
answer_list = list(word5.sample(n= 500, random_state = 12345).word)

In [40]:
starting_word_list = ["arise", "noise","aisle","least","alien","steal","audio","radio","ouija","pluck"]
starting_word_list_small = ["arise","least","radio","noise","alien"]

In [41]:
# how many times are we iterating? 
# per answer, per starting word (iterations per method)
len(starting_word_list) * len(answer_list)

5000

In [42]:
len(starting_word_list_small) * len(answer_list)

2500

In [43]:
# iterate to collect results 

# list to collect dataframe of results 
results = [] 
loopcount = 1
# loop 2500 times
for answer in answer_list:
    for starting_word in starting_word_list:
        wr = wordle_results(answer, starting_word)
        results.append(wr)
        loopcount = loopcount + 1
method1_df = pd.concat(results)

MemoryError: 

In [44]:
loopcount

142

In [None]:
method1_df = pd.concat(results)

I keep running into memory errors. Probably from saving a lot of data frames. I should try a different method for saving the results (perhaps a matrix?)


In [46]:
results

[  starting_word answer  success  attempts  \
 0         arise  wylie     True         5   
 
                                guesses    method  
 0  [arise, tinge, movie, julie, wylie]  method-1  ,
   starting_word answer  success  attempts  \
 0         noise  wylie     True         5   
 
                                guesses    method  
 0  [noise, irate, bilge, julie, wylie]  method-1  ,
   starting_word answer  success  attempts                       guesses  \
 0         aisle  wylie     True         4  [aisle, loire, julie, wylie]   
 
      method  
 0  method-1  ,
   starting_word answer  success  attempts  \
 0         least  wylie     True         5   
 
                                guesses    method  
 0  [least, rigel, kline, julie, wylie]  method-1  ,
   starting_word answer  success  attempts                       guesses  \
 0         alien  wylie     True         4  [alien, sidle, julie, wylie]   
 
      method  
 0  method-1  ,
   starting_word answer  success 