<a href="https://colab.research.google.com/github/isaacmattern/wordle/blob/main/wordle.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Wordle Solver**
### Python project by Isaac Mattern

[Wordle](https://www.powerlanguage.co.uk/wordle/) is a game where the objective is to correctly guess a 5-letter word in 6 tries or fewer. Each time a player submits a guess, the game highlights each letter of the guess in

*   green, if the letter is in the word and in the correct location in the word
*   yellow, if the letter is in the word, but not in the position where the user guessed
*   gray, if the word does not contain the letter at all

This project is an attempt to use a list of the 12,972 5-letter words which are valid Wordle guesses and some Python magic to solve some Wordles.

I will be attempting to devise different algorithms to best solve Wordle.

In [None]:
# Control the number of simulated games each algorithm will play
simulations = 30

## Getting Words
First things first: Wordle is all about 5-letter words. Assuming we don't understand how Wordle's words are selected, we can just search for a dataset of 5-letter words! [This repository](https://github.com/dwyl/english-words) contains over 466,000 English words. We could select the [words_alpha.txt](https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt) file and place all words of length 5 into a list. Printing the length of this list tells us we have 15,918 5-letter English words. So, unless the people at Wordle are feeling particularly cruel, we probably have the answer here somewhere.

But there's a smarter option here: the Wordle site gives a list of all valid Wordle guesses. We can just use those words to more accurately simulate Wordle. Plus, we don't want to generate any guesses that the real game wouldn't accept. 

In [221]:
# Grab the file we want - you can uncomment the line below to grab the larger
# English word list instead. You'll also need to change the file name in the open command.

# !wget -q https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt
!wget -q https://raw.githubusercontent.com/isaacmattern/wordle/main/valid-wordle-words.txt

In [222]:
all_words = []
# Using "with" makes sure the file is closed once we've read it
with open("valid-wordle-words.txt", "r") as words_file:
  for line in words_file:
    word = line.strip()
    # This check is only necessary if we aren't using the Wordle-supplied list
    if len(word) == 5:
      all_words.append(word)

print(len(all_words))

12972


## Defining Some Important Functions

Wordle will tell us certain information about our guess which we can use to eliminate many words from our list of possible words.

*   A **green** letter means we can eliminate any word which doesn't have the letter at the spot where we guessed it
*   A **yellow** letter means we can eliminate any words without that letter
*   A **gray** letter means we can eliminate any words with that letter 

Let's define two functions which will be useful regardless of what algorithm we are using.


1.   An **update_possible** function will allow us to trim down our list of possible Wordle solutions.
2.   A **get_colors** function will allow us to simulate what Wordle's program does each time you submit a guess. Thus, this function will only be used when we're running simulations to test the efficiency of our word-guessing algorithms.

In [223]:
def update_possible(guess, possible, colors):
  """
  Uses a list of colors (equal in length
  to the length of our words) to eliminate words
  which could not possibly be correct.
  The list is trimmed in place and also returned.
  0 = Green
  1 = Yellow
  2 = Gray
  """
  for i in range(len(guess)):
    if colors[i] == 0:
      # Green: keep only words with this letter in this exact spot
      possible[:] = [word for word in possible if word[i] == guess[i]]
    elif colors[i] == 1:
      # Yellow: keep only words which contain this letter somewhere
      possible[:] = [word for word in possible if guess[i] in word]
    else:
      # Gray: keep only words which do not contain this letter
      possible[:] = [word for word in possible if guess[i] not in word]
  # The guess itself was wrong (or we would be done), so drop it too
  if guess in possible:
    possible.remove(guess)
  return possible

def get_colors(guess, answer) -> list:
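  """
  Compares a guess to an answer and
  returns a list of numbers which signifies
  colors returned by a wordle guess.
  0 = Green
  1 = Yellow
  2 = Gray
  Note: a letter is marked yellow whenever it appears
  anywhere in the answer, so repeated letters are not
  limited to the number of copies in the answer.
  """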
  colors = []
  for i in range(len(guess)):
    if guess[i] == answer[i]:
      colors.append(0)
    elif guess[i] in answer:
      colors.append(1)
    else:
      colors.append(2)

  return colors
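
One caveat worth illustrating: **get_colors** marks a letter yellow whenever it appears anywhere in the answer, while the real game only hands out as many greens and yellows for a letter as the answer has copies of it. Below is a minimal sketch of a duplicate-aware scorer. The name `get_colors_strict` is hypothetical and not part of this project, and the gray rule in **update_possible** would need to be loosened before this could be swapped in, since gray would no longer guarantee that the letter is absent.

```python
from collections import Counter

def get_colors_strict(guess, answer) -> list:
  """
  Hypothetical duplicate-aware variant of get_colors.
  0 = Green
  1 = Yellow
  2 = Gray
  """
  # Start with everything gray, then upgrade letters in two passes
  colors = [2] * len(guess)

  # First pass: mark greens and count the answer letters left unmatched
  remaining = Counter()
  for i in range(len(guess)):
    if guess[i] == answer[i]:
      colors[i] = 0
    else:
      remaining[answer[i]] += 1

  # Second pass: hand out yellows, left to right, while copies remain
  for i in range(len(guess)):
    if colors[i] != 0 and remaining[guess[i]] > 0:
      colors[i] = 1
      remaining[guess[i]] -= 1

  return colors

# "abide" has only one e, so only the first e in "speed" turns yellow
print(get_colors_strict("speed", "abide"))  # [2, 2, 1, 2, 1]
```

For comparison, the simpler **get_colors** above would mark both e's in "speed" as yellow.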

## Approach 1: Random Guess

We will first randomly select a word using *random.choice*. A completely random guess probably isn't the greatest strategy, so we shouldn't expect an amazing result. 

After selecting 100 random words from the list and running a simulation for each of them, this approach took 6.37 guesses on average, which is kind of garbage, since needing more than 6 guesses counts as a loss in Wordle.

In [230]:
import random

def random_guess() -> int:
  # Set up answer, word list of possible answers, and generate our first guess
  answer = random.choice(all_words)
  possible = all_words.copy()
  guess = random.choice(possible)
  num_guesses = 1

  # Randomly select a possible answer and use color information to eliminate
  # wrong solutions until we have found our word
  while guess != answer:
    # print(f"Guess #{num_guesses}: {guess} (incorrect)")
    colors = get_colors(guess, answer)
    possible = update_possible(guess, possible, colors)
    guess = random.choice(possible)
    num_guesses = num_guesses + 1

  print(f"{num_guesses} guesses for the solution \"{guess}\"\n")
  # print(num_guesses)
  return num_guesses

In [None]:
total_guesses = 0

for i in range(simulations):
  total_guesses = total_guesses + random_guess()

average = total_guesses / simulations
print(f"Average # of guesses for random: {average}")
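
Since an average hides how often a game is actually lost, it can also help to look at the share of simulations which finish within 6 guesses. The sketch below assumes the per-game guess counts have been collected into a list. The helper name `summarize_results` is hypothetical, and the sample numbers are made up purely for illustration.

```python
def summarize_results(guess_counts, max_guesses=6) -> dict:
  """
  Hypothetical helper: takes a list of guess counts (one per simulated
  game) and reports the average and the share of games won.
  """
  wins = sum(1 for n in guess_counts if n <= max_guesses)
  return {
    "average": sum(guess_counts) / len(guess_counts),
    "win_rate": wins / len(guess_counts),
  }

# Made-up guess counts, not results from the simulations above
sample = [4, 7, 6, 5, 8]
print(summarize_results(sample))  # {'average': 6.0, 'win_rate': 0.6}
```

In this notebook, the counts could be gathered with something like `[random_guess() for i in range(simulations)]` before being passed in.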

## Calculating Letter Distributions

The "best" Wordle guess is the one which could potentially cut down the number of possible answers the most. As a simple proxy for this, we can favor words built from the letters which occur most often in the remaining list, since those guesses are the most likely to earn greens and yellows. We can recalculate these counts after each new guess with a **get_letter_occurrences** function, and use an **update_alphabet** function to update our alphabet (the list of valid characters).

In [None]:
def get_letter_occurrences(alphabet, possible) -> dict:
  letters = {}

  for letter in alphabet:
    letters[letter] = 0

  for word in possible:
    for letter in list(set(word)):
      letters[letter] = letters[letter] + 1

  return letters

def update_alphabet(alphabet, guess, colors) -> list:
  new_alphabet = alphabet[:]
  for i in range(len(guess)):
    # Recall that a color of 2 means gray, which means this letter is
    # not part of the answer
    if colors[i] == 2:
      if guess[i] in new_alphabet:
        new_alphabet.remove(guess[i])
  return new_alphabet
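
To see what these counts look like, here is a small self-contained sketch on a made-up four-word list. It uses `collections.Counter` in place of the hand-built dict, which is a stylistic assumption rather than the approach used in this notebook. Each word contributes at most once per letter, just as in **get_letter_occurrences**.

```python
from collections import Counter

# Made-up miniature word list, for illustration only
possible = ["crane", "slate", "trace", "eerie"]

# set(word) ensures a repeated letter (like the e's in "eerie")
# is only counted once per word
letters = Counter()
for word in possible:
  letters.update(set(word))

print(letters["e"], letters["a"], letters["r"], letters["i"])  # 4 3 3 1
```

Unlike a plain dict, a `Counter` returns 0 for letters it has never seen, so no alphabet is needed to initialize it.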

## Approach 2: Guess Words With Most Common Letters

The function below, **get_common_letter_guess**, uses our list of letters and their counts (I used a Python dict) to, letter by letter, assign a total "score" to each word in our remaining possible answers. Each distinct letter in a word only counts once toward its score.

Theoretically, the word with the max score should give us the most green and yellow squares, although there may be a flaw here: a letter might be extremely common across words overall, but uncommon in specific columns. We can come back to this with a third approach.

In [None]:
def get_common_letter_guess(possible, letters) -> str:

  # Compute a score for each word
  best = {}
  for word in possible:
    best[word] = 0
    for letter in list(set(word)):
      best[word] = best[word] + letters[letter]

  # Guess the word with the best score
  return max(best, key=best.get)
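
The column-specific concern raised above can be made concrete by counting letters per position rather than per word. This is only a sketch of one possible scoring scheme, not the method used in this notebook. The names `get_positional_counts` and `positional_score` are hypothetical, and the three-word list is made up.

```python
from collections import Counter

def get_positional_counts(possible) -> list:
  """
  Hypothetical helper: returns one Counter per column, mapping each
  letter to the number of words with that letter in that column.
  Assumes possible is non-empty and all words have the same length.
  """
  word_length = len(possible[0])
  counts = [Counter() for _ in range(word_length)]
  for word in possible:
    for i, letter in enumerate(word):
      counts[i][letter] += 1
  return counts

def positional_score(word, counts) -> int:
  # Sum, column by column, how many words share this word's letter there
  return sum(counts[i][letter] for i, letter in enumerate(word))

# Made-up miniature word list, for illustration only
possible = ["slate", "slant", "stale"]
counts = get_positional_counts(possible)
best = max(possible, key=lambda word: positional_score(word, counts))
print(best, positional_score(best, counts))  # slate 11
```

Here "slate" wins because its letters match the other words column by column, even though "stale" uses exactly the same letters.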

In [229]:
def common_letters_guess() -> int:
  # Set up answer, word list of possible answers, and generate our first guess
  answer = random.choice(all_words)
  possible = all_words.copy()
  alphabet = list("abcdefghijklmnopqrstuvwxyz")
  letters = get_letter_occurrences(alphabet, possible)
  guess = get_common_letter_guess(possible, letters)
  num_guesses = 1

  # Pick the highest-scoring possible answer and use color information to
  # eliminate wrong solutions until we have found our word
  while guess != answer:
    # print(f"Guess #{num_guesses}: {guess} (incorrect)")
    colors = get_colors(guess, answer)
    possible = update_possible(guess, possible, colors)
    alphabet = update_alphabet(alphabet, guess, colors)
    letters = get_letter_occurrences(alphabet, possible)
    guess = get_common_letter_guess(possible, letters)
    num_guesses = num_guesses + 1

  print(f"{num_guesses} guesses for the solution \"{guess}\"\n")
  return num_guesses

In [None]:
total_guesses = 0

for i in range(simulations):
  total_guesses = total_guesses + common_letters_guess()

average = total_guesses / simulations
print(f"Average # of guesses for common letter guess: {average}")

## Analysis So Far:

Based on 30 simulations each,

Approach 1: Random Guessing takes an average of **6.5 guesses**

Approach 2: Guess words with most common letters takes an average of **5.03 guesses**