# Let's take a stab at Wordle

## What is the best first word?

There are many ways to answer. [@tokbyzeb on tiktok](https://www.tiktok.com/@tokbyzeb/video/7058690677634256134) observed that there are many heuristics, or measures, that can be used to evaluate words, and that the real measure of a word's effectiveness is how many words it eliminates (to be maximized), or equivalently how many words are still valid guesses after it (to be minimized).

**Let's first try to replicate his result that "roate" minimizes the number of possible words, to ≈60.4288**

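To do that, we need to define what it means to minimize the number of valid words after each guess. After a guess, Wordle marks letters that are in the answer and in the correct position (green), letters that are in the answer but in the wrong position (yellow), and letters that are not in the answer (gray).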

Note: I use colorblind mode (orange, blue, gray), but let's use the standard coloring (green, yellow, gray) for discussion.

### n=1

Let's pretend our words are 1 letter long. Then it is trivial: our letter is either correct or not, giving 2 possibilities:
- Green
- Gray

### n=2

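Now things are trickier. Our guess may have the following evaluations:
- ``Green Green`` - We solved it!
- ``Yellow Yellow`` - Right letters, wrong order
- ``Green Gray`` or ``Gray Green`` - One letter in place, the other not in the word
- ``Yellow Gray`` or ``Gray Yellow`` - One letter in the word but misplaced, the other not in the word
- ``Gray Gray`` - Do better

The following are not possible, because once one letter is green, the only answer letter left unmatched sits in the other letter's own position. If the other letter matches it, it is green, not yellow.
- ``Green Yellow``
- ``Yellow Green``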

### n=3

Counting colors without regard to position, the possibilities are:

- ``0 Green 0 Yellow 3 Gray``
- ``0 Green 1 Yellow 2 Gray``
- ``1 Green 0 Yellow 2 Gray``
- ``0 Green 2 Yellow 1 Gray``
- ``1 Green 1 Yellow 1 Gray``
- ``2 Green 0 Yellow 1 Gray``
- ``0 Green 3 Yellow 0 Gray``
- ``1 Green 2 Yellow 0 Gray``
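- ``2 Green 1 Yellow 0 Gray`` - NOT POSSIBLE: the only answer letter left unmatched sits in the third letter's own position, so that letter is either green or gray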
- ``3 Green 0 Yellow 0 Gray``
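The impossible cases for n=2 and n=3 can be checked by brute force. The sketch below is not from the original notebook: `color`, `achievable_patterns`, and `achievable_counts` are hypothetical helpers, and colors are written as `G`, `Y`, and `-`. It assumes Wordle's two-pass coloring (greens first, then yellows from the answer letters not yet matched) and tries every guess/answer pair over a four-letter alphabet. That should be enough for n ≤ 3: an answer uses at most n distinct letters, and any gray guess letter absent from the answer can be relabeled to one spare letter without changing the pattern.

```python
from itertools import product

def color(guess, answer):
    # Two-pass Wordle coloring: greens first, then yellows from unmatched answer letters
    colors = ["-"] * len(guess)
    unmatched = list(answer)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colors[i] = "G"
            unmatched[i] = None
    for i, g in enumerate(guess):
        if colors[i] != "G" and g in unmatched:
            colors[i] = "Y"
            unmatched[unmatched.index(g)] = None
    return "".join(colors)

def achievable_patterns(n, alphabet="abcd"):
    # Color every guess/answer pair of length n over a small alphabet
    return {color("".join(g), "".join(a))
            for g in product(alphabet, repeat=n)
            for a in product(alphabet, repeat=n)}

def achievable_counts(n, alphabet="abcd"):
    # Collapse each positional pattern to (greens, yellows, grays), as in the n=3 list
    return {(p.count("G"), p.count("Y"), p.count("-"))
            for p in achievable_patterns(n, alphabet)}

two = achievable_patterns(2)
print(sorted(two))                # → ['--', '-G', '-Y', 'G-', 'GG', 'Y-', 'YY']
print("GY" in two, "YG" in two)   # → False False
three = achievable_counts(3)
print(len(three), (2, 1, 0) in three)  # → 9 False
```

So 7 of the 9 positional patterns occur for n=2, and 9 of the 10 color counts occur for n=3.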

### Duplicate letters

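A letter will only be marked green or yellow as many times as it occurs in the answer. Greens are matched first, and any extra copies of the letter in the guess are shown as gray.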

Thanks [/u/Humdrumbee on reddit](https://www.reddit.com/r/wordle/comments/ry49ne/illustration_of_what_happens_when_your_guess_has/)!
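To make the rule concrete, here is a minimal sketch of the coloring, assuming the two-pass behavior described above. The function name `color` and the `G`/`Y`/`-` labels are illustrative, not Wordle's own code. Each answer letter can be matched at most once, so once the single `l` in "world" is used by a green, the other `l` in "hello" is gray, and the second `e` in "geese" is gray even though "these" contains an `e`.

```python
def color(guess, answer):
    # Pass 1: mark greens and remove those answer letters from further matching
    colors = ["-"] * len(guess)
    unmatched = list(answer)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colors[i] = "G"
            unmatched[i] = None
    # Pass 2: mark yellows left to right, each consuming one unmatched answer letter
    for i, g in enumerate(guess):
        if colors[i] != "G" and g in unmatched:
            colors[i] = "Y"
            unmatched[unmatched.index(g)] = None
    return "".join(colors)

print(color("hello", "world"))  # → ---GY
print(color("geese", "these"))  # → --GGG
```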



```python
import json
import copy
import csv
import statistics

# The Wordle word lists are separated into two files:
# la.json holds the possible answers, and ta.json holds the other accepted guesses.
# Together they make up every valid guess.
def make_list():
    with open('la.json') as f:
        la = json.load(f)
    with open('ta.json') as f:
        ta = json.load(f)
    return la + ta

wordle_valid = make_list()
with open('la.json') as f:
    wordle_answers = json.load(f)
```


## How does elimination work?
Every time we guess a word, the number of possible solutions goes down, but how?

- If a letter is green, every word that does not have that letter in that position is pruned
- If a letter is yellow, every word that does not contain that letter is pruned
- If a letter is yellow, every word that has that letter in that same position is pruned
- If a letter is gray, every word that contains the letter is pruned, unless the same letter is also green or yellow elsewhere in the guess; in that case, the answer contains exactly as many copies as were marked green or yellow

This suggests an approach: test our guess against every possible answer, count how many solutions remain consistent with the resulting colors, and summarize those counts to evaluate the guess.
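Here is a toy version of that approach on a tiny made-up word list (not Wordle's). For each possible answer, it keeps the words that would have shown the same colors, which applies all four pruning rules at once, including the duplicate-letter cases. The helpers `color` and `remaining_counts` are hypothetical names for this sketch.

```python
from statistics import mean, median

def color(guess, answer):
    # Two-pass Wordle coloring: greens first, then yellows from unmatched answer letters
    colors = ["-"] * len(guess)
    unmatched = list(answer)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colors[i] = "G"
            unmatched[i] = None
    for i, g in enumerate(guess):
        if colors[i] != "G" and g in unmatched:
            colors[i] = "Y"
            unmatched[unmatched.index(g)] = None
    return "".join(colors)

def remaining_counts(guess, words):
    # For each possible answer, count the words that would have shown the same colors
    counts = []
    for answer in words:
        pattern = color(guess, answer)
        counts.append(sum(1 for w in words if color(guess, w) == pattern))
    return counts

words = ["crane", "crate", "dough", "mound", "blimp"]  # tiny made-up list
counts = remaining_counts("crate", words)
print(counts, mean(counts), median(counts))  # → [1, 1, 3, 3, 3] 2.2 3
```

"dough", "mound", and "blimp" share no letters with "crate", so they all show five grays and cannot be told apart. The mean and median summarize the same counts differently, which matters when choosing how to rank first words.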

```python
# Colors used for each letter of an evaluation
GREEN = 10
YELLOW = 5
GRAY = 0

def evaluate(test_word, answer):
    # Return the colors Wordle would show for test_word if the answer were `answer`
    answer_characters = list(answer)
    evaluation = [GRAY, GRAY, GRAY, GRAY, GRAY]
    # Color the letters green first, using up the matched answer letters
    for index in range(5):
        if test_word[index] == answer_characters[index]:
            evaluation[index] = GREEN
            answer_characters[index] = " "
    # Then color letters yellow, left to right. Greens are skipped, and each
    # answer letter can only be matched once, which handles duplicate letters
    for index in range(5):
        if evaluation[index] != GREEN and test_word[index] in answer_characters:
            evaluation[index] = YELLOW
            answer_characters[answer_characters.index(test_word[index])] = " "
    return evaluation

def score(test_word, solution_list, answer_list):
    # How many words are still possible solutions after guessing this word?
    amount_remaining_words = []

    # Loop over every possible answer
    for answer in answer_list:
        evaluation = evaluate(test_word, answer)
        # Prune: a solution survives only if it would have produced the same colors.
        # This applies all four rules above, including the duplicate-letter cases,
        # and builds a new list instead of removing items while iterating
        possible_solutions = [solution for solution in solution_list
                              if evaluate(test_word, solution) == evaluation]
        amount_remaining_words.append(len(possible_solutions))
    # return statistics.mean(amount_remaining_words)
    return statistics.median(amount_remaining_words)
```
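Filtering the whole solution list once per answer makes `score` quadratic in the list size for every guess. Because a solution survives exactly when it would show the same coloring as the answer, the survivors for an answer are simply the members of its coloring "bucket". The sketch below (a hypothetical `fast_score`, not the notebook's code) counts bucket sizes once with `collections.Counter`, which should give the same numbers as the filtering loop in linear time. When the solution and answer lists are the same N words, an answer in a bucket of size n_p leaves n_p words, so the mean is the sum of n_p² over all buckets, divided by N.

```python
from collections import Counter
from statistics import mean, median

def color(guess, answer):
    # Two-pass Wordle coloring: greens first, then yellows from unmatched answer letters
    colors = ["-"] * len(guess)
    unmatched = list(answer)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colors[i] = "G"
            unmatched[i] = None
    for i, g in enumerate(guess):
        if colors[i] != "G" and g in unmatched:
            colors[i] = "Y"
            unmatched[unmatched.index(g)] = None
    return "".join(colors)

def fast_score(test_word, solution_list, answer_list, summary=median):
    # Group the solutions by the coloring each would show for test_word
    bucket_sizes = Counter(color(test_word, s) for s in solution_list)
    # An answer leaves exactly the solutions that share its coloring
    remaining = [bucket_sizes[color(test_word, a)] for a in answer_list]
    return summary(remaining)

words = ["crane", "crate", "dough", "mound", "blimp"]  # tiny made-up list
print(fast_score("crate", words, words))                # → 3
print(fast_score("crate", words, words, summary=mean))  # → 2.2
```

Here the buckets have sizes 1, 1, and 3, so the mean is (1 + 1 + 9) / 5 = 2.2.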

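```python
# Score every valid guess and save the results as word,score rows.
# Only words from la.json can be answers, so solutions and answers both come from wordle_answers.
# Note: score() filters the whole solution list once per answer, so this loop is slow.
with open('scores.csv', 'w', newline='') as f:
    writer = csv.writer(f)

    for word in wordle_valid:
        # Use a new name so the score() function is not overwritten
        word_score = score(word, wordle_answers, wordle_answers)
        print(word, word_score)
        writer.writerow([word, word_score])
```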
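Once `scores.csv` is written, the best first word under this measure is the one with the lowest score. A minimal sketch, assuming each CSV row holds a word followed by its numeric score; `best_words` is a hypothetical helper, and the sample rows are illustrative values, not real results.

```python
import csv

def best_words(lines, k=10):
    # Parse word,score rows (skipping blank rows) and return the k lowest scores
    rows = [(row[0], float(row[1])) for row in csv.reader(lines) if row]
    return sorted(rows, key=lambda row: row[1])[:k]

sample = ["alpha,2.5", "bravo,1.25", "delta,9"]  # illustrative values only
print(best_words(sample, k=2))  # → [('bravo', 1.25), ('alpha', 2.5)]
```

With the real file, the open file object can be passed directly, e.g. `best_words(f)` inside `with open('scores.csv', newline='') as f:`, since `csv.reader` accepts any iterable of lines.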