# Wordle Solver using Twitter spoilers

Every time you submit a guess word, you get a 'clue' with the letters of the word in green, yellow, or gray.

This script starts from the hypothesis that every answer word produces a unique set of 'clues', which distinguishes it from every other word. So, if you collect enough clue patterns from the Wordle tweets that people share, you should be able to find the answer word.

The script starts by building a dictionary of clue patterns that maps each pattern to all the answer words that can produce it.

Then, given a set of clue patterns from Twitter, the script finds the word that can produce all of those patterns.

In [31]:
from tqdm import tqdm
import numpy as np
import pickle

## Compute the test between two words
The rules are straightforward:

* If the spot is green, the letter is in the correct spot
* If the spot is yellow, the letter is in the answer word but in the wrong place
* If the spot is gray, the letter is not in the answer word

However, you should be careful with repeated letters. Each occurrence is scored separately, and if your guess repeats a letter more times than it appears in the answer, the excess occurrences are marked gray. For example:

```
answer: prick
guess:  polyp
clue:   gxxxx
```

where g: green, y: yellow, and x: gray. The second 'p' in polyp is marked gray because the only 'p' in prick is already matched in green.

In [32]:
def wordTest(answer, guess):
    """ Test two words (answer and guess) and return the clue pattern
    answer: string of 5 letters
    guess:  string of 5 letters
    return: string of 5 letters where g: green, y:yellow, and x:gray
    """
    
    # change from string to list to make it editable
    answer = list(answer)
    pattern = list('.....')
    
    # first pass: mark letters in the correct spot as green
    # blank out each matched answer letter so it cannot also trigger a yellow elsewhere
    for i in range(5):
        if answer[i] == guess[i]:
            pattern[i] = 'g'
            answer[i] = '.'
    
    # second pass: mark the remaining spots as yellow or gray
    # blank out each answer letter used for a yellow so excess repeated letters turn gray
    for i in range(5):
        if pattern[i] != 'g':
            index = answer.index(guess[i]) if guess[i] in answer else -1
            if index == -1:
                pattern[i] = 'x'
            else:
                pattern[i] = 'y'
                answer[index] = '.'
    
    return ''.join(pattern) 
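
As a cross-check on the repeated-letter handling, here is a minimal sketch of the same scoring written with a letter budget. The name `clue_pattern` and the use of `collections.Counter` are assumptions made for illustration and are not part of the original script; the second example word pair is chosen by hand.

```python
from collections import Counter

def clue_pattern(answer, guess):
    """Hypothetical Counter-based variant of wordTest (g: green, y: yellow, x: gray)."""
    pattern = ['x'] * 5
    # Budget of answer letters that were not matched in place
    remaining = Counter(a for a, g in zip(answer, guess) if a != g)
    # First pass: exact matches are green
    for i, (a, g) in enumerate(zip(answer, guess)):
        if a == g:
            pattern[i] = 'g'
    # Second pass: spend the budget on yellows; anything left over stays gray
    for i, g in enumerate(guess):
        if pattern[i] != 'g' and remaining[g] > 0:
            pattern[i] = 'y'
            remaining[g] -= 1
    return ''.join(pattern)

print(clue_pattern('prick', 'polyp'))  # gxxxx
print(clue_pattern('wince', 'niece'))  # ygxgg
```

In the second call the first 'e' of niece is gray because the only 'e' in wince is already matched in green at the last spot.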

## Load the word lists

The trick is that the original game's source code contains both word lists: all possible answers and all valid guesses.

As a side note, be careful: the answer list in the game's code is stored in the order in which the words are used as the daily answers, so reading it spoils future games.

You can get both lists in alphabetical order (no spoilers here :)) from the following repo:

https://gist.github.com/cfreshman

In [33]:
# Loads the answer words
answers = []
with open('word-list/wordle-answers-alphabetical.txt','r') as f:
    for line in f:
        for word in line.split():
            answers.append(word)

# Loads the allowed words and concatenates them with the answers, as those are also valid words
allowed = []
with open('word-list/wordle-allowed-guesses.txt','r') as f:
    for line in f:
        for word in line.split():
            allowed.append(word)

allowed = [*allowed, *answers]

## Dictionary of clue patterns
Create a dictionary with all the possible patterns of green, yellow and gray in 5 letters.

Then fill the dictionary: for each pattern, store every answer word that can produce it for at least one allowed guess.
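
For reference, the set of keys can also be sketched with `itertools.product` instead of base-3 conversion. The names `patterns` and `pattern_sets` are assumptions for illustration only; storing sets rather than lists is an alternative design that drops duplicates on insertion, not what the cell below does.

```python
from itertools import product

# All 3**5 = 243 clue patterns over x (gray), y (yellow), g (green)
patterns = [''.join(p) for p in product('xyg', repeat=5)]

# Hypothetical alternative container: one empty set of answer words per pattern
pattern_sets = {p: set() for p in patterns}

print(len(patterns))              # 243
print(patterns[0], patterns[-1])  # xxxxx ggggg
```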

In [41]:
# Create empty dictionary with all the clue pattern as key
patternDict = {}
for i in range(3**5):
    key = np.base_repr(i, base=3)
    key = '0'*(5-len(key))+key
    key = key.replace('0','x')
    key = key.replace('1','y')
    key = key.replace('2','g')
    
    patternDict[key] = []

# Fill the dictionary with the corresponding words
from tqdm import tqdm
for w1 in tqdm(answers):
    for w2 in allowed:
        clue = wordTest(w1, w2)
        
        patternDict[clue].append(w1)

# The loop above appends w1 once for every allowed word that produces the pattern, so words are repeated
# The code below removes the repeated words
# NOTE: This is memory-inefficient, but faster than filtering duplicates while filling the dictionary
for k in patternDict:
    patternDict[k] = list(dict.fromkeys(patternDict[k]))

        
# Save the dictionary; the with-block closes the file after writing
with open("patternDict.pkl", "wb") as f:
    pickle.dump(patternDict, f)

100%|██████████| 2315/2315 [04:45<00:00,  8.11it/s]


## Find the answer word
Given a set of clue patterns collected manually from Twitter, I first look up the possible words for each pattern in the dictionary and then intersect those lists until only one word is left.

This script depends heavily on the patterns that people share and on the ones that you add to the list.
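
The intersection step can be sketched on a toy example. The dictionary `toy_dict` below is hypothetical and hand-written for illustration (three patterns, three candidate words); it is not taken from the real `patternDict`.

```python
# Hypothetical miniature pattern dictionary: pattern -> answer words that can produce it
toy_dict = {
    'xxxxg': {'wince', 'mince', 'since'},
    'ygxgg': {'wince', 'mince'},
    'ggggx': {'wince'},
}

# Patterns "observed" in shared results
observed = ['xxxxg', 'ygxgg', 'ggggx']

# Keep only the words compatible with every observed pattern
candidates = set.intersection(*(toy_dict[p] for p in observed))
print(candidates)  # {'wince'}
```

Each additional pattern can only shrink the candidate set, so rare patterns are the most informative ones.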

In [48]:
# Load the dictionary; the with-block closes the file after reading
with open("patternDict.pkl", "rb") as f:
    patternDict = pickle.load(f)

# Patterns from the wordle 217 (answer: wince)
key = ['xxxxx',
      'xyyxx',
      'gxyyg',
      'gggxg',
      'xxyxg',
      'xxxyx',
      'xxgxg',
      'xgggg',
      'xxxxg',
      'xggyx',
      'xxygg',
      'xxxgg',
      'xxxgg',
      'xxyyx',
      'xgxyx',
      'yxyxx',
      'xyxyy',
      'xgyxx',
      'xgxgg',
      'yxxxx',
      'gyggx',
      'ygxgg',
      'xgxyx',
      'xyygg',
      'xyxxx',
      'xgggg',
      'xxyyg',
      'xgxxx',
      'xxgxx',
      'xyyxg',
      'ygyxg',
      'ygxxx',
      'xxxyy',
      'xgxxg',
      'ggggx']
solution = 'wince'
key = list(dict.fromkeys(key))

intersection = patternDict[key[0]]
for k in key[1:]:
    intersection = list(set(intersection) & set(patternDict[k]))
    print(len(intersection))
    if len(intersection) == 1:
        print(intersection)

2314
487
278
278
278
278
180
180
161
119
119
119
119
119
119
119
113
113
23
10
4
4
4
4
4
4
3
3
3
3
1
['wince']
