2022 This is a solver for wordle.  It uses the following heuristic:
- Pull 5 letter words from nltk corpus.  Drop proper nouns
- Calculate letter frequency using this universe
- Rank words using the sum of the frequency of unique letters
- Solve the puzzle:
  - Make a guess, score it (must provide actual answer)
  - Filter universe by what was learned

Let it loose on the 200 wordle words on (the first 200) https://gist.githubusercontent.com/potluck/4ff27425a1896100279a2552f7a062da/raw/954da9449c754da5650d35f51f3a4030d946d71d/Wordles.txt

It has an error on 6 of the words, which needs to be sorted out.  It solves 85.6% of the remaining ones.

In [13]:
import numpy as np
import pandas as pd
from collections import Counter
import urllib

In [2]:
# if first time do:
# import nltk
# nltk.download('words')
from nltk.corpus import words 

In [17]:
prior_url = 'https://gist.githubusercontent.com/potluck/4ff27425a1896100279a2552f7a062da/raw/954da9449c754da5650d35f51f3a4030d946d71d/Wordles.txt'

with urllib.request.urlopen(prior_url) as fp:
    mybytes = fp.read()
    mystr = mybytes.decode("utf8")
prior_wordles = mystr.split('\n')
len(prior_wordles)

200

In [3]:
w5 = [w for w in words.words() if len(w) == 5 and w[0].lower() == w[0]] # looks like prop nouns are capitalized

In [4]:
counts = Counter(''.join(w5))
counts = pd.Series(counts)
freq = counts.div(sum(counts)).sort_values(ascending=False)
freq_d = freq.to_dict()

In [5]:
# so now lets find the word with the higest letter frequency
word_freq = lambda w: sum([freq_d[l] for l in Counter(w).keys()])
sorted(w5, key=word_freq)[-5:] # this would suggest we use 'orate' as our first guess every time

['irate', 'tarie', 'retia', 'arose', 'orate']

In [6]:
def score_guess(guess, actual):
    """scores a guess"""
    matches2 = [i for i in range(5) if guess[i]==actual[i]]
    actual_ex_m2 = [actual[i] for i in range(5) if i not in matches2]
    matches1 = [i for i in range(5) if guess[i] in actual_ex_m2 and not guess[i]==actual[i]]
    return matches1, matches2

In [7]:
# search for the highest probability where matches2 are same
# and matches 1 are in word but not in same place
def eval_candidate(w, guess, m1, m2, bad_letters):
    """evaluates a word for candidate list of words"""
    if sum(m2) > 0:
        check_m2 = np.all([guess[i]==w[i] for i in m2]) # true if m2 letters in place
    else:
        check_m2 = True
    if sum(m1) > 0:
        # need three things:
        # (a) guess[i] needs to be in w (guess in the word)
        # (b) guess[i] can't be equal to w[i] (it's not in right place)
        # (c) also have to ignore m2[i]
        check_m1 = True
        for i in m1:
            check_m1 = check_m1 and  guess[i] in w and guess[i] != w[i] # (a) and (b)
            for j in m2:
                check_m1 = check_m1 and guess[i] != w[j] 
    else:
        check_m1 = True
    non_m2_letters = [w[i] for i in range(5) if i not in m2]
    check_bl = np.all([bl not in non_m2_letters for bl in bad_letters])
    return check_m2 and check_m1 and check_bl

    
def get_candidates(guess, m1, m2, bad_letters):
    """evaluates all the words in w5 and returns the candidates"""
    candidates = [w for w in w5 if eval_candidate(w, guess, m1, m2, bad_letters)]
    return candidates



In [27]:
def solve_wordle(actual, first_guess='orate'):
    """solve the worlde!"""
    guesses = [first_guess]
    bad_letters = []
    while len(guesses) < 6:
        m1, m2 = score_guess(guesses[-1], actual)
        if len(m2) == 5:
            break
        bad_letters += [guesses[-1][i] for i in range(5) if i not in m1 + m2]
        candidates = get_candidates(guesses[-1], m1, m2, bad_letters)
        guesses.append(sorted(candidates, key=word_freq)[-1])
    success = guesses[-1] == actual
    return success, guesses

In [45]:
error_count, failure_count = 0, 0
guesses = []
for pw in prior_wordles:
    try:
        r, g = solve_wordle(pw)
        if r:
            print('{}: Got it in {} guesses'.format(pw, len(g)))
            guesses.append(len(g))
        else:
            print('{}: DID NOT GET!'.format(pw))
            failure_count += 1
    except:
        print('{}: We had a problem solving this one'.format(pw))
        error_count += 1
    

cigar: Got it in 4 guesses
rebut: Got it in 6 guesses
sissy: Got it in 6 guesses
humph: Got it in 4 guesses
awake: DID NOT GET!
blush: Got it in 4 guesses
focal: DID NOT GET!
evade: DID NOT GET!
naval: Got it in 5 guesses
serve: We had a problem solving this one
heath: Got it in 6 guesses
dwarf: Got it in 4 guesses
model: Got it in 4 guesses
karma: DID NOT GET!
stink: Got it in 4 guesses
grade: Got it in 4 guesses
quiet: Got it in 5 guesses
bench: Got it in 4 guesses
abate: DID NOT GET!
feign: Got it in 5 guesses
major: Got it in 6 guesses
death: Got it in 4 guesses
fresh: Got it in 4 guesses
crust: Got it in 3 guesses
stool: Got it in 3 guesses
colon: Got it in 4 guesses
abase: Got it in 6 guesses
marry: Got it in 5 guesses
react: Got it in 3 guesses
batty: DID NOT GET!
pride: Got it in 4 guesses
floss: Got it in 6 guesses
helix: Got it in 5 guesses
croak: Got it in 6 guesses
staff: DID NOT GET!
paper: DID NOT GET!
unfed: Got it in 6 guesses
whelp: Got it in 5 guesses
trawl: Got it in

In [47]:
n_wordles = len(prior_wordles)
n_non_error = n_wordles - error_count
n_success = n_non_error - failure_count
pct_success = n_success/n_non_error
print('overall, {}/{} ({:0.1%} success; {} errors)'.format(n_success, n_non_error, pct_success, error_count))

overall, 166/194 (85.6% success; 6 errors)
