# Score Words Script

This script applies a very simple algorithm to estimate the "best" Wordle words.

Essentially, it uses the per-position letter frequencies we got from score_words.py: each letter in a word contributes the number of words that have that letter in that position, and the word's score is the sum of those contributions.
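
How score_words.py builds these frequencies is not shown here, so the following is only a minimal sketch of one way per-position counts could be computed. The helper name `positional_counts` and the four sample words are assumptions made for illustration.

```python
from collections import Counter

def positional_counts(words, length=5):
    """Count how often each letter appears in each position (hypothetical helper)."""
    counts = [Counter() for _ in range(length)]
    for word in words:
        word = word.strip().upper()
        # Skip anything that is not a plain word of the expected length
        if len(word) != length or not word.isalpha():
            continue
        for i, ch in enumerate(word):
            counts[i][ch] += 1
    return counts

# Tiny made-up word list, only to show the shape of the result
sample_words = ["crane", "slate", "stone", "shine"]
counts = positional_counts(sample_words)
print(counts[0]["S"])  # 3 of the 4 sample words start with S
print(counts[4]["E"])  # all 4 sample words end with E
```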

Here is an example: **SEAME**

- S in the first position occurs in 366 words, so we add 366 to SEAME's score
- E in the second position: 242
- A in the third position: 307
- M in the fourth position: 68
- E in the last position: 424

We then sum these to get SEAME's score of **1407**.
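
The arithmetic above can be checked with a short sketch. It hard-codes only the five counts quoted in the example (the real table has an entry for every letter), and `example_counts` and `score_example` are illustrative names, not part of the script.

```python
# Only the counts quoted in the SEAME example; every other letter is omitted
example_counts = [
    {"S": 366},  # first position
    {"E": 242},  # second position
    {"A": 307},  # third position
    {"M": 68},   # fourth position
    {"E": 424},  # last position
]

def score_example(word):
    """Sum the positional count of each letter; letters not in the table add 0."""
    return sum(example_counts[i].get(ch, 0) for i, ch in enumerate(word.upper()))

seame_score = score_example("SEAME")
print(seame_score)  # 1407
```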

In [None]:
import ast

# Load positional values dynamically
pos_vals = []
with open("./data/letter_counts.txt") as f:
    for line in f:
        if line.strip():
            # Each line looks like: Position 0: {'S': 366, ...}
            _, dict_str = line.split(":", 1)
            pos_vals.append(ast.literal_eval(dict_str.strip()))

def score_word(word: str) -> int:
    word = word.upper()
    # Anything that is not exactly five letters long gets a sentinel score
    if len(word) != 5:
        return -1
    total = 0
    for i, ch in enumerate(word):
        total += pos_vals[i].get(ch, 0)
    return total

# Seed list of common valid guesses (heavily letter-frequency-biased).
with open("./guesses.txt") as f:
    candidate_words = [w.strip().upper() for w in f if len(w.strip()) == 5]

# Deduplicate and keep only alphabetic 5-letter words to be safe
cand = []
seen = set()
for w in candidate_words:
    w = w.upper()
    if len(w) == 5 and w.isalpha() and w not in seen:
        seen.add(w)
        cand.append(w)

scored = [(w, score_word(w)) for w in cand]
scored.sort(key=lambda x: x[1], reverse=True)

### Running the Script
There are three ways to run this. The first block, below, prints the top 10 words and their scores to the terminal.

In [None]:
# Print the ten highest-scoring words as (word, score) pairs
print(scored[:10])

### Run By Vowel Count
The second block is based on an observation I made while browsing Reddit: a lot of people intuitively believe that more vowels = better, but Wordle pros actually argue that fewer vowels in your opener are better. I couldn't follow their reasons, which were complex, but I wanted to test the theory.

So in this section you can set how many vowels you want in your word. For example, with VOWEL_COUNT = 2 it prints the top 10 words that contain exactly 2 vowels.

In [None]:
vowels = ['A', 'E', 'I', 'O', 'U']
# Print only first 10 words with VOWEL_COUNT vowel(s)
count = 0
VOWEL_COUNT = 2
for word, score in scored:
    if sum(1 for ch in word if ch in vowels) == VOWEL_COUNT:
        print(word, score)
        count += 1
    if count >= 10:
        break

### Create File
Finally, for easier management, this block writes the top 100 words and their scores to an external file, ./data/top_words.txt. That file is for use in the next script, the simulator.

In [None]:
# Find the top 100 words and export them to top_words.txt
with open("./data/top_words.txt", "w") as f:
    for word, score in scored[:100]:
        f.write(f"{word}: {score}\n")
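
Each line of top_words.txt has the form WORD: score. How the simulator actually reads the file is not shown here, so the following is only a sketch of one way to parse that format back into pairs. The helper name `parse_top_words` and the sample lines are assumptions for illustration.

```python
def parse_top_words(lines):
    """Turn 'WORD: score' lines into (word, score) tuples (hypothetical helper)."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the first colon: the word is on the left, the score on the right
        word, _, score = line.partition(":")
        pairs.append((word.strip(), int(score)))
    return pairs

# Sample input in the same format the export block writes
sample_lines = ["SEAME: 1407\n", "\n"]
parsed = parse_top_words(sample_lines)
print(parsed)  # [('SEAME', 1407)]
```

With the real file, the same helper could be called as parse_top_words(open("./data/top_words.txt")).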