# Information Entropy

## 1. Understanding Information Entropy
**Information entropy**, introduced by Claude Shannon, _is a measure of the uncertainty or randomness in a set of possible outcomes_. In the context of Wordle, entropy quantifies the expected information you would gain from making a particular guess, based on how it partitions the remaining possible words.

### Why Use Entropy in Wordle?

In Wordle, each guess provides feedback that narrows down the list of possible answers. Choosing the word with the highest entropy, that is, the highest expected information gain, shrinks the list of remaining candidates the most on average, which tends to lead to the solution in fewer guesses.

**Key Concepts:**
- **Probability Distribution:** The likelihood of each possible outcome.
- **Expected Information Gain:** The average amount of information, measured in bits, you expect to gain from a guess, with each outcome weighted by its probability.
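To make these two concepts concrete, here is a minimal sketch. The function names `information_bits` and `expected_information` and the three example probabilities are illustrative assumptions, not taken from any Wordle data.

```python
import math


def information_bits(p: float) -> float:
    """Information (in bits) gained by observing an outcome of probability p."""
    return -math.log2(p)


def expected_information(distribution: list[float]) -> float:
    """Probability-weighted average information of a distribution (its entropy)."""
    return sum(p * information_bits(p) for p in distribution if p > 0)


# Hypothetical probability distribution over three outcomes.
distribution = [0.5, 0.25, 0.25]

print(information_bits(0.25))              # 2.0
print(expected_information(distribution))  # 1.5
```

An outcome with probability 1/4 carries 2 bits because observing it cuts the space of possibilities to a quarter, that is, it halves the space twice.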

## 2. Applying Entropy to Wordle

**How to Calculate Entropy for a Guess**
1. **Possible Outcomes:** For each guess, consider every feedback pattern it could produce, that is, each assignment of green, yellow, or gray to the five letter positions (at most $3^5 = 243$ patterns).

2. **Partitioning the Word List:** Each feedback pattern partitions the remaining possible words into subsets. Words that would produce the same feedback form a group.

3. **Calculating Probabilities:** The probability of a feedback pattern is the number of words in its group divided by the total number of remaining possible answers. This treats every remaining answer as equally likely.

4. **Entropy Formula:**
$$\large
\text{Entropy} = -\sum_i p_i \log_2 p_i
$$
where $p_i$ is the probability of the $i$-th feedback pattern and the result is measured in bits.
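
Two limiting cases help calibrate the formula. They are worked here as an illustration, with $N$ denoting the number of remaining possible answers (a symbol introduced only for this example). If a guess produces the same feedback pattern for every remaining word, there is a single pattern with $p = 1$ and the guess reveals nothing. If a guess produces a different pattern for each of the $N$ words, every pattern has $p_i = 1/N$ and the entropy reaches its maximum:

$$
\text{Entropy}_{\text{single}} = -1 \cdot \log_2 1 = 0,
\qquad
\text{Entropy}_{\text{uniform}} = -\sum_{i=1}^{N} \frac{1}{N} \log_2 \frac{1}{N} = \log_2 N
$$

For example, with $N = 6$ remaining words, a guess that separates all of them yields $\log_2 6 \approx 2.585$ bits.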


In [1]:
import math
from collections import defaultdict, Counter


class Wordle:
    def __init__(self):
        self.FILE_PATH = "../wordle/wordle-answers.txt"
        self.words = self.load_words(self.FILE_PATH)
        # self.words = ["about", "blink", "crane", "drove", "merge", "rayon"]
        self.LEN_WORDS = len(self.words)

    def load_words(self, file_path: str) -> list:
        # Read one word per line, skipping blank lines such as a trailing newline.
        with open(file_path, "r", encoding="utf-8") as file:
            return [line.strip() for line in file if line.strip()]

    def simulate_feedback_pattern(self, word_played: str) -> dict:
        # Map each candidate answer to the feedback it would produce for word_played.
        # Simplification: only the first occurrence of a letter is compared, so
        # words with repeated letters may be scored differently from real Wordle.
        feedback_pattern = defaultdict(list)
        for word in self.words:
            for letter in word_played:
                if letter not in word:
                    feedback_pattern[word].append("gray")
                elif word_played.index(letter) == word.index(letter):
                    feedback_pattern[word].append("green")
                else:
                    feedback_pattern[word].append("yellow")
        return feedback_pattern

    # Turn the per-word feedback into a probability for each feedback pattern.
    def calculate_probabilities(self, feedback_pattern: dict) -> dict:
        # Count how many candidate answers produce each feedback pattern.
        pattern_counts = Counter(tuple(lst) for lst in feedback_pattern.values())
        # Probability of a pattern = share of candidate answers that produce it.
        return {
            pattern: count / self.LEN_WORDS
            for pattern, count in pattern_counts.items()
        }

    # Shannon entropy (in bits) of the feedback-pattern distribution.
    def compute_entropy(self, probabilities: dict) -> float:
        entropy = 0.0
        for prob in probabilities.values():
            entropy += -prob * math.log2(prob)
        return entropy

    # Compute, store, and return the entropy of a guess.
    def compute_entropy_guess(self, word_played: str) -> float:
        feedback_pattern = self.simulate_feedback_pattern(word_played)
        probabilities = self.calculate_probabilities(feedback_pattern)
        self.entropy = self.compute_entropy(probabilities)
        return self.entropy


# Initialize the class
wordle = Wordle()
wordle.compute_entropy_guess("crane")  # make a guess
wordle.entropy

5.702285537930144
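
The feedback simulation above compares only the first occurrence of each letter, so it can disagree with the real game when a word contains repeated letters. As a hedged sketch of the usual duplicate-aware rule (greens are assigned first, then yellows only while unmatched copies of the letter remain in the answer), a standalone scorer could look like the following. The function name `score_guess` is hypothetical and not part of the class above, and swapping it in would likely change the entropy value reported above.

```python
from collections import Counter


def score_guess(guess: str, secret: str) -> tuple:
    """Return Wordle-style feedback for `guess` against `secret`."""
    feedback = ["gray"] * len(guess)
    # Secret letters that were not matched exactly; these can still earn yellows.
    unmatched = Counter()

    # First pass: mark greens and collect the unmatched secret letters.
    for i, (g, s) in enumerate(zip(guess, secret)):
        if g == s:
            feedback[i] = "green"
        else:
            unmatched[s] += 1

    # Second pass: a non-green letter is yellow only while a copy remains.
    for i, g in enumerate(guess):
        if feedback[i] != "green" and unmatched[g] > 0:
            feedback[i] = "yellow"
            unmatched[g] -= 1

    return tuple(feedback)


print(score_guess("crane", "merge"))
# ('gray', 'yellow', 'gray', 'gray', 'green')
print(score_guess("eerie", "crane"))
# ('gray', 'gray', 'yellow', 'gray', 'green')
```

In the first call the final `e` is green because `merge` also ends in `e`, a case the first-occurrence comparison would label yellow. In the second call only one `e` is credited, since `crane` contains a single `e`.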

In [2]:
import math
from collections import defaultdict, Counter

# words = load_words(FILE_PATH)
words = ["about", "blink", "crane", "drove", "merge", "rayon"]
secret = "rayon"
word_played = "crane"

# Map each candidate answer to the feedback it would produce for word_played.
# Simplification: only the first occurrence of a letter is compared, so
# words with repeated letters may be scored differently from real Wordle.
feedback_pattern = defaultdict(list)
for word in words:
    for letter in word_played:
        if letter not in word:
            feedback_pattern[word].append("gray")
        elif word_played.index(letter) == word.index(letter):
            feedback_pattern[word].append("green")
        else:
            feedback_pattern[word].append("yellow")

# Count how many candidate answers produce each feedback pattern.
list_count = Counter(tuple(lst) for lst in feedback_pattern.values())
# Probability of a pattern = share of candidate answers that produce it.
probabilities = {}
for lst, count in list_count.items():
    probabilities[lst] = count / len(words)


def compute_entropy(probabilities):
    entropy = 0
    for prob in probabilities.values():
        entropy += -prob * math.log2(prob)
    return entropy


entropy = compute_entropy(probabilities)
entropy

2.584962500721156
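
The cell above scores a single guess. To choose a guess, the same score can be computed for every candidate and the results sorted, which is the "maximize expected information gain" idea from Section 1. The sketch below does this for the six-word toy list. The names `score_guess` and `entropy_of_guess` are hypothetical, and the scorer uses a duplicate-aware rule (greens first, then yellows while unmatched copies remain) rather than the first-occurrence comparison used in the cells above.

```python
import math
from collections import Counter


def score_guess(guess, secret):
    """Duplicate-aware feedback: greens first, yellows while copies remain."""
    feedback = ["gray"] * len(guess)
    # Secret letters at positions that are not exact matches.
    unmatched = Counter(s for g, s in zip(guess, secret) if g != s)
    for i, (g, s) in enumerate(zip(guess, secret)):
        if g == s:
            feedback[i] = "green"
        elif unmatched[g] > 0:
            feedback[i] = "yellow"
            unmatched[g] -= 1
    return tuple(feedback)


def entropy_of_guess(guess, candidates):
    """Entropy (bits) of the feedback-pattern distribution for `guess`."""
    pattern_counts = Counter(score_guess(guess, secret) for secret in candidates)
    total = len(candidates)
    return -sum((n / total) * math.log2(n / total) for n in pattern_counts.values())


words = ["about", "blink", "crane", "drove", "merge", "rayon"]

# Rank every word as a potential guess, highest entropy first.
ranking = sorted(
    ((word, entropy_of_guess(word, words)) for word in words),
    key=lambda pair: pair[1],
    reverse=True,
)
for word, bits in ranking:
    print(f"{word}: {bits:.3f} bits")
```

With only six candidates, no guess can score above $\log_2 6$ bits, so any word that produces six distinct patterns ties for first place. On a full answer list the scores spread out, and the sort becomes a meaningful ranking.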