Wordle is a game in which the player has six guesses to figure out the word of the day. Many players have optimized a strategy around a specific choice of words to reduce the number of guesses they need. Once players have settled on a strategy, they may grow bored of the original game and opt for a more challenging version. Wordle Hard Mode is similar to regular Wordle, but adds the stipulation that every revealed clue must be used in the player's subsequent guesses. This boxes the player in and requires more creativity in how they approach their guesses. Opinions on this mode are mixed: some players find that the stipulation simply steers them toward the correct word, while others rely on guessing unrelated words to narrow down their options, which Hard Mode does not allow.

That is the question we'll try to answer: is Wordle Hard Mode actually harder than regular Wordle? To find out, we'll use reinforcement learning, in which an agent learns to make decisions based on rewards and penalties. For Wordle, each guess is a decision, and the colors given back (green, yellow, or grey) determine the rewards and penalties. These rewards and penalties guide the agent's strategy for future guesses. By analyzing how the agent learns and improves its guessing strategy in each mode, we can compare the two agents and see which mode truly poses the greater challenge for optimal gameplay.

In [None]:
import random
import numpy as np
import pandas as pd
import pickle
import os
from google.colab import drive

drive.mount('/content/drive')
# Initialize parameters
alpha = 0.3  # Learning rate
gamma = 0.9  # Discount factor
epsilon = 0.1  # Exploration rate
episodes = 10000  # Number of episodes

# Load word list from CSV
word_data = pd.read_csv('/content/drive/My Drive/Colab Notebooks/wordle.csv')  # Path to your CSV in Google Drive
word_list = word_data['word'].tolist()  # Convert the 'word' column to a list of words

# File path to save/load the Q-table
q_table_file = '/content/drive/My Drive/Colab Notebooks/q_table.pkl'

# Load the Q-table if it exists, otherwise initialize a new Q-table
if os.path.exists(q_table_file):
    with open(q_table_file, 'rb') as f:
        q_table = pickle.load(f)
    print("Loaded Q-table from file.")
else:
    q_table = {}  # Initialize an empty Q-table if no previous Q-table is found
    print("Initialized new Q-table.")

Mounted at /content/drive
Initialized new Q-table.


We initialize a Q-table, which maps each state (the most recent clue) to an array holding one Q-value per word in the word list, so it can be read as a 2D table of states by possible guesses. Two clue functions, one for each mode, and a reward function handle the reward and penalty logic. A green letter earns a high positive reward, indicating that the agent placed the letter correctly. A yellow letter earns a smaller reward: the letter is in the word but in the wrong position, so a better guess could still earn more. A grey letter incurs a penalty, discouraging the agent from choosing letters that are not in the word.

In [None]:
# Build the clue (feedback string) for a guess in Wordle regular mode
def get_clue_easy(ans, guess):
    clue = ""
    for idx, letter in enumerate(guess):
        if letter == ans[idx]:
            clue += "G"
        elif letter in ans:
            clue += "Y"
        else:
            clue += "-"
    return clue
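
One caveat with the clue function above: it marks a letter yellow whenever that letter appears anywhere in the answer, so a guess with repeated letters can collect more yellows than the answer has copies of that letter. A minimal sketch of count-aware scoring follows. The name `get_clue_counted` is hypothetical and not part of this notebook, and it assumes both words have the same length.

```python
from collections import Counter

def get_clue_counted(ans, guess):
    """Hypothetical variant of get_clue_easy that respects letter counts."""
    clue = ["-"] * len(guess)
    # Answer letters that were not matched in place by the guess
    remaining = Counter(a for a, g in zip(ans, guess) if a != g)

    # First pass: mark exact matches as green
    for idx, (a, g) in enumerate(zip(ans, guess)):
        if a == g:
            clue[idx] = "G"

    # Second pass: mark yellows, consuming one unmatched copy per yellow
    for idx, g in enumerate(guess):
        if clue[idx] != "G" and remaining[g] > 0:
            clue[idx] = "Y"
            remaining[g] -= 1
    return "".join(clue)

print(get_clue_counted("scope", "skews"))  # G-Y--
```

For the answer `scope` and the guess `skews`, `get_clue_easy` returns `G-Y-Y` because the second `s` is counted again, while this version returns `G-Y--`.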

In [None]:
def get_clue_hard(ans, guess):
    clue = ""
    correct_positions = set()
    partial_correct_letters = set()

    for idx, letter in enumerate(guess):
        if letter == ans[idx]:
            clue += "G"
            correct_positions.add((idx, letter))
        elif letter in ans:
            clue += "Y"
            partial_correct_letters.add(letter)
        else:
            clue += "-"

    return clue, correct_positions, partial_correct_letters

In [None]:
# Reward function based on the clue
def reward_function(clue, guess, previous_guesses, answer):
    reward = 0
    for char in clue:
        if char == "G":
            reward += 10  # Positive reward for correct letters
        elif char == "Y":
            reward += 5  # Small reward for yellow letters
        else:
            reward -= 10  # Penalty for incorrect letters

    # Penalty for repeating the same guess
    if guess in previous_guesses:
        reward -= 10

    # Bonus for solving the Wordle
    if guess == answer:
        reward += 10

    return reward
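
To make the reward scale concrete, the sketch below recomputes only the per-letter part of the reward for a few clues. `letter_score` is a hypothetical helper that mirrors the +10 / +5 / -10 values in `reward_function`; it leaves out the repeat penalty and the solve bonus.

```python
# Per-letter values copied from reward_function above
LETTER_VALUES = {"G": 10, "Y": 5, "-": -10}

def letter_score(clue):
    """Hypothetical helper: sum the per-letter rewards for a clue string."""
    return sum(LETTER_VALUES[char] for char in clue)

for clue in ["GGGGG", "GG--G", "-Y---", "-----"]:
    print(clue, letter_score(clue))
```

With these values a single grey cancels a green, so a clue such as `GG--G` nets only 10 and most early guesses score negative. A solved word scores 50 from its letters plus the solve bonus of 10.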

To guide the agent's learning process, we implemented an epsilon-greedy strategy. This balances exploration, where the agent tries new guesses, and exploitation, where the agent uses its current best guess to make a decision. The epsilon value sets how often the agent explores different possible strategies. The rest of the time, the agent exploits its current knowledge by choosing the guess with the highest Q-value, which is its best estimate of the strongest guess. This balance is important: too much exploration means the agent may never settle on the right answer, while too much exploitation means it may lock onto a poor guess early and never discover better ones.

In [None]:
# Epsilon-greedy action selection for normal
def choose_action_easy(state, epsilon, q_table, word_list):
    if random.uniform(0, 1) < epsilon:
        # Exploration: Random guess
        choice = "Exploration"
        return random.choice(word_list), choice
    else:
        # Exploitation: Choose the best action based on Q-values
        if state not in q_table:
            q_table[state] = np.zeros(len(word_list))  # Initialize Q-values if state not present

        # Choose the action with the highest Q-value
        action_index = np.argmax(q_table[state])
        choice = "Exploitation"
        return word_list[action_index], choice

In [None]:
# Epsilon-greedy action selection for hard mode
def choose_action_hard(state, epsilon, q_table, word_list, correct_positions, partial_correct_letters):
    # Hard mode: only words consistent with every revealed clue are legal guesses
    filtered_words = filter_word_list(word_list, correct_positions, partial_correct_letters)

    if not filtered_words:
        filtered_words = word_list  # Fallback to full list if no words match constraints
    if random.uniform(0, 1) < epsilon:
        # Exploration: Random guess among the legal words
        choice = "Exploration"
        return random.choice(filtered_words), choice
    else:
        # Exploitation: Choose the best legal action based on Q-values
        if state not in q_table:
            q_table[state] = np.zeros(len(word_list))  # Initialize Q-values if state not present

        # Choose the legal word with the highest Q-value
        allowed = set(filtered_words)
        legal_indices = [i for i, word in enumerate(word_list) if word in allowed]
        action_index = legal_indices[np.argmax(q_table[state][legal_indices])]
        choice = "Exploitation"
        return word_list[action_index], choice

In [None]:
def update_q_table(state, action, reward, next_state, q_table, alpha, gamma, word_list):
    # Ensure the state and next state are in the Q-table
    if state not in q_table:
        q_table[state] = np.zeros(len(word_list))  # Initialize with zeros if state is not in Q-table
    if next_state not in q_table:
        q_table[next_state] = np.zeros(len(word_list))  # Initialize with zeros for next state

    action_index = word_list.index(action)  # Get the index of the action (word) in the word_list
    max_future_q = np.max(q_table[next_state])  # Get the maximum Q-value for the next state

    # Q-learning formula: Update the Q-value for the current state-action pair
    q_table[state][action_index] = q_table[state][action_index] + alpha * (
        reward + gamma * max_future_q - q_table[state][action_index]
    )
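
As a worked example of the update rule, the sketch below applies one Q-learning step by hand using the notebook's alpha = 0.3 and gamma = 0.9. The three-word list, the states, the starting Q-values, and the reward of 10 are all made up for illustration.

```python
import numpy as np

alpha, gamma = 0.3, 0.9
toy_words = ["caves", "calks", "goels"]  # illustrative word list

# One row of Q-values per state (clue string), one entry per word
q = {"-----": np.zeros(len(toy_words)), "GG--G": np.array([2.0, 0.0, 1.0])}

state, action, reward, next_state = "-----", "calks", 10, "GG--G"
idx = toy_words.index(action)

# Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max Q(s', .) - Q(s, a))
td_target = reward + gamma * np.max(q[next_state])  # 10 + 0.9 * 2.0 = 11.8
q[state][idx] += alpha * (td_target - q[state][idx])  # 0 + 0.3 * 11.8 = 3.54

print(q[state])
```

Only the entry for the guessed word in the starting state changes; every other Q-value stays where it was.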


The training process involves running many episodes in which the agent attempts to guess a randomly chosen word within the six allowed attempts. In each episode, the agent uses the epsilon-greedy strategy to either explore new guesses or exploit known strategies.

After each guess, the agent receives feedback in the form of a clue and a corresponding reward, which is used to update the Q-table. The Q-table helps the agent refine its guesses by adjusting the values based on the rewards. The training continues for a set number of episodes, allowing the agent to learn and improve its strategy over time. The total rewards for each episode are tracked to monitor performance improvements.
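
One way to monitor those per-episode totals is a moving average alongside the win rate. The sketch below assumes non-empty `reward_count` and `correct_count` lists shaped like the ones the training loops build. `summarize_training` and the window size are hypothetical choices, not part of this notebook.

```python
import numpy as np

def summarize_training(reward_count, correct_count, window=100):
    """Hypothetical helper: win rate plus a moving average of episode rewards."""
    rewards = np.asarray(reward_count, dtype=float)
    window = min(window, len(rewards))
    # Moving average via convolution with a uniform kernel
    moving_avg = np.convolve(rewards, np.ones(window) / window, mode="valid")
    win_rate = float(np.mean(correct_count))
    return win_rate, moving_avg

# Toy data for illustration only
win_rate, moving_avg = summarize_training([-50, -20, 10, 60], [0, 0, 0, 1], window=2)
print(win_rate, moving_avg)
```

Calling it once on the regular-mode lists and once on the hard-mode lists, kept under separate names, would give a like-for-like comparison of the two agents.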

In [None]:
def filter_word_list(word_list, correct_positions, partial_correct_letters):
    return [word for word in word_list
            if all(word[idx] == letter for idx, letter in correct_positions)
            and all(letter in word for letter in partial_correct_letters)]
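
A short usage example may help show what the filter keeps. The function body is repeated so the snippet runs on its own; the word list and constraints are made up for illustration.

```python
def filter_word_list(word_list, correct_positions, partial_correct_letters):
    # Same logic as the function above, repeated so the example is self-contained
    return [word for word in word_list
            if all(word[idx] == letter for idx, letter in correct_positions)
            and all(letter in word for letter in partial_correct_letters)]

toy_words = ["caves", "calks", "cakes", "waves"]  # illustrative list
greens = {(0, "c"), (1, "a")}                     # 'c' and 'a' fixed in place
yellows = {"s"}                                   # 's' must appear somewhere

print(filter_word_list(toy_words, greens, yellows))  # ['caves', 'calks', 'cakes']
```

Note that the filter only enforces green positions and yellow letters. Words containing letters already shown as grey still pass.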

In [None]:
# Track statistics
reward_count = []
epsilon_count = []
correct_count = []
wrong_count = 0
attempt = 0

# Training process for easy mode
for episode in range(episodes):
    answer = random.choice(word_list)  # Randomly choose a new answer for each episode
    state = "-----"  # Start with an empty clue state
    previous_guesses = []
    total_rewards = 0

    print(f'The answer for this episode is {answer}')

    for attempt in range(6):  # Up to 6 guesses per episode
        guess, strategy = choose_action_easy(state, epsilon, q_table, word_list)
        clue = get_clue_easy(answer, guess)
        # Score the guess before recording it, so the repeat penalty only applies to true repeats
        reward = reward_function(clue, guess, previous_guesses, answer)
        previous_guesses.append(guess)

        print(f"Guess: {guess}, Clue: {clue}, Strategy: {strategy}, Attempt: {attempt + 1}")

        epsilon_count.append(strategy)  # Track exploration vs exploitation

        # Transition to the next state (the new clue)
        next_state = clue

        # Update Q-table
        update_q_table(state, guess, reward, next_state, q_table, alpha, gamma, word_list)

        # Update state and total rewards
        state = next_state
        total_rewards += reward

        # If the guess is correct, break out of the loop
        if guess == answer:
            print("Correct!")
            correct_count.append(1)
            break
    else:
        # If the episode ends without a correct guess, track it as wrong
        wrong_count += 1
        correct_count.append(0)

    reward_count.append(total_rewards)

# Output training results
print(f"Training completed after {episodes} episodes")
print(f"Correct guesses: {sum(correct_count)}, Wrong guesses: {wrong_count}")


Streaming output truncated to the last 5000 lines.
Guess: brill, Clue: -Y---, Strategy: Exploitation, Attempt: 6
The answer for this episode is caves
Guess: ranke, Clue: -G--Y, Strategy: Exploration, Attempt: 1
Guess: artsy, Clue: Y--Y-, Strategy: Exploitation, Attempt: 2
Guess: calks, Clue: GG--G, Strategy: Exploitation, Attempt: 3
Guess: abord, Clue: Y----, Strategy: Exploitation, Attempt: 4
Guess: helot, Clue: -Y---, Strategy: Exploitation, Attempt: 5
Guess: goels, Clue: --Y-G, Strategy: Exploitation, Attempt: 6
The answer for this episode is dewan
Guess: mauls, Clue: -Y---, Strategy: Exploitation, Attempt: 1
Guess: goers, Clue: --Y--, Strategy: Exploitation, Attempt: 2
Guess: giros, Clue: -----, Strategy: Exploitation, Attempt: 3
Guess: melds, Clue: -G-Y-, Strategy: Exploitation, Attempt: 4
Guess: argal, Clue: Y--G-, Strategy: Exploitation, Attempt: 5
Guess: aruhe, Clue: Y---Y, Strategy: Exploitation, Attempt: 6
The answer for this episode is glaze
Guess: midge, Clue:

KeyboardInterrupt: 

In [None]:
# Track statistics for hard mode
reward_count = []
epsilon_count = []
correct_count = []
wrong_count = 0
attempt = 0


# Training process for hard mode
for episode in range(episodes):
    answer = random.choice(word_list)  # Randomly choose a new answer for each episode
    state = "-----"  # Start with an empty clue state
    previous_guesses = []
    correct_positions = set()
    partial_correct_letters = set()
    total_rewards = 0

    print(f'The answer for this episode is {answer}')

    for attempt in range(6):  # Up to 6 guesses per episode
        guess, strategy = choose_action_hard(state, epsilon, q_table, word_list, correct_positions, partial_correct_letters)
        clue, new_correct_positions, new_partial_correct_letters = get_clue_hard(answer, guess)
        reward = reward_function(clue, guess, previous_guesses, answer)
        print(f"Guess: {guess}, Clue: {clue}, Strategy: {strategy}, Attempt: {attempt + 1}")

        epsilon_count.append(strategy)  # Track exploration vs exploitation

        # Transition to the next state (the new clue)
        next_state = clue

        # Update Q-table
        update_q_table(state, guess, reward, next_state, q_table, alpha, gamma, word_list)

        # Update state and total rewards
        state = next_state
        total_rewards += reward
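        # Update constraints and record the guess (after scoring, so the
        # repeat penalty in reward_function only applies to true repeats)
        previous_guesses.append(guess)
        correct_positions.update(new_correct_positions)
        partial_correct_letters.update(new_partial_correct_letters)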
        # If the guess is correct, break out of the loop
        if guess == answer:
            print("Correct!")
            correct_count.append(1)
            break
    else:
        # If the episode ends without a correct guess, track it as wrong
        wrong_count += 1
        correct_count.append(0)

    reward_count.append(total_rewards)

# Output training results
print(f"Training completed after {episodes} episodes")
print(f"Correct guesses: {sum(correct_count)}, Wrong guesses: {wrong_count}")


Streaming output truncated to the last 5000 lines.
The answer for this episode is yapok
Guess: skers, Clue: -Y---, Strategy: Exploitation, Attempt: 1
Guess: bumfs, Clue: -----, Strategy: Exploitation, Attempt: 2
Guess: skets, Clue: -Y---, Strategy: Exploitation, Attempt: 3
Guess: bumph, Clue: ---Y-, Strategy: Exploitation, Attempt: 4
Guess: clows, Clue: --Y--, Strategy: Exploitation, Attempt: 5
Guess: butyl, Clue: ---Y-, Strategy: Exploitation, Attempt: 6
The answer for this episode is scope
Guess: skews, Clue: G-Y-Y, Strategy: Exploitation, Attempt: 1
Guess: abune, Clue: ----G, Strategy: Exploitation, Attempt: 2
Guess: byway, Clue: -----, Strategy: Exploitation, Attempt: 3
Guess: skews, Clue: G-Y-Y, Strategy: Exploitation, Attempt: 4
Guess: abuse, Clue: ---YG, Strategy: Exploitation, Attempt: 5
Guess: alone, Clue: --G-G, Strategy: Exploitation, Attempt: 6
The answer for this episode is apeek
Guess: skews, Clue: -YG--, Strategy: Exploitation, Attempt: 1
Guess: odium, Clue