# "Hacking" Wordle

The goal is to find the best guesses in Wordle. The metric that I'm going to use is the **Shannon entropy**:

$$ S = \sum_{i} -p_{i} \log_{2}\left( p_{i} \right) $$
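As a quick numerical check of the formula, the sketch below evaluates $S$ for a few hand-made distributions. The function name `entropy_bits` and the example probabilities are illustrative and not part of this notebook.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# A fair coin: two equally likely outcomes.
fair_coin = entropy_bits([0.5, 0.5])
# Four equally likely outcomes.
four_way = entropy_bits([0.25, 0.25, 0.25, 0.25])
# A certain outcome carries no uncertainty.
certain = entropy_bits([1.0])

print(fair_coin, four_way, certain)
```

The more evenly the probability is spread, the larger $S$ becomes, which is why a guess that splits the remaining words evenly is attractive.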

## How Wordle works

The goal of the game is to find the daily word. We have 6 tries, and the word is 5 characters long.

![alt text](./wordle_image.jpg)

After each try, every letter is colored: <font color="green">green means that the letter is correct</font>, <font color="orange">orange means that the letter is in the word but not in that position</font>, and <font color="grey">grey means that the letter is not in the word</font>.
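The coloring rule can be sketched as a small function. It uses the digit encoding that appears later in this notebook (`2` for green, `1` for orange, `0` for grey). The function name `feedback` and the handling of repeated letters (each copy in the solution can color at most one letter of the guess) are assumptions based on how the original game behaves, not code from this notebook.

```python
from collections import Counter

def feedback(guess, solution):
    """Return one digit per letter: 2 = green, 1 = orange, 0 = grey."""
    # First pass: exact matches are green.
    result = ['2' if g == s else '0' for g, s in zip(guess, solution)]
    # Solution letters that were not matched exactly can still earn an orange mark.
    remaining = Counter(s for g, s in zip(guess, solution) if g != s)
    # Second pass: misplaced letters, using each available copy at most once.
    for i, g in enumerate(guess):
        if result[i] == '0' and remaining[g] > 0:
            result[i] = '1'
            remaining[g] -= 1
    return ''.join(result)

print(feedback('aireo', 'falso'))
```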




In [1]:
import pickle
import numpy as np
import pandas as pd

In [2]:
#First of all, I'm loading a dataset with all the Spanish words that have 5 characters
with open('palabras_5_letras', 'rb') as fp:
    ALL_WORDS = pickle.load(fp)

In [3]:
#This function calculates the Shannon entropy term of every character, based on all the possible words
def shannon_entropy(possible_words):
    #Initializing the dictionary:
    frec = dict()
    letters_sp = 'qwertyuiopasdfghjklñzxcvbnm '
    for l in letters_sp:
        frec[l] = 0
    #We are going to calculate the frequency of every character:
    total = 0
    for word in possible_words:
        letters = set(word)
        for l in letters:
            frec[l] += 1
            total += 1
    #So the entropy is:
    entropy = dict()
    for l in frec:
        p = frec[l]/total
        if(p!=0):
            entropy[l] = -p*np.log2(p)
        else:
            entropy[l] = 0
    
    return(entropy)

In [4]:
def best_option(possible_words, ALL_WORDS=[]):
    s = shannon_entropy(possible_words)
    #We have 2 options, depending on the ALL_WORDS parameter:
    #The first one evaluates fewer words than the second, so it's a lot faster, but the second gives the better guesses
    # 1) ALL_WORDS is empty: only the remaining possible words are candidates
    if(ALL_WORDS==[]):
        total_worlds = possible_words
    # 2) ALL_WORDS is not empty: every word in ALL_WORDS is a candidate
    else:
        total_worlds = ALL_WORDS
    
    #We are going to calculate the entropy of every word:
    data = []
    for word in total_worlds:
        entropy = 0
        letters = set(word)
        for l in letters:
            entropy += s[l]
        
        data.append((word, entropy))
    
    #We are going to take the word that maximizes the entropy:
    df = pd.DataFrame(data)
    df = df.sort_values(by=1, ascending=False)
    df = df.reset_index()
    
    return(df[0][0])

In [5]:
#So we can obtain the best option among all possible Spanish words:
best_option(ALL_WORDS)

'aireo'
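To see how the scoring behaves, here is a condensed sketch of the same idea on a three-word toy list. It uses `collections.Counter` instead of the explicit dictionary and a plain `sorted` instead of pandas. The name `rank_words` and the toy words are illustrative.

```python
import math
from collections import Counter

def rank_words(possible_words):
    """Rank words by the summed -p*log2(p) terms of their distinct letters."""
    # Count every letter once per word, as shannon_entropy does with set(word).
    counts = Counter(l for word in possible_words for l in set(word))
    total = sum(counts.values())
    term = {l: -(n / total) * math.log2(n / total) for l, n in counts.items()}
    scores = {w: sum(term[l] for l in set(w)) for w in possible_words}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

toy_words = ['falso', 'salsa', 'talco']
ranking = rank_words(toy_words)
for word, score in ranking:
    print(word, round(score, 3))
```

Because repeated letters count only once, a word such as `salsa` (three distinct letters) scores lower than words with five distinct letters.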

In [6]:
#Based on the tried word and the result obtained, this function filters all the possible words
def apply_filter(possible_words, tried_word, result):
    #possible_words: remaining words
    #result:
        #0: letter not contained in the word
        #1: contained, but not in this position
        #2: contained and correct position
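    new_words = []
    all_joined = list(zip(tried_word, result, range(len(tried_word))))
    #Letters marked 1 or 2 somewhere in the tried word are contained in the solution
    contained = [i[0] for i in all_joined if i[1]!='0']
    #A letter marked 0 is excluded only if no other copy of it is marked 1 or 2
    #(a repeated letter can be grey in one position and colored in another)
    not_contained = [i[0] for i in all_joined if i[1]=='0' and i[0] not in contained]
    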
    
    for word in possible_words:
        valid = True
        for l in not_contained:
            valid = valid and (l not in word)
        for l in contained:
            valid = valid and (l in word)
        if(valid):
            for l in all_joined:
                i = l[2]
                if(l[1]=='1'):
                    valid = valid and (word[i]!=l[0])
                elif(l[1]=='2'):
                    valid = valid and (word[i]==l[0])
        if(valid):
            new_words.append(word)
    
    return(new_words)

In [7]:
#Here we have an example of how the function works:
apply_filter(ALL_WORDS, 'aireo', '11201')

['boria', 'doria', 'noria']

In [13]:
#First option: the next word is chosen only among the remaining possible words
possible_words = ALL_WORDS

result = ''
i = 0
while(i<6 and result!=5*'2' and possible_words!=[]):
    word = best_option(possible_words)
    print('Next word is: '+ word)
    result = input('The result is: ')
    possible_words = apply_filter(possible_words, word, result)
    i += 1
print('\nSolved in ',i,' tries')

Next word is: aireo
The result is: 10002
Next word is: tlaco
The result is: 01102
Next word is: saldo
The result is: 12202
Next word is: balso
The result is: 02222
Next word is: falso
The result is: 22222

Solved in  5  tries


In [12]:
#Second option: the next word is chosen among all the words in ALL_WORDS
possible_words = ALL_WORDS

result = ''
i = 0
while(i<6 and result!=5*'2' and possible_words!=[]):
    if(len(possible_words)==1):
        word = possible_words[0]
    else:
        word = best_option(possible_words, ALL_WORDS)
    print('Next word is: '+ word)
    result = input('The result is: ')
    possible_words = apply_filter(possible_words, word, result)
    i += 1
print('\nSolved in ',i,' tries')    


Next word is: aireo
The result is: 10002
Next word is: talco
The result is: 02202
Next word is: glosa
The result is: 01121
Next word is: falso
The result is: 22222

Solved in  4  tries
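The two loops above rely on `input`, so each run needs someone to type the result by hand. For testing, a game can be simulated against a known solution. The sketch below follows the first option (guesses come only from the remaining words). The names `feedback`, `best_guess`, and `simulate`, the toy word list, and the filtering step (keeping the words that would have produced the same result, rather than calling `apply_filter`) are assumptions of this sketch.

```python
import math
from collections import Counter

def feedback(guess, solution):
    """Digit string for a guess: 2 = green, 1 = orange, 0 = grey."""
    result = ['2' if g == s else '0' for g, s in zip(guess, solution)]
    # Solution letters not matched exactly can still earn an orange mark.
    remaining = Counter(s for g, s in zip(guess, solution) if g != s)
    for i, g in enumerate(guess):
        if result[i] == '0' and remaining[g] > 0:
            result[i] = '1'
            remaining[g] -= 1
    return ''.join(result)

def best_guess(candidates):
    """Candidate whose distinct letters maximize the summed -p*log2(p) terms."""
    counts = Counter(l for w in candidates for l in set(w))
    total = sum(counts.values())
    term = {l: -(n / total) * math.log2(n / total) for l, n in counts.items()}
    return max(candidates, key=lambda w: sum(term[l] for l in set(w)))

def simulate(solution, words, max_tries=6):
    """Play one game automatically and return the list of guesses."""
    candidates = list(words)
    guesses = []
    while len(guesses) < max_tries and candidates:
        guess = best_guess(candidates)
        guesses.append(guess)
        result = feedback(guess, solution)
        if result == '2' * len(solution):
            break
        # Keep only the words that would have produced the same result.
        candidates = [w for w in candidates if feedback(guess, w) == result]
    return guesses

toy_words = ['falso', 'talco', 'saldo', 'balso', 'noria', 'aireo']
print(simulate('falso', toy_words))
```

Running `simulate` over every word in the list gives the number of tries per solution, which is one way to compare the two options without playing by hand.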
