# Wordle Project 
Finding three optimal opening words that maximize the information gained before making the 4th guess.

## Cost Function
$f(word) = \left(\sum_{i=0}^{len(word)-1} P(word[i])\right) + \left(P(word[0]) + P(word[len(word)-1])\right)$

The idea behind this cost function is to favor guesses built from the most frequently occurring
 letters. The first term is the objective: the sum of the probabilities of every letter in the word,
 which I maximize to get the best guesses. The second term acts as a regularizer for breaking ties:
 it adds the probability of the word's starting letter and of its ending letter, so words with equal
 letter sums can still be ranked.
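
As a rough illustration of the two terms, the sketch below scores a made-up four-word list. The word list, the `score` helper, and the choice to normalize the start and end counts by the number of words are assumptions for this example only; the notebook's own `break_ties` scales those counts differently.

```python
from collections import Counter

# Toy word list, made up for illustration only (not the project's data).
toy_words = ["stare", "tears", "crane", "shone"]
n = len(toy_words)

# P(letter): letter count divided by the number of words, mirroring cost_of_letters.
p_letter = {ch: c / n for ch, c in Counter("".join(toy_words)).items()}

# Assumed positional terms: share of words starting / ending with each letter.
p_start = {ch: c / n for ch, c in Counter(w[0] for w in toy_words).items()}
p_end = {ch: c / n for ch, c in Counter(w[-1] for w in toy_words).items()}


def score(word):
    """Sum of letter probabilities plus the start/end tie-break term."""
    objective = sum(p_letter[ch] for ch in word)
    tie_break = p_start[word[0]] + p_end[word[-1]]
    return objective + tie_break


scores = {w: score(w) for w in toy_words}
best = max(scores, key=scores.get)
print(best, scores)
```

Here "stare" and "tears" are anagrams, so their letter sums are identical; only the start and end term separates them.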

## Implementation
I started by loading the data and counting the occurrences of each letter to estimate each
 letter's probability. I then removed every word with a repeated letter, leaving only words made of
 unique letters. Next I calculated the cost of each word with the function above, which considers
 the letter probabilities as well as the probabilities of the starting and ending letters, and
 picked the word with the highest cost as the first choice. For the second choice I removed the
 first word from the candidate list, along with every word sharing a letter with it, and selected
 the highest-cost word among those remaining. By the third try only a few words were left, and I
 repeated the process to find the third word.
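
The steps above can be sketched as a small greedy loop. This is a simplified illustration, not the notebook's code: `pick_three` and the toy word list are hypothetical, and the sketch ranks by the letter-probability sum only, leaving out the start and end tie-break.

```python
from collections import Counter


def pick_three(words):
    """Greedily pick up to three words that share no letters."""
    # Keep only words whose letters are all distinct.
    candidates = [w for w in words if len(set(w)) == len(w)]
    # Letter probability: occurrences across all words divided by the word count.
    counts = Counter("".join(words))
    p = {ch: c / len(words) for ch, c in counts.items()}

    chosen, used = [], set()
    for _ in range(3):
        # Drop every candidate that reuses a letter from an earlier pick.
        candidates = [w for w in candidates if used.isdisjoint(w)]
        if not candidates:
            break
        best = max(candidates, key=lambda w: sum(p[ch] for ch in w))
        chosen.append(best)
        used.update(best)
    return chosen


# Made-up list for illustration; "hippo" is dropped for its repeated letter.
toy = ["stare", "irate", "could", "nymph", "hippo"]
picks = pick_three(toy)
print(picks)
```

Because each pick removes every word that overlaps with it, the three words together cover up to fifteen distinct letters, which is the point of filtering between rounds.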

## Python Custom Methods


In [2]:
# imports
import argparse
import math
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import warnings
from collections import Counter

In [3]:
def read_file(file_loc):
    """
    reads the word list, one word per line, and returns the words as a list.

    :param file_loc: the location of the file in the system
    :return: list of words
    """
    words = list()
    with open(file_loc) as f:
        for line in f:
            # strip the newline; slicing with [:-1] would cut a letter from an unterminated last line
            word = line.strip()
            if word:
                words.append(word)

    return words

In [4]:
def parse_args():
    """
    parses the command-line arguments.
    :return: parsed arguments; the file location is available as args.file
    """

    parser = argparse.ArgumentParser()
    parser.add_argument("-file")
    args = parser.parse_args()  # parse arguments
    return args  # namespace holding the file location

In [5]:

def frequency_of_letters(words):
    """
    counts the number of times a letter occurs in this list
    :param words: list of words
    :return: dictionary with key as the letter and value as the count
    """
    count = {}
    for word in words:
        for letter in word:
            if letter not in count:
                count[letter] = 1
            else:
                count[letter] += 1
    return count

In [6]:
def cost_of_letters(count, total_words):
    """
    calculates the cost of each letter (probability)
    :param count: dictionary with the number of occurrences of each letter
    :param total_words: total words in the list
    :return: dictionary as the cost or the probability as value and letter as key
    """
    cost = {}
    for letter in count:
        cost_of_letter = int(count[letter])/total_words
        cost[letter] = cost_of_letter
    return cost

In [7]:

def isUniqueChars(string):
    """
    checks if the string is made of unique characters
    :param string: string to check
    :return: boolean True if unique
    """
    # Counting frequency
    freq = Counter(string)

    # unique when the number of distinct letters equals the length of the string
    return len(freq) == len(string)

In [8]:
def words_with_unique_letters(words):
    """
    gets all the words with different letters
    :param words: list of words
    :return: words with unique letters
    """
    unique_words = []
    for word in words:
        if isUniqueChars(word):
            unique_words.append(word)
    return unique_words

In [9]:
def cost_of_words(cost_letters, words):
    """
    calculates the cost of each word by adding cost of each letter
    :param cost_letters: cost of each letter
    :param words: list of words
    :return: cost of words as dict
    """
    cost_words = {}
    for word in words:
        cost = 0
        for letter in word:
            cost += cost_letters[letter]
        cost_words[word] = cost
    return cost_words

In [10]:
def starting_frequency(words):
    """
    counts the number of times a letter has occurred at the start of a word
    :param words: list of words
    :return: letter as key and count that it has occured at the start as value
    """
    count = {}
    for word in words:
        letter = word[0]
        if letter not in count:
            count[letter] = 1
        else:
            count[letter] += 1
    return count

In [11]:

def ending_frequency(words):
    """
    counts the number of times a letter has occurred at the end of a word
    :param words: list of words
    :return: letter as key and count that it has occured at the end as value
    """
    count = {}
    for word in words:
        letter = word[-1]
        if letter not in count:
            count[letter] = 1
        else:
            count[letter] += 1
    return count

In [12]:
def break_ties(cost_words, starting, ending):
    """
    considers the starting and ending letter frequency in the cost to break ties
    :param cost_words: cost of words based on probability
    :param starting: dictionary of starting letter probabilities
    :param ending: dictionary of ending letter probabilities
    :return: updated costs with starting and ending letter considered
    """
    costs = {}
    for word in cost_words:
        cost = cost_words[word]
        cost += starting[word[0]]/26
        cost += ending[word[-1]]/26
        costs[word] = cost
    return costs

In [13]:
def process_next_choice(words, unique_words, cost):
    """
    takes the words guessed so far, list of possible words and costs of each letter to give the next best word.
    :param words: words guessed so far
    :param unique_words: list of possible words
    :param cost: cost of each letter
    :return: next best word
    """
    # collect every letter used by the words guessed so far
    letters = set("".join(words))
    # keep only the candidates that share no letter with earlier guesses
    next_options = [word for word in unique_words if letters.isdisjoint(word)]
    costs = cost_of_words(cost, next_options)
    costs = sort_dict(costs)
    return list(costs.keys())[0]

In [14]:
def sort_dict(d):
    """
    sort the dictionary by values in descending order to maximize cost
    :param d: dictionary to sort
    :return: dictionary sorted by value, highest first
    """
    sorted_dict = dict(sorted(d.items(), key=lambda x: x[1], reverse=True))
    return sorted_dict

## Getting top 3 words

In [20]:
words = read_file(r'C:\Users\shrun\VSProjects\Wordle Bot\wordle_bot\data')

PermissionError: [Errno 13] Permission denied: 'C:\\Users\\shrun\\VSProjects\\Wordle Bot\\wordle_bot\\data'
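
On Windows, calling `open()` on a directory raises exactly this `PermissionError`, so one plausible cause is that the path points at a folder and not at the word-list file inside it. That is an assumption, since the notebook does not say what `data` is. A minimal sketch of a guard, using a hypothetical `read_words_checked` helper that is not part of the notebook:

```python
from pathlib import Path


def read_words_checked(file_loc):
    """Hypothetical variant of read_file that validates the path first."""
    path = Path(file_loc)
    if path.is_dir():
        # Fail with a clearer message than the OS-level permission error.
        raise IsADirectoryError(
            f"{path} is a directory; pass the word-list file inside it"
        )
    with path.open() as f:
        # One word per line; skip blank lines.
        return [line.strip() for line in f if line.strip()]
```

Checking the path up front turns an opaque permission error into a message that names the likely mistake.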