# Trexquant Interview Project (The Hangman Game)

* Copyright Trexquant Investment LP. All Rights Reserved. 
* Redistribution of this question without written consent from Trexquant is prohibited

## Instructions
For this coding test, your mission is to write an algorithm that plays the game of Hangman through our API server. 

When a user plays Hangman, the server first selects a secret word at random from a list. The server then returns a row of space-separated underscores, one for each letter in the secret word, and asks the user to guess a letter. If the user guesses a letter that is in the word, the word is redisplayed with all instances of that letter shown in the correct positions, along with any letters correctly guessed on previous turns. If the letter does not appear in the word, the user is charged with an incorrect guess. The user keeps guessing letters until either (1) the user has correctly guessed all the letters in the word or (2) the user has made six incorrect guesses.
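
The rules above can be simulated offline in a few lines. This is only a minimal sketch, not the server's implementation: `play_hangman`, `guess_fn`, and `alphabetical` are hypothetical names, the masked word is shown without separating spaces, and charging a repeated guess as incorrect is an assumption made here to guarantee termination.

```python
def play_hangman(secret, guess_fn, max_wrong=6):
    """Simulate one game; return True if the word is fully revealed."""
    guessed = set()
    wrong = 0
    while wrong < max_wrong:
        # Revealed letters stay visible, everything else is an underscore.
        masked = "".join(c if c in guessed else "_" for c in secret)
        if "_" not in masked:
            return True
        letter = guess_fn(masked, guessed)
        # Assumption: a repeated guess is charged like a wrong one.
        if letter in guessed or letter not in secret:
            wrong += 1
        guessed.add(letter)
    return False


def alphabetical(masked, guessed):
    """Toy strategy: guess a, b, c, ... in order."""
    for c in "abcdefghijklmnopqrstuvwxyz":
        if c not in guessed:
            return c


print(play_hangman("face", alphabetical))  # True: only b and d are wrong
print(play_hangman("zzz", alphabetical))   # False: a-f are six wrong guesses
```

Any strategy with the same `(masked, guessed)` signature can be dropped in for `alphabetical` to compare win rates locally before using the API.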

You are required to write a "guess" function that takes the current word (with underscores) as input and returns a letter to guess. You will use the API code below to play 1,000 Hangman games. You may practice before you start recording your game results.

Your algorithm is permitted to use a training set of approximately 250,000 dictionary words. Your algorithm will be tested on an entirely disjoint set of 250,000 dictionary words. Please note that this means the words that you will ultimately be tested on do NOT appear in the dictionary that you are given. You are not permitted to use any dictionary other than the training dictionary we provided. This requirement will be strictly enforced by code review.

You are provided with a basic, working algorithm. This algorithm matches the provided masked string (e.g. a _ _ l e) against all possible words in the dictionary, tabulates the frequency of letters appearing in these possible words, and then guesses the letter with the highest frequency of appearance that has not already been guessed. If no remaining words match, it falls back to the character frequency distribution of the entire dictionary.

This benchmark strategy is successful approximately 18% of the time. Your task is to design an algorithm that significantly outperforms this benchmark.
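
The benchmark strategy described above can be sketched as follows. This is an illustrative reconstruction rather than the provided API code: `baseline_guess` and its parameters are hypothetical, and the masked word is assumed to contain no separating spaces.

```python
import collections
import re


def baseline_guess(masked, guessed, dictionary):
    """Return the most frequent unguessed letter among matching words."""
    # Each underscore may stand for any lowercase letter.
    pattern = re.compile(masked.replace("_", "[a-z]"))
    candidates = [w for w in dictionary if pattern.fullmatch(w)]

    # Tabulate raw letter frequency over all candidate words.
    counts = collections.Counter("".join(candidates))
    for letter, _ in counts.most_common():
        if letter not in guessed:
            return letter

    # No candidate helped: fall back to the whole dictionary's frequencies.
    fallback = collections.Counter("".join(dictionary))
    for letter, _ in fallback.most_common():
        if letter not in guessed:
            return letter
    return None


words = ["apple", "ample", "angle", "zebra"]
print(baseline_guess("a__le", {"a", "l", "e"}, words))  # p
```

Because the test words are disjoint from the training dictionary, the candidate list often becomes empty late in a game, which is one reason a pure matching strategy like this one struggles.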

In [18]:
%load_ext autoreload
%autoreload 2

import json
import requests
import random
import string
import secrets
import time
import re
import collections
from tqdm.notebook import tqdm
try:
    from urllib.parse import parse_qs, urlencode, urlparse
except ImportError:
    from urlparse import parse_qs, urlparse
    from urllib import urlencode

from requests.packages.urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
from agents import HangmanAPI, HangmanAPIError, LocalHangman
from hangman import Hangman
from typing import List, Dict, Union, NoReturn

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [19]:
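def return_word_wise(wordlist: List[str]) -> List[tuple]:
    """Count, for each letter a-z, how many words contain it at least once.

    Returns (letter, count) pairs sorted by ascending count.
    """
    dct_cnt = {chr(i): 0 for i in range(ord('a'), ord('z') + 1)}
    for word in wordlist:
        # set(word) makes repeated letters in one word count only once
        for c in set(word):
            dct_cnt[c] += 1
    cnt = sorted(dct_cnt.items(), key=lambda x: x[1])
    return cnt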


In [20]:
class smart_greedy(LocalHangman):
    def __init__(self, access_token=None, session=None, timeout=None, wordlist="data/250k_train.txt"):
        super().__init__(access_token, session, timeout, wordlist)
        self.call_sign = "smart_greedy"
    def guess(self, word, debug : bool = False):
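        '''
        Step 1: Filter the candidate words: first match the masked pattern, then
        drop every word containing a letter that was already guessed incorrectly.
        Step 2: Guess the unguessed letter that occurs most often in what remains.
        '''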
        clean_word = word.replace("_","[a-z]")
        len_word = len(clean_word)
        r = re.compile(f'\\b{clean_word}\\b')
        current_dictionary = self.current_dictionary
        new_dictionary = []
        new_dictionary = list(filter(r.match, current_dictionary))
        #! Gets the list of words matching the pattern.
        #! debugging hehe: 
        if debug:
            print(f"{word} {new_dictionary[:min(5, len(new_dictionary))]} {len(new_dictionary)}")

        #! removing words containing any bad letters
        good_letters = [i for i in word if i.isalpha()]
        bad_letters = [x for x in self.guessed_letters if x not in good_letters]
        #! DEBUG
        if debug:
            print(f"good: {good_letters} bad: {bad_letters}")

        if len(bad_letters)!=0:
            ptrn = f'^[^{"".join(bad_letters)}]*$'
            r = re.compile(ptrn)
            new_dictionary = list(filter(r.match, new_dictionary))
        # store the narrowed candidate list on self so the next guess starts from it
        #! DEBUG
        if debug:
            print(f"{word} {new_dictionary[:min(5, len(new_dictionary))]} {len(new_dictionary)}")        
        self.current_dictionary = new_dictionary



        full_dict_string = "".join(new_dictionary)
        
        c = collections.Counter(full_dict_string)
        sorted_letter_count = c.most_common()                   
        
        guess_letter = '!'
        
        # return most frequently occurring letter in all possible words that hasn't been guessed yet
        for letter,instance_count in sorted_letter_count:
            if letter not in self.guessed_letters:
                guess_letter = letter
                break
            
        # if no word matches in training dictionary, default back to ordering of full dictionary
        if guess_letter == '!':
            sorted_letter_count = self.full_dictionary_common_letter_sorted
            for letter,instance_count in sorted_letter_count:
                if letter not in self.guessed_letters:
                    guess_letter = letter
                    break      
        if debug:      
            print(guess_letter)
            print("====================================================================")
        return guess_letter
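
The filtering in Step 1 can also be done in a single pass. The sketch below is a slightly stricter variant of the two-regex approach in the cell above, not a copy of it: `filter_candidates` is a hypothetical helper, and it assumes a lowercase dictionary. It relies on the fact that an unrevealed position cannot hold any letter that has already been guessed, whether that guess was right or wrong.

```python
import re


def filter_candidates(masked, guessed, dictionary):
    """Keep words that fit the mask and respect all earlier guesses."""
    excluded = "".join(sorted(guessed))
    # A hidden slot may hold any letter except those already guessed.
    slot = f"[^{excluded}]" if excluded else "[a-z]"
    pattern = re.compile(masked.replace("_", slot))
    return [w for w in dictionary if pattern.fullmatch(w)]


words = ["cat", "bag", "car", "dog", "bats", "baa"]
# "a" was a hit, "t" was a miss.
print(filter_candidates("_a_", {"a", "t"}, words))  # ['bag', 'car']
```

Here `cat` is dropped for containing the wrong letter `t`, and `baa` is dropped because a second `a` would already have been revealed.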

    
        

In [40]:
class word_freq_greedy(LocalHangman):
    '''
    Smart Greedy, but ranks letters by word-wise frequency (the number of
    candidate words containing the letter) instead of raw frequency.
    E.g. "cheese" contributes just one e.
    '''
    def __init__(self, access_token=None, session=None, timeout=None, wordlist="data/250k_train.txt"):
        super().__init__(access_token, session, timeout, wordlist)
        self.call_sign = "word_freq_greedy"
    
    def guess(self, word, debug = False):
        clean_word = word.replace("_","[a-z]")
        len_word = len(clean_word)
        r = re.compile(f'\\b{clean_word}\\b')
        current_dictionary = self.current_dictionary
        new_dictionary = []
        new_dictionary = list(filter(r.match, current_dictionary))
        #! Gets the list of words matching the pattern.
        #! debugging hehe: 
        if debug:
            print(f"{word} {new_dictionary[:min(5, len(new_dictionary))]} {len(new_dictionary)}")

        #! removing words containing any bad letters
        good_letters = [i for i in word if i.isalpha()]
        bad_letters = [x for x in self.guessed_letters if x not in good_letters]
        #! DEBUG
        if debug:
            print(f"good: {good_letters} bad: {bad_letters}")

        if len(bad_letters)!=0:
            ptrn = f'^[^{"".join(bad_letters)}]*$'
            r = re.compile(ptrn)
            new_dictionary = list(filter(r.match, new_dictionary))
        # store the narrowed candidate list on self so the next guess starts from it
        #! DEBUG
        if debug:
            print(f"{word} {new_dictionary[:min(5, len(new_dictionary))]} {len(new_dictionary)}")        
        self.current_dictionary = new_dictionary



        full_dict_string = "".join([''.join(set(x)) for x in new_dictionary])
        # print(full_dict_string)
        # time.sleep(50)
        c = collections.Counter(full_dict_string)
        sorted_letter_count = c.most_common()                   
        
        guess_letter = '!'
        
        # return most frequently occurring letter in all possible words that hasn't been guessed yet
        for letter,instance_count in sorted_letter_count:
            if letter not in self.guessed_letters:
                guess_letter = letter
                break
            
        # if no word matches in training dictionary, default back to ordering of full dictionary
        if guess_letter == '!':
            sorted_letter_count = self.full_dictionary_common_letter_sorted
            for letter,instance_count in sorted_letter_count:
                if letter not in self.guessed_letters:
                    guess_letter = letter
                    break      
        if debug:      
            print(guess_letter)
            print("====================================================================")
        return guess_letter
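
The difference between raw and word-wise counting is easiest to see on a tiny example. This is a standalone illustration with a made-up word list, not part of the agents above.

```python
import collections

words = ["assess", "bat", "cot", "tip"]

# Raw frequency: every occurrence counts, so "assess" adds four s's.
raw = collections.Counter("".join(words))

# Word-wise frequency: each word contributes a letter at most once.
word_wise = collections.Counter(c for w in words for c in set(w))

print(raw.most_common(1))        # [('s', 4)]
print(word_wise.most_common(1))  # [('t', 3)]
```

Raw counting would guess `s`, which is present in only one of the four candidates, while word-wise counting picks `t`, which is present in three of them and is therefore less likely to cost a try.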

In [42]:
import dummylog
from datetime import datetime

dl = dummylog.DummyLog(datetime.now().strftime("%d%m_%H%M%S"))

## Heavy testing in progress.

## To start a new game:
1. Compute statistics

In [45]:
api = word_freq_greedy(wordlist="data/250k.txt")


Successfully start a new game! Game ID: 6969rahuljha_1040729. # of tries remaining: 6. Word: _______.
Guessing letter: e
Sever response: {'game_id': '6969rahuljha_1040729', 'status': 'ongoing', 'tries_remains': 6, 'word': 'e___e__'}
Guessing letter: s
Sever response: {'game_id': '6969rahuljha_1040729', 'status': 'ongoing', 'tries_remains': 5, 'word': 'e___e__'}
Guessing letter: n
Sever response: {'game_id': '6969rahuljha_1040729', 'status': 'ongoing', 'tries_remains': 4, 'word': 'e___e__'}
Guessing letter: r
Sever response: {'game_id': '6969rahuljha_1040729', 'status': 'ongoing', 'tries_remains': 3, 'word': 'e___e__'}
Guessing letter: p
Sever response: {'game_id': '6969rahuljha_1040729', 'status': 'ongoing', 'tries_remains': 2, 'word': 'e___e__'}
Guessing letter: a
Sever response: {'game_id': '6969rahuljha_1040729', 'status': 'ongoing', 'tries_remains': 2, 'word': 'e___ea_'}
Guessing letter: h
Sever response: {'game_id': '6969rahuljha_1040729', 'status': 'ongoing', 'tries_remains': 2, 

True

In [49]:
api.start_game(practice=1,verbose=True, host_dict="250k.txt")


Successfully start a new game! Game ID: 6969rahuljha_841778. # of tries remaining: 6. Word: _______________.
Guessing letter: i
Sever response: {'game_id': '6969rahuljha_841778', 'status': 'ongoing', 'tries_remains': 6, 'word': '_i__________i__'}
Guessing letter: t
Sever response: {'game_id': '6969rahuljha_841778', 'status': 'ongoing', 'tries_remains': 6, 'word': 'ti__________i__'}
Guessing letter: o
Sever response: {'game_id': '6969rahuljha_841778', 'status': 'ongoing', 'tries_remains': 6, 'word': 'ti______o___i__'}
Guessing letter: d
Sever response: {'game_id': '6969rahuljha_841778', 'status': 'ongoing', 'tries_remains': 6, 'word': 'ti______od__i__'}
Guessing letter: r
Sever response: {'game_id': '6969rahuljha_841778', 'status': 'ongoing', 'tries_remains': 6, 'word': 'ti___r_rod__i__'}
Guessing letter: p
Sever response: {'game_id': '6969rahuljha_841778', 'status': 'ongoing', 'tries_remains': 6, 'word': 'ti___rprod__i__'}
Guessing letter: b
Sever response: {'game_id': '6969rahuljha_84

True

In [50]:
def generate_stats(agent, train_list, test_list, n = 100):
    api = agent(wordlist = train_list)
    won = 0
    lost = 0
    curr_statz = 0.0
    played = 0
    for i in (pbar := tqdm(range(n))):
        pbar.set_description("Acc. %.3f"%curr_statz)
        res = api.start_game(practice=1,verbose=False, host_dict=test_list)
        if res:
            won += 1
        played += 1
        curr_statz = won/played

        # vl.logger.info(f"Using {api.call_sign}: Success Rate after {played} games:  {won/played}")
    dl.logger.info(f"Using {api.call_sign}: Train: {train_list} Test: {test_list}. Success rate after {played} games: %.3f"%(won/played))



In [57]:
generate_stats(smart_greedy, "data/250k.txt", "250k.txt", 1000)

07/24/2023 09:54:26 AM: INFO: Using smart_greedy: Train: data/250k.txt Test: 250k.txt. Success rate after 1000 games: 0.730
