## Riddler Classic: June 28

https://fivethirtyeight.com/features/whats-your-best-scrabble-string/

The game of Scrabble has 100 tiles: 98 of these tiles contain a letter and a score, and two of them are wildcards worth zero points. At home on a lazy summer day with a bag of these tiles, you decide to play the Superstring Scrabble Challenge. Using only the 100 tiles, you lay them out into one long 100-letter string of your choosing. You look through the string. For each word you find, you earn points equal to its score. Once you find a word, you don’t get any points for finding it again. The same tile may be used in multiple, overlapping words. So “theater” includes “the,” “heat,” “heater,” “eat,” “eater,” “ate,” etc.

The super challenge: What order of tiles gives you the biggest score? (The blank tiles are locked into the letter they represent once you’ve picked it.)

The winner, and inaugural Wordsmith Extraordinaire of Riddler Nation, will be the solver whose string generates the most points. You should use this word list to determine whether a word is valid.

For reference, this is the distribution of letter tiles in the bag, by their point value:

- 0: ?×2
- 1: E×12 A×9 I×9 O×8 N×6 R×6 T×6 L×4 S×4 U×4
- 2: D×4 G×3
- 3: B×2 C×2 M×2 P×2
- 4: F×2 H×2 V×2 W×2 Y×2
- 5: K
- 8: J X
- 10: Q Z

Scrabble dictionary txt: https://norvig.com/ngrams/enable1.txt

### Strategy

100 factorial (100!) is roughly 9.33e157, and even after accounting for repeated letters, the number of distinct tile strings is far too large to check by brute force. Instead, we'll randomly generate tile strings from the available tiles and score them.
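To put a rough number on the size of the search space, here is a standard-library sketch that divides 100! by k! for every tile that appears k times (the multinomial coefficient). It assumes the two blanks are interchangeable, so both are written as `'?'` here; `count_distinct_orderings` is a hypothetical helper.

```python
import math
from collections import Counter

# Tile multiset from the distribution above; '?' stands for each (interchangeable) blank
TILES = ('e'*12) + ('ai'*9) + ('o'*8) + ('nrt'*6) + ('lsud'*4) + ('g'*3) + ('bcmpfhvwy'*2) + 'kjxqz??'


def count_distinct_orderings(tiles):
    """Multinomial coefficient: n! divided by k! for every tile that repeats k times."""
    n_orderings = math.factorial(len(tiles))
    for count in Counter(tiles).values():
        # Exact: every intermediate quotient is itself an integer
        n_orderings //= math.factorial(count)
    return n_orderings


total = math.factorial(len(TILES))
distinct = count_distinct_orderings(TILES)
print(f"{total:.3e} raw permutations, {distinct:.3e} distinct strings")
```

Under that assumption the distinct count lands on the order of 1e115: repeated letters remove about 42 orders of magnitude, but exhaustive search is still out of the question.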

0. Make scoring dict and scoring function for a given string input
    - Use sets to make sure the same word is not scored twice
    - Handle n-grams with and without wildcards separately
    - Wildcards are locked into a single letter once chosen, and always score 0 points
1. Generate random tile string
2. Split the tile string into all n-grams (substrings) and keep only those in the Scrabble word list
    - The longest word that fits on a Scrabble board is 15 letters, but this string is not placed on a board
    - The longest word in the Scrabble word list is 28 letters, so longer n-grams can never score
3. Score remaining set of valid words and get total score
4. Repeat steps 1-3 for many random tile strings and keep the best one

In [99]:
import numpy as np
import pandas as pd
import itertools
import random
import re
from urllib.request import urlopen

### 0. Make scoring dict, scoring function and download Scrabble wordlist

In [None]:
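# Load Scrabble wordlist, downloading it if there is no local copy
WORDLIST_URL = 'https://norvig.com/ngrams/enable1.txt'
WORDLIST_FILE = 'words.txt'
try:
    with open(WORDLIST_FILE) as f:
        text = f.read()
except FileNotFoundError:
    text = urlopen(WORDLIST_URL).read().decode('utf-8')
# split() rather than split('\n'): no empty string from the trailing newline
WORDLIST = set(text.split())

sorted(WORDLIST)[:5]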

In [9]:
# Make tile string
# @ and # represent two different wildcards
TILES = ('e'*12) + ('ai'*9) + ('o'*8) + ('nrt'*6) + ('lsud'*4) + ('g'*3) + ('bcmpfhvwy'*2) + 'kjxqz@#'

print(len(TILES))
TILES

100


'eeeeeeeeeeeeaiaiaiaiaiaiaiaiaioooooooonrtnrtnrtnrtnrtnrtlsudlsudlsudlsudgggbcmpfhvwybcmpfhvwykjxqz@#'

In [10]:
# Make scoring dict for each letter
letter_scores = {"a": 1 , "b": 3 , "c": 3 , "d": 2 , "e": 1 , 
                 "f": 4 , "g": 2 , "h": 4 , "i": 1 , "j": 8 , 
                 "k": 5 , "l": 1 , "m": 3 , "n": 1 , "o": 1 , 
                 "p": 3 , "q": 10, "r": 1 , "s": 1 , "t": 1 ,
                 "u": 1 , "v": 4 , "w": 4 , "x": 8 , "y": 4 , 
                 "z": 10}
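A quick consistency check between `TILES`, `letter_scores` and the distribution table at the top: a sketch that regroups the tiles by point value. Both definitions are repeated so the block runs on its own, and `tiles_by_score` is a hypothetical helper. The blanks (`'@'`, `'#'`) are not in `letter_scores`, so they are counted as 0 points.

```python
from collections import Counter

# Same definitions as the cells above, repeated so this block runs on its own
TILES = ('e'*12) + ('ai'*9) + ('o'*8) + ('nrt'*6) + ('lsud'*4) + ('g'*3) + ('bcmpfhvwy'*2) + 'kjxqz@#'
letter_scores = {"a": 1, "b": 3, "c": 3, "d": 2, "e": 1, "f": 4, "g": 2, "h": 4, "i": 1,
                 "j": 8, "k": 5, "l": 1, "m": 3, "n": 1, "o": 1, "p": 3, "q": 10, "r": 1,
                 "s": 1, "t": 1, "u": 1, "v": 4, "w": 4, "x": 8, "y": 4, "z": 10}


def tiles_by_score(tiles, scores):
    """Group tile counts by point value; symbols missing from scores (the blanks) count as 0."""
    grouped = {}
    for letter, count in sorted(Counter(tiles).items()):
        grouped.setdefault(scores.get(letter, 0), {})[letter] = count
    return grouped


grouped = tiles_by_score(TILES, letter_scores)
for value in sorted(grouped):
    print(value, grouped[value])
total_points = sum(letter_scores.get(tile, 0) for tile in TILES)
print("tiles:", len(TILES), "face value:", total_points)
```

If the definitions match the table, each point value lists the same letters and counts as the distribution above, and the bag totals 100 tiles with a combined face value of 187 points.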

In [117]:
# Make scoring function
# Letters not in scoring_dict (the '@' and '#' wildcards) are blank tiles worth 0 points
def score_word(input_word, scoring_dict=letter_scores):
    return sum(scoring_dict.get(x, 0) for x in input_word)

# Make wrapper scoring function to deal with wildcards
def get_word_score(input_word, scoring_dict=letter_scores, wordlist=WORDLIST):
    '''
    OLD STRATEGY
    If a word contains a wildcard:
        1. Make list of all possible words
        2. Determine the highest-scoring possible word (and thus best replacement for wildcard)
        3. Replace all other words in the ngram set with the same wildcard 
        
    NEW STRATEGY
    Handle words with wildcards separately.
        1. Make ngrams as usual, but leave in wildcard ngrams when filtering out non-wordlist ngrams
        2. Score non-wildcard ngrams as planned.
        3. For each wildcard, substitute in each letter of the alphabet and see which combination gives the highest score
        4. Select appropriate lockdown values for each wildcard and calculate final score.

    Returns the score of a word without wildcards. For a word with wildcards, returns a
    dict mapping each valid word it could stand for to that word's score.
    '''
    if '@' in input_word or '#' in input_word:
        # Each wildcard can stand for any single letter
        r = re.compile('^' + re.sub('[@#]', '[a-z]', input_word) + '$')
        possible_words = sorted(filter(r.match, wordlist))
        # The blank scores 0 whatever letter it becomes, so score the pattern, not the completed word
        return {word: score_word(input_word, scoring_dict) for word in possible_words}
    return score_word(input_word, scoring_dict)


# With the blank locked to "s", "test" scores 3 (t + e + blank + t = 1 + 1 + 0 + 1)
get_word_score('te@t')

{'teat': 3, 'tent': 3, 'test': 3, 'text': 3}
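The NEW STRATEGY in the docstring (try every letter for each wildcard, then lock in the best choice for the whole string) can be sketched as a brute force over at most 26 × 26 = 676 combinations for two blanks. This is a self-contained sketch, not the notebook's implementation: `DEMO_WORDS` is a made-up mini word list standing in for `WORDLIST`, `score_string` and `best_wildcard_lock` are hypothetical helpers, and scoring a repeated word by its best occurrence is one assumed reading of the rules.

```python
import itertools
import string

# Letter values in a-z order (same values as letter_scores in the notebook)
LETTER_SCORES = dict(zip(string.ascii_lowercase,
                         [1, 3, 3, 2, 1, 4, 2, 4, 1, 8, 5, 1, 3,
                          1, 1, 3, 10, 1, 1, 1, 1, 4, 4, 8, 4, 10]))

# Hypothetical mini word list standing in for WORDLIST
DEMO_WORDS = {"teat", "tent", "test", "text", "eat", "at", "ex", "en"}


def score_string(s, wordlist, blank_positions=frozenset(), max_len=28):
    """Score each distinct word found in s once; letters at blank_positions score 0."""
    best = {}
    for start in range(len(s)):
        for end in range(start + 2, min(start + max_len, len(s)) + 1):
            word = s[start:end]
            if word in wordlist:
                points = sum(0 if i in blank_positions else LETTER_SCORES[s[i]]
                             for i in range(start, end))
                # A repeated word counts once, at its best-scoring occurrence
                best[word] = max(best.get(word, 0), points)
    return sum(best.values())


def best_wildcard_lock(tilestring, wordlist, wildcards="@#"):
    """Try every letter for each wildcard present and return (best score, locked letters)."""
    positions = [tilestring.index(w) for w in wildcards if w in tilestring]
    best_score, best_letters = -1, None
    for letters in itertools.product(string.ascii_lowercase, repeat=len(positions)):
        chars = list(tilestring)
        for pos, letter in zip(positions, letters):
            chars[pos] = letter  # lock this wildcard to one letter for the whole string
        total = score_string("".join(chars), wordlist, frozenset(positions))
        if total > best_score:
            best_score, best_letters = total, letters
    return best_score, best_letters


print(best_wildcard_lock("te@t", DEMO_WORDS))  # (6, ('a',))
```

For `'te@t'`, locking the blank to `'a'` (teat + eat + at = 3 + 2 + 1 = 6) beats locking it to `'x'` (text + ex = 3 + 1 = 4): because the blank scores 0 whatever letter it represents, the choice matters only through which extra words it creates.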


In [81]:
# What is the longest word in the wordlist?
max(map(len, WORDLIST))

28

### 1. Make random tilestring generator

In [87]:
def get_tilestring(tiles):
    '''Samples without replacement from tiles, then joins output list back into a string and returns result.'''
    return ''.join(random.sample(tiles, k=len(tiles)))

get_tilestring(TILES)

'ayiigiearnsdoafibitboautonafrleuvuoongehnalrzcrlkeetqdspr?yhi?easwpnoaocemmarvexeudnlwotteejeigtisid'
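Because the strings are random, a high-scoring one is easier to reproduce if the generator can be seeded. A minimal sketch, assuming a private `random.Random` instance is acceptable in place of the module-level `random.sample`; `get_tilestring_seeded` is a hypothetical variant of `get_tilestring`.

```python
import random
from collections import Counter

# Same tile string as above; '@' and '#' are the two wildcards
TILES = ('e'*12) + ('ai'*9) + ('o'*8) + ('nrt'*6) + ('lsud'*4) + ('g'*3) + ('bcmpfhvwy'*2) + 'kjxqz@#'


def get_tilestring_seeded(tiles, seed=None):
    """Like get_tilestring, but draws from a private random.Random(seed) so runs can be replayed."""
    rng = random.Random(seed)
    return ''.join(rng.sample(tiles, k=len(tiles)))


first = get_tilestring_seeded(TILES, seed=538)
again = get_tilestring_seeded(TILES, seed=538)
print(first == again)                    # True: same seed, same string
print(Counter(first) == Counter(TILES))  # True: a shuffle uses every tile exactly once
```

A private generator also leaves the global `random` state untouched, so other cells that call `random` are unaffected.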

### 2. Split random tilestring into n-gram set and filter by Scrabble dictionary

In [97]:
# Function to find all possible n-grams with lengths from min_length up to n (inclusive)
# Adapted from http://www.locallyoptimal.com/blog/2013/01/20/elegant-n-gram-generation-in-python/
def get_ngrams(input_word, n, min_length=2, wordlist=WORDLIST):
    result = set()
    # n + 1 so that n-grams of length exactly n are included
    for i in range(min_length, n + 1):
        # zip the word against shifted copies of itself to get every length-i window
        ngram_list = [''.join(x) for x in zip(*[input_word[x:] for x in range(i)])]
        result.update(ngram_list)

    # Keep only the n-grams that are valid words
    return wordlist.intersection(result)

get_ngrams('theater', 15)

{'at', 'ate', 'eat', 'eater', 'er', 'he', 'heat', 'heater', 'the', 'theater'}
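The zip-based generator builds every window of each length; the same set can be written with plain slicing, which also makes the cost easy to count. A sketch with hypothetical helpers `get_substrings` and `count_windows`, using an inclusive `max_length` and defaulting to the 28-letter maximum from the word list:

```python
def get_substrings(s, min_length=2, max_length=28):
    """All distinct substrings of s with min_length <= len <= max_length (plain slicing)."""
    return {s[i:i + k]
            for k in range(min_length, max_length + 1)
            for i in range(len(s) - k + 1)}


def count_windows(n, min_length=2, max_length=28):
    """Number of (start, length) windows in an n-letter string, counting repeats."""
    return sum(max(n - k + 1, 0) for k in range(min_length, max_length + 1))


print(sorted(get_substrings('theater', max_length=4)))
print(count_windows(100))  # 2322
```

`count_windows(100)` gives 2322, so each 100-letter string needs at most 2322 set lookups for words of 2 to 28 letters; scoring one string is cheap, and the real cost is how many strings get generated.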

### 3. Score remaining set of valid words and get total score

In [96]:
def get_score(input_set):
    # Total score of a set of valid words (each word counted once)
    return sum(map(score_word, input_set))

get_score(get_ngrams('theater', 15))

52
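To see where 52 comes from, a sketch that recomputes each word's score with the same letter values (re-declared so the block is self-contained):

```python
import string

# Letter values in a-z order (same values as letter_scores in the notebook)
LETTER_SCORES = dict(zip(string.ascii_lowercase,
                         [1, 3, 3, 2, 1, 4, 2, 4, 1, 8, 5, 1, 3,
                          1, 1, 3, 10, 1, 1, 1, 1, 4, 4, 8, 4, 10]))

# The ten words get_ngrams('theater', 15) returns above
THEATER_WORDS = ['at', 'ate', 'eat', 'eater', 'er', 'he', 'heat', 'heater', 'the', 'theater']

breakdown = {word: sum(LETTER_SCORES[c] for c in word) for word in THEATER_WORDS}
# Print highest-scoring words first, ties alphabetically
for word, points in sorted(breakdown.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{word:>8} {points:>3}")
print(f"{'total':>8} {sum(breakdown.values()):>3}")
```

The 4-point "h" dominates: every word containing it scores at least 5, while the words built only from 1-point letters ("at", "er", "eat", ...) score exactly their length.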

### 4. Repeat 1-3 on multiple random tilestrings
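A minimal sketch of step 4, assuming plain random restarts: `score_tilestring` and `random_search` are hypothetical names, `DEMO_WORDS` is a made-up word list standing in for `WORDLIST`, and wildcards are left unlocked here (they score 0 and never match a word), so the wildcard search would still have to be layered on top.

```python
import random
import string

# Letter values in a-z order (same values as letter_scores in the notebook)
LETTER_SCORES = dict(zip(string.ascii_lowercase,
                         [1, 3, 3, 2, 1, 4, 2, 4, 1, 8, 5, 1, 3,
                          1, 1, 3, 10, 1, 1, 1, 1, 4, 4, 8, 4, 10]))

# Hypothetical demo word list; the notebook would pass WORDLIST instead
DEMO_WORDS = {"at", "ate", "eat", "eater", "er", "he", "heat", "heater",
              "the", "theater", "tea", "tear"}


def score_tilestring(s, wordlist, max_length=28):
    """Steps 2-3: distinct dictionary words among all substrings, summed by letter score."""
    words = {s[i:i + k] for k in range(2, max_length + 1) for i in range(len(s) - k + 1)}
    # Characters outside LETTER_SCORES (unlocked wildcards) score 0
    return sum(sum(LETTER_SCORES.get(c, 0) for c in w) for w in words & wordlist)


def random_search(tiles, wordlist, n_iter=1000, seed=None):
    """Step 4: shuffle the tiles n_iter times and keep the best-scoring string."""
    rng = random.Random(seed)
    best_string, best_score = None, -1
    for _ in range(n_iter):
        candidate = ''.join(rng.sample(tiles, k=len(tiles)))  # step 1
        candidate_score = score_tilestring(candidate, wordlist)
        if candidate_score > best_score:
            best_string, best_score = candidate, candidate_score
    return best_string, best_score


print(score_tilestring("theater", DEMO_WORDS))  # 52
print(random_search("theater", DEMO_WORDS, n_iter=500, seed=7))
```

With the notebook's data this would be called as something like `random_search(TILES, WORDLIST, n_iter=10_000)`; pure random restarts are the simplest option, not necessarily a good one, for a search space this large.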