## Problem 0: Wordle Solution (Plus code introducing WordNet!)

### Introduction

First learn about the new word game **Wordle** that is taking the internet by storm.

Read about the phenomenon [here.](https://ktla.com/morning-news/technology/what-is-wordle-game-everyone-playing-explained-tips-to-win/)

Play a game [here.](https://www.powerlanguage.co.uk/wordle/)

Here's a quick summary of the idea.

   1. The game consists of one puzzle a day where you have six chances to guess a five-letter word. Let's call that the target. Each of your guesses must be an English word. Sounds pretty hard so far. There are lots of 5-letter words!

   2. But here's the thing.  After you type in your guess, its letters are colored: green means a letter is in the right spot; yellow means the letter is in the target, but it's not in the right spot. Every remaining letter turns black, meaning it does not occur in the target.
   
So as you work through your six guesses, you acquire information about the target.  Say your guess produces a green *n* in the second position. You know the target has an *n* in the second position (so, based on what we know about English spelling, it likely has a vowel or an *s* in the first position).  Say you also have a yellow *r* in the fifth position. Then you know there's an *r* somewhere in the word, but not in fifth position (that's what yellow means), not in first position (English words can't start with *rn*), and not in second position (that's filled).  So in fact you know the *r* is in third or fourth position.  Suppose the other three letters in your guess turned black.  You file away the information that none of these letters should show up in your future guesses. And so on, combining simple logic with knowledge of facts about English.  Fun game.
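
The reasoning in that example can be written down as a small predicate. This is only a sketch: the function name `fits_clues` and the toy word list are made up for illustration, and the three black letters are left out because the text does not say which letters they are.

```python
def fits_clues(word):
    """Check a candidate against the two clues from the example above."""
    green_n = word[1] == 'n'        # green n: n is in 2nd position
    r_in_3rd_or_4th = 'r' in word[2:4]  # yellow r, narrowed down to 3rd or 4th
    r_not_5th = word[4] != 'r'      # yellow in 5th position rules out r there
    return green_n and r_in_3rd_or_4th and r_not_5th

# Hypothetical candidates, chosen only to exercise each clue
toy_words = ['snarl', 'enrol', 'inner', 'angle', 'unarm']
print([w for w in toy_words if fits_clues(w)])  # ['snarl', 'enrol', 'unarm']
```

Here 'inner' fails because its only *r* is in fifth position, and 'angle' fails because it has no *r* at all.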



### Problem statement

Write a function **color_guess** that takes a Wordle target and a Wordle guess as inputs (so both are 5-letter words), and returns the **coloring** for the guess.

A coloring is a sequence of colors (represented as the characters 'g', 'y' or 'k' [for black]) that contains the correct colors for the letters of the guess, where correct means correct according to the coloring rules of Wordle. 

The first question you should answer is what the data type of a coloring is.

The definition of the function should be just a few lines of code.

### Solution

In [None]:
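def color_guess (target, guess):
    # Start with every position black ('k'), then upgrade where warranted
    coloring = list('k'*5)
    for i in range(5):
        if target[i] == guess[i]:
            # Right letter, right spot
            coloring[i] = 'g'
        elif guess[i] in target:
            # Letter occurs in the target, but not at this spot
            coloring[i] = 'y'
    return coloring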
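
The solution above colors a guess letter yellow whenever that letter occurs anywhere in the target. The game itself is stricter when a guess repeats a letter: each copy of a letter in the target can account for only one green or yellow. Below is a sketch of a two-pass variant that follows that rule. The name `color_guess_counts` is made up for this example, and it returns a string rather than a list, matching the way colorings are written in the calls further down.

```python
from collections import Counter

def color_guess_counts(target, guess):
    """Color a guess so that each target letter justifies at most one g or y."""
    coloring = ['k'] * len(guess)
    # Target letters that were not matched exactly; only these can earn a yellow
    unmatched = Counter(t for t, g in zip(target, guess) if t != g)
    # Pass 1: exact matches are green
    for i, (t, g) in enumerate(zip(target, guess)):
        if t == g:
            coloring[i] = 'g'
    # Pass 2: yellows, left to right, while unmatched copies remain
    for i, g in enumerate(guess):
        if coloring[i] != 'g' and unmatched[g] > 0:
            coloring[i] = 'y'
            unmatched[g] -= 1
    return ''.join(coloring)

print(color_guess_counts('proxy', 'tires'))  # kkykk
print(color_guess_counts('proxy', 'spoor'))  # kygky
```

With target 'proxy' and guess 'spoor', the first *o* is green and uses up the only *o* in the target, so the second *o* stays black. The simpler function above would color it yellow.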

### Extra credit

For extra credit use an English word list to return the set of words compatible with the coloring and the guess. For example you can iterate through
`nltk.corpus.words.words()`.

```
from nltk.corpus import words

def find_compatible_words (guess, coloring):
    result = []
    for wd in words.words():
       <do some cool stuff>
    return result
```

Needless to say, you should return only 5-letter words, and don't forget to use negative information (for most colorings, you know some letters that can't be in the target).

In the solution below, we use a more complete lexical resource, **WordNet**, instead of
`nltk.corpus.words`.

### Solution

In [1]:
from nltk.corpus import wordnet as wn
# For a more natural list of words than nltk.words()
from string import ascii_lowercase,digits
digits = set(digits)

def get_active_words_wn (lang='eng',length=5):
    """
    This accesses the multilingual WordNet resource for an English wordlist
    """
    return {ln for w in wn.all_synsets() for ln in w.lemma_names(lang=lang) 
                 if len(ln) == length and  '_' not in ln and ln.lower() == ln 
                 and digits.intersection(ln) == set()}

WordNet is a large multilingual database pairing words and meanings.

WordNet implements two key ideas.  The **senses** (or meanings) of a word are language-independent
concepts represented in a **very large** concept graph.

What we want for Wordle purposes are **lemmas** (pairings of a sense and a spelling),
which are language-particular, and of course we want English lemmas.



In [13]:
wn.synsets('dog')[0].lemmas(lang='eng')

[Lemma('dog.n.01.dog'),
 Lemma('dog.n.01.domestic_dog'),
 Lemma('dog.n.01.Canis_familiaris')]

Below, just for fun, we use a more general function than we need, `get_active_words_wn`, which
collects all five-letter words in a given language (if it's in WordNet!).

For example, let's collect all 5-letter French words.

In [19]:
f_wds = get_active_words_wn (lang='fra')


Since `f_wds` is a set, we can't just look at the first 20 elements, so instead we look at a random sample.

```
from random import sample

# random.sample needs a sequence (sets are rejected as of Python 3.11),
# so sort the set into a list first
>>> sample(sorted(f_wds),20)
['reine',
 'mikvé',
 'geste',
 'orgue',
 'luire',
 'anode',
 'éluer',
 'jaune',
 'kobus',
 'hindi',
 'dingo',
 'osier',
 'lupin',
 'gruau',
 'ajuga',
 'rumen',
 'prise',
 'unix™',
 'axial',
 'gecko']
```

If you know French, you'll notice that the set contains some pretty oddball words.

Notice that "Unix" with the trademark symbol counts as a 5-letter word.  Just one of many
surprises you will experience once you start working with Unicode.
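
To see why, here is a short sketch using only the standard library. The helper `is_plain_word` is a made-up name, not part of the solution, and it shows just one possible way to tighten the filter: `str.isalpha` accepts accented letters but rejects symbols such as the trademark sign.

```python
import unicodedata

word = 'unix\u2122'                     # 'unix' followed by the trademark sign
print(len(word))                        # 5: len counts code points, not letters
print(unicodedata.name(word[-1]))       # TRADE MARK SIGN
print(unicodedata.category(word[-1]))   # So (a symbol, not a letter)
print(word.isalpha())                   # False
print('\u00e9luer'.isalpha())           # True: an accented letter is still a letter

def is_plain_word(s, length=5):
    """Hypothetical stricter filter: right length, letters only, all lowercase."""
    return len(s) == length and s.isalpha() and s == s.lower()

print(is_plain_word(word))              # False
```

A filter like this would also drop hyphenated entries, which may or may not be what you want.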


Here's code for implementing our solution, discussed below.

In [29]:
use_wordnet = True


if use_wordnet:
    from nltk.corpus import wordnet as wn
    # For a more natural list of words than nltk.words()
    from string import ascii_lowercase,digits
    digits = set(digits)

    def get_active_words_wn ():
        return {ln for w in wn.all_synsets() for ln in w.lemma_names() 
                     if len(ln) == 5 and  '_' not in ln and ln.lower() == ln 
                     and digits.intersection(ln) == set()}

    active_words = get_active_words_wn()
else:
    from nltk.corpus import words
    # Use only 5-letter words; avoid capitalized words (names)
    active_words = {w for w in words.words() if len(w) == 5 and w.title() != w}
    # Missing word that turned up in a recent NYT Wordle
    # active_words.add('caulk')


def find_compatible_words (guess, coloring,verbose=False):
    result = [wd for wd in active_words\
              if test_word(guess,coloring,wd)]
    if verbose:
        print(f'guess={guess} coloring = {coloring} {len(result):>4,} words found')
    return result

def test_word(guess, coloring, wd):
    """
    Limitation: This version does not work right
    with guesses that contain repeated letters.
    """
    for (w, g, c) in zip(wd, guess, coloring):
        if c == 'k' and g in wd:
                return False
        elif c == 'y':
            # 'y' means g is in the word but not at this position
            if w==g or g not in wd:
                return False
        elif c=='g':
            if w != g:
                return False
    return True

# Four guesses in a row, each one narrowing the set of compatible words
L1 = find_compatible_words('tires','kkykk', verbose = True)
L2 = find_compatible_words('drone','kggkk', verbose = True)
L3 = find_compatible_words('proud','gggkk', verbose = True)
L4 = find_compatible_words('proxy','ggggg', verbose = True)

# A case like the one discussed above: the 2nd letter of the guess is green,
# and the last letter of the guess [= "r"] is yellow, so it is in the word
# but not in 5th position.
L5 = find_compatible_words('spoor','kgkky')
L5

guess=tires coloring = kkykk  205 words found
guess=drone coloring = kggkk   35 words found
guess=proud coloring = gggkk   13 words found
guess=proxy coloring = ggggg    1 words found


['apery', 'apart']

In [7]:
len(active_words)

4158

`active_words` is a set so we can't just look at the first 20 elements. 

Let's look at a random sample.

In [17]:
from random import sample
# Sort first: random.sample needs a sequence (sets are rejected as of Python 3.11)
sample(sorted(active_words),20)

['gourd',
 'needs',
 'buddy',
 'class',
 'nylon',
 'drawl',
 'slain',
 'radar',
 'shirt',
 'guava',
 'imply',
 'ileum',
 'liver',
 'sherd',
 'honey',
 'combo',
 'esker',
 'haven',
 'peril',
 'crypt']

In [31]:
S0 = set(find_compatible_words('broad', 'kkkkk'))
len(S0)

701

In [32]:
S1 = set(find_compatible_words('fight', 'kykyk'))
len(S1)

64

### Finding a good first guess word.

'earth' is a good initial guess.  Here's one reason why.

In [34]:
len(find_compatible_words('earth', 'kkkkk'))

482

Striking out (all five letters black) still leaves a relatively small set of candidate words.

Can we find the perfect starting guess?

We want a word which, if it strikes out, is compatible with a very small set of words.

In [35]:
best_wd, wd_set_sz = 'earth', 482

for this_wd in active_words:
    fail_set = find_compatible_words(this_wd, 'kkkkk')
    this_wd_set_sz = len(fail_set)
    if this_wd_set_sz < wd_set_sz:
        best_wd,wd_set_sz = this_wd, this_wd_set_sz
        
print(best_wd,wd_set_sz)

aloes 285
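
The same search can be written more compactly with `min` and a key function. The sketch below runs on a small made-up word list so the result can be checked by hand. `strikeout_size` is a hypothetical helper, and it relies on the fact that a word survives an all-black coloring exactly when it shares no letter with the guess.

```python
def strikeout_size(guess, words):
    """Number of words sharing no letter with guess (the 'kkkkk' outcome)."""
    letters = set(guess)
    return sum(1 for w in words if letters.isdisjoint(w))

# Toy list for illustration only; the real search runs over active_words
toy_words = ['earth', 'aloes', 'crypt', 'lymph', 'nymph', 'fjord', 'quick']

# min returns the first word with the smallest score, so ties go to the
# earliest word in the list
best = min(toy_words, key=lambda w: strikeout_size(w, toy_words))
print(best, strikeout_size(best, toy_words))  # earth 1
```

On this toy list, only 'quick' survives a strikeout with 'earth'. A strikeout with 'quick' would leave five words standing.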


### Extra credit

This is for the extra extra credit: a version with an excluded-letters argument.

In [30]:
## Version II with optional excluded letters argument

def find_compatible_words2 (guess, coloring, excluded_letters = ''):
    return [wd for wd in active_words \
              if test_word2(guess,coloring,wd, excluded_letters)]


def test_word2(guess, coloring, wd, excluded_letters):
    for (w, g, c) in zip(wd, guess, coloring):
        if w in excluded_letters:
            return False
        if c == 'k' and g in wd:
                return False
        elif c == 'y':
            if w==g or g not in wd:
                return False
        elif c=='g':
            if w != g:
                return False
    return True

# Same case as before: the 2nd letter of the guess is green, and the last
# letter of the guess [= "r"] is yellow. With no excluded letters, version II
# behaves like the original.
find_compatible_words2('spoor','kgkky')

['apery', 'apart']

Let's check out the version that implements an `excluded_letters` argument.

The guess in the second Wordle game was quite insightful.

In [10]:
coloring0 = 'gykgk'
find_compatible_words2('pinch', coloring0, excluded_letters = "stealnh")

['pricy', 'prick']
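
The `excluded_letters` string was typed in by hand above. If earlier guesses are kept as (guess, coloring) pairs, it could be computed instead. The sketch below makes that assumption; both the helper name `excluded_from_history` and the example history (including the first guess 'steal') are made up for illustration.

```python
def excluded_from_history(history):
    """Collect letters that were colored 'k' and never colored 'g' or 'y'.

    history is an iterable of (guess, coloring) pairs.
    """
    black, present = set(), set()
    for guess, coloring in history:
        for g, c in zip(guess, coloring):
            if c == 'k':
                black.add(g)
            else:
                present.add(g)
    # A letter that is black in one spot but green or yellow elsewhere
    # is still in the target, so it must not be excluded
    return ''.join(sorted(black - present))

# Hypothetical history, for illustration only
history = [('steal', 'kkkkk'), ('pinch', 'gykgk')]
print(excluded_from_history(history))  # aehlnst
```

The result can be passed straight to `find_compatible_words2` as `excluded_letters`, since `test_word2` only checks membership and the order of the letters does not matter.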

WordNet does much more than look up possible word forms.

Here's a brief demo for using it as a dictionary.

In [16]:
def get_definitions(word_set,language = None):
    """
    Need to check that synset has at least one lemma in the given language.
    """
    for wd in word_set:
        print(wd)
        for (i,ss) in enumerate(wn.synsets(wd)):
            print(f'{i+1}. {ss.definition()}.',end= '  ')
            print()
        print()
        print()

get_definitions({'helix'})

helix
1. a curve that lies on the surface of a cylinder or cone and cuts the element at a constant angle.  
2. a structure consisting of something wound in a continuous series of loops.  
3. type genus of the family Helicidae.  


