# F# Advent Calendar 2022

This is my contribution to the F# Advent Calendar 2022: a relatively short post on using F# to help solve Wordle.
For those who've been living under a rock this year, Wordle is a word guessing game created by Josh Wardle, and now owned by the NY Times.

The game involves iteratively attempting to guess a secret word, where each guess is 'scored' against this secret word. If a letter of the guess is in the correct position in the secret word, then that letter is scored as Green. If it's in the word but not in the correct position then it's Yellow, and it's Grey if it's not contained in the word at all.

You continue to guess until the game is won - where each letter is scored Green - or you run out of attempts.  You only get six attempts to guess the word.

The first thing to do is some basic analysis on five-letter words, to see if there are any patterns.

I found a dictionary of words on the internet, so I will use this to create a collection of five-letter words.  For those of you familiar with Wordle, you'll know that the way Josh structured the game is that the wordle is selected from a set of 2315 words, but each word you guess is checked to ensure that it exists in a much larger population of words (approx 11,000).  This may have changed somewhat since the NY Times bought Wordle from Josh, and also since they now use a dedicated word setter rather than Josh's original list. You can scrape these lists from the JavaScript source code; however, in order not to spoil things, I will use my internet-downloaded set of words. The stats don't change much between the Wordle set of words and the internet set of words.

### Part 1 - Analysing five letter words
First of all, some useful functions to load the words from the file.

In [2]:
open System.IO

let rec listToString l =
    match l with
    | [] -> ""
    | head :: tail -> head.ToString() + listToString tail

let sortString s =
    s |> Seq.sort |> Seq.toList |> listToString

let fiveLetterWords fileName = 
    fileName 
    |> File.ReadAllLines 
    |> Seq.filter (fun w -> w.Length = 5)

In [44]:
let words = fiveLetterWords @"../words.txt"
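
A downloaded dictionary may mix upper and lower case, or include entries with apostrophes or trailing spaces. I haven't checked whether that applies to the file used here, so treat the following as a defensive sketch rather than part of the original analysis. The helper name `cleanFiveLetterWords` is made up for this example, and it takes lines that have already been read instead of a file name.

```fsharp
open System

// Hypothetical helper (not part of the original notebook): keep only
// five-letter, purely alphabetic words, lower-cased and de-duplicated.
let cleanFiveLetterWords (lines: seq<string>) =
    lines
    |> Seq.map (fun line -> line.Trim().ToLowerInvariant())
    |> Seq.filter (fun w -> w.Length = 5 && w |> Seq.forall (fun c -> Char.IsLetter c))
    |> Seq.distinct
    |> Seq.toList

// Made-up input lines, purely for illustration.
let sample =
    cleanFiveLetterWords [ "Arose"; "arose "; "it's"; "unlit"; "can't"; "favour" ]
// sample = ["arose"; "unlit"]
```

Lower-casing matters for the frequency counts that follow, because 'S' and 's' would otherwise be counted as different letters.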

In [4]:
type Frequency = {Letter: char; Frequency: int}

let frequencies = 
    words 
    |> Seq.collect id 
    |> Seq.countBy id 
    |> Seq.sortByDescending snd

let frequenciesMap =
    frequencies
    |> Map.ofSeq

frequencies |> Seq.take 10 |> Seq.map (fun (l, f) -> {Letter =  l; Frequency = f})


index,Letter,Frequency
0,s,4331
1,e,4303
2,a,3665
3,r,2733
4,o,2712
5,i,2428
6,l,2293
7,t,2154
8,n,1867
9,d,1611


Unsurprisingly, the letter e features near the top of the list, although I didn't expect the letter s to be so frequent.
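
To make the `Seq.collect id |> Seq.countBy id` pipeline a little more concrete, here is the same idea run over a tiny, made-up word list (the names `sampleWords` and `sampleCounts` are only for this illustration):

```fsharp
// Tiny made-up word list, purely for illustration.
let sampleWords = [ "arose"; "raise"; "unlit" ]

let sampleCounts =
    sampleWords
    |> Seq.collect id            // flatten the words into one stream of letters
    |> Seq.countBy id            // pair each distinct letter with its count
    |> Seq.sortByDescending snd  // most frequent letters first
    |> Seq.toList

// Lookup table from letter to count, like frequenciesMap above.
let sampleCountsMap = Map.ofList sampleCounts
```

Here 'a', 'r', 's', 'e' and 'i' each occur twice and the other five letters once, so `sampleCountsMap.['a']` is 2 and `sampleCountsMap.['o']` is 1.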

Let's now look at individual letter frequencies in each position, i.e. what's the most common first, last letter etc.


In [52]:
let letterPositionFrequency (words: string seq) = 
    words
    |> Seq.map (fun w -> w.ToCharArray() |> Array.indexed)
    |> Seq.collect id
    |> Seq.groupBy fst
    |> Seq.map (fun (p, cs) -> p, cs |> Seq.countBy snd |> Seq.sortByDescending snd)

let mostFrequentLetterPosition words = 
    words 
    |> letterPositionFrequency
    |> Seq.map (fun (p, lfs) -> p, lfs |> Seq.take 1)

words |> mostFrequentLetterPosition

index,Item1,Item2
0,0,"[ ( s, 1023 ) ]"
1,1,"[ ( a, 1408 ) ]"
2,2,"[ ( a, 823 ) ]"
3,3,"[ ( e, 1656 ) ]"
4,4,"[ ( s, 2585 ) ]"


You can see that the most frequent letters in each position are S, A, A, E, S, which isn't that surprising.
However, this is using a far larger set of five-letter words than the one used in Wordle.

So let's try this with the actual Wordle dataset (this was taken last April, so may well have changed by now).

In [50]:
let wordleWords = File.ReadAllLines @"..\Python\wordle_words.txt"
wordleWords |> Seq.length

In [53]:
wordleWords |> mostFrequentLetterPosition

index,Item1,Item2
0,0,"[ ( s, 366 ) ]"
1,1,"[ ( a, 304 ) ]"
2,2,"[ ( a, 307 ) ]"
3,3,"[ ( e, 318 ) ]"
4,4,"[ ( e, 424 ) ]"



Next we want to score each word by the frequency of its individual letters, to see if there is a word that contains the most frequent letters.

The five most frequent letters in frequency order are: 's', 'e', 'a', 'r', 'o'.

Let's try to find a word that uses these letters.

One way of doing this is to create an aggregate score for each word, based on the frequency score of each individual letter.

You really don't want to choose an initial guess word that contains repeated letters, as this doesn't reduce your search space, so I've penalised such words by dividing their score by the product of the number of times each letter occurs.

In [5]:
type WordScore = {Word: string; Score: int}

let letterScores =
    words
    |> Seq.map (fun word ->
        let multiplier = 
            word 
            |> Seq.groupBy id 
            |> Seq.fold (fun state (f, s) ->  state * Seq.length s) 1
        let freq = 
            word 
            |> Seq.fold (fun state letter -> state + Map.find letter frequenciesMap) 0
        word, freq / multiplier)
    |> Seq.sortByDescending snd

letterScores |> Seq.take 10 |> Seq.map (fun (l, f) -> {Word = l; Score = f})

index,Word,Score
0,arose,17744
1,arise,17460
2,raise,17460
3,serai,17460
4,arles,17325
5,earls,17325
6,lares,17325
7,laser,17325
8,lears,17325
9,rales,17325


Above, I've listed the top 10 words, i.e. words that use the most frequent letters, where the letters are all distinct.
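
As a worked instance of the scoring rule, the sketch below applies it to single words, using the letter counts from the frequency table earlier in the post. `scoreWord` and `topFrequencies` are names invented for this example, and "eases" is picked only because it repeats two letters; I haven't checked that it appears in the word list.

```fsharp
// Letter counts copied from the frequency table earlier in this post.
let topFrequencies =
    Map.ofList [ 's', 4331; 'e', 4303; 'a', 3665; 'r', 2733; 'o', 2712 ]

// Hypothetical helper: the same rule as letterScores, for one word.
let scoreWord (frequencies: Map<char, int>) (word: string) =
    // Product of the occurrence counts of each distinct letter.
    let multiplier =
        word
        |> Seq.countBy id
        |> Seq.fold (fun state (_, count) -> state * count) 1
    // Sum of the frequencies of every letter in the word.
    let freq = word |> Seq.sumBy (fun letter -> Map.find letter frequencies)
    freq / multiplier

let aroseScore = scoreWord topFrequencies "arose" // 17744: no repeated letters
let easesScore = scoreWord topFrequencies "eases" // 20933 / (2 * 2) = 5233
```

"eases" has a higher raw letter total than "arose" (20933 against 17744), but the two repeated letters divide it by four, which is what pushes words like this well down the ranking.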

So I've chosen "AROSE" as the first word that I will use in Wordle.  Many people chose "AROSE" as a starting word, probably having undertaken the same (basic) analysis.
It's since been shown that "AROSE" isn't the best starting word, but as I've used it since February, I've decided to stick with it for now.

Now there is a choice. If you play in 'hard mode', which means that you have to reuse letters scored in the previous round, then there is little point finding a generic second word.

However, if you are not using 'hard mode', then it may well be worthwhile finding a second word.  

My thinking here is that I should choose the word that scores highly, but doesn't use any of the same letters as AROSE.

In [9]:
let aroseSet = "arose" |> Set.ofSeq

let secondWord = 
    letterScores
    |> Seq.find (fun (word, _) ->
        Set.intersect aroseSet (word |> Set.ofSeq)
        |> Set.count |> (fun count -> count = 0))    
        
secondWord |> (fun (l, f) -> {Word = l; Score = f})

Word,Score
unlit,10300


Therefore my next word will be "UNLIT".
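
The "no letters in common" test can also be pulled out into a small helper. This is only a sketch: `sharesNoLetters` and the candidate list are invented for the example, and it uses `List.tryFind` so that an empty result comes back as `None`, whereas `Seq.find` raises an exception when nothing matches.

```fsharp
// Hypothetical helper: true when the two words have no letter in common.
let sharesNoLetters (first: string) (second: string) =
    Set.intersect (Set.ofSeq first) (Set.ofSeq second) |> Set.isEmpty

// Made-up candidates, assumed to be sorted by score with the best first.
let rankedCandidates = [ "arise"; "laser"; "unlit"; "pudgy" ]

// First candidate that shares no letters with the opening word.
let followUp = rankedCandidates |> List.tryFind (sharesNoLetters "arose")
// followUp = Some "unlit"
```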

Using "AROSE" and "UNLIT" while in 'easy mode' worked well for me.  After 100 or so games, I have a mode of 3 (see stats page), although I didn't have any luck guessing below 3, obviously.

But the WhatsApp group that I play Wordle with decided to move to 'hard mode', so I could no longer use "UNLIT" as my second word, as I have to choose a second word which contains letters from "AROSE" in the right position, as given by the scoring algorithm.
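
To show what that constraint means in code, here is a rough sketch of a hard-mode check. It assumes the usual reading of the rule: Green letters must stay in the same position and Yellow letters must appear somewhere in the next guess. `satisfiesHardMode` is a made-up name, the `AnswerMask` type is repeated from the Scoring section so that the block stands alone, and the check ignores subtleties such as the same letter being revealed more than once.

```fsharp
// Same shape as the AnswerMask type used in the Scoring section.
type AnswerMask = Green | Yellow | Grey

// Hypothetical check: does `next` honour the hints revealed for `previous`?
let satisfiesHardMode (previous: string) (hints: AnswerMask list) (next: string) =
    // Pair every letter of the previous guess with its hint and position.
    let revealed = List.zip (List.ofSeq previous) hints |> List.indexed
    // Green letters must be reused in exactly the same position.
    let greensKept =
        revealed
        |> List.forall (fun (i, (letter, hint)) -> hint <> Green || next.[i] = letter)
    // Yellow letters must appear somewhere in the next guess.
    let yellowsUsed =
        revealed
        |> List.forall (fun (_, (letter, hint)) -> hint <> Yellow || Seq.contains letter next)
    greensKept && yellowsUsed

// "arose" scored against the wordle "favor" gives Yellow, Yellow, Yellow, Grey, Grey.
let aroseHints = [ Yellow; Yellow; Yellow; Grey; Grey ]
let unlitAllowed = satisfiesHardMode "arose" aroseHints "unlit" // false
let carolAllowed = satisfiesHardMode "arose" aroseHints "carol" // true
```

"UNLIT" is rejected because it uses none of the three Yellow letters, which is exactly why a fixed second word stops working in hard mode.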

### Scoring

The next step in our investigations is to choose a strategy and run a simulation on all 2305 wordle answer words, to see how that strategy fares each day.

For those not familiar with Wordle, it has a fixed list of words (2305 earlier in the year, when I tried) that will be the daily secret word.  Each day a specific word is chosen from the list and everyone in the world tries to guess the same word.

However, what we can do is take the word list from Wordle (you can scrape it from the JavaScript source, or just search the internet for one) and run our simulation to see whether we are able to solve the wordle each day, taking note of the average score for the whole set of 2305.

The first thing we need to do is create a scorer function, i.e. a function that, given a guess and the wordle, will score each letter Grey, Green or Yellow.

In [13]:
type AnswerMask = Green | Yellow | Grey

module Counter =
    let createCounter items =
        items
        |> List.filter (fun (a, g) -> a <> g)
        |> List.map fst
        |> List.countBy id
        |> Map.ofList

    let countOf counter item =
        match Map.tryFind item counter with
        | Some c -> c
        | None -> 0

    let updateCount counter item =
        match Map.tryFind item counter with
        | Some c -> Map.add item (c - 1) counter
        | None -> counter

let scoreGuess actual guess =

    let letters = Seq.zip actual guess |> Seq.toList

    let folder ((count, mask): Map<'a,int> * AnswerMask list) (a, g) =
        if a = g then
            count, Green :: mask
        elif Seq.contains g actual && Counter.countOf count g > 0 then
            Counter.updateCount count g, Yellow :: mask
        else
            count, Grey :: mask

    List.fold folder (Counter.createCounter letters, []) letters |> snd |> List.rev

Now, scoring the guess 'arose' against the wordle 'favor', we see that the answer mask returned is Yellow, Yellow, Yellow, Grey, Grey.

In [28]:
type ScoreResult = {Wordle: string; Guess: string; Result: AnswerMask list}

let getScoreResult (wordle:string) (guess: string) = 
     wordle.ToUpper(), guess.ToUpper(), scoreGuess wordle guess

let tests = 
    [
        "favor", "arose"
        "favor", "ratio"
        "favor", "carol"
        "favor", "vapor"
        "arose", "speed"
        "treat", "speed"
    ]


tests 
|> List.map (fun (wordle, guess) -> getScoreResult wordle guess)
|> List.map (fun (w, g, r) -> {Wordle = w ; Guess = g; Result = r})

index,Wordle,Guess,Result
0,FAVOR,AROSE,"[ Yellow, Yellow, Yellow, Grey, Grey ]"
1,FAVOR,RATIO,"[ Yellow, Green, Grey, Grey, Yellow ]"
2,FAVOR,CAROL,"[ Grey, Green, Yellow, Green, Grey ]"
3,FAVOR,VAPOR,"[ Yellow, Green, Grey, Green, Green ]"
4,AROSE,SPEED,"[ Yellow, Grey, Yellow, Grey, Grey ]"
5,TREAT,SPEED,"[ Grey, Grey, Green, Grey, Grey ]"


One key point to note here is the handling of a letter that is repeated in the guess, which many of the examples I've seen get wrong.

If the wordle is "AROSE" and your guess is "SPEED", then you should score it as Yellow, Grey, Yellow, Grey, Grey: only the first instance of "E" in "SPEED" is scored as Yellow, and the second instance is just Grey, because "AROSE" contains only one "E".
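
One way to picture the rule is as two passes: first count the answer's letters that were not matched exactly, then spend one of those per Yellow while walking the guess left to right. The sketch below is a hypothetical alternative to `scoreGuess` (the name `scoreTwoPass` is made up, and it uses a mutable `Dictionary` instead of a fold), shown only to illustrate the rule; the `AnswerMask` type is repeated so that the block stands alone.

```fsharp
open System.Collections.Generic

type AnswerMask = Green | Yellow | Grey

// Hypothetical two-pass variant of the scorer, for illustration only.
let scoreTwoPass (actual: string) (guess: string) =
    // Pass 1: count the answer's letters that are not exact matches.
    let remaining = Dictionary<char, int>()
    for (a, g) in Seq.zip actual guess do
        if a <> g then
            let seen =
                match remaining.TryGetValue(a) with
                | true, n -> n
                | _ -> 0
            remaining.[a] <- seen + 1
    // Pass 2: walk the guess, spending one unmatched letter per Yellow.
    [ for (a, g) in Seq.zip actual guess do
        if a = g then
            yield Green
        else
            match remaining.TryGetValue(g) with
            | true, n when n > 0 ->
                remaining.[g] <- n - 1
                yield Yellow
            | _ -> yield Grey ]

let speedVsArose = scoreTwoPass "arose" "speed"
// speedVsArose = [Yellow; Grey; Yellow; Grey; Grey]
```

Once the single unmatched "E" has been spent on the first "E" of "SPEED", nothing is left for the second one, so it falls through to Grey.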

As an example of the mistake, many people do the following:

In [42]:
let actual = "arose"
let guess = "speed"
let dodgyScorer (actual:string) (guess: string) = 
    let letters = Seq.zip actual guess |> Seq.toList
    let rec masker ls mask =
        match ls with
        | [] -> mask
        | (a, g) :: t when a = g -> masker t (Green :: mask)
        | (a, g) :: t when actual.Contains(g |> string) -> masker t (Yellow :: mask)
        | h :: t -> masker t (Grey :: mask)
    actual, guess, masker letters [] |> List.rev

dodgyScorer actual guess |> fun (w, g, r) -> {Wordle = w; Guess = g; Result = r}

Wordle,Guess,Result
arose,speed,"[ Yellow, Grey, Yellow, Yellow, Grey ]"


In [43]:
tests 
|> List.map (fun (wordle, guess) -> dodgyScorer wordle guess)
|> List.map (fun (w, g, r) -> {Wordle = w ; Guess = g; Result = r})

index,Wordle,Guess,Result
0,favor,arose,"[ Yellow, Yellow, Yellow, Grey, Grey ]"
1,favor,ratio,"[ Yellow, Green, Grey, Grey, Yellow ]"
2,favor,carol,"[ Grey, Green, Yellow, Green, Grey ]"
3,favor,vapor,"[ Yellow, Green, Grey, Green, Green ]"
4,arose,speed,"[ Yellow, Grey, Yellow, Yellow, Grey ]"
5,treat,speed,"[ Grey, Grey, Green, Yellow, Grey ]"
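
The difference between the two scorers can be checked mechanically: a valid mask should never mark more copies of a letter as Green or Yellow than the wordle actually contains. The helper below is a hypothetical sanity check (`overCountedLetters` is a made-up name, and `AnswerMask` is repeated so that the block stands alone), applied to the masks from the two result tables above.

```fsharp
type AnswerMask = Green | Yellow | Grey

// Hypothetical sanity check: letters that the mask marks Green or Yellow
// more often than they occur in the answer.
let overCountedLetters (actual: string) (guess: string) (mask: AnswerMask list) =
    List.zip (List.ofSeq guess) mask
    |> List.filter (fun (_, m) -> m <> Grey)
    |> List.countBy fst
    |> List.filter (fun (letter, marked) ->
        let available = actual |> Seq.filter (fun c -> c = letter) |> Seq.length
        marked > available)
    |> List.map fst

// Masks copied from the result tables above for the guess "speed".
let fromDodgy = overCountedLetters "arose" "speed" [ Yellow; Grey; Yellow; Yellow; Grey ]
let fromCorrect = overCountedLetters "arose" "speed" [ Yellow; Grey; Yellow; Grey; Grey ]
// fromDodgy = ['e'], fromCorrect = []
```

The dodgy mask claims two copies of "E" in a word that has only one, which would send a solver looking for answers that cannot exist.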
