# Fun word patterns exhibited by Wordle *grid shares*.

[Wordle](https://www.powerlanguage.co.uk/wordle/) is beautifully designed so that you can share your enjoyment of the game with friends, without spoiling their enjoyment of the game.
How is that done? In Wordle, people can easily post their result grids to social media using the "share" option offered by the game. These posts include the feedback for each guess. But
[as noted by the inventor Josh Wardle himself](https://twitter.com/powerlanguish/status/1471493886031773707),
to avoid spoilers, the shares only contain the colors indicating whether the letters in the guess were in the target word, or also in the right spot.
They look like this (using the "colorblind" setting, which I dearly appreciate):

```
Wordle 218 4/6

⬜⬜⬜🟦⬜
🟧⬜⬜⬜⬜
⬜⬜🟦🟦⬜
🟧🟧🟧🟧🟧
```
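
A share like the one above can be turned into something easier to compute with. The following is a minimal sketch, not part of the original analysis: `parse_share` and the `SQUARES` table are hypothetical names, and the letter codes (`G`, `Y`, `-`) are an assumption chosen for illustration. It treats the colorblind squares (orange and blue) the same as green and yellow.

```python
import re

# Map the default and colorblind emoji to one-letter codes (codes are an assumption).
# G = right letter, right spot; Y = right letter, wrong spot; - = letter not in word.
SQUARES = {
    "🟩": "G", "🟧": "G",
    "🟨": "Y", "🟦": "Y",
    "⬜": "-", "⬛": "-",
}

def parse_share(text):
    """Return (day, score, rows) for a pasted share; rows are strings like 'G-YY-'."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    header = re.match(r"Wordle (\d+) ([1-6X])/6", lines[0])
    if header is None:
        raise ValueError("not a Wordle share")
    # Ignore any character that is not one of the known squares.
    rows = ["".join(SQUARES[ch] for ch in line if ch in SQUARES) for line in lines[1:]]
    return int(header.group(1)), header.group(2), rows

share = """Wordle 218 4/6

⬜⬜⬜🟦⬜
🟧⬜⬜⬜⬜
⬜⬜🟦🟦⬜
🟧🟧🟧🟧🟧
"""
day, score, rows = parse_share(share)
print(day, score, rows)
```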


But what do those shares tell you about interesting patterns in words? And could you ever figure out the solution to a Wordle just by seeing a share?

It turns out that occasionally, a share indirectly points out the existence of a limited set of other words similar to the day's answer. That can give you hints, if you are the type to dig in and play with such word patterns. You can call it cheating, or you can call it extending the game to suit your tastes. I figure that's what games are often all about.

Here's an example. For about 18% of the pre-defined daily answers (roughly one day in five), swapping two letters of the answer produces another valid word. For example, one share from January 16 looks like this:

```
Wordle 211 4/6

🟩⬜🟨🟨⬜
🟩🟨🟨⬜⬜
🟩🟩🟨🟩🟨
🟩🟩🟩🟩🟩
```

The second-to-last row indicates that the answer just requires swapping the last letter with the third letter.
That means we can look for all 5-letter words that turn into another valid word when those two letters are swapped. We'll find that there are 236 such words, for example "*bares*" and "*baser*". That narrows the search considerably from the
[12,972 words that are legal guesses in Wordle identified by tglaiel](https://medium.com/@tglaiel/the-mathematically-optimal-first-guess-in-wordle-cbcb03c19b0a).

"*Hmmm*", you might think. "*What is a minimal spoiler post and the underlying solution?*"
It turns out that the shortest post that uniquely identifies a solution is just two rows long!

In particular, my analysis suggests that there are only two words for which two rows of such "swap" clues would uniquely identify the word. For now, I'll leave them as an exercise for the reader. Note that only one of them shows up in the list of 2315 pre-defined daily answers (which are also identified in [tglaiel's post above](https://medium.com/@tglaiel/the-mathematically-optimal-first-guess-in-wordle-cbcb03c19b0a)). It turns out to be a delightfully suitable word! There may well be other 2-row solutions using different kinds of hints.

I don't expect that these unusual "spoilers" will really cause anyone any heartache. But perhaps they'll lead you to other ways to play with the game.

For the Python code behind this analysis (and actual spoilers), see [wordle_spoilers](githublink).

----

In general, the approach is:
* treat each posted row as a clue restricting the set of possible answers
* find intersection of sets for all the clues you've uncovered
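
The two steps above can be sketched on a toy word list. This is only an illustration under stated assumptions: `swap_partners` is a hypothetical helper, and `toy_words` is a hand-picked set standing in for the full list of legal words.

```python
def swap_partners(words, i, j):
    """Words that become a different valid word when letters i and j are exchanged."""
    found = set()
    for w in words:
        letters = list(w)
        letters[i], letters[j] = letters[j], letters[i]
        swapped = "".join(letters)
        if swapped != w and swapped in words:
            found.add(w)
    return found

# Toy word list (an assumption for illustration, not the real Wordle list).
toy_words = {"grail", "argil", "glair", "flair", "frail", "bares", "baser"}

# Each observed row is a clue; each clue restricts the answer to a set of words.
clue_sets = [
    swap_partners(toy_words, 0, 2),  # a row with yellows at positions 0 and 2
    swap_partners(toy_words, 1, 4),  # a row with yellows at positions 1 and 4
]

# The answer must be consistent with every clue, so intersect the sets.
candidates = set.intersection(*clue_sets)
print(candidates)
```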

Clue types:
* 3 green, 2 yellow: a "swap": the only words in the set have a mirror which is another valid word where the yellow positions are reversed.
 For example, only 138 5-letter words have a mirror with positions 1 and 4 swapped (where first position is 0). One of the (1,4) pairs is *actin* and *antic*.
* 2 green, 3 yellow: there's another word where yellows are rotated circularly once or twice.
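* 1 green, 4 yellow: there's another word where the four yellow letters are rearranged so that none stays in place. That covers circular rotations (up to 3 times) and swapping two disjoint pairs; in all there are 9 such rearrangements (derangements) of 4 positions.
* 5 yellow: there's another word where all five letters move. That covers circular rotations (up to 4 times) and, for each of the 5-choose-2 = 10 ways to pick a pair, swapping that pair and rotating the other three once or twice; in all there are 44 derangements of 5 positions.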
* TODO: *There are probably other interesting clue types - work them out....*
* 1 yellow: Yxxxx there is another word which has a letter in the given position which is also in the answer, elsewhere. This is probably only of marginal utility.
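
Only "swap" clues are implemented in this analysis. As a rough sketch of how the "2 green, 3 yellow" clue type might be handled, the hypothetical helpers below (`rotate_positions` and `rotation_clue_set`, both assumptions and not part of the original code) rotate the letters at three chosen positions and keep the words whose rotation is another valid word. A rotated letter that lands on an identical letter would show as green rather than yellow, so those cases are excluded.

```python
def rotate_positions(word, positions, shift=1):
    """Cyclically move the letters at `positions` by `shift` places; others stay put."""
    letters = list(word)
    picked = [word[p] for p in positions]
    for k, p in enumerate(positions):
        letters[p] = picked[(k - shift) % len(positions)]
    return "".join(letters)

def rotation_clue_set(words, positions):
    """Words that have a rotation partner in `words` for the given positions."""
    found = set()
    for w in words:
        for shift in range(1, len(positions)):
            r = rotate_positions(w, positions, shift)
            # All rotated positions must differ, otherwise the row would show a green there.
            if r in words and all(r[p] != w[p] for p in positions):
                found.add(w)
    return found

# Toy word list (an assumption for illustration).
toy_words = {"spear", "spare", "grail", "argil"}
print(rotate_positions("spear", (2, 3, 4), 2))
print(rotation_clue_set(toy_words, (2, 3, 4)))
```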

TODO:
* Work out code for all the various clue types. Only "swap" clues are covered so far.
* Find intersection set for arbitrary set of clues.
* Get a Twitter feed with all Wordle clues, and run that algorithm continuously to see if we ever have enough useful clues for a spoiler in real life. Note: also look for the colorblind score posts (orange and blue).

Note an example in-real-life partial spoiler clue, for day 212, where the last two letters are swapped (3,4): https://twitter.com/lamorgan7/status/1485805254092410880

Finally, what is a minimal post and the underlying solution?

It turns out that if you notice both of these rows, "YRYRR" and "RYRRY", among the shares for any given day:
```
🟨🟩🟨🟩🟩
🟩🟨🟩🟩🟨
```

that is a spoiler: those two rows alone are enough to uniquely identify the answer, as demonstrated below. And the answer itself is pretty cool.

In [1]:
import itertools

Read in word lists from the included copy of a January-2022 version of the official code.

In [2]:
with open("main.e65ce0a5.js") as fn:
    words_js = fn.read()

Pull out words used as answers (from the solution set, in order). Avert your eyes!

In [3]:
start = words_js.index("var La=")

In [4]:
start

28458

In [5]:
end = words_js[start:].index(']')

In [6]:
end

18527

In [7]:
answer_list = eval(words_js[start+7:start+end+1])

In [8]:
answers = set(answer_list)

In [9]:
assert len(answers) == 2315

Pull out all other legal word guesses

In [10]:
startl = words_js.index(",Ta=")

In [11]:
endl = words_js[startl:].index(']')

In [12]:
startl

46986

In [13]:
endl

85260

In [14]:
seq = words_js[startl+4:startl+endl+1]

In [15]:
seq[:10]

'["aahed","'

In [16]:
seq[-10:]

'","zymic"]'

Combine answer words with the rest to get the full set of legal words.

In [17]:
words = set(eval(seq)) | answers

In [18]:
assert len(words) == 12972
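
The extraction steps above could also be wrapped in one helper that avoids `eval`. This is a sketch under the assumption that each array literal in the bundle is valid JSON (double-quoted strings, as the `seq[:10]` output suggests). `extract_word_list` is a hypothetical name, and `fake_js` is a made-up stand-in for the real file.

```python
import json

def extract_word_list(js_source, marker):
    """Return the list from the first [...] array literal that follows `marker`."""
    start = js_source.index(marker) + len(marker)   # position of the opening '['
    end = js_source.index("]", start)               # first closing ']' after it
    return json.loads(js_source[start:end + 1])

# Tiny stand-in for the real bundle (hypothetical content with the same shape).
fake_js = 'var La=["apple","baker"],Ta=["aahed","aalii"],Ia="present";'

answer_list = extract_word_list(fake_js, "var La=")
other_words = extract_word_list(fake_js, ",Ta=")
words = set(other_words) | set(answer_list)
print(answer_list, other_words)
```

Unlike `eval`, `json.loads` only parses data and never executes code from the file.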

## Define clue sets for swaps

Start with the most valuable clues: three greens and 2 yellows, and see how many words remain when each possible pair of letters is swapped.

In [19]:
# From https://stackoverflow.com/users/986059/robotbugs at https://stackoverflow.com/a/57517842/507544
def swap_indices(item, swap):
    """Swap the elements indexed by the swap tuple in the item (a string or tuple)

    >>> swap_indices("12345", (0,1))
    '21345'
    """

    s0 = min(swap)
    s1 = max(swap)
    if isinstance(item,str):
        return item[:s0]+item[s1]+item[s0+1:s1]+item[s0]+item[s1+1:]
    elif isinstance(item,tuple):
        return item[:s0]+(item[s1],)+item[s0+1:s1]+(item[s0],)+item[s1+1:]
    else:
        raise ValueError("Type not supported")

Make the `swaplists` dictionary. For every pair of positions that can be swapped (`swaps`), collect the words that turn into another valid word under that swap.

In [20]:
swaps = list(itertools.combinations(range(5), 2))

In [21]:
swaps[0]

(0, 1)

In [22]:
swaplists = {}
for swap in swaps:
    swaplists[swap] = set()
    for word in words:
        swapped = swap_indices(word, swap)
        # Skip these: swapping duplicated letters changes nothing
        if word == swapped:
            continue
        if swapped in words:
            swaplists[swap].add(word)

In [23]:
len(swaplists[(1,4)])

138

In [24]:
swaplists[(1,4)]

{'actin',
 'algor',
 'altar',
 'alter',
 'alvar',
 'antic',
 'argol',
 'artal',
 'artel',
 'artis',
 'arval',
 'astir',
 'atony',
 'ayont',
 'bacco',
 'bocca',
 'caste',
 'cento',
 'cesta',
 'chaco',
 'choco',
 'coach',
 'conte',
 'cooch',
 'doobs',
 'dooms',
 'dsobo',
 'dsomo',
 'flair',
 'frail',
 'gamme',
 'gemma',
 'gesso',
 'glair',
 'gluer',
 'gosse',
 'grail',
 'gruel',
 'hallo',
 'holla',
 'idles',
 'isled',
 'lacey',
 'lassy',
 'lycea',
 'lyssa',
 'macho',
 'manse',
 'matzo',
 'mensa',
 'mento',
 'mesto',
 'mocha',
 'monte',
 'moste',
 'motza',
 'oasis',
 'orcas',
 'oscar',
 'ossia',
 'panne',
 'panni',
 'penna',
 'pinna',
 'range',
 'renga',
 'salle',
 'salve',
 'sasse',
 'scrae',
 'sella',
 'selva',
 'sengi',
 'serac',
 'sessa',
 'singe',
 'skart',
 'skean',
 'skeet',
 'skelp',
 'skint',
 'skirt',
 'skort',
 'slain',
 'sleep',
 'sleet',
 'sloop',
 'sloot',
 'snail',
 'sneak',
 'sneap',
 'snoop',
 'snout',
 'spean',
 'speel',
 'spelk',
 'spirt',
 'spool',
 'spoon',
 'spoot',


How big is each `swaplist`?

In [25]:
[(k, len(v)) for k, v in swaplists.items()]

[((0, 1), 252),
 ((0, 2), 662),
 ((0, 3), 378),
 ((0, 4), 254),
 ((1, 2), 458),
 ((1, 3), 336),
 ((1, 4), 138),
 ((2, 3), 476),
 ((2, 4), 236),
 ((3, 4), 448)]

## Look at combinations of all possible pairs of swaps
Find the intersection of possible words for each pair of `swap`s

In [26]:
cross_indices = list(itertools.combinations(swaps, 2))

In [27]:
cross_indices[0]

((0, 1), (0, 2))

In [28]:
def inter(s1, s2):
    "Return intersection of sets for swaps s1 and s2 and other info"

    intersection = swaplists[s1] & swaplists[s2]
    return ((s1, s2), len(intersection), intersection & answers, intersection)

For every pair of swaps, build a tuple holding the pair of swap indices, the size of the intersection of their `swaplists`, the members of that intersection that are also in the answer set, and the full intersection.

In [29]:
unsorted = [(indices, inter(*indices)) for indices in cross_indices]

An example.

(We repeat the pair of swaps because when we make a dictionary from this list, we need the key for the dictionary to be the first element of the tuple.)

In [30]:
unsorted[1]

(((0, 1), (0, 3)),
 (((0, 1), (0, 3)),
  8,
  set(),
  {'agger', 'asker', 'doors', 'ewest', 'gaita', 'leans', 'octan', 'rails'}))

Collect all the intersections in a dictionary for lookup later.

In [31]:
crosses = dict(sorted(unsorted, key=lambda x:x[1][1]))

In [32]:
len(crosses)

45

Find the pairs of swaps that are unambiguous, i.e. whose intersection over all legal words has length 1. Each result also shows whether that word is an answer in the actual game.

In [33]:
uniques = [cross for cross in crosses.values() if len(cross[3]) == 1]

There are only two, only one of which (the first) is also a pre-defined answer:

In [34]:
uniques

[(((0, 2), (1, 4)), 1, {'grail'}, {'grail'}),
 (((1, 4), (2, 4)), 1, set(), {'testa'})]

**So if you notice people sharing feedback like "YRYRR" as well as "RYRRY", you know the answer is "grail"! Pretty cool.**

## Check out another example

In [35]:
cross = crosses[((0, 1), (0, 4))]

In [36]:
cross

(((0, 1), (0, 4)),
 14,
 {'later', 'lever'},
 {'dault',
  'elans',
  'etats',
  'lames',
  'later',
  'lever',
  'natis',
  'raked',
  'ramet',
  'rases',
  'ratel',
  'roles',
  'saker',
  'talas'})

In [37]:
# Validate a sample result
print(f'Intersection between sets for {cross[0]}: {swaplists[cross[0][0]] & swaplists[cross[0][1]]}')
for word in cross[2]:
    for swap in cross[0]:
        swapped = swap_indices(word, swap)
        print(f'{word=}, {swap=}, {swapped=}, {word in answers}, {swapped in words}')

Intersection between sets for ((0, 1), (0, 4)): {'ratel', 'rases', 'saker', 'natis', 'roles', 'elans', 'etats', 'raked', 'dault', 'later', 'lames', 'ramet', 'lever', 'talas'}
word='lever', swap=(0, 1), swapped='elver', True, True
word='lever', swap=(0, 4), swapped='revel', True, True
word='later', swap=(0, 1), swapped='alter', True, True
word='later', swap=(0, 4), swapped='ratel', True, True


## Notice the word "*salve*" here in two different pairs of *nearly*-unique rows:

In [38]:
[cross[1] for cross in unsorted if 1 <= len(cross[1][3]) <= 5]

[(((0, 1), (1, 3)),
  5,
  {'baler'},
  {'baler', 'koras', 'pater', 'tsars', 'veils'}),
 (((0, 2), (1, 4)), 1, {'grail'}, {'grail'}),
 (((0, 3), (1, 3)), 4, set(), {'deers', 'feers', 'krabs', 'plaas'}),
 (((0, 3), (1, 4)), 3, {'salve'}, {'dooms', 'orcas', 'salve'}),
 (((0, 3), (2, 4)),
  5,
  {'inlet'},
  {'inlet', 'lares', 'lores', 'poohs', 'teals'}),
 (((1, 2), (1, 4)), 2, {'salve', 'stool'}, {'salve', 'stool'}),
 (((1, 4), (2, 4)), 1, set(), {'testa'}),
 (((1, 4), (3, 4)), 4, {'steel'}, {'manse', 'steel', 'tenno', 'titre'})]

In [39]:
assert 'salve' in swaplists[(1,4)]

## Prepare to search Twitter for spoilers

Identify recent answers for which good clues are available

In [40]:
len(answers)

2315

Construct the union of all `swaplists`. Every word in any one of them has a good clue.

In [41]:
any_swap = set()
for swaplist in swaplists.values():
    any_swap |= swaplist

In [42]:
len(any_swap)

2993

What percent of words have a swap?

In [43]:
f'{2993/12972:.0%}'

'23%'

What percent of hand-picked answers have a swap?

In [44]:
len(any_swap & answers)

427

In [45]:
f'{427/2315:.0%}'

'18%'

Look at recent days

In [46]:
# 2022-01-27
today = 222

In [47]:
# earliest we can pretty easily get from the Twitter API
today - 30

192

In [48]:
near = answer_list[today-30:today+70]

In [49]:
target = any_swap & set(near)

In [50]:
len(target)

15

In [51]:
sorted([answer_list.index(t) for t in target])[:15]

[196, 197, 202, 211, 212, 221, 227, 228, 231, 257, 264, 268, 272, 289, 290]

In [52]:
sorted([today-answer_list.index(t) for t in target])

[-68, -67, -50, -46, -42, -35, -9, -6, -5, 1, 10, 11, 20, 25, 26]

## Suppose we were on day 211 and wanted to look for and use this sort of hint.
Note that we can play old Wordles online via [Wordle Archive](https://www.devangthakkar.com/wordle_archive/?211)

In [53]:
day = 211

In [54]:
word = answer_list[day]

In [55]:
dayswaps = [swap for swap,val in swaplists.items() if word in val]

In [56]:
len(dayswaps)

1

The hint would be a swap of (2, 4) (🟩🟩🟨🟩🟨), narrowing the search to one of 236 words:

In [57]:
dayswaps

[(2, 4)]

In [58]:
cluelist = swaplists[dayswaps[0]]

In [59]:
len(cluelist)

236

At this point we have a simplified game, but it still might take a lot of guesses to narrow the answers. Note that combining this sort of up-front "spoiler" clue with a more clever human strategy, or of course one of the other Wordle code bases out there that picks an optimal set of guesses, might well do better. For now we'll just illustrate one way to use the information already assembled here.

So here's one possible sequence, illustrating my choices the first time I used this technique.

In [60]:
sorted(list(cluelist))[0]

'abmho'

If we play that, we get the response 🟨⬜⬜⬜🟨 and learn that 'a' and 'o' are in the word but not in those positions. So what's left? 20 possibilities:

In [61]:
r0 = [l for l in cluelist if 'a' in l[1:] and 'o' in l[:4]]

In [62]:
len(r0)

20

A possible next guess:

In [63]:
sorted(r0)[2]

'coarb'

If we try that, we get the response ⬜🟩🟨🟨⬜. We see that only 8 possibilities remain.

In [64]:
r1 = [l for l in cluelist if l[1]=='o' and 'a' in l[3:] and ('r' == l[0] or 'r' == l[2] or 'r' == l[4])]

In [65]:
len(r1)

8

In [66]:
sorted(r1)[6]

'solar'

And, lucky me, that was the answer!
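
The hand-written filters above could be replaced by a general one. Here is a minimal sketch, assuming the usual Wordle scoring rules for repeated letters. `feedback` and `narrow` are hypothetical helpers, not part of the original notebook, and the candidate set is a toy example. Because this filter also uses the gray squares, it may leave fewer candidates than the counts shown above.

```python
from collections import Counter

def feedback(guess, answer):
    """Wordle-style feedback: G = right spot, Y = elsewhere in word, - = absent."""
    result = ["-"] * len(guess)
    unmatched = Counter()
    # First pass: mark greens and count answer letters not matched by a green.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    # Second pass: a non-green letter is yellow only while unmatched copies remain.
    for i, g in enumerate(guess):
        if result[i] == "-" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)

def narrow(candidates, guess, response):
    """Keep the candidates that would have produced `response` to `guess`."""
    return {w for w in candidates if feedback(guess, w) == response}

# Toy candidate set (an assumption for illustration).
toy = {"solar", "sonar", "polar", "coral"}
print(feedback("abmho", "solar"))
print(narrow(toy, "coarb", "-GYY-"))
```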

## TODO

Next: pull some actual clues from Twitter, decode them, figure out what kind of clue each one is, and, if it's a swap, pull the indices of the swapped letters and automate the narrowing of the possible set of answers.

In [67]:
# Here's a Python snippet for finding the positions of all the yellows in a row of grid feedback
# indices = [i for i, x in enumerate(my_list) if x == "yellow"]
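
As a first step toward that TODO, a shared row can be checked for the "swap" pattern directly from its emoji. This is a sketch only: `swap_from_row` is a hypothetical helper, and it assumes the row uses either the default or the colorblind squares.

```python
CORRECT = {"🟩", "🟧"}   # green, or orange in colorblind mode
PRESENT = {"🟨", "🟦"}   # yellow, or blue in colorblind mode
ABSENT = {"⬜", "⬛"}    # light or dark theme

def swap_from_row(row):
    """Return the (i, j) swap implied by a row with 3 correct and 2 present squares, else None."""
    # Keep only recognized squares so stray characters do not shift the indices.
    squares = [ch for ch in row if ch in CORRECT | PRESENT | ABSENT]
    present = [i for i, ch in enumerate(squares) if ch in PRESENT]
    correct = [i for i, ch in enumerate(squares) if ch in CORRECT]
    if len(squares) == 5 and len(correct) == 3 and len(present) == 2:
        return tuple(present)
    return None

print(swap_from_row("🟩🟩🟨🟩🟨"))
print(swap_from_row("🟩⬜🟨🟨⬜"))
```

The returned tuple has the same form as the keys of `swaplists`, so it could be used to look up the matching clue set.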