# Ghost

[*Ghost*](https://en.wikipedia.org/wiki/Ghost_(game)) is a word game in which (quoting Wikipedia):

> *Ghost is a written or spoken word game in which players take turns adding letters to a growing word fragment, trying not to be the one to complete a valid word. Each fragment must be the beginning of an actual word, and usually some minimum is set on the length of a word that counts, such as three or four letters. The player who completes a word loses.*

I'd like to create a program to allow any two players (human or computer) to play the game, and I'd like to figure out who wins if both players play optimally. The concepts I will need to define, and my implementation choices, are as follows:

- **Words**: I will read a standard online word list, `enable1`, and make a set of all the words of sufficient length.
- **Fragment**: a fragment is a `str` of letters, such as `'gho'`.
- **Beginning**: each word has a set of valid beginnings: for `ghost` it is `{'', 'g', 'gh', 'gho', 'ghos', 'ghost'}`.  "Prefix" is a synonym of "beginning".
- **Vocabulary**: `Vocabulary(words)` is an object with: a set of all the `words`, and a set of all the valid `fragments` (beginnings) of the words.
- **Player**: The first player will be called player `0`; the second player `1`. 
- **Play**: A play is a new fragment formed by adding one letter to the end of the existing fragment.
- **Legal Play**: A play that is a valid prefix of some word. `enable1.legal_plays('gho') = {'ghos', 'ghou'}`.
- **Strategy**: A strategy is a function with signature `strategy(vocab, fragment) -> play`.
- **Game**: `play(vocab, *strategies)` plays a game between two (or more) player strategies.
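
To make the "Beginning" definition concrete, here is a minimal sketch. The helper name `beginnings` is my own, introduced only for illustration; the rest of this notebook builds the same set inline.

```python
def beginnings(word):
    "All prefixes of word, from the empty string up to the whole word."
    return {word[:i] for i in range(len(word) + 1)}

# Print the prefixes of 'ghost' from shortest to longest.
print(sorted(beginnings('ghost'), key=len))
# → ['', 'g', 'gh', 'gho', 'ghos', 'ghost']
```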



# Vocabulary: Words, Fragments, Legal Plays, and `enable1`

`Vocabulary(words)` takes a collection of words as input, stores the words as a set, and also stores all the legal fragments of those words (that is, the beginnings). `legal_plays(fragment)` gives the set of all plays that can be formed by adding one letter to `fragment` to create a legal word fragment (where 'fragment' includes complete words). I also define the function `words` to split any string into component words.

In [1]:
class Vocabulary:
    "Holds a set of legal words and a set of legal prefix fragments of those words."
    def __init__(self, words, minlength=3):
        self.words = {word for word in words if len(word) >= minlength}
        self.fragments = {word[:i] for word in self.words for i in range(len(word) + 1)}
        
    def legal_plays(self, fragment): return {fragment + L for L in alphabet} & self.fragments        
    
alphabet = 'abcdefghijklmnopqrstuvwxyz'
    
words = str.split # Function to split a str into words

Here is a small example:

In [2]:
v = Vocabulary(words('game ghost ghoul'))

v.words, v.fragments

({'game', 'ghost', 'ghoul'},
 {'', 'g', 'ga', 'gam', 'game', 'gh', 'gho', 'ghos', 'ghost', 'ghou', 'ghoul'})

And here is a large vocabulary, from a standard online Scrabble™ word list known as `enable1`:

In [3]:
! [ -e enable1.txt ] || curl -O http://norvig.com/ngrams/enable1.txt

In [4]:
enable1 = Vocabulary(words(open('enable1.txt').read()))

Here I explore `enable1`:

In [5]:
len(enable1.words), len(enable1.fragments), max(enable1.words, key=len)

(172724, 387878, 'ethylenediaminetetraacetates')

In [6]:
enable1.legal_plays('gho')

{'ghos', 'ghou'}

In [7]:
enable1.legal_plays('ew')

{'ewe'}

In [8]:
enable1.legal_plays('th')

{'tha', 'the', 'thi', 'tho', 'thr', 'thu', 'thw', 'thy'}

# Players and Winners

The first player is `0` and the second player is `1`. These names are convenient because:
- During the course of the game, the player whose turn it is to play next is always the length of the current fragment mod 2.
- When the game ends, the winning player is the length of the current fragment mod 2.


In [9]:
to_play = winner = lambda fragment: len(fragment) % 2
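
As a quick check of the parity argument, this small sketch traces the fragments on the way to `'ghost'`. It repeats the one-line definition so that it runs on its own; the choice of `'ghost'` and `'goat'` as examples is mine.

```python
to_play = winner = lambda fragment: len(fragment) % 2

# Whose turn is it at each fragment on the way to 'ghost'?
for i in range(5):
    fragment = 'ghost'[:i]
    print(repr(fragment), '-> player', to_play(fragment), 'plays next')

# 'ghost' has 5 letters, so player 0 added the final letter and player 1 wins.
print(winner('ghost'))   # → 1
# 'goat' has 4 letters, so player 1 added the final letter and player 0 wins.
print(winner('goat'))    # → 0
```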

# Who Wins?

Who wins a game if both players are rational?  Given the current fragment, the player whose turn it is will win if any of the following holds:
- The fragment is a word (meaning the other player formed the word, and lost).
- The fragment is not a legal fragment (meaning the other player made something that is not the beginning of an actual word, and lost).
- At least one of the legal plays puts the opponent in a position from which they *cannot* win.

The function `win(vocab, fragment)` implements this idea, returning `True` if the current player can force a win.

In [10]:
def win(vocab, fragment=''):
    "Does the player whose turn it is have a forced win?"
    return (fragment in vocab.words or 
            fragment not in vocab.fragments or
            any(not win(vocab, play) 
                for play in vocab.legal_plays(fragment)))

Let's test `win` to gain some confidence that we got it right:

In [11]:
win(Vocabulary(words('cat camel'))) # All words have odd number of letters; first player loses

False

In [12]:
win(Vocabulary(words('cat camel goat'))) # First player plays 'g', leading to a win with 'goat'

True

In [13]:
win(Vocabulary(words('cat camel goat gar'))) # Second player can avoid 'goat' with 'ga'

False

In [14]:
win(Vocabulary(words('cat camel goat gar gannet'))) # First player plays 'gan' after 'ga' to win

True
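
One possible refinement, not something this notebook relies on: if `win` is going to be called many times on the same vocabulary (as a game-playing strategy would do), the answers can be remembered. The sketch below is an assumption-laden variant; the name `cached_win` is hypothetical, and the `Vocabulary` class is a compact copy repeated only so the block runs on its own. In plain Ghost each fragment is reached by exactly one sequence of plays, so the cache pays off across repeated calls rather than within a single call.

```python
from functools import lru_cache

alphabet = 'abcdefghijklmnopqrstuvwxyz'

class Vocabulary:
    "Compact copy of the class above, repeated so this sketch runs on its own."
    def __init__(self, words, minlength=3):
        self.words = {word for word in words if len(word) >= minlength}
        self.fragments = {word[:i] for word in self.words for i in range(len(word) + 1)}

    def legal_plays(self, fragment):
        return {fragment + L for L in alphabet} & self.fragments

def cached_win(vocab):
    "Return a function win(fragment) for this vocab that remembers earlier answers."
    @lru_cache(maxsize=None)
    def win(fragment=''):
        # Same three conditions as the uncached version.
        return (fragment in vocab.words or
                fragment not in vocab.fragments or
                any(not win(play) for play in vocab.legal_plays(fragment)))
    return win

win4 = cached_win(Vocabulary('cat camel goat gar gannet'.split()))
print(win4(''), win4('g'), win4('ga'))   # → True False True
```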

# TL;DR: The Answer

Can the first player win with the `enable1` vocabulary?

In [15]:
win(enable1)

False

**No.** The game is a win for the second player, not the first.
This agrees with [xkcd](https://xkcd.com/)'s Randall Munroe, who [says](https://blog.xkcd.com/2007/12/31/ghost/) *"I hear if you use the Scrabble wordlist, it’s always a win for the second player."*

But ... Wikipedia says that the minimum word length can be "three or four letters." In `enable1` we included three-letter words, which is a disadvantage for the first player, who has to avoid all those three-letter possibilities. What if we eliminate the three-letter words?

In [16]:
enable1_4 = Vocabulary(enable1.words, 4)

win(enable1_4)

True

**Yes.** The first player can win in this case. So here's a good meta-strategy: Say "Hey, let's play a game of Ghost. We can use the `enable1` word list. Would you like the limit to be 3 or 4 letters?" Then if your opponent says three (or four) you can say "OK, since you decided that, I'll decide to go second (or first)."

# Playing the Game: Strategies

We define a *strategy* as a function that is given a vocabulary and a fragment as arguments, and returns a legal play. Below we define `rational` (a strategy that wins whenever it is possible to do so) and `ask` (a strategy factory that returns a strategy that, when called, will ask the named person to input a letter).

In [17]:
def rational(vocab, fragment): 
    "Select a play that makes opponent lose (if there is one), otherwise any play."
    for play in vocab.legal_plays(fragment):
        if not win(vocab, play):
            return play
    return play # Could return any play here

def ask(name='Player'):
    "Return a strategy that asks for the next letter."
    return (lambda _, fragment: 
            fragment + input("{}'s letter to add to '{}'? ".format(name, fragment)))
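
Any function with the same signature can serve as a strategy. As an illustration, here is a hypothetical baseline, `random_play`, that is not part of the original design: it just picks a legal play at random. It assumes only that `vocab` has a `legal_plays` method and that at least one legal play exists (otherwise `random.choice` raises `IndexError`).

```python
import random

def random_play(vocab, fragment):
    "Hypothetical baseline strategy: pick uniformly at random among the legal plays."
    plays = sorted(vocab.legal_plays(fragment))  # Sort so that a seeded run is repeatable
    return random.choice(plays)
```

A strategy like this could be passed to the game-playing function wherever `rational` is used, for example as a weak opponent.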

Here is a function to play a game:

In [18]:
from itertools import cycle

def play(vocab, *strategies):
    "Return (winner, final_fragment) for a game of Ghost between these strategies."
    fragment = ''
    for strategy in cycle(strategies):
        play = strategy(vocab, fragment)
        if play not in vocab.legal_plays(fragment):
            return (winner(fragment + '?'), play) # Player loses for making an illegal play
        elif play in vocab.words:
            return (winner(play), play)           # Player loses for making a word
        else:
            fragment = play                       # Keep playing

In [19]:
play(enable1, rational, rational) 

(1, 'ply')

In [20]:
play(enable1_4, rational, rational)

(0, 'huddle')

In [21]:
play(enable1, ask(), rational)

Player's letter to add to ''? d
Player's letter to add to 'dr'? o
Player's letter to add to 'droi'? d


(1, 'droid')
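
To watch a game unfold one fragment at a time, a variant of `play` can record the history. This is a sketch under my own assumptions: the names `transcript` and `first_alphabetical` are hypothetical, and the `Vocabulary` class is a compact copy repeated only so the block runs on its own.

```python
from itertools import cycle

alphabet = 'abcdefghijklmnopqrstuvwxyz'

class Vocabulary:
    "Compact copy of the class above, repeated so this sketch runs on its own."
    def __init__(self, words, minlength=3):
        self.words = {word for word in words if len(word) >= minlength}
        self.fragments = {word[:i] for word in self.words for i in range(len(word) + 1)}

    def legal_plays(self, fragment):
        return {fragment + L for L in alphabet} & self.fragments

def transcript(vocab, *strategies):
    "Play a game and return every fragment in order, ending with the final one."
    fragment, history = '', []
    for strategy in cycle(strategies):
        legal = vocab.legal_plays(fragment)
        fragment = strategy(vocab, fragment)
        history.append(fragment)
        if fragment not in legal or fragment in vocab.words:
            return history          # Illegal play or completed word: game over

def first_alphabetical(vocab, fragment):
    "Hypothetical strategy for the demo: the alphabetically first legal play."
    return min(vocab.legal_plays(fragment))

v = Vocabulary('game ghost ghoul'.split())
print(transcript(v, first_alphabetical, first_alphabetical))
# → ['g', 'ga', 'gam', 'game']
```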

# Minimizing Possible Outcomes

Now we know how to play perfectly, if we have a computer handy to execute the strategy.
But can we summarize the strategy into a form that is small enough that a human can memorize it? I will define the function `outcomes(vocab, fragment, player)` to return a set of words that are the possible outcomes of a game, where the opponent can use any strategy whatsoever, but `player` uses a strategy that is:

- *Rational*: plays towards a forced win whenever there is one.
- *Exploitive*: otherwise tries to give the opponent an opportunity to  make a mistake that can be exploited.
- *Minimizing*: within the above constraints, returns the smallest possible set of words.

In [22]:
def outcomes(vocab, fragment, player):
    "The smallest set of outcome words, if player tries to win, and make the set small."
    if fragment in vocab.words:
        return {fragment}
    else:
        cases = [outcomes(vocab, play, player) for play in vocab.legal_plays(fragment)]
        if to_play(fragment) == player: # Player picks the top priority case
            return min(cases, key=lambda words: priority(words, player))
        else:                           # Other player could pick anything
            return set.union(*cases)
                       
def priority(words, player):
    """Return (lossiness, number_of_words, total_number_of_letters),
    where lossiness is 0 if no losses, 1 if mixed losses/wins, 2 if all losses.
    The idea is to find the list of outcome words that minimizes this triple."""
    lossiness = (0 if all(winner(word) == player for word in words) else
                 1 if any(winner(word) == player for word in words) else
                 2)    
    return (lossiness, len(words), sum(map(len, words)))
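
To see how the `priority` triple orders candidate outcome sets, here is a small worked example. The definitions are repeated so the block runs on its own; the candidate sets are made up for illustration.

```python
winner = lambda fragment: len(fragment) % 2

def priority(words, player):
    "Return (lossiness, number_of_words, total_number_of_letters), as defined above."
    lossiness = (0 if all(winner(word) == player for word in words) else
                 1 if any(winner(word) == player for word in words) else
                 2)
    return (lossiness, len(words), sum(map(len, words)))

# Made-up candidate outcome sets, ranked from player 0's point of view.
candidates = [{'cat', 'camel'}, {'goat', 'gar'}, {'gannet'}, {'goat'}]
for words in sorted(candidates, key=lambda ws: priority(ws, 0)):
    print(priority(words, 0), sorted(words))
# → (0, 1, 4) ['goat']
# → (0, 1, 6) ['gannet']
# → (1, 2, 7) ['gar', 'goat']
# → (2, 2, 8) ['camel', 'cat']
```

All-win sets come first, then mixed sets, then all-loss sets; ties are broken by fewer words and then by fewer total letters.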

## Minimizing Outcomes for Player 0

Let's see what minimal set of words player 0 can force the game into (with both vocabularies):

In [23]:
outcomes(enable1, '', 0)

{'qaid', 'qiviut', 'qoph', 'qursh', 'qurush', 'qwerty'}

**Interesting!** There are only 6 words; it wouldn't be hard for a human to memorize these. Then, when you are playing as player 0, pick `'q'` first, and then try to steer the game to one of the 5 words with an even number of letters. Unfortunately, one word, `'qursh'` (a monetary unit of Saudi Arabia), has an odd number of letters, which means that if the opponent replies to `'q'` with `'qu'` and to `'qur'` with `'qurs'`, then player 0 will lose. But if the opponent makes any other responses, player 0 will win.

In [24]:
outcomes(enable1_4, '', 0)

{'naan', 'nene', 'ngultrum', 'nirvanic', 'nolo', 'null', 'nyctalopia'}

**Neat!** Only 7 words, and the first player can always win by forcing the opponent to one of these words.

## Minimizing Outcomes for Player 1

Since player 0 can pick any letter, the minimal `outcomes` set for player 1 must be at least 26 words. Let's see how much bigger it turns out to be. 

With `enable1` we already know that player 1 can force a win, so all the words in the `outcomes` set will have odd length:

In [25]:
outcomes(enable1, '', 1)

{'aah',
 'aal',
 'aargh',
 'aas',
 'bwana',
 'cwm',
 'drave',
 'dreck',
 'drink',
 'droit',
 'drunk',
 'dry',
 'ewe',
 'fjeld',
 'fjord',
 'gjetost',
 'hmm',
 'ihram',
 'jnana',
 'kwashiorkor',
 'llano',
 'mho',
 'nth',
 'oquassa',
 'praam',
 'prequel',
 'prill',
 'pro',
 'prurigo',
 'pry',
 'qua',
 'quell',
 'quiff',
 'quomodo',
 'qursh',
 'rhamnus',
 'rheum',
 'rhizoid',
 'rho',
 'rhumb',
 'rhyolitic',
 'squoosh',
 'tchotchke',
 'uhlan',
 'vroom',
 'wrack',
 'wrest',
 'wrist',
 'wrong',
 'wrung',
 'wry',
 'xanthic',
 'xanthin',
 'ycleped',
 'zucchetto'}

In [26]:
len(_), [w for w in _ if winner(w) == 0]

(55, [])

This says that player 1 can force the game towards one of these 55 words (none of which are losses). Memorize this list and you will never lose as player 1.

How about with the other vocabulary?

In [27]:
outcomes(enable1_4, '', 1)

{'aquiculture',
 'aquifer',
 'aquilegia',
 'aquiver',
 'bwana',
 'cnidarian',
 'drave',
 'dreck',
 'drink',
 'droit',
 'drunk',
 'drywall',
 'eschatologies',
 'eschatology',
 'escheat',
 'eserine',
 'eskar',
 'esophagus',
 'esplanade',
 'esquire',
 'esquiring',
 'essay',
 'estuarial',
 'estuary',
 'esurience',
 'fjeld',
 'fjord',
 'gjetost',
 'hyaenic',
 'hybris',
 'hydatid',
 'hyena',
 'hyenine',
 'hyenoid',
 'hygeist',
 'hying',
 'hylozoism',
 'hylozoist',
 'hymen',
 'hyoid',
 'hypha',
 'hyphen',
 'hyraces',
 'hyrax',
 'hyson',
 'hyte',
 'ihram',
 'jnana',
 'kwashiorkor',
 'llano',
 'mbira',
 'ngultrum',
 'ngwee',
 'oquassa',
 'plaza',
 'plethoric',
 'plica',
 'plonk',
 'pluck',
 'plyer',
 'quack',
 'quell',
 'quiff',
 'quomodo',
 'qursh',
 'rhamnus',
 'rheum',
 'rhizoid',
 'rhomb',
 'rhumb',
 'rhyolitic',
 'squoosh',
 'tchotchke',
 'uhlan',
 'vroom',
 'wrack',
 'wrest',
 'wrist',
 'wrong',
 'wrung',
 'wryly',
 'xanthic',
 'xanthin',
 'ycleped',
 'zucchetto'}

In [28]:
len(_), [w for w in _ if winner(w) == 0]

(85, ['hyphen', 'ngultrum', 'hyte', 'hybris'])

In this case there are 85 words, four of which are losses for player 1. But the other 81 words are wins, so with this strategy you'd have a good chance against an imperfect opponent.

# SuperGhost

In the variant *SuperGhost*, players can add a letter to either the beginning or the end of a fragment, as long as this forms a fragment that is part of some word. I was thinking of SuperGhost when I made the design decision to encapsulate `legal_plays` as a method of `Vocabulary`, rather than as a separate function. Because I did that, I should be able to use all the existing code if I just make a new class, `SuperVocabulary`, that finds *all* fragments (i.e. infixes) rather than just the beginning fragments (i.e. prefixes).

In [29]:
class SuperVocabulary(Vocabulary):
    "Holds a set of legal words and a set of legal infix fragments of those words."
    def __init__(self, words, minlength=3):
        self.words = {word for word in words if len(word) >= minlength}
        self.fragments = {word[i:j] for word in self.words 
                                    for i in range(len(word)) 
                                    for j in range(i, len(word) + 1)}
        
    def legal_plays(self, fragment):
        "All plays (adding a letter to fragment) that form a valid infix."
        return {play for L in alphabet for play in (fragment + L, L + fragment)} & self.fragments
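
To illustrate the difference between prefixes and infixes, here is a tiny standalone sketch using the one-word vocabulary `cat`. The helper names `infixes` and `super_plays` are mine, introduced for illustration; they mirror the comprehensions in `SuperVocabulary`.

```python
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def infixes(word):
    "All contiguous substrings of word, plus the empty string."
    return {word[i:j] for i in range(len(word)) for j in range(i, len(word) + 1)}

fragments = infixes('cat')
print(sorted(fragments, key=lambda s: (len(s), s)))
# → ['', 'a', 'c', 't', 'at', 'ca', 'cat']

def super_plays(fragment):
    "Add one letter at either end of fragment, keeping only valid infixes."
    return {play for L in alphabet for play in (fragment + L, L + fragment)} & fragments

print(sorted(super_plays('a')))   # → ['at', 'ca']
```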

One more thing: I'll change `ask` to ask for a fragment, not a letter:

In [30]:
def ask(name='Player'):
    "Return a strategy that asks for a fragment."
    return (lambda _, fragment: input("{}'s fragment, given '{}'? ".format(name, fragment)))

Now I will create `SuperVocabulary` objects for 3- and 4-letter versions of `enable1`, and check out how many fragments there are in each variant:

In [31]:
enable1s   = SuperVocabulary(enable1.words)
enable1_4s = SuperVocabulary(enable1.words, 4)

[len(v.fragments) for v in [enable1, enable1_4, enable1s, enable1_4s]]

[387878, 387844, 1076434, 1076431]

In [32]:
# Can the first player win in SuperGhost with 3-letter words?

win(enable1s)

True

In [33]:
# How about with a 4-letter limit?

win(enable1_4s)

True

The first player can win with or without three-letter words. And unless the first player is perfect, the rational strategy can do pretty well as second player as well. Here is a sample game:

In [34]:
play(enable1s, ask('Peter'), rational)

Peter's fragment, given ''? z
Peter's fragment, given 'zq'? zqu
Peter's fragment, given 'ezqu'? ezqui
Peter's fragment, given 'mezqui'? mezquit


(1, 'mezquit')

I would like to give a concise summary of the strategy for SuperGhost, but my existing `outcomes` function won't do it. That's because it is not enough to know that a particular word results in a win; we have to know in what order the letters of the word are added. I'll leave it as an exercise to find a good way to summarize SuperGhost strategies.

# Summary

Here's a summary of what we have learned. (*Note:* the bold **qursh**  means it is a losing word):

| Game | Shortest Word | Winner | First Player Outcomes | Second Player Outcomes |
|---|---|---|---|---|
| Ghost | 3 | Second | qaid qiviut qoph **qursh** qurush qwerty | 55 words |
| Ghost | 4 | First | naan nene ngultrum nirvanic nolo null nyctalopia | 85 words |
| SuperGhost | 3 | First | ? | ? |
| SuperGhost | 4 | First | ? | ? |

# Further Work

Here are some additional ideas to play with:

- **Exploitation:** What are some good strategies when you are not guaranteed to win, to exploit an imperfect human opponent? Can you steer the game so that you win if the opponent is unfamiliar with some obscure word(s)? You might need a file of [word frequencies](http://norvig.com/ngrams/count_1w.txt).
- **Security:** A strategy function could *cheat*, and modify  `vocab.words`, inserting or deleting some crucial words to ensure victory. Can you harden `play` (and/or change `Vocabulary`) to protect against that?
- **Saving Space:** Currently `Vocabulary` saves words and fragments that could never be reached in a game. For example, because `'the'` is a word that ends the game, we could never reach `'them'` or `'theme'` or `'thermoluminescences'`. Can you eliminate these redundant words/fragments?
- **Multi-player:** `play(enable1, ask('A'), ask('B'), ask('C'))` will play a three-player game. But `rational` (along with `win` and `winner`) would no longer work, since they assume there are exactly two players. Can you alter them to allow *n* players?
- **SuperGhost Summary:** Can you summarize a SuperGhost strategy in a way that a human can memorize?
- **Xghost:** In *Xghost*, a letter can be added anywhere, so from the fragment `'ab'` you could play `'arb'`.
- **Spook:** In *Spook*, letters can be rearranged before adding one, so from the fragment `'ab'` you could play `'bxa'`.
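
As a starting point for the **Saving Space** idea, here is one possible sketch, offered as an assumption rather than a finished answer. In plain Ghost (not SuperGhost), a word can only be the final fragment of a game if none of its proper prefixes is itself a word of sufficient length. The function name `reachable_words` is hypothetical.

```python
def reachable_words(words, minlength=3):
    "Words of sufficient length that have no shorter word as a proper prefix."
    words = {word for word in words if len(word) >= minlength}
    return {word for word in words
            if not any(word[:i] in words for i in range(minlength, len(word)))}

sample = 'the them theme thermos ghost'.split()
print(sorted(reachable_words(sample)))      # → ['ghost', 'the']
print(sorted(reachable_words(sample, 4)))   # → ['ghost', 'them', 'thermos']
```

Passing the result to `Vocabulary` would then build fragments only for words that a game could actually end on.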