In [1]:
from IPython.core.display import HTML
with open('../style.css') as f:
    css = f.read()
HTML(css)

# Utilities

The global variable `Cache` is used as a cache for the function `value` defined later.

In [2]:
Cache = {}

The global variable `num_value_calls` is used to count the number of invocations
of the function `value`. 

In [3]:
num_value_calls = 0

The function `memoize` takes a function `f` as its argument.  It returns a *memoized* version of the function `f`.  This memoized version stores all results in the dictionary `Cache`, where the tuple of arguments passed to `f` is used as the key.  Later, when `f` is called with the same arguments, the result is retrieved from the `Cache` instead of being recomputed.  Note that the function `f_memoized` is a [closure](https://en.wikipedia.org/wiki/Closure_(computer_programming)).

In [4]:
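def memoize(f):
    global Cache  # the shared dictionary defined above
    
    def f_memoized(*args):
        # The argument tuple is the key, so all arguments must be hashable.
        if args in Cache:
            return Cache[args]
        result = f(*args)
        Cache[args] = result
        return result
    
    return f_memoized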
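
As a small illustration of the caching behaviour described above, the following stand-alone sketch applies the same pattern to a trivial function.  The names `demo_cache`, `memoize_demo`, `slow_square`, and `calls` are hypothetical and are not part of this notebook.

```python
# Stand-alone sketch; all names here are illustrative assumptions.
demo_cache = {}
calls = 0

def memoize_demo(f):
    def f_memoized(*args):
        if args in demo_cache:          # cache hit: skip the computation
            return demo_cache[args]
        result = f(*args)               # cache miss: compute and store
        demo_cache[args] = result
        return result
    return f_memoized

@memoize_demo
def slow_square(n):
    global calls
    calls += 1                          # counts executions of the body only
    return n * n

first  = slow_square(12)   # computed and stored under the key (12,)
second = slow_square(12)   # retrieved from the cache
print(first, second, calls, demo_cache)   # 144 144 1 {(12,): 144}
```

Because the key consists of the arguments only, two different functions memoized with the same dictionary would share entries.  This is harmless here, since `value` is the only function that is memoized.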

# The Minimax Algorithm

This notebook implements the [minimax algorithm](https://en.wikipedia.org/wiki/Minimax) and thereby implements a program that can play various deterministic, zero-sum, two-person games with perfect information.  The implementation assumes that an external notebook defines a game and provides the following variables and functions:
* `Players` is a list of length two.  The elements of this list are the 
  players.  It is assumed that the first element in this list represents 
  the computer, while the second element is the human player.  The computer
  always starts the game.
* `Start` is the start state of the game.
* `next_states(state, player)` is a function that takes two arguments:
  - `state` is a state of the game.
  - `player` is the player whose turn it is to make a move.
  The function call `next_states(state, player)` returns the list
  of all states that can be reached by any move of `player`.
* `utility(state, player)` takes a state and a player as its arguments.
  If `state` is a terminal state, then the function returns the value
  that this `state` has for `player`.  Otherwise, the function returns `None`.
* `finished(state)` returns `True` if and only if `state` is a terminal state.
* `get_move(state)` displays the given state and asks the human player for
  her move.
* `final_msg(state)` informs the human player about the result of the game.
* `draw(state, canvas, value)` draws the given state on the canvas and informs
  the user about the `value` of this state. 
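
To make the interface above concrete, here is a minimal sketch of a game definition that is *not* taken from the game notebooks: a hypothetical match game with four matches in which each move removes one or two matches and the player taking the last match wins.  The state encoding `(matches_left, last_mover)` is an assumption made for this example, and the interactive functions `get_move`, `final_msg`, and `draw` are omitted.

```python
# Hypothetical toy game, for illustration only.
# A state is a pair (matches_left, last_mover).
Players = [0, 1]
Start   = (4, None)      # four matches, nobody has moved yet

def next_states(state, player):
    matches, _ = state
    # player removes one or two matches and becomes the last mover
    return [(matches - k, player) for k in (1, 2) if k <= matches]

def finished(state):
    return state[0] == 0

def utility(state, player):
    if not finished(state):
        return None                       # only terminal states have a utility
    return 1 if state[1] == player else -1  # whoever took the last match wins

print(next_states(Start, 0))                                        # [(3, 0), (2, 0)]
print(utility((0, 1), 1), utility((0, 1), 0), utility((3, 0), 0))   # 1 -1 None
```

Note that the states are tuples and therefore hashable, so they can serve as dictionary keys in the `Cache`.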
   
---

In order to have some variation in our games, we use random numbers to choose between different optimal moves.

In [5]:
import random
random.seed(1)

Given a player `p`, the function `other(p)` computes the opponent of `p`.  This assumes that there are only two players and that the list of all players is stored in the global variable `Players`.

In [6]:
other = lambda p: [o for o in Players if o != p][0]

The function `value(State, player)` takes two arguments:
- `State` is the current state of the game,
- `player` is a player.

The function `value` returns the *value* that the given `State` has for `player` if both players play their best game.  This value is an element of the set $\{-1, 0, 1\}$.  
* If `player` can force a win, then the return value is `1`.
* If `player` can at best force a draw, then the return value is `0`.
* If the opponent of `player` can force a win for herself, then the return value is `-1`.

For reasons of efficiency, this function is *memoized*.  Mathematically, the function `value`
is defined recursively:
- $\texttt{finished}(s) \rightarrow \texttt{value}(s, p) = \texttt{utility}(s, p)$
- $\neg \texttt{finished}(s) \rightarrow 
   \texttt{value}(s, p) = \max\bigl(\bigl\{
                     -\texttt{value}(n, o) \bigm| n \in \texttt{next_states}(s, p)
                     \bigr\}\bigr)
  $, where $o = \texttt{other}(p)$

In [7]:
@memoize
def value(State, player):
    global num_value_calls 
    num_value_calls += 1
    if finished(State):
        return utility(State, player)
    o = other(player)
    return max([ -value(ns, o) for ns in next_states(State, player) ])
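
The recursive definition uses the fact that the game is zero-sum: what a successor state is worth to the opponent is the negative of what it is worth to `player`.  The following sketch evaluates the same recursion on a tiny, explicitly given game tree.  The tree, the names `children`, `leaf_utility_for_0`, and `value_demo`, and the use of `0` and `1` as players are assumptions made for this example.

```python
# Hypothetical game tree; the leaves carry the utility for player 0.
children = {'root': ['a', 'b'], 'a': ['a1', 'a2'], 'b': ['b1']}
leaf_utility_for_0 = {'a1': 1, 'a2': -1, 'b1': 0}

def value_demo(state, player):
    if state in leaf_utility_for_0:           # terminal state
        u = leaf_utility_for_0[state]
        return u if player == 0 else -u       # zero-sum: the opponent gets -u
    opponent = 1 - player
    # best successor for player = successor that is worst for the opponent
    return max(-value_demo(n, opponent) for n in children[state])

print(value_demo('a', 1))      # 1: player 1 moves to a2, which player 0 loses
print(value_demo('b', 1))      # 0: the only move leads to a draw
print(value_demo('root', 0))   # 0: player 0 avoids 'a' and moves to 'b'
```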

The function `best_move` takes two arguments:
- `State` is the current state of the game,
- `player` is a player.

The function `best_move` returns a pair of the form $(v, s)$ where $s$ is a state and $v$ is the value that `State` has for `player`.  The state $s$ is a state that is reached from `State` if `player` makes one of her optimal moves.  In order to have some variation in the game, the function chooses one of the optimal moves at random.

In [8]:
def best_move(State, player):
    NS        = next_states(State, player)
    bestVal   = value(State, player)
    BestMoves = [s for s in NS if -value(s, other(player)) == bestVal]
    BestState = random.choice(BestMoves)
    return bestVal, BestState
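
The selection step can be illustrated in isolation.  In the sketch below, the dictionary `opponent_value` is a hypothetical stand-in for the results of `value(s, other(player))` on three successor states; the names `s1`, `s2`, and `s3` are made up for this example.

```python
import random

# Hypothetical values of three successor states from the opponent's point of view.
opponent_value = {'s1': 0, 's2': 1, 's3': 0}

# A successor is optimal if it is as bad as possible for the opponent.
best_val   = max(-v for v in opponent_value.values())
best_moves = [s for s, v in opponent_value.items() if -v == best_val]
choice     = random.choice(best_moves)     # tie broken at random

print(best_val, best_moves)   # 0 ['s1', 's3']
```

Here `s2` is never chosen because it would allow the opponent to force a win, while `s1` and `s3` are equally good and are picked with equal probability.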

The next line is needed because we need the function `IPython.display.clear_output` to clear the output in a cell.

In [9]:
import IPython.display 

The function `play_game` plays a game on the given `canvas`.  The game played is specified indirectly as follows:
- `Start` is a global variable defining the start state of the game.
- `next_states` is a function such that $\texttt{next_states}(s, p)$ computes the set of all possible states that can be reached from state $s$ if player $p$ is next to move.
- `finished` is a function such that $\texttt{finished}(s)$ is true for a state $s$ if the game is over in state $s$.
- `utility` is a function such that $\texttt{utility}(s, p)$ returns either `-1`, `0`, or `1` in the *terminal state* $s$.  We have that
  - $\texttt{utility}(s, p)= -1$ iff the game is lost for player $p$ in state $s$, 
  - $\texttt{utility}(s, p)=  0$ iff the game is drawn, and 
  - $\texttt{utility}(s, p)=  1$ iff the game is won for player $p$ in state $s$.

In [10]:
def play_game(canvas):
    State = Start
    while True:
        firstPlayer = Players[0]
        val, State  = best_move(State, firstPlayer)
        draw(State, canvas, f'For me, the game has the value {val}.')
        if finished(State):
            final_msg(State)
            return
        IPython.display.clear_output(wait=True)
        State = get_move(State)
        draw(State, canvas, '')
        if finished(State):
            IPython.display.clear_output(wait=True)
            final_msg(State)
            return

In [None]:
%run Tic-Tac-Toe.ipynb

In [11]:
%run Tic-Tac-Toe-Bitboard.ipynb

+-+-+-+
|X|O|X|
+-+-+-+
|X|O|X|
+-+-+-+
|O|X| |
+-+-+-+

+-+-+-+
| |O|X|
+-+-+-+
|X|O|X|
+-+-+-+
| | |O|
+-+-+-+

state:
+-+-+-+
| |O|X|
+-+-+-+
|X|O|X|
+-+-+-+
|O| | |
+-+-+-+

next states:
+-+-+-+
|X|O|X|
+-+-+-+
|X|O|X|
+-+-+-+
|O| | |
+-+-+-+

+-+-+-+
| |O|X|
+-+-+-+
|X|O|X|
+-+-+-+
|O|X| |
+-+-+-+

+-+-+-+
| |O|X|
+-+-+-+
|X|O|X|
+-+-+-+
|O| |X|
+-+-+-+

+-+-+-+
|X|X|X|
+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+

+-+-+-+
| | | |
+-+-+-+
|X|X|X|
+-+-+-+
| | | |
+-+-+-+

+-+-+-+
| | | |
+-+-+-+
| | | |
+-+-+-+
|X|X|X|
+-+-+-+

+-+-+-+
|X| | |
+-+-+-+
|X| | |
+-+-+-+
|X| | |
+-+-+-+

+-+-+-+
| |X| |
+-+-+-+
| |X| |
+-+-+-+
| |X| |
+-+-+-+

+-+-+-+
| | |X|
+-+-+-+
| | |X|
+-+-+-+
| | |X|
+-+-+-+

+-+-+-+
|X| | |
+-+-+-+
| |X| |
+-+-+-+
| | |X|
+-+-+-+

+-+-+-+
| | |X|
+-+-+-+
| |X| |
+-+-+-+
|X| | |
+-+-+-+

+-+-+-+
|X|O|X|
+-+-+-+
|X|O|O|
+-+-+-+
|X| | |
+-+-+-+

+-+-+-+
|X|O|X|
+-+-+-+
| |O| |
+-+-+-+
|X|O|X|
+-+-+-+

+-+-+-+
|X|O|X|
+-+-+-+
|O|O|X|
+-+-+-+
|X|X|O|
+-+-+-+

0
+-+-+-+
|

Canvas(width=450)

With the game *tic-tac-toe* represented as lists and without memoization, computing the value of the start state takes 7.11 seconds.
If we use a bitboard instead, it takes 3.85 seconds.  However, the bitboard truly shines when we use memoization:
* Representing states as bitboards and using memoization, the computation needs 836 megabytes and takes 49 milliseconds.
* Representing states as lists of lists and using memoization, the computation needs 6524 megabytes and takes 296 milliseconds.

Observe that *memoization* speeds up the computation by a factor of about 24 for lists of lists (7.11 s versus 296 ms) and by a factor of about 79 for bitboards (3.85 s versus 49 ms).

In [12]:
%%time
val = value(Start, 0)
val

Wall time: 40.1 ms


0

Let us check how many times the function `value` has been called.

In [13]:
num_value_calls

5478

We have the following results depending on whether we use memoization or not:

| Memoization   | Number of Calls |
|:------------- | ---------------:|
| `False`       |          549946 |
| `True`        |            5478 |

The start state has the value `0` as neither player can force a win.

In [14]:
val

0

We check how many different states are stored in the `Cache`.

In [15]:
len(Cache)

5478
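
The size of the `Cache` equals `num_value_calls` because the counter is incremented only inside the body of `value`, which runs only on a cache miss, and every miss stores exactly one new entry.  The following sketch reproduces this effect on a hypothetical match game (remove one or two matches, taking the last match wins); the function `count_calls` and its parameters are assumptions made for this example.

```python
def count_calls(n_start, use_cache):
    """Return (value, number of body executions, cache size)."""
    cache = {}
    calls = 0

    def value_demo(n):
        nonlocal calls
        if use_cache and n in cache:
            return cache[n]
        calls += 1                  # counted only when the body runs
        if n == 0:
            result = -1             # no matches left: the player to move has lost
        else:
            result = max(-value_demo(n - k) for k in (1, 2) if k <= n)
        if use_cache:
            cache[n] = result
        return result

    return value_demo(n_start), calls, len(cache)

plain = count_calls(20, use_cache=False)
memo  = count_calls(20, use_cache=True)
print(memo)                  # (1, 21, 21): one body execution per distinct state
print(plain[1] > memo[1])    # True: without the cache, states are recomputed
```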

Let's draw the board.

In [16]:
canvas = create_canvas()
draw(Start, canvas, f'Current value of game for "X": {val}')

Canvas(width=450)

Now it's time to play.  In the input window that will pop up later, enter your move in the format "row,col" with no space between row and column.

In [17]:
play_game(canvas)

Enter move here: 1,0
It's a draw.
