In [1]:
from IPython.core.display import HTML
with open('../style.css') as f:
    css = f.read()
HTML(css)

# A Recursive Implementation of Minimax 

This notebook implements the [minimax algorithm](https://en.wikipedia.org/wiki/Minimax) in a pure form, i.e. it does not employ any *memoization techniques*.

---

In order to have some variation in our games, we use random numbers to choose between different optimal moves.

In [2]:
import random
random.seed(1)

Given a player `p`, the function `other(p)` computes the opponent of `p`.  This assumes that there are only two players and the set of all players is stored in the global variable `Players`.

In [3]:
other = lambda p: [o for o in Players if o != p][0]
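
As a quick illustration, assume the two players are numbered `0` and `1`.  This `Players` list is an assumption made for the example only; the real list is defined by the game notebook that is loaded further down.

```python
# Assumption for this example only: the two players are called 0 and 1.
Players = [0, 1]

other = lambda p: [o for o in Players if o != p][0]

print(other(0))  # 1
print(other(1))  # 0
```

Applying `other` twice returns the original player, which is what the recursive definition of `value` below relies on.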

The function `value(State, player)` takes two arguments:
- `State` is the current state of the game,
- `player` is a player.

The function `value` returns the *value* that the given `State` has for `player` if both players play their best game.  This value is an element of the set $\{-1, 0, 1\}$.
* If `player` can force a win, then the return value is `1`.
* If `player` can at best force a draw, then the return value is `0`.
* If the opponent of `player` can force a win for herself, then the return value is `-1`.

Mathematically, the function `value` is defined recursively:
- $\texttt{finished}(s) \rightarrow \texttt{value}(s, p) = \texttt{utility}(s, p)$
- $\neg \texttt{finished}(s) \rightarrow 
   \texttt{value}(s, p) = \max\bigl(\bigl\{
                     -\texttt{value}(n, o) \bigm| n \in \texttt{next_states}(s, p)
                     \bigr\}\bigr)
  $, where $o = \texttt{other}(p)$

In [4]:
def value(State, player):
    if finished(State):
        return utility(State, player)
    Moves = next_states(State, player)
    return value_list(Moves, player)
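
To see the recursion at work, here is a minimal sketch on a hand-made game tree.  The tree, the dictionaries `Children` and `Leaf0`, and the compact one-line `max` are assumptions made for this illustration; they are not part of the notebook.  The sketch transcribes the two equations above almost literally.

```python
# Hypothetical game tree, used only for this illustration.
# Player 0 moves at 'root', player 1 moves at 'a' and 'b'.
Children = {'root': ['a', 'b'], 'a': ['a1', 'a2'], 'b': ['b1', 'b2']}
# Utility of every terminal state from the point of view of player 0.
Leaf0 = {'a1': 1, 'a2': 0, 'b1': -1, 'b2': 1}

def other(p):
    return 1 - p

def finished(s):
    return s in Leaf0

def utility(s, p):
    # The game is zero-sum: what player 0 wins, player 1 loses.
    return Leaf0[s] if p == 0 else -Leaf0[s]

def next_states(s, p):
    return Children[s]

def value(s, p):
    if finished(s):
        return utility(s, p)
    # Our value is the best of the negated values the opponent can reach.
    return max(-value(n, other(p)) for n in next_states(s, p))

print(value('a', 1))     # 0: player 1 avoids 'a1' and settles for the draw 'a2'
print(value('b', 1))     # 1: player 1 wins by moving to 'b1'
print(value('root', 0))  # 0: player 0 moves to 'a' and secures a draw
```

The sign flip in `-value(n, other(p))` is what lets a single `max` serve both players: a state that is good for the opponent is equally bad for us.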

The function `value_list` takes three arguments:
- `Moves` is a list of states.  Each of these states results from a move
  that `player` has made in a given state.
  When `value_list` is called initially, this list is non-empty.
- `player` defines the player who has made the moves.
- `alpha` is a lower bound for the value of the state where the moves have been 
  made.  Initially, this value is $-1$, the lowest possible value, as we don't
  yet know how good or bad this state is.

In [None]:
def value_list(Moves, player, alpha=-1):
    if Moves == []:
        return alpha
    move_val = -value(Moves[0], other(player))
    alpha    = max(move_val, alpha)
    return value_list(Moves[1:], player, alpha)
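
The recursion in `value_list` has the shape of a left fold with `max`, started at the lower bound `alpha`.  The following sketch shows this on plain numbers instead of game states.  The helper names `value_list_recursive` and `value_list_fold` are hypothetical, and the lists stand for move values that have already been computed.

```python
from functools import reduce

def value_list_recursive(values, alpha=-1):
    # Same shape as value_list, but over already computed move values.
    if values == []:
        return alpha
    alpha = max(values[0], alpha)
    return value_list_recursive(values[1:], alpha)

def value_list_fold(values, alpha=-1):
    # reduce threads alpha through the list exactly like the recursion does.
    return reduce(max, values, alpha)

for values in ([], [-1], [0, -1], [-1, 1, 0]):
    assert value_list_recursive(values) == value_list_fold(values)
    print(values, value_list_fold(values))
```

Both variants return `alpha` unchanged for an empty list, which is why the initial lower bound has to be the smallest possible value.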

The function `best_move` takes two arguments:
- `State` is the current state of the game,
- `player` is a player.

The function `best_move` returns a pair of the form $(v, s)$ where $s$ is a state and $v$ is the value of this state.  The state $s$ is a state that is reached from `State` if `player` makes one of her optimal moves.  In order to have some variation in the game, the function randomly chooses any of the optimal moves.

In [None]:
def best_move(State, player):
    NS        = next_states(State, player)
    bestVal   = value(State, player)
    BestMoves = [s for s in NS if -value(s, other(player)) == bestVal]
    BestState = random.choice(BestMoves)
    return bestVal, BestState
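
The tie-breaking step can be looked at in isolation.  The sketch below assumes that the values of all successor states are already known and stored in a dictionary; the function `pick_best` and the state names are hypothetical and serve only as an illustration.

```python
import random

def pick_best(candidates, rng=random):
    """candidates maps each successor state to its value for the player who moved."""
    best_val   = max(candidates.values())
    best_moves = [s for s, v in candidates.items() if v == best_val]
    # Any of the optimal moves is as good as any other, so choose at random.
    return best_val, rng.choice(best_moves)

candidates = {'corner': 0, 'edge': -1, 'centre': 0}
rng        = random.Random(1)  # private generator, so the global seed is untouched
val, state = pick_best(candidates, rng)
print(val, state)
```

Here the best value is `0` and the result is either `'corner'` or `'centre'`, never `'edge'`.  Note that this variant derives the best value from the successor values, whereas `best_move` obtains it from a separate call of `value`.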

We import `IPython.display` because we need the function `IPython.display.clear_output` to clear the output of a cell.

In [None]:
import IPython.display 

The function `play_game` plays a game on the given `canvas`.  The game played is specified indirectly as follows:
- `Start` is a global variable defining the start state of the game.
- `next_states` is a function such that $\texttt{next_states}(s, p)$ computes the set of all possible states that can be reached from state $s$ if player $p$ is next to move.
- `finished` is a function such that $\texttt{finished}(s)$ is true for a state $s$ if the game is over in state $s$.
- `utility` is a function such that $\texttt{utility}(s, p)$ returns either `-1`, `0`, or `1` in the *terminal state* $s$.  We have that
  - $\texttt{utility}(s, p)= -1$ iff the game is lost for player $p$ in state $s$, 
  - $\texttt{utility}(s, p)=  0$ iff the game is drawn, and 
  - $\texttt{utility}(s, p)=  1$ iff the game is won for player $p$ in state $s$.
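
To make this interface concrete, here is a minimal sketch of a much smaller game that provides the same names.  The game is an assumption chosen for illustration and is not part of the notebook: there is a pile of tokens, the players alternately remove one or two tokens, and whoever takes the last token wins.  A state is a pair of the number of remaining tokens and the player who moves next.

```python
import random

Players = [0, 1]
Start   = (7, 0)   # 7 tokens on the pile, player 0 moves first

def other(p):
    return [o for o in Players if o != p][0]

def next_states(s, p):
    tokens, _ = s
    # Remove one or two tokens; afterwards the opponent is next to move.
    return [(tokens - k, other(p)) for k in (1, 2) if k <= tokens]

def finished(s):
    return s[0] == 0

def utility(s, p):
    # In a terminal state the player to move has lost,
    # because the opponent took the last token.
    return -1 if s[1] == p else 1

# A random playout that exercises the interface.
rng = random.Random(1)
state, player = Start, Players[0]
while not finished(state):
    state  = rng.choice(next_states(state, player))
    player = other(player)
print(state, utility(state, 0), utility(state, 1))
```

This toy game has no draws, so `utility` never returns `0` here.  In every terminal state the utilities of the two players add up to zero, which is the property the recursive definition of `value` depends on.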

In [None]:
def play_game(canvas):
    State       = Start
    firstPlayer = Players[0]  # the computer plays the first player and opens the game
    while True:
        val, State = best_move(State, firstPlayer)
        draw(State, canvas, f'For me, the game has the value {val}.')
        if finished(State):
            final_msg(State)
            return
        IPython.display.clear_output(wait=True)
        State = get_move(State)
        draw(State, canvas, '')
        if finished(State):
            IPython.display.clear_output(wait=True)
            final_msg(State)
            return

In [None]:
%run Tic-Tac-Toe-Bitboard.ipynb

With the game *tic-tac-toe* represented as lists and without memoization, computing the value of the start state takes 7.11 seconds.
If we use a bitboard instead, it takes 3.85 seconds.  However, the bitboard truly shines when we use memoization:

* Representing states as bitboards and using memoization, we need 808 kilobytes and the computation takes 49 milliseconds.
* Representing states as lists of lists and using memoization, we need 7648 kilobytes and the computation takes 296 milliseconds.

Observe that *memoization* accounts for a more than tenfold speedup in both cases: roughly 24-fold for lists of lists and roughly 79-fold for bitboards.
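
This notebook deliberately does not memoize.  For comparison, the following sketch shows one way memoization could be added, using `functools.lru_cache` from the standard library.  Everything here is an assumption made for illustration: the toy game (remove one or two tokens from a pile, whoever takes the last token wins), the names `value_plain` and `value_memo`, and the call counters.  It is not the implementation that produced the timings above.

```python
import functools

Start = (20, 0)  # hypothetical toy game: 20 tokens, player 0 moves first

def other(p):
    return 1 - p

def next_states(s, p):
    return [(s[0] - k, other(p)) for k in (1, 2) if k <= s[0]]

def finished(s):
    return s[0] == 0

def utility(s, p):
    # The player who would have to move in a terminal state has lost.
    return -1 if s[1] == p else 1

calls = {'plain': 0, 'memo': 0}

def value_plain(s, p):
    calls['plain'] += 1
    if finished(s):
        return utility(s, p)
    return max(-value_plain(n, other(p)) for n in next_states(s, p))

@functools.lru_cache(maxsize=None)
def value_memo(s, p):
    # The body runs only once per distinct pair (s, p); repeats hit the cache.
    calls['memo'] += 1
    if finished(s):
        return utility(s, p)
    return max(-value_memo(n, other(p)) for n in next_states(s, p))

assert value_plain(Start, 0) == value_memo(Start, 0)
print(calls)
```

The memoized counter stays far below the plain one because the same state is reached along many different move sequences.  `lru_cache` requires hashable arguments, so a state stored as a list of lists would first have to be converted, for example into a tuple of tuples.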

In [None]:
import resource

In [None]:
%%time
val = value(Start, 0)

The start state has the value `0` as neither player can force a win.

In [None]:
val

Let's draw the board.

In [None]:
canvas = create_canvas()
draw(Start, canvas, f'Current value of game for "X": {val}')

Now it's time to play.  In the input window that will pop up later, enter your move in the format "row,col" with no space between row and column.

In [None]:
play_game(canvas)