In [None]:
from IPython.display import HTML
with open('../style.css') as f:
    css = f.read()
HTML(css)

# Alpha-Beta Pruning

This notebook implements [alpha-beta pruning](https://en.wikipedia.org/wiki/Alpha-beta_pruning) in its pure form, i.e. it
does not use any *memoization techniques*, since adding these techniques in a meaningful way results in a considerably more complicated algorithm.

In effect, this notebook is a *game solver*: it can play various deterministic, zero-sum, two-person games with perfect information.  To this end, the implementation assumes that an external notebook defines the game and provides the following variables and functions:
* `Players` is a list of length two.  The elements of this list are the 
  players.  It is assumed that the first element in this list represents 
  the computer, while the second element is the human player.  The computer
  always starts the game.
* `Start` is the start state of the game.
* `next_states(state, player)` is a function that takes two arguments:
  - `state` is a state of the game.
  - `player` is the player whose turn it is to make a move.
  The function call `next_states(state, player)` returns the list
  of all states that can be reached by any move of `player`.
* `utility(state, player)` takes a `state` and a `player` as its arguments.
  If `state` is a terminal state, then the function returns the value
  that this `state` has for `player`.  Otherwise, the function returns `None`.
* `finished(state)` returns `True` if and only if `state` is a terminal state.
* `get_move(state)` displays the given state and asks the human player for
  her move.
* `final_msg(state)` informs the human player about the result of the game.
* `draw(state, canvas, value)` draws the given state on the canvas and informs
  the user about the `value` of this state. 
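
As an illustration, here is a minimal sketch of a game that provides the core of this interface.  The game is a hypothetical toy and not part of this notebook: there is a pile of stones, each move removes one or two stones, and whoever takes the last stone wins.  A state is assumed to be a pair `(stones, player_to_move)`, and the I/O functions `get_move`, `final_msg`, and `draw` are omitted.

```python
# Hypothetical toy game implementing the core of the interface above:
# a pile of stones, each move takes one or two stones, and whoever takes
# the last stone wins.  A state is a pair (stones, player_to_move).
Players = ['C', 'H']     # 'C' is the computer (moves first), 'H' the human
Start   = (5, 'C')

def next_states(state, player):
    stones, _ = state
    opponent  = Players[1] if player == Players[0] else Players[0]
    return [(stones - k, opponent) for k in (1, 2) if k <= stones]

def finished(state):
    stones, _ = state
    return stones == 0

def utility(state, player):
    if not finished(state):
        return None
    _, to_move = state
    # The player to move in a terminal state has lost: the opponent took
    # the last stone.
    return -1 if player == to_move else 1
```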
   
---

In order to have some variation in our game, we use random numbers to choose between optimal moves.

In [None]:
import random
random.seed(0)

Given a player `p`, the function `other(p)` computes the opponent of `p`.  This assumes that there are only two players and the set of all players is stored in the global variable `Players`.

In [None]:
other = lambda p: [o for o in Players if o != p][0]
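
A quick sanity check with a hypothetical two-element `Players` list (a game notebook may represent its players differently): `other` swaps the two players, so applying it twice gives back the original player.

```python
Players = ['X', 'O']      # hypothetical player representation

other = lambda p: [o for o in Players if o != p][0]

print(other('X'))         # O
for p in Players:
    assert other(other(p)) == p   # other is its own inverse
```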

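The function `value` takes four arguments:
- `State` is the current state of the game,
- `player` is the player whose turn it is to move,
- `alpha` is a lower bound: `player` can already guarantee at least this result, so values below `alpha` are of no interest,
- `beta` is an upper bound: the opponent can prevent `player` from getting more than this result, so values above `beta` are of no interest.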

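The function `value` returns the *value* that the given `State` has for `player` if both players play optimally.  This value is an element of the set $\{-1, 0, 1\}$.  
- If `player` can force a win, the return value is `1`.
- If `player` can force a draw but not a win, the return value is `0`.
- If the opponent can force a win, so that `player` loses even when playing optimally, the return value is `-1`.

This exact value is returned whenever it lies between `alpha` and `beta`, which is always the case for the default arguments `alpha=-1` and `beta=1`; otherwise only the bounds given by the specification of `alphaBeta` below hold.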

The variable `num_value_call` is a global variable that keeps track of how often the function `value` has been invoked.

In [None]:
num_value_call = 0

In [None]:
def value(State, player, alpha=-1, beta=1):
    global num_value_call
    num_value_call += 1
    return alphaBeta(State, player, alpha, beta)

The function `alphaBeta` satisfies the following specification:
- $\alpha \leq \texttt{value}(s, p) \leq \beta \;\rightarrow\;\texttt{alphaBeta}(s, p, \alpha, \beta) = \texttt{value}(s,p)$
- $\texttt{value}(s, p) < \alpha \;\rightarrow\; \texttt{alphaBeta}(s, p, \alpha, \beta) \leq \alpha$
- $\beta < \texttt{value}(s, p) \;\rightarrow\; \beta \leq \texttt{alphaBeta}(s, p, \alpha, \beta)$

Note that this specification does not determine the function `alphaBeta` uniquely, since many different functions satisfy it.
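
One way to gain confidence that the implementation of `alphaBeta` in the next cell meets this specification is to test it exhaustively on a small game.  The sketch below assumes a hypothetical explicit game tree in which an inner node is a list of successor states and a leaf is an integer giving the utility for player `0`, who moves first.  It compares `alphaBeta` (the same loop as in the notebook, with the call to `value` inlined) against a plain negamax `minimax` for every node and every window $-1 \leq \alpha \leq \beta \leq 1$.

```python
import itertools

# Hypothetical explicit game tree: an inner node is a list of successor
# states, a leaf is an int giving the utility for player 0 (who moves first).
Players = [0, 1]
Start   = [[1, [0, -1]], [[-1, 1], 0], [-1, [1, 0]]]

def other(p):          return 1 - p
def finished(s):       return isinstance(s, int)
def next_states(s, p): return s
def utility(s, p):     return s if p == 0 else -s

def minimax(s, p):
    """Exact value of s for p by plain negamax (no pruning)."""
    if finished(s):
        return utility(s, p)
    return max(-minimax(ns, other(p)) for ns in next_states(s, p))

def alphaBeta(s, p, alpha, beta):
    """The notebook's alpha-beta loop, with the call to value() inlined."""
    if finished(s):
        return utility(s, p)
    val = alpha
    for ns in next_states(s, p):
        val = max(val, -alphaBeta(ns, other(p), -beta, -alpha))
        if val >= beta:
            return val
        alpha = max(val, alpha)
    return val

def all_nodes(s, p):
    """Yield every (state, player to move) pair of the tree below s."""
    yield s, p
    if not finished(s):
        for ns in next_states(s, p):
            yield from all_nodes(ns, other(p))

# Check the three clauses of the specification for every node and window.
for s, p in all_nodes(Start, Players[0]):
    v = minimax(s, p)
    for alpha, beta in itertools.combinations_with_replacement([-1, 0, 1], 2):
        r = alphaBeta(s, p, alpha, beta)
        if alpha <= v <= beta:
            assert r == v
        elif v < alpha:
            assert r <= alpha
        else:                       # beta < v
            assert beta <= r
print('specification holds on every node of this tree')
```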

In [None]:
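def alphaBeta(State, player, alpha, beta):
    if finished(State):
        return utility(State, player)
    o   = other(player)
    val = alpha                      # start at the lower bound alpha
    for ns in next_states(State, player):
        # negamax: the value of ns for player is minus its value for o
        val = max(val, -value(ns, o, -beta, -alpha))
        if val >= beta:              # beta cutoff: the remaining moves need not be examined
            return val
        alpha = max(val, alpha)      # raise the lower bound
    return val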
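
To see the cutoff `val >= beta` at work, the following sketch runs the same loop on a hypothetical three-move game tree (assumed encoding: an inner node is a list of successors, a leaf is the utility for player `0`, who moves first) and records which leaves get evaluated.

```python
# Hypothetical explicit game tree: an inner node is a list of successor
# states, a leaf is an int giving the utility for player 0 (who moves first).
Start   = [[1, 0], [-1, 1], [-1, -1]]
visited = []                       # leaves evaluated, in evaluation order

def other(p):          return 1 - p
def finished(s):       return isinstance(s, int)
def next_states(s, p): return s
def utility(s, p):
    visited.append(s)
    return s if p == 0 else -s

def alphaBeta(State, player, alpha, beta):
    """The notebook's alpha-beta loop, with the call to value() inlined."""
    if finished(State):
        return utility(State, player)
    val = alpha
    for ns in next_states(State, player):
        val = max(val, -alphaBeta(ns, other(player), -beta, -alpha))
        if val >= beta:
            return val
        alpha = max(val, alpha)
    return val

print(alphaBeta(Start, 0, -1, 1))  # 0
print(visited)                     # [1, 0, -1, -1]
```

Once the first move has established the value `0`, the second and third moves are each refuted by their first reply, so the second leaf of each of these subtrees is never evaluated: only 4 of the 6 leaves are visited.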

The function `best_move` takes two arguments:
- `State` is the current state of the game,
- `player` is a player.

The function `best_move` returns a pair of the form $(v, s)$ where $s$ is a state and $v$ is the value of this state for `player`.  The state $s$ is reached from `State` if `player` makes one of her optimal moves.  In order to have some variation in the game, the function chooses one of the optimal moves at random.
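
`best_move` as written calls `value` once for `State` and once for every successor.  A possible variation, sketched below under a hypothetical explicit-tree encoding (an inner node is a list of successors, a leaf is the utility for player `0`), evaluates each successor only once and takes the value of `State` to be the maximum of the negated successor values.  This relies on the successor values being exact, which holds for `value` with its default window $[-1, 1]$; for brevity, the sketch uses plain negamax in place of alpha-beta pruning.

```python
import random

# Hypothetical explicit game tree: an inner node is a list of successor
# states, a leaf is an int giving the utility for player 0 (who moves first).
Start = [[1, 0], [-1, 1], [[0, 1], 0]]

def other(p):          return 1 - p
def finished(s):       return isinstance(s, int)
def next_states(s, p): return s
def utility(s, p):     return s if p == 0 else -s

def value(s, p):
    """Exact value of s for p; plain negamax stands in for alpha-beta here."""
    if finished(s):
        return utility(s, p)
    return max(-value(ns, other(p)) for ns in next_states(s, p))

def best_move(State, player):
    # Value of every successor from player's point of view, computed once.
    scored  = [(-value(s, other(player)), s) for s in next_states(State, player)]
    bestVal = max(v for v, _ in scored)
    # Choose uniformly at random among the optimal successors.
    return bestVal, random.choice([s for v, s in scored if v == bestVal])

print(best_move(Start, 0))   # value 0, state [1, 0] or [[0, 1], 0]
```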

In [None]:
def best_move(State, player):
    NS        = next_states(State, player)
    bestVal   = value(State, player)    # value of State for player
    # a successor is optimal if its value for the opponent is -bestVal
    BestState = random.choice([s for s in NS if -value(s, other(player)) == bestVal])
    return bestVal, BestState

We import `IPython.display` because the function `IPython.display.clear_output` is needed to clear the output of a cell.

In [None]:
import IPython.display 

The function `play_game` plays a game on the given `canvas`.  The game is specified indirectly through the following global variables and functions:
- `Start` is a global variable defining the start state of the game.
- `next_states` is a function such that `next_states(s, p)` computes the list of all states that can be reached from state $s$ if player $p$ is next to move.
- `finished` is a function such that `finished(s)` is `True` for a state $s$ if the game is over in state $s$.
- `utility` is a function such that `utility(s, p)` returns either `-1`, `0`, or `1` in the *terminal state* $s$.  We have that
  - $\texttt{utility}(s, p)= -1$ iff the game is lost for player $p$ in state $s$, 
  - $\texttt{utility}(s, p)=  0$ iff the game is drawn, and 
  - $\texttt{utility}(s, p)=  1$ iff the game is won for player $p$ in state $s$.

In [None]:
def play_game(canvas):
    State       = Start
    firstPlayer = Players[0]            # the computer always moves first
    while True:
        # the computer makes one of its optimal moves
        val, State = best_move(State, firstPlayer)
        draw(State, canvas, f'For me, the game has the value {val}.')
        if finished(State):
            final_msg(State)
            break
        IPython.display.clear_output(wait=True)
        # the human player makes her move
        State = get_move(State)
        if finished(State):
            draw(State, canvas, '')
            IPython.display.clear_output(wait=True)
            final_msg(State)
            break
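
For testing the game logic without a canvas or a human opponent, one might replace the human by a second optimal player.  The sketch below is a hypothetical, non-interactive variant of `play_game`: it uses a toy stone-taking game (take one or two stones, whoever takes the last stone wins) instead of the imported notebook, and plain negamax instead of alpha-beta pruning.

```python
import random

# Hypothetical toy game: take one or two stones, whoever takes the last
# stone wins.  A state is a pair (stones, player_to_move).
Players = [0, 1]
Start   = (5, 0)

def other(p):          return 1 - p
def finished(s):       return s[0] == 0
def next_states(s, p): return [(s[0] - k, other(p)) for k in (1, 2) if k <= s[0]]
def utility(s, p):
    # In a terminal state the player to move has lost: the opponent took
    # the last stone.
    return -1 if s[1] == p else 1

def value(s, p):
    """Exact value of s for p; plain negamax stands in for alpha-beta here."""
    if finished(s):
        return utility(s, p)
    return max(-value(ns, other(p)) for ns in next_states(s, p))

def best_move(State, player):
    """Same logic as the notebook's best_move."""
    NS      = next_states(State, player)
    bestVal = value(State, player)
    return bestVal, random.choice([s for s in NS if -value(s, other(player)) == bestVal])

def self_play(State=Start):
    """Hypothetical non-interactive play_game: both players move optimally.
    Returns the list of states that occur in the game."""
    history, player = [State], Players[0]
    while not finished(State):
        _, State = best_move(State, player)
        history.append(State)
        player = other(player)
    return history

history = self_play()
print(history)
print(utility(history[-1], Players[0]))   # 1: with 5 stones the first player wins
```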

In [None]:
%run Tic-Tac-Toe-Bitboard.ipynb

With *$\alpha$-$\beta$ pruning*, computing the value of the `Start` state of *tic-tac-toe* takes about 95 ms; the exact time depends on the hardware.

In [None]:
%%time
val = value(Start, 0)

Let us check how many times the function `value` has been called:

In [None]:
num_value_call

The number of calls depends on whether we use *$\alpha$-$\beta$ pruning* or the plain *minimax algorithm*:

| Algorithm                 | Number of Calls |
|:------------------------- | ---------------:|
| Minimax                   |      $549\,946$ |
| $\alpha$-$\beta$ pruning  |       $16\,811$ |
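
The effect shown in the table can be reproduced on a much smaller scale.  The sketch below counts the calls made by a plain negamax `minimax` and by the alpha-beta loop of this notebook on a hypothetical stone-taking game (take one or two stones, whoever takes the last stone wins), starting with 10 stones.  The counts depend on the game and on the order in which `next_states` lists the moves, so they are not comparable to the tic-tac-toe numbers above.

```python
# Hypothetical toy game: take one or two stones, whoever takes the last
# stone wins.  A state is a pair (stones, player_to_move).
Start = (10, 0)

def other(p):          return 1 - p
def finished(s):       return s[0] == 0
def next_states(s, p): return [(s[0] - k, other(p)) for k in (1, 2) if k <= s[0]]
def utility(s, p):     return -1 if s[1] == p else 1   # player to move has lost

calls = {'minimax': 0, 'alphaBeta': 0}   # number of invocations per algorithm

def minimax(s, p):
    calls['minimax'] += 1
    if finished(s):
        return utility(s, p)
    return max(-minimax(ns, other(p)) for ns in next_states(s, p))

def alphaBeta(s, p, alpha=-1, beta=1):
    calls['alphaBeta'] += 1
    if finished(s):
        return utility(s, p)
    val = alpha
    for ns in next_states(s, p):
        val = max(val, -alphaBeta(ns, other(p), -beta, -alpha))
        if val >= beta:              # cutoff: remaining moves are skipped
            return val
        alpha = max(val, alpha)
    return val

print(minimax(Start, 0), alphaBeta(Start, 0))   # 1 1
print(calls)
```

Here the first move examined (taking one stone, which leaves a multiple of three) already wins, so the cutoff skips the entire subtree of the second move at the root.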

Let's draw the board.

In [None]:
canvas = create_canvas()
draw(Start, canvas, f'Current value of game for "X": {val}')

Now it's time to play.  In the input window that will pop up, enter your move in the format "row,col" with no space between row and column.
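
The input is read by `get_move`, which is defined in the imported game notebook and not shown here.  A minimal sketch of how such input could be turned into a pair of integers, using a hypothetical helper `parse_move`, might look like this:

```python
def parse_move(text):
    """Hypothetical helper: convert input such as '1,2' into the pair (1, 2).

    Raises ValueError unless text consists of two comma-separated integers."""
    row, col = text.split(',')    # ValueError if there is not exactly one comma
    return int(row), int(col)     # ValueError if a part is not an integer

print(parse_move('1,2'))          # (1, 2)
```

Since Python's `int` ignores surrounding whitespace, this sketch would also accept `1, 2`; whether `get_move` does so is not specified here.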

In [None]:
play_game(canvas)