# Games and Adversarial Search
## Noughts and Crosses
### Introduction

<figure>
<img src="images/noughtsandcrosses.png" width=200 align="right">
    <figcaption></figcaption>
</figure>

Noughts and Crosses, or Tic-Tac-Toe, is a simple game for two players played on a 3x3 grid. One player takes O (noughts) and the other takes X (crosses). The players alternate placing their symbol in an empty cell on the grid until one of them gets three in a row, winning the game, or the grid fills up, which is a draw.

The game is well known for having a relatively easy to learn optimal strategy, which will lead to a draw if followed by both players.

## Minimax Search
Minimax is a simple search algorithm which can determine optimal play assuming the opponent is going to play perfectly. This assumption usually works even against suboptimal players – good moves are still good!

The full version of the algorithm is a depth-first search of the entire state space for a 2-player game. First we model the problem such that high rewards benefit us and low (or negative) rewards benefit our opponent, so we are looking to maximise the reward and our opponent is looking to minimise it.

Assume we are player 1. On a given turn, look at the reward for each action, and take the maximum one. How do we find the reward for each action? Well, we look at what player 2 would do in response following the same logic: look at every action available to them, and choose the one that minimises the reward. Of course this *also* requires knowing the rewards, for which we might need to look at what player 1 would do in response to each action. And so on. Eventually, one of the moves will end the game, and the reward can be fed back up the search tree.

Here is the pseudocode for minimax from Russell and Norvig (p. 166): <br />
<img src="./images/minimax.png" width=500 />
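
To make the recursion concrete before applying it to a real game, here is a minimal sketch on a hand-written tree. It assumes a tree stored as nested Python lists with plain numbers as leaf rewards; the name `toy_tree` and its values are made up for illustration and are not taken from the textbook or from the classes below.

```python
def minimax(node, maximising):
    """Return the minimax value of `node`; leaves are plain numbers."""
    if not isinstance(node, list):
        return node  # terminal state: the reward is fed back up the tree
    # The turn alternates at every level of the tree.
    values = [minimax(child, not maximising) for child in node]
    return max(values) if maximising else min(values)


# Hypothetical tree: we (MAX) choose one of three actions, then MIN replies.
toy_tree = [[1, -1], [0, 0], [-1, 1, 1]]

# Value of each of our actions, assuming MIN replies as well as possible.
action_values = [minimax(child, maximising=False) for child in toy_tree]
best_action = max(range(len(toy_tree)), key=lambda i: action_values[i])

print(action_values)  # [-1, 0, -1]
print(best_action)    # 1
```

The first and third actions each contain a tempting reward of 1, but MIN would never allow it, so the safe draw in the middle is the best choice.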

## Example
In the cell below you will find two classes. 
* The first class, `NoughtsAndCrosses`, is for the game state and rules – you can skim through the code if you want to see how it works, but hopefully it will be clear enough from the various names. 
* The second class, `MinimaxAgent`, is the important one. This is the one that contains the recursive function which you should read carefully to better your understanding of the algorithm.

Note that the cell will not immediately produce any output when it is run, but there is code further down the page which shows a demonstration (as always make sure you have actually run the following cell first).

In [None]:
import math
import numpy as np
import copy


class NoughtsAndCrosses:
    def __init__(self):
        self.EMPTY = ' '
        self.NOUGHT = 'O'
        self.CROSS = 'X'
        self.DRAW = 'draw'

        self.next_player = self.CROSS
        self.flip_player = {self.CROSS: self.NOUGHT, self.NOUGHT: self.CROSS}

        self.board = np.array([[self.EMPTY for _ in range(3)] for _ in range(3)])

    def valid_move(self, row, col):
        return self.board[row, col] == self.EMPTY

    def move(self, row, col):
        """Returns a *copy* of this state with the specified move"""
        if not self.valid_move(row, col):
            raise ValueError("Position not empty")

        state = copy.deepcopy(self)
        state.board[row, col] = state.next_player
        state.next_player = state.flip_player[state.next_player]
        return state

    def winner(self):
        for i in range(3):
            if np.all(self.board[i, :] == self.board[i, 0]) and self.board[i, 0] != self.EMPTY:
                return self.board[i, 0]
            if np.all(self.board[:, i] == self.board[0, i]) and self.board[0, i] != self.EMPTY:
                return self.board[0, i]
        if self.EMPTY != self.board[0, 0] and self.board[0, 0] == self.board[1, 1] and self.board[1, 1] == self.board[2, 2]:
            return self.board[1, 1]
        if self.EMPTY != self.board[2, 0] and self.board[2, 0] == self.board[1, 1] and self.board[1, 1] == self.board[0, 2]:
            return self.board[1, 1]
        if np.all(self.board != self.EMPTY):
            return self.DRAW
        return False

    def actions(self):
        row, col = np.nonzero(self.board == self.EMPTY)
        return [(r, c) for r, c in zip(row, col)]


class MinimaxAgent:
    def __init__(self, verbose=False):
        self.verbose = verbose

    def next_move(self, state=NoughtsAndCrosses()):
        player = state.next_player

        best_action = None
        best_value = -1 * math.inf
        for action in state.actions():
            new_state = state.move(action[0], action[1])
            action_value = self.get_value(new_state, player, get_min=True)
            if self.verbose:
                print(action_value, end=" ")
            if action_value > best_value:
                best_action = action
                best_value = action_value
        if self.verbose:
            print()
        return best_action

    def get_value(self, state, player, get_min):
        """If get_min is set to true, returns the minimum value, otherwise the maximum value"""
        other_player = state.flip_player[player]

        winner = state.winner()
        if winner == player:
            return 1
        elif winner == other_player:
            return -1
        elif winner == state.DRAW:
            return 0

        best_value = math.inf
        if not get_min:
            best_value *= -1

        for action in state.actions():
            new_state = state.move(action[0], action[1])
            action_value = self.get_value(new_state, player, get_min=not get_min)
            if not get_min and action_value > best_value \
                    or get_min and action_value < best_value:
                best_value = action_value

        return best_value

## Demonstration
Run the cell below to play a game against this AI.

There is an additional agent class called `HumanAgent`, which will ask the user for input on the command line to determine what move to make. Then there is a function which plays a game of noughts and crosses. By default the game is Human vs AI (with the AI going second), but you can manually change the values for `player1` and `player2` to change this. 

Just a warning: the minimax algorithm generates every possible game when it is asked to go first – expect it to take a long time to run.

In [None]:
class HumanAgent:
    def next_move(self, state):
        while True:
            try:
                print("What's your next move? In format row,col")
                move = input(">")
                move = move.split(',')
                move = int(move[0]), int(move[1])
                if not state.valid_move(move[0], move[1]):
                    print("Space must be empty.")
                else:
                    return move
            except ValueError:
                print("Please enter valid space as row,col between 0,0 and 2,2")


def run_game(player1=HumanAgent(), player2=MinimaxAgent()):
    state = NoughtsAndCrosses()
    print(state.board)
    while not state.winner():
        move = player1.next_move(state)
        state = state.move(move[0], move[1])
        print(state.board)
        if state.winner() == state.CROSS:
            print("Player one wins!")
            return
        elif state.winner() == state.DRAW:
            print("It's a draw.")
            return

        move = player2.next_move(state)
        state = state.move(move[0], move[1])
        print(state.board)
        if state.winner() == state.NOUGHT:
            print("Player two wins!")
            return
        elif state.winner() == state.DRAW:
            print("It's a draw.")
            return


if __name__ == "__main__":
    run_game()

## Extensions
Minimax is too inefficient for most real games; it is only just fast enough to work on noughts and crosses. The minimax implementation above considers 549,945 states to calculate its opening move. We could cut this significantly by optimising for repeated states and symmetry of the board. But even with optimisations like these, the search graph for chess has about $10^{40}$ distinct states, or 10,000,000,000,000,000,000,000,000,000,000,000,000,000.

Still, minimax is a good foundation on which to build other optimisations:
* Alpha-beta pruning
* Table lookups
* Evaluation functions
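
The state count quoted above, and the saving available from detecting repeated states, can be checked with a short sketch. It assumes a simplified board stored as a 9-character string rather than the `NoughtsAndCrosses` class, and the helper names (`is_terminal`, `successors`, `count_tree_nodes`, `distinct_states`) are illustrative only. It does not attempt the symmetry optimisation.

```python
# All eight winning lines, as indices into a 9-character board string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def is_terminal(board):
    """True if someone has three in a row or the board is full."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return True
    return ' ' not in board


def successors(board, player):
    """Yield every board reachable by `player` placing one symbol."""
    for i, cell in enumerate(board):
        if cell == ' ':
            yield board[:i] + player + board[i + 1:]


def count_tree_nodes(board=' ' * 9, player='X'):
    """Count every node of the game tree, revisiting repeated states."""
    if is_terminal(board):
        return 1
    other = 'O' if player == 'X' else 'X'
    return 1 + sum(count_tree_nodes(s, other) for s in successors(board, player))


def distinct_states(board=' ' * 9, player='X', seen=None):
    """Collect each reachable board once, skipping ones already seen."""
    if seen is None:
        seen = set()
    if board in seen:
        return seen
    seen.add(board)
    if not is_terminal(board):
        other = 'O' if player == 'X' else 'X'
        for s in successors(board, player):
            distinct_states(s, other, seen)
    return seen


total_nodes = count_tree_nodes()   # takes a few seconds
reachable = distinct_states()

print(total_nodes - 1)  # 549945 states below the empty board
print(len(reachable))   # 5478 distinct boards
```

Most of the work done by plain minimax is therefore re-evaluating boards it has already seen, which is what a lookup table avoids.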

## Alpha-Beta Pruning
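Alpha-beta pruning is a technique which can cut the search space of minimax substantially with no loss of optimality. With ideal move ordering it examines roughly $O(b^{m/2})$ nodes instead of $O(b^m)$, where $b$ is the branching factor and $m$ is the search depth, so it is the exponent that is roughly halved rather than the node count. Consider this example game from Russell and Norvig (pp. 164-168): <br />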
<img src="images/game.png" width=400 />

Now suppose we are searching using minimax and we reach this point in the tree: <br />
<img src="images/gameab.png" width=400 />

Notice that if the MAX player chooses action $a_1$, then they are guaranteed a result of 3, because this is the lowest value the MIN player can force in response. So the MAX player knows they can achieve a result of *at least* 3.

Now while exploring the MIN player's actions after MAX takes action $a_2$, we find that MIN can pick an action, $c_1$, that results in a reward of 2. We can now rule out action $a_2$ from the perspective of the MAX player *without* having to check the rewards for $c_2$ or $c_3$. The MAX player knows $a_1$ is a better option than $a_2$: if $c_2$ or $c_3$ gives a result higher than 2, the MIN player will not choose it, and if either gives a result *lower* than 2, that does not matter, because 2 is already worse than the 3 we can get from taking $a_1$.
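
The reasoning above can be traced with a small sketch. It assumes the example game is written as nested Python lists whose leaf rewards (3, 12, 8 / 2, 4, 6 / 14, 5, 2) match the textbook figure; the function name `alphabeta` and the `visited` list are illustrative and are not part of the agent classes in this notebook.

```python
import math

# Assumed leaf rewards for the example game: one inner list per MAX action.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]


def alphabeta(node, maximising, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of `node`, skipping branches that cannot matter.

    alpha: best value MAX can already guarantee on the path to this node.
    beta:  best (lowest) value MIN can already guarantee on that path.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record which leaves were actually examined
        return node
    if maximising:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            if value >= beta:
                break  # MIN already has a better option elsewhere
            alpha = max(alpha, value)
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        if value <= alpha:
            break  # MAX already has a better option elsewhere
        beta = min(beta, value)
    return value


visited = []
root_value = alphabeta(TREE, True, visited=visited)
print(root_value)  # 3
print(visited)     # [3, 12, 8, 2, 14, 5, 2]
```

Only seven of the nine leaves are examined: after seeing the 2 under $a_2$, the search skips the remaining two leaves of that branch, exactly as argued above.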

## Task
Use the pseudocode below (Russell and Norvig p. 170) to adapt the `MinimaxAgent` to use alpha-beta pruning. If you are stuck, read this section in the textbook for more details.

Some skeleton code is provided in the cell below. Look for the lines which say `### YOUR CODE HERE`. <br />
<img src="./images/abpseudo.png" width=500 />

In [None]:
class ABMinimaxAgent:
    def __init__(self, verbose=False):
        self.verbose = verbose

    def next_move(self, state=NoughtsAndCrosses()):
        player = state.next_player

        best_action = None
        best_value = -1 * math.inf
        for action in state.actions():
            new_state = state.move(action[0], action[1])
            action_value = self.get_value(new_state, player, get_min=True)
            if self.verbose:
                print(action_value, end=" ")
            if action_value > best_value:
                best_action = action
                best_value = action_value
        if self.verbose:
            print()
        return best_action

    def get_value(self, state, player, get_min, alpha=-math.inf, beta=math.inf):
        """If get_min is set to true, returns the minimum value, otherwise the maximum value"""
        other_player = state.flip_player[player]

        winner = state.winner()
        if winner == player:
            return 1
        elif winner == other_player:
            return -1
        elif winner == state.DRAW:
            return 0

        best_value = math.inf
        if not get_min:
            best_value *= -1

        for action in state.actions():
            new_state = state.move(action[0], action[1])
            action_value = self.get_value(new_state, player, get_min=not get_min, alpha=alpha, beta=beta)
            
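            if not get_min:
                ### YOUR CODE HERE
                pass  # placeholder so the cell still runs before you fill this in
            else:
                ### YOUR CODE HERE
                pass  # placeholder so the cell still runs before you fill this in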

            if not get_min and action_value > best_value \
                    or get_min and action_value < best_value:
                best_value = action_value

        return best_value
    

run_game(player1=ABMinimaxAgent(), player2=ABMinimaxAgent())