# A perfect Tic Tac Toe player, using the Min-Max algorithm #
In this notebook, we will use the Min-Max algorithm to create a computer player that plays Tic Tac Toe perfectly. That is, the player will always play the best move in a given situation. This player will give us a good benchmark to pit the other players against.

Let's start by importing a few of the utility functions and classes we defined last time and make sure it all works:

In [1]:
from IPython.display import HTML, display
from tic_tac_toe.Board import Board, GameResult, CROSS, NAUGHT, EMPTY
from util import print_board, play_game
from tic_tac_toe.RandomPlayer import RandomPlayer

board = Board()
player1 = RandomPlayer()
player2 = RandomPlayer()

result = play_game(board, player1, player2)
print_board(board)

if result == GameResult.CROSS_WIN:
    print("Cross won")
elif result == GameResult.NAUGHT_WIN:
    print("Naught won")
else:
    print("Draw")

0,1,2
,x,o
,x,
,x,o


Cross won


## The Min-Max algorithm
So, what is this Min-Max algorithm that we want to implement?

The long answer can be found [here](https://en.wikipedia.org/wiki/Minimax). We won't go into that much detail here; instead, we'll just look at the general idea:

Given a board state, we find the best move by simulating all possible continuations from this position and choosing the one that is best for us. The one best for us is the one with the best outcome if:

* we always make the move that is best for us (*Maximizes* the game value for us) and 
* our opponent always makes the move that is best for them (and thus worst for us - *Minimizing* the game value for us). 

You can see where the algorithm gets its name from.
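To make the idea concrete before turning to Tic Tac Toe, here is a minimal sketch of Min-Max on a hand-written game tree. It is an illustration only, not the code used by the players in this notebook: the nested-list tree representation and the function name `minimax` are assumptions made for this example. Leaves are final game values from our point of view; inner lists are positions where someone has to choose a move.

```python
from typing import Union

# Hypothetical representation: a leaf is an int (the final game value for us),
# an inner node is a list of child nodes (one per possible move).
Tree = Union[int, list]


def minimax(node: Tree, maximizing: bool) -> int:
    """Return the value of `node` if both sides always play their best move."""
    if isinstance(node, int):
        return node  # final position: nothing left to choose
    # The turn alternates between the maximizing and the minimizing player.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# We move first (maximizing), then the opponent replies (minimizing).
tree = [[1, -1], [0, 1], [-1, 0]]
print(minimax(tree, maximizing=True))  # 0
```

The opponent's best replies are worth -1, 0 and -1 for our three moves, so the best we can force is the middle move with value 0.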

Let's look at an example. Given the following board position and NAUGHT to move next:

In [2]:
example = Board([CROSS  , EMPTY  , CROSS,
                 NAUGHT , NAUGHT , CROSS,
                 EMPTY  , EMPTY  , NAUGHT])
print_board(example)

0,1,2
x,,x
o,o,x
,,o


The following continuations are possible:
![title](./TicTacToe-MinMax-Example1.png)

That is: first NAUGHT, the maximizing player, gets to move; then CROSS, the minimizing player, gets to move; and in those cases where the game has not ended at that point, NAUGHT, the maximizing player, gets one more move:
![title](./TicTacToe-MinMax-Example2.png)

We label all final game states according to their value from the point of view of Naught: 
* 1 for a win
* -1 for a loss
* 0 for a draw

![title](./TicTacToe-MinMax-Example3.png)

Now we can back-propagate the scores from the bottom layer to the layer above. According to the algorithm, as it is the Max player's turn, we choose the move with the highest score. Note that in this initial case there is only one possible move, so the move is forced and we just propagate that value one layer up without having to choose a maximizing move:

![title](./TicTacToe-MinMax-Example4.png)

Now we propagate up again. This time it is the minimizing player's turn, so for each position we propagate up the smallest value among its possible moves:

![title](./TicTacToe-MinMax-Example5.png)

Finally, we propagate one more layer up. This time it's the maximizing player's turn again, so we choose the highest value among all moves for the position:

![title](./TicTacToe-MinMax-Example6.png)

Now we know everything we need to know to make a move: 

* The best we can hope for if both we and our opponent always play our best moves is a draw (since the score of the current board position is 0). 

* We also know that there is only one move in the current situation that will achieve this best case for us: putting a NAUGHT in the middle spot of the top row.

Note that there are other potential continuations that would also lead to a draw, and even some that might lead to NAUGHT winning. Unfortunately, we now also know that if CROSS always plays their best move, we will never get the chance to reach them.
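The walkthrough above can be checked with a few lines of code. The following is a minimal, self-contained sketch and not the implementation in `MinMaxAgent.py`: the tuple-based board, the constants `X`, `O`, `E` and the helper names `winner` and `minimax` are assumptions made for this illustration. Squares are indexed 0 to 8, row by row, so index 1 is the middle spot of the top row.

```python
from typing import Optional, Tuple

X, O, E = "x", "o", " "  # hypothetical stand-ins for CROSS, NAUGHT, EMPTY
Board = Tuple[str, ...]
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def winner(board: Board) -> Optional[str]:
    """Return X or O if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != E and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax(board: Board, to_move: str, me: str) -> Tuple[int, Optional[int]]:
    """Return (value, best move) of `board`, scored from the view of `me`."""
    won = winner(board)
    if won is not None:
        return (1 if won == me else -1), None
    empties = [i for i, square in enumerate(board) if square == E]
    if not empties:
        return 0, None  # board full, nobody won: draw
    other = O if to_move == X else X
    best_value, best_move = None, None
    for move in empties:
        child = board[:move] + (to_move,) + board[move + 1:]
        value, _ = minimax(child, other, me)
        # `me` maximizes the value, the opponent minimizes it.
        better = (best_value is None
                  or (to_move == me and value > best_value)
                  or (to_move != me and value < best_value))
        if better:
            best_value, best_move = value, move
    return best_value, best_move


example = (X, E, X,
           O, O, X,
           E, E, O)
value, move = minimax(example, to_move=O, me=O)
print(value, move)  # 0 1
```

The result agrees with the diagrams: the position is worth 0 (a draw) for NAUGHT, and the only move that achieves it is square 1, the middle spot of the top row.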

## The Min-Max players ##
The code contains the following 2 player classes which implement the MinMax algorithm for TicTacToe. 

In order to make things a bit more efficient, the players remember the scores for a given board position in an internal cache. This means a player has to simulate the possible continuations from that position only once. It makes even the first simulation more efficient, as different move combinations often produce the same board position, which, with the cached result, we don't have to evaluate again.
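The effect of such a cache can be illustrated with a small experiment. This is a sketch under assumptions, not the cache used in `MinMaxAgent.py`: the `Solver` class, its dictionary keyed by `(board, to_move)` and the `evaluations` counter are hypothetical names introduced here. It evaluates the empty board once without and once with a cache and counts how many positions actually had to be worked out.

```python
from typing import Dict, Optional, Tuple

X, O, E = "x", "o", " "  # hypothetical stand-ins for CROSS, NAUGHT, EMPTY
Board = Tuple[str, ...]
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board: Board) -> Optional[str]:
    for a, b, c in LINES:
        if board[a] != E and board[a] == board[b] == board[c]:
            return board[a]
    return None


class Solver:
    """Min-Max evaluation from X's point of view, with an optional cache."""

    def __init__(self, use_cache: bool) -> None:
        self.use_cache = use_cache
        self.cache: Dict[Tuple[Board, str], int] = {}
        self.evaluations = 0  # positions evaluated without a cache hit

    def value(self, board: Board, to_move: str) -> int:
        key = (board, to_move)
        if self.use_cache and key in self.cache:
            return self.cache[key]  # already known: no simulation needed
        self.evaluations += 1
        won = winner(board)
        if won is not None:
            result = 1 if won == X else -1
        elif E not in board:
            result = 0
        else:
            other = O if to_move == X else X
            values = [self.value(board[:i] + (to_move,) + board[i + 1:], other)
                      for i, square in enumerate(board) if square == E]
            result = max(values) if to_move == X else min(values)
        if self.use_cache:
            self.cache[key] = result
        return result


empty = (E,) * 9
plain, cached = Solver(use_cache=False), Solver(use_cache=True)
print("no cache:", plain.value(empty, X), plain.evaluations)
print("cache:   ", cached.value(empty, X), cached.evaluations)
```

Both runs report the same game value for the empty board, but the cached run evaluates far fewer positions, because positions reached through different move orders are computed only once.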

Even on a moderately fast computer this works quite well due to the small number of possible board positions in Tic Tac Toe. A game can have something like $9! = 362,880$ different possible move combinations, i.e. 9 choices for the first move, 8 choices for the second move, 7 choices for the third move, etc., down to 1 choice for the last move (for simplicity ignoring cases where the game is over before all squares are occupied). However, the game can only have $3^9 = 19,683$ different states, as each square can only be empty, have a NAUGHT, or have a CROSS in it (again for simplicity ignoring game states that are impossible in a real game, and also ignoring the fact that we could reduce the number of states further by treating symmetric positions as the same). 
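Both figures are upper bounds, since they ignore early wins and impossible positions. As a rough check, the sketch below computes the two bounds and then enumerates the real game tree by brute force. It assumes that CROSS moves first and that a game stops as soon as one side has three in a row; the helper names `winner` and `count_games` are made up for this illustration.

```python
import math
from typing import Optional, Set, Tuple

X, O, E = "x", "o", " "  # hypothetical stand-ins for CROSS, NAUGHT, EMPTY
Board = Tuple[str, ...]
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board: Board) -> Optional[str]:
    for a, b, c in LINES:
        if board[a] != E and board[a] == board[b] == board[c]:
            return board[a]
    return None


def count_games(board: Board, to_move: str, seen: Set[Board]) -> int:
    """Count complete games from `board`; record every position visited."""
    seen.add(board)
    if winner(board) is not None or E not in board:
        return 1  # the game ends here, so this is exactly one finished game
    other = O if to_move == X else X
    return sum(count_games(board[:i] + (to_move,) + board[i + 1:], other, seen)
               for i, square in enumerate(board) if square == E)


seen: Set[Board] = set()
games = count_games((E,) * 9, X, seen)

print(math.factorial(9))  # 362880  upper bound on move sequences
print(3 ** 9)             # 19683   upper bound on board states
print(games)              # 255168  actual complete games
print(len(seen))          # 5478    positions actually reachable
```

The exact numbers are smaller than the bounds, which makes the case for caching by board position even stronger.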

* [MinMaxAgent.py](./tic_tac_toe/MinMaxAgent.py): Plays Tic Tac Toe using the MinMax algorithm in a deterministic way, i.e. if there is more than one move with an equal best score in a given position, this player will always choose the same one.

* [RndMinMaxAgent.py](./tic_tac_toe/RndMinMaxAgent.py): Plays Tic Tac Toe using the MinMax algorithm in a non-deterministic way, i.e. if there is more than one move with an equal best score in a given position, this player will randomly choose one of them each time.
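The only difference between the two players is how ties are broken. The sketch below shows one way this could look; it is not taken from the two agent files, and the helper names `best_moves`, `pick_deterministic` and `pick_random`, as well as the `scores` dictionary mapping a move to its Min-Max value, are assumptions made for this example.

```python
import random
from typing import Dict, List


def best_moves(scores: Dict[int, int]) -> List[int]:
    """Return all moves that share the highest score, in insertion order."""
    top = max(scores.values())
    return [move for move, score in scores.items() if score == top]


def pick_deterministic(scores: Dict[int, int]) -> int:
    # Always take the first of the equally good moves.
    return best_moves(scores)[0]


def pick_random(scores: Dict[int, int], rng: random.Random) -> int:
    # Choose uniformly among the equally good moves.
    return rng.choice(best_moves(scores))


# Hypothetical scores: moves 0 and 4 are equally good, move 8 loses.
scores = {0: 0, 4: 0, 8: -1}
print(pick_deterministic(scores))        # 0
print(pick_random(scores, random.Random()))
```

Both variants play equally well; the random one simply produces more varied games, which can matter when the player is used as a training opponent.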

Let's see how they perform.

First we define a small utility function to pit 2 players against each other:

In [3]:
from tic_tac_toe.Player import Player

def battle(player1: Player, player2: Player, num_games: int = 100000):
    # Use a local board so the function does not depend on the global `board`.
    board = Board()
    draw_count = 0
    cross_count = 0
    naught_count = 0
    for _ in range(num_games):
        result = play_game(board, player1, player2)
        if result == GameResult.CROSS_WIN:
            cross_count += 1
        elif result == GameResult.NAUGHT_WIN:
            naught_count += 1
        else:
            draw_count += 1

    print("After {} games we have draws: {}, cross wins: {}, and naught wins: {}.".format(
        num_games, draw_count, cross_count, naught_count))

    print("Which gives percentages of draws : cross : naught of about {:.2%} : {:.2%} : {:.2%}".format(
        draw_count / num_games, cross_count / num_games, naught_count / num_games))

First MinMaxAgent against RandomPlayer:

In [4]:
from tic_tac_toe.MinMaxAgent import MinMaxAgent

battle(MinMaxAgent(), RandomPlayer())

AttributeError: 'MinMaxAgent' object has no attribute 'self'