The goal of this series is to implement and test a couple of different approaches to teaching a computer how to play Tic Tac Toe. We will create 

* a player that plays completely randomly, 
* two players that implement simple forms of the Min-Max algorithm, 
* a tabular Q-Learning player, 
* a Neural Network player that we train with reinforcement learning, using Q-Learning as well as Policy Gradient.

Let's get started:

We import the following classes:

* Board: Contains all the Tic Tac Toe board state management plus some utility methods.
* GameResult: A game can be either NOT_FINISHED, DRAW, CROSS_WIN, or NAUGHT_WIN.
* CROSS, NAUGHT: Tell our players which side they play.

We also define a utility function 'print_board' that renders a board state as HTML.

In [1]:
from IPython.display import HTML, display
from tic_tac_toe.Board import Board, GameResult, CROSS, NAUGHT


def print_board(board):
    display(HTML(board.html_str()))

With everything set up, we can now create a new board and print it in all its empty glory.

In [2]:
board = Board()
print_board(board)

0,1,2
,,
,,
,,


Now let's use the methods 'random_empty_spot' and 'move' to find a random empty spot on the board and put a CROSS there. We then print the board to confirm.

In [3]:
board.move(board.random_empty_spot(), CROSS)
print_board(board)

0,1,2
,,
,,x
,,
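
To make it clearer what these two methods do, here is a minimal sketch of how such a board could work. This is for illustration only and is not the actual implementation of tic_tac_toe.Board: the class name SketchBoard, the flat list of nine cells, and the integer encodings are all assumptions, and win detection is left out entirely.

```python
import random

# Hypothetical cell encodings, chosen only for this sketch.
EMPTY, NAUGHT, CROSS = 0, 1, 2


class SketchBoard:
    """Stand-in board: nine cells stored row by row in a flat list."""

    def __init__(self):
        self.state = [EMPTY] * 9

    def random_empty_spot(self):
        # Collect the indices of all free cells and pick one uniformly at random.
        empties = [i for i, cell in enumerate(self.state) if cell == EMPTY]
        return random.choice(empties)

    def move(self, position, side):
        # Refuse to overwrite a cell that is already taken.
        if self.state[position] != EMPTY:
            raise ValueError("Spot {} is already taken".format(position))
        self.state[position] = side
        return self.state


board = SketchBoard()
board.move(board.random_empty_spot(), CROSS)
print(board.state)
```

Because 'random_empty_spot' only ever returns a free cell, the pair of calls can never produce an illegal move as long as the board still has an empty spot.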


Now let's extend that to play a whole game. 

We reset the board state and alternate CROSS and NAUGHT moves until the game is either won by one side or ends in a draw. We print the board after each move and, at the end, print out who won.

In [4]:
board.reset()
finished = False
while not finished:
    _, result, finished = board.move(board.random_empty_spot(), CROSS)
    print_board(board)
    if finished:
        if result == GameResult.DRAW:
            print("Game is a draw")
        else:
            print("Cross won!")
    else:
        _, result, finished = board.move(board.random_empty_spot(), NAUGHT)
        print_board(board)
        if finished:
            if result == GameResult.DRAW:
                print("Game is a draw")
            else:
                print("Naught won!")

0,1,2
,,
,,
x,,


0,1,2
,,o
,,
x,,


0,1,2
,,o
,,x
x,,


0,1,2
,o,o
,,x
x,,


0,1,2
,o,o
x,,x
x,,


0,1,2
,o,o
x,,x
x,,o


0,1,2
,o,o
x,x,x
x,,o


Cross won!


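Now let's create a utility function 'play_game' that takes a board and two players and plays a complete game. The first player plays CROSS and moves first, the second plays NAUGHT. The function returns the result of the game at the end.

We test the new function by playing a game between two random players.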

In [5]:
from tic_tac_toe.Player import Player


def play_game(board: Board, player1: Player, player2: Player):
    # player1 always plays CROSS and moves first; player2 plays NAUGHT.
    player1.new_game(CROSS)
    player2.new_game(NAUGHT)
    board.reset()

    finished = False
    while not finished:
        result, finished = player1.move(board)
        if finished:
            if result == GameResult.DRAW:
                final_result = GameResult.DRAW
            else:
                final_result = GameResult.CROSS_WIN
        else:
            result, finished = player2.move(board)
            if finished:
                if result == GameResult.DRAW:
                    final_result = GameResult.DRAW
                else:
                    final_result = GameResult.NAUGHT_WIN

    # Tell both players how the game ended.
    player1.final_result(final_result)
    player2.final_result(final_result)
    return final_result


from tic_tac_toe.RandomPlayer import RandomPlayer


result = play_game(board, RandomPlayer(), RandomPlayer())
print_board(board)
if result == GameResult.CROSS_WIN:
    print("Cross won")
elif result == GameResult.NAUGHT_WIN:
    print("Naught won")
else:
    print("Draw")

0,1,2
o,o,x
x,x,o
o,x,x


Draw


Establishing some ground truth.

The above now allows us to establish some ground truth: If we let two random players play against each other, how many games do we expect to be won by NAUGHT, how many by CROSS, and how many do we expect to end in a draw?

As we build more intelligent players, we can then measure how much better they play than a random player.

In [17]:
num_games = 100000

draw_count = 0
cross_count = 0
naught_count = 0

p1 = RandomPlayer()
p2 = RandomPlayer()

for _ in range(num_games):
    result = play_game(board, p1, p2)
    if result == GameResult.CROSS_WIN:
        cross_count += 1
    elif result == GameResult.NAUGHT_WIN:
        naught_count += 1
    else:
        draw_count += 1
        
print("After {} games we have draws: {}, cross wins: {}, and naught wins: {}.".format(
    num_games, draw_count, cross_count, naught_count))

print("Which gives percentages of draws : cross : naught of about {:.2%} : {:.2%} : {:.2%}".format(
    draw_count / num_games, cross_count / num_games, naught_count / num_games))

After 100000 games we have draws: 12638, cross wins: 58688, and naught wins: 28674.
Which gives percentages of draws : cross : naught of about 12.64% : 58.69% : 28.67%
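
As a cross-check on the simulation, we can also compute these probabilities exactly by walking the whole game tree. The following is a self-contained sketch that does not use the tic_tac_toe package: the tuple-of-strings board encoding and the helper names 'winner' and 'outcome_probs' are assumptions made for this example only. It assumes, as in the simulation above, that CROSS moves first and that every empty spot is equally likely to be chosen.

```python
from fractions import Fraction
from functools import lru_cache

# The eight winning lines on a board stored row by row as a flat tuple of 9 cells.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(cells):
    """Return 'x' or 'o' if that side has three in a row, else None."""
    for a, b, c in LINES:
        if cells[a] != " " and cells[a] == cells[b] == cells[c]:
            return cells[a]
    return None


@lru_cache(maxsize=None)
def outcome_probs(cells=(" ",) * 9, to_move="x"):
    """Return (P(draw), P(cross wins), P(naught wins)) under uniformly random play."""
    won_by = winner(cells)
    if won_by == "x":
        return (Fraction(0), Fraction(1), Fraction(0))
    if won_by == "o":
        return (Fraction(0), Fraction(0), Fraction(1))
    empty = [i for i, cell in enumerate(cells) if cell == " "]
    if not empty:
        return (Fraction(1), Fraction(0), Fraction(0))

    # Average the outcome probabilities over all equally likely next moves.
    next_side = "o" if to_move == "x" else "x"
    totals = [Fraction(0)] * 3
    for i in empty:
        child = cells[:i] + (to_move,) + cells[i + 1:]
        for k, p in enumerate(outcome_probs(child, next_side)):
            totals[k] += p / len(empty)
    return tuple(totals)


p_draw, p_cross, p_naught = outcome_probs()
print("draws : cross : naught = {:.2%} : {:.2%} : {:.2%}".format(
    float(p_draw), float(p_cross), float(p_naught)))
```

Under these assumptions the exact values come out at about 12.70% : 58.49% : 28.81%, which agrees well with the simulated percentages above. Using Fraction keeps the arithmetic exact, and the cache means each distinct board position is evaluated only once.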
