# Adversarial Search: Solving Tic-Tac-Toe with Monte Carlo Tree Search

## Introduction 

Multiplayer games can be approached in several ways:
1. Nondeterministic actions: The opponent is seen as part of an environment with nondeterministic actions. Non-determinism is the result of the unknown opponent's moves. 
2. Optimal Decisions: Minimax search (search complete game tree) and alpha-beta pruning.
3. Heuristic Alpha-Beta Tree Search: Cut off tree search and use heuristic to estimate state value. 
4. __Monte Carlo Tree search:__ Simulate playouts to estimate state value. 

Here we will implement search for Tic-Tac-Toe (see [rules](https://en.wikipedia.org/wiki/Tic-tac-toe)). The game is a __zero-sum game__: Win by x results in +1, win by o in -1 and a tie has a value of 0. Max plays x and tries to maximize the outcome while Min plays o and tries to minimize the outcome.   

We will implement
* Pure Monte Carlo search

## The board

The board is represented as a list of length 9, indexed row by row. Each entry is one of `' ', 'x', 'o'`.
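The flat index corresponds to a grid position in row-major order, the same layout that `reshape((3,3))` produces. A minimal sketch of the mapping, where the helper names `to_row_col` and `to_index` are made up for illustration and are not part of the notebook:

```python
def to_row_col(index):
    """Convert a flat board index (0-8) to a (row, column) pair."""
    return divmod(index, 3)

def to_index(row, col):
    """Convert a (row, column) pair back to the flat board index."""
    return 3 * row + col

print(to_row_col(4))   # (1, 1), the center
print(to_index(2, 0))  # 6, the bottom-left corner
```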

In [5]:
def empty_board():
    return [' '] * 9

board = empty_board()
display(board)

[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']

Some helper functions.

In [6]:
import numpy as np

def show_board(board):
    """display the board"""
    b = np.array(board).reshape((3,3))
    print(b)

board = empty_board()
show_board(board)    

print()
print("Add some x's")
board[0] = 'x'; board[3] = 'x'; board[6] = 'x';  
show_board(board)

[[' ' ' ' ' ']
 [' ' ' ' ' ']
 [' ' ' ' ' ']]

Add some x's
[['x' ' ' ' ']
 ['x' ' ' ' ']
 ['x' ' ' ' ']]


In [7]:
def check_win(board):
    """check the board and return one of x, o, d (draw), or n (for next move)"""
    
    board = np.array(board).reshape((3,3))
    
    diagonals = np.array([[board[i][i] for i in range(len(board))], 
                          [board[i][len(board)-i-1] for i in range(len(board))]])
    
    for a_board in [board, np.transpose(board), diagonals]:
        for row in a_board:
            if len(set(row)) == 1 and row[0] != ' ':
                return row[0]
    
    # check for draw
    if(np.sum(board == ' ') < 1):
        return 'd'
    
    return 'n'

show_board(board)
print('Win? ' + check_win(board))

print()
show_board(empty_board())
print('Win? ' + check_win(empty_board()))

[['x' ' ' ' ']
 ['x' ' ' ' ']
 ['x' ' ' ' ']]
Win? x

[[' ' ' ' ' ']
 [' ' ' ' ' ']
 [' ' ' ' ' ']]
Win? n


In [8]:
def get_actions(board):
    """return possible actions as a vector of indices"""
    return np.where(np.array(board) == ' ')[0].tolist()

    # randomize the action order
    #actions = np.where(np.array(board) == ' ')[0]
    #np.random.shuffle(actions)
    #return actions.tolist()


show_board(board)
get_actions(board)

[['x' ' ' ' ']
 ['x' ' ' ' ']
 ['x' ' ' ' ']]


[1, 2, 4, 5, 7, 8]

In [9]:
def result(state, player, action):
    """Add move to the board."""
    
    state = state.copy()
    state[action] = player
  
    return state

show_board(empty_board())

print()
print("State for placing an x at position 4:")
show_board(result(empty_board(), 'x', 4))

[[' ' ' ' ' ']
 [' ' ' ' ' ']
 [' ' ' ' ' ']]

State for placing an x at position 4:
[[' ' ' ' ' ']
 [' ' 'x' ' ']
 [' ' ' ' ' ']]


In [10]:
def utility(state):
    """check if a state is terminal and return the utility if it is. None means not a terminal state."""
    goal = check_win(state)        
    if goal == 'x': return +1 
    if goal == 'd': return 0  
    if goal == 'o': return -1  # loss is failure
    return None # continue

print(utility(['x'] * 9))
print(utility(['o'] * 9))
print(utility(empty_board()))

1
-1
None


# Pure Monte Carlo Tree Search

See AIMA page 161ff. 

We implement an extremely simplified version.

For the current state: 
1. Simulate $N$ random playouts for each possible action and 
2. pick the action with the highest average utility.

__Important note:__ The playout policy used here is uniformly random, so the search only produces a randomized estimate of each action's value. That is good enough for this toy problem. For real applications you need to extend the code with
1. a good __playout policy__ (e.g., learned by self-play) and 
2. a __selection policy__ (e.g., UCB1).
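For reference, UCB1 scores a child node by its average utility plus an exploration bonus that shrinks as the child is visited more often. The following is a minimal sketch of that score only, not a full tree search. The function name `ucb1`, the default constant `c = sqrt(2)`, and the statistics in `stats` are assumptions made for illustration.

```python
import math

def ucb1(total_utility, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average utility plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are tried first
    exploitation = total_utility / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration

# made-up statistics: action -> (sum of playout utilities, number of playouts)
stats = {0: (3.0, 10), 4: (6.0, 10), 7: (0.0, 0)}
parent_visits = sum(visits for _, visits in stats.values())

best = max(stats, key=lambda a: ucb1(*stats[a], parent_visits))
print(best)  # 7, because it has not been visited yet
```

A selection policy like this spends more playouts on promising actions instead of giving every action the same fixed budget $N$.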

## Simulate playouts

In [11]:
def playout(state, action):
    """Perform a random playout starting with the given action (played by x) on the given board 
    and return the utility of the finished game."""
    state = result(state, 'x', action)
    player = 'o'
    
    while True:
        # reached terminal state?
        u = utility(state)
        if u is not None: return u
        
        # we use a random playout policy
        a = np.random.choice(get_actions(state))
        state = result(state, player, a)
        #print(state)
        
        # switch between players
        if player == 'o': 
            player = 'x'
        else: 
            player = 'o'


board = empty_board()
print(playout(board, 0))
print(playout(board, 0))
print(playout(board, 0))

1
1
1


In [12]:
def playouts(board, action, N = 100):
    """Perform N playouts following the given action for the given board."""
    return [playout(board, action) for i in range(N)]

p = playouts(board, 0)
print(p)

print(f"mean utility: {np.mean(p)}")
print(f"win probability: {(np.mean(p) + 1)/2}")

[-1, 1, 1, 1, 1, 0, 1, -1, 0, -1, 1, 1, 1, 0, 1, -1, -1, 1, -1, 1, 1, -1, -1, -1, 1, 0, -1, 1, -1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, 0, 0, 1, -1, 0, 0, 1, 0, 1, 0, 1, -1, 1, 1, 1, 1, 1, 1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, -1, -1, 1, 1, 1, -1, 1, 1, 1, 1, 1, 1, -1, -1, 1, 1, 1, 0, 1, 1, 1, 1, -1]
mean utility: 0.42
win probability: 0.71


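__Note:__ This shows that the player who goes first has a significant advantage in pure random play. The value printed as "win probability" is `(mean + 1)/2`, which counts each draw as half a win, so it overstates the actual share of wins whenever draws occur.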
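Because a draw has utility 0, the mean utility alone does not separate wins from draws. A minimal sketch that tallies the three outcomes directly follows. The helper name `outcome_fractions` and the short `sample` list are made up for illustration; in the notebook the list returned by `playouts` would be passed instead.

```python
from collections import Counter

def outcome_fractions(utilities):
    """Return the fractions of wins, draws, and losses for x."""
    counts = Counter(utilities)
    n = len(utilities)
    return {"win": counts[1] / n, "draw": counts[0] / n, "loss": counts[-1] / n}

# made-up playout results: 6 wins, 2 draws, 2 losses
sample = [1, 1, 0, -1, 1, 0, 1, -1, 1, 1]
fractions = outcome_fractions(sample)
print(fractions)

# (mean + 1) / 2 equals P(win) + 0.5 * P(draw), not P(win)
half_draw_score = (sum(sample) / len(sample) + 1) / 2
print(half_draw_score)
```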

## Choose the best action

Pure Monte Carlo Search (pmcs)

In [13]:
def pmcs(board, N = 100, debug = False):
    """Pure Monte Carlo Search. Returns the action that has the largest average utility."""
    ps = [[i, np.mean(playouts(board, i, N = N))] for i in get_actions(board)]

    if debug: display(ps)
        
    action = max(ps, key=lambda p: p[1])[0]
    return action

pmcs(board, debug = True)

[[0, 0.3],
 [1, 0.2],
 [2, 0.37],
 [3, 0.16],
 [4, 0.4],
 [5, 0.17],
 [6, 0.41],
 [7, 0.19],
 [8, 0.33]]

6

The center (4) and the corners (0, 2, 6, 8) score noticeably higher than the edges (1, 3, 5, 7).
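With only $N = 100$ playouts per action, each average carries sampling noise, so small differences between actions should not be over-interpreted. One rough way to gauge the noise is the standard error of the mean. The sketch below is an illustration under assumptions: the helper name `mean_with_stderr` is hypothetical, and the sample is randomly generated with made-up outcome probabilities rather than taken from real playouts.

```python
import numpy as np

def mean_with_stderr(utilities):
    """Return the sample mean and its standard error."""
    u = np.asarray(utilities, dtype=float)
    mean = u.mean()
    stderr = u.std(ddof=1) / np.sqrt(len(u))
    return mean, stderr

# made-up sample standing in for the result of 100 playouts
rng = np.random.default_rng(0)
sample = rng.choice([-1, 0, 1], size=100, p=[0.3, 0.1, 0.6])

mean, stderr = mean_with_stderr(sample)
print(f"mean utility: {mean:.2f} +/- {stderr:.2f}")
```

Since utilities lie in $[-1, 1]$, the standard error for 100 playouts is at most about 0.1, and increasing `N` reduces it in proportion to $1/\sqrt{N}$.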

## Some Tests

In [14]:
# x is about to win (play 8)

board = empty_board() 
board[0] = 'x'
board[1] = 'o'
board[3] = 'o'
board[4] = 'x'

print("Board:")
show_board(board)

print()
display(pmcs(board, debug = True))

Board:
[['x' 'o' ' ']
 ['o' 'x' ' ']
 [' ' ' ' ' ']]



[[2, 0.8], [5, 0.67], [6, 0.83], [7, 0.72], [8, 1.0]]

8

In [15]:
# o is about to win: it threatens both 2 and 6, so x cannot block both

board = empty_board() 
board[0] = 'o'
board[1] = 'o'
board[3] = 'o'
board[4] = 'x'
board[8] = 'x'

print("Board:")
show_board(board)

print()
display(pmcs(board, debug = True))

Board:
[['o' 'o' ' ']
 ['o' 'x' ' ']
 [' ' ' ' 'x']]



[[2, 0.06], [5, -0.74], [6, 0.02], [7, -0.8]]

2

In [16]:
# x can draw if it chooses 7.

board = empty_board() 
board[0] = 'x'
board[1] = 'o'
board[2] = 'x'
board[4] = 'o'

print("Board:")
show_board(board)

print()
display(pmcs(board, debug = True))

Board:
[['x' 'o' 'x']
 [' ' 'o' ' ']
 [' ' ' ' ' ']]



[[3, -0.2], [5, -0.11], [6, -0.3], [7, -0.04], [8, -0.43]]

7

In [17]:
# o went first

board = empty_board() 
board[4] = 'o'

print("Board:")
show_board(board)


print()
display(pmcs(board, debug = True))

Board:
[[' ' ' ' ' ']
 [' ' 'o' ' ']
 [' ' ' ' ' ']]



[[0, -0.39],
 [1, -0.49],
 [2, -0.51],
 [3, -0.58],
 [5, -0.58],
 [6, -0.62],
 [7, -0.73],
 [8, -0.34]]

8

In [18]:
# Empty board: Only a draw can be guaranteed

board = empty_board() 

print("Board:")
show_board(board)


print()
display(pmcs(board, debug = True))

Board:
[[' ' ' ' ' ']
 [' ' ' ' ' ']
 [' ' ' ' ' ']]



[[0, 0.22],
 [1, 0.01],
 [2, 0.45],
 [3, 0.25],
 [4, 0.44],
 [5, 0.27],
 [6, 0.42],
 [7, -0.04],
 [8, 0.35]]

2

In [19]:
# A bad situation

board = empty_board() 
board[0] = 'o'
board[2] = 'x'
board[8] = 'o'

print("Board:")
show_board(board)


print()
display(pmcs(board, debug = True))

Board:
[['o' ' ' 'x']
 [' ' ' ' ' ']
 [' ' ' ' 'o']]



[[1, -0.82], [3, -0.47], [4, 0.16], [5, -0.56], [6, -0.2], [7, -0.53]]

4