#### Game Setup

- Player 1 is 'X', encoded as 0. A status of 0 also means Player 1 has won.
- Player 2 is 'O', encoded as 1. A status of 1 also means Player 2 has won.
- An unfilled spot is '.', encoded as 2. A status of 2 also means the game is incomplete.
- A status of 3 means the game is a tie.

In [1]:
import numpy as np
import pandas as pd
import random
from numba import jit
import warnings
warnings.filterwarnings('ignore')

def print_board(board, depth = 0):
    mapping = {0:'x',1:'o',2:'.'}
    x = list(map(lambda x: mapping[x] if x in mapping else x, board))
    print('\t'*depth,x[0],'|',x[1],'|',x[2])
    print('\t'*depth,'---------')
    print('\t'*depth,x[3],'|',x[4],'|',x[5])
    print('\t'*depth,'---------')
    print('\t'*depth,x[6],'|',x[7],'|',x[8])
    
@jit
def permissible_actions(x):
    return np.where(x == 2)[0]    

@jit
def random_action(board):
    return np.random.choice(permissible_actions(board))

@jit
def check_status(board):
    # Define all possible winning combinations
    winning_combinations = [
        [0, 1, 2], [3, 4, 5], [6, 7, 8],  # Rows
        [0, 3, 6], [1, 4, 7], [2, 5, 8],  # Columns
        [0, 4, 8], [2, 4, 6]             # Diagonals
    ]
    
    # Check each winning combination for a win
    for combo in winning_combinations:
        if board[combo[0]] == board[combo[1]] == board[combo[2]] and board[combo[0]] != 2:
            return board[combo[0]]
    
    # Check for a draw
    if 2 not in board:
        return 3
    
    return 2

    
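# Not jitted: print_board is a plain Python function that only prints
def visualize_game(list_x):
    for x in list_x:
        # The last entry of each history record is the board after the move
        print_board(x[-1])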

def update_board(board,action_idx):
    # The player to move (0 for 'X', 1 for 'O') marks the chosen square on a copy of the board
    player_idx, opponent_idx = who_to_move(board)
    board_new = board.copy()
    board_new[action_idx] = player_idx
    return board_new


@jit
def who_to_move(board):
    if np.sum(np.where(board==0,1,0))>np.sum(np.where(board==1,1,0)):
        return 1,0
    else:
        return 0,1

# Not jitted: builds a mixed-type history list and calls the plain-Python update_board
def play_random_game(verbose = 0):
    board = np.array([2,2,2,2,2,2,2,2,2])
    game_history = []
    turn_count = 0
    while True:
        player_idx, opponent_idx = who_to_move(board)
        status = check_status(board)
        if status != 2:
            # Game over: 0 or 1 is the winner, 3 is a tie
            break
        turn_count += 1
        action_idx = random_action(board)
        board_new = update_board(board,action_idx)
        # One record per move; status is always 2 (incomplete) at this point
        game_history.append([board, status, turn_count, player_idx, opponent_idx, action_idx, board_new])
        board = board_new.copy()
    if verbose == 1:
        visualize_game(game_history)
    return game_history, status

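# Not jitted: play_random_game and random.choice work on ordinary Python lists
def generate_random_board():
    # Sample one board uniformly from the positions reached during a single random game
    game_history, status = play_random_game()
    boards = [events[-1] for events in game_history]
    return random.choice(boards)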
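
As a quick sanity check on the status codes, the sketch below plays many uniformly random games and tallies how often each final status (0, 1, or 3) occurs. It is a standalone illustration that uses plain Python lists instead of the NumPy arrays above; the names `status_of`, `random_playout`, and `outcome_counts` are hypothetical helpers introduced here, not part of the notebook.

```python
import random
from collections import Counter

# Same eight winning lines as check_status: rows, columns, diagonals
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def status_of(board):
    """Return 0 or 1 for a win, 3 for a tie, 2 if the game is incomplete."""
    for a, b, c in WIN_LINES:
        if board[a] != 2 and board[a] == board[b] == board[c]:
            return board[a]
    return 3 if 2 not in board else 2

def random_playout(rng):
    """Play one game with uniformly random moves; return the final status."""
    board = [2] * 9
    player = 0  # Player 1 ('X', encoded as 0) moves first
    while status_of(board) == 2:
        empties = [i for i, v in enumerate(board) if v == 2]
        board[rng.choice(empties)] = player
        player = 1 - player
    return status_of(board)

def outcome_counts(n_games=5000, seed=0):
    """Tally final statuses over n_games random games."""
    rng = random.Random(seed)
    return Counter(random_playout(rng) for _ in range(n_games))

counts = outcome_counts()
labels = {0: "X wins", 1: "O wins", 3: "tie"}
print({labels[code]: counts[code] for code in labels})
```

Because 'X' moves first, it should win noticeably more often than 'O' under random play, which is a useful check that the move order and status codes are wired up as described.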

#### Monte Carlo Tree Search 

- Used when minimax is impractical due to the depth or breadth of the game tree.
- Two main ideas: 
    - True value of an action given a state may be evaluated by random simulations from that state. 
    - We can move from a random policy toward the optimal policy by using these simulations to update the policy.
- From any board / position / state, we conduct the following steps: 
    - Selection: We use TREEPOLICY to descend through the tree until we find an "expandable" node. This node is non-terminal and has unvisited children. We select this node for expansion.
    - Expansion: From the selected node, we add one or more children to the tree.
    - Simulation: From these new children, we simulate the game until termination using SIMULATEPOLICY. We compute the final result (win/draw/loss).
    - Backpropagation: Using the final result, we travel back up the tree and update the scores of every node on the path from the new child to the root.
- These 4 steps are repeated until our computational budget is exhausted. 
- Two types of policies: 
    - TREEPOLICY: Used for selection and expansion. e.g. Greedy, based on the relative scores. 
    - SIMULATEPOLICY: Used for simulation from the new nodes added. e.g. Uniformly random. 
- Three key nodes: 
    - Root Node: the board position we are evaluating. 
    - Selected Node: the node from which we will add children to the tree.
    - 
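
The four steps above can be sketched for tic-tac-toe as follows. This is a minimal illustration under stated assumptions, not the notebook's own implementation: it uses UCB1 as the TREEPOLICY (one common choice, swapped in here for the greedy example), uniformly random moves as the SIMULATEPOLICY, rewards of 1 / 0.5 / 0 for win / tie / loss, and plain Python lists for boards. The names `Node`, `ucb_child`, `rollout`, and `mcts` are hypothetical.

```python
import math
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def status_of(board):
    """Return 0 or 1 for a win, 3 for a tie, 2 if the game is incomplete."""
    for a, b, c in WIN_LINES:
        if board[a] != 2 and board[a] == board[b] == board[c]:
            return board[a]
    return 3 if 2 not in board else 2

def mover(board):
    """Player 0 ('X') moves first, so it moves whenever the counts are equal."""
    return 1 if board.count(0) > board.count(1) else 0

class Node:
    def __init__(self, board, parent=None, action=None):
        self.board, self.parent, self.action = board, parent, action
        self.children = []
        # Terminal positions have nothing left to expand
        live = status_of(board) == 2
        self.untried = [i for i, v in enumerate(board) if v == 2] if live else []
        self.visits = 0
        self.score = 0.0  # total reward for the player who moved INTO this node

def ucb_child(node, c=1.4):
    """TREEPOLICY (assumed UCB1): mean reward plus an exploration bonus."""
    return max(node.children,
               key=lambda ch: ch.score / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(board):
    """SIMULATEPOLICY: uniformly random moves until the game ends."""
    board = list(board)
    while status_of(board) == 2:
        player = mover(board)
        empties = [i for i, v in enumerate(board) if v == 2]
        board[random.choice(empties)] = player
    return status_of(board)

def mcts(root_board, n_iter=2000):
    root = Node(list(root_board))
    for _ in range(n_iter):
        node = root
        # Selection: descend while the node is fully expanded and non-terminal
        while not node.untried and node.children:
            node = ucb_child(node)
        # Expansion: add one unvisited child
        if node.untried:
            action = node.untried.pop(random.randrange(len(node.untried)))
            board = list(node.board)
            board[action] = mover(node.board)
            child = Node(board, parent=node, action=action)
            node.children.append(child)
            node = child
        # Simulation: play out from the new node
        result = rollout(node.board)
        # Backpropagation: update every node on the path back to the root
        while node is not None:
            node.visits += 1
            if node.parent is not None:
                just_moved = mover(node.parent.board)
                node.score += 1.0 if result == just_moved else 0.5 if result == 3 else 0.0
            node = node.parent
    # Recommend the most-visited action at the root
    return max(root.children, key=lambda ch: ch.visits).action

# 'X' to move with two in the top row; square 2 completes the row
print(mcts([0, 0, 2,
            1, 1, 2,
            2, 2, 2]))
```

Scores are stored from the viewpoint of the player who just moved into a node, so that the parent's selection step maximizes the reward of the player actually choosing the move. Returning the most-visited child rather than the highest-scoring one is a design choice that tends to be less sensitive to noisy estimates at low visit counts.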