# Assignment 3 - Game-tree

In this assignment, you will complete the implementation of the ConnectFour game and the Monte Carlo Tree Search (MCTS) algorithm.

What you need to do:
<br>1. Follow the instructions and complete the parts with **# TODO**.
<br>2. Run simulations to experiment with different values of the parameter **max_iter** and with different search algorithms.
<br>3. Report the results using tables.
<br>4. Discuss your findings.

# Your Information

# Implementations

**TODO**: Complete the implementation of `available_moves` in ConnectFour class and `select_leaf` and `expand` in mcts. See **mcts_assignment.py** for details.
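In standard ConnectFour a piece drops to the lowest empty cell of the chosen column, so at most one move per column is legal. The sample random-vs-MCTS game below includes the move `(1, 3, 'X')` while `(0, 3)` is still empty, which suggests the current move generation does not enforce this. The sketch below is a standalone illustration under assumptions inferred from the printed boards, not from mcts_assignment.py: the state is a list of rows filled with `'-'`, row 0 is the bottom row, and a move is a `(row, col, player)` tuple. `connect_four_moves` is a hypothetical name; the real `available_moves` method must follow the class's own conventions.

```python
EMPTY = '-'


def connect_four_moves(state, player):
    """Return the legal moves as (row, col, player) tuples.

    Hypothetical helper. Assumes row 0 is the bottom row, so a piece
    dropped into a column lands in the lowest empty row of that column.
    """
    moves = []
    num_rows, num_cols = len(state), len(state[0])
    for col in range(num_cols):
        for row in range(num_rows):
            if state[row][col] == EMPTY:
                moves.append((row, col, player))
                break  # only the lowest empty cell of each column is playable
    return moves
```

A full column contributes no move, so the list shrinks as columns fill up.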

# Helper Functions for Simulations

In [1]:
import numpy as np
from time import time
from numpy.random import RandomState
import os
import sys
import mcts_assignment
from mcts_assignment import * # TODO import the necessary classes and functions only
from importlib import reload  # reload(mcts_assignment) picks up edits without restarting the kernel

In [2]:
# Adapted from https://stackoverflow.com/questions/8391411/how-to-block-calls-to-print

# Context manager that silences print() during simulations by redirecting
# sys.stdout to the null device, and restores the original stream on exit.
class StdoutDisabled:
    def __enter__(self):
        self._original_stdout = sys.stdout
        sys.stdout = open(os.devnull, 'w')
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        sys.stdout.close()
        sys.stdout = self._original_stdout
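The standard library offers an equivalent: `contextlib.redirect_stdout` swaps `sys.stdout` for the duration of a `with` block and restores it on exit, including when an exception such as `KeyboardInterrupt` is raised. A minimal sketch; `run_silently` is a hypothetical helper name, not part of the assignment code.

```python
import contextlib
import os


def run_silently(func, *args, **kwargs):
    """Call func(*args, **kwargs) with all print() output discarded."""
    # The file is opened first and closed last, so stdout is restored
    # before the null device is closed.
    with open(os.devnull, 'w') as devnull, contextlib.redirect_stdout(devnull):
        return func(*args, **kwargs)
```

Writing to `os.devnull` rather than an in-memory buffer keeps memory flat during long simulations, where every move prints a full board.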

Monte Carlo Tree Search repeats four phases. In phase (1), selection, existing information is used to repeatedly choose successive child nodes down to a leaf of the search tree.
Next, in phase (2), expansion, the search tree is expanded by adding a node.
Then, in phase (3), simulation, a playout is run to the end of the game to determine the winner.
Finally, in phase (4), backpropagation, every node on the selected path is updated with the information gained from the simulated game. These four phases are repeated until enough information has been gathered to choose a good move.
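The four phases can be sketched compactly with UCT (the UCB1 rule applied to trees) as the selection policy. To stay self-contained, the sketch plays a toy take-away game instead of ConnectFour; the game, `Node`, `rollout` and `mcts_best_move` are assumptions for illustration, and `select_leaf` and `expand` in mcts_assignment.py may be organized differently.

```python
import math
import random

# Toy game used only for illustration (not ConnectFour): players 0 and 1
# alternately remove 1-3 stones from a pile; whoever takes the last stone wins.


def legal_moves(stones):
    """Moves available with `stones` left in the pile."""
    return [k for k in (1, 2, 3) if k <= stones]


class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones      # stones left in this position
        self.player = player      # player to move in this position
        self.parent = parent
        self.move = move          # move that led here from the parent
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0             # wins for the player who moved INTO this node

    def uct_child(self, c=math.sqrt(2)):
        """UCB1 applied to trees: trade off win rate against exploration."""
        return max(
            self.children,
            key=lambda ch: ch.wins / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def rollout(stones, player, rng):
    """Play uniformly random moves to the end; return the winner."""
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return player         # this player took the last stone
        player = 1 - player


def mcts_best_move(stones, player=0, max_iter=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones, player)
    for _ in range(max_iter):
        node = root
        # (1) Selection: follow UCT while the node is fully expanded and non-terminal.
        while not node.untried and node.children:
            node = node.uct_child()
        # (2) Expansion: add one untried child (terminal nodes have none).
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, 1 - node.player, parent=node, move=move)
            node.children.append(child)
            node = child
        # (3) Simulation: random playout, or read the result off a terminal node.
        if node.stones == 0:
            winner = 1 - node.player  # the player who moved into it took the last stone
        else:
            winner = rollout(node.stones, node.player, rng)
        # (4) Backpropagation: update every node on the path back to the root.
        while node is not None:
            node.visits += 1
            if winner != node.player:  # the player who moved INTO node won
                node.wins += 1
            node = node.parent
    # Recommend the most-visited move, a common and robust final choice.
    return max(root.children, key=lambda ch: ch.visits).move
```

In this game a pile that is a multiple of 4 is lost for the player to move, so from 5 stones MCTS should settle on taking 1. Each node stores wins from the perspective of the player who moved into it, which lets `uct_child` simply maximize from the parent's point of view.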

In [3]:
def play_one_full_game(initial_node, x_player, o_player):
    # Similar to the game_play function, but it also accumulates
    # each player's total thinking time over the game.
    current_gn = initial_node
    x_running_time, o_running_time = 0, 0

    while not current_gn.is_terminal():
        print(current_gn)
        p = current_gn.next_player()
        print("It's {}'s turn.".format(p))
        if p == 'X':
            x_start_time = time()
            chosen_move = x_player(current_gn)
            x_move_time = time() - x_start_time
            x_running_time += x_move_time

        else:
            o_start_time = time()
            chosen_move = o_player(current_gn)
            o_move_time = time() - o_start_time
            o_running_time += o_move_time
        print("Chosen move {}.".format(str(chosen_move)))
        print()
        current_gn = current_gn.next_game_node(chosen_move)


    print("\nGame ended.")
    print(current_gn)

    winner = current_gn.winner()

    if winner is not None:
        print("Winner is {}.".format(winner))
    else:
        print("Draw.")
    return winner, current_gn, x_running_time, o_running_time
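The per-player timing could also be factored into a reusable wrapper. A sketch, assuming a player is any callable that takes a game node and returns a move; `timed` is a hypothetical helper. It uses `time.perf_counter()`, which the Python documentation recommends for measuring short durations, instead of `time.time()`.

```python
import time


def timed(player):
    """Wrap a player so the time spent in each call is added to wrapper.total."""
    def wrapper(node):
        start = time.perf_counter()
        try:
            return player(node)
        finally:
            # Record the elapsed time even if the player raises.
            wrapper.total += time.perf_counter() - start
    wrapper.total = 0.0
    return wrapper
```

For example, `x_player = timed(x_player)` before a game, then read `x_player.total` afterwards; create a fresh wrapper per game so totals do not carry over.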

In [5]:
init_seed = int("20452471") # TODO Change to your own A#.
num_trials = 10
m = 6
n = 7
initial_state = []
for _ in range(m):
    initial_state.append(n*['-'])
initial_gn = ConnectFour(initial_state)


x_player = lambda b: randplayer(b, rs=RandomState(seed=init_seed))
o_player = lambda b: mcts_player(b, max_iter=1)#, rs=RandomState(seed=init_seed))
play_one_full_game(initial_gn, x_player, o_player)

- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -

It's X's turn.
Chosen move (0, 4, 'X').

- - - - X - -
- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -

It's O's turn.
Chosen move (0, 0, 'O').

O - - - X - -
- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -

It's X's turn.
Chosen move (0, 6, 'X').

O - - - X - X
- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -

It's O's turn.
Chosen move (0, 1, 'O').

O O - - X - X
- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -

It's X's turn.
Chosen move (1, 1, 'X').

O O - - X - X
- X - - - - -
- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -

It's O's turn.
Chosen move (0, 2, 'O').

O O O - X - X
- X - - - - -
- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -

It's X's turn.
Chosen move (1, 3, 'X').

O O O - X - X
- X - X - - -
- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -

It's O's turn.
Chosen move (0, 3,

('O',
 O O O O X - X
 - X - X - - -
 - - - - - - -
 - - - - - - -
 - - - - - - -
 - - - - - - -,
 0.002309083938598633,
 0.5476388931274414)


## ConnectFour 

In [6]:
m = 6
n = 7

initial_state = []
for _ in range(m):
    initial_state.append(n*['-'])
initial_gn = ConnectFour(initial_state)

### Random vs MCTS

This section simulates ConnectFour games between a random player and a Monte Carlo Tree Search (MCTS) player. You need to experiment with different numbers of iterations for the MCTS algorithm.
<br>
Do not change any parameters other than **max_iter**.
<br>
Try `max_iter=100`, `max_iter=500`, and one other value of your choice (for example, 250 or 750).
<br>
Some of these simulations will take time, possibly 30-40 minutes. Do not wait until the last minute to run them.
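Instead of editing `max_iter` by hand and re-running the cell, the trials can be looped over every value in one pass. A sketch under stated assumptions: `sweep` and `play_trial` are hypothetical names, and `play_trial(max_iter, trial)` is assumed to return `(winner, o_running_time)`, where `winner` is `'X'`, `'O'`, or `None` for a draw, as `play_one_full_game` reports it.

```python
from collections import Counter


def sweep(play_trial, max_iters, num_trials):
    """Run num_trials games for each max_iter value and tally the outcomes.

    Returns {max_iter: {metric: value}} with rates as fractions of games.
    """
    results = {}
    for max_iter in max_iters:
        outcomes = Counter()
        total_o_time = 0.0
        for trial in range(num_trials):
            winner, o_time = play_trial(max_iter, trial)
            outcomes[winner if winner is not None else 'draw'] += 1
            total_o_time += o_time
        results[max_iter] = {
            'O win rate': outcomes['O'] / num_trials,
            'O loss rate': outcomes['X'] / num_trials,
            'draw rate': outcomes['draw'] / num_trials,
            'O avg time (s)': total_o_time / num_trials,
        }
    return results
```

In the notebook, `play_trial` would build `x_player` and `o_player` from `max_iter` and the per-trial seeds, call `play_one_full_game(initial_gn, x_player, o_player)` inside `StdoutDisabled()`, and return `winner` and `o_running_time`. Keeping every result in one dictionary also means an interrupted run does not overwrite the numbers already collected for other values.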

In [11]:
res_list = [0., 0., 0.] # times of X as the winner, times of O as the winner, times of draw
x_time_list, o_time_list = [], []
tree_depths = []

with StdoutDisabled():

    for i in range(num_trials):
        cur_x_seed = init_seed + i
        cur_o_seed = init_seed - i

        x_player = lambda b: randplayer(b, rs=RandomState(seed=cur_x_seed))
        o_player = lambda b: mcts_player(b, max_iter=30, rs=RandomState(seed=cur_o_seed))

        winner, final_gn, x_running_time, o_running_time = play_one_full_game(initial_gn, x_player, o_player)
        x_c, o_c = final_gn._count()
        x_time_list.append(x_running_time)
        o_time_list.append(o_running_time)
        tree_depths.append(x_c + o_c)

        if winner == 'X':
            res_list[0] += 1
        elif winner == 'O':
            res_list[1] += 1
        else:
            res_list[2] += 1

KeyboardInterrupt: 

In [8]:
# max_iter = 10

# print("Average running time of X: {:>10.4f}s".format(np.average(x_time_list)))
# print("Average running time of O: {:>10.4f}s".format(np.average(o_time_list)))

# print("Average tree depth: {:.3f}".format(np.average(tree_depths)))

# print("Percentage of wins by X: {:>7.3f}".format(res_list[0]/np.sum(res_list)))
# print("Percentage of wins by O: {:>7.3f}".format(res_list[1]/np.sum(res_list)))
# print("Percentage of draw:      {:>7.3f}".format(res_list[2]/np.sum(res_list)))

Average running time of X:     0.0027s
Average running time of O:     4.2844s
Average tree depth: 10.000
Percentage of wins by X:   1.000
Percentage of wins by O:   0.000
Percentage of draw:        0.000


In [10]:
# max_iter = 20
# print("Average running time of X: {:>10.4f}s".format(np.average(x_time_list)))
# print("Average running time of O: {:>10.4f}s".format(np.average(o_time_list)))

# print("Average tree depth: {:.3f}".format(np.average(tree_depths)))

# print("Percentage of wins by X: {:>7.3f}".format(res_list[0]/np.sum(res_list)))
# print("Percentage of wins by O: {:>7.3f}".format(res_list[1]/np.sum(res_list)))
# print("Percentage of draw:      {:>7.3f}".format(res_list[2]/np.sum(res_list)))

Average running time of X:     0.0034s
Average running time of O:     7.5089s
Average tree depth: 10.000
Percentage of wins by X:   1.000
Percentage of wins by O:   0.000
Percentage of draw:        0.000


In [14]:
# max_iter = 30

# print("Average running time of X: {:>10.4f}s".format(np.average(x_time_list)))
# print("Average running time of O: {:>10.4f}s".format(np.average(o_time_list)))

# print("Average tree depth: {:.3f}".format(np.average(tree_depths)))

# print("Percentage of wins by X: {:>7.3f}".format(res_list[0]/np.sum(res_list)))
# print("Percentage of wins by O: {:>7.3f}".format(res_list[1]/np.sum(res_list)))
# print("Percentage of draw:      {:>7.3f}".format(res_list[2]/np.sum(res_list)))

Average running time of X:        nans
Average running time of O:        nans
Average tree depth: nan
Percentage of wins by X:     nan
Percentage of wins by O:     nan
Percentage of draw:          nan




Results for the random player (X) vs MCTS (O), `num_trials = 10` games per setting. Win and draw values are fractions of games (1.000 = 100%).

| Metric | `max_iter = 10` | `max_iter = 20` | `max_iter = 30` |
|---|---|---|---|
| Average running time of X (random), s | 0.0027 | 0.0034 | nan |
| Average running time of O (MCTS), s | 4.2844 | 7.5089 | nan |
| Average tree depth | 10.000 | 10.000 | nan |
| Fraction of wins by X | 1.000 | 1.000 | nan |
| Fraction of wins by O | 0.000 | 0.000 | nan |
| Fraction of draws | 0.000 | 0.000 | nan |

The `max_iter = 30` run was interrupted (`KeyboardInterrupt`) before any game finished, so all of its metrics are `nan`.

**TODO**: Present a table of your results reporting the run time, win percentage, loss percentage, and draw percentage of MCTS as a function of the `max_iter` parameter. For example, the rows could be MCTS's run time, win percentage, loss percentage, and draw percentage, and the columns the different values of `max_iter`. Consider using pandas.
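One lightweight way to produce such a table, assuming the results are collected as a nested dictionary `{max_iter: {metric: value}}` (an assumed layout). With pandas, `pd.DataFrame(results)` gives the same orientation, because outer dictionary keys become columns and inner keys become the row index; note that `DataFrame.to_markdown()` additionally needs the optional `tabulate` package. The pure-Python sketch below avoids that dependency; `results_table` is a hypothetical name.

```python
def results_table(results, float_fmt="{:.3f}"):
    """Render {max_iter: {metric: value}} as a Markdown table.

    Rows are metrics and columns are max_iter values. Metric order
    follows the dictionary of the first column.
    """
    columns = list(results)
    metrics = list(results[columns[0]])
    lines = [
        "| metric | " + " | ".join(f"max_iter={c}" for c in columns) + " |",
        "|---" * (len(columns) + 1) + "|",
    ]
    for metric in metrics:
        cells = [float_fmt.format(results[c][metric]) for c in columns]
        lines.append(f"| {metric} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

In a notebook, `IPython.display.Markdown(results_table(results))` renders the string as a formatted table.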

### Human vs MCTS

In this section, you will play the game against a Monte Carlo Tree Search player.
Select a reasonable value of **max_iter**, and then run the code.
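The session below ends with a `ValueError` because typing `exit` at the move prompt is passed straight to `int()`. A more forgiving parser could treat a quit command explicitly and report bad input with a readable message. A sketch; `parse_move` is a hypothetical helper, and the prompt handling of `human_player` in mcts_assignment.py may differ.

```python
def parse_move(text, num_rows, num_cols):
    """Parse a 'row,col' string typed by a human player.

    Returns (row, col), or None if the user typed 'exit' or 'quit'.
    Raises ValueError with a readable message for any other bad input.
    """
    text = text.strip().lower()
    if text in ("exit", "quit"):
        return None
    parts = text.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'row,col', got {text!r}")
    row, col = (int(p) for p in parts)  # int() ignores surrounding spaces
    if not (0 <= row < num_rows and 0 <= col < num_cols):
        raise ValueError(f"({row}, {col}) is off the {num_rows}x{num_cols} board")
    return row, col
```

A caller would re-prompt in a loop on `ValueError`, and should also check the parsed cell against the node's available moves; the Tic-Tac-Toe session further down ends with an `AssertionError` after an already occupied square was chosen.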

In [17]:
# Play against MCTS as X (enter each move as row,col)

x_player = lambda b: human_player(b, 'X')
o_player = lambda b: mcts_player(b, max_iter=10, rs=RandomState(seed=init_seed))

play_one_full_game(initial_gn, x_player, o_player)

- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -

It's X's turn.
Your move, x, y separated by comma:1,3
Chosen move (1, 3, 'X').

- - - - - - -
- - - X - - -
- - - - - - -
- - - - - - -
- - - - - - -
- - - - - - -

It's O's turn.
Chosen move (4, 5, 'O').

- - - - - - -
- - - X - - -
- - - - - - -
- - - - - - -
- - - - - O -
- - - - - - -

It's X's turn.
Your move, x, y separated by comma:5,2
Chosen move (5, 2, 'X').

- - - - - - -
- - - X - - -
- - - - - - -
- - - - - - -
- - - - - O -
- - X - - - -

It's O's turn.
Chosen move (4, 3, 'O').

- - - - - - -
- - - X - - -
- - - - - - -
- - - - - - -
- - - O - O -
- - X - - - -

It's X's turn.
Your move, x, y separated by comma:0,0
Chosen move (0, 0, 'X').

X - - - - - -
- - - X - - -
- - - - - - -
- - - - - - -
- - - O - O -
- - X - - - -

It's O's turn.
Chosen move (4, 2, 'O').

X - - - - - -
- - - X - - -
- - - - - - -
- - - - - - -
- - O O - O -
- - X - - - -

It's X's turn.
Your move, x, y separated b

ValueError: invalid literal for int() with base 10: 'exit'

## Tic-Tac-Toe

In [18]:
n = 3

initial_state = []
for _ in range(n):
    initial_state.append(n*['-'])
    
initial_gn = MNKNode(initial_state, k=3)
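Both games end when one player has `k` marks in a row (`k=3` here, 4 in ConnectFour). A sketch of one generic check that scans every cell in four directions; `k_in_a_row_winner` is hypothetical, and `MNKNode.winner()` in mcts_assignment.py may be implemented differently (for example, by checking only the lines through the last move).

```python
def k_in_a_row_winner(state, k, empty='-'):
    """Return the mark that has k consecutive cells in a row, column,
    or diagonal of an m x n board, or None if there is no such line."""
    m, n = len(state), len(state[0])
    # Right, down, down-right diagonal, down-left diagonal.
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for r in range(m):
        for c in range(n):
            mark = state[r][c]
            if mark == empty:
                continue
            for dr, dc in directions:
                end_r, end_c = r + (k - 1) * dr, c + (k - 1) * dc
                if not (0 <= end_r < m and 0 <= end_c < n):
                    continue  # a line of length k would leave the board
                if all(state[r + i * dr][c + i * dc] == mark for i in range(k)):
                    return mark
    return None
```

Scanning only four directions is enough because every line is found from its first cell in scan order.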

### Alpha-beta vs MCTS

This section simulates Tic-Tac-Toe games between an alpha-beta player and a Monte Carlo Tree Search (MCTS) player. You need to experiment with different numbers of iterations for the MCTS algorithm.
<br>
Do not change any parameters other than **max_iter**.
<br>
Try `max_iter=100`, `max_iter=500`, and `max_iter=2500`.
<br>
Some of these simulations will take time, possibly 30-40 minutes. Do not wait until the last minute to run them.
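Unlike MCTS, the alpha-beta opponent performs an exhaustive minimax search, which is feasible because Tic-Tac-Toe is tiny. A minimal sketch on a flat 9-cell board (an assumed representation, not `MNKNode`); it is not the `alpha_beta_search` function from mcts_assignment.py.

```python
import math

# Board: flat list of 9 cells, index = 3 * row + col, '-' = empty (assumed layout).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def line_winner(board):
    for a, b, c in LINES:
        if board[a] != '-' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def alpha_beta(board, player, alpha=-math.inf, beta=math.inf):
    """Game value under perfect play: +1 X wins, -1 O wins, 0 draw.

    X maximizes and O minimizes. alpha and beta are the values already
    guaranteed to X and O; once alpha >= beta the remaining moves cannot
    change the result, so they are pruned.
    """
    w = line_winner(board)
    if w is not None:
        return 1 if w == 'X' else -1
    if '-' not in board:
        return 0
    best = -math.inf if player == 'X' else math.inf
    for i in range(9):
        if board[i] != '-':
            continue
        board[i] = player  # try the move...
        value = alpha_beta(board, 'O' if player == 'X' else 'X', alpha, beta)
        board[i] = '-'     # ...and undo it
        if player == 'X':
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break
    return best
```

The empty board evaluates to 0: Tic-Tac-Toe is a draw under perfect play. Assuming the notebook's alpha-beta player searches to the end of the game, the best result MCTS can achieve against it is therefore a draw.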

In [21]:
res_list = [0., 0., 0.] # times of X as the winner, times of O as the winner, times of draw
x_time_list, o_time_list = [], []
tree_depths = []

with StdoutDisabled():

    for i in range(num_trials):
        cur_x_seed = init_seed + i
        cur_o_seed = init_seed - i
        
        x_player = lambda b: maxplayer(b, algo=alpha_beta_search)
        # TODO: experiment with different values of max_iter
        o_player = lambda b: mcts_player(b, max_iter=5, rs=RandomState(seed=cur_o_seed))
        
        winner, final_gn, x_running_time, o_running_time = play_one_full_game(initial_gn, x_player, o_player)
        x_c, o_c = final_gn._count()
        x_time_list.append(x_running_time)
        o_time_list.append(o_running_time)
        tree_depths.append(x_c + o_c)
        
        if winner == 'X':
            res_list[0] += 1
        elif winner == 'O':
            res_list[1] += 1
        else:
            res_list[2] += 1

KeyboardInterrupt: 

In [22]:
print("Average running time of X: {:>10.4f}s".format(np.average(x_time_list)))
print("Average running time of O: {:>10.4f}s".format(np.average(o_time_list)))

print("Average tree depth: {:.3f}".format(np.average(tree_depths)))

print("Percentage of wins by X: {:>7.3f}".format(res_list[0]/np.sum(res_list)))
print("Percentage of wins by O: {:>7.3f}".format(res_list[1]/np.sum(res_list)))
print("Percentage of draw:      {:>7.3f}".format(res_list[2]/np.sum(res_list)))

Average running time of X:        nans
Average running time of O:        nans
Average tree depth: nan
Percentage of wins by X:     nan
Percentage of wins by O:     nan
Percentage of draw:          nan




In [None]:
# TODO: Present a table of your results. Consider using pandas.

### Human vs MCTS

In this section, you will play the game against a Monte Carlo Tree Search player.
Select a reasonable value of **max_iter**, and then run the code.

In [23]:
# Play against MCTS as X (enter each move as row,col)

x_player = lambda b: human_player(b, 'X')
o_player = lambda b: mcts_player(b, max_iter=100, rs=RandomState(seed=init_seed))

play_one_full_game(initial_gn, x_player, o_player)

- - -
- - -
- - -

It's X's turn.
Your move, x, y separated by comma:0,0
Chosen move (0, 0, 'X').

X - -
- - -
- - -

It's O's turn.
Chosen move (1, 2, 'O').

X - -
- - O
- - -

It's X's turn.
Your move, x, y separated by comma:1,1
Chosen move (1, 1, 'X').

X - -
- X O
- - -

It's O's turn.
Chosen move (2, 0, 'O').

X - -
- X O
O - -

It's X's turn.
Your move, x, y separated by comma:0,2
Chosen move (0, 2, 'X').

X - X
- X O
O - -

It's O's turn.
Chosen move (0, 1, 'O').

X O X
- X O
O - -

It's X's turn.
Your move, x, y separated by comma:0,1
Chosen move (0, 1, 'X').



AssertionError: 

# Report

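Increasing `max_iter` resulted in slower performance, contrary to my intuition: the average running time of the MCTS player per game rose from 4.2844 s at `max_iter = 10` to 7.5089 s at `max_iter = 20`, while the random player won every game in both settings.
I ran out of time to complete the remaining experiments, and I apologize.

**TODO** Discuss your findings.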