# Adversarial Search: Playing Connect 4


Nicholas Larsen

## Instructions

Total Points: Undegraduates 10, graduate students 11

Complete this notebook and submit it. The notebook needs to be a complete project report with your implementation, documentation including a short discussion of how your implementation works and your design choices, and experimental results (e.g., tables and charts with simulation results) with a short discussion of what they mean. Use the provided notebook cells and insert additional code and markdown cells as needed.

## Introduction

You will implement different versions of agents that play Connect 4:

> "Connect 4 is a two-player connection board game, in which the players choose a color and then take turns dropping colored discs into a seven-column, six-row vertically suspended grid. The pieces fall straight down, occupying the lowest available space within the column. The objective of the game is to be the first to form a horizontal, vertical, or diagonal line of four of one's own discs." (see [Connect Four on Wikipedia](https://en.wikipedia.org/wiki/Connect_Four))

In [1]:
import sys  
sys.path.insert(0, '.\\')

## Task 1: Defining the Search Problem [1 point]

Define the components of the search problem:

* Initial state - The initial state is 
* Actions - A column number for where the player drops their playing piece. It will fall to the lowest open place for that column.
* Transition model - Place a piece in the column that was selected by the player
* Goal state - Create 4 pieces in a row



* **Initial state** - The initial state is 
* **Actions** - A column number for where the player drops their playing piece. It will fall to the lowest open place for that column.
* **Transition model** - Place a piece in the column that was selected by the player
* **Goal state** - Create 4 pieces in a row

How big is the search space?

TODO: calculate this state space

__Note:__ The search space for a $6 \times 7$ board is large. You can experiment with smaller boards (the smallest is $4 times \4$) and/or changing the winning rule to connect 3 instead of 4.

## Task 2: Game Environment and Random Agent [2 point]

Use a numpy character array as the board.

In [4]:
import numpy as np

def empty_board(shape=(6, 7)):
    return np.full(shape=shape, fill_value=' ')

def fill_board(board):
    s = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    i = 0
    for row in range(len(board)):
        for col in range(len(board[0])):
            board[row][col] = s[i%len(s)]
            i+=1

print(empty_board())

[[' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']]


Instead of colors for the players use 'x' and 'o' to represent the players. Make sure that your agent functions all have the from: `agent_type(board, player = 'x')`, where board is the current board position and player is the player whose next move it is and who the agent should play.

Implement the board and helper functions for:

* The transition model (result).
* The utility function.
* Check for terminal states.
* A check for available actions.
* A function to visualize the board.

Make sure that all these functions work with boards of different sizes.

Implement an agent that plays randomly and let two random agents play against each other 1000 times. How often does each player win? Is the result expected? 

In [95]:
type(board[0][3])

numpy.str_

In [97]:
board = empty_board()
if type(board[0][3]) != np.str_:
    print('col',board[0][3], 'b')
    print('board',board)

In [6]:
import random

def flip_player(player):
    if player == 'x':
        return 'o'
    else:
        return 'x'
    
def actions(board):
    options = []
    for col in range(0, len(board[0])):
        if board[0][col] == ' ':
            options.append(col)
    return options

def random_order_actions(board):
    options = actions(board)
    random.shuffle(options)
    return options

def agent_random(board, player = 'x'):
    options = actions(board)
    if len(options) == 0:
        return None
    return random.choice(options)
            
# Environment methods
def place_piece(board, col, player='x'):
    """Place the player piece in the col on the board"""
    if board[0][col] != ' ':
        return False
    
    next_open_row = len(board) - 1 # if the col is empty use the bottom row
    for row in range(0, len(board)):
        if board[row][col] != ' ':
            next_open_row = row - 1
            break
    
    board[next_open_row][col] = player


def result(board, player, action):
    board_copy = board.copy()
    place_piece(board_copy, action, player)
    return board_copy


def check_series_for_winner(series):
    """Check if this line (could be anywhere)
    has 4 in a row"""
    count = 0
    prev = ' '
    for current_space in series:
        if current_space != ' ' and current_space == prev:
            count += 1
        elif current_space != ' ' and current_space != prev:
            count = 1
            prev = current_space
        elif current_space == ' ':
            prev = ' '
            count = 0
        
        if count == 4:
            return prev

def get_all_series(board):
    """Get all of the rows/cols/diags that could be winning streaks"""
    num_cols = len(board[0])
    num_rows = len(board)
    
    runs = []
    for row in range(0, num_rows):
        runs.append(board[row])
    for col in range(0, num_cols):
        runs.append(board[:,col])
        
    for offset in range(1-num_rows, num_cols):
        runs.append(np.diagonal(board, offset=offset))
        runs.append(np.diagonal(np.fliplr(board), offset=offset))
    # get all diags
    max_row_cols = max(num_rows, num_cols)

    return runs

def terminal(board):
    """Determine if a player has won"""
    num_cols = len(board[0])
    num_rows = len(board)
    if len(actions(board)) == 0:
        return 'c'
    
    series = get_all_series(board)
    for s in series:
        winner = check_series_for_winner(s)
        if winner is not None:
            return winner
    return ' '

def utility(board, player = 'x'):
    """check is a state a terminal state, return the utility if it is.
    None means not terminal"""
    
    winner = terminal(board)
    if winner == player:
        return +1
    if winner == flip_player(player):
        return -1
    if winner == 'c':
        return 0
    return None

def play_list_plays(moves, board, starting_player='x'):
    for move in moves:
        place_piece(board=board, player=starting_player, col=move)
        starting_player = flip_player(starting_player)

In [4]:
def pretty_board(board):
    for i in range(0,board.shape[1]+board.shape[1]+1):
        print('-',end='')
    print()
    for row in board:
        print('|', end='')
        for piece in row:
            print(piece, end='')
            print('|',end='')
        print() 
    for i in range(0,board.shape[1]+board.shape[1]+1):
        print('-',end='')
    print()

In [5]:

def play_game(player_x, player_o, board_shape = (6, 7), verbose=False):
    players_turn = 'x'
    
    board = empty_board(board_shape)
    while True:
        col = -1
        if players_turn == 'x':
            col = player_x(board, players_turn)
        else:
            col = player_o(board, players_turn)
        board = result(board=board, player=players_turn, action=col)
        if verbose:
            print('Player: ', players_turn, 'placing in col', col)
        winner = terminal(board)
        if winner == 'c':
            return ('c', board)
        elif winner != ' ':
            return (players_turn, board)
        
        players_turn = flip_player(players_turn)


### Test Utility

In [6]:
board = empty_board()
pretty_board(board)
print(utility(board))
play_list_plays([3,2,3,1,3,0,3], board)
pretty_board(board)
print('x:', utility(board, player='x'), 'o:', utility(board, player='o'))
board = empty_board()
play_list_plays([0,1,1,2,3,2,2,5,3,3,3], board)
pretty_board(board)
print('x:', utility(board, player='x'), 'o:', utility(board, player='o'))
board = empty_board()
play_list_plays([0,1,1,2,3,2,2,5,3,3,3], board, starting_player='o')
pretty_board(board)
print('x:', utility(board, player='x'), 'o:', utility(board, player='o'))
board = empty_board(shape = (5,6))
play_list_plays([2,2,1,1,0,1,1,2,2,0,0], board)
print('x:', utility(board, player='x'), 'o:', utility(board, player='x'))

---------------
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
---------------
None
---------------
| | | | | | | |
| | | | | | | |
| | | |x| | | |
| | | |x| | | |
| | | |x| | | |
|o|o|o|x| | | |
---------------
x: 1 o: -1
---------------
| | | | | | | |
| | | | | | | |
| | | |x| | | |
| | |x|o| | | |
| |x|o|x| | | |
|x|o|o|x| |o| |
---------------
x: 1 o: -1
---------------
| | | | | | | |
| | | | | | | |
| | | |o| | | |
| | |o|x| | | |
| |o|x|o| | | |
|o|x|x|o| |x| |
---------------
x: -1 o: 1
x: None o: None


### Random V Random

In [7]:
x_ct = 0
o_ct = 0
c_ct = 0
games = []
for i in range(1000):
    game = play_game(agent_random, agent_random, verbose=False)
    winner = game[0]
    if winner == 'x':
        x_ct += 1
    elif winner == 'o':
        o_ct += 1
    elif winner == 'c':
        c_ct += 1
    games.append(game)
print('X won:', x_ct, '|O won', o_ct, '|Cat games', c_ct)

X won: 555 |O won 440 |Cat games 5


## Task 3: Minimax Search with Alpha-Beta Pruning [4 points]

### Implement the search starting from a given board and specifying the player.

In [8]:
import math
import random
DEBUG = 1 # 1 ... count nodes, 2 ... debug each node
COUNT = 0
def alpha_beta_search(board, player='x', actions=random_order_actions):
    global DEBUG, COUNT
    COUNT = 0
    
    value, move = max_value_ab(board, player, -math.inf, +math.inf, actions=actions)
    
    if DEBUG >= 1: print(f"Number of nodes searched: {COUNT}") 
        
    return value, move

def max_value_ab(board, player, alpha, beta, actions=actions):
    """Player's best move"""
    v = utility(board, player)
    global DEBUG, COUNT
    COUNT += 1
    if DEBUG >= 2:
        print("max: \n" + str(board) + str([alpha, beta, v])) 
    if v is not None: return v, None
    
    v, move = -math.inf, None
    
    moves = actions(board)
    for a in moves:
        v2, a2 = min_value_ab(result(board, player, a), player, alpha, beta, actions=actions)
        if v2 > v:
            v, move = v2, a
            alpha = max(alpha, v)
        if v >= beta:
            return v, move
    return v, move

def min_value_ab(board, player, alpha, beta, actions=actions):
    """opponent's best response"""
    global DEBUG, COUNT
    COUNT += 1
    
    #return utility if state is terminal state
    v = utility(board, player)
    if DEBUG >= 2: print("min: \n" + str(board) + str([alpha, beta, v]))
    if v is not None: return v, None
    
    v, move = +math.inf, None
    
    moves = actions(board)
    for a in moves:
        v2, a2 = max_value_ab(result(board, flip_player(player), a),player,alpha,beta, actions=actions)
        if v2 < v:
            v, move = v2, a
            beta = min(beta, v)
        if v <= alpha:
            return v, move
        
    return v, move

Experiment with some manually created boards (at least 5) to check if the agent spots wining opportunities.

### Find winning move

In [9]:
%%time
board = empty_board(shape = (4,5))
moves = [2,2,1,1,0,1,1,2]
play_list_plays(moves, board)
pretty_board(board)
DEBUG = 0
display(alpha_beta_search(board, player='x'))

-----------
| |x| | | |
| |o|o| | |
| |o|o| | |
|x|x|x| | |
-----------


(1, 3)

Wall time: 117 ms


### Play through this game (1)

In [10]:
%%time
board = empty_board(shape = (5,4))
moves = [0,1,1,2,3,3,2,3]
play_list_plays(moves, board)
pretty_board(board)
display(alpha_beta_search(board, player='x'))

---------
| | | | |
| | | | |
| | | |o|
| |x|x|o|
|x|o|o|x|
---------


(1, 1)

Wall time: 192 ms


In [11]:
%%time
board = empty_board(shape = (5,4))
moves = [0,1,1,2,3,3,2,3,1]
play_list_plays(moves, board)
pretty_board(board)
display(alpha_beta_search(board, player='o'))

---------
| | | | |
| | | | |
| |x| |o|
| |x|x|o|
|x|o|o|x|
---------


(-1, 3)

Wall time: 122 ms


In [12]:
%%time
board = empty_board(shape = (5,4))
moves = [0,1,1,2,3,3,2,3,1,3]
play_list_plays(moves, board)
pretty_board(board)
display(alpha_beta_search(board, player='x'))

---------
| | | | |
| | | |o|
| |x| |o|
| |x|x|o|
|x|o|o|x|
---------


(1, 3)

Wall time: 30 ms


In [13]:
%%time
board = empty_board(shape = (5,4))
moves = [0,1,1,2,3,3,2,3,1,3,3]
play_list_plays(moves, board)
pretty_board(board)
display(alpha_beta_search(board, player='o'))

---------
| | | |x|
| | | |o|
| |x| |o|
| |x|x|o|
|x|o|o|x|
---------


(-1, 1)

Wall time: 23 ms


In [15]:
%%time
board = empty_board(shape = (5,4))
moves = [0,1,1,2,3,3,2,3,1,3,3,1]
play_list_plays(moves, board)
pretty_board(board)
display(alpha_beta_search(board, player='x'))

---------
| | | |x|
| |o| |o|
| |x| |o|
| |x|x|o|
|x|o|o|x|
---------


(1, 1)

Wall time: 9 ms


In [17]:
%%time
board = empty_board(shape = (5,4))
moves = [0,1,1,2,3,3,2,3,1,3,3,1,1]
play_list_plays(moves, board)
pretty_board(board)
display(alpha_beta_search(board, player='o'))

---------
| |x| |x|
| |o| |o|
| |x| |o|
| |x|x|o|
|x|o|o|x|
---------


(-1, 2)

Wall time: 11 ms


In [18]:
%%time
board = empty_board(shape = (5,4))
moves = [0,1,1,2,3,3,2,3,1,3,3,1,1,2]
play_list_plays(moves, board)
pretty_board(board)
display(alpha_beta_search(board, player='x'))

---------
| |x| |x|
| |o| |o|
| |x|o|o|
| |x|x|o|
|x|o|o|x|
---------


(1, 2)

Wall time: 4 ms


In [19]:
%%time
board = empty_board(shape = (5,4))
moves = [0,1,1,2,3,3,2,3,1,3,3,1,1,2,2]
play_list_plays(moves, board)
pretty_board(board)
display(alpha_beta_search(board, player='o'))

---------
| |x| |x|
| |o|x|o|
| |x|o|o|
| |x|x|o|
|x|o|o|x|
---------


(-1, 2)

Wall time: 4.97 ms


In [20]:
%%time
board = empty_board(shape = (5,4))
moves = [0,1,1,2,3,3,2,3,1,3,3,1,1,2,2,2]
play_list_plays(moves, board)
pretty_board(board)
display(alpha_beta_search(board, player='x'))

---------
| |x|o|x|
| |o|x|o|
| |x|o|o|
| |x|x|o|
|x|o|o|x|
---------


(1, 0)

Wall time: 6 ms


In [23]:
%%time
# the rest of the moves are in 0 and x wins, like was predicted in the first cell
board = empty_board(shape = (5,4))
moves = [0,1,1,2,3,3,2,3,1,3,3,1,1,2,2,2,0,0,0]
play_list_plays(moves, board)
pretty_board(board)
display(alpha_beta_search(board, player='x'))

---------
| |x|o|x|
|x|o|x|o|
|o|x|o|o|
|x|x|x|o|
|x|o|o|x|
---------


(1, None)

Wall time: 5.96 ms


### We can see here that 'o' Makes a block when x has a chance at winning

In [51]:
%%time
board = empty_board(shape = (5,5))
play_list_plays(moves=[2,2,1,1,0,1,1,2,2],
                board=board,
                starting_player='x')
pretty_board(board)
DEBUG = 1
display(alpha_beta_search(board, player='o'))

-----------
| | | | | |
| |x|x| | |
| |o|o| | |
| |o|o| | |
|x|x|x| | |
-----------
Number of nodes searched: 25516


(1, 3)

Wall time: 2.29 s


## How long does it take to decide the first move?

In [24]:
%%time
DEBUG = 1
board = empty_board(shape=(4,4))
display(alpha_beta_search(board, player='x'))

Number of nodes searched: 96974


(0, 2)

Wall time: 5.61 s


In [153]:
%%time
DEBUG = 1
board = empty_board(shape=(4,5))
display(alpha_beta_search(board, player='x'))

Number of nodes searched: 10992089


(0, 1)

Wall time: 12min 34s


### Move ordering

My move ordering will explore the paths in the center before other moves. 
* actions_middle_first
 * Returns the middle actions first if they are available

In [26]:
def actions_middle_first(board):
    """Return the option of moves in the order starting with the middle"""
    moves = actions(board) # the moves in order
    num_cols = board.shape[1]
    new_order = []
    if board[0][num_cols//2] == ' ':
        new_order.append(num_cols//2)
    for i in range(1, num_cols//2+1):
        idx = (num_cols//2) + i
        if idx < num_cols and board[0][idx] == ' ':
            new_order.append(idx)
        idx = (num_cols//2) - i
        if idx >= 0 and board[0][idx] == ' ':
            new_order.append(idx)
    return new_order

board = empty_board(shape=(4,4))
print(actions_middle_first(board))
board = empty_board(shape=(4,5))
print(actions_middle_first(board))
board = empty_board(shape=(4,5))
play_list_plays([2,2,2,2], board)
print(actions_middle_first(board))
board = empty_board()
play_list_plays([3,3,3,3,3,3], board)
print(actions_middle_first(board))

[2, 3, 1, 0]
[2, 3, 1, 4, 0]
[3, 1, 4, 0]
[4, 2, 5, 1, 6, 0]


In [27]:
%%time
board = empty_board(shape=(4,4))
pretty_board(board)
display(alpha_beta_search(board, player='x',actions=random_order_actions))

---------
| | | | |
| | | | |
| | | | |
| | | | |
---------
Number of nodes searched: 98123


(0, 0)

Wall time: 6.13 s


In [28]:
%%time
board = empty_board(shape=(4,4))
display(alpha_beta_search(board, player='x',actions=actions_middle_first))

Number of nodes searched: 54505


(0, 2)

Wall time: 3.46 s


In [29]:
%%time
board = empty_board(shape=(4,5))
display(alpha_beta_search(board, player='x',actions=actions_middle_first))

Number of nodes searched: 5017045


(0, 2)

Wall time: 5min 47s


### Playtime

Let the Minimax Search agent play a random agent on a small board. Analyze wins, losses and draws.

In [115]:
import sys
def mini_max_ab_agent(board, player='x'):
    v, col = alpha_beta_search(board=board, player=player)

def mini_max_ab_agent_middle(board, player='x'):
    v, col = alpha_beta_search(board=board, player=player, actions=actions_middle_first)
    return col

DEBUG = 0

game_number = 0
turn_number = 0
def play_game(player_x, player_o, board_shape = (6, 7), verbose = 0):
    
    global game_number, turn_number, DEBUG
    players_turn = 'x'
    
    turn_number = 0
    board = empty_board(board_shape)
    while True:
        col = -1
        if players_turn == 'x':
            col = player_x(board, players_turn)
        else:
            col = player_o(board, players_turn)  
            
        board = result(board=board, player=players_turn, action=col)
        
        if verbose == 2:
            pretty_board(board)
        
        if verbose >= 2:
            print('Player: ', players_turn, 'placing in col', col)
        winner = terminal(board)
        if winner == 'c':
            return ('c', board)
        elif winner != ' ':
            return (players_turn, board)
        
        
        turn_number+=1
        players_turn = flip_player(players_turn)
    if verbose >= 1:
        pretty_board(board)

def play_games(N, agent_x, agent_o, shape):
    winnings =[0, 0, 0]
    global game_number, turn_number
    game_number = 1
    for i in range(0, N):
        result_data = play_game(agent_x,
                                agent_o,
                                board_shape=shape)
        turn_number = 0
        game_number+=1

        if result_data[0] == 'x': winnings[0]+=1
        if result_data[0] == 'o': winnings[1]+=1
        if result_data[0] == 'c': winnings[2]+=1

        sys.stderr.write('\rGameNumber:%d' % (game_number))
        sys.stderr.flush()

        print
    return winnings

#game_number = 0
#turn_number = 0
#winnings = play_games(100, mini_max_ab_agent_middle, agent_random, shape=(4,4))
#print('x won:', winnings[0], 'o won', winnings[1], 'tie ', winnings[2])


## Task 4: Heuristic Alpha-Beta Tree Search [3 points] 

### Heuristic evaluation function

this Heuristic will award 3 points for every open square on either side of 3 pieces in a row, and 2 points for each open side on 2 pieces in a row

In [35]:
def basic_heuristic(board, player='x'):
    series = get_all_series(board)
    util = utility(board, player)
    if util is not None: return util*100, True
    
    util = 0
    for s in series:
        as_string = ""
        as_string = as_string.join(s)
        # points for xxx
        
        idx = as_string.find(player*3)
        while idx >= 0:
            if idx-1 >= 0 and as_string[idx-1] == ' ':
                util += 3
            if idx >= 0 and idx+3 < len(as_string) and as_string[idx+3] == ' ':
                util += 3
            idx = as_string.find(player*3, idx+3)
            
        #points for xx
        idx = as_string.find(player*2)
        while idx >= 0:
            if idx-1 >= 0 and as_string[idx-1] == player:
                idx = as_string.find(player*2, idx+2)
                continue
            if idx+2 < len(as_string) and as_string[idx+2] == player:
                idx = as_string.find(player*2, idx+2)
                continue
            if idx-1 >= 0 and as_string[idx-1] == ' ':
                util += 2
            if idx > 0 and idx+2 < len(as_string) and as_string[idx+2] == ' ':
                util += 2
            idx = as_string.find(player*2, idx+2)
    return util, False

# Test 1
board = empty_board(shape = (5,4))
moves = [0,1,1,2,3,3,2,3,3,3,1,0]
play_list_plays(moves, board)
pretty_board(board)
print('H(x) =', basic_heuristic(board, player='x'),'H(x) =',
      'H(o) =',basic_heuristic(board, player='o'))
board = empty_board(shape = (6,7))
board = empty_board(shape = (5,4))

# Test 2
moves = [0,1,1,2,3,3,2,3,3,3,1,0,2]
play_list_plays(moves, board)
pretty_board(board)
print('H(x) =', basic_heuristic(board, player='x'),'H(x) =',
      'H(o) =',basic_heuristic(board, player='o'))
board = empty_board(shape = (6,7))

# Test 3
moves = [3,3,4,4,2]
play_list_plays(moves, board)
pretty_board(board)
print('H(x) =', basic_heuristic(board, player='x'),'H(x) =',
      'H(o) =',basic_heuristic(board, player='o'))

---------
| | | |o|
| | | |x|
| |x| |o|
|o|x|x|o|
|x|o|o|x|
---------
H(x) = (7, False) H(x) = H(o) = (0, False)
---------
| | | |o|
| | | |x|
| |x|x|o|
|o|x|x|o|
|x|o|o|x|
---------
H(x) = (100, True) H(x) = H(o) = (-100, True)
---------------
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | |o|o| | |
| | |x|x|x| | |
---------------
H(x) = (6, False) H(x) = H(o) = (4, False)


### Cutting off search 

Modify your Minimax Search with Alpha-Beta Pruning to cut off search at a specified depth and use the heuristic evaluation function. Experiment with different cutoff values.

In [109]:
def alpha_beta_search_cutoff(board, cutoff = None, player = 'x', H=basic_heuristic):
    global DEBUG, COUNT
    COUNT = 0
    
    value, move = max_value_ab_cutoff(board, player, -math.inf, +math.inf, 0, cutoff, H)
    
    if DEBUG >= 1:
        print(f"Number of nodes searched (cutoff = {cutoff}): {COUNT})")
        
    return value, move

def max_value_ab_cutoff(state, player, alpha, beta, depth, cutoff, H):
    """Player's best move"""
    global DEBUG, COUNT
    COUNT += 1
    
    # cut off and terminal test
    v, terminal = H(state, player)
    if((cutoff is not None and depth >= cutoff) or terminal): 
        if(terminal): alpha, beta = v, v
        if DEBUG >= 2: print(f"stopped at {depth}: {state} term: {terminal} eval: {v} [{alpha}, {beta}]" ) 
        return v, None
    
    v, move = -math.inf, None

    # check all possible actions in the state, update alpha and return move with the largest value
    for a in actions_middle_first(state):
        v2, a2 = min_value_ab_cutoff(result(state, player, a), player, alpha, beta, depth + 1, cutoff, H)
        if v2 > v:
            v, move = v2, a
            alpha = max(alpha, v)
        if v >= beta: return v, move
    return v, move

def min_value_ab_cutoff(state, player, alpha, beta, depth, cutoff, H):
    """opponent's best response."""
    global DEBUG, COUNT
    COUNT += 1
    
    # cut off and terminal test
    v, terminal = H(state, player)
    #if((cutoff is not None and depth >= cutoff) or terminal): 
    # always let the opponent make her move
    if(terminal): 
        alpha, beta = v, v
        if DEBUG >= 2:
            print(f"stopped at {depth}: {state} term: {terminal} eval: {v} [{alpha}, {beta}]" ) 
        return v, None
    
    v, move = +math.inf, None

    # check all possible actions in the state, update beta and return move with the smallest value
    for a in actions_middle_first(state):
        v2, a2 = max_value_ab_cutoff(result(state, flip_player(player), a), player, alpha, beta, depth + 1, cutoff, H)
        if v2 < v:
            v, move = v2, a
            beta = min(beta, v)
        if v <= alpha: return v, move
    
    return v, move
    
    

Experiment with the same manually created boards as above to check if the agent spots wining opportunities.

In [36]:
# Test 1
board = empty_board(shape = (5,4))
moves = [0,1,1,2,3,3,2,3,3,3,1,0]
play_list_plays(moves, board)
pretty_board(board)
print('H(x) =', basic_heuristic(board, player='x'),'H(x) =',
      'H(o) =',basic_heuristic(board, player='o'))

%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=10, player='x'))
print(f"Looked through {COUNT} Nodes")

# Test 2
board = empty_board(shape =(6,7))
moves = [3,3,3,3,3,3,4,2,4]
play_list_plays(moves, board)
pretty_board(board)
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=10, player='x'))
print(f"Looked through {COUNT} Nodes")

---------
| | | |o|
| | | |x|
| |x| |o|
|o|x|x|o|
|x|o|o|x|
---------
H(x) = (7, False) H(x) = H(o) = (0, False)


(100, 2)

6.34 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Looked through 236 Nodes
---------------
| | | |o| | | |
| | | |x| | | |
| | | |o| | | |
| | | |x| | | |
| | | |o|x| | |
| | |o|x|x| | |
---------------


(17, 2)

32.3 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Looked through 236 Nodes


In [37]:
# Test 2
board = empty_board(shape =(4,4))
moves = [0,0,1,1,2]
play_list_plays(moves, board)
pretty_board(board)
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=10, player='o'))
print(f"Looked through {COUNT} Nodes")

---------
| | | | |
| | | | |
|o|o| | |
|x|x|x| |
---------


(2, 3)

298 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Looked through 236 Nodes


How long does it take to make a move? Start with a smaller board with 4 columns and make the board larger by adding columns.

In [38]:
%%time
board = empty_board(shape =(4,4))
pretty_board(board)
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=10, player='x'))
print(f"Looked through {COUNT} Nodes")

---------
| | | | |
| | | | |
| | | | |
| | | | |
---------


(4, 2)

1.25 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Looked through 236 Nodes
Wall time: 1.25 s


In [39]:
%%time
board = empty_board(shape =(4,5))
pretty_board(board)
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=10, player='x'))
print(f"Looked through {COUNT} Nodes")

-----------
| | | | | |
| | | | | |
| | | | | |
| | | | | |
-----------


(4, 2)

12.1 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Looked through 236 Nodes
Wall time: 12.1 s


In [40]:
%%time
board = empty_board(shape =(4,6))
pretty_board(board)
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=10, player='x'))
print(f"Looked through {COUNT} Nodes")

-------------
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
-------------


(6, 3)

10.2 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Looked through 236 Nodes
Wall time: 10.2 s


In [41]:
%%time
board = empty_board(shape =(6,6))
pretty_board(board)
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=10, player='x'))
print(f"Looked through {COUNT} Nodes")

-------------
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
-------------


(8, 3)

13.1 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Looked through 236 Nodes
Wall time: 13.1 s


In [42]:
%%time
board = empty_board(shape =(6,7))
pretty_board(board)
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=10, player='x'))
print(f"Looked through {COUNT} Nodes")

---------------
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
---------------


(8, 3)

27.2 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Looked through 236 Nodes
Wall time: 27.2 s


### Forward Pruning

Add forward pruning to the cutoff search where you do not consider moves that have a low evaluation value after a shallow search 
(way smaller than the cuttoff value).

In [132]:
def alpha_beta_search_cutoff(board, cutoff=None, fp=None, player = 'x', H=basic_heuristic):
    global DEBUG, COUNT
    COUNT = 0
    
    value, move = max_value_ab_cutoff(board, player, -math.inf, +math.inf, 0, cutoff, fp, H)
    
    if DEBUG >= 1:
        print(f"Number of nodes searched (cutoff = {cutoff}) (fp: {fp}) {COUNT}")
        
    # If the search could not come up with anything for the cutoff/fp
    # This should only happen in very short sided searches
    if move is None:
        move = agent_random(board)
    return value, move

def max_value_ab_cutoff(state, player, alpha, beta, depth, cutoff, fp, H):
    """Player's best move"""
    global DEBUG, COUNT
    COUNT += 1
    
    # cut off and terminal test
    v, terminal = H(state, player)
    oppo, terminal = H(state, flip_player(player))
    diff = v - oppo
    
    # difference here we will stop evaulating if pass the fp value of H
    if (cutoff is not None and depth >= cutoff) or \
        (fp is not None and diff < -fp) or terminal: 
        if(terminal): alpha, beta = v, v
        if DEBUG >= 2: print(f"stopped at {depth}: {state} term: {terminal} eval_diff: {diff} [{alpha}, {beta}]" ) 
        return v, None
    
    v, move = -math.inf, None

    # check all possible actions in the state, update alpha and return move with the largest value
    possibles = actions_middle_first(state)
    for a in possibles:
        v2, a2 = min_value_ab_cutoff(result(state, player, a),
                                     player,
                                     alpha,
                                     beta,
                                     depth + 1,
                                     cutoff,
                                     fp,
                                     H)
        if v2 > v:
            v, move = v2, a
            alpha = max(alpha, v)
        if v >= beta: return v, move
    return v, move

def min_value_ab_cutoff(state, player, alpha, beta, depth, cutoff, fp, H):
    """opponent's best response."""
    global DEBUG, COUNT
    COUNT += 1
    
    # cut off and terminal test
    v, terminal = H(state, player)
    #if((cutoff is not None and depth >= cutoff) or terminal): 
    # always let the opponent make her move
    if(terminal): 
        alpha, beta = v, v
        if DEBUG >= 2:
            print(f"stopped at {depth}: {state} term: {terminal} eval: {v} [{alpha}, {beta}]" ) 
        return v, None
    
    v, move = +math.inf, None

    # check all possible actions in the state, update beta and return move with the smallest value
    for a in actions_middle_first(state):
        v2, a2 = max_value_ab_cutoff(result(state, flip_player(player), a),
                                     player,
                                     alpha,
                                     beta,
                                     depth + 1,
                                     cutoff,
                                     fp,
                                     H)
        if v2 < v:
            v, move = v2, a
            beta = min(beta, v)
        if v <= alpha: return v, move
    
    return v, move
    

In [44]:
# Test 1
board = empty_board(shape = (5,4))
moves = [0,1,1,2,3,3,2,3,3,3,1,0]
play_list_plays(moves, board)
pretty_board(board)
print('H(x) =', basic_heuristic(board, player='x'),'H(x) =',
      'H(o) =',basic_heuristic(board, player='o'))

board = empty_board(shape=(4,4))
pretty_board(board)
DEBUG = 1
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=10, fp=5, player='x'))
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=10, fp=1, player='x'))


---------
| | | |o|
| | | |x|
| |x| |o|
|o|x|x|o|
|x|o|o|x|
---------
H(x) = (7, False) H(x) = H(o) = (0, False)
---------
| | | | |
| | | | |
| | | | |
| | | | |
---------
Number of nodes searched (cutoff = 10) (fp: 5) 8144


(4, 2)

1.9 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Number of nodes searched (cutoff = 10) (fp: 1) 5352


(2, 2)

1.29 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [45]:


# Test 2
board = empty_board(shape =(6,7))
moves = [3,3,3,3,3,3,4,2,4,2,4]
play_list_plays(moves, board)
pretty_board(board)
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=10, fp=5, player='x'))

# Test 3
board = empty_board(shape =(6,7))
moves = [3,3,3,3,3,3,4,2,4,2,4]
play_list_plays(moves, board)
pretty_board(board)
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=10, fp=1, player='x'))


---------------
| | | |o| | | |
| | | |x| | | |
| | | |o| | | |
| | | |x|x| | |
| | |o|o|x| | |
| | |o|x|x| | |
---------------
Number of nodes searched (cutoff = 10) (fp: 5) 5150


(100, 4)

1.85 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
---------------
| | | |o| | | |
| | | |x| | | |
| | | |o| | | |
| | | |x|x| | |
| | |o|o|x| | |
| | |o|x|x| | |
---------------
Number of nodes searched (cutoff = 10) (fp: 1) 2575


(100, 4)

949 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


How long does it take to make a move? Start with a smaller board with 4 columns and make the board larger by adding columns.

In [46]:
DEBUG = 1

board = empty_board(shape =(6,6))
pretty_board(board)
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=10, player='x'))

board = empty_board(shape =(6,6))
pretty_board(board)
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=10, fp=1, player='x'))


-------------
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
-------------
Number of nodes searched (cutoff = 10) (fp: None) 56515


(8, 3)

22.1 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
-------------
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
-------------
Number of nodes searched (cutoff = 10) (fp: 1) 63964


(3, 3)

26.6 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [47]:
board = empty_board(shape =(6,7))
pretty_board(board)
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=10, fp=5, player='x'))

board = empty_board(shape =(6,7))
pretty_board(board)
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=10, player='x'))


---------------
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
---------------
Number of nodes searched (cutoff = 10) (fp: 5) 132065


(7, 3)

57.5 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
---------------
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
---------------
Number of nodes searched (cutoff = 10) (fp: None) 105987


(8, 3)

45.5 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [49]:
board = empty_board(shape =(6,7))
pretty_board(board)
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=5, fp=1, player='x'))


---------------
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
---------------
Number of nodes searched (cutoff = 5) (fp: 1) 2335


(3, 3)

1.09 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [50]:
board = empty_board(shape =(6,7))
pretty_board(board)
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=6, fp=1, player='x'))


---------------
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
---------------
Number of nodes searched (cutoff = 6) (fp: 1) 2335


(3, 3)

1.11 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [51]:
board = empty_board(shape =(6,7))
pretty_board(board)
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=7, fp=1, player='x'))


---------------
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
---------------
Number of nodes searched (cutoff = 7) (fp: 1) 19253


(3, 3)

8.94 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [152]:
board = empty_board(shape =(6,7))
pretty_board(board)
%timeit -n1 -r1 display(alpha_beta_search_cutoff(board, cutoff=10, fp=1, player='x'))


---------------
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
---------------


(3, 3)

1min 16s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


### Playtime

Let two heuristic search agents (different cutoff depth, different heuristic evaluation function or different forward pruning) compete against each other on a reasonably sized board. Since there is no randomness, you only need to let them play once.

In [64]:
class ABAgent:
    """Class used to make agents easily"""
    def __init__(self, H, cutoff=None, fp=None):
        self.H = H
        self.cutoff = cutoff
        self.fp = fp
        
    def act(self, board, player):
        v, move = alpha_beta_search_cutoff(board,
                                           cutoff=self.cutoff,
                                           fp=self.fp,
                                           player=player,
                                           H=self.H)
        return move
    
    def __str__(self):
        return f"(AB Agent cutoff: {self.cutoff} forward Pruning: {self.fp})"


In [151]:
%%time
agent_10_10 = ABAgent(H=basic_heuristic, cutoff=10, fp=10)
agent_5_10 = ABAgent(H=basic_heuristic, cutoff=5, fp=10)
game_result = play_game(agent_10_10.act, agent_5_10.act)
print(f"x:{agent_10_10} o:{agent_5_10}, {game_result[0]} won")


x:(AB Agent cutoff: 10 forward Pruning: 10) o:(AB Agent cutoff: 5 forward Pruning: 10), x won
Wall time: 11min 37s


In [68]:
%%time
agent_1_10 = ABAgent(H=basic_heuristic, cutoff=1, fp=10)
agent_5_10 = ABAgent(H=basic_heuristic, cutoff=5, fp=10)
DEBUG = 0
game_result = play_game(agent_1_10.act, agent_5_10.act)
print(f"x:{agent_5_10} o:{agent_1_10}, {game_result[0]} won")

pretty_board(game_result[1])

x:(AB Agent cutoff: 5 forward Pruning: 10) o:(AB Agent cutoff: 1 forward Pruning: 10), c won
---------------
|o|x|o|x|x|o|o|
|x|o|o|x|o|x|x|
|o|x|o|o|o|x|o|
|x|o|x|x|o|o|x|
|o|x|x|o|x|x|o|
|x|o|o|x|x|o|x|
---------------
Wall time: 15.9 s


In [139]:
%%time
DEBUG = 0
game_result = play_game(agent_5_10.act, agent_1_10.act)
print(f"x:{agent_5_10} o:{agent_1_10}, {game_result[0]} won")

pretty_board(game_result[1])

x:(AB Agent cutoff: 5 forward Pruning: 10) o:(AB Agent cutoff: 1 forward Pruning: 10), x won
---------------
| | | |o|o| | |
| | | |x|x| | |
| | | |x|o| | |
| | |x|x|x| | |
| | |o|o|o|x| |
|o| |x|x|o|o|x|
---------------
Wall time: 6.69 s


In [150]:
%%time
game_result = play_game(agent_5_10.act, agent_10_10.act)
print(f"x:{agent_5_10} o:{agent_10_10}, {game_result[0]} won")

x:(AB Agent cutoff: 5 forward Pruning: 10) o:(AB Agent cutoff: 10 forward Pruning: 10), o won
Wall time: 30min 15s


In [311]:
%%time

agent_10_2 = ABAgent(H=basic_heuristic, cutoff=10, fp=2)
game_result = play_game(agent_10_10.act, agent_10_2.act)

22min 25s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


## Challenge task [+ 1 bonus point]

Find another student and let your best agent play against the other student's best player. We will set up a class tournament on Canvas. This tournament will continue after the submission deadline.

I am challenging Steven Larsen. 

In [2]:
import time
def play_game_with_time(player_x, player_o, shape=(6,7)):
    
    global game_number, turn_number, DEBUG
    players_turn = 'x'
    time_x = 0.0
    time_o = 0.0
    
    turn_number = 0
    board = empty_board(shape)
    while True:
        col = -1
        if players_turn == 'x':
            start = time.time()
            col = player_x(board, players_turn)
            end = time.time()
            time_x+=(end-start)
        else:
            start = time.time()
            col = player_o(board, players_turn)
            end = time.time()
            time_o+=(end-start)
        board = result(board=board, player=players_turn, action=col)
        winner = terminal(board)
        if winner == 'c':
            return ('c', board, time_x, time_o)
        elif winner != ' ':
            return (players_turn, board, time_x, time_o)
        
        turn_number+=1
        players_turn = flip_player(players_turn)
    if verbose >= 1:
        pretty_board(board)

In [7]:
# This cell is stand alone ex

import steven_agent as sa
import mini_max_nick as mmn
import sys  
sys.path.insert(0, '.\\')

stevens = sa.StevenAgent('o', 1000)
agent_5_5 = mmn.ABAgent(cutoff = 5, fp = 5)
game_result = play_game_with_time(agent_5_5.act, stevens.act)
print(f"x:{game_result[2]} o:{game_result[3]}, {game_result[0]} won")


x:15.644373178482056 o:26.382996320724487, x won


In [174]:
game_result = play_game_with_time(stevens.act, agent_5_5.act)
print(f"steven:{game_result[2]} nick:{game_result[3]}, {game_result[0]} won")
game_result = play_game_with_time(agent_5_5.act, stevens.act)
print(f"nick:{game_result[2]} steven:{game_result[3]}, {game_result[0]} won")
game_result = play_game_with_time(stevens.act, agent_5_5.act)
print(f"steven:{game_result[2]} nick:{game_result[3]}, {game_result[0]} won")
game_result = play_game_with_time(agent_5_5.act, stevens.act)
print(f"nick:{game_result[2]} steven:{game_result[3]}, {game_result[0]} won")

steven:24.442021369934082 nick:25.156413078308105, o won
nick:9.312940120697021 steven:16.827030658721924, o won
steven:21.629891633987427 nick:34.1540744304657, x won
nick:8.599025011062622 steven:19.00697374343872, x won


In [175]:
# Try more powerful agents
stevens = sa.StevenAgent('o', 4000) # the character was changed in his code to be ignored
agent_7_5 = mmn.ABAgent(cutoff = 7, fp = 5)
game_result = play_game_with_time(stevens.act, agent_7_5.act)
print(f"steven:{game_result[2]} nick:{game_result[3]}, {game_result[0]} won")
game_result = play_game_with_time(agent_7_5.act, stevens.act)
print(f"nick:{game_result[2]} steven:{game_result[3]}, {game_result[0]} won")
game_result = play_game_with_time(stevens.act, agent_7_5.act)
print(f"steven:{game_result[2]} nick:{game_result[3]}, {game_result[0]} won")
game_result = play_game_with_time(agent_7_5.act, stevens.act)
print(f"nick:{game_result[2]} steven:{game_result[3]}, {game_result[0]} won")

steven:94.90196323394775 nick:243.1410355567932, o won
nick:55.04803013801575 steven:67.45196461677551, x won
steven:88.15589928627014 nick:154.29910254478455, o won
nick:135.60292863845825 steven:58.16800022125244, x won


In [176]:
game_result = play_game_with_time(stevens.act, agent_7_5.act)
print(f"steven:{game_result[2]} nick:{game_result[3]}, {game_result[0]} won")
game_result = play_game_with_time(agent_7_5.act, stevens.act)
print(f"nick:{game_result[2]} steven:{game_result[3]}, {game_result[0]} won")
game_result = play_game_with_time(stevens.act, agent_7_5.act)
print(f"steven:{game_result[2]} nick:{game_result[3]}, {game_result[0]} won")
game_result = play_game_with_time(agent_7_5.act, stevens.act)
print(f"nick:{game_result[2]} steven:{game_result[3]}, {game_result[0]} won")
game_result = play_game_with_time(stevens.act, agent_7_5.act)
print(f"steven:{game_result[2]} nick:{game_result[3]}, {game_result[0]} won")
game_result = play_game_with_time(agent_7_5.act, stevens.act)
print(f"nick:{game_result[2]} steven:{game_result[3]}, {game_result[0]} won")

steven:95.55282473564148 nick:196.7740912437439, o won
nick:243.43890810012817 steven:106.86776638031006, o won
steven:102.67270874977112 nick:202.91994380950928, o won
nick:53.82269096374512 steven:89.35564875602722, o won
steven:93.08422636985779 nick:180.92898154258728, o won
nick:64.40093994140625 steven:101.32003331184387, x won


## Graduate student advanced task: Pure Monte Carlo Search and Best First Move [1 point]

__Undergraduate students:__ This is a bonus task you can attempt if you like [+1 Bonus point].

### Pure Monte Carlos Search

Implement Pure Monte Carlo Search and investigate how this search performs on the test boards that you have used above. 

#### Playout playouts function

In [141]:
def playout(board, action, player='x'):
    board = result(board, player, action)
    current_player = flip_player(player)
    
    while(True):
        u = utility(board, player)
        if u is not None: #if terminal return the utility
            return(u)
        
        col = agent_random(board, current_player)
        board = result(board, current_player, col)
        
        current_player = flip_player(current_player)

board = empty_board()
print(playout(board, 0))
print(playout(board, 0))
print(playout(board, 0))

1
-1
1


In [142]:
def playouts(board, action, player='x', N =100):
    return [playout(board, action, player) for i in range(N)]

board = empty_board()
p = playouts(board, 0)
print('--- first move 0 ---')
print(f"mean utility {np.mean(p)}")
print(f"win prob {sum(np.array(p) == +1)/len(p)}")
print(f"lose prob {sum(np.array(p) == -1)/len(p)}")
print('--- first move 3 ---')
p = playouts(board, 3)
print(f"mean utility {np.mean(p)}")
print(f"win prob {sum(np.array(p) == +1)/len(p)}")
print(f"lose prob {sum(np.array(p) == -1)/len(p)}")

--- first move 0 ---
mean utility 0.03
win prob 0.51
lose prob 0.48
--- first move 3 ---
mean utility 0.34
win prob 0.67
lose prob 0.33


In [143]:
def pure_monte_carlo_search(board, N=100, player='x'):
    
    global DEBUG
    
    action_options = actions(board)
    # find how many playouts each actions gets, force it to be at least 1
    n = max(1, N//len(action_options))
    if DEBUG >= 2:
        print(f"Actions: {action_options} ({n} playouts per actions)")
    
    # create a dictionary of the actions and their playout average value
    play_outcomes = {i:np.mean(playouts(board, i, player, N=n)) for i in action_options}
    
    if DEBUG >= 2:
        display(play_outcomes)
    
    action = max(play_outcomes, key=play_outcomes.get)
    return action

board = empty_board()
pretty_board(board)
DEBUG = 1
print("1000 playouts on empty board gets you:")
print(pure_monte_carlo_search(board, N=1000))
print("10 playouts on empty board gets you:")
print(pure_monte_carlo_search(board, N=10))

    

---------------
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
---------------
1000 playouts on empty board gets you:
2
10 playouts on empty board gets you:
0


#### Monte Carlo Agents

In [144]:
def pmcs_1000(board, player):
    action = pure_monte_carlo_search(board, N=1000, player=player)
    return action

def pmcs_100(board, player):
    action = pure_monte_carlo_search(board, N=100, player=player)
    return action

winnings = play_games(100, pmcs_1000, pmcs_100, (6,7))
print('1000(x) won:', winnings[0], '100(o) won', winnings[1], 'tie ', winnings[2])

GameNumber:101

1000(x) won: 100 100(o) won 0 tie  0


### Best First Move

How would you determine what the best first move is? You can use Pure Monte Carlo Search or any algorithms 
that you have implemented above.

In [145]:
for i in range(7):
    board = empty_board()
    p = playouts(board, i, N=1000)
    print(f"--- first move {i} ---")
    print(f"mean utility {np.mean(p)}")
    print(f"win prob {sum(np.array(p) == +1)/len(p)}")
    print(f"lose prob {sum(np.array(p) == -1)/len(p)}")
    

--- first move 0 ---
mean utility -0.003
win prob 0.497
lose prob 0.5
--- first move 1 ---
mean utility 0.065
win prob 0.531
lose prob 0.466
--- first move 2 ---
mean utility 0.2
win prob 0.599
lose prob 0.399
--- first move 3 ---
mean utility 0.255
win prob 0.626
lose prob 0.371
--- first move 4 ---
mean utility 0.145
win prob 0.57
lose prob 0.425
--- first move 5 ---
mean utility 0.074
win prob 0.535
lose prob 0.461
--- first move 6 ---
mean utility -0.021
win prob 0.488
lose prob 0.509


Here we use the playouts to determine how much each starting move wins if the rest of the moves are random across 1000 games. We see that moving at column 3, in the very middle does the best with a winning percentage of 64.4%

#### Lets look at what happens when we play a monte carlo agent against our best 

In [169]:
%%time

DEBUG = 0
agent_10_2 = ABAgent(H=basic_heuristic, cutoff=10, fp=2)
game_result = play_game_with_time(agent_10_2.act, pmcs_100)
print(f"x:agent_10_2 took {game_result[2]} o:pmcs_100 took {game_result[3]}, {game_result[0]} won")
pretty_board(game_result[1])

x:agent_10_2 took 1143.007229089737 o:pmcs_100 took 1.5600457191467285, x won
---------------
| | | | | | | |
| | | | | | | |
| |o|o|x| | | |
| |x|x|o| | | |
| |o|x|x|x|x| |
|o|o|o|x|o|x| |
---------------
Wall time: 19min 4s


In [170]:
%%time
DEBUG = 0
game_result = play_game_with_time(pmcs_100, agent_10_2.act)
print(f"x:pmcs_100 took {game_result[2]} o:agent_10_2 took {game_result[3]}, {game_result[0]} won")
pretty_board(game_result[1])

x:pmcs_100 took 1.2409982681274414 o:agent_10_2 took 1029.1525993347168, o won
---------------
| | | | | | | |
| | | | | | | |
| | | | |o| | |
| | | |x|o| | |
|x| |x|o|o| | |
|x|x|o|x|o| | |
---------------
Wall time: 17min 10s


In [171]:
%%time
DEBUG = 0
game_result = play_game_with_time(agent_10_2.act, pmcs_1000)
print(f"x:agent_10_2 took {game_result[2]} o:pmcs_1000 took {game_result[3]}, {game_result[0]} won")
pretty_board(game_result[1])

x:agent_10_2 took 490.829060792923 o:pmcs_1000 took 20.24896550178528, o won
---------------
| | | |x|x| | |
| | |x|o|x| | |
| |x|o|x|o| | |
| |o|o|o|x| | |
|o|o|x|x|x| | |
|o|o|o|x|o| |x|
---------------
Wall time: 8min 31s


In [172]:
%%time
DEBUG = 0
game_result = play_game_with_time(pmcs_1000, agent_10_2.act)
print(f"x:pmcs_100 took {game_result[2]} o:agent_10_2 took {game_result[3]}, {game_result[0]} won")
pretty_board(game_result[1])

x:pmcs_100 took 16.141958236694336 o:agent_10_2 took 144.44307470321655, x won
---------------
| | | | | | | |
| | | |o| | | |
| | | |o| |x| |
| | | |x|o|o| |
| | |x|x|x|x| |
|o| |x|x|o|o| |
---------------
Wall time: 2min 40s
