# Adversarial Search: Playing Connect 4


## Instructions

Total Points: Undergraduates 10, graduate students 11

Complete this notebook and submit it. The notebook needs to be a complete project report with your implementation, documentation including a short discussion of how your implementation works and your design choices, and experimental results (e.g., tables and charts with simulation results) with a short discussion of what they mean. Use the provided notebook cells and insert additional code and markdown cells as needed.

## Introduction

You will implement different versions of agents that play Connect 4:

> "Connect 4 is a two-player connection board game, in which the players choose a color and then take turns dropping colored discs into a seven-column, six-row vertically suspended grid. The pieces fall straight down, occupying the lowest available space within the column. The objective of the game is to be the first to form a horizontal, vertical, or diagonal line of four of one's own discs." (see [Connect Four on Wikipedia](https://en.wikipedia.org/wiki/Connect_Four))

## Task 1: Defining the Search Problem [1 point]

Define the components of the search problem:

* Initial state
* Actions
* Transition model
* Goal state

In [1]:
# Your code/answer goes here.

|||
|:-----------------|------|
|**Initial State**    | <li> &nbsp;&nbsp; An empty 6 × 7 board with 'x' to move first </li> |
|**Actions**          | <li> &nbsp;&nbsp; Drop a piece into any column _C_, out of the 7 total, <br/> &nbsp;&nbsp;&nbsp; that is not yet full </li> |
|**Transition Model** | <li> &nbsp;&nbsp; The piece falls to the lowest empty cell of the chosen column, <br/> &nbsp;&nbsp;&nbsp; then the turn passes to the other player </li> |
|**Goal State**       | <li> &nbsp;&nbsp; A 6 × 7 board on which 4 (or more) pieces of the same type <br/> &nbsp;&nbsp;&nbsp; ('x' or 'o') line up in the same row, column, or diagonal <br/> &nbsp;&nbsp;&nbsp; with no gaps in between. A full board without such <br/> &nbsp;&nbsp;&nbsp; a line is a draw </li> |
|**Path Cost**        | <li> &nbsp;&nbsp; 1 per move </li> |

How big is the search space?

In [2]:
# Your code/ answer goes here.

- Our board is 6 × 7, so the total number of locations on the board is <u>**42**</u>.
- Each location (row, column) on the board has <u>**3**</u> possible states: 'x', 'o', or empty.

That gives a loose `upper bound` of $3^{42} \approx 1.09 \times 10^{20}$ board configurations, but most of them can never occur in a real game.

The calculation assumes that a piece can be placed in any row at any time, which is false.  
For example, if an agent wanted to place its first piece in a middle row, gravity would  
make the piece drop down to the bottom row. It also ignores that the players alternate turns, so their piece counts can differ by at most one.

The best `lower bound` on the number of possible positions has been calculated by a computer program to be around $1.6 \times 10^{13}$ [1].
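The size of the naive bound is easy to check numerically. The snippet below is a small sketch: the variable names are illustrative, and the second figure is simply the value quoted from reference [1], not something the code derives.

```python
# Naive upper bound: each of the 42 cells is 'x', 'o', or empty.
n_rows, n_cols = 6, 7
upper_bound = 3 ** (n_rows * n_cols)

# Figure quoted in the text from reference [1] (not computed here).
quoted_lower_bound = 1.6e13

print(f"3^42 = {upper_bound:,} (about {upper_bound:.2e})")
print(f"upper bound / quoted bound = {upper_bound / quoted_lower_bound:.1e}")
```

The gap of several orders of magnitude shows how much of the naive count is made up of unreachable boards.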
    

## Task 2: Game Environment and Random Agent [2 points]

Use a numpy character array as the board.

In [3]:
import numpy as np

def empty_board(shape=(6, 7)):
    return np.full(shape=shape, fill_value=' ')

print(empty_board())

[[' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']]


Instead of colors, use 'x' and 'o' to represent the players. Make sure that your agent functions all have the form: `agent_type(board, player = 'x')`, where board is the current board position and player is the player whose next move it is and who the agent should play.

Implement the board and helper functions for:

* The transition model (result).
* The utility function.
* Check for terminal states.
* A check for available actions.
* A function to visualize the board.

Make sure that all these functions work with boards of different sizes.

Implement an agent that plays randomly and let two random agents play against each other 1000 times. How often does each player win? Is the result expected? 

In [4]:
# Your code/ answer goes here.

In [757]:
board = empty_board()

# Add move to the board -------------------------------------------------------------------------- 
def result(state, player, action):
    """Return a copy of `state` after `player` drops a piece into column `action`."""
    state = state.copy()
    
    # Scan the column from the bottom row upward; gravity puts the piece
    # in the lowest empty cell. Works for any board height.
    for row in range(state.shape[0] - 1, -1, -1):
        if state[row][action] == ' ':
            state[row][action] = player
            break
    
    # If the column is full, the board is returned unchanged.
    return state


# Produce the Belief State* after the given action for a player. ---------------------------------
# *Belief State: The set of boards w/ the action & all possible reactions of an opponent.
def results(state, action, player = 'x'):
    if player == 'x': other = 'o'
    else: other = 'x'
    
    # player's move: result() copies the board and drops the piece into the column
    state = result(state, player, action)
    
    # opponent reacts
    r = list()
    o_actions = actions(state)
    
    # board is full
    if len(o_actions) < 1 : return [state]
    
    for o_a in o_actions:
        r.append(result(state, other, o_a))
    
    return r

# Check the board & return : 'x', 'o', 'd' (draw), or 'n' (next move)
# Works for any board size; k is the number of pieces that must line up.
def check_win(board, k = 4):
    n_rows, n_cols = board.shape
    
    # Directions to scan from each cell: right, down, down-right, down-left.
    directions = ((0, 1), (1, 0), (1, 1), (1, -1))
    
    for row in range(n_rows):
        for col in range(n_cols):
            piece = board[row][col]
            if piece == ' ':
                continue
            
            for d_row, d_col in directions:
                end_row = row + (k - 1) * d_row
                end_col = col + (k - 1) * d_col
                
                # Skip lines that would run off the board.
                if not (0 <= end_row < n_rows and 0 <= end_col < n_cols):
                    continue
                
                if all(board[row + i * d_row][col + i * d_col] == piece for i in range(k)):
                    return str(piece)
    
    # check for draw
    if(np.sum(board == ' ') < 1):
        return 'd'
    
    return 'n'

# Returns win/loss (terminal) or false (non-terminal) --------------------------------------------
def is_terminal(state, player = 'x', draw_is_win = True):
    if player == 'x': other = 'o'
    else: other = 'x'
    
    goal = check_win(state)
    if goal == str(player): return 'win' 
    if goal == 'd': 
        if draw_is_win: return 'draw' 
        else: return None 
    if goal == other: return None  # loss is failure
    return False # continue

# Return possible actions as a vector of idx's ---------------------------------------------------
def actions(board):
    # A column is playable as long as its top cell is still empty.
    return np.where(board[0] == ' ')[0]

# Display the board ------------------------------------------------------------------------------
def show_board(board):
    print(board)
    # print(np.array(board))

In [758]:
# Simple player that chooses a random column that is not yet full. Player is unused. -------------
def random_player(board, player = None):
    val = np.random.choice(actions(board))
    print("Random Pick: ", val)
    return val

def switch_player(player, x, o):
    if player == 'x': return 'o', o
    else: return 'x', x

# 2 Random Agents, N times
def play(x, o, N = 100):
    """ x starts. x & o are agent functions that get the board as the percept and return their next action."""
    results = {'x': 0, 'o': 0, 'd': 0}
    
    for i in range(N):
        board = empty_board()
        player, fun = 'x', x
        
        while True:
            a = fun(board, player)
            board = result(board, player, a)
            
            win = check_win(board)   # returns 'n' if the game is not done.
            if win != 'n':
                results[win] += 1
                break
            
            player, fun = switch_player(player, x, o)
            
            print(board, "\n")
    
    return results

In [759]:
%timeit -n 1 -r 1 display(play(random_player, random_player, N = 100))

Random Pick:  3
[[' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' 'x' ' ' ' ' ' ']] 

Random Pick:  1
[[' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' 'o' ' ' 'x' ' ' ' ' ' ']] 

Random Pick:  2
[[' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' 'o' 'x' 'x' ' ' ' ' ' ']] 

Random Pick:  5
[[' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' 'o' 'x' 'x' ' ' 'o' ' ']] 

Random Pick:  6
[[' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' 'o'

{'x': 49, 'o': 51, 'd': 0}

2.7 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
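The task asks for 1000 games, while the run above uses N = 100 and prints every board. The following is a minimal, self-contained sketch of a quiet tournament loop. The name `play_quiet`, the fixed seed, and the compact helper functions are assumptions made for this illustration, not part of the cells above.

```python
import numpy as np

def empty_board(shape=(6, 7)):
    return np.full(shape=shape, fill_value=' ')

def actions(board):
    # A column is playable while its top cell is empty.
    return [c for c in range(board.shape[1]) if board[0, c] == ' ']

def result(board, player, action):
    board = board.copy()
    row = np.where(board[:, action] == ' ')[0][-1]  # lowest empty row
    board[row, action] = player
    return board

def check_win(board, k=4):
    n_rows, n_cols = board.shape
    for r in range(n_rows):
        for c in range(n_cols):
            p = board[r, c]
            if p == ' ':
                continue
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_r, end_c = r + (k - 1) * dr, c + (k - 1) * dc
                if 0 <= end_r < n_rows and 0 <= end_c < n_cols:
                    if all(board[r + i * dr, c + i * dc] == p for i in range(k)):
                        return str(p)
    return 'n' if np.any(board == ' ') else 'd'

def play_quiet(n_games=1000, shape=(6, 7), seed=0):
    """Let two random agents play n_games without printing boards; 'x' starts."""
    rng = np.random.default_rng(seed)
    tally = {'x': 0, 'o': 0, 'd': 0}
    for _ in range(n_games):
        board, player = empty_board(shape), 'x'
        while True:
            board = result(board, player, rng.choice(actions(board)))
            outcome = check_win(board)
            if outcome != 'n':
                tally[outcome] += 1
                break
            player = 'o' if player == 'x' else 'x'
    return tally

tally = play_quiet()
for outcome, count in tally.items():
    print(f"{outcome}: {count} ({count / 1000:.1%})")
```

Reporting rates next to the raw counts makes it easier to judge whether the difference between 'x' and 'o' is larger than the noise of a single run.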


In [656]:
%%time
board = empty_board()

# board[0][0] = 'x'
# board[1][0] = 'x'
# board[2][0] = 'x'
# board[3][0] = 'x'
# board[4][0] = 'x'
# board[5][0] = 'x'

print("------------ Board ------------")
show_board(board)

is_terminal(board)
#print('Win? ' + check_win(board))
print("-----")
print("Actions --> ", actions(board))

print("-----")
random_player(board)

------------ Board ------------
[['x' ' ' ' ' ' ' ' ' ' ' ' ']
 [' ' ' ' ' ' ' ' ' ' ' ' ' ']
 ['x' ' ' ' ' ' ' ' ' ' ' ' ']
 ['x' ' ' ' ' ' ' ' ' ' ' ' ']
 ['x' ' ' ' ' ' ' ' ' ' ' ' ']
 ['x' ' ' ' ' ' ' ' ' ' ' ' ']]
-----
Actions -->  [1 2 3 4 5 6]
-----
2
CPU times: user 1.28 ms, sys: 5 µs, total: 1.28 ms
Wall time: 1.05 ms


5

## Task 3: Minimax Search with Alpha-Beta Pruning [4 points]

### Implement the search starting from a given board and specifying the player.



__Note:__ The search space for a $6 \times 7$ board is large. You can experiment with smaller boards (the smallest is $4 \times 4$) and/or changing the winning rule to connect 3 instead of 4.

In [7]:
# Your code/ answer goes here.
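One possible shape for the search is sketched below. It is a generic minimax with alpha-beta pruning, not the notebook's own solution: the function name `alphabeta`, the utilities +1 / 0 / -1, and the compact helpers are assumptions for this example. The test position is a hand-made 4 × 4 board on which 'x' can complete a diagonal by playing column 3.

```python
import math
import numpy as np

def actions(board):
    return [c for c in range(board.shape[1]) if board[0, c] == ' ']

def result(board, player, action):
    board = board.copy()
    row = np.where(board[:, action] == ' ')[0][-1]  # lowest empty row
    board[row, action] = player
    return board

def check_win(board, k=4):
    n_rows, n_cols = board.shape
    for r in range(n_rows):
        for c in range(n_cols):
            p = board[r, c]
            if p == ' ':
                continue
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_r, end_c = r + (k - 1) * dr, c + (k - 1) * dc
                if 0 <= end_r < n_rows and 0 <= end_c < n_cols:
                    if all(board[r + i * dr, c + i * dc] == p for i in range(k)):
                        return str(p)
    return 'n' if np.any(board == ' ') else 'd'

def alphabeta(board, to_move, player, alpha=-math.inf, beta=math.inf, k=4):
    """Return (value, action) seen from `player`: +1 win, 0 draw, -1 loss."""
    outcome = check_win(board, k)
    if outcome != 'n':
        if outcome == 'd':
            return 0, None
        return (1 if outcome == player else -1), None

    maximizing = to_move == player
    best_value = -math.inf if maximizing else math.inf
    best_action = None
    next_player = 'o' if to_move == 'x' else 'x'

    for a in actions(board):
        value, _ = alphabeta(result(board, to_move, a), next_player, player, alpha, beta, k)
        if maximizing and value > best_value:
            best_value, best_action = value, a
            alpha = max(alpha, value)
        elif not maximizing and value < best_value:
            best_value, best_action = value, a
            beta = min(beta, value)
        if alpha >= beta:  # the other side will never allow this branch
            break
    return best_value, best_action

# Hand-made 4 x 4 position, 'x' to move; column 3 completes the diagonal.
board = np.array([
    [' ', ' ', ' ', ' '],
    [' ', 'o', 'x', 'o'],
    [' ', 'x', 'o', 'x'],
    ['x', 'o', 'o', 'x'],
])
value, move = alphabeta(board, to_move='x', player='x')
print("value:", value, "move:", move)
```

The search always evaluates from the point of view of `player`, so the same function can be used for either side by swapping the `player` argument.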

Experiment with some manually created boards (at least 5) to check if the agent spots winning opportunities.

In [8]:
# Your code/ answer goes here.

How long does it take to make a move? Start with a smaller board with 4 columns and make the board larger by adding columns.

In [9]:
# Your code/ answer goes here.

### Move ordering

Describe and implement a simple move ordering strategy. How does this strategy influence the time it takes to 
make a move?

In [10]:
# Your code/ answer goes here.
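A common simple ordering is to try central columns first, since a piece near the center can take part in more potential lines than one at the edge, and good moves found early tend to let alpha-beta prune more. The helper below is a sketch of that idea; `ordered_actions` is a hypothetical name, and whether it actually speeds up a given implementation has to be measured.

```python
def ordered_actions(valid_columns, n_cols):
    """Sort playable columns by distance from the board's center column."""
    center = (n_cols - 1) / 2
    return sorted(valid_columns, key=lambda c: abs(c - center))

print(ordered_actions(range(7), 7))
print(ordered_actions([0, 1, 3], 4))
```

Because `sorted` is stable, columns at the same distance from the center keep their original left-to-right order.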

### Playtime

Let the Minimax Search agent play a random agent on a small board. Analyze wins, losses and draws.

In [11]:
# Your code/ answer goes here.

## Task 4: Heuristic Alpha-Beta Tree Search [3 points] 

### Heuristic evaluation function

Define and implement a heuristic evaluation function.

In [12]:
# Your code/ answer goes here.
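One frequently used heuristic scores every window of four cells: a window that contains only the player's pieces counts in the player's favor, and a window that contains only the opponent's pieces counts against. The sketch below assumes the weights 1, 10, 100 for one, two, and three pieces; these weights and the names `windows` and `heuristic` are illustrative choices, not values given by the assignment.

```python
import numpy as np

def windows(board, k=4):
    """Yield every straight run of k cells (row, column, both diagonals)."""
    n_rows, n_cols = board.shape
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_r, end_c = r + (k - 1) * dr, c + (k - 1) * dc
                if 0 <= end_r < n_rows and 0 <= end_c < n_cols:
                    yield [board[r + i * dr, c + i * dc] for i in range(k)]

def heuristic(board, player='x', k=4):
    """Positive values favor `player`, negative values favor the opponent."""
    opponent = 'o' if player == 'x' else 'x'
    score = 0
    for w in windows(board, k):
        mine, theirs = w.count(player), w.count(opponent)
        if mine and not theirs:       # window still winnable for player
            score += 10 ** (mine - 1)
        elif theirs and not mine:     # window still winnable for opponent
            score -= 10 ** (theirs - 1)
    return score

board = np.full((6, 7), ' ')
board[5, 3] = 'x'   # a single piece in the bottom center cell
print(heuristic(board, 'x'), heuristic(board, 'o'))
```

Windows that contain pieces of both players are ignored because neither side can complete them.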

### Cutting off search 

Modify your Minimax Search with Alpha-Beta Pruning to cut off search at a specified depth and use the heuristic evaluation function. Experiment with different cutoff values.

In [13]:
# Your code/ answer goes here.
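A depth cutoff can be added to alpha-beta by passing a remaining-depth counter and returning a heuristic score once it reaches zero. The sketch below is self-contained and makes several assumptions: the names `cutoff_search` and `WIN`, the window-counting heuristic, and the choice of `WIN = 10**6` so that a real win always outranks any heuristic score.

```python
import numpy as np

WIN = 10**6  # assumed utility of a won game; larger than any heuristic score

def lines(board, k=4):
    """Yield every straight run of k cells as a list of pieces."""
    n_rows, n_cols = board.shape
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_r, end_c = r + (k - 1) * dr, c + (k - 1) * dc
                if 0 <= end_r < n_rows and 0 <= end_c < n_cols:
                    yield [board[r + i * dr, c + i * dc] for i in range(k)]

def check_win(board, k=4):
    for w in lines(board, k):
        if w[0] != ' ' and w.count(w[0]) == k:
            return str(w[0])
    return 'n' if np.any(board == ' ') else 'd'

def heuristic(board, player, k=4):
    opponent = 'o' if player == 'x' else 'x'
    score = 0
    for w in lines(board, k):
        mine, theirs = w.count(player), w.count(opponent)
        if mine and not theirs:
            score += 10 ** (mine - 1)
        elif theirs and not mine:
            score -= 10 ** (theirs - 1)
    return score

def actions(board):
    return [c for c in range(board.shape[1]) if board[0, c] == ' ']

def result(board, player, action):
    board = board.copy()
    row = np.where(board[:, action] == ' ')[0][-1]  # lowest empty row
    board[row, action] = player
    return board

def cutoff_search(board, to_move, player, depth, alpha=-np.inf, beta=np.inf, k=4):
    """Alpha-beta that stops at `depth` plies and falls back to the heuristic."""
    outcome = check_win(board, k)
    if outcome == 'd':
        return 0, None
    if outcome != 'n':
        return (WIN if outcome == player else -WIN), None
    if depth == 0:
        return heuristic(board, player, k), None

    maximizing = to_move == player
    best_value, best_action = (-np.inf if maximizing else np.inf), None
    next_player = 'o' if to_move == 'x' else 'x'
    for a in actions(board):
        value, _ = cutoff_search(result(board, to_move, a), next_player,
                                 player, depth - 1, alpha, beta, k)
        if maximizing and value > best_value:
            best_value, best_action = value, a
            alpha = max(alpha, value)
        elif not maximizing and value < best_value:
            best_value, best_action = value, a
            beta = min(beta, value)
        if alpha >= beta:
            break
    return best_value, best_action

# 'x' has three in a row on the bottom and can win in column 0 or 4.
board = np.full((6, 7), ' ')
board[5, 1:4] = 'x'
board[4, 1:3] = 'o'
board[5, 6] = 'o'
value, move = cutoff_search(board, to_move='x', player='x', depth=2)
print("value:", value, "move:", move)
```

Larger cutoff depths see further ahead but cost more time per move, so the depth is worth varying in the experiments.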

Experiment with the same manually created boards as above to check if the agent spots winning opportunities.

In [14]:
# Your code/ answer goes here.

How long does it take to make a move? Start with a smaller board with 4 columns and make the board larger by adding columns.

In [15]:
# Your code/ answer goes here.

### Playtime

Let two heuristic search agents (different cutoff depth, different heuristic evaluation function) compete against each other on a reasonably sized board. Since there is no randomness, you only need to let them play once.

In [16]:
# Your code/ answer goes here.

## Challenge task [+ 1 bonus point]

Find another student and let your best agent play against the other student's best player. We will set up a class tournament on Canvas. This tournament will continue after the submission deadline.

## Graduate student advanced task: Pure Monte Carlo Search and Best First Move [1 point]

__Undergraduate students:__ This is a bonus task you can attempt if you like [+1 Bonus point].

### Pure Monte Carlo Search

Implement Pure Monte Carlo Search and investigate how this search performs on the test boards that you have used above. 

In [17]:
# Your code/ answer goes here.
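Pure Monte Carlo Search estimates the value of each available move by playing a number of random games (playouts) after it and averaging the outcomes. The sketch below is one way to write it; the names `playout` and `pmcs`, the +1 / 0 / -1 utilities, the default of 50 playouts per move, and the fixed seed are assumptions for this example.

```python
import numpy as np

def actions(board):
    return [c for c in range(board.shape[1]) if board[0, c] == ' ']

def result(board, player, action):
    board = board.copy()
    row = np.where(board[:, action] == ' ')[0][-1]  # lowest empty row
    board[row, action] = player
    return board

def check_win(board, k=4):
    n_rows, n_cols = board.shape
    for r in range(n_rows):
        for c in range(n_cols):
            p = board[r, c]
            if p == ' ':
                continue
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_r, end_c = r + (k - 1) * dr, c + (k - 1) * dc
                if 0 <= end_r < n_rows and 0 <= end_c < n_cols:
                    if all(board[r + i * dr, c + i * dc] == p for i in range(k)):
                        return str(p)
    return 'n' if np.any(board == ' ') else 'd'

def other(player):
    return 'o' if player == 'x' else 'x'

def playout(board, to_move, player, rng, k=4):
    """Play random moves until the game ends; return +1, 0, or -1 for `player`."""
    while True:
        outcome = check_win(board, k)
        if outcome != 'n':
            if outcome == 'd':
                return 0
            return 1 if outcome == player else -1
        board = result(board, to_move, rng.choice(actions(board)))
        to_move = other(to_move)

def pmcs(board, player, n_playouts=50, seed=0, k=4):
    """Return (best_action, scores) where scores maps each action to its mean utility."""
    rng = np.random.default_rng(seed)
    scores = {}
    for a in actions(board):
        after = result(board, player, a)
        outcomes = [playout(after, other(player), player, rng, k) for _ in range(n_playouts)]
        scores[a] = float(np.mean(outcomes))
    return max(scores, key=scores.get), scores

# 'x' has three in a row on the bottom and can win in column 0 or 4.
board = np.full((6, 7), ' ')
board[5, 1:4] = 'x'
board[4, 1:3] = 'o'
board[5, 6] = 'o'
move, scores = pmcs(board, 'x')
print("move:", move)
print(scores)
```

Calling `pmcs` on an empty board with a larger number of playouts gives a rough estimate for the best first move, although the estimates are noisy and depend on the seed.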

### Best First Move

How would you determine what the best first move is? You can use Pure Monte Carlo Search or any algorithms 
that you have implemented above.

In [18]:
# Your code/ answer goes here.


--------------

_References_

[1] “Connect Four,” 2010. [Online]. Available: https://web.mit.edu/sp.268/www/2010/connectFourSlides.pdf