---

# CSCI 3202, Fall 2022
# Final Project
# Project Due: Thursday December 8, 2022 at 6:00 PM
## Proposals Due: Friday November 18, 2022 at 6:00 PM


You have two options for completing your final project for this course. 

#### Option 1 ####
The first option is presented in this notebook and involves implementing a Connect Four game with AB pruning and A* as player strategies. 

#### Option 2 ####
The second option is to design your own project that includes any of the algorithms we've discussed throughout the semester, or an algorithm that you're interested in learning that we haven't discussed in class. Your project also needs to include some kind of analysis of how it performed on a specific problem. If you're interested in the design-your-own-project option, you need to discuss your idea with one of the course instructors to get approval. If you do a project without getting approval, you will receive a 0 regardless of the quality of the project. 

**The rules:**

1. Choose EITHER the given problem to submit OR choose your own project topic. 

2. If you choose your own project topic, please adhere to the following guidelines:
- Send an email to the course instructors before Friday, November 18 at 6pm, with a paragraph description of your project. We will respond within 24 hours with feedback.
- The project can include an algorithm we've discussed throughout the semester or an algorithm that you've been curious to learn. Please don't recycle a project that you did in another class. 
- If you do your own project without prior approval, you will receive a 0 for this project.
- Your project code, explanation, and results must all be contained in a Jupyter notebook. 

3. All work, code and analysis must be **your own**.
4. You may use your course notes, posted lecture slides, textbook, in-class notebooks and homework solutions as resources.  You may also search online for answers to general knowledge questions, like the form of a probability distribution function, or how to perform a particular operation in Python. You may not use entire segments of code as solutions to any part of this project, e.g. if you find a Python implementation of policy iteration online, you can't use it.
5. You may **not** post to message boards or other online resources asking for help.
6. **You may not collaborate with classmates or anyone else.**
7. This is meant to be like a coding portion of an exam. So, we will be much less helpful than we typically are with homework. For example, we will not check answers, help debug your code, and so on.
8. If you have a question, post it first as a **private** Piazza message. If we decide that it is appropriate for the entire class, then we will make it a public post (and anonymous).
9. If any part of the given project or your personal project is left open-ended, it is because we intend for you to code it up however you want. Our primary concern is with the plots/analysis that your code produces. Feel free to ask clarifying questions though.

Violation of these rules will result in an **F** and a trip to the Honor Code council.

---
**By writing your name below, you agree to abide by these rules:**

**Your name: Olivia Golden** 

---


---

Some useful packages and libraries:



In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import colors
from collections import deque
import heapq
import unittest
from scipy import stats
import copy as cp
from time import time
import random

---

## Problem 1: Game Theory - Playing "intelligent" Connect Four

Connect Four is a two-player game where the objective is to get four pieces in a row - horizontally, vertically, or diagonally. Check out this video if you're unfamiliar with the game: https://www.youtube.com/watch?v=utXzIFEVPjA.

### (1a)   Defining the Connect Four class structure

We've provided the humble beginnings of a Connect Four game. You need to fill in this class structure for Connect Four using what we did during class as a guide, and then implement min-max search with AB pruning, and A* search with at least one heuristic function. The class structure includes the following:

* `moves` is a list of columns to represent which moves are available. Recall that we are using matrix notation for this, where the upper-left corner of the board, for example, is represented at (1,1).
* `result(self, move, state)` returns a ***hypothetical*** resulting `State` object if `move` is made when the game is in the current `state`. Note that when a 'move' is made, the column must have an open slot and the piece must drop to the lowest row. 
* `compute_utility(self, move, state)` calculates the utility of `state` that would result if `move` is made when the game is in the current `state`. This is where you want to check to see if anyone has gotten `nwin` in a row.
* `game_over(self, state)` returns `True` if the game in the given `state` has reached a terminal state, and `False` otherwise.
* `utility(self, state, player)` returns the utility of the current state if the player is Red and $-1 \cdot$ utility if the player is Black.
* `display(self)` is a method to display the current game `state`. You get it for free because this would be super frustrating without it.
* `play_game(self, player1, player2)` returns an integer that is the utility of the outcome of the game (+1 if Red wins, 0 if draw, -1 if Black wins). `player1` and `player2` are functional arguments that we will deal with in parts **1b** and **1d**.

Some notes:
* Assume Red always goes first.
* Do **not** hard-code for 6x7 boards.
* You may add attributes and methods to these classes as needed for this problem.
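
As a minimal sketch of the board convention described above (not part of the provided class), the snippet below stores pieces in a dict keyed by 1-indexed `(row, col)` tuples, with row 1 at the top, and finds the lowest open slot in a column. The helper name `lowest_open_row` is hypothetical and exists only for illustration.

```python
def lowest_open_row(board, col, nrow):
    """Return the lowest empty row in `col` (row 1 is the top), or None if full."""
    for row in range(nrow, 0, -1):      # scan from the bottom row upward
        if (row, col) not in board:
            return row
    return None

board = {}                               # empty board: no pieces placed yet
nrow = 3                                 # small board height, chosen for the example
for player in ['R', 'B', 'R']:           # drop three pieces into column 2
    row = lowest_open_row(board, 2, nrow)
    board[(row, 2)] = player

print(board)
print(lowest_open_row(board, 2, nrow))   # column 2 is now full
```

Because the board height is passed in as `nrow`, nothing in the sketch depends on a 6x7 board.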

In [28]:
class State:
    def __init__(self, moves):
        self.to_move = 'R'
        self.utility = 0
        self.board = {}
        self.moves = moves

class ConnectFour:
    def __init__(self, nrow=6, ncol=7, nwin=4):
        self.nrow = nrow
        self.ncol = ncol
        self.nwin = nwin
        moves = [(nrow, col) for col in range(1, ncol + 1)]
        self.state = State(moves)
        self.expanded_states = 0

    def result(self, move, state):
        '''
        What is the hypothetical result of move `move` in state `state` ?
        move  = (row, col) tuple where player will put their mark (R or B)
        state = a `State` object, to represent whose turn it is and form
                the basis for generating a **hypothetical** updated state
                that will result from making the given `move`
        '''
        
        # your code goes here...
        if move not in state.moves: #if move cannot be made
            return state
        new_state = cp.deepcopy(state)
        new_state.utility = self.compute_utility(move, state)
        new_state.board[move] = state.to_move
        new_state.moves.remove(move) #move can no longer be made
        if move[0]-1>0: #if column is not full
            new_state.moves.append((move[0]-1, move[1])) #add move above most recent move
        if state.to_move=='R': #move player
            new_state.to_move = 'B'
        else:
            new_state.to_move = 'R'
        return new_state
    
#     def compute_utility(self, move, state):
#         '''
#         What is the utility of making move `move` in state `state`?
#         If 'R' wins with this move, return 1;
#         if 'B' wins return -1;
#         else return 0.
#         '''        
        
#         # your code goes here...
#         row, col = move
#         player = state.to_move
        
#         # create a hypothetical copy of the board, with 'move' executed
#         board = cp.deepcopy(state.board)
#         board[move] = player
        
#         #check for row-wise win
#         in_a_row = 0
#         for c in range(0,self.ncol+1):
#             if board.get((row,c))==player:
#                 in_a_row += 1
#                 if (in_a_row==self.nwin):
#                     if player=='R':
#                         #print('Player R wins with a row')
#                         return 1
#                     else:
#                         #print('Player B wins with a row')
#                         return -1            
#             else:
#                 in_a_row=0

#         # check for column-wise win
#         in_a_col = 0
#         for r in range(0,self.nrow+1):
#         #for r in range(0,self.nrow+1):
#             if board.get((r,col))==player:
#                 in_a_col += 1
#                 if (in_a_col==self.nwin):
#                     if player=='R':
#                         #print('Player R wins with a column')
#                         return 1
#                     else:
#                         #print('Player B wins with a column')
#                         return -1  
#             else:
#                 in_a_col=0

#         #check for NW->SE diagonal win
#         in_a_diag1 = 0
#         for r in range(row,0,-1):
#             if (board.get((r,col-(row-r)))==player):
#                 in_a_diag1 += 1
#                 if (in_a_diag1==self.nwin):
#                     if player=='R':
#                         #print('Player R wins with a diagonal')
#                         return 1
#                     else:
#                         #print('Player B wins with a diagonal')
#                         return -1  
#             else:
#                 break
                
#         for r in range(row+1,self.nrow+1):
#             if (board.get((r,col-(row-r)))==player):
#                 in_a_diag1 += 1
#                 if (in_a_diag1==self.nwin):
#                     if player=='R': 
#                         #print('Player R wins with a diagonal')
#                         return 1
#                     else:
#                         #print('Player B wins with a diagonal')
#                         return -1  
#             else:
#                 break

#         #check for SW->NE diagonal win
#         in_a_diag2 = 0
#         for r in range(row,0,-1):
#             if (board.get((r,col+(row-r)))==player):
#                 in_a_diag2 += 1
#                 if (in_a_diag2==self.nwin):
#                     if player=='R':
#                         #print('Player R wins with 4 in a diagonal')
#                         return 1
#                     else:
#                         #print('Player B wins with 4 in a diagonal')
#                         return -1  
#             else:
#                 break
#         for r in range(row+1,self.nrow+1):
#             if (board.get((r,col+(row-r)))==player):
#                 in_a_diag2 += 1
#                 if (in_a_diag2==self.nwin):
#                     if player=='R':
#                         #print('Player R wins with a diagonal')
#                         return 1
#                     else:
#                         #print('Player B wins with a diagonal')
#                         return -1  
#             else:
#                 break         
       
#         return 0

    def compute_utility(self, move, state):
        '''
        What is the utility of making move `move` in state `state`?
        If 'R' wins with this move, return nwin;
        if 'B' wins return -nwin;
        else return the maximum number of the player's pieces in a row through the given move
        (positive for 'R', negative for 'B').
        '''        
        
        # your code goes here...
        row, col = move
        player = state.to_move
        
        # create a hypothetical copy of the board, with 'move' executed
        board = cp.deepcopy(state.board)
        board[move] = player
        
        #get max number of players pieces in a row from move
        in_a_row = 0
        for c in range(col,self.ncol+1): 
            if board.get((row,c))==player:
                in_a_row += 1          
            else:
                break
        for c in range(col-1,col-self.nwin-1,-1):
            if board.get((row,c))==player:
                in_a_row += 1         
            else:
                break
        
        #get max number of players pieces in a column from move
        in_a_col = 0
        for r in range(row,self.nrow+1):
            if board.get((r,col))==player:
                in_a_col += 1
            else:
                break

        #get max number of players pieces in a NW->SE diagonal from move
        in_a_diag1 = 0
        for r in range(row,0,-1):
            if (board.get((r,col-(row-r)))==player):
                in_a_diag1 += 1
            else:
                break
                
        for r in range(row+1,self.nrow+1):
            if (board.get((r,col-(row-r)))==player):
                in_a_diag1 += 1
            else:
                break

        #get max number of players pieces in a SW->NE diagonal from move
        in_a_diag2 = 0
        for r in range(row,0,-1):
            if (board.get((r,col+(row-r)))==player):
                in_a_diag2 += 1
            else:
                break
        for r in range(row+1,self.nrow+1):
            if (board.get((r,col+(row-r)))==player):
                in_a_diag2 += 1
            else:
                break       
        
       
        # Cap the run length at nwin so that a longer line (e.g. five in a row)
        # still registers as a win in game_over, which tests for +/- nwin.
        longest = min(max(in_a_diag2, in_a_diag1, in_a_row, in_a_col), self.nwin)
        if player=='R': #if player is R return positive max
            return longest
        else: #if player is B return negative max
            return -longest

    def game_over(self, state):
        '''game is over if someone has won (utility is +nwin or -nwin) or there
        are no more moves left'''
        
        # your code goes here...  
        return state.utility==self.nwin or state.utility==-self.nwin or len(state.moves)==0 
        
    def utility(self, state, player):
        '''Return the value to player: state.utility for Red, negated for Black
        (+nwin is a win for that player, -nwin a loss).'''
        
        # your code goes here...
        if (player=='R'): #get utility relative to player
            return state.utility
        else:
            return -state.utility
        
    def display(self):
        board = self.state.board
        for row in range(1, self.nrow + 1):
            for col in range(1, self.ncol + 1):
                print(board.get((row, col), '.'), end=' ')
            print()

    def play_game(self, player1, player2):
        '''Play a game of Connect Four!'''
        
        # your code goes here...
        turn_limit = self.nrow*self.ncol  # limit in case of buggy code
        turn = 0
        while turn<=turn_limit:
            for player in [player1, player2]:
                turn += 1
                move = player(self)
                self.state = self.result(move, self.state)
                #self.display()
                if self.game_over(self.state):
                    #self.display()
                    #return self.state.utility, self #used for A* comparison, commented out otherwise
                    return self.state.utility
                

### (1b) Define a random player

Define a function `random_player` that takes a single argument of the `ConnectFour` class and returns a random move out of the available legal moves in the `state` of the `ConnectFour` game.

In your code for the `play_game` method above, make sure that `random_player` could be a viable input for the `player1` and/or `player2` arguments.

In [6]:
def random_player(game):
    '''A player that chooses a legal move at random out of all
    available legal moves in ConnectFour state argument'''
    
    # your code goes here...
    possible_actions = game.state.moves
    return possible_actions[np.random.randint(low=0, high=len(possible_actions))]
    


We know from experience and/or because I'm telling you right now that if two `random_player`s play many games of ConnectFour against one another, whoever goes first will win about 55% of the time. Verify that this is the case by playing at least 1,000 games between two random players. Report the proportion of the games that the first player has won. **(Chris: is this true for TicTacToe, or Connect Four?)**

**"Unit tests":** If you are wondering how close is close enough to 55%, I simulated 100 tournaments of 1,000 games each. The min-max range of win percentage by the first player was 52-59%.
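
As a rough sanity check on "how close is close enough", the sketch below computes the sampling spread expected for a 1,000-game tournament. It assumes the games are independent, takes the 55% figure quoted above as the true win rate, and uses the normal approximation to the binomial; none of this is part of the assignment code.

```python
import math

p = 0.55          # assumed long-run first-player win rate (figure quoted above)
n = 1000          # games per tournament

se = math.sqrt(p * (1 - p) / n)            # standard error of the sample proportion
low, high = p - 1.96 * se, p + 1.96 * se   # approximate 95% interval

print(f"standard error: {se:.4f}")
print(f"95% interval: {100 * low:.1f}% to {100 * high:.1f}%")
```

Under these assumptions the interval is roughly 52% to 58%, which is in line with the 52-59% min-max range reported for 100 simulated tournaments.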

In [7]:
# 10,000 games between two random players

# Your code here
wins = np.array([])
for i in range(10000): #play 10000 games
    c4 = ConnectFour()
    wins = np.append(wins, c4.play_game(random_player, random_player))

print('Percent of Player 1 wins: ' + str(100*(wins == 4).sum()/10000) + '%')
print('Percent of Player 2 wins: ' + str(100*(wins == -4).sum()/10000) + '%')
draws = 0 
for j in range(1, 4): #sum up all values not equal to a terminal state
    draws += (wins == j).sum()
    draws += (wins == -j).sum()
print('Percent of Draws: ' + str(100*draws/10000) + '%')

Percent of Player 1 wins: 55.06%
Percent of Player 2 wins: 44.1%
Percent of Draws: 0.75%


### (1c) What about playing randomly on different-sized boards?

What does the long-term win percentage appear to be for the first player in a 10x10 ConnectFour tournament, where 4 marks must be connected for a win?  Support your answer using a simulation and printed output, similar to **1b**.

**Also:** The win percentage should have changed substantially. Did the decrease in wins turn into more losses for the first player or more draws? Write a few sentences explaining the behavior you observed.  *Hint: think about how the size of the state space has changed.*

In [5]:
# 10,000 games between two random players on a 10x10 board

# Your code here
wins_ = np.array([])
for j in range(10000): #play 10000 games
    c4_ = ConnectFour(nrow=10,ncol=10,nwin=6)
    wins_ = np.append(wins_, c4_.play_game(random_player, random_player))

print('Percent of Player 1 wins: ' + str(100*(wins_ == 6).sum()/10000) + '%')
print('Percent of Player 2 wins: ' + str(100*(wins_ == -6).sum()/10000) + '%')
draws = 0 
for j in range(1, 6):
    draws += (wins_ == j).sum() #sum up all values not equal to a terminal state
    draws += (wins_ == -j).sum()
print('Percent of Draws: ' + str(100*draws/10000) + '%')

Percent of Player 1 wins: 44.22%
Percent of Player 2 wins: 42.86%
Percent of Draws: 12.71%


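Player 1's win rate dropped from about 55% to about 44% because this simulation requires 6 in a row (`nwin=6`) rather than 4. With two random players, the board is more likely to fill up than either player is to get 6 in a row, even on a 10x10 board. Player 2's win rate barely changed (about 44% to about 43%), so the lost Player 1 wins turned into draws (about 0.75% to about 12.7%) rather than losses.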

### (1d) Define an alpha-beta player

Alright. Let's finally get serious about our Connect Four game.  No more fooling around!

Craft a function called `alphabeta_player` that takes a single argument of a `ConnectFour` class object and returns the minimax move in the `state` of the `ConnectFour` game. As the name implies, this player should be implementing alpha-beta pruning as described in the textbook and lecture.

Note that your alpha-beta search for the minimax move should include function definitions for `max_value` and `min_value` (see the aggressively realistic pseudocode from the lecture slides).

In your code for the `play_game` method above, make sure that `alphabeta_player` could be a viable input for the `player1` and/or `player2` arguments.

In [33]:
node_counter = 0 #counter for node exploration in 1e

def alphabeta_player(game): #alpha beta player
    return alpha_beta_search(game.state, game)

def alpha_beta_search(state, game):
    """
    ---- Implemented from the CSCI 3202 AIMA GitHub games.py [2]---
    Search game to determine best action; use alpha-beta pruning.
    As in [Figure 5.7], this version searches all the way to the leaves."""

    player = state.to_move

    # Functions used by alpha_beta
    def max_value(state, alpha, beta):
        global node_counter 
        node_counter+=1 #increment expanded node counter
        if game.game_over(state): #if terminal state is found
            return game.utility(state, player) #return game utility relative to player
        v = -np.inf
        for a in state.moves:
            v = max(v, min_value(game.result(a, state), alpha, beta)) #get max of value and min_value recursive call
            if v >= beta: #pruning - commented out when running minimax
                return v
            alpha = max(alpha, v)
        return v

    def min_value(state, alpha, beta):
        global node_counter 
        node_counter+=1 #increment expanded node counter
        if game.game_over(state): 
            return game.utility(state, player)
        v = +np.inf
        for a in state.moves:
            v = min(v, max_value(game.result(a, state), alpha, beta)) #get min of value and max_value recursive call
            if v <= alpha: #pruning - commented out when running minimax
                return v
            beta = min(beta, v)
        return v

    # Body of alpha_beta_search:
    best_score = -np.inf #acts as alpha
    beta = +np.inf
    best_action = None
    for a in state.moves:
        v = min_value(game.result(a, state), best_score, beta) 
        if v > best_score: #if returned v value greater than current best_score value
            best_score = v #set best_score to new v
            best_action = a #set best_action as action associated with new best_score
    return best_action #return action that has utility of value to max
    

Verify that your alpha-beta player code is working appropriately through the following tests, using a standard 6x7 ConnectFour board. Run **10 games for each test**, and track the number of wins, draws and losses. Report these results for each case.

1. An alpha-beta player who plays first should never lose to a random player who plays second.
2. Two alpha-beta players should always draw. One player is the max player and the other player is the min player.

**Nota bene:** Test your code with fewer games between the players to start, because the alpha-beta player will require substantially more compute time than the random player.  This is why I only ask for 10 games, which still might take a minute or two. You are welcome to run more than 10 tests if you'd like. 

In [32]:
# Your code here
wins_1 = np.array([])
wins_2 = np.array([])
wins_3 = np.array([])
for k in range(100):
    c4_1 = ConnectFour(3,4,3) #3x4 board with 3 to win 
    wins_1 = np.append(wins_1, c4_1.play_game(alphabeta_player, random_player)) #alpha beta player first
    c4_2 = ConnectFour(3,4,3)
    wins_2 = np.append(wins_2, c4_2.play_game(random_player, alphabeta_player)) #alpha beta player second
    c4_3 = ConnectFour(3,3,3) #3x3 board with 3 to win
    wins_3 = np.append(wins_3, c4_3.play_game(alphabeta_player, alphabeta_player)) #2 alpha beta players draw

def display_stats(nwins, trials):
    print('Alpha Beta v Random Player')
    print('Percent of Player 1 wins: ' + str(100*(wins_1 == nwins).sum()/trials) + '%')
    print('Percent of Player 2 wins: ' + str(100*(wins_1 == -nwins).sum()/trials) + '%')
    draws = 0 
    for j in range(1, nwins): #sum up all values not equal to terminal state
        draws += (wins_1 == j).sum()
        draws += (wins_1 == -j).sum()
    print('Percent of Draws: ' + str(100*draws/trials) + '%')

    print('Random Player v Alpha Beta')
    print('Percent of Player 1 wins: ' + str(100*(wins_2 == nwins).sum()/trials) + '%')
    print('Percent of Player 2 wins: ' + str(100*(wins_2 == -nwins).sum()/trials) + '%')
    draws = 0 
    for j in range(1, nwins):
        draws += (wins_2 == j).sum()
        draws += (wins_2 == -j).sum()
    print('Percent of Draws: ' + str(100*draws/trials) + '%')
    
    print('Two Alpha-Beta players 3x3 Board')
    print('Percent of Player 1 wins: ' + str(100*(wins_3 == nwins).sum()/trials) + '%')
    print('Percent of Player 2 wins: ' + str(100*(wins_3 == -nwins).sum()/trials) + '%')
    draws = 0 
    for j in range(1, nwins):
        draws += (wins_3 == j).sum()
        draws += (wins_3 == -j).sum()
    print('Percent of Draws: ' + str(100*draws/trials) + '%')
    
display_stats(3, 100)

Alpha Beta v Random Player
Percent of Player 1 wins: 100.0%
Percent of Player 2 wins: 0.0%
Percent of Draws: 0.0%
Random Player v Alpha Beta
Percent of Player 1 wins: 24.0%
Percent of Player 2 wins: 74.0%
Percent of Draws: 2.0%
Two Alpha-Beta players 3x3 Board
Percent of Player 1 wins: 0.0%
Percent of Player 2 wins: 0.0%
Percent of Draws: 100.0%


### (1e) What has pruning ever done for us?

Calculate the number of states expanded by the minimax algorithm, **with and without pruning**, to determine the minimax decision from the initial 6x7 ConnectFour board state.  This can be done in many ways, but writing out all the states by hand is **not** one of them (as you will find out!).

Then compute the percent savings that you get by using alpha-beta pruning, i.e., compute $100 \times \left(1 - \frac{\text{Number of nodes expanded with pruning}}{\text{Number of nodes expanded with minimax}}\right)$.

Write a sentence or two, commenting on the difference in number of nodes expanded by each search.

In [14]:
node_counter=0
ConnectFour(3,4,3).play_game(alphabeta_player, random_player)
ab_counter = node_counter
print('Nodes expanded with pruning: ' + str(ab_counter))

Nodes expanded with pruning: 5663


In [12]:
node_counter=0 #reset counter; the pruning checks in max_value/min_value are commented out for this run
ConnectFour(3,4,3).play_game(alphabeta_player, random_player)
mini_counter = node_counter
print('Nodes expanded with minimax: ' + str(mini_counter))

Nodes expanded with minimax: 308204


In [15]:
print('Percent savings through alpha-beta pruning: ' + str(100-(ab_counter/mini_counter)*100) + '%')

Percent savings through alpha-beta pruning: 98.16258062841494%


After multiple runs, the alpha-beta player expands roughly 5,500 nodes while the minimax player expands roughly 300,000 on a 3x4 board. The minimax player expands a substantially larger number of nodes than the alpha-beta player, yet both players should return the same move. Pruning therefore saves around 98% of the node expansions.
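
The same effect can be seen on a much smaller scale. The sketch below is a self-contained toy, not the Connect Four code: the three-branch tree and the `search` function are assumptions made for illustration. It runs plain minimax and alpha-beta on the same tree and counts the nodes each one visits.

```python
import math

# Toy game tree (an assumption for illustration): nested lists are internal
# nodes, integers are leaf utilities for the maximizing player.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

def search(node, maximizing, alpha, beta, prune, counter):
    """Minimax value of `node`; applies alpha-beta cutoffs only if `prune`."""
    counter[0] += 1                       # count every node visited
    if isinstance(node, int):             # leaf: return its utility
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = search(child, not maximizing, alpha, beta, prune, counter)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if prune and alpha >= beta:       # remaining siblings cannot change the result
            break
    return best

results = {}
for prune in (False, True):
    counter = [0]
    value = search(TREE, True, -math.inf, math.inf, prune, counter)
    results[prune] = (value, counter[0])
    print(f"prune={prune}: value={value}, nodes visited={counter[0]}")
```

Both runs return the same root value; only the visit count differs. The savings on a tree this small are modest, and they grow with depth and branching factor, which is consistent with the much larger gap measured above.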

### (2) A* Search

In Part II of this project, you need to implement a player strategy to employ A* Search in order to win at ConnectFour. To test your A* player, play 10 games against the random player and 10 games against the AB pruning player. 

Write an explanation of this strategy's implementation and performance in comparison to the random player and the AB Pruning player from **1d**.

A lot of the code that you wrote for the minimax player and the Connect Four game structure can be reused for the A* player. However, you will need to write a new utility function for A* that considers the path cost and heuristic cost.
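
For reference, textbook A* orders its frontier by $f(n) = g(n) + h(n)$, where $g$ is the path cost so far and $h$ is the heuristic estimate of the remaining cost. The sketch below shows that scoring on a toy single-agent path problem; the function `a_star` and the number-line example are assumptions for illustration, and how to map the idea onto a two-player game is left open by the assignment.

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Generic A*: expand nodes in order of f(n) = g(n) + h(n); return path cost."""
    frontier = [(h(start), 0, start)]      # heap entries are (f, g, node)
    best_g = {start: 0}                    # cheapest known cost to reach each node
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:
            return g
        if g > best_g.get(node, float('inf')):
            continue                       # stale entry: a cheaper path was found later
        for nxt, cost in neighbors(node):
            g2 = g + cost
            if g2 < best_g.get(nxt, float('inf')):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt))
    return None                            # goal unreachable

# Toy problem: walk along the integers from 0 to 5, one unit step at a time.
goal = 5
cost = a_star(0, goal,
              neighbors=lambda n: [(n + 1, 1), (n - 1, 1)],
              h=lambda n: abs(goal - n))   # admissible: never overestimates
print(cost)
```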


### (2a) Define a heuristic function
Your A* player will need to use a heuristic function. You have two options for heuristics: research an existing heuristic for Connect Four, or games similar to Connect Four, and use that. Or, design your own heuristic. Write a one-paragraph description of the heuristic you're using, including a citation if you used an existing heuristic.

For the A* algorithm, I chose to use a combination of existing Connect Four heuristics.  First, I altered the Connect Four class's utility function to return the maximum number of a player's pieces in a row resulting from a single move.  This utility function is adapted from one proposed in *Research on Different Heuristics for Minimax Algorithm Insight from Connect-4 Game* [1].  Since this new utility function allows non-terminal states to provide information to the algorithm, I was able to implement a heuristic cutoff from the class GitHub [2] so that A* only looks the necessary number of moves ahead and can be run on larger boards.  I also used the new utility function to find which next move would result in the largest utility and recurse down that move first.  Lastly, I implemented a heuristic mentioned in class that returns the action with the largest end utility and lowest recursion depth, that is, the move that results in a win in the fewest moves.
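
The "longest run through a move" idea can be sketched compactly. The function below is an illustrative rewrite, not the `compute_utility` method used in this notebook: the name `longest_run` is hypothetical, and it assumes the piece at `move` is already on the board, which is stored as a dict keyed by 1-indexed `(row, col)` tuples.

```python
def longest_run(board, move, player):
    """Longest line of `player` pieces passing through `move` on `board`."""
    row, col = move
    best = 0
    # One (d_row, d_col) step per line: horizontal, vertical, and both diagonals.
    for d_row, d_col in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        count = 1                                    # the piece at `move` itself
        for sign in (1, -1):                         # walk both ways along the line
            r, c = row + sign * d_row, col + sign * d_col
            while board.get((r, c)) == player:       # off-board cells return None
                count += 1
                r, c = r + sign * d_row, c + sign * d_col
        best = max(best, count)
    return best

# Example position on a 6-row board (row 6 is the bottom row).
board = {(6, 1): 'R', (6, 2): 'R', (5, 2): 'R', (6, 3): 'B'}
run_before = longest_run(board, (6, 2), 'R')   # two in a row and two in a column
board[(4, 2)] = 'R'                            # Red stacks a third piece in column 2
run_after = longest_run(board, (4, 2), 'R')
print(run_before, run_after)
```

A value equal to `nwin` marks a win, and smaller values rank non-terminal positions, which is what makes a depth cutoff usable.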

### (2b) Compare A* to other algorithms
Next, play 10 games of Connect Four using your A* player and a random player and 10 games against the AB pruning player. In four or five paragraphs, report on the outcome. Did one player win more than the other? How often was the game a draw? How many moves did each player make? Were there situations where one player appeared to do better than the other? Given the outcome, are there other heuristics you would like to implement?

In [34]:
def cutoff_depth(d): #cutoff_depth function to stop recursion at certain depth
    ''' Taken from CSCI 3202 AIMA GitHub Repo games4e.ipynb [2]'''
    return lambda game, state, depth: depth > d

def astar_player(game, cutoff=cutoff_depth(6)):
    state = game.state
    player = state.to_move
    
    def max_value(state, alpha, beta, cut_depth): 
        if game.game_over(state) or cutoff(game, state, cut_depth): #if at terminal state or cut off depth
            return game.utility(state, player), 0 #return game utility and start reverse depth counter
        v = -np.inf
        results = {}
        for a in state.moves: 
            r = game.result(a,state)
            results.update({r: r.utility})
        sorted_results= sorted(results.items(), key=lambda x:x[1], reverse=True) #sort actions based on utilities
        #explore best actions first
        for r,_ in sorted_results: #for each result 
            v2, depth = min_value(r, alpha, beta, cut_depth+1)
            depth+=1
            if v2 > v:
                v=v2
                alpha = max(alpha, v)
            if v >= beta: #prune
                return v, depth
            
        return v, depth

    def min_value(state, alpha, beta, cut_depth):
        if game.game_over(state) or cutoff(game, state, cut_depth): #if at terminal state or cut off depth
            return game.utility(state, player), 0  #return game utility and start reverse depth counter
        v, move = +np.inf, None
        results = {}
        for a in state.moves: 
            r = game.result(a,state)
            results.update({r: r.utility})
        sorted_results=sorted(results.items(), key=lambda x:x[1], reverse=True) #sort actions based on utilities
        for r,_ in sorted_results:
            v2, depth = max_value(r, alpha, beta, cut_depth+1)
            depth+=1
            if v2 < v:
                v=v2
                beta = min(beta, v)
            if v <= alpha: #prune
                return v, depth
        return v, depth

    # Body of minmax_decision:
    best_score = -np.inf
    beta = +np.inf
    best_action = None
    moves={}
    for a in state.moves:
        v, depth = min_value(game.result(a, state), best_score, beta, 0)
        if v > best_score:
            best_score = v
            best_action = a
            moves.update({(best_action, depth): v}) #keep track of moves, their depth, and their returned utility value
    max_moves = [key for key, value in moves.items() if value == max(moves.values())] #keep only the moves with the best utility value
    sorted_moves= sorted(max_moves, key=lambda t: t[1]) #sort moves based on depth
    return sorted_moves[0][0] #return action w lowest depth


In [22]:
wins_r1 = np.array([])
wins_r2 = np.array([])
for k in range(10): #play 10 games
    #6x7 board with 4 to win
    c4_4 = ConnectFour().play_game(astar_player, random_player) #astar is player 1
    wins_r1 = np.append(wins_r1, c4_4[0])
    c4_5 = ConnectFour().play_game(random_player, astar_player) #astar is player 2
    wins_r2 = np.append(wins_r2, c4_5[0])
    if k==0: #display first final game board for both types of games
        print('Example of A* v Random Player')
        c4_4[1].display()
        print('Example of Random Player v A*')
        c4_5[1].display()
        print('--')
        

def display_stats(nwins, trials):
    print('A* v Random Player')
    print('Percent of Player 1 wins: ' + str(100*(wins_r1 == nwins).sum()/trials) + '%')
    print('Percent of Player 2 wins: ' + str(100*(wins_r1 == -nwins).sum()/trials) + '%')
    draws = 0 
    for j in range(1, nwins): #sum up all values that are not terminal states
        draws += (wins_r1 == j).sum()
        draws += (wins_r1 == -j).sum()
    print('Percent of Draws: ' + str(100*draws/trials) + '%')
    
    print('Random Player v A*')
    print('Percent of Player 1 wins: ' + str(100*(wins_r2 == nwins).sum()/trials) + '%')
    print('Percent of Player 2 wins: ' + str(100*(wins_r2 == -nwins).sum()/trials) + '%')
    draws = 0 
    for j in range(1, nwins):
        draws += (wins_r2 == j).sum()
        draws += (wins_r2 == -j).sum()
    print('Percent of Draws: ' + str(100*draws/trials) + '%')

display_stats(4, 10)

Example of A* v Random Player
. . . . . . . 
. . . . . . . 
. . . . . . . 
. . . . . . . 
. . . . B . . 
R R R R B B . 
Example of Random Player v A*
. . . . . . . 
. . . . . . . 
. . . B . . . 
. . B R . . R 
. B R R B . R 
B R R B B B R 
--
A* v Random Player
Percent of Player 1 wins: 100.0%
Percent of Player 2 wins: 0.0%
Percent of Draws: 0.0%
Random Player v A*
Percent of Player 1 wins: 0.0%
Percent of Player 2 wins: 100.0%
Percent of Draws: 0.0%


In [23]:
wins_ab1 = np.array([])
wins_ab2 = np.array([])
for k in range(10): #play 10 games
    #3x4 board with 3 to win 
    c4_6 = ConnectFour(3,4,3).play_game(astar_player, alphabeta_player) #astar is player 1
    wins_ab1 = np.append(wins_ab1, c4_6[0])
    c4_7 = ConnectFour(3,4,3).play_game(alphabeta_player, astar_player) #astar is player 2
    wins_ab2 = np.append(wins_ab2, c4_7[0])
    if k==0: #display first final game board for both types of games
        print('Example of A* v Alpha-Beta')
        c4_6[1].display()
        print('Example of Alpha-Beta v A*')
        c4_7[1].display()
        print('--')
        

def display_stats(nwins, trials):
    print('A* v Alpha-Beta')
    print('Percent of Player 1 wins: ' + str(100*(wins_ab1 == nwins).sum()/trials) + '%')
    print('Percent of Player 2 wins: ' + str(100*(wins_ab1 == -nwins).sum()/trials) + '%')
    draws = 0 
    for j in range(1, nwins): #sum up all values that are not terminal states
        draws += (wins_ab1 == j).sum()
        draws += (wins_ab1 == -j).sum()
    print('Percent of Draws: ' + str(100*draws/trials) + '%')
    
    print('Alpha-Beta v A*')
    print('Percent of Player 1 wins: ' + str(100*(wins_ab2 == nwins).sum()/trials) + '%')
    print('Percent of Player 2 wins: ' + str(100*(wins_ab2 == -nwins).sum()/trials) + '%')
    draws = 0 
    for j in range(1, nwins):
        draws += (wins_ab2 == j).sum()
        draws += (wins_ab2 == -j).sum()
    print('Percent of Draws: ' + str(100*draws/trials) + '%')
    
display_stats(3, 10)

Example of A* v Alpha-Beta
R . . . 
R B . . 
R B B R 
Example of Alpha-Beta v A*
R . . . 
R . B . 
R B B R 
--
A* v Alpha-Beta
Percent of Player 1 wins: 100.0%
Percent of Player 2 wins: 0.0%
Percent of Draws: 0.0%
Alpha-Beta v A*
Percent of Player 1 wins: 100.0%
Percent of Player 2 wins: 0.0%
Percent of Draws: 0.0%


To evaluate A*, I ran 10 games on a 6x7 board (4 in a row to win) with A* as player 1 and 10 games with A* as player 2, each against a random player. I also ran 10 games in each order on a 3x4 board (3 in a row to win) against an alpha-beta player.
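
The win and draw percentages for any of these match-ups can be computed by one helper instead of a per-cell `display_stats`. The following is a minimal sketch, assuming (as in the cells above) that a final utility of `+nwins` means player 1 won and `-nwins` means player 2 won. `summarize_outcomes` is a hypothetical name, and unlike the loops above it counts every other value, including 0, as a draw.

```python
import numpy as np

def summarize_outcomes(utilities, nwins):
    """Return (player 1 win %, player 2 win %, draw %) for final utilities.

    Assumption: +nwins marks a player 1 win and -nwins a player 2 win;
    any other final utility is treated as a draw.
    """
    utilities = np.asarray(utilities)
    trials = utilities.size
    if trials == 0:
        raise ValueError("need at least one game result")
    p1 = 100 * np.count_nonzero(utilities == nwins) / trials
    p2 = 100 * np.count_nonzero(utilities == -nwins) / trials
    return p1, p2, 100 - p1 - p2
```

With the arrays from the cells above, this could be called as `summarize_outcomes(wins_r1, 4)` or `summarize_outcomes(wins_ab1, 3)`.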

Against a random player, A* is clearly the stronger player. As player 1 it won all 10 games. As player 2 it gives up the first-move advantage, but it also won all 10 games here and should rarely lose. However, the final game boards where A* moved second were fuller than those where it moved first, which shows that although A* still beats a random player as player 2, it needs substantially more moves to do so.
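
How full a final board is can be made concrete by counting the pieces in the printed boards. This is a minimal sketch that assumes the space-separated text format shown in the example output above; `count_pieces` is a hypothetical helper, not part of the game class.

```python
def count_pieces(board_text, empty="."):
    """Count occupied cells in a board printed as space-separated symbols."""
    return sum(1 for cell in board_text.split() if cell != empty)

# Final boards copied from the example output above.
astar_first = """
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . B . .
R R R R B B .
"""
astar_second = """
. . . . . . .
. . . . . . .
. . . B . . .
. . B R . . R
. B R R B . R
B R R B B B R
"""

print(count_pieces(astar_first), count_pieces(astar_second))  # 7 16
```

For these two example games, A* needed 7 total pieces on the board to win as player 1 and 16 to win as player 2.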

Against an alpha-beta player, whichever player moved first won all 10 games, regardless of algorithm. Comparing the final game boards from the two orderings, I found it interesting that although the boards differed, the winning 3 pieces were in the same position and each game took the same number of moves. I think this might be because the two algorithms are implemented similarly. Playing two alpha-beta players against each other on a 3x4 board also results in player 1 winning 100% of the time.
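
One way to check whether the first-player advantage comes from the 3x4, 3-to-win game itself rather than from either algorithm is to solve the game exhaustively. The sketch below is a plain memoized minimax, not the notebook's `alphabeta_player`. It assumes the standard drop-to-the-lowest-empty-cell rule, and the board encoding and the names `winner`, `drop`, and `value` are all hypothetical.

```python
from functools import lru_cache

ROWS, COLS, K = 3, 4, 3          # board size and run length from the cell above
EMPTY = (0,) * (ROWS * COLS)     # row-major tuple; row 0 is the top row

def winner(board):
    """Return 1 or 2 if that player has K in a row, otherwise 0."""
    for r in range(ROWS):
        for c in range(COLS):
            p = board[r * COLS + c]
            if p == 0:
                continue
            # Check right, down, and both downward diagonals from (r, c).
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_r, end_c = r + (K - 1) * dr, c + (K - 1) * dc
                if not (0 <= end_r < ROWS and 0 <= end_c < COLS):
                    continue
                if all(board[(r + i * dr) * COLS + (c + i * dc)] == p
                       for i in range(K)):
                    return p
    return 0

def drop(board, col, player):
    """Return the board after `player` drops a piece into column `col`."""
    for r in range(ROWS - 1, -1, -1):  # lowest empty row first
        idx = r * COLS + col
        if board[idx] == 0:
            return board[:idx] + (player,) + board[idx + 1:]
    raise ValueError("column is full")

@lru_cache(maxsize=None)
def value(board, player):
    """Minimax value for player 1: +1 forced win, 0 draw, -1 forced loss."""
    w = winner(board)
    if w:
        return 1 if w == 1 else -1
    open_cols = [c for c in range(COLS) if board[c] == 0]  # top cell empty
    if not open_cols:
        return 0  # full board with no winner is a draw
    results = [value(drop(board, c, player), 3 - player) for c in open_cols]
    return max(results) if player == 1 else min(results)
```

Calling `value(EMPTY, 1)` returns +1 if player 1 can force a win from the empty board, 0 if best play leads to a draw, and -1 if player 2 can force a win. A result of +1 would mean no player 2 strategy can avoid losing on this board.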

Given this outcome, I think that implementing other heuristics could produce an A* player that wins 100% of the time against an alpha-beta player, regardless of which player moves first. One candidate to add to my A* implementation is Monte Carlo tree search, which combines the minimax algorithm with an expected-outcome model built from random game playthroughs.
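
The expected-outcome idea can be sketched as follows: score a candidate move by averaging the results of random playthroughs that start after it. This is flat Monte Carlo evaluation only, not a full tree search. The callables `legal_moves`, `apply_move`, and `outcome` are hypothetical stand-ins for whatever the game class provides.

```python
import random

def random_playout(state, legal_moves, apply_move, outcome, rng):
    """Play uniformly random moves from `state` until the game ends.

    Assumed interface: `legal_moves(state)` returns a non-empty list while
    the game is running, `apply_move(state, move)` returns the next state,
    and `outcome(state)` returns None while the game is running or a
    numeric result from player 1's point of view once it is over.
    """
    result = outcome(state)
    while result is None:
        state = apply_move(state, rng.choice(legal_moves(state)))
        result = outcome(state)
    return result

def expected_outcome(state, move, legal_moves, apply_move, outcome,
                     n_playouts=100, seed=0):
    """Average result of `n_playouts` random games played after `move`."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    start = apply_move(state, move)
    total = sum(random_playout(start, legal_moves, apply_move, outcome, rng)
                for _ in range(n_playouts))
    return total / n_playouts
```

A player built on this would call `expected_outcome` once per legal move and pick the move with the highest average for player 1 or the lowest for player 2.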

[1] Kang, Xiyu, et al. “Research on Different Heuristics for Minimax Algorithm Insight from Connect-4 Game.” *Journal of Intelligent Learning Systems and Applications*, vol. 11, no. 02, 2019, pp. 15–31, https://doi.org/10.4236/jilsa.2019.112002.

[2] AIMA (2019) aima-python [Source code]. https://github.com/aimacode/aima-python