# Assignment 4: Negamax with Alpha-Beta Pruning and Iterative Deepening

Prashant Kumar Thakur

# Table of Contents
* [Assignment 4: Negamax with Alpha-Beta Pruning and Iterative Deepening](#Assignment-4:-Negamax-with-Alpha-Beta-Pruning-and-Iterative-Deepening)
	* [Initial Code](#Initial-Code)
	* [Add moves counter](#Add-moves-counter)
	* [negamaxIDS](#negamaxIDS)
	* [negamaxIDSab](#negamaxIDSab)
	* [Grading](#Grading)
	* [Extra Credit](#Extra-Credit)


Reference: The code provided by the professor was used directly in this notebook.

For this assignment, I investigated the advantages of alpha-beta
pruning applied to Tic-Tac-Toe. The steps are as follows:

## Negamax implementation 

Negamax is a search algorithm that looks for the move leading to the best achievable outcome for the player whose turn it is. It applies to zero-sum games, where the utility values of the two players sum to zero. In Tic-Tac-Toe a loss is scored as -1, a draw as 0, and a win as 1, so one player's gain is exactly the other's loss and negamax can be used. If the game were not zero-sum, this negation shortcut would no longer be valid and a more general formulation, such as minimax with explicit values for each player, would have to be considered. The function takes a depth limit and searches for the move that maximizes the current player's value. Each recursive call evaluates the position from the opponent's point of view, so the returned value is negated before it is compared with the best value so far. The recursion continues until the depth limit is exhausted or a terminal state is reached.

In [7]:
def negamax(game, depthLeft):
    # If at terminal state or depth limit, return utility value and move None
    if game.isOver() or depthLeft == 0:
        return game.getUtility(), None # call to negamax knows the move
    # Find best move and its value from current state
    bestValue, bestMove = None, None
    for move in game.getMoves():
        # Apply a move to current state
        game.makeMove(move)
        # Use depth-first search to find eventual utility value and back it up.
        #  Negate it because it will come back in context of next player
        value, _ = negamax(game, depthLeft-1)
        # Remove the move from current state, to prepare for trying a different move
        game.unmakeMove(move)
        if value is None:
            continue
        value = - value
        if bestValue is None or value > bestValue:
            # Value for this move is better than moves tried so far from this state.
            bestValue, bestMove = value, move
    return bestValue, bestMove
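
To see the negation at work without the game class, here is a minimal sketch on a hand-built tree. The nested-list representation and the name `negamaxTree` are assumptions made for illustration only; they are not part of the assignment code. A leaf holds the utility for the player who would move at that leaf.

```python
def negamaxTree(node):
    """Return the negamax value of a nested-list game tree.

    A leaf is a number: the utility for the player to move at that leaf.
    An inner node is a list of child nodes.
    """
    if not isinstance(node, list):
        return node
    # Each child's value is from the opponent's point of view, so negate it.
    return max(-negamaxTree(child) for child in node)


# The root player has two moves; after each, the opponent has two replies.
tree = [[1, -1], [0, 1]]
rootValue = negamaxTree(tree)
print(rootValue)  # 0: the first move can be answered with a loss, the second holds a draw
```

The same result follows from ordinary minimax reasoning: the opponent picks the minimum of each pair (-1 and 0), and the root player takes the larger of the two.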

## NegamaxIDS implementation

`negamaxIDS` performs iterative deepening: it calls `negamax` repeatedly with depth limits 0, 1, ..., `depthLeft`. The loop uses `range(depthLeft+1)` so that the final limit is included; for instance, with a limit of 9 the search proceeds to the 9th level instead of stopping at the 8th. Additionally, a method `getWinningValue()` was added to the game class to keep the search independent of the game. It simply returns the utility value assigned to a win, which is `1` for this Tic-Tac-Toe implementation, so a move with a value of 1 is a winning move. As soon as a winning move is found, its value and the move are returned. Otherwise the best value and move found so far are stored in `bestValue` and `bestMove`, and these are returned once the depth limit is exhausted without finding a win.

In [8]:
def negamaxIDS(game, depthLeft):
    # Initialize an empty best moves and score value.
    bestValue, bestMove = None, None
    for depth in range(depthLeft+1):
        value,move = negamax(game,depth)
        if game.getWinningValue() == value:
            return value, move # Winning step found so break out of further search.
        # Return the most possible value that can be achieved
        if bestValue is None or bestValue < value:
            bestValue = value
            bestMove = move
    return bestValue, bestMove
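
The early exit can be illustrated on a toy tree. This is only a sketch under assumptions: the nested-list tree and the names `depthLimitedValue` and `firstWinningDepth` are hypothetical, and branches that are cut off by the depth limit are ignored in the same way the `if value is None: continue` check does above.

```python
def depthLimitedValue(node, depthLeft):
    """Negamax value of a nested-list tree, or None if nothing was resolved."""
    if not isinstance(node, list):
        return node                      # terminal: utility for the player to move
    if depthLeft == 0:
        return None                      # cut off before reaching a terminal
    best = None
    for child in node:
        value = depthLimitedValue(child, depthLeft - 1)
        if value is None:
            continue                     # this branch was cut off; ignore it
        if best is None or -value > best:
            best = -value
    return best


def firstWinningDepth(root, depthLimit, winningValue=1):
    """Deepen one level at a time; stop as soon as a win is visible."""
    for depth in range(depthLimit + 1):
        if depthLimitedValue(root, depth) == winningValue:
            return depth
    return None


# An immediate win (leaf -1 for the opponent) sits beside a longer line of play.
tree = [[[0, [1]]], -1]
print(firstWinningDepth(tree, 5))  # 1: the deeper levels are never searched
```

Because the loop returns at the shallowest depth that shows a win, the move it reports is also the quickest win, which is why `negamaxIDS` finishes the game above in fewer moves than plain `negamax`.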

## NegamaxAB implementation

This function reuses most of the `negamax` function defined above and adds alpha-beta pruning, which skips branches of the game tree that cannot change the final decision. `alpha` is the best value the current player is already guaranteed and starts at -infinity; `beta` is the best value the opponent will allow, that is, the bound we are trying to beat, and starts at +infinity. Both bounds tighten as moves are evaluated. The recursive call evaluates the opponent's turn, so the bounds are swapped and negated (`-beta, -alpha`), because the opponent maximizes from their own point of view. `bestValue` is tracked so that the search still returns the best available move when no win exists. If `bestValue` reaches or exceeds `beta`, the opponent already has a better alternative earlier in the tree and will never allow this position, so the remaining moves are skipped.

In [9]:
def negamaxAB(game, depthLeft, alpha=-float('inf'), beta=float('inf')):
    # If at terminal state or depth limit, return utility value and move None
    if game.isOver() or depthLeft == 0:
        return game.getUtility(), None
    # Find best move and its value from current state
    bestValue, bestMove = None, None
    for move in game.getMoves():
        # Apply a move to current state
        game.makeMove(move)
        # Use depth-first search to find eventual utility value and back it up.
        #  Negate it because it will come back in context of next player
        value, _ = negamaxAB(game, depthLeft-1, -beta, -alpha)
        # Remove the move from current state, to prepare for trying a different move
        game.unmakeMove(move)
        if value is None:
            continue
        value = - value
        if bestValue is None or value > bestValue:
            bestValue = value
            bestMove = move
        alpha = max(bestValue, alpha)
        if beta <= bestValue:
            break
    return bestValue, bestMove
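
The effect of the cutoff can be checked by counting visited nodes on a small tree with and without the `break`. This is a sketch under assumptions: the nested-list tree, the `prune` flag, and the name `countVisited` are hypothetical and exist only for this comparison.

```python
def countVisited(node, alpha, beta, prune, counter):
    """Negamax over a nested-list tree; counter[0] counts visited nodes."""
    counter[0] += 1
    if not isinstance(node, list):
        return node                      # leaf: utility for the player to move
    best = -float('inf')
    for child in node:
        # Swap and negate the bounds for the opponent, then negate the result.
        value = -countVisited(child, -beta, -alpha, prune, counter)
        best = max(best, value)
        alpha = max(alpha, best)
        if prune and best >= beta:
            break                        # the opponent will never allow this line
    return best


tree = [[0, 1], [-1, 1]]
results = {}
for prune in (False, True):
    counter = [0]
    value = countVisited(tree, -float('inf'), float('inf'), prune, counter)
    results[prune] = (value, counter[0])
print(results)  # {False: (0, 7), True: (0, 6)}
```

Both runs return the same value; pruning only skips the last leaf, because the second move is already refuted by the opponent's first reply. How much is saved depends on move order, so the counts for Tic-Tac-Toe will differ from this toy case.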

## negamaxIDSab implementation

The function `negamaxIDSab` is the iterative-deepening wrapper around `negamaxAB`. It is identical to `negamaxIDS` except that each depth-limited search uses alpha-beta pruning.

In [10]:
def negamaxIDSab(game, depthLeft):
    bestValue, bestMove = None, None
    for depth in range(depthLeft+1):
        value,move = negamaxAB(game,depth)
        if game.getWinningValue() == value:
            return value, move # Winning step found so break out of further search.
        # Return the most possible value that can be achieved
        if bestValue is None or bestValue < value:
            bestValue = value
            bestMove = move
    return bestValue, bestMove

## EBF implementation

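The `ebf` function is taken directly from the professor's solution. It computes the effective branching factor for a given number of nodes and depth, that is, the value b for which a uniform tree of that depth, with 1 + b + ... + b^depth nodes, contains the given number of nodes.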

In [11]:
def ebf(nNodes, depth, precision=0.01):
    if nNodes == 0:
        return 0

    def ebfRec(low, high):
        mid = (low + high) * 0.5
        if mid == 1:
            estimate = 1 + depth
        else:
            estimate = (1 - mid**(depth + 1)) / (1 - mid)
        if abs(estimate - nNodes) < precision:
            return mid
        if estimate > nNodes:
            return ebfRec(low, mid)
        else:
            return ebfRec(mid, high)

    return ebfRec(1, nNodes)
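
As a sanity check on what the function computes, the sketch below builds the node count of a uniform tree and recovers the branching factor by bisection. It is an iterative variant written for illustration, not the professor's code; the names `nodesInUniformTree` and `ebfBisect` and the `maxSteps` guard are assumptions.

```python
def nodesInUniformTree(b, depth):
    """Number of nodes 1 + b + ... + b**depth in a uniform tree."""
    return sum(b ** level for level in range(depth + 1))


def ebfBisect(nNodes, depth, precision=0.01, maxSteps=200):
    """Bisect on b until the uniform-tree node count is close to nNodes."""
    if nNodes == 0:
        return 0
    low, high = 1.0, float(nNodes)
    mid = low
    for _ in range(maxSteps):            # guard against inputs with no solution
        mid = (low + high) / 2
        estimate = nodesInUniformTree(mid, depth)
        if abs(estimate - nNodes) < precision:
            break
        if estimate > nNodes:
            high = mid                   # too many nodes: b must be smaller
        else:
            low = mid                    # too few nodes: b must be larger
    return mid


print(nodesInUniformTree(2, 2))          # 7
print(round(ebfBisect(7, 2), 2))         # 2.0
```

A binary tree of depth 2 has 7 nodes, and the bisection recovers a branching factor of about 2 from those two numbers.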

## Tic-Tac-Toe Implementation

The `TTT` class has been modified slightly to count the number of moves made by the different algorithms (`movesExplored`, incremented in `makeMove`). A method `getWinningValue` was also added to return the utility value of a win.

In [12]:
class TTT(object):

    def __init__(self):
        self.board = [' ']*9
        self.player = 'X'
        if False:
            self.board = [' ', ' ', 'O', 'X', 'O', 'O', ' ', ' ', ' ']
            self.player = 'O'
        self.playerLookAHead = self.player
        self.movesExplored = 0

    def locations(self, c):
        return [i for i, mark in enumerate(self.board) if mark == c]

    def getMoves(self):
        moves = self.locations(' ')
        return moves

    def getUtility(self):
        whereX = self.locations('X')
        whereO = self.locations('O')
        wins = [[0, 1, 2], [3, 4, 5], [6, 7, 8],
                [0, 3, 6], [1, 4, 7], [2, 5, 8],
                [0, 4, 8], [2, 4, 6]]
        isXWon = any([all([wi in whereX for wi in w]) for w in wins])
        isOWon = any([all([wi in whereO for wi in w]) for w in wins])
        if isXWon:
            return 1 if self.playerLookAHead == 'X' else -1
        elif isOWon:
            return 1 if self.playerLookAHead == 'O' else -1
        elif ' ' not in self.board:
            return 0
        else:
            return None  # game still in progress (changed from -0.1)

    def isOver(self):
        return self.getUtility() is not None

    def makeMove(self, move):
        self.board[move] = self.playerLookAHead
        self.playerLookAHead = 'X' if self.playerLookAHead == 'O' else 'O'
        self.movesExplored += 1

    def changePlayer(self):
        self.player = 'X' if self.player == 'O' else 'O'
        self.playerLookAHead = self.player

    def unmakeMove(self, move):
        self.board[move] = ' '
        self.playerLookAHead = 'X' if self.playerLookAHead == 'O' else 'O'

    def __str__(self):
        s = '{}|{}|{}\n-----\n{}|{}|{}\n-----\n{}|{}|{}'.format(*self.board)
        return s

    def getNumberMovesExplored(self):
        return self.movesExplored

    def getWinningValue(self):
        return 1
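
The sign convention in `getUtility` can be checked in isolation. The sketch below is a standalone function, not a replacement for the class method; the name `utilityFor` and the `WIN_LINES` constant are assumptions introduced for this example.

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def utilityFor(board, player):
    """Utility of a 9-cell board from `player`'s point of view.

    Returns 1 for a win, -1 for a loss, 0 for a draw, None if play continues.
    """
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return 1 if board[a] == player else -1
    return 0 if ' ' not in board else None


# X has completed the left column.
finalBoard = ['X', 'O', 'O',
              'X', 'X', 'O',
              'X', ' ', ' ']
print(utilityFor(finalBoard, 'X'), utilityFor(finalBoard, 'O'))  # 1 -1
```

The same position is worth 1 to one player and -1 to the other, which is the zero-sum property that the negation in `negamax` relies on.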

Check that the following function `playGame` runs
correctly. It plays one game with each of `negamax`, `negamaxIDS`, and
`negamaxIDSab` choosing the moves for Player X, while Player O, the
opponent, uses the function `opponent`, which follows the silly strategy
of playing in the first open position.

In [22]:
def opponent(board):
    return board.index(' ')

def play(game,opponent,depthLimit,func):
    print(game)
    while not game.isOver():
        score,move = func(game,depthLimit)
        if move is None:
            print('move is None. Stopping.')
            break
        game.makeMove(move)
        print('Player', game.player, 'to', move, 'for score' ,score)
        print(game)
        if not game.isOver():
            game.changePlayer()
            opponentMove = opponent(game.board)
            game.makeMove(opponentMove)
            print('Player', game.player, 'to', opponentMove)   ### FIXED ERROR IN THIS LINE!
            print(game)
            game.changePlayer()
            
import copy
def playGame(game,opponent,depthLimit):
    data = [] # [negamax, negamaxIDS, negamaxIDSab]
    initialGame = copy.deepcopy(game)
    print("negamax:")
    play(game,opponent,depthLimit,negamax)
    data.append(['negamax',game.board.count('X'),game.getNumberMovesExplored(),game.board.count('X')+game.board.count('O')])
    game = copy.deepcopy(initialGame)
    print("negamaxIDS:")
    play(game,opponent,depthLimit,negamaxIDS)
    data.append(['negamaxIDS',game.board.count('X'),game.getNumberMovesExplored(),game.board.count('X')+game.board.count('O')])
    game = copy.deepcopy(initialGame)
    print("negamaxIDSab:")
    play(game,opponent,depthLimit,negamaxIDSab)
    data.append(['negamaxIDSab',game.board.count('X'),game.getNumberMovesExplored(),game.board.count('X')+game.board.count('O')])
    
    for i in data:
        print("{0} made {1} moves. {2} moves explored for ebf({2}, {3}) of {4:.3f}".format(i[0],i[1],i[2],i[3],ebf(i[2],i[3])))
    


In [23]:
game = TTT()
playGame(game,opponent,10)

negamax:
 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 0
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 4 for score 1
X|O|O
-----
X|X| 
-----
 | | 
Player O to 5
X|O|O
-----
X|X|O
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X|X|O
-----
X| | 
negamaxIDS:
 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 1
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X| | 
-----
X| | 
negamaxIDSab:
 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 1
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X| | 
-----
X| | 
negamax

## Grading

As always, download and extract from [A4grader.tar](http://www.cs.colostate.edu/~anderson/cs440/notebooks/A4grader.tar)

In [25]:
%run -i A4grader.py


Testing negamax starting from ['O', 'X', ' ', 'O', ' ', ' ', ' ', 'X', ' ']

--- 10/10 points. negamax correctly returns value of 1

--- 10/10 points. negamax correctly explored 124 states.

Testing negamax starting from ['O', 'X', 'X', 'O', 'O', ' ', ' ', 'X', ' ']

--- 10/10 points. negamax correctly returns value of -1 and move of 5

Testing negamaxIDS with max depth of 5, starting from ['O', 'X', 'X', 'O', 'O', ' ', ' ', 'X', ' ']

--- 10/10 points. negamaxIDS correctly returns value of -1 and move of 5

Testing negamaxIDSab starting from ['O', 'X', 'X', 'O', 'O', ' ', ' ', 'X', ' ']

--- 20/20 points. negamaxIDSab correctly returns value of -1 and move of 5

Testing playGame with opponent that always plays in highest numbered position.
 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 0
X| | 
-----
 | | 
-----
 | | 
Player O to 8
X| | 
-----
 | | 
-----
 | |O
Player X to 2 for score 1
X| |X
-----
 | | 
-----
 | |O
Player O to 7
X| |X
-----
 | | 
-----
 |O|O
Player X to 1 for 