# Assignment 4: Negamax with Alpha-Beta Pruning and Iterative Deepening

# Table of Contents
* [Assignment 4: Negamax with Alpha-Beta Pruning and Iterative Deepening](#Assignment-4:-Negamax-with-Alpha-Beta-Pruning-and-Iterative-Deepening)
	* [Initial Code](#Initial-Code)
	* [Iterative Deepening Search](#Iterative-Deepening-Search)
	* [Moves counter](#Moves-counter)
	* [negamaxIDSab](#negamaxIDSab)
	* [Play game!](#Play-game!)
	* [Conclusion](#Conclusion)


For this assignment, I investigated the advantages of alpha-beta pruning applied to Tic-Tac-Toe. Alpha-beta pruning cuts off unnecessary searching by "pruning" branches that cannot lead to a better result than one already found. To test what this means in practice, I implemented the `negamax` version of alpha-beta pruning.

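Negamax is similar to minimax, except that every level takes the maximum value and negates the value returned from the level below it, instead of alternating between max and min. At each level of the search it finds the best move for whichever player is to move, itself _or_ its opponent (assuming it is playing against the very best opponent).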
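
To make the relationship with minimax concrete, here is a minimal sketch on a small hand-built tree. The nested-list tree and the names `minimax_value` and `negamax_value` are made up for illustration and are not part of the assignment code. It assumes the leaves hold scores from the first player's point of view.

```python
def minimax_value(node, maximizing):
    """Classic minimax: leaves are scores from the first player's point of view."""
    if isinstance(node, int):
        return node
    values = [minimax_value(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def negamax_value(node, color):
    """Negamax: always maximize, negating the value returned by each child.

    color is +1 when the first player is to move and -1 otherwise, so a leaf
    is scored from the point of view of the player to move.
    """
    if isinstance(node, int):
        return color * node
    return max(-negamax_value(child, -color) for child in node)


# A depth-2 tree: the first player picks a branch, then the opponent picks a leaf.
tree = [[3, 5], [2, 9], [0, -1]]

print(minimax_value(tree, True))  # value of the root for the first player
print(negamax_value(tree, 1))     # same value, computed with a single max
```

Both calls should agree on the value of the root, which is why negamax can replace the separate max and min cases with one negated recursive call.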

I will also combine negamax with the iterative deepening search shown in previous assignments and compare the results below.

## Initial Code

`negamax` searches to the given depth, calling itself recursively until it reaches an end state. When the depth limit is reached or the game is over, it returns the utility of that state. In Tic-Tac-Toe the utility is 1 for a win, -1 for a loss, 0 for a draw, and `None` for a state where the game is not over yet.

As negamax tries each move, it keeps track of the best value so far by negating the value returned from the recursive call and comparing it against the previous best. Moves whose value comes back as `None` (the depth limit was hit before the game ended) are skipped.

In [29]:
def negamax(game, depthLeft):
    # If at terminal state or depth limit, return utility value and move None
    if game.isOver() or depthLeft == 0:
        return game.getUtility(), None # call to negamax knows the move
    # Find best move and its value from current state
    bestValue, bestMove = None, None
    for move in game.getMoves():
        # Apply a move to current state
        game.makeMove(move)
        # Use depth-first search to find eventual utility value and back it up.
        #  Negate it because it will come back in context of next player
        value, _ = negamax(game, depthLeft-1)
        # Remove the move from current state, to prepare for trying a different move
        game.unmakeMove(move)
        if value is None:
            continue
        value = - value
        if bestValue is None or value > bestValue:
            # Value for this move is better than moves tried so far from this state.
            bestValue, bestMove = value, move
    return bestValue, bestMove

`TTT` is the Tic-Tac-Toe game class. It has a few helper methods for playing the game.

In [30]:
class TTT(object):

    def __init__(self):
        self.board = [' ']*9 #create empty board
        self.player = 'X' # player X goes first
        if False: # debug code
            self.board = ['O', 'X', 'X', 'O', 'O', ' ', ' ', 'X', ' ']
            self.player = 'X'
        self.playerLookAHead = self.player # next move is current player X
        self.movesExplored = 0 # used to keep track of moves

    def locations(self, c):
        # indices of the board squares that hold the mark c
        return [i for i, mark in enumerate(self.board) if mark == c]

    def getMoves(self):
        moves = self.locations(' ')
        return moves

    def getUtility(self):
        # utility of the current board for the player who moves next (playerLookAHead):
        # 1 for a win, -1 for a loss, 0 for a draw (full board), None if the game is not over
        whereX = self.locations('X')
        whereO = self.locations('O')
        wins = [[0, 1, 2], [3, 4, 5], [6, 7, 8],
                [0, 3, 6], [1, 4, 7], [2, 5, 8],
                [0, 4, 8], [2, 4, 6]]
        isXWon = any([all([wi in whereX for wi in w]) for w in wins])
        isOWon = any([all([wi in whereO for wi in w]) for w in wins])
        if isXWon:
            return 1 if self.playerLookAHead == 'X' else -1
        elif isOWon:
            return 1 if self.playerLookAHead == 'O' else -1
        elif ' ' not in self.board:
            return 0
        else:
            return None  # game is not over yet (an earlier version returned -0.1 here)

    def isOver(self):
        return self.getUtility() is not None

    def getWinningValue(self):
        # in tic tac toe, 1 is a winning value
        return 1
    
    def getMovesExplored(self):
        return self.movesExplored

    def makeMove(self, move):
        self.movesExplored += 1
        self.board[move] = self.playerLookAHead
        self.playerLookAHead = 'X' if self.playerLookAHead == 'O' else 'O'

    def changePlayer(self):
        self.player = 'X' if self.player == 'O' else 'O'
        self.playerLookAHead = self.player

    def unmakeMove(self, move):
        self.board[move] = ' '
        self.playerLookAHead = 'X' if self.playerLookAHead == 'O' else 'O'

    def __str__(self):
        s = '{}|{}|{}\n-----\n{}|{}|{}\n-----\n{}|{}|{}'.format(*self.board)
        return s

Below is a function `playGame` that runs a loop to calculate the negamax move for the first player, and a simple move (fill in first blank) for second player. Output below.

In [31]:
def opponent(board):
    return board.index(' ') # opponent is simple and just places an O in the first blank

def playGame(game,opponent,depthLimit):
    """ plays the game using negamax
    prints values of steps along the way to show the game progressing"""
    print(game)
    while not game.isOver():
        score,move = negamax(game,depthLimit)
        if move is None:
            print('move is None. Stopping.')
            break
        game.makeMove(move)
        print('Player', game.player, 'to', move, 'for score' ,score)
        print(game)
        if not game.isOver():
            game.changePlayer()
            opponentMove = opponent(game.board)
            game.makeMove(opponentMove)
            print('Player', game.player, 'to', opponentMove)
            print(game)
            game.changePlayer()

In [32]:
game = TTT()
# print(game.getUtility())
playGame(game,opponent,20)
game.getMovesExplored()

 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 0
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 4 for score 1
X|O|O
-----
X|X| 
-----
 | | 
Player O to 5
X|O|O
-----
X|X|O
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X|X|O
-----
X| | 


558334

## Iterative Deepening Search

Player one could make better moves. Let's use iterative deepening search (IDS) to find the best moves and get a nicer victory. Instead of searching the whole game at once, IDS finds the shallowest route to a win, similar to previous assignments, which significantly reduces the number of moves we have to explore. Essentially, once a score of 1 is returned, we don't need to search any deeper!

In [33]:
def negamaxIDS(game, maxDepth):
    """ combine negamax with an iterative deepening search range(1, maxDepth)"""
    
    bestScore = -float('infinity') # keep track of the best move found
    bestMove = None # initial best move is nothing
    
    for depth in range(1, maxDepth):
        score, move = negamax(game, depth)
        if score is not None and bestScore < score:
            bestScore = score
            bestMove = move
        
        if move is None:
            continue
        
        if (score == game.getWinningValue()):
            return (score, move) # we like winning, return
        
    return (bestScore, bestMove) # if nothing was found that is a win, return best found so far

In [34]:
def playGameIDS(game,opponent,depthLimit):
    """ same as playgame but calls negamaxIDS"""
    print(game)
    while not game.isOver():
        score,move = negamaxIDS(game,depthLimit)
        if move is None:
            print('move is None. Stopping.')
            break
        game.makeMove(move)
        print('Player', game.player, 'to', move, 'for score' ,score)
        print(game)
        if not game.isOver():
            game.changePlayer()
            opponentMove = opponent(game.board)
            game.makeMove(opponentMove)
            print('Player', game.player, 'to', opponentMove)
            print(game)
            game.changePlayer()

In [35]:
game = TTT()
playGameIDS(game,opponent,20)
game.getMovesExplored()

 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 1
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X| | 
-----
X| | 


23338

Phew! A better, more 'obvious' win! Let me try to explain what happens: there is no need to explore other nodes if a win is only one move away (like the last move for player one). This cuts down on the searching and the number of moves explored (23338 instead of 558334). We can do even better though!
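
The effect of stopping at the shallowest win can be sketched on a toy tree. This is only an illustration under stated assumptions: the nested-list tree and the names `depth_limited` and `iterative_deepening` are hypothetical, leaves are scored for the player to move at that leaf, and, unlike `negamaxIDS`, the loop includes `max_depth` itself.

```python
def depth_limited(node, depth_left, counter):
    """Depth-limited negamax over a nested-list tree.

    Leaves are ints scored for the player to move at that leaf.
    Returns None when the depth limit cuts the search off above a leaf.
    """
    counter[0] += 1                      # count every node visited
    if isinstance(node, int):
        return node
    if depth_left == 0:
        return None                      # cut off: value unknown
    best = None
    for child in node:
        value = depth_limited(child, depth_left - 1, counter)
        if value is None:
            continue                     # skip unknown values
        if best is None or -value > best:
            best = -value
    return best


def iterative_deepening(root, max_depth, win=1):
    """Deepen one level at a time and stop at the first depth that finds a win."""
    counter = [0]
    best, depth_used = None, 0
    for depth in range(1, max_depth + 1):
        depth_used = depth
        value = depth_limited(root, depth, counter)
        if value is not None and (best is None or value > best):
            best = value
        if value == win:
            break                        # shallowest win found
    return best, depth_used, counter[0]


# The second move from the root wins immediately; the first leads to a deeper subtree.
tree = [[[0, 1], [0, [-1, 0]]], -1]

full = [0]
print(depth_limited(tree, 10, full), full[0])  # full-depth search: value, nodes visited
print(iterative_deepening(tree, 10))           # (value, depth reached, nodes visited)
```

Both searches find the same winning value, but the iterative version stops at depth 1 and never expands the deeper subtree.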

## Moves counter

In order to evaluate the efficiency of the search, I kept track of the number of nodes explored, which is the same as the number of moves explored during a game. I did this by adding a counter named `movesExplored` to the `TTT` constructor, where it is initialized to 0, and incrementing it in the `TTT.makeMove` method. The method `TTT.getMovesExplored()` returns the current value. You can see its output at the end of each example above.
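
The counting pattern on its own looks roughly like the sketch below. `CountingBoard` and `explore` are hypothetical stand-ins that keep only the counter logic of `TTT`; they ignore wins and players entirely.

```python
class CountingBoard:
    """Hypothetical stand-in for TTT that keeps only the counting logic."""

    def __init__(self):
        self.board = [' '] * 9
        self.movesExplored = 0

    def makeMove(self, move, mark='X'):
        self.movesExplored += 1      # every applied move counts as one explored node
        self.board[move] = mark

    def unmakeMove(self, move):
        self.board[move] = ' '       # undoing a move does not change the counter


def explore(board, depth_left):
    """Try every empty square recursively, just to exercise the counter."""
    if depth_left == 0:
        return
    for move in [i for i, mark in enumerate(board.board) if mark == ' ']:
        board.makeMove(move)
        explore(board, depth_left - 1)
        board.unmakeMove(move)


b = CountingBoard()
explore(b, 2)
print(b.movesExplored)  # 9 first moves plus 9 * 8 replies
```

Note that in the notebook the same counter is also incremented by the moves actually played, since `playGame` calls `makeMove` too.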

## negamaxIDSab

OK, time for the main event. The following function is a copy of `negamaxIDS`, but this time it cuts down on extra searching by using alpha-beta pruning.

In [36]:
def negamaxAB(alpha, beta, game, depthLeft):    
    if game.isOver() or depthLeft == 0:       
        return game.getUtility(), None
    
    bestValue, bestMove = -float('infinity'), None
    for move in game.getMoves():
        game.makeMove(move)
        value, _ = negamaxAB(-beta, -alpha, game, depthLeft-1) # swap and negate the values
        game.unmakeMove(move)
        
        if value is None:
            continue
            
        value = - value
        if (value > bestValue):
            bestValue = value
            bestMove = move
        
        alpha = max(alpha, value)
        if (alpha >= beta):
            break
            
    return bestValue, bestMove

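def negamaxIDSab(game, maxDepth):
    """ combine negamaxAB with an iterative deepening search range(1, maxDepth)"""
    alpha = -float('infinity')
    beta = float('infinity')
    bestScore, bestMove = -float('infinity'), None # best result found so far

    for depth in range(1, maxDepth):
        score, move = negamaxAB(alpha, beta, game, depth)

        if move is None:
            continue

        if score > bestScore:
            bestScore, bestMove = score, move

        if score == game.getWinningValue():
            return (score, move) # a win was found, no need to search deeper

    return (bestScore, bestMove) # no win found, return the best result so far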

In [37]:
game = TTT()
negamaxAB(-float('infinity'), float('infinity'), game, 20)

(0, 0)

In [38]:
def playGameIDSab(game,opponent,depthLimit):
    """ same as playgame but calls into negamaxIDSab"""
    print(game)
    while not game.isOver():
        score,move = negamaxIDSab(game,depthLimit)
        if move is None:
            print('move is None. Stopping.')
            break
        game.makeMove(move)
        # note: the score is printed negated here, so a win (1) shows up as -1 in the output
        print('Player', game.player, 'to', move, 'for score' , -score)
        print(game)
        if not game.isOver():
            game.changePlayer()
            opponentMove = opponent(game.board)
            game.makeMove(opponentMove)
            print('Player', game.player, 'to', opponentMove)
            print(game)
            game.changePlayer()

In [39]:
game = TTT()
playGameIDSab(game, opponent, 20)
game.getMovesExplored()

 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score -1
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score -1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 6 for score -1
X|O|O
-----
X| | 
-----
X| | 


20804

Excellent. The IDS with alpha-beta pruning found the result with even fewer expanded nodes (20804 versus 23338) because any path that gives the opposing player an option to win won't be explored further. The assumption is that the opposing player would make that move, which makes all other paths from that point useless.
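
The pruning itself can be seen on a tiny tree by counting visited nodes with and without the alpha-beta window. This is a sketch under assumptions: the nested-list tree and the names `negamax_plain` and `negamax_ab` are made up for illustration, and leaves are scored for the player to move at that leaf.

```python
def negamax_plain(node, counter):
    """Negamax without pruning; counter[0] counts visited nodes."""
    counter[0] += 1
    if isinstance(node, int):
        return node
    return max(-negamax_plain(child, counter) for child in node)


def negamax_ab(node, alpha, beta, counter):
    """Negamax with alpha-beta pruning; same result, fewer visited nodes."""
    counter[0] += 1
    if isinstance(node, int):
        return node
    best = -float('inf')
    for child in node:
        # swap and negate the window for the other player
        value = -negamax_ab(child, -beta, -alpha, counter)
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # the other player already has a better option elsewhere
    return best


tree = [[3, 5], [2, 9], [0, -1]]

plain_count, ab_count = [0], [0]
plain_value = negamax_plain(tree, plain_count)
ab_value = negamax_ab(tree, -float('inf'), float('inf'), ab_count)
print(plain_value, plain_count[0])
print(ab_value, ab_count[0])
```

Once the first branch guarantees the root a value of 3, the second and third branches are abandoned after their first leaf, because the opponent can already hold the root below 3 there.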

## Play game!

This function will play the game for the three different functions (`negamax`, `negamaxIDS`, and `negamaxIDSab`). At the end of the games, it prints out the number of X's, the number of moves explored, the depth of the game, and the effective branching factor. 
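
For reference, the effective branching factor is the value $b$ for which a uniform tree of depth $d$ would contain the observed number of nodes $N$. The symbols $b$, $d$, and $N$ are introduced here for illustration; the relation matches the `estimate` computed inside `ebf`, which searches for $b$ by bisection:

$$
N = 1 + b + b^2 + \dots + b^d = \frac{1 - b^{d+1}}{1 - b}, \qquad b \neq 1
$$

For $b = 1$ the sum reduces to $N = d + 1$, which is the special case handled separately in the code. As a quick check, $b = 2$ and $d = 3$ give $N = 1 + 2 + 4 + 8 = 15$, so `ebf(15, 3)` should come out close to 2.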

In [40]:
def ebf(nNodes, depth, precision=0.01):
    """ calcluate the effective branching factor to a precision value"""
    if nNodes == 0:
        return 0

    def ebfRec(low, high):
        mid = (low + high) * 0.5
        if mid == 1:
            estimate = 1 + depth
        else:
            estimate = (1 - mid**(depth + 1)) / (1 - mid)
        if abs(estimate - nNodes) < precision:
            return mid
        if estimate > nNodes:
            return ebfRec(low, mid)
        else:
            return ebfRec(mid, high)

    return ebfRec(1, nNodes)

def playGames(opponent, depth):
    """ play 3 games of tic tac toe with the different algorithms: negamax, negamax with iterative deepening
    and negamax with iterative deepening and alpha-beta pruning. prints results """
    game = TTT()
    playGame(game, opponent, depth)
    depthExplored = len(game.locations('X')) + len(game.locations('O'))
    print('\n\n')
    
    game2 = TTT()
    playGameIDS(game2, opponent, depth)
    print('\n\n')
    
    game3 = TTT()
    playGameIDSab(game3, opponent, depth)
    print('\n\n')
    
    print('negamax made {} moves. {} moves explored for ebf({}, {}) of {}'
          .format(len(game.locations('X')), game.getMovesExplored(), 
                      game.getMovesExplored(), depthExplored, ebf(game.getMovesExplored(), depthExplored)))
          
    depthExplored2 = len(game2.locations('X')) + len(game2.locations('O'))
    print('negamaxIDS made {} moves. {} moves explored for ebf({}, {}) of {}'
          .format(len(game2.locations('X')), game2.getMovesExplored(), 
                      game2.getMovesExplored(), depthExplored2, ebf(game2.getMovesExplored(), depthExplored2)))
    
    depthExplored3 = len(game3.locations('X')) + len(game3.locations('O'))
    print('negamaxIDSab made {} moves. {} moves explored for ebf({}, {}) of {}'
          .format(len(game3.locations('X')), game3.getMovesExplored(), 
                      game3.getMovesExplored(), depthExplored3, ebf(game3.getMovesExplored(), depthExplored3)))
    

In [41]:
playGames(opponent, 25)

 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 0
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 4 for score 1
X|O|O
-----
X|X| 
-----
 | | 
Player O to 5
X|O|O
-----
X|X|O
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X|X|O
-----
X| | 



 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 1
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X| | 
-----
X| | 



 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score -1
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score -1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 6 for score -1
X|O|O
-----
X| | 
-----
X| | 



negamax made 4 moves. 558334 m

## Conclusion

As the results show, negamax with alpha-beta pruning and iterative deepening has the lowest effective branching factor and explores the fewest nodes. In terms of total moves made, the two iterative deepening algorithms give the same result: X wins in three moves instead of the four that plain negamax needs. I imagine more complicated games would benefit much more, from a total-moves standpoint, from iterative deepening.

I had earlier results in which negamax explored even fewer nodes, closer to the professor's answer of ~6k explored. However, when I looked at the values of alpha and beta, they weren't updating correctly. The current implementation of `negamaxAB` is more in line with the pseudocode in the book and the version found [here](https://en.wikipedia.org/wiki/Negamax#Negamax_with_alpha_beta_pruning), even though the results aren't as dramatic now.

Alpha-beta pruning makes a lot of sense in a zero-sum game like Tic-Tac-Toe. A lot of effort can be wasted on exploring, especially in some of the odd game states where many substates could be explored even though the winning or losing move is at the very first depth. The iterative deepening nature of the search prevents a lot of those nodes from being explored, and alpha-beta pruning teams up with it for even better results.