# Negamax with Alpha-Beta Pruning and Iterative Deepening

# Table of Contents
* [Setting up the Tic-Tac-Toe board](#Setting-up-the-Tic-Tac-Toe-board)
* [Negamax Implementation](#Negamax-Implementation)
* [Playing a Game](#Playing-a-Game)
* [Implementing negamaxIDS](#Implementing-negamaxIDS)
* [Implementing negamaxIDSab](#Implementing-negamaxIDSab)
* [Playing Games](#Playing-Games)
* [Extra Credit](#Extra-Credit)

For this project, I will use the adversarial search algorithm Negamax to play Tic-Tac-Toe. I will also explore the efficiency that Alpha-Beta Pruning brings to Negamax.

## Setting up the Tic-Tac-Toe board

This class represents the various aspects of the game. The board is represented as a standard list and initialized with blanks. Player X will always start first and the valid moves are found through the locations of the blanks. We can then make moves by indicating which index we want to place the X or O in.

We determine whether a game is over by calling the utility function. If `None` is returned, the game is not over yet. If 0 is returned, the game has ended in a draw. Otherwise the game has been won: the utility is 1 if the winner is the player about to move (`playerLookAHead`) and -1 if that player has lost.

There is also a `movesExplored` variable that Negamax uses to track the number of moves it explores during its adversarial search.

In [1]:
class TTT(object):

    def __init__(self):
        self.board = [' ']*9
        self.player = 'X'
        if False:
            self.board = ['X', 'X', ' ', 'X', 'O', 'O', ' ', ' ', ' ']
            self.player = 'O'
        self.playerLookAHead = self.player
        self.movesExplored = 0

    def locations(self, c):
        return [i for i, mark in enumerate(self.board) if mark == c]

    def getMoves(self):
        moves = self.locations(' ')
        return moves

    def getUtility(self):
        whereX = self.locations('X')
        whereO = self.locations('O')
        wins = [[0, 1, 2], [3, 4, 5], [6, 7, 8],
                [0, 3, 6], [1, 4, 7], [2, 5, 8],
                [0, 4, 8], [2, 4, 6]]
        isXWon = any([all([wi in whereX for wi in w]) for w in wins])
        isOWon = any([all([wi in whereO for wi in w]) for w in wins])
        if isXWon:
            # Compare strings with ==, not identity (is)
            return 1 if self.playerLookAHead == 'X' else -1
        elif isOWon:
            return 1 if self.playerLookAHead == 'O' else -1
        elif ' ' not in self.board:
            return 0
        else:
            return None  # game is not over yet

    def isOver(self):
        return self.getUtility() is not None

    def makeMove(self, move):
        self.board[move] = self.playerLookAHead
        self.playerLookAHead = 'X' if self.playerLookAHead == 'O' else 'O'
        self.movesExplored += 1

    def changePlayer(self):
        self.player = 'X' if self.player == 'O' else 'O'
        self.playerLookAHead = self.player

    def unmakeMove(self, move):
        self.board[move] = ' '
        self.playerLookAHead = 'X' if self.playerLookAHead == 'O' else 'O'

    def getNumberMovesExplored(self):
        return self.movesExplored

    def getWinningValue(self):
        return 1

    def __str__(self):
        s = '{}|{}|{}\n-----\n{}|{}|{}\n-----\n{}|{}|{}'.format(*self.board)
        return s
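
To make the sign convention of the utility value concrete, here is a minimal standalone sketch. The helper `utility` and the constant `WIN_LINES` are hypothetical names of my own and are not part of the `TTT` class; the sketch only mirrors the rule that the score is reported from the point of view of the player about to move.

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def utility(board, to_move):
    """Hypothetical helper: score `board` from the point of view of `to_move`."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            # Someone completed a line: +1 if it is the player to move, else -1
            return 1 if board[a] == to_move else -1
    # No winner: a full board is a draw, otherwise the game is still running
    return 0 if ' ' not in board else None

board = ['X', 'X', 'X',
         'O', 'O', ' ',
         ' ', ' ', ' ']
print(utility(board, 'O'))      # X just completed the top row, O is to move: -1
print(utility(board, 'X'))      # same board seen from X's side: 1
print(utility([' '] * 9, 'X'))  # empty board, game not over: None
```

Because a win is always completed by the player who just moved, the player about to move sees -1 at a won position, which is why Negamax negates the value on the way back up.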

## Negamax Implementation

Negamax is a simpler way to implement the *MiniMax* algorithm. By negating the utility value passed back up from each level, the player at every level only has to choose the maximum value available to them. This is simpler than the *MiniMax* algorithm because it does not have to differentiate between choosing the Max or Min value depending on which player's turn it is.
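
The equivalence can be checked on a small hand-built tree before looking at the game code. This is only a sketch under my own assumptions: the tree is a nested list whose leaves are scores for the root player, and `minimax`, `negamax_tree`, and `color` are illustrative names that do not appear in the implementation below (which instead gets its sign from `getUtility`).

```python
def minimax(node, maximizing):
    """Classic minimax; leaves hold scores from the root player's point of view."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def negamax_tree(node, color):
    """Negamax on the same tree; color is +1 when the root player moves, -1 otherwise."""
    if not isinstance(node, list):
        # Convert the leaf score to the point of view of the player to move
        return color * node
    # Every level maximizes the negated value of its children
    return max(-negamax_tree(child, -color) for child in node)

tree = [[3, 5], [2, 9], [0, -1]]
print(minimax(tree, True))    # 3
print(negamax_tree(tree, 1))  # 3
```

Both calls return the same value because `max(a, b) == -min(-a, -b)`, so a Min level can be rewritten as a Max level over negated scores.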

In [2]:
def negamax(game, depthLeft):
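    '''
    Returns the best achievable value and the move that leads to it.
    :param game: game object providing isOver, getUtility, getMoves, makeMove and unmakeMove
    :param depthLeft: number of plies the search may still descend
    :return: tuple (bestValue, bestMove); bestMove is None at a terminal state or at the depth limit
    '''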
    # If at terminal state or depth limit, return utility value and move None
    if game.isOver() or depthLeft == 0:
        return game.getUtility(), None

    # Find best move and its value from current state
    bestValue, bestMove = None, None

    for move in game.getMoves():
        # Apply a move to current state
        game.makeMove(move)

        # Use depth-first search to find eventual utility value and back it up.
        # Negate it because it will come back in context of next player
        value, _ = negamax(game, depthLeft-1)

        # Remove the move from current state, to prepare for trying a different move
        game.unmakeMove(move)

        if value is None:
            continue

        # negate the value returned from recursive call
        value = -value

        if bestValue is None or value > bestValue:
            # Value for this move is better than moves tried so far from this state.
            bestValue, bestMove = value, move
    return bestValue, bestMove

## Playing a Game

To play Tic-Tac-Toe games, we will have player X's move determined by `negamax`. But what about the opponent's move? Well, let's have the opponent employ the silly strategy of playing the first open position. Thus, we can write an `opponent` function as follows:

In [3]:
def opponent(board):
    '''
    Returns the first open position.
    :param board: 
    :return: 
    '''
    return board.index(' ')

Now we can implement a `playGame` function that pits the strategy of Negamax against the bold strategy of choosing the first open position.

In [4]:
def playGame(game, opponent, negamaxF, depthLimit):
    '''
    Plays a game with the specified negamax algorithm.
    :param game:
    :param opponent:
    :param negamaxF:
    :param depthLimit:
    :return:
    '''
    print(game)
    while not game.isOver():
        score, move = negamaxF(game, depthLimit)
        if move is None:
            print('move is None. Stopping.')
            break
        game.makeMove(move)
        print('Player', game.player, 'to', move, 'for score', score)
        print(game)
        if not game.isOver():
            game.changePlayer()
            opponentMove = opponent(game.board)
            game.makeMove(opponentMove)
            print('Player', game.player, 'to', opponentMove)
            print(game)
            game.changePlayer()

Now we are ready to play a game of Tic-Tac-Toe.

*Note: the depthLimit of `20` below is larger than necessary. Tic-Tac-Toe has a maximum depth of `9` because there are only nine positions on the board, so any depth limit of `9` or more gives the same result.*

In [5]:
game = TTT()
playGame(game, opponent, negamax, 20)

 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 0
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 4 for score 1
X|O|O
-----
X|X| 
-----
 | | 
Player O to 5
X|O|O
-----
X|X|O
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X|X|O
-----
X| | 


## Implementing negamaxIDS 

Most of the time, the depth at which a win is achieved is not known beforehand. However, we can figure this out by combining iterative deepening search with `negamax` to form the `negamaxIDS` function.

For Tic-Tac-Toe, we can stop as soon as a call to `negamax` returns a winning move. To keep the `negamaxIDS` function general, I have added a method called `getWinningValue` to the `TTT` class that just returns 1. Then, `negamaxIDS` can call `game.getWinningValue()` to determine the value of a winning move for this game. If the maximum depth is reached and no winning move has been found, the value and move from the final (deepest) search are returned.

In [6]:
def negamaxIDS(game, depthLimit):
    winningValue = game.getWinningValue()
    bestValue, bestMove = None, None
    for depth in range(1, depthLimit + 1):
        bestValue, bestMove = negamax(game, depth)
        if bestValue == winningValue:
            return bestValue, bestMove
    return bestValue, bestMove
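
The deepening loop itself is independent of Negamax. As a rough sketch of the same pattern on a plain search problem, the toy tree `TREE` and the functions `depth_limited_search` and `iterative_deepening` below are hypothetical names used only for illustration:

```python
# Hypothetical toy tree: each node maps to its children; 'G' is the goal.
TREE = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['G'],
        'D': [], 'E': [], 'G': []}

def depth_limited_search(node, goal, depth_left):
    """Depth-first search that gives up once depth_left reaches 0."""
    if node == goal:
        return [node]
    if depth_left == 0:
        return None
    for child in TREE[node]:
        path = depth_limited_search(child, goal, depth_left - 1)
        if path is not None:
            return [node] + path
    return None

def iterative_deepening(start, goal, max_depth):
    """Retry with limits 0, 1, 2, ... and stop at the first limit that succeeds."""
    for depth in range(max_depth + 1):
        path = depth_limited_search(start, goal, depth)
        if path is not None:
            return depth, path
    return None, None

print(iterative_deepening('A', 'G', 5))  # (2, ['A', 'C', 'G'])
```

The shallow searches are repeated at every deeper limit, but in a tree that branches widely the deepest level dominates the cost, so the repetition is usually a modest overhead.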

## Implementing negamaxIDSab

The concept of Alpha-Beta Pruning is to limit the number of sub-trees explored. To help us understand how this concept works in Negamax, let us first examine the use of Alpha-Beta Pruning in *MiniMax*.

In the *MiniMax* algorithm, alpha and beta are initialized to $-\infty$ and $+\infty$ respectively. Alpha is the best value found so far for Max at any choice point along the current path, and beta is the best value found so far for Min. Since we are playing from the point of view of X, we want to maximize our utility. If Min finds a move whose value is less than or equal to alpha, we can prune the rest of that Min node's subtree: Min will choose this move or one that is even worse for Max, so Max will never steer the game into this node.

With that in mind, we can implement Alpha-Beta Pruning for Negamax in a new function, `negamaxAB`, that takes `alpha` and `beta` arguments. We negate and swap the `alpha` and `beta` values in the recursive `negamaxAB` call. We also incorporate the iterative deepening approach in the final `negamaxIDSab` function.
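
The negate-and-swap step can be watched on a small hand-built tree. This is a minimal sketch under my own assumptions: the tree is a nested list with leaf scores for the root player, and `negamax_ab`, `color`, and `visited` are illustrative names rather than part of the game code.

```python
def negamax_ab(node, alpha, beta, color, visited):
    """Negamax with alpha-beta on a nested-list tree; visited records evaluated leaves."""
    if not isinstance(node, list):
        visited.append(node)
        return color * node
    best = -float('inf')
    for child in node:
        # The child's window is the parent's window, negated and swapped
        value = -negamax_ab(child, -beta, -alpha, -color, visited)
        best = max(best, value)
        alpha = max(alpha, best)
        if alpha >= beta:
            break  # the parent already has an option at least this good elsewhere
    return best

tree = [[3, 5], [2, 9], [0, -1]]
visited = []
print(negamax_ab(tree, -float('inf'), float('inf'), 1, visited))  # 3
print(visited)  # [3, 5, 2, 0]
```

The leaves 9 and -1 are never evaluated: once the second and third subtrees each show a reply worth at most 2 and 0 for the root player, they cannot beat the 3 already guaranteed by the first subtree.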

In [7]:
def negamaxAB(game, alpha, beta, depthLeft):
    # If at terminal state or depth limit, return utility value and move None
    if game.isOver() or depthLeft == 0:
        return game.getUtility(), None

    # Find best move and its value from current state
    bestValue, bestMove = None, None

    for move in game.getMoves():
        # alpha and beta are negated and swapped in the recursive call below,
        # not reassigned here, so the window for this node stays intact

        # Apply a move to current state
        game.makeMove(move)

        # Use depth-first search to find eventual utility value and back it up.
        # Negate it because it will come back in context of next player
        value, _ = negamaxAB(game, -beta, -alpha, depthLeft-1)

        # Remove the move from current state, to prepare for trying a different move
        game.unmakeMove(move)
        if value is None:
            continue

        # negate the value returned from recursive call
        value = -value

        if bestValue is None or value > bestValue:
            bestValue, bestMove = value, move
        if bestValue >= beta:
            # Mini will choose at least this value, so we can prune away rest of the tree
            return bestValue, bestMove
        alpha = max(bestValue, alpha)
    return bestValue, bestMove

In [8]:
def negamaxIDSab(game, depthLimit):
    alpha, beta = -float('infinity'), float('infinity')
    winningValue = game.getWinningValue()
    bestValue, bestMove = None, None
    for depth in range(1, depthLimit + 1):
        bestValue, bestMove = negamaxAB(game, alpha, beta, depth)
        if bestValue == winningValue:
            return bestValue, bestMove
    return bestValue, bestMove

## Playing Games

I have implemented a `playGames` function that plays one game with each of the three Negamax implementations. At the end, a report shows the number of moves made by player X, the number of moves explored, the effective branching factor (EBF) at the depth where the game finished, and the runtime for each Negamax implementation.

The depth used in the EBF calculation is the total number of moves made by X and by O during the search.

In [9]:
import time  # needed to calculate runtime below

In [10]:
def ebf(nNodes, depth, precision=0.01):
    if nNodes == 0:
        return 0

    def ebfRec(low, high):
        mid = (low + high) * 0.5
        if mid == 1:
            estimate = 1 + depth
        else:
            estimate = (1 - mid**(depth + 1)) / (1 - mid)
        if abs(estimate - nNodes) < precision:
            return mid
        if estimate > nNodes:
            return ebfRec(low, mid)
        else:
            return ebfRec(mid, high)

    return ebfRec(1, nNodes)
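
The quantity that `ebf` searches for can be written out explicitly. Assuming a uniform tree with branching factor $b$ and depth $d$ (the symbols $N$, $b$, and $d$ are my notation for `nNodes`, `mid`, and `depth`), the node count is a geometric series:

$$
N = 1 + b + b^2 + \dots + b^d = \frac{1 - b^{d+1}}{1 - b}, \qquad b \neq 1
$$

For example, $b = 2$ and $d = 3$ give $N = 1 + 2 + 4 + 8 = 15 = (1 - 2^4)/(1 - 2)$. The function bisects on $b$ between 1 and $N$ until the right-hand side is within `precision` of `nNodes`; for $b = 1$ the series is simply $1 + d$, which is the special case handled in the code.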

In [11]:
def playGames(opponent, depthLimit):
    '''
    Plays three games with the three different negamax algorithms.
    Prints the moves made by X, moves explored, and effective branching factor for depth when game finished.
    :param opponent:
    :param depthLimit:
    :return:
    '''
    game1 = TTT()
    game2 = TTT()
    game3 = TTT()

    t0 = time.time()
    print('negamax:')
    playGame(game1, opponent, negamax, depthLimit)
    depth1 = len(game1.locations('X')) + len(game1.locations('O'))
    time1 = time.time() - t0

    t0 = time.time()
    print('negamaxIDS:')
    playGame(game2, opponent, negamaxIDS, depthLimit)
    depth2 = len(game2.locations('X')) + len(game2.locations('O'))
    time2 = time.time() - t0

    t0 = time.time()
    print('negamaxIDSab:')
    playGame(game3, opponent, negamaxIDSab, depthLimit)
    depth3 = len(game3.locations('X')) + len(game3.locations('O'))
    time3 = time.time() - t0

    print()
    print('negamax made {0} moves. {1} moves explored for ebf({2}, {3}) of {4:.2f} | Time taken: {5:.2f}s'.format(len(game1.locations('X')),
                                                                                           game1.getNumberMovesExplored(),
                                                                                           game1.getNumberMovesExplored(),
                                                                                           depth1, ebf(game1.getNumberMovesExplored(), depth1), time1))

    print('negamaxIDS made {0} moves. {1} moves explored for ebf({2}, {3}) of {4:.2f} | Time taken: {5:.2f}s'.format(len(game2.locations('X')),
                                                                                              game2.getNumberMovesExplored(),
                                                                                              game2.getNumberMovesExplored(),
                                                                                              depth2, ebf(game2.getNumberMovesExplored(), depth2), time2))

    print('negamaxIDSab made {0} moves. {1} moves explored for ebf({2}, {3}) of {4:.2f} | Time taken: {5:.2f}s'.format(len(game3.locations('X')),
                                                                                                game3.getNumberMovesExplored(),
                                                                                                game3.getNumberMovesExplored(),
                                                                                                depth3, ebf(game3.getNumberMovesExplored(), depth3), time3))

Let's play some games!

In [12]:
playGames(opponent, 9)

negamax:
 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 0
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 4 for score 1
X|O|O
-----
X|X| 
-----
 | | 
Player O to 5
X|O|O
-----
X|X|O
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X|X|O
-----
X| | 
negamaxIDS:
 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 1
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X| | 
-----
X| | 
negamaxIDSab:
 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 1
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X| | 
-----
X| | 

negama

## Extra Credit 

I implemented a random move chooser as the opponent (player O) and determined how many times player X could win, using the `negamaxIDSab` strategy, against this opponent as an average over multiple games.

### Random move opponent

In [13]:
import random

In [14]:
def rand_opponent(board):
    validMoves = [i for i in range(len(board)) if board[i] == ' ']
    move = random.choice(validMoves)
    return move

### Average wins of X

In [15]:
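def avgWinsX(opponent, numGames):
    '''Returns the percentage of numGames games that X wins using negamaxIDSab.'''
    wins = [[0, 1, 2], [3, 4, 5], [6, 7, 8],
            [0, 3, 6], [1, 4, 7], [2, 5, 8],
            [0, 4, 8], [2, 4, 6]]
    timesWon = 0
    for _ in range(numGames):
        game = TTT()
        playGame(game, opponent, negamaxIDSab, 9)
        # Check the board directly: playGame always leaves game.player as 'X',
        # so game.player cannot tell a win for X apart from a draw or a loss.
        if any(all(game.board[i] == 'X' for i in line) for line in wins):
            timesWon += 1
    return timesWon / numGames * 100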

Let's see how player X deals with the pseudo-randomness of player O.

In [16]:
numGames = 7

In [18]:
print('On average, X won {0:.2f}% of the {1} games played.'.format(avgWinsX(rand_opponent, numGames), numGames))

 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 1
X| | 
-----
 | | 
-----
 | | 
Player O to 7
X| | 
-----
 | | 
-----
 |O| 
Player X to 1 for score 1
X|X| 
-----
 | | 
-----
 |O| 
Player O to 4
X|X| 
-----
 |O| 
-----
 |O| 
Player X to 2 for score 1
X|X|X
-----
 |O| 
-----
 |O| 
 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 1
X| | 
-----
 | | 
-----
 | | 
Player O to 5
X| | 
-----
 | |O
-----
 | | 
Player X to 1 for score 1
X|X| 
-----
 | |O
-----
 | | 
Player O to 7
X|X| 
-----
 | |O
-----
 |O| 
Player X to 2 for score 1
X|X|X
-----
 | |O
-----
 |O| 
 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 1
X| | 
-----
 | | 
-----
 | | 
Player O to 5
X| | 
-----
 | |O
-----
 | | 
Player X to 1 for score 1
X|X| 
-----
 | |O
-----
 | | 
Player O to 2
X|X|O
-----
 | |O
-----
 | | 
Player X to 8 for score 1
X|X|O
-----
 | |O
-----
 | |X
Player O to 6
X|X|O
-----
 | |O
-----
O| |X
Player X to 4 for score 1
X|X|O
-----
 |X|O
-----
O| |X
 | | 
-----
 | | 
-----
 | | 
Player X to 

Well then, it looks like player X is the clear winner here!