# Assignment 4: Negamax with Alpha-Beta Pruning and Iterative Deepening

# Table of Contents
* [Assignment 4: Negamax with Alpha-Beta Pruning and Iterative Deepening](#Assignment-4:-Negamax-with-Alpha-Beta-Pruning-and-Iterative-Deepening)
	* [Initial Code](#Initial-Code)
	* [Add moves counter](#Add-moves-counter)
	* [negamaxIDS](#negamaxIDS)
	* [negamaxIDSab](#negamaxIDSab)
	* [playGames](#playGames)
	* [Grading](#Grading)
	* [Extra Credit](#Extra-Credit)

## Initial Code


In [1]:
def negamax(game, depthLeft):
    # If at terminal state or depth limit, return utility value and move None
    if game.isOver() or depthLeft == 0:
        return game.getUtility(), None # call to negamax knows the move
    # Find best move and its value from current state
    bestValue, bestMove = None, None
    for move in game.getMoves():
        # Apply a move to current state
        game.makeMove(move)
        # Use depth-first search to find eventual utility value and back it up.
        #  Negate it because it will come back in context of next player
        value, _ = negamax(game, depthLeft-1)
        # Remove the move from current state, to prepare for trying a different move
        game.unmakeMove(move)
        if value is None:
            continue
        value = - value
        if bestValue is None or value > bestValue:
            # Value for this move is better than moves tried so far from this state.
            bestValue, bestMove = value, move
    return bestValue, bestMove

In [10]:
class TTT(object):

    def __init__(self):
        self.movesExplored = 0
        self.board = [' ']*9
        self.player = 'X'
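        # Start from a preset mid-game position with O to move.
        # Change True to False to start from an empty board with X to move.
        if True:
            self.board = ['X', 'X', ' ', 'X', 'O', 'O', ' ', ' ', ' ']
            self.player = 'O'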
        self.playerLookAHead = self.player

    def locations(self, c):
        return [i for i, mark in enumerate(self.board) if mark == c]

    def getMoves(self):
        moves = self.locations(' ')
        return moves

    def getUtility(self):
        whereX = self.locations('X')
        whereO = self.locations('O')
        wins = [[0, 1, 2], [3, 4, 5], [6, 7, 8],
                [0, 3, 6], [1, 4, 7], [2, 5, 8],
                [0, 4, 8], [2, 4, 6]]
        isXWon = any([all([wi in whereX for wi in w]) for w in wins])
        isOWon = any([all([wi in whereO for wi in w]) for w in wins])
        # Compare strings with ==, not is: identity of string literals is not guaranteed.
        if isXWon:
            return 1 if self.playerLookAHead == 'X' else -1
        elif isOWon:
            return 1 if self.playerLookAHead == 'O' else -1
        elif ' ' not in self.board:
            return 0
        else:
            return None  ########################################################## CHANGED FROM -0.1

    def isOver(self):
        return self.getUtility() is not None

    def makeMove(self, move):
        self.movesExplored += 1
        self.board[move] = self.playerLookAHead
        self.playerLookAHead = 'X' if self.playerLookAHead == 'O' else 'O'

    def changePlayer(self):
        self.player = 'X' if self.player == 'O' else 'O'
        self.playerLookAHead = self.player

    def unmakeMove(self, move):
        self.board[move] = ' '
        self.playerLookAHead = 'X' if self.playerLookAHead == 'O' else 'O'

    def __str__(self):
        s = '{}|{}|{}\n-----\n{}|{}|{}\n-----\n{}|{}|{}'.format(*self.board)
        return s
    
    def getWinningValue(self):
        return 1
    
    def getNumberMovesExplored(self):
        return self.movesExplored

Check that the following function `playGame` runs
correctly. Notice that we are using *negamax* to find the best move for
Player X, but Player O, the opponent, is using function *opponent*
that follows the silly strategy of playing in the first open position.

In [3]:
def opponent(board):
    return board.index(' ')

def playGame(game,opponent,depthLimit):
    print(game)
    while not game.isOver():
        score,move = negamax(game,depthLimit)
        if move is None:
            print('move is None. Stopping.')
            break
        game.makeMove(move)
        print('Player', game.player, 'to', move, 'for score' ,score)
        print(game)
        if not game.isOver():
            game.changePlayer()
            opponentMove = opponent(game.board)
            game.makeMove(opponentMove)
            print('Player', game.player, 'to', opponentMove)   ### FIXED ERROR IN THIS LINE!
            print(game)
            game.changePlayer()

In [4]:
game = TTT()
playGame(game,opponent,20)

 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 0
X| | 
-----
 | | 
-----
 | | 
Player O to 1
X|O| 
-----
 | | 
-----
 | | 
Player X to 3 for score 1
X|O| 
-----
X| | 
-----
 | | 
Player O to 2
X|O|O
-----
X| | 
-----
 | | 
Player X to 4 for score 1
X|O|O
-----
X|X| 
-----
 | | 
Player O to 5
X|O|O
-----
X|X|O
-----
 | | 
Player X to 6 for score 1
X|O|O
-----
X|X|O
-----
X| | 


## Add moves counter

Evaluate the efficiency of the search by keeping track of the number of nodes explored, which is the same as the number of moves explored, during a game. Do this by adding a counter named `movesExplored` to the `TTT` constructor, where it is initialized to 0, and incrementing the counter in the `TTT.makeMove` method.  Add a method `getNumberMovesExplored()` to `TTT` that returns its current value.  So

    print('Number of moves explored', game.getNumberMovesExplored())
    
will print the number of moves explored. You will not use a global variable for counting this time.

## negamaxIDS 

<font color='red'>UPDATED Oct 4</font>

Write a new function named `negamaxIDS` that performs an iterative deepening negamax search.  Replace the first line in the `while` loop of `playGame` with

        score,move = negamaxIDS(game,depthLimit)
        
where `depthLimit` is now the maximum depth and multiple `negamax` searches are performed for depth limits of $1, 2, \ldots,$ maximum depth.

But when should you stop?  Can you stop before reaching `depthLimit`?  If not, there is no point in doing iterative deepening.

For Tic-Tac-Toe, we can stop as soon as a call to `negamax` returns a winning move.  This will have a value of 1 for Tic-Tac-Toe.  To keep our `negamaxIDS` function general, add a method called `getWinningValue` to the `TTT` class that just returns 1.  Then `negamaxIDS` can call `game.getWinningValue()` to determine the value of a winning move for this game.  If the maximum depth is reached and no winning move has been found, return the best move found over all depth limits.
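
The stopping rule described above can be sketched independently of any game. This is only an illustration of the control flow, not the required solution: `iterativeDeepening`, `fakeSearch`, and the table of per-depth results are hypothetical names and made-up values.

```python
def iterativeDeepening(search, maxDepth, winningValue):
    """Call search(depth) for depth = 1 .. maxDepth and keep the best result."""
    bestValue, bestMove = None, None
    for depth in range(1, maxDepth + 1):
        value, move = search(depth)
        if value is None:
            # The search was cut off before reaching any terminal state.
            continue
        if bestValue is None or value > bestValue:
            bestValue, bestMove = value, move
        if value == winningValue:
            # A winning move was found, so deeper searches are unnecessary.
            break
    return bestValue, bestMove


# Made-up search results for illustration only: depth -> (value, move)
fakeResults = {1: (None, None), 2: (0, 4), 3: (1, 6), 4: (1, 6), 5: (1, 6)}
depthsTried = []


def fakeSearch(depth):
    depthsTried.append(depth)
    return fakeResults[depth]


result = iterativeDeepening(fakeSearch, 5, winningValue=1)
print(result, depthsTried)
```

With these made-up values the loop stops at depth 3, the first depth that reports the winning value, and never runs depths 4 and 5.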

In [5]:
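def negamaxIDS(game, maxDepth):
    bestValue, bestMove = None, None
    # range(maxDepth) tries depth limits 0, 1, ..., maxDepth - 1.
    for depth in range(maxDepth):
        bestValue, bestMove = negamax(game,depth)
        # Stop early as soon as a search returns a winning move.
        if bestValue == game.getWinningValue():
            return bestValue, bestMove
    # No win found: return the result of the deepest search.
    return bestValue, bestMove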


## negamaxIDSab

In [6]:
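def negamaxIDSab(game, maxDepth):
    def dls(depth, alpha, beta):
        # Depth-limited negamax with alpha-beta pruning.
        bestValue, bestMove = None, None
        # The caller passes its own window unchanged, so negate and swap it
        # here to express it from the point of view of the player to move.
        alpha, beta = -beta, -alpha
        if game.isOver() or depth == 0:
            return game.getUtility(), None
        for move in game.getMoves():
            game.makeMove(move)
            value, _ = dls(depth-1, alpha, beta)
            game.unmakeMove(move)
            if value is None:
                continue
            value = -value
            if bestValue is None or value > bestValue:
                bestValue, bestMove = value, move
            # Prune: the opponent already has a better alternative elsewhere.
            if bestValue >= beta:
                break
            alpha = max(alpha, bestValue)
        return bestValue, bestMove
    bestValue, bestMove = None, None
    # range(maxDepth) tries depth limits 0, 1, ..., maxDepth - 1.
    for depth in range(maxDepth):
        bestValue, bestMove = dls(depth,float('-inf'),float('inf'))
        # Stop early as soon as a search returns a winning move.
        if bestValue == game.getWinningValue():
            return bestValue, bestMove
    return bestValue, bestMove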

Now for the hardest part.  Make a new function `negamaxIDSab` by duplicating `negamaxIDS` and adding the code to implement alpha-beta pruning.
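
To see what pruning buys, here is a minimal sketch that runs the same depth-limited negamax with and without the alpha-beta cutoff and compares the move counters. The `Nim` class and the `search` function are hypothetical and not part of the assignment; `Nim` only mimics the method names used by `TTT`. The sketch uses the common formulation in which the window is negated and swapped at the call site (`-beta, -alpha`), a different arrangement from the `dls` helper above.

```python
class Nim:
    """Hypothetical one-pile Nim: take 1-3 stones, taking the last stone wins."""

    def __init__(self, stones):
        self.stones = stones
        self.movesExplored = 0

    def isOver(self):
        return self.stones == 0

    def getUtility(self):
        # With no stones left, the player to move has already lost.
        return -1 if self.stones == 0 else None

    def getMoves(self):
        return [n for n in (1, 2, 3) if n <= self.stones]

    def makeMove(self, move):
        self.movesExplored += 1
        self.stones -= move

    def unmakeMove(self, move):
        self.stones += move


def search(game, depthLeft, alpha, beta, prune):
    """Negamax; applies the alpha-beta cutoff only when prune is True."""
    if game.isOver() or depthLeft == 0:
        return game.getUtility(), None
    bestValue, bestMove = None, None
    for move in game.getMoves():
        game.makeMove(move)
        # The child sees the window negated and swapped.
        value, _ = search(game, depthLeft - 1, -beta, -alpha, prune)
        game.unmakeMove(move)
        if value is None:
            continue
        value = -value
        if bestValue is None or value > bestValue:
            bestValue, bestMove = value, move
        alpha = max(alpha, bestValue)
        if prune and alpha >= beta:
            break  # the opponent will never allow this line
    return bestValue, bestMove


inf = float('inf')
plain, pruned = Nim(10), Nim(10)
plainResult = search(plain, 10, -inf, inf, prune=False)
prunedResult = search(pruned, 10, -inf, inf, prune=True)
print(plainResult, plain.movesExplored)
print(prunedResult, pruned.movesExplored)
```

Both searches should agree on the value and move at the root, while the pruned search makes fewer calls to `makeMove`.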

## playGames

Now duplicate the game-playing loop so that three complete tic-tac-toe games are played.  Call this new version `playGames`. For the first game, use `negamax`. For the second game, use `negamaxIDS`.  For the third game, use `negamaxIDSab`.  At the end of each game, print the number of X's in the final board, the number of moves explored, the depth of the game (the number of moves made by X and O), and the effective branching factor.  When you run `playGames` you should see the tic-tac-toe positions after each move and, after all games are done, a line for each game like the following lines, which were <font color='red'>UPDATED Oct 8</font>.

    negamax made 4 moves. 558334 moves explored for ebf(558334, 7) of 6.46
    negamaxIDS made 3 moves. 23338 moves explored for ebf(23338, 5) of 7.26
    negamaxIDSab made 3 moves 6053 moves explored for ebf(6053, 5) of 5.48

Your results may be different. 

The value of the depth is the total number of moves made by X and by O during the search.  You can calculate this by keeping a list of all board states, or by just counting the number of X's and O's in the final board.
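
One common definition of the effective branching factor takes $b$ to be the solution of $1 + b + b^2 + \cdots + b^d = n$ for $n$ nodes and depth $d$. The sample figure `ebf(558334, 7)` of 6.46 above appears consistent with this definition, whereas the simpler $n^{1/d}$ gives a somewhat larger number for the same inputs. The sketch below solves the series equation by bisection; `ebfSeries` and its `iterations` parameter are hypothetical names, not part of the assignment.

```python
def ebfSeries(nNodes, depth, iterations=100):
    """Solve 1 + b + ... + b**depth = nNodes for b by bisection."""
    if depth == 0 or nNodes < 1:
        return 0.0

    def total(b):
        # Number of nodes in a uniform tree with branching factor b.
        return sum(b ** i for i in range(depth + 1))

    # total(0) is 1 and total(nNodes) exceeds nNodes, so the root lies between.
    lo, hi = 0.0, float(nNodes)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if total(mid) < nNodes:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(ebfSeries(7, 2))        # 1 + 2 + 4 = 7, so b should be close to 2
print(ebfSeries(558334, 7))   # compare with the sample line above
```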

In [7]:
def playGames(opponent,depthLimit):
    def play(game,algorithm):
        print(game)
        while not game.isOver():
            score,move = algorithm(game,depthLimit)
            if move is None:
                print('move is None. Stopping.')
                break
            game.makeMove(move)
            print('Player', game.player, 'to', move, 'for score' ,score)
            print(game)
            if not game.isOver():
                game.changePlayer()
                opponentMove = opponent(game.board)
                game.makeMove(opponentMove)
                print('Player', game.player, 'to', opponentMove)   ### FIXED ERROR IN THIS LINE!
                print(game)
                game.changePlayer()
    
    def ebf (nNodes, depth):
        return 0 if depth == 0 else nNodes ** (1 / depth)

    formatString = '{} made {} moves. {} moves explored for ebf({}, {}) of {}'
    
    print('Negamax')
    negamax_game = TTT()
    play(negamax_game,negamax)
    negamax_playerXMoves = negamax_game.board.count('X')
    negamax_moves = negamax_game.board.count('X') + negamax_game.board.count('O')
    negamax_explored = negamax_game.getNumberMovesExplored()
    negamax_ebf = ebf(negamax_explored, negamax_moves)
    
    print('Negamax Iterative Deepening')
    negamaxIDS_game = TTT()
    play(negamaxIDS_game,negamaxIDS)
    negamaxIDS_playerXMoves = negamaxIDS_game.board.count('X')
    negamaxIDS_moves = negamaxIDS_game.board.count('X') + negamaxIDS_game.board.count('O')
    negamaxIDS_explored = negamaxIDS_game.getNumberMovesExplored()
    negamaxIDS_ebf = ebf(negamaxIDS_explored, negamaxIDS_moves)
    
    print('Negamax Iterative Deepening with Alpha Beta Pruning')
    negamaxIDSab_game = TTT()
    play(negamaxIDSab_game,negamaxIDSab)
    negamaxIDSab_playerXMoves = negamaxIDSab_game.board.count('X')
    negamaxIDSab_moves = negamaxIDSab_game.board.count('X') + negamaxIDSab_game.board.count('O')
    negamaxIDSab_explored = negamaxIDSab_game.getNumberMovesExplored()
    negamaxIDSab_ebf = ebf(negamaxIDSab_explored, negamaxIDSab_moves)
    
    
    print(formatString.format('negamax', negamax_playerXMoves, negamax_explored,
                              negamax_explored, negamax_moves, negamax_ebf))
    print(formatString.format('negamaxIDS', negamaxIDS_playerXMoves, negamaxIDS_explored,
                              negamaxIDS_explored, negamaxIDS_moves, negamaxIDS_ebf))
    print(formatString.format('negamaxIDSab', negamaxIDSab_playerXMoves, negamaxIDSab_explored,
                              negamaxIDSab_explored, negamaxIDSab_moves, negamaxIDSab_ebf))

Here are some example results. <font color='red'>Updated October 8, 3:15pm </font>

In [11]:
playGames(opponent, 9)

Negamax
X|X| 
-----
X|O|O
-----
 | | 
Player O to 2 for score -1
X|X|O
-----
X|O|O
-----
 | | 
Player X to 8
X|X|O
-----
X|O|O
-----
 | |X
Player O to 6 for score 1
X|X|O
-----
X|O|O
-----
O| |X
Negamax Iterative Deepening
X|X| 
-----
X|O|O
-----
 | | 
Player O to 2 for score -1
X|X|O
-----
X|O|O
-----
 | | 
Player X to 8
X|X|O
-----
X|O|O
-----
 | |X
Player O to 6 for score 1
X|X|O
-----
X|O|O
-----
O| |X
Negamax Iterative Deepening with Alpha Beta Pruning
X|X| 
-----
X|O|O
-----
 | | 
Player O to 2 for score -1
X|X|O
-----
X|O|O
-----
 | | 
Player X to 8
X|X|O
-----
X|O|O
-----
 | |X
Player O to 6 for score 1
X|X|O
-----
X|O|O
-----
O| |X
negamax made 4 moves. 40 moves explored for ebf(40, 8) of 1.5858331751372434
negamaxIDS made 4 moves. 223 moves explored for ebf(223, 8) of 1.96579446486959
negamaxIDSab made 4 moves. 91 moves explored for ebf(91, 8) of 1.757438653093661


## Grading

As always, download and extract from [A4grader.tar](http://www.cs.colostate.edu/~anderson/cs440/notebooks/A4grader.tar)

In [9]:
%run -i A4grader.py


Testing negamax starting from ['O', 'X', ' ', 'O', ' ', ' ', ' ', 'X', ' ']

--- 10/10 points. negamax correctly returns value of 1

--- 10/10 points. negamax correctly explored 124 states.

Testing negamax starting from ['O', 'X', 'X', 'O', 'O', ' ', ' ', 'X', ' ']

--- 10/10 points. negamax correctly returns value of -1 and move of 5

Testing negamaxIDS with max depth of 5, starting from ['O', 'X', 'X', 'O', 'O', ' ', ' ', 'X', ' ']

--- 10/10 points. negamaxIDS correctly returns value of -1 and move of 5

Testing negamaxIDSab starting from ['O', 'X', 'X', 'O', 'O', ' ', ' ', 'X', ' ']

--- 20/20 points. negamaxIDSab correctly returns value of -1 and move of 5

Testing playGame with opponent that always plays in highest numbered position.
 | | 
-----
 | | 
-----
 | | 
Player X to 0 for score 0
X| | 
-----
 | | 
-----
 | | 
Player O to 8
X| | 
-----
 | | 
-----
 | |O
Player X to 2 for score 1
X| |X
-----
 | | 
-----
 | |O
Player O to 7
X| |X
-----
 | | 
-----
 |O|O
Player X to 1 for 

## Extra Credit 

Earn one extra credit point for each of the following.

  - Implement another game and repeat the above steps.

  - Implement a random move chooser as the opponent (Player O) and determine how many times Player X can win against this opponent as an average over multiple games.
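
A random move chooser can keep the same signature as `opponent(board)`, so it can be passed to `playGame` unchanged. This is a minimal sketch; `randomOpponent` is a hypothetical name, and it assumes open squares are marked with `' '` as in `TTT`. Averaging wins over many games would additionally require `playGame` to report who won, which the version above does not do.

```python
import random


def randomOpponent(board):
    """Return the index of a randomly chosen open square."""
    openSquares = [i for i, mark in enumerate(board) if mark == ' ']
    return random.choice(openSquares)


board = ['X', 'O', ' ', ' ', 'X', ' ', 'O', ' ', ' ']
move = randomOpponent(board)
print(move)
```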