# Game theory: Prisoner’s dilemma

Simultaneous games, in which players choose their strategies at the same time, are used in many fields. From military strategy to collusion agreements, analysing these situations as simultaneous games can help us discover the best way to act.

The prisoner’s dilemma is probably the most widely used game in game theory. Its use has spread beyond economics to fields such as business management, psychology and biology, to name a few. Named in 1950 by Albert W. Tucker, who developed it from earlier work, it describes a situation where two prisoners, suspected of burglary, are taken into custody. However, the police do not have enough evidence to convict them of that crime, only of the lesser charge of possession of stolen goods.

## Prisoner's dilemma
If neither of them confesses (they cooperate with each other), they will both receive the lesser sentence, a year in prison each. The police will question them in separate interrogation rooms, which means that the two prisoners cannot communicate (hence imperfect information). The police will try to convince each prisoner to confess by offering a “get out of jail free card”, while the other prisoner will be sentenced to a ten-year term. If both prisoners confess (and therefore defect), each will be sentenced to eight years. Both prisoners are offered the same deal and know the consequences of each action (complete information), and each is fully aware that the other has been offered the exact same deal (therefore, it’s common knowledge).



<table style="width:100%">
  <tr>
    <th><img src="photos/Prisoners-dilemma.jpg" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

## Nash equilibria

A Nash equilibrium is a combination of strategies in a game such that no player has an incentive to deviate from their choice. Each player's strategy is the best available given the other players’ decisions, so a unilateral change cannot lead to a better result as long as the other players stick to their strategies. One of the best-known Nash equilibria can be found in the prisoner’s dilemma. The concept belongs to game theory, specifically to non-cooperative games, and is named after John Nash, who developed it.

There are a few consistency requirements that must be taken into account when dealing with Nash equilibria. One of them is known as common knowledge, which extends the requirement of complete information. Therefore, expectations about the other players’ strategies must be rational.

A Nash equilibrium is therefore a combination of beliefs about probabilities over strategies and the choices of the other player. It is quite easy to understand this using an example, in this case the prisoner’s dilemma as depicted in the adjacent game matrix.

## Description:

Since the prisoners cannot communicate and will (supposedly) make their decisions at the same time, this is considered a simultaneous game and can be analysed using the strategic form, as in the adjacent game matrix. As described before, if both prisoners confess the crime, they will each receive an eight-year sentence. If neither confesses, they will receive one year each. If only one confesses, that prisoner will go free, while the other will receive a ten-year sentence. These can be seen as the respective payoffs for each set of strategies.

This game can be solved by eliminating all dominated strategies in order to find the dominant strategy. That is, each prisoner analyses their best strategy given the other prisoner’s possible strategies. Prisoner 1 (P1) has to form a belief about what choice P2 is going to make, in order to choose the best strategy. If P2 confesses (P2C), he will get either -8 or 0, and if he lies (P2L) he will get either -10 or -1. It is easy to see that P2 will choose to confess, since he is better off whatever P1 does. Therefore, P1 must choose the best strategy given that P2 will confess: P1 can either confess (P1C, which pays -8) or lie (P1L, which pays -10). The rational thing for P1 to do is to confess. Proceeding inversely, we analyse P2’s beliefs about P1’s strategies, which leads to the same point: the rational thing for P2 to do is to confess. Therefore, “to confess” is the dominant strategy. (P1C, P2C) is the Nash equilibrium of this game (underlined in red), since it is the set of strategies that maximises each prisoner’s utility given the other prisoner’s strategy.
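
The best-response reasoning above can be checked mechanically. The following is a minimal sketch, assuming payoffs are written as negative years in prison and using the labels `"C"` (confess) and `"L"` (lie); the names `PAYOFFS`, `best_responses`, `dominant_moves` and `pure_nash_equilibria` are illustrative and not part of the notebook's own classes.

```python
from itertools import product

# PAYOFFS[(p1_move, p2_move)] = (p1_payoff, p2_payoff), in negative years of prison.
# "C" = confess, "L" = lie (stay silent), following the P1C / P1L notation in the text.
PAYOFFS = {
    ("C", "C"): (-8, -8),
    ("C", "L"): (0, -10),
    ("L", "C"): (-10, 0),
    ("L", "L"): (-1, -1),
}
MOVES = ("C", "L")


def best_responses(player, other_move):
    """Moves that maximise `player`'s payoff (0 = P1, 1 = P2) against a fixed opposing move."""
    def payoff(move):
        profile = (move, other_move) if player == 0 else (other_move, move)
        return PAYOFFS[profile][player]

    best = max(payoff(m) for m in MOVES)
    return {m for m in MOVES if payoff(m) == best}


def dominant_moves(player):
    """Moves that are a best response to every possible opposing move."""
    return set.intersection(*(best_responses(player, other) for other in MOVES))


def pure_nash_equilibria():
    """Profiles in which each move is a best response to the other."""
    return [
        (m1, m2)
        for m1, m2 in product(MOVES, MOVES)
        if m1 in best_responses(0, m2) and m2 in best_responses(1, m1)
    ]


print(dominant_moves(0))       # {'C'}
print(pure_nash_equilibria())  # [('C', 'C')]
```

Confessing is a best response to both of the other prisoner's moves, so it is dominant, and the only profile that survives the mutual best-response test is confess-confess.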

Nash equilibria can be used to predict the outcome of finite games, whenever such an equilibrium exists. The downside is that a Nash equilibrium need not be socially or ethically desirable, and its efficiency may be subjective. This is the case in the prisoner’s dilemma, where the Nash equilibrium is not Pareto optimal (the Pareto-optimal outcome is underlined in green).
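
To make the Pareto claim concrete, here is a small sketch that lists the outcomes not Pareto-dominated by any other outcome. It assumes the same payoffs as the example (negative years in prison); `pareto_dominates` and `pareto_optimal_profiles` are hypothetical helper names introduced only for this illustration.

```python
# PAYOFFS[(p1_move, p2_move)] = (p1_payoff, p2_payoff); "C" = confess, "L" = lie.
PAYOFFS = {
    ("C", "C"): (-8, -8),
    ("C", "L"): (0, -10),
    ("L", "C"): (-10, 0),
    ("L", "L"): (-1, -1),
}


def pareto_dominates(a, b):
    """True when `a` is at least as good as `b` for everyone and strictly better for someone."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_optimal_profiles(payoffs):
    """Profiles whose payoff pair is not Pareto-dominated by any other profile."""
    return [
        profile
        for profile, pay in payoffs.items()
        if not any(pareto_dominates(other, pay) for other in payoffs.values())
    ]


print(pareto_optimal_profiles(PAYOFFS))
# [('C', 'L'), ('L', 'C'), ('L', 'L')]
```

Confess-confess is the only outcome missing from the list: both prisoners would be strictly better off at lie-lie, which is exactly why the Nash equilibrium is not Pareto optimal.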



<table style="width:100%">
  <tr>
    <th><img src="photos/Prisoners-dilemma-Nash-and-Pareto-equilibria-600x336.jpg" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

## Generalisation of the game:

The prisoner’s dilemma is not always presented as we have seen in this case. Payoffs for each set of strategies will vary, depending on each person. However, there are a few rules that can be used to build a “proper” prisoner’s dilemma game.

In the adjacent game matrix, we’ve renamed each player’s payoffs in order to determine the conditions needed to design a prisoner’s dilemma game. In a traditional prisoner’s dilemma, we have A > B > C > D (in absolute terms). In our previous example, this condition is met (A=10, B=8, C=1 and D=0). In every case, A>B and C>D imply that confess-confess is a Nash equilibrium.
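
The ordering condition is easy to test for any candidate set of sentences. The sketch below assumes a labelling inferred from the example values (A=10, B=8, C=1, D=0), since the matrix itself is only shown as an image; the function names are hypothetical.

```python
def is_prisoners_dilemma(a, b, c, d):
    """Check the ordering A > B > C > D on sentence lengths (in years).

    Assumed labelling, inferred from the example values in the text:
      a: sentence for staying silent while the other confesses
      b: sentence for each prisoner when both confess
      c: sentence for each prisoner when both stay silent
      d: sentence for confessing while the other stays silent
    """
    return a > b > c > d


def confess_is_dominant(a, b, c, d):
    """Confessing is better whether the other confesses (b < a) or stays silent (d < c)."""
    return b < a and d < c


print(is_prisoners_dilemma(10, 8, 1, 0))   # True
print(confess_is_dominant(10, 8, 1, 0))    # True

# With C = 9, confessing is still dominant, but mutual silence (9 years) is now
# worse than mutual confession (8 years), so there is no dilemma.
print(confess_is_dominant(10, 8, 9, 0))    # True
print(is_prisoners_dilemma(10, 8, 9, 0))   # False
```

The second pair of calls shows why the middle inequality B > C matters: A>B and C>D alone make confess-confess an equilibrium, but the equilibrium is only socially inferior when mutual silence gives a shorter sentence than mutual confession.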

It must be noted that the asymmetry of the game is not the important part of the prisoner’s dilemma. The interesting thing about this game is the fact that its Nash equilibrium is not socially optimal.



<table style="width:100%">
  <tr>
    <th><img src="photos/Prisoners-dilemma-Structure-600x336.jpg" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

## Repeated prisoner’s dilemma games:

In order to see what equilibrium will be reached in a repeated game of the prisoner’s dilemma kind, we must analyse two cases: the game is repeated a finite number of times, and the game is repeated an infinite number of times.

When the prisoners know the number of repetitions, we can solve the game by backward induction. Consider the strategies of each player when they realise the next round is going to be the last. They behave as if it were a one-shot game, so the Nash equilibrium applies and the outcome is confess-confess, just as in the one-time game. Now consider the round before the last. Since each player knows that in the next, final round they are both going to confess, there is no advantage to lying (cooperating with each other) in this round either. The same logic applies to all prior rounds. Therefore, confess-confess is the Nash equilibrium for all rounds.

The situation with an infinite number of repetitions is different: since there is no last round, backward induction does not work. At each round, both prisoners reckon there will be another round, and therefore there are always benefits arising from the cooperate (lie) strategy. However, prisoners must take into account punishment strategies, in case the other player confesses in any round.
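
One way to picture a punishment strategy is the so-called grim trigger: stay silent until the other prisoner confesses once, then confess in every later round. The sketch below is only an illustration under stated assumptions: it uses the sentence lengths from the example, truncates play to 10 rounds to stand in for a long game, and the strategy and function names are hypothetical.

```python
# YEARS[(p1_move, p2_move)] = (p1_years, p2_years); "C" = confess, "L" = lie.
YEARS = {
    ("C", "C"): (8, 8),
    ("C", "L"): (0, 10),
    ("L", "C"): (10, 0),
    ("L", "L"): (1, 1),
}


def always_confess(own_history, other_history):
    """Confess in every round, regardless of what has happened so far."""
    return "C"


def grim_trigger(own_history, other_history):
    """Stay silent until the other prisoner confesses once, then confess forever."""
    return "C" if "C" in other_history else "L"


def play(strategy1, strategy2, rounds):
    """Play the stage game `rounds` times and return the total years for each prisoner."""
    history1, history2 = [], []
    total1 = total2 = 0
    for _ in range(rounds):
        move1 = strategy1(history1, history2)
        move2 = strategy2(history2, history1)
        years1, years2 = YEARS[(move1, move2)]
        total1 += years1
        total2 += years2
        history1.append(move1)
        history2.append(move2)
    return total1, total2


print(play(grim_trigger, grim_trigger, 10))      # (10, 10)
print(play(grim_trigger, always_confess, 10))    # (82, 72)
print(play(always_confess, always_confess, 10))  # (80, 80)
```

Against a grim-trigger opponent, the prisoner who always confesses gains one free round and then serves eight years in every remaining round (72 in total), far more than the 10 years accumulated by cooperating throughout. With a known, finite number of rounds the backward-induction argument still applies; the fixed horizon here is just a convenient way to show the cost of being punished.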

In [7]:
# The player class introduces an artificial agent that implements the
# Nash decision-making process of game theory.
#
# The game class defines the structure of the game and calculates its
# pure-strategy Nash equilibria.
#
# The current example applies the above classes to solve the two-person Prisoner's Dilemma game.

class player:
    
    def __init__(self,name,order,strategySpace,payoffs,choice,suboptimal,strategies,state,gameplay):
        self.name = name
        self.order = order
        self.strategySpace = strategySpace
        self.payoffs = payoffs
        self.choice = choice
        self.suboptimal = suboptimal
        self.strategies = strategies
        self.state = state
        self.gameplay = gameplay


    def processGame(self,G):
        for i in range(0,len(G)):
            X = G[i]
            if X[0] == self.name:
                for j in range(1,len(X)):
                    Branch = X[j]
                    Alternative = list(Branch)
                    del Alternative[len(Alternative) - 1]
                    self.strategySpace = self.strategySpace + [tuple(Alternative)]
                    self.payoffs = self.payoffs + [Branch[len(Branch) - 1]]
        
    def evaluate(self):
        X = []
        for i in range(0,len(self.strategySpace)):            
            Alternative1 = self.strategySpace[i]
            for j in range(0,len(self.strategySpace)):
                Alternative2 = self.strategySpace[j]
                if Alternative1 != Alternative2:
                    if len(Alternative1) == len(Alternative2):
                        Compare = 0
                        for k in range(0,len(Alternative1) - 1):
                            if Alternative1[k] == Alternative2[k]:
                                Compare = Compare + 0
                            else:
                                Compare = Compare + 1
                        if Compare == 0:
                            PayoffCompare = [self.payoffs[i],self.payoffs[j]]
                            M = max(PayoffCompare)
                            if self.payoffs[i] == M:
                                self.choice = Alternative1
                                X = X + [self.choice]
                            else:
                                self.suboptimal = self.suboptimal + [Alternative1]
                            if self.payoffs[j] == M:
                                self.choice = Alternative2
                                X = X + [self.choice]
                            else:
                                self.suboptimal = self.suboptimal + [Alternative2]
            
        X = set(X)
        self.suboptimal = set(self.suboptimal)
        self.strategies = list(X - self.suboptimal)
        print ("\nStrategies selected by ", self.name,":")
        print (self.strategies)
        for l in range(0,len(self.strategies)):
            strategy = self.strategies[l]
            for m in range(0,len(strategy)):
                O = self.order[m]
                self.state[O] = strategy[m]
            self.gameplay = self.gameplay + [tuple(self.state)]
                    
        
        

class game:
    def __init__(self,players,structure,optimal):
        self.players = players
        self.structure = structure
        self.optimal = optimal

    def Nash(self,GP):
        Y = set(GP[0])
        for i in range(0,len(GP)):
            X = set(GP[i])
            Y = Y & X
        self.optimal = list(Y)
        if len(self.optimal) != 0:
            print ("\nThe pure strategies Nash equilibria are:")
            for k in range(0,len(self.optimal)):
                print (self.optimal[k])
        else:
            print ("\nThis game has no pure strategies Nash equilibria!")
 

In [8]:
           
#2 Person Prisoner Dilemma

# This version of the prisoner dilemma is based on a cooperation game between two political
# agents, such that each agent can either cooperate forming a partnership or defect, 
# in which case no partnership is formed. The payoffs are measured in units of political
# gains by each agent. The political gains are being measured on a scale of 0 to 5.

#Game structure:

GameA = ['A', # Player A
         ('C','C',3), # When B cooperates: if A cooperates, A receives 3 units of political gains
         ('C','D',5), # When B cooperates: if A defects, A receives 5 units of political gains
         ('D','C',0), # When B defects: if A cooperates, A receives 0 units of political gains
         ('D','D',1)] # When B defects: if A defects, A receives 1 unit of political gains
GameB = ['B', # Player B
         ('C','C',3), # When A cooperates: if B cooperates, B receives 3 units of political gains
         ('C','D',5), # When A cooperates: if B defects, B receives 5 units of political gains
         ('D','C',0), # When A defects: if B cooperates, B receives 0 units of political gains
         ('D','D',1)] # When A defects: if B defects, B receives 1 unit of political gains

#game(players,structure,optimal)
Game = game(('A','B'),[GameA,GameB],None)


#player(self,name,order,strategySpace,payoffs,choice,suboptimal,strategies,state,gameplay):
PlayerA = player('A',(1,0), [], [], None, [], None, [0,0], [])
PlayerB = player('B',(0,1), [], [], None, [], None, [0,0], [])


Players = [PlayerA, PlayerB]

for i in range(0,len(Players)):
    Players[i].processGame(Game.structure)
    Players[i].evaluate()

GP = []

for i in range(0,len(Players)):
    X = Players[i].gameplay
    GP = GP + [X]

Game.Nash(GP)


Strategies selected by  A :
[('C', 'D'), ('D', 'D')]

Strategies selected by  B :
[('C', 'D'), ('D', 'D')]

The pure strategies Nash equilibria are:
('D', 'D')


# Minimax Algorithm
## Minimax Algorithm in Game Theory | (Introduction)


Minimax is a kind of backtracking algorithm that is used in decision making and game theory to find the optimal move for a player, assuming that your opponent also plays optimally. It is widely used in two player turn-based games such as Tic-Tac-Toe, Backgammon, Mancala, Chess, etc.

In Minimax the two players are called maximizer and minimizer. The maximizer tries to get the highest score possible while the minimizer tries to do the opposite and get the lowest score possible.

Every board state has a value associated with it. In a given state, if the maximizer has the upper hand, the score of the board will tend to be some positive value. If the minimizer has the upper hand in that board state, it will tend to be some negative value. The values of the board are calculated by heuristics which are unique to every type of game.

**Example:** <br>
Consider a game which has 4 final states, where the paths to the final states run from the root to the 4 leaves of a perfect binary tree, as shown below. Assume you are the maximizing player and you get the first chance to move, i.e., you are at the root and your opponent is at the next level. Which move would you make as the maximizing player, considering that your opponent also plays optimally?

<table style="width:100%">
  <tr>
    <th><img src="photos/minmax.png" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

Since this is a backtracking based algorithm, it tries all possible moves, then backtracks and makes a decision.

- Maximizer goes LEFT: It is now the minimizer's turn. The minimizer has a choice between 3 and 5. Being the minimizer, it will definitely choose the smaller of the two, that is, 3.
- Maximizer goes RIGHT: It is now the minimizer's turn. The minimizer has a choice between 2 and 9. It will choose 2, as it is the smaller of the two values.

Being the maximizer, you would choose the larger value, that is, 3. Hence the optimal move for the maximizer is to go LEFT, and the optimal value is 3.

Now the game tree looks as follows:

<table style="width:100%">
  <tr>
    <th><img src="photos/minmax1.png" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>
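
The four-leaf example can be reproduced in a few lines. This is a minimal sketch that assumes the tree is written as nested Python lists, with leaves as plain numbers; it is a separate illustration and not the array-indexed version used in the next cell.

```python
def minimax(node, is_maximizer):
    """Return the minimax value of `node`, which is a leaf score or a list of child nodes."""
    if not isinstance(node, list):
        return node  # leaf: its score is its value
    child_values = [minimax(child, not is_maximizer) for child in node]
    return max(child_values) if is_maximizer else min(child_values)


# Root (maximizer) -> LEFT subtree with leaves 3 and 5, RIGHT subtree with leaves 2 and 9.
tree = [[3, 5], [2, 9]]
print(minimax(tree, True))  # 3
```

The minimizer turns the LEFT subtree into 3 and the RIGHT subtree into 2, so the maximizer picks LEFT for a value of 3.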

In [1]:
# A simple Python3 program to find 
# maximum score that 
# maximizing player can get 
import math 
  
def minimax (curDepth, nodeIndex, 
             maxTurn, scores,  
             targetDepth): 
  
    # base case : targetDepth reached 
    if (curDepth == targetDepth):  
        return scores[nodeIndex] 
      
    if (maxTurn): 
        return max(minimax(curDepth + 1, nodeIndex * 2,  
                    False, scores, targetDepth),  
                   minimax(curDepth + 1, nodeIndex * 2 + 1,  
                    False, scores, targetDepth)) 
      
    else: 
        return min(minimax(curDepth + 1, nodeIndex * 2,  
                     True, scores, targetDepth),  
                   minimax(curDepth + 1, nodeIndex * 2 + 1,  
                     True, scores, targetDepth)) 

In [3]:
# Driver code 
scores = [3, 5, 2, 9, 12, 5, 23, 23] 
  
treeDepth = int(math.log2(len(scores)))  # integer depth; avoids comparing an int with a float
  
print("The optimal value is : ", end = "") 
print(minimax(0, 0, True, scores, treeDepth)) 

The optimal value is : 12


## Minimax Algorithm in Game Theory | (Introduction to Evaluation Function)

As seen in the previous section, each leaf node had a value associated with it, which we stored in an array. But in the real world, when we are creating a program to play Tic-Tac-Toe, Chess, Backgammon, etc., we need to implement a function that calculates the value of the board depending on the placement of pieces on the board. This function is often known as the evaluation function. It is sometimes also called a heuristic function.

The evaluation function is unique for every type of game. In this section, the evaluation function for Tic-Tac-Toe is discussed. The basic idea behind the evaluation function is to give a high value to a board that favours the maximizer and a low value to a board that favours the minimizer.

For this scenario let us consider X as the maximizer and O as the minimizer.

Let us build our evaluation function :
 1. If X wins on the board we give it a positive value of +10.
 2. If O wins on the board we give it a negative value of -10.
 3. If no one has won or the game results in a draw then we give a value of +0.


<table style="width:100%">
  <tr>
    <th><img src="photos/TicTacToe.png" alt="Drawing" style="width:600px;"/></th>
    <th><img src="photos/TicTacToe1.png" alt="Drawing" style="width:600px;"/></th>
    <th><img src="photos/TicTacToe2-1.png" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

In [24]:
# Python program to compute evaluation function for 
# Tic Tac Toe Game. 

# Returns a value based on who is winning 
# board is the Tic-Tac-Toe board 
def evaluate(board):
    #  Checking for Rows for X or O victory.
    for row in board: 
        if row[0] == row[1] and row[1] == row[2]:
            if row[0] == 'x':
                return 10
            elif row[0] == 'o':
                return -10
    
    #  Checking for Columns for X or O victory.
    for columns in range(len(board)): 
        if board[0][columns] == board[1][columns] and board[1][columns] == board[2][columns]:
            if board[0][columns] == 'x':
                return 10
            elif board[0][columns] == 'o':
                return -10
            
    #  Checking for Diagonals for X or O victory.
    if board[0][0] == board[1][1] and board[1][1] == board[2][2]:
        if board[0][0] == 'x':
            return 10
        elif board[0][0] == 'o':
            return -10
        
    if board[0][2] == board[1][1] and board[1][1] == board[2][0]:
        if board[0][2] == 'x':
            return 10
        elif board[0][2] == 'o':
            return -10
        
    #  Else if none of them have won then return 0
    
    return 0

In [25]:
# Driver code 
board = [['x', '_', 'o'], ['_', 'x', 'o'], ['_', '_', 'x']]
  
value = evaluate(board) 
print("The value of this board is {0}".format(value)) 

The value of this board is 10


## Minimax Algorithm in Game Theory | (Tic-Tac-Toe AI – Finding optimal move)

Let us combine what we have learnt so far about minimax and the evaluation function to write a proper Tic-Tac-Toe AI (Artificial Intelligence) that plays a perfect game. This AI will consider all possible scenarios and make the optimal move.

### Finding the Best Move :

We shall be introducing a new function called findBestMove(). This function evaluates all the available moves using minimax() and then returns the best move the maximizer can make. The pseudocode is as follows :

    function findBestMove(board):
        bestMove = NULL
        for each move in board :
            if current move is better than bestMove
                bestMove = current move
        return bestMove
### Minimax :

To check whether or not the current move is better than the best move, we use the minimax() function, which considers all the possible ways the game can go and returns the best value for that move, assuming the opponent also plays optimally.

The code for the maximizer and minimizer in the minimax() function is similar to findBestMove(); the only difference is that, instead of returning a move, it returns a value. Here is the pseudocode :

    function minimax(board, depth, isMaximizingPlayer):

        if current board state is a terminal state :
            return value of the board
    
        if isMaximizingPlayer :
            bestVal = -INFINITY 
            for each move in board :
                value = minimax(board, depth+1, false)
                bestVal = max( bestVal, value) 
            return bestVal

        else :
            bestVal = +INFINITY 
            for each move in board :
                value = minimax(board, depth+1, true)
                bestVal = min( bestVal, value) 
            return bestVal
### Checking for GameOver state :

To check whether the game is over and to make sure there are no moves left we use isMovesLeft() function. It is a simple straightforward function which checks whether a move is available or not and returns true or false respectively. Pseudocode is as follows :


    function isMovesLeft(board):
        for each cell in board:
            if current cell is empty:
                return true
        return false
    
### Making our AI smarter :

One final step is to make our AI a little bit smarter. Even though the AI described so far plays perfectly, it might choose a move which results in a slower victory or a faster loss. Let's take an example to explain this.

Assume that there are 2 possible ways for X to win the game from a given board state.

Move A : X can win in 2 moves <br>
Move B : X can win in 4 moves <br>

Our evaluation function will return a value of +10 for both moves A and B. Even though move A is better because it ensures a faster victory, our AI may sometimes choose B. To overcome this problem we subtract the depth value from the evaluated score. This means that in case of a victory it will choose the victory which takes the fewest moves, and in case of a loss it will try to prolong the game and play as many moves as possible. So the new evaluated values will be

Move A will have a value of +10 – 2 = 8 <br>
Move B will have a value of +10 – 4 = 6 <br>

Now, since move A has a higher score than move B, our AI will choose move A over move B. The same idea must be applied to the minimizer: instead of subtracting the depth, we add the depth value, as the minimizer always tries to get as negative a value as possible. We can adjust for the depth either inside the evaluation function or outside it; either is fine. I have chosen to do it outside the function. A pseudocode implementation is as follows.

    if maximizer has won:
        return WIN_SCORE - depth

    else if minimizer has won:
        return LOSE_SCORE + depth
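
Putting the pseudocode pieces together, one possible Python rendering looks like the following. It is a sketch under stated assumptions: the board is a 3x3 list of `'x'`, `'o'` and `'_'` as in the evaluation cell above, X is the maximizer, and the snake_case names (`is_moves_left`, `find_best_move`) are this example's own rather than an established API.

```python
def evaluate(board):
    """Return +10 if 'x' has three in a row, -10 if 'o' has, otherwise 0."""
    lines = [row[:] for row in board]                               # rows
    lines += [[board[r][c] for r in range(3)] for c in range(3)]    # columns
    lines += [[board[i][i] for i in range(3)],
              [board[i][2 - i] for i in range(3)]]                  # diagonals
    for line in lines:
        if line[0] != '_' and line.count(line[0]) == 3:
            return 10 if line[0] == 'x' else -10
    return 0


def is_moves_left(board):
    """True while at least one cell is still empty."""
    return any(cell == '_' for row in board for cell in row)


def minimax(board, depth, is_maximizing):
    """Value of `board` with optimal play, adjusted by depth as described above."""
    score = evaluate(board)
    if score == 10:
        return score - depth   # the maximizer prefers faster wins
    if score == -10:
        return score + depth   # the maximizer prefers slower losses
    if not is_moves_left(board):
        return 0               # draw

    best = -1000 if is_maximizing else 1000
    for r in range(3):
        for c in range(3):
            if board[r][c] == '_':
                board[r][c] = 'x' if is_maximizing else 'o'   # try the move
                value = minimax(board, depth + 1, not is_maximizing)
                board[r][c] = '_'                             # undo the move
                best = max(best, value) if is_maximizing else min(best, value)
    return best


def find_best_move(board):
    """Return the (row, column) of the best move for 'x', or None if the board is full."""
    best_value, best_move = -1000, None
    for r in range(3):
        for c in range(3):
            if board[r][c] == '_':
                board[r][c] = 'x'
                value = minimax(board, 0, False)
                board[r][c] = '_'
                if value > best_value:
                    best_value, best_move = value, (r, c)
    return best_move


board = [['x', 'o', 'x'],
         ['o', 'o', 'x'],
         ['_', '_', '_']]
print(find_best_move(board))  # (2, 2)
```

On this board X can win immediately by completing the right-hand column, so the bottom-right cell scores 10, while the bottom-left cell scores -9 (O wins on the next move) and the bottom-middle cell scores 0.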



## Minimax Algorithm in Game Theory | (Alpha-Beta Pruning)

Alpha-Beta pruning is not actually a new algorithm, but rather an optimization technique for the minimax algorithm. It can reduce the computation time substantially, which allows us to search much faster and even go deeper into the game tree. It cuts off branches in the game tree which need not be searched because a better move is already available. It is called Alpha-Beta pruning because it passes 2 extra parameters to the minimax function, namely alpha and beta.

Let’s define the parameters alpha and beta.

- Alpha is the best value that the maximizer currently can guarantee at that level or above.
- Beta is the best value that the minimizer currently can guarantee at that level or above.

Pseudocode :

    function minimax(node, depth, isMaximizingPlayer, alpha, beta):

        if node is a leaf node :
            return value of the node
    
        if isMaximizingPlayer :
            bestVal = -INFINITY 
            for each child node :
                value = minimax(child, depth+1, false, alpha, beta)
                bestVal = max( bestVal, value) 
                alpha = max( alpha, bestVal)
                if beta <= alpha:
                    break
            return bestVal

        else :
            bestVal = +INFINITY 
            for each child node :
                value = minimax(child, depth+1, true, alpha, beta)
                bestVal = min( bestVal, value) 
                beta = min( beta, bestVal)
                if beta <= alpha:
                    break
            return bestVal
            
    // Calling the function for the first time.
    minimax(0, 0, true, -INFINITY, +INFINITY)
    
Let’s make the above algorithm clear with an example.


<table style="width:100%">
  <tr>
    <th><img src="photos/MIN_MAX1.jpg" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

- The initial call starts from A. The value of alpha here is -INFINITY and the value of beta is +INFINITY. These values are passed down to subsequent nodes in the tree. At A the maximizer must choose the max of B and C, so A calls B first.
- At B the minimizer must choose the min of D and E, and hence calls D first.
- At D, it looks at its left child, which is a leaf node. This node returns a value of 3. Now the value of alpha at D is max( -INF, 3), which is 3.
- To decide whether it is worth looking at its right node or not, it checks the condition beta<=alpha. This is false since beta = +INF and alpha = 3. So it continues the search.
- D now looks at its right child, which returns a value of 5. At D, alpha = max(3, 5), which is 5. Now the value of node D is 5.
- D returns a value of 5 to B. At B, beta = min( +INF, 5), which is 5. The minimizer is now guaranteed a value of 5 or less. B now calls E to see if it can get a lower value than 5.
- At E the values of alpha and beta are not -INF and +INF but instead -INF and 5 respectively, because the value of beta was changed at B and that is what B passed down to E.
- Now E looks at its left child, which is 6. At E, alpha = max(-INF, 6), which is 6. Here the condition becomes true: beta is 5 and alpha is 6, so beta<=alpha is true. Hence it breaks and E returns 6 to B.
- Note how it did not matter what the value of E's right child is. It could have been +INF or -INF and it still wouldn’t matter. We never even had to look at it, because the minimizer was guaranteed a value of 5 or less. So as soon as the maximizer saw the 6, he knew the minimizer would never come this way, because the minimizer can get a 5 on the left side of B. This way we didn't have to look at that 9 and hence saved computation time.
- E returns a value of 6 to B. At B, beta = min( 5, 6), which is 5. The value of node B is also 5.

So far this is how our game tree looks. The 9 is crossed out because it was never computed.


<table style="width:100%">
  <tr>
    <th><img src="photos/MIN_MAX2.jpg" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

- B returns 5 to A. At A, alpha = max( -INF, 5) which is 5. Now the maximizer is guaranteed a value of 5 or greater. A now calls C to see if it can get a higher value than 5.
- At C, alpha = 5 and beta = +INF. C calls F
- At F, alpha = 5 and beta = +INF. F looks at its left child which is a 1. alpha = max( 5, 1) which is still 5.
- F looks at its right child which is a 2. Hence the best value of this node is 2. Alpha still remains 5
- F returns a value of 2 to C. At C, beta = min( +INF, 2), which is 2. The condition beta <= alpha becomes true, as beta = 2 and alpha = 5. So it breaks, and it does not even have to compute the entire sub-tree of G.
- The intuition behind this break-off is that, at C, the minimizer was guaranteed a value of 2 or less. But the maximizer was already guaranteed a value of 5 if he chooses B. So why would the maximizer ever choose C and get a value of 2 or less? Again you can see that it did not matter what those last 2 values were. We also saved a lot of computation by skipping a whole sub-tree.
- C now returns a value of 2 to A. Therefore the best value at A is max( 5, 2) which is a 5.
- Hence the optimal value that the maximizer can get is 5

This is how our final game tree looks. As you can see, G has been crossed out as it was never computed.

<table style="width:100%">
  <tr>
    <th><img src="photos/MIN_MAX3.jpg" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>
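
To confirm which leaves the walkthrough actually visits, the search can be instrumented to record every leaf it evaluates. This is a sketch written for illustration: the `visited` list and the `max_depth` parameter are additions of this example, and it uses `math.inf` in place of the fixed bounds used in the cell below.

```python
import math


def alphabeta(depth, index, is_max, values, alpha, beta, max_depth, visited):
    """Alpha-beta search over a perfect binary tree stored as a flat list of leaves.

    Appends the index of every leaf that is actually evaluated to `visited`.
    """
    if depth == max_depth:
        visited.append(index)
        return values[index]

    best = -math.inf if is_max else math.inf
    for i in range(2):  # left child, then right child
        val = alphabeta(depth + 1, index * 2 + i, not is_max,
                        values, alpha, beta, max_depth, visited)
        if is_max:
            best = max(best, val)
            alpha = max(alpha, best)
        else:
            best = min(best, val)
            beta = min(beta, best)
        if beta <= alpha:  # the remaining children cannot change the result
            break
    return best


values = [3, 5, 6, 9, 1, 2, 0, -1]
visited = []
result = alphabeta(0, 0, True, values, -math.inf, math.inf, 3, visited)

print(result)                        # 5
print([values[i] for i in visited])  # [3, 5, 6, 1, 2]
```

Only five of the eight leaves are evaluated: the 9 under E and both leaves under G (0 and -1) are never touched, matching the crossed-out nodes in the figures.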

In [45]:
# Python program to demonstrate 
# working of Alpha-Beta Pruning 
import math

# Initial values of alpha and beta
MAX = 1000
MIN = -1000
  
# Returns optimal value for 
# current player(Initially called 
# for root and maximizer) 

def minimax (depth, nodeIndex, 
             maximizingPlayer, values,  
             alpha, beta): 
  
    # Terminating condition. i.e  
    # leaf node is reached 
    if (depth == 3):
        return values[nodeIndex]

    if (maximizingPlayer):
        
        best = MIN
  
        # Recur for left and  
        # right children 
        for i in range(2):
            val = minimax(depth + 1, nodeIndex * 2 + i, False, values, alpha, beta)
            best = max(best, val)
            alpha = max(alpha, best) 
  
            # Alpha Beta Pruning 
            if (beta <= alpha):
                break
            
        return best
    else:
        best = MAX
  
        # Recur for left and 
        # right children 
        for i in range(2): 
            val = minimax(depth + 1, nodeIndex * 2 + i, True, values, alpha, beta)
            best = min(best, val)
            beta = min(beta, best) 
  
            # Alpha Beta Pruning 
            if (beta <= alpha):
                break
                
        return best 

In [46]:
# Driver code 
values = [3, 5, 6, 9, 1, 2, 0, -1] 

Depth = int(math.log2(len(values)))  # tree depth (3); minimax() above hard-codes the same value

print("The optimal value is : ", end = "") 
print( minimax(0, 0, True, values, MIN, MAX)) 

The optimal value is : 5


## Tic-Tac-Toe

In [11]:
import random

def drawBoard(board):
    # This function prints out the board that it was passed.

    # "board" is a list of 10 strings representing the board (ignore index 0)
    print('   |   |')
    print(' ' + board[7] + ' | ' + board[8] + ' | ' + board[9])
    print('   |   |')
    print('-----------')
    print('   |   |')
    print(' ' + board[4] + ' | ' + board[5] + ' | ' + board[6])
    print('   |   |')
    print('-----------')
    print('   |   |')
    print(' ' + board[1] + ' | ' + board[2] + ' | ' + board[3])
    print('   |   |')

def inputPlayerLetter():
    # Lets the player type which letter they want to be.
    # Returns a list with the player's letter as the first item, and the computer's letter as the second.
    letter = ''
    while not (letter == 'X' or letter == 'O'):
        print('Do you want to be X or O?')
        letter = input().upper()

    # The first element in the list is the player's letter, the second is the computer's letter.
    if letter == 'X':
        return ['X', 'O']
    else:
        return ['O', 'X']

def whoGoesFirst():
    # Randomly choose the player who goes first.
    if random.randint(0, 1) == 0:
        return 'computer'
    else:
        return 'player'

def playAgain():
    # This function returns True if the player wants to play again, otherwise it returns False.
    print('Do you want to play again? (yes or no)')
    return input().lower().startswith('y')

def makeMove(board, letter, move):
    board[move] = letter

def isWinner(bo, le):
    # Given a board and a player's letter, this function returns True if that player has won.
    # We use bo instead of board and le instead of letter so we don't have to type as much.
    return ((bo[7] == le and bo[8] == le and bo[9] == le) or # across the top
    (bo[4] == le and bo[5] == le and bo[6] == le) or # across the middle
    (bo[1] == le and bo[2] == le and bo[3] == le) or # across the bottom
    (bo[7] == le and bo[4] == le and bo[1] == le) or # down the left side
    (bo[8] == le and bo[5] == le and bo[2] == le) or # down the middle
    (bo[9] == le and bo[6] == le and bo[3] == le) or # down the right side
    (bo[7] == le and bo[5] == le and bo[3] == le) or # diagonal
    (bo[9] == le and bo[5] == le and bo[1] == le)) # diagonal

def getBoardCopy(board):
    # Make a duplicate of the board list and return the duplicate.
    dupeBoard = []

    for i in board:
        dupeBoard.append(i)

    return dupeBoard

def isSpaceFree(board, move):
    # Return True if the passed move is free on the passed board.
    return board[move] == ' '

def getPlayerMove(board):
    # Let the player type in their move; keep asking until it is a free square from 1 to 9.
    move = ' '
    while move not in '1 2 3 4 5 6 7 8 9'.split() or not isSpaceFree(board, int(move)):
        print('What is your next move? (1-9)')
        move = input()
    return int(move)

def chooseRandomMoveFromList(board, movesList):
    # Returns a valid move from the passed list on the passed board.
    # Returns None if there is no valid move.
    possibleMoves = []
    for i in movesList:
        if isSpaceFree(board, i):
            possibleMoves.append(i)

    if len(possibleMoves) != 0:
        return random.choice(possibleMoves)
    else:
        return None

def getComputerMove(board, computerLetter):
    # Given a board and the computer's letter, determine where to move and return that move.
    if computerLetter == 'X':
        playerLetter = 'O'
    else:
        playerLetter = 'X'

    # Here is our algorithm for our Tic Tac Toe AI:
    # First, check if we can win in the next move
    for i in range(1, 10):
        copy = getBoardCopy(board)
        if isSpaceFree(copy, i):
            makeMove(copy, computerLetter, i)
            if isWinner(copy, computerLetter):
                return i

    # Check if the player could win on their next move, and block them.
    for i in range(1, 10):
        copy = getBoardCopy(board)
        if isSpaceFree(copy, i):
            makeMove(copy, playerLetter, i)
            if isWinner(copy, playerLetter):
                return i

    # Try to take one of the corners, if they are free.
    move = chooseRandomMoveFromList(board, [1, 3, 7, 9])
    if move is not None:
        return move

    # Try to take the center, if it is free.
    if isSpaceFree(board, 5):
        return 5

    # Move on one of the sides.
    return chooseRandomMoveFromList(board, [2, 4, 6, 8])

def isBoardFull(board):
    # Return True if every space on the board has been taken. Otherwise return False.
    for i in range(1, 10):
        if isSpaceFree(board, i):
            return False
    return True
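
The computer player above follows a fixed heuristic (win, block, corner, center, side) rather than searching the game tree. As a rough sketch of how the minimax idea from the previous section could be applied to the same board layout (a list of 10 strings, index 0 unused), something like the following might replace `getComputerMove`. All names here (`WIN_LINES`, `has_won`, `free_squares`, `minimax_score`, `minimax_move`) are hypothetical and not part of the original notebook.

```python
# Winning triples, using the same keypad-style numbering as drawBoard.
WIN_LINES = [(7, 8, 9), (4, 5, 6), (1, 2, 3),
             (7, 4, 1), (8, 5, 2), (9, 6, 3),
             (7, 5, 3), (9, 5, 1)]

def has_won(board, letter):
    # True if `letter` occupies all three squares of any winning line.
    return any(all(board[i] == letter for i in line) for line in WIN_LINES)

def free_squares(board):
    return [i for i in range(1, 10) if board[i] == ' ']

def minimax_score(board, to_move, me, opponent):
    # Score the position from `me`'s point of view: +1 win, 0 tie, -1 loss.
    if has_won(board, me):
        return 1
    if has_won(board, opponent):
        return -1
    moves = free_squares(board)
    if not moves:
        return 0
    next_player = opponent if to_move == me else me
    scores = []
    for m in moves:
        board[m] = to_move          # try the move in place...
        scores.append(minimax_score(board, next_player, me, opponent))
        board[m] = ' '              # ...and undo it afterwards
    # `me` maximizes the score, the opponent minimizes it.
    return max(scores) if to_move == me else min(scores)

def minimax_move(board, me):
    # Return the free square with the best minimax score (None if the board is full).
    opponent = 'O' if me == 'X' else 'X'
    best_move, best_score = None, -2
    for m in free_squares(board):
        board[m] = me
        score = minimax_score(board, opponent, me, opponent)
        board[m] = ' '
        if score > best_score:
            best_move, best_score = m, score
    return best_move

# Example: X holds 7 and 9, O holds 8 and 5, so O threatens the 8-5-2 column.
board = [' '] * 10
for square, letter in [(7, 'X'), (5, 'O'), (9, 'X'), (8, 'O')]:
    board[square] = letter
print(minimax_move(board, 'X'))  # prints 2: the only move that avoids a loss
```

This sketch does no pruning and does not prefer faster wins over slower ones, so on an empty board the first move can take noticeably longer than the heuristic version. Adding alpha-beta bounds, as in the earlier cell, would be the natural next step.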

In [12]:
print('Welcome to Tic Tac Toe!')

while True:
    # Reset the board
    theBoard = [' '] * 10
    playerLetter, computerLetter = inputPlayerLetter()
    turn = whoGoesFirst()
    print('The ' + turn + ' will go first.')
    gameIsPlaying = True

    while gameIsPlaying:
        if turn == 'player':
            # Player's turn.
            drawBoard(theBoard)
            move = getPlayerMove(theBoard)
            makeMove(theBoard, playerLetter, move)

            if isWinner(theBoard, playerLetter):
                drawBoard(theBoard)
                print('Hooray! You have won the game!')
                gameIsPlaying = False
            else:
                if isBoardFull(theBoard):
                    drawBoard(theBoard)
                    print('The game is a tie!')
                    break
                else:
                    turn = 'computer'

        else:
            # Computer's turn.
            move = getComputerMove(theBoard, computerLetter)
            makeMove(theBoard, computerLetter, move)

            if isWinner(theBoard, computerLetter):
                drawBoard(theBoard)
                print('The computer has beaten you! You lose.')
                gameIsPlaying = False
            else:
                if isBoardFull(theBoard):
                    drawBoard(theBoard)
                    print('The game is a tie!')
                    break
                else:
                    turn = 'player'

    if not playAgain():
        break

Welcome to Tic Tac Toe!
Do you want to be X or O?
O
The player will go first.
   |   |
   |   |  
   |   |
-----------
   |   |
   |   |  
   |   |
-----------
   |   |
   |   |  
   |   |
What is your next move? (1-9)
1
   |   |
 X |   |  
   |   |
-----------
   |   |
   |   |  
   |   |
-----------
   |   |
 O |   |  
   |   |
What is your next move? (1-9)
3
   |   |
 X |   |  
   |   |
-----------
   |   |
   |   |  
   |   |
-----------
   |   |
 O | X | O
   |   |
What is your next move? (1-9)
2
What is your next move? (1-9)
2
What is your next move? (1-9)
3
What is your next move? (1-9)
4
   |   |
 X |   | X
   |   |
-----------
   |   |
 O |   |  
   |   |
-----------
   |   |
 O | X | O
   |   |
What is your next move? (1-9)
5
   |   |
 X | X | X
   |   |
-----------
   |   |
 O | O |  
   |   |
-----------
   |   |
 O | X | O
   |   |
The computer has beaten you! You lose.
Do you want to play again? (yes or no)
yes
Do you want to be X or O?
x
The computer will go first.
   | 