# Chapter 5: Depth Pruning in Minimax

You learned how the minimax algorithm works in Chapter 4. In this chapter, you'll use it to play two other games: Tic Tac Toe and Connect Four. In Tic Tac Toe, the algorithm exhausts all possibilities, so the minimax agent plays perfectly: no strategy can beat it. However, it takes about half a minute for the agent to make its first move.

You'll then apply the minimax algorithm to Connect Four. There is nothing wrong with the algorithm itself. However, Connect Four has so many possible move sequences that a full search takes far too long for the agent to make a move. In practice, an exhaustive minimax search is out of reach unless you have a supercomputer.

You'll learn how to reduce the time the minimax agent needs to come up with a move. The obvious answer is depth pruning: the agent stops searching after a fixed number of steps. For example, you can limit the program to look ahead at most four steps so that it can recommend a move in just a few seconds.

After that, you'll test your minimax agents against the rule-based AI players that you developed in Chapters 2 and 3 and see which agent is stronger.

***
$\mathbf{\text{Create a subfolder for files in Chapter 5}}$<br>
***
We'll put all files for Chapter 5 in the subfolder files/ch05. Run the code in the cell below to create the subfolder.

***

In [1]:
import os

os.makedirs("files/ch05", exist_ok=True)

# 1. Minimax Tree Search in Tic Tac Toe

You have already learned how the minimax algorithm works in Chapter 4. The algorithm assumes that each player makes the best possible decision at each step. Each player also knows that the opponent makes fully rational decisions.

The minimax agent in Tic Tac Toe comes up with the best moves through backward induction. It starts with the terminal states of the game (in Tic Tac Toe, when the game is tied or one player has won) and finds the payoff to each player in those states. In the second-to-last stage of the game, the player to move looks one step ahead and picks the best move for himself/herself. In the stage before that, the player picks the best move, anticipating that the opponent will make the best decision in the following stage, and so on.

In Tic Tac Toe, each game has a maximum of 9 stages. In stage 9 (assuming the game is not over by then), player X has only one choice, so no decision is needed. In stage 8, player O looks at the two remaining choices and picks the better one. In stage 7, player X picks the best move, knowing that player O will pick the choice that minimizes player X's payoff in stage 8, and so on. The reasoning goes all the way back to the very first stage, when player X makes the first decision.

The game is small: there are at most $3^9=19,683$ board configurations and at most $9!=362,880$ move sequences. The computer program can therefore exhaust all scenarios in a reasonable amount of time and find the best move for each player in every stage of the game. Later in this chapter, we'll discuss how to reduce the time the agent needs to make a decision through depth pruning.
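To get a feel for how much work an exhaustive search does, the sketch below enumerates every complete game of Tic Tac Toe by brute force. It is a standalone illustration, not the *ttt_env.py* environment. The board is assumed to be a plain Python list of `'X'`, `'O'` and `' '` entries, with Cell 1 at index 0. The helper names `winner()` and `count_games()` are hypothetical.

```python
# The eight winning lines, as 0-8 indices of a row-major 3x3 board
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def count_games(board, player):
    """Count the complete games (move sequences) that can follow this position."""
    if winner(board) is not None or " " not in board:
        return 1  # the game is over: this is one finished sequence
    total = 0
    for i in range(9):
        if board[i] == " ":
            board[i] = player  # make a hypothetical move
            total += count_games(board, "O" if player == "X" else "X")
            board[i] = " "  # undo it before trying the next cell
    return total


n_games = count_games([" "] * 9, "X")
print(f"board configurations (upper bound): {3**9}")
print(f"complete games: {n_games}")
```

The second line should read 255168. That is fewer than $9!$ because many games end before the board is full. It is still a lot of work for a recursive search in pure Python, which helps explain why the full minimax agent is slow to make its first move.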

## 1.1. The Minimax Algorithm in Tic Tac Toe

We'll use the self-made Tic Tac Toe game environment we created in Chapter 2. Specifically, the module is saved as *ttt_env.py* in the folder *utils* in this GitHub repository. Download the file and save it under /Desktop/ai/utils/ on your computer. 

First, let's define a couple of functions that the algorithm uses. 

We'll define a minimax_X() function for player X. We could define one function for both players, but it would be harder to explain. There is a tradeoff between coding efficiency and readability, and here we choose readability.

The function tells player X the best next move. It anticipates that player O will make the best decision in the next stage, and so on.

In [2]:
from copy import deepcopy
from random import choice

def minimax_X(env):
    wins=[]
    ties=[]
    losses=[]  
    # iterate through all possible next moves
    for m in env.validinputs:
        # make a hypothetical move and see what happens
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m) 
        # If player X wins right away with move m, take it.
        if done and reward==1:
            return m 
        # See what's the best response from the opponent
        opponent_payoff=maximized_payoff(env_copy,reward,done)  
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        if my_payoff==1:
            wins.append(m)
        elif my_payoff==0:
            ties.append(m)
        else:
            losses.append(m)
    # pick winning moves if there is any        
    if len(wins)>0:
        return choice(wins)
    # otherwise pick tying moves
    elif len(ties)>0:
        return choice(ties)
    return choice(losses)

At each step, player X iterates through all possible next moves. If a move lets player X win the game right away, player X stops searching and takes that move. Otherwise, player X works out the best outcome for player O in the next stage, knowing full well that player O will choose the move that maximizes player O's payoff. Since it's a zero-sum game, player X's payoff is the opposite of player O's payoff. Player X then picks a winning move if there is one. Failing that, he/she picks a tying move. If neither exists, player X has no choice but to pick one of the losing moves.
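Here is a standalone sketch of the same wins/ties/losses decision rule, using a plain Python list instead of the ttt environment. The names `best_payoff()`, `minimax_move()` and `make_board()` are hypothetical. Cells are numbered 1 to 9 as in the game. Instead of copying the board with deepcopy(), each hypothetical move is undone after it has been evaluated.

```python
from random import choice

# The eight winning lines, as 0-8 indices of a row-major 3x3 board
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def best_payoff(board, player):
    """Best payoff (1, 0 or -1) for `player`, who is about to move."""
    opponent = "O" if player == "X" else "X"
    if winner(board) == opponent:  # the previous move won the game
        return -1
    if " " not in board:  # full board and no winner: a tie
        return 0
    best = -2
    for i in range(9):
        if board[i] == " ":
            board[i] = player
            # my payoff is the opposite of the opponent's best payoff
            best = max(best, -best_payoff(board, opponent))
            board[i] = " "  # undo the hypothetical move
    return best


def minimax_move(board, player):
    """Return a cell (1-9) for `player`, preferring wins, then ties, then losses."""
    opponent = "O" if player == "X" else "X"
    wins, ties, losses = [], [], []
    for i in range(9):
        if board[i] != " ":
            continue
        board[i] = player
        if winner(board) == player:  # immediate win: take it
            board[i] = " "
            return i + 1
        my_payoff = -best_payoff(board, opponent)
        board[i] = " "
        if my_payoff == 1:
            wins.append(i + 1)
        elif my_payoff == 0:
            ties.append(i + 1)
        else:
            losses.append(i + 1)
    if wins:
        return choice(wins)
    if ties:
        return choice(ties)
    return choice(losses)


def make_board(x_cells, o_cells):
    """Build a board from lists of occupied cell numbers (1-9)."""
    board = [" "] * 9
    for cell in x_cells:
        board[cell - 1] = "X"
    for cell in o_cells:
        board[cell - 1] = "O"
    return board


print(minimax_move(make_board([1, 2], [4, 5]), "X"))  # 3
print(minimax_move(make_board([1, 9], [2, 5]), "X"))  # 8
```

In the first position, Cell 3 is an immediate win. In the second, player O threatens the column 2-5-8. Cell 8 is therefore the only move that avoids a loss. It lands in the ties bucket and is returned.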

In minimax_X(), we use the *maximized_payoff()* function to find the best payoff for player O in the next stage. Let's define that function next.

## 1.2. The maximized_payoff() Function

The *maximized_payoff(env,reward,done)* function returns the best possible payoff for the player who is about to move in the given environment, that is, the next player in the next stage of the game. This function applies to any stage of the game, so we don't need to define one for player X and another for player O.

In [3]:
from copy import deepcopy

def maximized_payoff(env,reward,done):
    # if the game has ended after the previous player's move
    if done:
        # if it's not a tie
        if reward!=0:
            return -1
        else:
            return 0
    # Otherwise, search for action to maximize payoff
    best_payoff=-2
    # iterate through all possible moves
    for m in env.validinputs:
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m)  
        # If I make this move, what's the opponent's response?
        opponent_payoff=maximized_payoff(env_copy,reward,done)
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        # update your best payoff 
        if my_payoff>best_payoff:        
            best_payoff=my_payoff
    return best_payoff
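A single maximized_payoff() function can serve both players because of the sign flip: my payoff is the negative of my opponent's best payoff. This formulation is often called negamax. The sketch below checks that it gives the same root value as textbook minimax, which alternates max and min levels. It works on an abstract game tree written as nested Python lists, an encoding assumed here for illustration. Each leaf holds the payoff to the player who would move at that leaf. Both function names are hypothetical.

```python
def negamax(node):
    """Value of `node` for the player about to move there (sign-flip form)."""
    if not isinstance(node, list):
        return node  # leaf: payoff to the player who would move here
    # my payoff for each move is minus the opponent's best payoff afterwards
    return max(-negamax(child) for child in node)


def textbook_minimax(node, depth=0):
    """Alternating max/min search; values are from the root player's point of view."""
    if not isinstance(node, list):
        # the root player moves at even depths, the opponent at odd depths
        return node if depth % 2 == 0 else -node
    values = [textbook_minimax(child, depth + 1) for child in node]
    return max(values) if depth % 2 == 0 else min(values)


tree = [[1, -1], [0]]
print(negamax(tree), textbook_minimax(tree))  # 0 0
```

In this tree, the root player's first option lets the opponent force a payoff of -1. The root player therefore prefers the second option, which is worth 0.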

## 1.3. Test the Minimax Algorithm in Tic Tac Toe
Next, you'll play a game against the minimax algorithm. We'll let the minimax agent move first and see if it can win the game. We also time how long it takes for the minimax agent to come up with each move.

Warning: it takes about 30 seconds on my computer for the minimax agent to make the first move. It may take more or less time depending on your computer.

In [4]:
from utils.ttt_env import ttt
from utils.ch05util import minimax_X,maximized_payoff 
import time

# Initiate the game environment
env=ttt()
state=env.reset()   
# Play a full game manually
while True:
    # Measure how long it takes to come up with a move
    start=time.time()
    action = minimax_X(env)
    end=time.time()
    print(f"Player X has chosen action={action}") 
    print(f"It took the agent {end-start} seconds")     
    state, reward, done, info = env.step(action)
    print(f"the current state is \n{state.reshape(3,3)}")
    if done:
        if reward==1:
            print(f"Player X has won!")  
        else:
            print("Game over, it's a tie!")
        break   
    action = input("Player O, what's your move?\n")
    print(f"Player O has chosen action={action}")    
    state, reward, done, info = env.step(int(action))
    print(f"the current state is \n{state.reshape(3,3)}")
    if done:
        print(f"Player O has won!") 
        break

Player X has chosen action=5
It took the agent 29.12164831161499 seconds
the current state is 
[[0 0 0]
 [0 1 0]
 [0 0 0]]
Player O, what's your move?
4
Player O has chosen action=4
the current state is 
[[ 0  0  0]
 [-1  1  0]
 [ 0  0  0]]
Player X has chosen action=7
It took the agent 0.3951141834259033 seconds
the current state is 
[[ 0  0  0]
 [-1  1  0]
 [ 1  0  0]]
Player O, what's your move?
3
Player O has chosen action=3
the current state is 
[[ 0  0 -1]
 [-1  1  0]
 [ 1  0  0]]
Player X has chosen action=8
It took the agent 0.01595616340637207 seconds
the current state is 
[[ 0  0 -1]
 [-1  1  0]
 [ 1  1  0]]
Player O, what's your move?
9
Player O has chosen action=9
the current state is 
[[ 0  0 -1]
 [-1  1  0]
 [ 1  1 -1]]
Player X has chosen action=2
It took the agent 0.0 seconds
the current state is 
[[ 0  1 -1]
 [-1  1  0]
 [ 1  1 -1]]
Player X has won!


The minimax agent first occupied Cell 5, and I occupied Cell 4. The agent then occupied Cell 7, threatening to complete the diagonal 3-5-7, so I blocked it by occupying Cell 3. Next, the agent occupied Cell 8, creating a double attack: it could win in either Cell 2 (column 2-5-8) or Cell 9 (row 7-8-9) on the next move. I could block only one of the two threats (I chose Cell 9), so the agent won by occupying Cell 2.
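A double attack means that the attacker has two or more cells that would each complete a line, so a single block cannot stop both. The standalone sketch below replays positions from the game above and lists player X's immediate winning cells. The helper `winning_cells()` is hypothetical, and cells are numbered 1 to 9.

```python
# The eight winning lines, using the cell numbers 1-9 of the game
LINES = [(1, 2, 3), (4, 5, 6), (7, 8, 9),
         (1, 4, 7), (2, 5, 8), (3, 6, 9),
         (1, 5, 9), (3, 5, 7)]


def winning_cells(x_cells, o_cells, player="X"):
    """Return the empty cells where `player` would complete three in a row."""
    board = {cell: " " for cell in range(1, 10)}
    for cell in x_cells:
        board[cell] = "X"
    for cell in o_cells:
        board[cell] = "O"
    threats = []
    for cell in range(1, 10):
        if board[cell] != " ":
            continue
        board[cell] = player  # try the move
        if any(board[a] == board[b] == board[c] == player for a, b, c in LINES):
            threats.append(cell)
        board[cell] = " "  # undo it
    return threats


# After X takes Cell 7 (X: 5, 7; O: 4), X threatens only Cell 3
print(winning_cells([5, 7], [4]))        # [3]
# After O blocks Cell 3 and X takes Cell 8, X threatens two cells at once
print(winning_cells([5, 7, 8], [4, 3]))  # [2, 9]
```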

## 1.4. Efficacy of the Minimax Algorithm in Tic Tac Toe
Next, we’ll test how often the minimax algorithm wins against the think-three-steps-ahead strategy that we developed in Chapter 2.

To do that, we first define a minimax_O() function in the local ch05util module, as follows:

In [5]:
from copy import deepcopy
from random import choice

def minimax_O(env):
    wins=[]
    ties=[]
    losses=[]  
    # iterate through all possible next moves
    for m in env.validinputs:
        # make a hypothetical move and see what happens
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m) 
        # If player O wins right away with move m, take it.
        if done and reward==-1:
            return m 
        # See what's the best response from the opponent
        opponent_payoff=maximized_payoff(env_copy,reward,done)  
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        if my_payoff==1:
            wins.append(m)
        elif my_payoff==0:
            ties.append(m)
        else:
            losses.append(m)
    # pick winning moves if there is any        
    if len(wins)>0:
        return choice(wins)
    # otherwise pick tying moves
    elif len(ties)>0:
        return choice(ties)
    return choice(losses)      

The only difference is that we changed

        if done and reward==1:

to 

        if done and reward==-1:

We have also defined a few other functions in the local ch05util module: AI_think1(), AI_think2(), AI_think3(), and test_ttt_game(). They are defined similarly to the ones in Chapter 2. You can open the file ch05util.py to have a look.

To test how the minimax agent fares against the think-three-steps-ahead AI player, we run the following:

In [6]:
from utils.ch05util import minimax_X,minimax_O,AI_think3,test_ttt_game 


# Simulate 10 games between the minimax agent and AI_think3
results=[]
for i in range(10):
    # minimax agent moves first if i is an even number
    if i%2==0:
        result=test_ttt_game(minimax_X,AI_think3)
        # record game outcome
        results.append(result)
    # minimax agent moves second if i is an odd number
    else:
        result=test_ttt_game(AI_think3,minimax_O)
        # record negative of game outcome
        results.append(-result)

We test 10 games. The minimax agent goes first in five games and the think-three-steps-ahead AI player goes first in the other five, so neither player has a first-mover advantage. The game outcome is 1 when the first player wins, -1 when the second player wins, and 0 when the game is tied. When the minimax agent plays second, we multiply the outcome by -1. That way, a value of 1 in the list *results* always means the minimax agent has won.
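The alternating-order bookkeeping can be sketched without the chapter's agents. The harness below is a hypothetical stand-in for test_ttt_game(); the real implementation lives in ch05util.py and may differ. Here, `play_game()` returns 1 if the first mover wins, -1 if the second mover wins, and 0 for a tie. `evaluate()` flips the sign whenever the agent under test moves second. Random agents stand in for minimax_X()/minimax_O() and AI_think3() to keep the example self-contained.

```python
import random
from collections import Counter

# The eight winning lines, as 0-8 indices of a row-major 3x3 board
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def random_agent(board, rng):
    """Hypothetical stand-in agent: pick a random empty cell index (0-8)."""
    return rng.choice([i for i, cell in enumerate(board) if cell == " "])


def play_game(first, second, rng):
    """Play one game; return 1 if `first` wins, -1 if `second` wins, 0 for a tie."""
    board = [" "] * 9
    agents = {"X": first, "O": second}
    player = "X"
    while True:
        board[agents[player](board, rng)] = player
        if winner(board) == "X":
            return 1
        if winner(board) == "O":
            return -1
        if " " not in board:
            return 0
        player = "O" if player == "X" else "X"


def evaluate(agent, opponent, n_games=10, seed=0):
    """Alternate who moves first; flip the sign so 1 always means `agent` won."""
    rng = random.Random(seed)
    results = []
    for i in range(n_games):
        if i % 2 == 0:
            results.append(play_game(agent, opponent, rng))   # agent moves first
        else:
            results.append(-play_game(opponent, agent, rng))  # agent moves second
    return Counter(results)


counts = evaluate(random_agent, random_agent)
print(f"wins={counts[1]}, losses={counts[-1]}, ties={counts[0]}")
```

Because of the sign flip, the keys 1, -1 and 0 always mean a win, a loss and a tie for the first agent passed to evaluate(), no matter who moved first.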

In [7]:
# count how many times the minimax agent has won
wins=results.count(1)
print(f"the minimax agent has won {wins} games")
# count how many times minimax agent has lost
losses=results.count(-1)
print(f"the minimax agent has lost {losses} games")
# count tie games
ties=results.count(0)
print(f"the game has tied {ties} times")  

the minimax agent has won 5 games
the minimax agent has lost 0 games
the game has tied 5 times


The results show that the minimax agent won 5 games, and the remaining 5 games were tied. The minimax agent never lost to the think-three-steps-ahead AI player.

# 2. Depth Pruning in Tic Tac Toe

In the last section, you saw that it took the minimax agent 29 seconds to make the first move. That is tolerable, but in more complicated games such as Connect Four or chess, an exhaustive search takes far too long for the agent to make a move. Therefore, something has to be done.

Depth pruning is one solution: instead of searching all the way to the terminal state of the game, the algorithm stops searching after a fixed number of steps. In this section, you'll learn how to implement depth pruning in the game of Tic Tac Toe. 

## 2.1. The max_payoff() Function
We'll define a max_payoff() function. It is similar to the maximized_payoff() function we defined in the last section, with two important differences. First, it has a depth argument that controls how many steps the minimax agent searches. Second, it is general, so it can be applied to Tic Tac Toe as well as to the Connect Four game later in this chapter.

The max_payoff() function is defined as follows. It's saved in the local module ch05util.py.

In [8]:
from copy import deepcopy

def max_payoff(env,reward,done,depth):
    # if the game has ended after the previous player's move
    if done:
        # if it's not a tie
        if reward!=0:
            return -1
        else:
            return 0
    # If the maximum depth is reached, assume tie game
    if depth==0:
        return 0        
    # Otherwise, search for action to maximize payoff
    best_payoff=-2
    # iterate through all possible moves
    for m in env.validinputs:
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m)  
        # If I make this move, what's the opponent's response?
        opponent_payoff=max_payoff(env_copy,reward,done,depth-1)
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        # update your best payoff 
        if my_payoff>best_payoff:        
            best_payoff=my_payoff
    return best_payoff

In the function, if the variable depth reaches 0, we assume the game is tied and return a payoff of 0. Later in this book, we'll use position evaluation functions to produce a more realistic value, but 0 will do for the moment.

When the player makes a hypothetical move and anticipates the best response from the opponent, it calls max_payoff(env_copy,reward,done,depth-1). Each time the search goes one level deeper, the depth variable decreases by 1. Once depth reaches 0, the agent stops searching.
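The effect of the depth limit is easy to see on a small position. The sketch below is a standalone, hypothetical variant of max_payoff() that works on a plain list; `depth_limited_payoff()` is not part of ch05util.py. In this sketch, depth counts every move from the current position, including the mover's own move. The chapter's minimax(env, depth=3) calls max_payoff() with the full depth after each candidate move, so it corresponds to depth=4 here. The position comes from the game in Section 1.3, just before player X's third move: X on Cells 5 and 7, O on Cells 3 and 4.

```python
# The eight winning lines, as 0-8 indices of a row-major 3x3 board
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def depth_limited_payoff(board, player, depth):
    """Best payoff for `player` (about to move), searching at most `depth` moves."""
    opponent = "O" if player == "X" else "X"
    if winner(board) == opponent:  # the previous move won the game
        return -1
    if " " not in board:  # full board and no winner: a tie
        return 0
    if depth == 0:  # search horizon reached: assume a tie, as max_payoff() does
        return 0
    best = -2
    for i in range(9):
        if board[i] == " ":
            board[i] = player
            best = max(best, -depth_limited_payoff(board, opponent, depth - 1))
            board[i] = " "  # undo the hypothetical move
    return best


# X on Cells 5 and 7, O on Cells 3 and 4, X to move (Cell n is index n-1)
board = [" "] * 9
for cell in (5, 7):
    board[cell - 1] = "X"
for cell in (3, 4):
    board[cell - 1] = "O"

for depth in (1, 2, 3):
    print(depth, depth_limited_payoff(board, "X", depth))
```

This prints payoffs of 0, 0 and 1. With depth 1 or 2, the search stops before it can see player X's double attack, so the position looks like a tie. With depth 3, it finds the forced win. This is the price of depth pruning: anything beyond the search horizon is treated as a tie.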

## 2.2. The minimax() Function
We also define a minimax() function to produce the best move for the minimax agent. Compared with minimax_X() and minimax_O(), there are two changes. First, there is a depth argument that controls how many steps the minimax agent searches. Second, the function is general: it works for either player and can be applied to Tic Tac Toe as well as to the Connect Four game later in this chapter.

The minimax() function is defined as follows. It's saved in the local module ch05util.py.

In [9]:
from copy import deepcopy
from random import choice

def minimax(env,depth=3):
    wins=[]
    ties=[]
    losses=[]  
    # iterate through all possible next moves
    for m in env.validinputs:
        # make a hypothetical move and see what happens
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m) 
        # If the current player wins right away with move m, take it.
        if done and reward!=0:
            return m 
        # See what's the best response from the opponent
        opponent_payoff=max_payoff(env_copy,reward,done,depth)  
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        if my_payoff==1:
            wins.append(m)
        elif my_payoff==0:
            ties.append(m)
        else:
            losses.append(m)
    # pick winning moves if there is any        
    if len(wins)>0:
        return choice(wins)
    # otherwise pick tying moves
    elif len(ties)>0:
        return choice(ties)
    return choice(losses)      

The minimax() function has two arguments. The first, env, is the game environment, which can be either the Tic Tac Toe or the Connect Four game environment. The second argument, depth, controls how many further moves the search examines after each candidate move. minimax() passes depth to max_payoff(), which decreases it by 1 at each deeper level. The default value of depth is 3.

## 2.3. Speed of the Depth-Pruned Minimax Agent
Next, we test how fast the depth-pruned minimax agent is. We use the default depth of 3 and play a game against the agent. We let the agent move first again and measure how long it takes to make each move.

In [10]:
from utils.ch05util import minimax
from utils.ttt_simple_env import ttt
import time

# Initiate the game environment
env=ttt()
state=env.reset()   
# Play a full game manually
while True:
    # Measure how long it takes to come up with a move
    start=time.time()
    action = minimax(env,depth=3)
    end=time.time()
    print(f"Player X has chosen action={action}") 
    print(f"It took the agent {end-start} seconds")     
    state, reward, done, info = env.step(action)
    print(f"the current state is \n{state.reshape(3,3)}")
    if done:
        if reward==1:
            print(f"Player X has won!")  
        else:
            print("Game over, it's a tie!")
        break   
    action = input("Player O, what's your move?\n")
    print(f"Player O has chosen action={action}")    
    state, reward, done, info = env.step(int(action))
    print(f"the current state is \n{state.reshape(3,3)}")
    if done:
        print(f"Player O has won!") 
        break

Player X has chosen action=2
It took the agent 0.1972355842590332 seconds
the current state is 
[[0 1 0]
 [0 0 0]
 [0 0 0]]
Player O, what's your move?
5
Player O has chosen action=5
the current state is 
[[ 0  1  0]
 [ 0 -1  0]
 [ 0  0  0]]
Player X has chosen action=7
It took the agent 0.055222272872924805 seconds
the current state is 
[[ 0  1  0]
 [ 0 -1  0]
 [ 1  0  0]]
Player O, what's your move?
1
Player O has chosen action=1
the current state is 
[[-1  1  0]
 [ 0 -1  0]
 [ 1  0  0]]
Player X has chosen action=9
It took the agent 0.0 seconds
the current state is 
[[-1  1  0]
 [ 0 -1  0]
 [ 1  0  1]]
Player O, what's your move?
8
Player O has chosen action=8
the current state is 
[[-1  1  0]
 [ 0 -1  0]
 [ 1 -1  1]]
Player X has chosen action=4
It took the agent 0.000995635986328125 seconds
the current state is 
[[-1  1  0]
 [ 1 -1  0]
 [ 1 -1  1]]
Player O, what's your move?
3
Player O has chosen action=3
the current state is 
[[-1  1 -1]
 [ 1 -1  0]
 [ 1 -1  1]]
Player X has cho

It took only 0.2 seconds for the minimax agent to make the first move, instead of 29 seconds. That's a huge reduction in the time it takes to come up with a move.
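A rough proxy for the speedup is the number of positions the search generates. The standalone sketch below counts them from the empty board for several depth limits. The function `count_positions()` is hypothetical, and the board is a plain list. No game can end within the first four moves, so the counts grow by factors of 9, 8, 7, and so on per level.

```python
# The eight winning lines, as 0-8 indices of a row-major 3x3 board
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def count_positions(board, player, depth):
    """Count the positions a search limited to `depth` moves generates below `board`."""
    if winner(board) is not None or " " not in board or depth == 0:
        return 0  # the search does not expand this position
    opponent = "O" if player == "X" else "X"
    total = 0
    for i in range(9):
        if board[i] == " ":
            board[i] = player
            total += 1 + count_positions(board, opponent, depth - 1)
            board[i] = " "  # undo the hypothetical move
    return total


for depth in (1, 2, 3, 4, 5, 9):
    print(depth, count_positions([" "] * 9, "X", depth))
```

For depths 1 to 5, this prints 9, 81, 585, 3609 and 18729 positions. A full search (depth 9) generates more than half a million.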

# 3. Depth Pruning in Connect Four
Next, we’ll create a minimax agent for the Connect Four game. The agent searches for a maximum of three steps.

We first manually play a game. 

## 3.1. Manually Play A Game

In [11]:
from utils.conn_simple_env import conn
import time
from utils.ch05util import minimax

# Initiate the game environment
env=conn()
state=env.reset()   
# Play a full game manually
while True:
    # Measure how long it takes to come up with a move
    start=time.time()
    action=minimax(env,depth=3)
    end=time.time()
    print(f"The red player has chosen action={action}") 
    print(f"It took the agent {end-start} seconds")     
    state, reward, done, info = env.step(action)
    print(f"the current state is \n{state.T[::-1]}")
    if done:
        if reward==1:
            print(f"The red player has won!")  
        else:
            print("Game over, it's a tie!")
        break   
    action=input("Player yellow, what's your move?\n")
    print(f"Player yellow has chosen action={action}")    
    state, reward, done, info = env.step(int(action))
    print(f"the current state is \n{state.T[::-1]}")
    if done:
        if reward==-1:
            print(f"The yellow player has won!")  
        else:
            print("Game over, it's a tie!")
        break                   

The red player has chosen action=5
It took the agent 0.14301228523254395 seconds
the current state is 
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 1 0 0]]
Player yellow, what's your move?
1
Player yellow has chosen action=1
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [-1  0  0  0  1  0  0]]
The red player has chosen action=6
It took the agent 0.14829635620117188 seconds
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [-1  0  0  0  1  1  0]]
Player yellow, what's your move?
1
Player yellow has chosen action=1
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [-1  0  0  0  0  0  0]
 [-1  0  0  0  1  1  0]]
The red player has chosen action=4
It took the agent 0.13970661163330078 s

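The minimax agent plans three steps ahead. After occupying columns 5 and 6 on the bottom row, it drops a disc in column 4 to form three in a row with both column 3 and column 7 open. This creates a double attack. Since the yellow player can block only one end, the agent wins the game.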
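The double-attack idea carries over directly to Connect Four. The standalone sketch below assumes a hypothetical 6x7 list-of-lists board with row 0 at the bottom, using 1 for red and -1 for yellow; the internal layout of conn_simple_env may differ. It replays the moves shown above and lists the columns where each player could win on the next move. The helper names `drop()`, `four_in_a_row()` and `winning_columns()` are hypothetical.

```python
ROWS, COLS = 6, 7  # a standard Connect Four board


def drop(board, col, piece):
    """Drop `piece` into column `col` (1-7); return the landing row, or None if full."""
    for row in range(ROWS):  # row 0 is the bottom row in this sketch
        if board[row][col - 1] == 0:
            board[row][col - 1] = piece
            return row
    return None


def four_in_a_row(board, piece):
    """True if `piece` has four in a row horizontally, vertically or diagonally."""
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + k * dr, c + k * dc) for k in range(4)]
                if all(0 <= rr < ROWS and 0 <= cc < COLS and board[rr][cc] == piece
                       for rr, cc in cells):
                    return True
    return False


def winning_columns(board, piece):
    """Columns (1-7) where dropping `piece` wins at once."""
    columns = []
    for col in range(1, COLS + 1):
        row = drop(board, col, piece)
        if row is None:  # the column is full
            continue
        if four_in_a_row(board, piece):
            columns.append(col)
        board[row][col - 1] = 0  # undo the hypothetical move
    return columns


# Replay the game above: red (1) plays 5, 6, 4; yellow (-1) plays 1 twice
board = [[0] * COLS for _ in range(ROWS)]
for col, piece in [(5, 1), (1, -1), (6, 1), (1, -1), (4, 1)]:
    drop(board, col, piece)

print(winning_columns(board, 1))   # [3, 7]: two threats, a double attack
print(winning_columns(board, -1))  # []
```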

## 3.2. The Minimax Algorithm versus Rule-Based AI
We'll test whether the minimax agent that searches three steps ahead can beat the rule-based think-three-steps-ahead AI player that we created in Chapter 3.

In [12]:
from utils.ch05util import test_conn_game,conn_think3

results=[]
for i in range(100):
    # minimax agent moves first if i is an even number
    if i%2==0:
        result=test_conn_game(env,minimax,conn_think3)
        # record game outcome
        results.append(result)
    # minimax agent moves second if i is an odd number
    else:
        result=test_conn_game(env,conn_think3,minimax)
        # record negative of game outcome
        results.append(-result)

We create a list *results* to store game outcomes. We simulate 100 games: in half of them the minimax agent moves first, and in the other half the rule-based AI player moves first. This way, neither player has a first-mover advantage, and we get a fair assessment of the minimax agent against the rule-based AI. Whenever the minimax agent moves second, we multiply the outcome by -1. As a result, a value of 1 in the list *results* indicates that the minimax agent has won.

In [13]:
# count how many times minimax agent has won
wins=results.count(1)
print(f"the minimax agent has won {wins} games")
# count how many times minimax agent has lost
losses=results.count(-1)
print(f"the minimax agent has lost {losses} games")
# count tie games
ties=results.count(0)
print(f"the game has tied {ties} games")

the minimax agent has won 68 games
the minimax agent has lost 26 games
the game has tied 6 games


The above output shows that the minimax agent won 68 games, lost 26, and tied the remaining 6. The results suggest that the depth-pruned minimax agent is stronger than the think-three-steps-ahead agent.