# Chapter 5: Depth Pruning in Minimax



***
*“Genius sometimes consists of knowing when to stop.”*

-- Charles De Gaulle
***



What you'll learn in this chapter:

* Creating a MiniMax agent in Tic Tac Toe
* Understanding the idea behind depth pruning
* Creating a generic MiniMax agent with depth pruning
* Testing MiniMax with depth pruning in Tic Tac Toe and Connect Four


You learned how MiniMax tree search works in the previous chapter and applied it to the coin game by recursively searching for the best move in each subsequent step until the game ends. The MiniMax agent solves the coin game: it wins 100% of the time when it plays second.

In this chapter, you'll first create a MiniMax agent in Tic Tac Toe by using
recursion, as you did in Chapter 4. MiniMax tree search exhausts all possible future game
paths and solves the Tic Tac Toe game. However, at the beginning
of the game, it takes more than 34 seconds for the MiniMax agent to
make a move. Later moves take much less time, though.

In more complicated games such as Connect Four, Chess, or Go, the MiniMax algorithm cannot exhaust all possible future game paths in a short amount of time. However, this doesn't mean that MiniMax tree search is useless in these games. One answer lies in depth pruning: instead of searching all the way to the terminal state of the game, you search a fixed number of moves ahead and then stop. You can then evaluate the resulting position by using a position evaluation function. In this chapter, we simply assume that the game is tied if it has not ended after the agent has searched a fixed number of steps ahead. We'll discuss how to apply position evaluation functions in Chapter 7.

In this chapter, you'll learn to create a generic MiniMax agent with depth pruning that can be applied to both Tic Tac Toe and Connect Four. Depth pruning allows the MiniMax agent to come up with intelligent (though not perfect) game strategies in a short amount of time. In fact, the algorithm used by Deep Blue to beat World Chess Champion Garry Kasparov in 1997 was based on MiniMax tree search with depth pruning (along with other strategies).

After that, you'll test the effectiveness of your MiniMax agents against the rule-based AI that you developed in Chapters 2 and 3.

# 1. MiniMax Tree Search in Tic Tac Toe

## 1.1. The MiniMax Algorithm in Tic Tac Toe

We'll use a simplified version of the self-made Tic Tac Toe game environment from Chapter 2 to speed up MiniMax tree search. The module is saved as *ttt_simple_env.py* in the folder *utils* in the book's GitHub repository https://github.com/markhliu/AlphaGoSimplified. Download the file and save it in the folder /Desktop/ags/utils/ on your computer. The file *ttt_simple_env.py* is the same as *ttt_env.py* that we used in Chapter 2, except that we have deleted the graphical game window functionality. As a result, you cannot use the *render()* method in the simplified Tic Tac Toe game environment. We use the simplified Tic Tac Toe game environment so that the MiniMax agent moves faster: to search ahead, the algorithm makes hypothetical moves on a deep copy of the current game environment. Without the graphical functionality, the algorithm makes deep copies (and hence decisions) faster.
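
The role of the deep copy can be seen in a minimal sketch. The `TinyBoard` class below is a hypothetical stand-in written for this illustration, not the book's *ttt* class; only the `validinputs` attribute name is borrowed from it, and the `step()` signature here is simplified.

```python
from copy import deepcopy

class TinyBoard:
    """Hypothetical stand-in for a game environment (not the book's ttt class)."""
    def __init__(self):
        self.cells = [0] * 9                    # 0 means an empty cell
        self.validinputs = list(range(1, 10))   # cells are numbered 1 to 9

    def step(self, move, player=1):
        # Place the player's mark and remove the cell from the valid moves
        self.cells[move - 1] = player
        self.validinputs.remove(move)

env = TinyBoard()
env_copy = deepcopy(env)   # independent copy, including the nested lists
env_copy.step(5)           # hypothetical move, made on the copy only

print(env.cells[4], env_copy.cells[4])                   # 0 1
print(len(env.validinputs), len(env_copy.validinputs))   # 9 8
```

Because the copy is independent, the search can try any number of hypothetical moves without disturbing the real game. The cost is that every hypothetical move copies the whole environment, which is why a lighter environment makes the search faster.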

We'll define a *MiniMax_X()* function for Player X and a separate *MiniMax_O()* function for Player O. We could define one function for both players, but the functions are easier to explain when each player has its own. There is a tradeoff between code efficiency and code readability, and here we choose readability.

In the local module *ch05util*, we first define a *MiniMax_X()* function for Player X, who is about to make a move. Download the file *ch05util.py* from the book's GitHub repository and save it in /Desktop/ags/utils/ on your computer, the same folder as *ttt_simple_env.py*. The file acts as a local module with a few functions in it. The *MiniMax_X()* function is defined as follows:

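```python
# imports needed by the functions in this module
from copy import deepcopy
from random import choice

def MiniMax_X(env):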
    wins=[]
    ties=[]
    losses=[]  
    # iterate through all possible next moves
    for m in env.validinputs:
        # make a hypothetical move and see what happens
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m) 
        # If wins right away with move m, take it.
        if done and reward==1:
            return m 
        # See what's the best response from the opponent
        opponent_payoff=maximized_payoff(env_copy,reward,done)  
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        if my_payoff==1:
            wins.append(m)
        elif my_payoff==0:
            ties.append(m)
        else:
            losses.append(m)
    # pick a winning move if there is one
    if len(wins)>0:
        return choice(wins)
    # otherwise pick a tying move
    elif len(ties)>0:
        return choice(ties)
    # every move loses: fall back to a sampled move
    return env.sample()
```

The *MiniMax_X()* function calls the recursive *maximized_payoff()* function, also defined in *ch05util*, to find the best payoff available to the player who moves next:

```python
def maximized_payoff(env,reward,done):
    # if the game has ended after the previous player's move
    if done:
        # the previous player has won, so the current player loses
        if reward!=0:
            return -1
        # otherwise the game is tied
        else:
            return 0
    # Otherwise, search for action to maximize payoff
    best_payoff=-2
    # iterate through all possible moves
    for m in env.validinputs:
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m)  
        # If I make this move, what's the opponent's response?
        opponent_payoff=maximized_payoff(env_copy,reward,done)
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        # update your best payoff 
        if my_payoff>best_payoff:        
            best_payoff=my_payoff
    return best_payoff
```

## 1.3. Test the MiniMax Algorithm in Tic Tac Toe

```python
from utils.ttt_simple_env import ttt
from utils.ch05util import MiniMax_X, maximized_payoff
import time

# Initiate the game environment
env=ttt()
state=env.reset()
# Play a full game manually
while True:
    # Measure how long it takes to come up with a move
    start=time.time()
    action = MiniMax_X(env)
    end=time.time()
    print(f"Player X has chosen action={action}")
    print(f"It took the agent {end-start} seconds")
    state, reward, done, info = env.step(action)
    print(f"Current state is \n{state.reshape(3,3)[::-1]}")
    if done:
        if reward==1:
            print("Player X has won!")
        else:
            print("Game over, it's a tie!")
        break
    action = input("Player O, what's your move?\n")
    print(f"Player O has chosen action={action}")
    state, reward, done, info = env.step(int(action))
    print(f"Current state is \n{state.reshape(3,3)[::-1]}")
    if done:
        print("Player O has won!")
        break
```

```
Player X has chosen action=5
It took the agent 34.98157286643982 seconds
Current state is
[[0 0 0]
 [0 1 0]
 [0 0 0]]
Player O, what's your move?
1
Player O has chosen action=1
Current state is
[[ 0  0  0]
 [ 0  1  0]
 [-1  0  0]]
Player X has chosen action=6
It took the agent 0.4136021137237549 seconds
Current state is
[[ 0  0  0]
 [ 0  1  1]
 [-1  0  0]]
Player O, what's your move?
4
Player O has chosen action=4
Current state is
[[ 0  0  0]
 [-1  1  1]
 [-1  0  0]]
Player X has chosen action=7
It took the agent 0.014605283737182617 seconds
Current state is
[[ 1  0  0]
 [-1  1  1]
 [-1  0  0]]
Player O, what's your move?
3
Player O has chosen action=3
Current state is
[[ 1  0  0]
 [-1  1  1]
 [-1  0 -1]]
Player X has chosen action=2
It took the agent 0.0018301010131835938 seconds
Current state is
[[ 1  0  0]
 [-1  1  1]
 [-1  1 -1]]
Player O, what's your move?
8
Player O has chosen action=8
Current state is
[[ 1 -1  0]
 [-1  1  1]
 [-1  1 -1]]
Player X has chosen action=9
It t
```

The game is tied. It took the MiniMax algorithm about 35 seconds to make the very first move. It took the agent a fraction of a second to make later moves. 
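
The long wait on the first move reflects the size of the full game tree. The sketch below counts every complete game that can be played from an empty board. It is self-contained and does not use the book's *ttt* environment; the helper names `has_won()` and `count_games()` are made up for this illustration.

```python
# The eight winning lines on a board whose cells are indexed 0 to 8
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def has_won(board, player):
    """Return True if the player occupies all three cells of some line."""
    return any(all(board[i] == player for i in line) for line in WIN_LINES)

def count_games(board, player):
    """Count the complete games reachable when it is the player's turn."""
    total = 0
    for cell in range(9):
        if board[cell] != 0:
            continue
        board[cell] = player            # make a hypothetical move
        if has_won(board, player) or all(board):
            total += 1                  # the game ends here: win or full board
        else:
            total += count_games(board, -player)
        board[cell] = 0                 # undo the move
    return total

print(count_games([0] * 9, 1))   # 255168
```

An exhaustive search from the empty board has to walk through all of these game paths, and the agent in this chapter additionally makes a deep copy of the environment for every hypothetical move. After a few moves have been played, far fewer paths remain, which is consistent with the much shorter times for later moves.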

## 1.4. Efficacy of the MiniMax Algorithm in Tic Tac Toe
Next, we’ll test how often the MiniMax Algorithm wins against the think-three-steps-ahead game strategy that we developed in Chapter 2. 

To do that, we first define a *MiniMax_O()* function in the local *ch05util* module, as follows:

```python
def MiniMax_O(env):
    wins=[]
    ties=[]
    losses=[]  
    # iterate through all possible next moves
    for m in env.validinputs:
        # make a hypothetical move and see what happens
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m) 
        # If Player O wins right away with move m, take it.
        if done and reward==-1:
            return m 
        # See what's the best response from the opponent
        opponent_payoff=maximized_payoff(env_copy,reward,done)  
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        if my_payoff==1:
            wins.append(m)
        elif my_payoff==0:
            ties.append(m)
        else:
            losses.append(m)
    # pick winning moves if there is any        
    if len(wins)>0:
        return choice(wins)
    # otherwise pick tying moves
    elif len(ties)>0:
        return choice(ties)
    return env.sample()      
```

The function is very similar to the *MiniMax_X()* function we defined before. The only difference is that we changed

        if done and reward==1:
            return m

to 

        if done and reward==-1:
            return m

```python
from utils.ch05util import MiniMax_O
from utils.ch02util import ttt_think3, one_ttt_game

# Play 20 games
results=[]
for i in range(20):
    # MiniMax moves first if i is an even number
    if i%2==0:
        result=one_ttt_game(MiniMax_X,ttt_think3)
        # record game outcome
        results.append(result)
    # MiniMax moves second if i is an odd number
    else:
        result=one_ttt_game(ttt_think3,MiniMax_O)
        # record negative game outcome
        results.append(-result)
```

```python
# count how many times the MiniMax agent has won
wins=results.count(1)
print(f"the MiniMax agent has won {wins} games")
# count how many times the MiniMax agent has lost
losses=results.count(-1)
print(f"the MiniMax agent has lost {losses} games")
# count tie games
ties=results.count(0)
print(f"the game was tied {ties} times")
```

```
the MiniMax agent has won 9 games
the MiniMax agent has lost 0 games
the game was tied 11 times
```


The results show that the MiniMax agent won 9 games and the remaining 11 games were tied. The MiniMax agent never lost to the think-three-steps-ahead AI.

# 2. Depth Pruning in Tic Tac Toe

## 2.1. The max_payoff() Function
We'll define a *max_payoff()* function in the local module *ch05util*. The function is similar to the *maximized_payoff()* function we defined in the last section. However, there are two important differences. First, there is a depth argument in the function to control how many steps the MiniMax agent searches. Second, we'll make the function general so that it can be applied to Tic Tac Toe as well as the Connect Four game later in this chapter. 

Go to the file *ch05util.py* you just downloaded and take a look at the *max_payoff()* function, which is defined as follows: 

```python
def max_payoff(env,reward,done,depth):
    # if the game has ended after the previous player's move
    if done:
        # if it's not a tie
        if reward!=0:
            return -1
        else:
            return 0
    # If the maximum depth is reached, assume tie game
    if depth==0:
        return 0        
    # Otherwise, search for action to maximize payoff
    best_payoff=-2
    # iterate through all possible moves
    for m in env.validinputs:
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m)  
        # If I make this move, what's the opponent's response?
        opponent_payoff=max_payoff(env_copy,reward,done,depth-1)
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        # update your best payoff 
        if my_payoff>best_payoff:        
            best_payoff=my_payoff
    return best_payoff
```
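
To see the effect of the depth cutoff in isolation, here is a minimal sketch on a hypothetical take-away game that is not part of the book's code: players alternately remove one or two stones from a pile, and whoever takes the last stone wins. The function name `pile_payoff()` and the game itself are assumptions made for illustration; only the cutoff logic mirrors *max_payoff()*.

```python
def pile_payoff(pile, depth):
    """Payoff (1 win, 0 tie, -1 loss) for the player about to move."""
    # The previous player took the last stone, so the current player has lost
    if pile == 0:
        return -1
    # Maximum depth reached and the game is not over: assume a tie
    if depth == 0:
        return 0
    best_payoff = -2
    for take in (1, 2):
        if take > pile:
            continue
        # The opponent's payoff is the opposite of ours
        my_payoff = -pile_payoff(pile - take, depth - 1)
        best_payoff = max(best_payoff, my_payoff)
    return best_payoff

# A pile of 3 is lost for the player to move, but a shallow search cannot see it
print(pile_payoff(3, depth=1))   # 0: cutoff reached, reported as a tie
print(pile_payoff(3, depth=2))   # -1: deep enough to find the loss
```

The shallow search is not wrong so much as uninformed: it reports a tie because it stops before the game is decided. This is the price of depth pruning, and it is why the resulting strategies are intelligent but not perfect.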

## 2.2. The MiniMax_depth() Function
We also define a *MiniMax_depth()* function to produce the best move for the MiniMax agent. There is a *depth* argument in the function to control how many steps the MiniMax agent searches. The default *depth* value is set to 3. We'll make the function general so that it can be applied to Tic Tac Toe as well as the Connect Four game later in this chapter. 

The *MiniMax_depth()* function is defined as follows. It's saved in the file *ch05util.py* that you just downloaded. 

```python
def MiniMax_depth(env,depth=3):
    wins=[]
    ties=[]
    losses=[]  
    # iterate through all possible next moves
    for m in env.validinputs:
        # make a hypothetical move and see what happens
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m) 
        # If the move wins right away (reward is 1 for X, -1 for O), take it.
        if done and reward!=0:
            return m
        # See what's the best response from the opponent
        opponent_payoff=max_payoff(env_copy,reward,done,depth)  
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        if my_payoff==1:
            wins.append(m)
        elif my_payoff==0:
            ties.append(m)
        else:
            losses.append(m)
    # pick winning moves if there is any        
    if len(wins)>0:
        return choice(wins)
    # otherwise pick tying moves
    elif len(ties)>0:
        return choice(ties)
    return env.sample()     
```

## 2.3. Speed of the Depth-Pruned MiniMax Agent
Next, we test how fast the depth-pruned MiniMax agent is. We use the default depth of 3, and play a game with the agent. We let the agent play first again and measure how long it takes for the agent to make a move.

```python
from utils.ch05util import MiniMax_depth

# Initiate the game environment
env=ttt()
state=env.reset()
# Play a full game manually
while True:
    # Measure how long it takes to come up with a move
    start=time.time()
    action = MiniMax_depth(env,depth=3)
    end=time.time()
    print(f"Player X has chosen action={action}")
    print(f"It took the agent {end-start} seconds")
    state, reward, done, info = env.step(action)
    print(f"Current state is \n{state.reshape(3,3)[::-1]}")
    if done:
        if reward==1:
            print("Player X has won!")
        else:
            print("Game over, it's a tie!")
        break
    action = input("Player O, what's your move?\n")
    print(f"Player O has chosen action={action}")
    state, reward, done, info = env.step(int(action))
    print(f"Current state is \n{state.reshape(3,3)[::-1]}")
    if done:
        print("Player O has won!")
        break
```

```
Player X has chosen action=5
It took the agent 0.19032764434814453 seconds
Current state is
[[0 0 0]
 [0 1 0]
 [0 0 0]]
Player O, what's your move?
1
Player O has chosen action=1
Current state is
[[ 0  0  0]
 [ 0  1  0]
 [-1  0  0]]
Player X has chosen action=7
It took the agent 0.0625150203704834 seconds
Current state is
[[ 1  0  0]
 [ 0  1  0]
 [-1  0  0]]
Player O, what's your move?
2
Player O has chosen action=2
Current state is
[[ 1  0  0]
 [ 0  1  0]
 [-1 -1  0]]
Player X has chosen action=3
It took the agent 0.0010073184967041016 seconds
Current state is
[[ 1  0  0]
 [ 0  1  0]
 [-1 -1  1]]
Player X has won!
```
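
A back-of-the-envelope count suggests why the first move is now so quick. With *depth=3*, the agent tries its own move and then looks three further moves ahead, so on an empty board it makes 9, then 9×8, then 9×8×7, then 9×8×7×6 hypothetical moves. The sketch below adds these up; the helper name `moves_tried()` is hypothetical, and it assumes that no game ends within the searched moves, which holds on an empty board because a win needs three marks from one player.

```python
from math import perm

def moves_tried(empty_cells, depth):
    """Hypothetical helper: hypothetical moves made by a search that tries the
    agent's own move plus `depth` further moves, assuming no game ends early."""
    return sum(perm(empty_cells, k) for k in range(1, depth + 2))

print(moves_tried(9, 3))   # 9 + 72 + 504 + 3024 = 3609
```

A few thousand hypothetical moves is a small fraction of what an exhaustive search of Tic Tac Toe needs on the first move, which is consistent with the drop from about 35 seconds to about 0.2 seconds.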


# 3. Depth Pruning in Connect Four
Next, we’ll create a MiniMax agent for the Connect Four game. The agent searches at most three steps ahead.

We first manually play a game against the MiniMax agent. 

## 3.1. The MiniMax Agent in Connect Four

```python
from utils.conn_simple_env import conn

# Initiate the game environment
env=conn()
state=env.reset()
# Play a full game manually
while True:
    # Measure how long it takes to come up with a move
    start=time.time()
    action=MiniMax_depth(env,depth=3)
    end=time.time()
    print(f"The red player has chosen action={action}")
    print(f"It took the agent {end-start} seconds")
    state, reward, done, info = env.step(action)
    print(f"the current state is \n{state.T[::-1]}")
    if done:
        if reward==1:
            print("The red player has won!")
        else:
            print("Game over, it's a tie!")
        break
    action=input("Player yellow, what's your move?\n")
    print(f"Player yellow has chosen action={action}")
    state, reward, done, info = env.step(int(action))
    print(f"the current state is \n{state.T[::-1]}")
    if done:
        if reward==-1:
            print("The yellow player has won!")
        else:
            print("Game over, it's a tie!")
        break
```

```
The red player has chosen action=5
It took the agent 0.14301228523254395 seconds
the current state is
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 1 0 0]]
Player yellow, what's your move?
1
Player yellow has chosen action=1
the current state is
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [-1  0  0  0  1  0  0]]
The red player has chosen action=6
It took the agent 0.14829635620117188 seconds
the current state is
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [-1  0  0  0  1  1  0]]
Player yellow, what's your move?
1
Player yellow has chosen action=1
the current state is
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [-1  0  0  0  0  0  0]
 [-1  0  0  0  1  1  0]]
The red player has chosen action=4
It took the agent 0.13970661163330078 s
```

The MiniMax agent is able to plan three steps ahead and create a double attack and win the game. It takes less than 0.15 seconds for the agent to come up with a move in each step. 

## 3.2. MiniMax versus Rule-Based AI in Connect Four
We'll test if the MiniMax algorithm that searches for three steps ahead can beat the rule-based think-three-steps-ahead AI that we developed in Chapter 3. 

```python
from utils.ch03util import one_conn_game, conn_think3

results=[]
for i in range(100):
    # MiniMax moves first if i is an even number
    if i%2==0:
        result=one_conn_game(MiniMax_depth,conn_think3)
        # record game outcome
        results.append(result)
    # MiniMax moves second if i is an odd number
    else:
        result=one_conn_game(conn_think3,MiniMax_depth)
        # record negative game outcome
        results.append(-result)
```

```python
# count how many times MiniMax won
wins=results.count(1)
print(f"the MiniMax agent won {wins} games")
# count how many times MiniMax lost
losses=results.count(-1)
print(f"the MiniMax agent lost {losses} games")
# count tie games
ties=results.count(0)
print(f"the game was tied {ties} times")
```

```
the MiniMax agent won 68 games
the MiniMax agent lost 26 games
the game was tied 6 times
```
