# Chapter 6: Alpha-Beta Pruning



***
*“Art is the elimination of the unnecessary.”*

-- Pablo Picasso
***



What you'll learn in this chapter:

* The logic behind alpha-beta pruning
* Implementing alpha-beta pruning in Tic Tac Toe and Connect Four
* Calculating time saved by the alpha-beta pruning agent
* Verifying that alpha-beta pruning won’t affect game outcomes

As you saw in Chapter 5, depth pruning makes the MiniMax algorithm feasible in complicated games such as Connect Four, Chess, and Go. In this chapter, you'll use another method to make the MiniMax algorithm more efficient. Specifically, alpha-beta pruning lets us skip branches that cannot possibly influence the final game outcome. Doing so significantly reduces the time the MiniMax agent needs to come up with a move.

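To implement alpha-beta pruning in a game, we keep track of two numbers, alpha and beta: the best outcomes found so far for Players 1 and 2, respectively, each measured from that player's own point of view. Whenever we have $\alpha \geq -\beta$, or equivalently $\beta \geq -\alpha$, the MiniMax algorithm stops searching a branch: one player can already secure at least as much elsewhere as the opponent would allow in this branch, so the rest of the branch cannot change the chosen move.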
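
The cutoff test `alpha >= -beta` used in this chapter's code can be seen at work on a toy game tree. The sketch below is not the chapter's implementation: `best_payoff`, the nested-list tree, and the `visited` list are hypothetical names chosen for illustration. Leaves hold Player 1's payoff, and each call returns the payoff of the player about to move, so a value is negated every time it crosses a level.

```python
def best_payoff(node, player, alpha=-2, beta=-2, visited=None):
    """Return the payoff of the player about to move at `node`.

    `node` is either an int (a leaf holding Player 1's payoff) or a list
    of child nodes. `alpha` and `beta` are the best payoffs found so far
    for Players 1 and 2, each from that player's own point of view.
    """
    if isinstance(node, int):
        if visited is not None:
            visited.append(node)
        return node if player == 1 else -node
    best = alpha if player == 1 else beta
    for child in node:
        # The opponent's payoff is the opposite of mine
        my_payoff = -best_payoff(child, 3 - player, alpha, beta, visited)
        if my_payoff > best:
            best = my_payoff
            if player == 1:
                alpha = best
            else:
                beta = best
        # Skip the rest of the branch
        if alpha >= -beta:
            break
    return best


# Player 1 moves at the root, Player 2 replies, then the game ends.
tree = [[0, 1], [-1, 1]]
visited = []
value = best_payoff(tree, player=1, visited=visited)
print(value, visited)
```

In this tree the last leaf is never visited. Once Player 2 finds a reply worth 1 in the second branch, Player 1 can get at most -1 there, which is worse than the 0 already secured in the first branch.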

We implement alpha-beta pruning in both Tic Tac Toe and Connect Four in this chapter. We show that the game outcomes are the same with and without alpha-beta pruning, and that alpha-beta pruning saves a significant amount of time when the agent searches for the best move. For example, in Tic Tac Toe, the time the MiniMax agent needs to come up with the first move drops from 34 seconds without alpha-beta pruning to 1.06 seconds with it, a 97% reduction. In Connect Four, with the search depth limited to three, the average time spent on a move falls from 0.15 seconds to 0.05 seconds once alpha-beta pruning is added.
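
The percentages follow directly from the timings quoted above. A short check, where `percent_reduction` is a hypothetical helper and the four timings are the ones reported in this chapter:

```python
def percent_reduction(before, after):
    """Percentage drop in time when going from `before` to `after` seconds."""
    return 100 * (before - after) / before


# Timings (in seconds) quoted in the text
ttt_cut = percent_reduction(34, 1.06)
conn_cut = percent_reduction(0.15, 0.05)
print(f"Tic Tac Toe first move: {ttt_cut:.1f}% less time")
print(f"Connect Four average move: {conn_cut:.1f}% less time")
```

The Tic Tac Toe figure rounds to the 97% quoted above, and the Connect Four timings correspond to a reduction of about two thirds.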

# 1. What Is Alpha-Beta Pruning?

# 2. Alpha-Beta Pruning in Tic Tac Toe

## 2.1. The maximized_payoff_ttt() Function

In [1]:
from copy import deepcopy

def maximized_payoff_ttt(env,reward,done,alpha,beta):
    # if game ended after previous player's move
    if done:
        # a non-tie means the previous player won, so the current player lost
        if reward!=0:
            return -1
        else:
            return 0
    # set initial alpha and beta to -2, below the worst possible payoff of -1
    if alpha is None:
        alpha=-2
    if beta is None:
        beta=-2
    if env.turn=="X":
        best_payoff = alpha
    if env.turn=="O":
        best_payoff = beta         
    # iterate through all possible moves
    for m in env.validinputs:
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m)  
        # If I make this move, what's the opponent's response?
        opponent_payoff=maximized_payoff_ttt(env_copy,\
                                     reward,done,alpha,beta)
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        if my_payoff > best_payoff:        
            best_payoff = my_payoff
            if env.turn=="X":
                alpha=best_payoff
            if env.turn=="O":
                beta=best_payoff 
        # skip the rest of the branch        
        if alpha>=-beta:
            break        
    return best_payoff         

## 2.2. The MiniMax_ab() Function

In [2]:
from copy import deepcopy
from random import choice

def MiniMax_ab(env):
    wins=[]
    ties=[]
    losses=[]  
    # iterate through all possible next moves
    for m in env.validinputs:
        # make a hypothetical move and see what happens
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m) 
        # If the current player wins right away with move m, take it.
        if done and reward!=0:
            return m 
        # See what's the best response from the opponent
        opponent_payoff=maximized_payoff_ttt(env_copy,\
                                     reward,done,-2,-2)  
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        if my_payoff==1:
            wins.append(m)
        elif my_payoff==0:
            ties.append(m)
        else:
            losses.append(m)
    # pick winning moves if there is any        
    if len(wins)>0:
        return choice(wins)
    # otherwise pick tying moves
    elif len(ties)>0:
        return choice(ties)
    return env.sample()      

## 2.3. Time Saved by Alpha-Beta Pruning


In [3]:
from utils.ch06util import MiniMax_ab
from utils.ttt_simple_env import ttt
import time

# Initiate the game environment
env=ttt()
state=env.reset()   
# Play a full game manually
while True:
    # Measure how long it takes to come up with a move
    start=time.time()
    action=MiniMax_ab(env)
    end=time.time()
    print(f"Player X has chosen action={action}") 
    print(f"It took the agent {end-start} seconds")     
    state, reward, done, info = env.step(action)
    print(f"Current state is \n{state.reshape(3,3)[::-1]}")
    if done:
        if reward==1:
            print(f"Player X has won!")  
        else:
            print("Game over, it's a tie!")
        break   
    action = input("Player O, what's your move?\n")
    print(f"Player O has chosen action={action}")    
    state, reward, done, info = env.step(int(action))
    print(f"Current state is \n{state.reshape(3,3)[::-1]}")
    if done:
        print(f"Player O has won!") 
        break

Player X has chosen action=1
It took the agent 1.0604541301727295 seconds
Current state is 
[[0 0 0]
 [0 0 0]
 [1 0 0]]
Player O, what's your move?
9
Player O has chosen action=9
Current state is 
[[ 0  0 -1]
 [ 0  0  0]
 [ 1  0  0]]
Player X has chosen action=7
It took the agent 0.06767606735229492 seconds
Current state is 
[[ 1  0 -1]
 [ 0  0  0]
 [ 1  0  0]]
Player O, what's your move?
4
Player O has chosen action=4
Current state is 
[[ 1  0 -1]
 [-1  0  0]
 [ 1  0  0]]
Player X has chosen action=3
It took the agent 0.006998538970947266 seconds
Current state is 
[[ 1  0 -1]
 [-1  0  0]
 [ 1  0  1]]
Player O, what's your move?
5
Player O has chosen action=5
Current state is 
[[ 1  0 -1]
 [-1 -1  0]
 [ 1  0  1]]
Player X has chosen action=2
It took the agent 0.0 seconds
Current state is 
[[ 1  0 -1]
 [-1 -1  0]
 [ 1  1  1]]
Player X has won!


With alpha-beta pruning, the MiniMax agent needs only 1.06 seconds to make the first move, compared with 34 seconds without it. That is a large gain in efficiency, and it comes at no cost to the agent's playing strength, as the tests in the next section confirm.
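
The last move in the game above is reported as taking 0.0 seconds, which most likely reflects the limited resolution of `time.time()` rather than a truly instantaneous search. A minimal sketch of a timing helper built on `time.perf_counter()`, which the Python standard library provides for measuring short durations, is shown below. The names `timed` and `slow_square` are hypothetical and not part of the book's `utils` package.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def slow_square(x):
    # Stand-in for an agent call such as MiniMax_ab(env)
    time.sleep(0.01)
    return x * x


result, elapsed = timed(slow_square, 7)
print(f"result={result}, took {elapsed:.4f} seconds")
```

With the chapter's agent, the call would read `action, elapsed = timed(MiniMax_ab, env)`.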

# 3. Test MiniMax with Alpha-Beta Pruning


In [4]:
from utils.ch05util import MiniMax_X,MiniMax_O
from utils.ch02util import one_ttt_game 

results=[]
for i in range(10):
    # MiniMax with pruning moves first if i is an even number
    if i%2==0:
        result=one_ttt_game(MiniMax_ab,MiniMax_O)
        # record game outcome
        results.append(result)
    # MiniMax with pruning moves second if i is an odd number
    else:
        result=one_ttt_game(MiniMax_X,MiniMax_ab)
        # record negative game outcome
        results.append(-result)

In [5]:
# count how many times MiniMax with pruning won
wins=results.count(1)
print(f"MiniMax with pruning won {wins} games")
# count how many times MiniMax with pruning lost
losses=results.count(-1)
print(f"MiniMax with pruning lost {losses} games")
# count tie games
ties=results.count(0)
print(f"the game was tied {ties} times")          

MiniMax with pruning won 0 games
MiniMax with pruning lost 0 games
the game was tied 10 times
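
Ten ties are what one would expect if pruning leaves the value of every position unchanged. One self-contained way to check this is to evaluate a single position with and without the cutoff and count the positions visited. The sketch below assumes a bare tuple board instead of the book's `ttt` environment, and the names `winner`, `plain`, and `pruned` are hypothetical.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 1 or -1 if that player has three in a row, else 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0


def plain(board, player, counter):
    """Exhaustive search: payoff for `player`, who is about to move."""
    counter[0] += 1
    if winner(board) != 0:
        return -1  # the previous player just completed a line
    if 0 not in board:
        return 0   # full board, tie
    best = -2
    for cell in range(9):
        if board[cell] == 0:
            child = board[:cell] + (player,) + board[cell + 1:]
            best = max(best, -plain(child, -player, counter))
    return best


def pruned(board, player, alpha, beta, counter):
    """Same search with the cutoff rule alpha >= -beta."""
    counter[0] += 1
    if winner(board) != 0:
        return -1
    if 0 not in board:
        return 0
    best = alpha if player == 1 else beta
    for cell in range(9):
        if board[cell] != 0:
            continue
        child = board[:cell] + (player,) + board[cell + 1:]
        my_payoff = -pruned(child, -player, alpha, beta, counter)
        if my_payoff > best:
            best = my_payoff
            if player == 1:
                alpha = best
            else:
                beta = best
        # Skip the rest of the branch
        if alpha >= -beta:
            break
    return best


# X (1) has taken a corner and O (-1) the center; X is to move.
board = (1, 0, 0, 0, -1, 0, 0, 0, 0)
full_count, ab_count = [0], [0]
full_value = plain(board, 1, full_count)
ab_value = pruned(board, 1, -2, -2, ab_count)
print(f"value without pruning: {full_value}, positions: {full_count[0]}")
print(f"value with pruning:    {ab_value}, positions: {ab_count[0]}")
```

Both searches should report the same value, while the pruned search visits fewer positions. The starting position is chosen only to keep the run short; any legal position can be substituted.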


# 4. Alpha-Beta Pruning in Connect Four


## 4.1. Add Alpha-Beta Pruning in Connect Four


In [6]:
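from copy import deepcopy

def max_payoff_conn(env,reward,done,depth,alpha,beta):
    # if the game has ended after the previous player's move
    if done:
        # a non-tie means the previous player won, so the current player lost
        if reward!=0:
            return -1
        else:
            return 0
    # If the maximum depth is reached, assume tie game
    if depth==0:
        return 0    
    # set initial alpha and beta to -2, below the worst possible payoff of -1
    if alpha is None:
        alpha=-2
    if beta is None:
        beta=-2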
    if env.turn=="red":
        best_payoff = alpha
    if env.turn=="yellow":
        best_payoff = beta         
    # iterate through all possible moves
    for m in env.validinputs:
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m)  
        # If I make this move, what's the opponent's response?
        opponent_payoff=max_payoff_conn(env_copy,\
                                reward,done,depth-1,alpha,beta)
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        if my_payoff > best_payoff:        
            best_payoff = my_payoff
            if env.turn=="red":
                alpha=best_payoff
            if env.turn=="yellow":
                beta=best_payoff   
        # Skip the rest of the branch
        if alpha>=-beta:
            break        
    return best_payoff        

In [7]:
from copy import deepcopy
from random import choice

def MiniMax_conn(env,depth=3):
    wins=[]
    ties=[]
    losses=[]  
    # iterate through all possible next moves
    for m in env.validinputs:
        # make a hypothetical move and see what happens
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m) 
        # If the current player wins right away with move m, take it.
        if done and reward!=0:
            return m 
        # See what's the best response from the opponent
        opponent_payoff=max_payoff_conn(env_copy,\
                            reward,done,depth,-2,-2)  
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        if my_payoff==1:
            wins.append(m)
        elif my_payoff==0:
            ties.append(m)
        else:
            losses.append(m)
    # pick winning moves if there is any        
    if len(wins)>0:
        return choice(wins)
    # otherwise pick tying moves
    elif len(ties)>0:
        return choice(ties)
    return env.sample()      

## 4.2. Time Saved due to Alpha-Beta Pruning in Connect Four


In [8]:
from utils.ch06util import MiniMax_conn
from utils.conn_env import conn
import time

# Initiate the game environment
env=conn()
state=env.reset()   
# Play a full game manually
while True:
    # Measure how long it takes to come up with a move
    start=time.time()
    action=MiniMax_conn(env,depth=3)
    end=time.time()
    print(f"The red player has chosen action={action}") 
    print(f"It took the agent {end-start} seconds")     
    state, reward, done, info = env.step(action)
    print(f"Current state is \n{state.T[::-1]}")
    if done:
        if reward==1:
            print(f"The red player has won!")  
        else:
            print("Game over, it's a tie!")
        break   
    action=input("Player yellow, what's your move?\n")
    print(f"Player yellow has chosen action={action}")    
    state, reward, done, info = env.step(int(action))
    print(f"Current state is \n{state.T[::-1]}")
    if done:
        if reward==-1:
            print(f"The yellow player has won!")  
        else:
            print("Game over, it's a tie!")
        break                   

The red player has chosen action=6
It took the agent 0.034066200256347656 seconds
Current state is 
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 1 0]]
Player yellow, what's your move?
1
Player yellow has chosen action=1
Current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [-1  0  0  0  0  1  0]]
The red player has chosen action=1
It took the agent 0.03681349754333496 seconds
Current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 1  0  0  0  0  0  0]
 [-1  0  0  0  0  1  0]]
Player yellow, what's your move?
2
Player yellow has chosen action=2
Current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 1  0  0  0  0  0  0]
 [-1 -1  0  0  0  1  0]]
The red player has chosen action=5
It took the agent 0.047269582748413086 seconds
Current

## 4.3. Effectiveness of Alpha-Beta Pruning in Connect Four


In [9]:
from utils.ch05util import MiniMax_depth
from utils.ch03util import one_conn_game 

results=[]
for i in range(100):
    # MiniMax with pruning moves first if i is even 
    if i%2==0:
        result=one_conn_game(MiniMax_conn,MiniMax_depth)
        # record game outcome
        results.append(result)
    # MiniMax with pruning moves second if i is odd 
    else:
        result=one_conn_game(MiniMax_depth,MiniMax_conn)
        # record negative game outcome
        results.append(-result)

In [10]:
# count how many times MiniMax with alpha-beta pruning won
wins=results.count(1)
print(f"MiniMax with alpha-beta pruning won {wins} games")
# count how many times MiniMax with pruning lost
losses=results.count(-1)
print(f"MiniMax with alpha-beta pruning lost {losses} games")
# count tie games
ties=results.count(0)
print(f"the game was tied {ties} times")               

MiniMax with alpha-beta pruning won 41 games
MiniMax with alpha-beta pruning lost 39 games
the game was tied 20 times


The results above show that the MiniMax agent with alpha-beta pruning won 41 games and lost 39, with 20 ties. The near-even split indicates that the agent with alpha-beta pruning plays as well as the agent without it. Because moves are chosen at random among equally rated candidates (see the calls to `choice()` in `MiniMax_conn()`), the outcomes vary from run to run, and you may get results in which the agent with alpha-beta pruning loses more often than it wins. If that happens, run the two cells above again and see whether the results change.
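
One way to judge whether a 41 to 39 split is compatible with two equally strong agents is an exact binomial test on the decisive games. The sketch below is an illustration under the assumption that the games are independent and that ties can be set aside; `two_sided_p_value` is a hypothetical helper, not part of the book's code.

```python
from math import comb


def two_sided_p_value(wins, losses):
    """Exact binomial test of wins vs. losses against a 50/50 split.

    Ties are ignored. Returns the probability of a split at least as
    lopsided as the observed one if both agents were equally strong.
    """
    n = wins + losses
    k = max(wins, losses)
    # P(X >= k) for X ~ Binomial(n, 1/2)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


p = two_sided_p_value(41, 39)
print(f"p-value = {p:.2f}")
```

The resulting p-value is far above conventional thresholds such as 0.05, so the 41 to 39 split gives no evidence that either agent is stronger.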