# Chapter 7: Position Evaluation Functions in MiniMax

With depth pruning and alpha-beta pruning reducing the time needed to choose a move, the minimax algorithm can produce fairly powerful agents, especially as hardware improves. For example, according to a 2017 Scientific American article by Larry Greenemeier, "the 1997 version of Deep Blue searched between 100 million and 200 million positions per second, depending on the type of position. The system could search to a depth of between six and eight pairs of moves—one white, one black—to a maximum of 20 or even more pairs in some situations."

What made Deep Blue even more powerful was the position evaluation function it used. In Chapter 6, we assumed that the game is tied when the maximum search depth is reached and the game is not over. In many real-world games, however, we can usually form a good estimate of the final outcome from heuristics even when the game is not over. For example, in chess, we can add up the value of each side's pieces. Whichever side has the higher total tends to win.
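
As a rough illustration of such a heuristic, the sketch below scores a chess position by material count alone. The piece values (pawn 1, knight 3, bishop 3, rook 5, queen 9) are the conventional textbook values. The function name `material_eval` and the board encoding (a string of piece letters, uppercase for White) are assumptions made for this example and are not used elsewhere in the book.

```python
# Conventional piece values; the king is left out because it is never captured
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}

def material_eval(pieces):
    """Return a score in [-1, 1]: positive favors White, negative favors Black.

    `pieces` is a string of the piece letters still on the board,
    uppercase for White and lowercase for Black (hypothetical encoding).
    """
    white = sum(PIECE_VALUES.get(c.lower(), 0) for c in pieces if c.isupper())
    black = sum(PIECE_VALUES.get(c, 0) for c in pieces if c.islower())
    total = white + black
    # With only kings left, treat the position as even
    return 0.0 if total == 0 else (white - black) / total

# White has a queen and a pawn; Black has a rook and a pawn
print(material_eval("KQPkrp"))   # 0.25
```

Dividing by the total material keeps the score between -1 and 1, the same range we use for the Connect Four evaluations later in this chapter.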

In this chapter, we introduce the concept of position evaluation functions for Connect Four. We show that these evaluation functions make the minimax agent much stronger. 

***
$\mathbf{\text{Create a subfolder for files in Chapter 7}}$<br>
***
We'll put all files in Chapter 7 in a subfolder /files/ch07. Run the code in the cell below to create the subfolder.

***

In [1]:
import os

os.makedirs("files/ch07", exist_ok=True)

# 1. What Are Position Evaluation Functions?

A position evaluation function estimates the likely outcome of a game from the current position. We normalize the estimate to a range between -1 and 1, where -1 means the second player wins for sure, and 1 means the first player wins for sure. A value of 0 indicates that the game is most likely to be tied. A value of 0.5, for example, indicates that Player 1 is likely to win.

## 1.1. A Model to Predict Outcome in Connect Four

We'll add a position evaluation function to Connect Four to show how it works. The position evaluation function we use is generated by a deep neural network that we build in Chapter 13. For now, all you need to know is that it takes a game board (42 values, one per cell, each equal to 1, -1, or 0) as the input and generates a value between -1 and 1.

Download the file DNN_conn.h5 from the book's GitHub page and save it in the folder /Desktop/ai/files/ch07/ on your computer. After that, we load the model using Keras as follows:

In [2]:
import tensorflow as tf

model=tf.keras.models.load_model('files/ch07/DNN_conn.h5')

If you feed a game board to the model, the output has three numbers in it: the probability of a tie, the probability that Player 1 wins, and the probability that Player 2 wins. Download the file ch07util.py from the book's GitHub repository and place it in the folder /Desktop/ai/utils/ on your computer. In it, we define a predictions() function to generate game outcome probabilities based on the game state:

In [3]:
def predictions(env,model):
    # obtain the current state, reshape it      
    state=env.state.reshape(-1,7,6,1)
    # make predictions
    probabilities=model.predict(state,verbose=0)
    return probabilities

The function obtains the current state of the game, reshapes it into a 7 by 6 two-dimensional game board (with a leading batch dimension and a trailing channel dimension), and feeds it to the model. The model has a convolutional layer in it, which is why the input must keep its two-dimensional layout. As noted above, the output has three numbers in it: the probability of a tie, the probability that Player 1 wins, and the probability that Player 2 wins.
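
To make the reshaping step concrete, here is a minimal sketch using NumPy only. It assumes the state is a 7 by 6 array indexed as `state[column, row]` with row 0 at the bottom; that layout is inferred from the way boards are printed in this chapter and is not confirmed by the environment's source code.

```python
import numpy as np

# Hypothetical board: 7 columns by 6 rows, indexed as state[column, row],
# with row 0 at the bottom (an assumption inferred from the printouts)
state = np.zeros((7, 6), dtype=int)
state[3, 0] = 1    # red disc at the bottom of the fourth column
state[3, 1] = -1   # yellow disc on top of it

# Add a batch axis in front and a channel axis at the end
batch = state.reshape(-1, 7, 6, 1)
print(batch.shape)   # (1, 7, 6, 1)

# Transpose to rows-by-columns and flip vertically for display
print(state.T[::-1])
```

The last line is the same `state.T[::-1]` expression used in the print statements throughout the chapter: it turns the array into six rows of seven cells with the bottom row printed last.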

We can design game strategies based on these predictions. Below, we play a game by using the model's predictions to make moves. 

## 1.2. Play A Game with the Model Predictions
To have a better understanding of the model, we can play a Connect Four game based on the model predictions. 

Specifically, we'll make a hypothetical move and ask the model to tell us the probability of winning versus losing. We then select the move with the highest probability of winning the game:

In [4]:
from utils.conn_simple_env import conn
from utils.ch07util import predictions
from copy import deepcopy

env=conn()
state=env.reset()  
print(f"the current state is \n{state.T[::-1]}") 
while True:
    action = int(input("Player red, what's your move?")) 
    print(f"Player red chose column {action}")
    state, reward, done, info = env.step(action)
    print(f"the current state is \n{state.T[::-1]}") 
    if done: 
        print("Player red won!")
        break 
    # player yellow uses the model to evaluate each move
    values={}
    for m in env.validinputs:
        # make a hypothetical move
        env_copy=deepcopy(env)
        s,r,d,_=env_copy.step(m)
        probabilities=predictions(env_copy,model)
        # value is prob(yellow wins)-prob(red wins)
        value=probabilities[0][2]-probabilities[0][1]
        # add value to the dictionary
        values[m]=round(value,4)
    # print out valuations for all moves
    print(values)    
    # choose the move with the highest evaluation    
    action = max(values,key=values.get) 
    print(f"Player yellow chose column {action}")
    state, reward, done, info = env.step(action)
    print(f"the current state is \n{state.T[::-1]}")
    if done: 
        if reward==-1:
            print("Player yellow won!")
        else:
            print("Game over, it's a tie!")
        break 

the current state is 
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]]
Player red, what's your move?4
Player red chose column 4
the current state is 
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 1 0 0 0]]
{1: -0.0876, 2: -0.0547, 3: 0.0292, 4: 0.0772, 5: 0.0626, 6: -0.0005, 7: -0.0454}
Player yellow chose column 4
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0 -1  0  0  0]
 [ 0  0  0  1  0  0  0]]
Player red, what's your move?3
Player red chose column 3
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0 -1  0  0  0]
 [ 0  0  1  1  0  0  0]]
{1: -0.0779, 2: 0.0444, 3: 0.0219, 4: 0.0734, 5: 0.2601, 6: 0.0734, 7: -0.073}
Player yellow chose column 5
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0

At each step, the model calculates the probabilities of Player 1 and Player 2 winning the game if a hypothetical move is made. Player yellow then selects the move with the highest value of Prob(Player 2 wins) - Prob(Player 1 wins). The strategy generates fairly good moves and wins the game.
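
The selection step can be checked in isolation. The short sketch below reuses the valuations printed after red's first move in the game above and picks the move the same way the game loop does.

```python
# Valuations printed after red's first move in the game above
values = {1: -0.0876, 2: -0.0547, 3: 0.0292, 4: 0.0772,
          5: 0.0626, 6: -0.0005, 7: -0.0454}

# max() with key=values.get returns the key whose value is largest
best_move = max(values, key=values.get)
print(best_move)   # 4
```

This matches the game log, in which Player yellow chose column 4.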

## 1.3. Position Evaluation in Connect Four
In the file ch07util.py, you can also see the following position_eval() function:

In [5]:
def position_eval(env,model):
    probs=predictions(env,model)
    prob_player1_win=probs[0][1]
    prob_player2_win=probs[0][2]
    if env.turn=="red":
        evaluation=prob_player1_win-prob_player2_win
    elif env.turn=="yellow":
        evaluation=prob_player2_win-prob_player1_win
    return evaluation

If the current player is red, the evaluation is the probability of Player 1 winning minus the probability of Player 2 winning. Similarly, if the current player is yellow, the evaluation is the probability of Player 2 winning minus the probability of Player 1 winning. In both cases, the evaluation is a value between -1 and 1 from the point of view of the player who is about to move.
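
A quick worked example may help. The sketch below applies the same rule to a made-up probability vector, without calling the model. The helper name `evaluation_from_probs` and the numbers are hypothetical and chosen only for illustration.

```python
def evaluation_from_probs(probs, turn):
    """Map [P(tie), P(Player 1 wins), P(Player 2 wins)] to a value in
    [-1, 1] from the point of view of the player whose turn it is."""
    p_tie, p_player1, p_player2 = probs   # p_tie is not needed for the value
    if turn == "red":
        return p_player1 - p_player2
    if turn == "yellow":
        return p_player2 - p_player1
    raise ValueError(f"unknown turn: {turn}")

# Hypothetical model output: 20% tie, 50% Player 1 wins, 30% Player 2 wins
probs = [0.2, 0.5, 0.3]
print(round(evaluation_from_probs(probs, "red"), 4))      # 0.2
print(round(evaluation_from_probs(probs, "yellow"), 4))   # -0.2
```

The two values are mirror images of each other: a position that is good for red is equally bad for yellow.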

Once we have this evaluation function, we can add it to the minimax algorithm with depth pruning and alpha-beta pruning in Connect Four. This will make the algorithm more powerful because the evaluation function provides a more accurate valuation of the game state, instead of assuming the valuation is zero as long as the game has not ended.

# 2. Position Evaluation in Connect Four

We'll modify the minimax algorithm with alpha-beta pruning and depth pruning from Chapter 6. The difference is that once the algorithm reaches a depth of 0, it evaluates the position based on the position evaluation functions we defined in the last section. 

## 2.1. The eval_payoff_conn() Function
We'll define an eval_payoff_conn() function. First, the function keeps track of the best outcomes so far for the red and yellow players and calls them alpha and beta, respectively. Whenever the condition alpha>=-beta (equivalently, beta>=-alpha) is met, the algorithm stops searching the current branch. That is, we use alpha-beta pruning here. Second, the function has a depth argument: when the depth reaches 0, the function uses the position evaluation function to assess the game board.
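
To see pruning at work on something smaller than a Connect Four board, consider the sketch below. It is not the book's function: it uses the standard negamax formulation with a single (alpha, beta) search window on a hand-made tree, whereas eval_payoff_conn() keeps one bound per player. The tree, the `negamax` name, and the `counter` argument are all assumptions made for this illustration. Leaf numbers play the role of position evaluations at the depth limit.

```python
def negamax(node, alpha=-2.0, beta=2.0, counter=None):
    """Value of `node` for the player to move, using alpha-beta pruning.

    A node is either a number (the evaluation of a leaf position from the
    point of view of the player to move there) or a list of child nodes.
    """
    if not isinstance(node, list):
        if counter is not None:
            counter[0] += 1          # count the leaves actually evaluated
        return node
    best = -2.0                      # lower than any value in [-1, 1]
    for child in node:
        # The child's value is from the opponent's point of view, so we
        # negate it, and we negate and swap the search window as well
        value = -negamax(child, -beta, -alpha, counter)
        best = max(best, value)
        alpha = max(alpha, best)
        if alpha >= beta:            # the opponent would avoid this branch
            break
    return best

# Toy tree, two plies deep, with three moves at each level
tree = [[0.3, -0.1, 0.5], [-0.4, 0.2, 0.6], [0.1, 0.0, -0.2]]
visited = [0]
print(negamax(tree, counter=visited), visited[0])
```

The first number printed is the value of the root position, and the second is the number of leaves evaluated. Pruning lets the search skip some of the nine leaves without changing the result.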

The eval_payoff_conn() function is defined as follows. It's saved in the file ch07util.py that you just downloaded. 

In [6]:
def eval_payoff_conn(env,model,reward,done,depth,alpha,beta):
    # if the game has ended after the previous player's move
    if done:
        # if it's not a tie
        if reward!=0:
            return -1
        else:
            return 0
    # If the maximum depth is reached, evaluate the position
    if depth==0:
        return position_eval(env,model)    
    # -2 is below any possible payoff, so it serves as "no bound yet"
    if alpha is None:
        alpha=-2
    if beta is None:
        beta=-2
    if env.turn=="red":
        best_payoff = alpha
    if env.turn=="yellow":
        best_payoff = beta         
    # iterate through all possible moves
    for m in env.validinputs:
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m)  
        # If I make this move, what's the opponent's response?
        opponent_payoff=eval_payoff_conn(env_copy,model,\
                             reward,done,depth-1,alpha,beta)
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        if my_payoff > best_payoff:        
            best_payoff = my_payoff
            if env.turn=="red":
                alpha=best_payoff
            if env.turn=="yellow":
                beta=best_payoff       
        if alpha>=-beta:
            break        
    return best_payoff               

Next, we'll design a minimax algorithm with the position evaluation function.  

## 2.2. The minimax_conn_eval() Function
We also define a minimax_conn_eval() function to produce the best move for the minimax agent. The function is similar to the minimax_conn() function we used in Chapter 6. However, instead of assuming the game is tied once the search algorithm reaches a depth of 0, the function uses the position evaluation function to assess the game board. 

The minimax_conn_eval() function is defined as follows. It's saved in the file ch07util.py that you just downloaded. 

In [7]:
def minimax_conn_eval(env,model,depth=3):
    values={} 
    # iterate through all possible next moves
    for m in env.validinputs:
        # make a hypothetical move and see what happens
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m) 
        # If the current player wins right away with move m, take it.
        if done and reward!=0:
            return m 
        # See what's the best response from the opponent
        opponent_payoff=eval_payoff_conn(env_copy,\
                             model,reward,done,depth,-2,-2)  
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        values[m]=my_payoff
    # pick the move with the highest value       
    best_move=max(values,key=values.get)
    return best_move    

With this, we have created a minimax algorithm with position evaluation. 

## 2.3. Minimax with Position Evaluations
Next, we manually test the minimax agent with the position evaluation function in a game. We let the minimax agent move first and the human player move second.

In [8]:
from utils.ch07util import minimax_conn_eval

# Initiate the game environment
env=conn()
state=env.reset()   
# Play a full game manually
while True:
    action = minimax_conn_eval(env,model)   
    state, reward, done, info = env.step(action)
    print(f"the current state is \n{state.T[::-1]}")
    if done: 
        print("Player red won!")
        break    
    action = int(input("What's your move, player yellow?")) 
    print(f"Player yellow chose cell {action}")
    state, reward, done, info = env.step(action)
    print(f"the current state is \n{state.T[::-1]}")
    if done: 
        if reward==-1:
            print("Player yellow won!")
        else:
            print("Game over, it's a tie!")
        break     

the current state is 
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 1 0 0 0]]
What's your move, player yellow?3
Player yellow chose cell 3
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0 -1  1  0  0  0]]
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  1  0  0  0]
 [ 0  0 -1  1  0  0  0]]
What's your move, player yellow?4
Player yellow chose cell 4
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0 -1  0  0  0]
 [ 0  0  0  1  0  0  0]
 [ 0  0 -1  1  0  0  0]]
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0 -1  0  0  0]
 [ 0  0  1  1  0  0  0]
 [ 0  0 -1  1  0  0  0]]
What's your move, player yellow?5
Player yellow chose cell 5
the current state is 
[[

The agent has won. Each move takes longer to compute, but the minimax agent with position evaluation plays a much more sophisticated game than the naive minimax agent.

# 3. Effectiveness of Position Evaluations
Next, we test how much the position evaluation function improves the minimax agent. We play ten games. In five games, the minimax agent with position evaluation moves first. In the other five games, the naive minimax agent (which assumes that the game is tied when the depth reaches 0) moves first. This way, neither agent has a first-mover advantage.

In [9]:
from utils.ch07util import minimax_conn_eval
from utils.ch06util import minimax_conn

results=[]
for i in range(10):
    state=env.reset() 
    if i%2==0:
        action=minimax_conn(env,depth=3)    
        state,reward,done,_=env.step(action)
    while True:
        action=minimax_conn_eval(env,model,depth=3) 
        state,reward,done,_=env.step(action)
        if done: 
            if reward!=0:
                results.append(1)
            else:
                results.append(0)
            break 
        action=minimax_conn(env,depth=3) 
        state,reward,done,_=env.step(action)
        if done: 
            if reward!=0:
                results.append(-1)
            else:
                results.append(0)
            break             

We create a list *results* to record the outcome of each game. If the minimax agent with position evaluation wins, we record a value of 1 in the list *results*. If it loses, we record a value of -1. If the game is tied, we record a value of 0.

Next, we count how many times the minimax agent with position evaluation has won and how many times the agent has lost. 

In [10]:
# count how many times minimax with evaluation won
wins=results.count(1)
print(f"the minimax agent with evaluation has won {wins} games")
# count how many times minimax with evaluation lost
losses=results.count(-1)
print(f"the minimax agent with evaluation has lost {losses} games")
# count tie games
ties=results.count(0)
print(f"the game has tied {ties} times")          

the minimax agent with evaluation has won 10 games
the minimax agent with evaluation has lost 0 games
the game has tied 0 times


The results show that the minimax agent with the evaluation function won all ten games against the naive minimax agent, which indicates that the evaluation function greatly improves the effectiveness of the minimax algorithm.