# Chapter 7: Position Evaluation in MiniMax



***
*“Part of the improvement between ’96 and ’97 is we detected more patterns in a chess position and could put values on them and therefore evaluate chess positions more accurately.”*

-- Murray Campbell, who programmed Deep Blue's evaluation function
***



What you'll learn in this chapter:

* What a position evaluation function is
* Designing a game strategy using an evaluation function in Connect Four
* Adding a position evaluation function to MiniMax with depth pruning
* Assessing the effectiveness of MiniMax augmented by an evaluation function

The IBM Chess engine Deep Blue had two matches with then-world
Chess champion Garry Kasparov. In the first match, which was held
in Philadelphia in 1996, Deep Blue lost to Kasparov. In the second match in New
York City in 1997, Deep Blue defeated Kasparov. What changed, you may wonder?
According to Murray Campbell, the team member who programmed Deep Blue’s
evaluation function, the Chess engine had a more accurate evaluation function in
1997 compared to 1996.

With depth pruning and alpha-beta pruning to reduce the time it needs to make
a move, the MiniMax algorithm can produce fairly powerful agents, especially as
computing hardware advances. For example, according to a 2017 article in Scientific
American by Larry Greenemeier, “the 1997 version of Deep Blue searched
between 100 million and 200 million positions per second, depending on the type of
position. The system could search to a depth of between six and eight pairs of moves—
one white, one black—to a maximum of 20 or even more pairs in some situations.”

What made Deep Blue even more powerful was the position evaluation function it
used. In Chapter 6, we assumed that the game is tied when the maximum search depth is
reached and the game is not over. In many real-world games, however, even when the
game is not over, we usually have a good estimate of its outcome based
on heuristics. For example, in Chess, we can add up the value of each side's pieces.
Whichever side has the higher total piece value tends to win.
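To make the Chess example concrete, here is a minimal sketch of a material-count evaluation. It is an illustration only, not Deep Blue's evaluation function: the function name *material_eval()*, the FEN-style piece letters, and the conventional piece values (pawn 1, knight 3, bishop 3, rook 5, queen 9) are assumptions, and the score is scaled into the range −1 to 1 so that it follows the same convention as the evaluation function used in this chapter.

```python
# Hypothetical helper; the piece values are the conventional textbook ones
# (pawn 1, knight 3, bishop 3, rook 5, queen 9), not Deep Blue's weights.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material_eval(pieces):
    """Score a position in [-1, 1] from White's point of view.

    `pieces` lists the pieces still on the board as FEN letters:
    uppercase for White (e.g. "Q"), lowercase for Black (e.g. "q").
    """
    white = sum(PIECE_VALUES[c.lower()] for c in pieces if c.isupper())
    black = sum(PIECE_VALUES[c] for c in pieces if c.islower())
    total = white + black
    if total == 0:
        return 0.0  # only the kings are left: no material edge
    # Normalizing by the total material keeps the score between -1 and 1
    return (white - black) / total


# White has a queen and a rook, Black only a rook: White is ahead
print(round(material_eval("KQRkr"), 3))  # prints 0.474 (that is, 9 / 19)
```

A positive score favors White and a negative score favors Black. Real engines combine many more patterns than material, which is exactly the kind of refinement Murray Campbell describes in the opening quote.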

In this chapter, we introduce the concept of the position evaluation function and
apply it to the Connect Four game. We show that our evaluation function makes
the MiniMax agent much stronger. Specifically, you’ll use an evaluation function that
we’ll develop in Chapter 15: the function takes a game board as the input, and returns
an evaluation between −1 and 1. An evaluation of −1 means that the current player
will lose for sure, while an evaluation of 1 means that the current player will win for
sure. An evaluation of 0 means the game is most likely to be tied.
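Following the comment in the *position_eval()* function in Section 1.1, the evaluation of a board $s$ can be written as the difference between two predicted probabilities:

$$
v(s) = P(\text{current player wins} \mid s) - P(\text{opponent wins} \mid s)
$$

Because both probabilities lie between 0 and 1 and their sum cannot exceed 1, $v(s)$ always falls between −1 and 1. It equals 1 only when the current player is certain to win, −1 only when the opponent is certain to win, and 0 when the two players are equally likely to win. From the opponent's side, the same board is worth $-v(s)$.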

You’ll first use the evaluation function to design a game strategy: an agent armed with
this evaluation function will make hypothetical moves and evaluate each future game
state. The agent will then select the next move that leads to the highest evaluation
of the future game state. We show that this agent beats an opponent who makes random
moves in 97 percent of games. We then augment the MiniMax algorithm with the position
evaluation function. We’ll use the evaluation function to evaluate the future game state
when the maximum search depth is reached and the game is not over. The evaluation function
provides a more accurate assessment of the game state than simply assuming a tie, which
allows the MiniMax agent to make more intelligent moves. We show that the augmented
MiniMax agent beats the earlier version of the MiniMax agent without position evaluation
in seven out of ten games.

We’ll also use the concept of position evaluation extensively in later chapters of this
book. For example, AlphaGo trained two deep neural networks when designing its
game strategies: a policy network and a value network. The value network was used
to assess the strength of each next board position, which, in turn, helped AlphaGo
select the best next move.

# 1. What Are Position Evaluation Functions?

## 1.1. A Model to Predict Game Outcomes in Connect Four

In [1]:
```python
!pip install tensorflow
```

In [2]:
```python
import tensorflow as tf

# load the trained Connect Four value network from the files/ folder
model=tf.keras.models.load_model('files/value_conn.h5')
```

Download the file *ch07util.py* from the book's GitHub repository and place it in the folder /Desktop/ai/utils/ on your computer. In it, we have defined a *position_eval()* function that generates an evaluation of the game state:

```python
def position_eval(env,model):
    # obtain the current state, reshape it      
    state=env.state.reshape(-1,7,6,1)
    pred=model.predict(state,verbose=0)
    # prob(current player wins)-prob(opponent wins)
    evaluation=pred[0][1]-pred[0][2]
    return evaluation 
```
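If you want to see what *position_eval()* does without loading TensorFlow, the sketch below runs the same logic against hypothetical stand-ins: *StubModel* mimics the Keras model's `predict()` call, and *StubEnv* provides only the 7×6 `state` array. Both classes and the probabilities passed to them are made up for illustration; the real model is the one loaded from *files/value_conn.h5*. The meaning of output index 0 is not stated in this chapter, so it is only assumed to hold the remaining outcome.

```python
import numpy as np


class StubModel:
    """Hypothetical stand-in for the trained Keras value network.

    predict() returns the same three probabilities for every board in the
    batch. Index 1 is the current player's win probability and index 2 the
    opponent's (per the comment in position_eval()); index 0 is assumed to
    hold the remaining outcome.
    """

    def __init__(self, probs):
        self.probs = np.asarray(probs, dtype=np.float32)
        self.last_input_shape = None

    def predict(self, x, verbose=0):
        self.last_input_shape = x.shape  # remember what the model received
        return np.tile(self.probs, (x.shape[0], 1))


class StubEnv:
    """Hypothetical stand-in for conn(): only the .state attribute that
    position_eval() reads, stored as 7 columns by 6 rows."""

    def __init__(self):
        self.state = np.zeros((7, 6), dtype=np.int8)


def position_eval(env, model):
    # same logic as position_eval() in ch07util.py
    state = env.state.reshape(-1, 7, 6, 1)
    pred = model.predict(state, verbose=0)
    return pred[0][1] - pred[0][2]


stub_model = StubModel([0.1, 0.7, 0.2])
print(position_eval(StubEnv(), stub_model))  # about 0.7 - 0.2
print(stub_model.last_input_shape)           # prints (1, 7, 6, 1)
```

The reshape adds a batch dimension in front and a channel dimension at the end, so a single 7×6 board reaches the model with shape (1, 7, 6, 1).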

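## 1.2. Play a Game Using Position Evaluations

The *eval_move()* function below makes each legal move on a copy of the game, evaluates the resulting board with *position_eval()*, and returns the move with the highest evaluation for the player whose turn it is, together with the evaluations of all the moves:
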
In [3]:
```python
from copy import deepcopy
from utils.ch07util import position_eval

def eval_move(env,model):
    # create a dictionary to hold all values
    values={}
    # iterate through all possible next moves
    for m in env.validinputs:
        # make a hypothetical move on a copy of the game
        env_copy=deepcopy(env)
        s,r,d,_=env_copy.step(m)
        # evaluate the hypothetical game state
        value=position_eval(env_copy,model)
        # add value to the dictionary
        if env.turn=="red":
            values[m]=round(value,5)
        # multiply value by -1 for yellow
        else:
            values[m]=round(-value,5)
    # choose the move with the highest evaluation
    action = max(values,key=values.get)
    return action, values
```
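The selection step at the end of *eval_move()* can be tried on its own. The hypothetical helper *pick_move()* below takes made-up evaluations instead of model outputs, applies the same sign flip for yellow, and shows how Python's built-in `max()` breaks ties.

```python
def pick_move(raw_values, turn):
    """Mirror the selection step of eval_move().

    raw_values maps each column to position_eval() of the board after that
    hypothetical move; as in eval_move(), the sign is kept for red and
    flipped for yellow before taking the maximum.
    """
    values = {m: (v if turn == "red" else -v) for m, v in raw_values.items()}
    # max() returns the first key that attains the maximum, so ties go to
    # the column that was inserted into the dictionary first
    return max(values, key=values.get), values


raw = {1: 0.2, 2: 0.9, 3: -0.5}
print(pick_move(raw, "red")[0])                # prints 2
print(pick_move(raw, "yellow")[0])             # prints 3
print(pick_move({1: 0.5, 4: 0.5}, "red")[0])   # prints 1
```

Because `eval_move()` iterates over `env.validinputs` in order, a tie between two columns is always resolved in favor of the one that comes first in that list.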

In [4]:
```python
from utils.conn_simple_env import conn

env=conn()
state=env.reset()
print(f"the current state is \n{state.T[::-1]}")
while True:
    # the red player (the evaluation agent) chooses a move
    action, values=eval_move(env,model)
    print(f"evaluations of future moves are\n{values}")
    print(f"the red player chose column {action}")
    state, reward, done, info=env.step(action)
    if done:
        print(f"the current state is \n{state.T[::-1]}")
        if reward!=0:
            print("the red player won")
        else:
            print("game over, it's a tie")
        break
    # you play yellow and enter your move
    action=int(input("what's your move?"))
    print(f"the yellow player chose column {action}")
    state, reward, done, info=env.step(action)
    print(f"the current state is \n{state.T[::-1]}")
    if done:
        if reward==-1:
            print("the yellow player won")
        else:
            print("game over, it's a tie")
        break
```

```
the current state is 
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]]
evaluations of future moves are
{1: 0.57383, 2: 0.58247, 3: 0.52876, 4: 0.86937, 5: 0.7459, 6: 0.12794, 7: 0.09338}
the red player chose column 4
what's your move?4
the yellow player chose column 4
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0 -1  0  0  0]
 [ 0  0  0  1  0  0  0]]
evaluations of future moves are
{1: 0.92464, 2: 0.96153, 3: 0.97883, 4: 0.94346, 5: 0.98155, 6: 0.96366, 7: 0.94616}
the red player chose column 5
what's your move?4
the yellow player chose column 4
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0 -1  0  0  0]
 [ 0  0  0 -1  0  0  0]
 [ 0  0  0  1  1  0  0]]
evaluations of future moves are
{1: 0.91422, 2: 0.9949, 3: 0.99994, 4: 0.95631, 5: 0.97798, 6: 0.99123, 7: 0.97749}
the red player chose column 
```
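The game loop above converts your input with `int(input(...))`, which raises a `ValueError` on non-numeric input and does not check the column against `env.validinputs`. A minimal sketch of a more forgiving prompt follows; *ask_move()* is a hypothetical helper, and it assumes that `env.validinputs` lists the columns that are currently legal, as the loop in *eval_move()* suggests.

```python
def ask_move(valid_inputs, read=input):
    """Hypothetical helper: keep asking until the player enters a legal column.

    valid_inputs plays the role of env.validinputs; read defaults to the
    built-in input() and can be replaced, for example in tests.
    """
    while True:
        text = read("what's your move? ")
        try:
            column = int(text)
        except ValueError:
            print(f"{text!r} is not a column number, try again")
            continue
        if column in valid_inputs:
            return column
        print(f"column {column} is not available, try again")


# Simulate a player who types "x", then "9", then "4"
answers = iter(["x", "9", "4"])
print(ask_move([1, 2, 3, 4, 5, 6, 7], read=lambda prompt: next(answers)))
# prints two warnings, then 4
```

Inside the game loop, `action=ask_move(env.validinputs)` could take the place of `action=int(input("what's your move?"))`.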

## 1.3. The Position Evaluation Function vs Random Moves

To get a better sense of how strong the evaluation agent is, we simulate 100 games against an opponent that chooses random moves. The random opponent moves first in even-numbered games, and the evaluation agent moves first in odd-numbered games:

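In [5]:
```python
results=[]
for i in range(100):
    env=conn()
    state=env.reset()
    # the random opponent moves first in even-numbered games
    if i%2==0:
        action=env.sample()
        state, reward, done, info=env.step(action)
    while True:
        # the evaluation agent moves
        action, values=eval_move(env,model)
        state, reward, done, info=env.step(action)
        if done:
            # 1 if the evaluation agent won (abs() covers both colors), 0 for a tie
            results.append(abs(reward))
            break
        # the random opponent moves
        action=env.sample()
        state, reward, done, info=env.step(action)
        if done:
            # -1 if the random opponent won, 0 for a tie
            results.append(-abs(reward))
            break
```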

In [6]:
```python
# count how many times the evaluation agent won
wins=results.count(1)
print(f"the evaluation agent won {wins} games")
# count how many times the evaluation agent lost
losses=results.count(-1)
print(f"the evaluation agent lost {losses} games")
# count tie games
ties=results.count(0)
print(f"the game was tied {ties} times")
```

```
the evaluation agent won 97 games
the evaluation agent lost 3 games
the game was tied 0 times
```
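A compact way to produce the same tallies, plus a breakdown by who moved first, is sketched below. The helper names *summarize()* and *by_first_mover()* are hypothetical; they assume the `results` list is coded as in the simulation loop (1 win, −1 loss, 0 tie) and that the random opponent moved first in even-numbered games.

```python
from collections import Counter


def summarize(results):
    """Tally outcomes coded as in the simulation loop:
    1 = the evaluation agent won, -1 = it lost, 0 = tie."""
    counts = Counter(results)
    n = len(results)
    return {"wins": counts[1], "losses": counts[-1], "ties": counts[0],
            "win_rate": counts[1] / n if n else 0.0}


def by_first_mover(results):
    """Split the record by who moved first: the random opponent moved
    first in even-numbered games (i % 2 == 0), the agent in odd ones."""
    return {"opponent_first": summarize(results[0::2]),
            "agent_first": summarize(results[1::2])}


print(summarize([1, 1, -1, 0]))
# prints {'wins': 2, 'losses': 1, 'ties': 1, 'win_rate': 0.5}
```

In the notebook, `summarize(results)` reproduces the counts printed above, and `by_first_mover(results)` shows whether the agent's losses came mostly when it moved second.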


# 2. MiniMax with Position Evaluation in Connect Four

## 2.1. The eval_payoff_conn() Function

The *eval_payoff_conn()* function is defined as follows. It's saved in the file *ch07util.py* that you just downloaded.

```python
from copy import deepcopy

def eval_payoff_conn(env,model,reward,done,depth,alpha,beta):
    # if the game has ended after the previous player's move
    if done:
        # if it's not a tie, the previous player won, so the current player lost
        if reward!=0:
            return -1
        else:
            return 0
    # If the maximum depth is reached, use the position evaluation
    # instead of assuming a tie game
    if depth==0:
        if env.turn=="red":
            return position_eval(env,model)
        else:
            return -position_eval(env,model)
    if alpha is None:
        alpha=-2
    if beta is None:
        beta=-2
    if env.turn=="red":
        best_payoff = alpha
    if env.turn=="yellow":
        best_payoff = beta         
    # iterate through all possible moves
    for m in env.validinputs:
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m)  
        # If I make this move, what's the opponent's response?
        opponent_payoff=eval_payoff_conn(env_copy,model,\
                             reward,done,depth-1,alpha,beta)
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        if my_payoff > best_payoff:        
            best_payoff = my_payoff
            if env.turn=="red":
                alpha=best_payoff
            if env.turn=="yellow":
                beta=best_payoff       
        if alpha>=-beta:
            break        
    return best_payoff 
```
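The pruning rule in *eval_payoff_conn()* is easy to misread, because alpha and beta are both stored from their own player's perspective: alpha is the best payoff red has secured so far, beta is yellow's, and since red can never get more than −beta, a node is abandoned once alpha ≥ −beta. The sketch below checks this scheme on random toy game trees instead of Connect Four boards by comparing it with plain negamax, which does no pruning. The tree format and the function names are assumptions made for this illustration.

```python
import random


def negamax(node):
    """Plain negamax without pruning. A node is either a float (a leaf,
    valued from the perspective of the player to move there) or a list
    of child nodes. Returns the value for the player to move."""
    if not isinstance(node, list):
        return node
    return max(-negamax(child) for child in node)


def negamax_ab(node, red_to_move, alpha=-2, beta=-2):
    """The pruning scheme of eval_payoff_conn() on the same toy trees:
    alpha is red's best payoff found so far and beta is yellow's, each
    from its own player's perspective. Red can never get more than -beta,
    so the search at a node stops once alpha >= -beta."""
    if not isinstance(node, list):
        return node
    best = alpha if red_to_move else beta
    for child in node:
        payoff = -negamax_ab(child, not red_to_move, alpha, beta)
        if payoff > best:
            best = payoff
            if red_to_move:
                alpha = best
            else:
                beta = best
        if alpha >= -beta:
            break
    return best


def random_tree(depth, rng):
    """Random game tree with 1-3 moves per node and leaf values in [-1, 1]."""
    if depth == 0:
        return round(rng.uniform(-1, 1), 2)
    return [random_tree(depth - 1, rng) for _ in range(rng.randint(1, 3))]


mismatches = 0
for seed in range(200):
    tree = random_tree(4, random.Random(seed))
    if negamax_ab(tree, True) != negamax(tree):
        mismatches += 1
print(mismatches)  # prints 0: pruning never changed the root value
```

Pruning only skips branches that cannot affect the value at the root, so both searches agree; what it saves is the number of positions visited, and therefore the number of calls to the value network.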

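## 2.2. The MiniMax_conn_eval() Function

The *MiniMax_conn_eval()* function, also saved in *ch07util.py*, plays each possible next move on a copy of the game, takes a winning move right away, and otherwise scores each move with *eval_payoff_conn()* and returns the move with the highest payoff:
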
```python
def MiniMax_conn_eval(env,model,depth=3):
    values={} 
    # iterate through all possible next moves
    for m in env.validinputs:
        # make a hypothetical move and see what happens
        env_copy=deepcopy(env)
        state,reward,done,info=env_copy.step(m) 
        # If current player wins with m, take it.
        if done and reward!=0:
            return m 
        # See what's the best response from the opponent
        opponent_payoff=eval_payoff_conn(env_copy,\
                             model,reward,done,depth,-2,-2)  
        # Opponent's payoff is the opposite of your payoff
        my_payoff=-opponent_payoff 
        values[m]=my_payoff
    # pick the move with the highest value       
    best_move=max(values,key=values.get)
    return best_move  
```
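The key difference from Chapter 6 is what happens at the depth cutoff. The sketch below isolates that difference on a hand-built toy game tree rather than a Connect Four board: *search()* is a depth-limited negamax without pruning, *best_move()* follows the structure of *MiniMax_conn_eval()* (including taking an immediately winning move), and *leaf_eval* is either *assume_tie()*, which treats unfinished games as ties as in Chapter 6, or *use_estimate()*, which stands in for the position evaluation. All names and numbers are made up for illustration.

```python
def node(value, **children):
    """Hypothetical toy game-tree node. value is exact for a finished game
    (-1 = the player to move has lost, 0 = tie) and a heuristic estimate
    otherwise; both are from the perspective of the player to move."""
    return {"value": value, "children": children}


def search(n, depth, leaf_eval):
    """Depth-limited negamax: the value of node n for the player to move."""
    if not n["children"]:
        return n["value"]        # game over: use the exact outcome
    if depth == 0:
        return leaf_eval(n)      # search cut off: fall back on leaf_eval
    return max(-search(c, depth - 1, leaf_eval) for c in n["children"].values())


def best_move(root, depth, leaf_eval):
    """Root step in the spirit of MiniMax_conn_eval()."""
    values = {}
    for move, child in root["children"].items():
        # if this move ends the game and the opponent has lost, take it
        if not child["children"] and child["value"] == -1:
            return move
        values[move] = -search(child, depth, leaf_eval)
    return max(values, key=values.get)


def assume_tie(n):
    return 0.0          # Chapter 6 behavior: an unfinished game counts as a tie


def use_estimate(n):
    return n["value"]   # this chapter: use the position evaluation


# Move "a" leaves the opponent in a bad spot (estimate -0.8 for them);
# move "b" leaves them slightly ahead (estimate 0.3 for them).
root = node(0.0,
            b=node(0.3, x=node(0.0)),
            a=node(-0.8, y=node(0.0)))
print(best_move(root, 0, assume_tie))    # prints b (both moves look like ties)
print(best_move(root, 0, use_estimate))  # prints a
```

With the cutoff treated as a tie, the two moves look equally good and `max()` settles on the first one; with the estimate, the search prefers the move that leaves the opponent in a worse position.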

# 3. Test MiniMax with Position Evaluation in Connect Four


## 3.1. Play Against the Evaluation-Augmented MiniMax


In [7]:
```python
from utils.ch07util import MiniMax_conn_eval

# Initialize the game environment
env=conn()
state=env.reset()
# Play a full game: MiniMax with evaluation plays red, you play yellow
while True:
    action = MiniMax_conn_eval(env,model)
    state, reward, done, info = env.step(action)
    print(f"Current state is \n{state.T[::-1]}")
    if done:
        if reward!=0:
            print("Player red won!")
        else:
            print("Game over, it's a tie!")
        break
    action = int(input("What's your move, player yellow?"))
    print(f"Player yellow chose column {action}")
    state, reward, done, info = env.step(action)
    print(f"Current state is \n{state.T[::-1]}")
    if done:
        if reward==-1:
            print("Player yellow won!")
        else:
            print("Game over, it's a tie!")
        break
```

```
Current state is 
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 1 0 0 0]]
What's your move, player yellow?4
Player yellow chose column 4
Current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0 -1  0  0  0]
 [ 0  0  0  1  0  0  0]]
Current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0 -1  0  0  0]
 [ 0  0  1  1  0  0  0]]
What's your move, player yellow?4
Player yellow chose column 4
Current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0 -1  0  0  0]
 [ 0  0  0 -1  0  0  0]
 [ 0  0  1  1  0  0  0]]
Current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0 -1  0  0  0]
 [ 0  0  0 -1  0  0  0]
 [ 0  1  1  1  0  0  0]]
What's your move, player yellow?5
Player yellow chose column 5
Current state is 
[[ 0  0  0  0  0  0 
```

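## 3.2. Effectiveness of Position Evaluation

To measure how much position evaluation helps, we pit MiniMax with position evaluation against the MiniMax agent from Chapter 6, which has no position evaluation, over ten games. Both agents search to a depth of 3, and the Chapter 6 agent moves first in even-numbered games:
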
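In [8]:
```python
from utils.ch06util import MiniMax_conn

results=[]
for i in range(10):
    state=env.reset()
    # the Chapter 6 MiniMax agent moves first in even-numbered games
    if i%2==0:
        action=MiniMax_conn(env,depth=3)
        state,reward,done,_=env.step(action)
    while True:
        # MiniMax with position evaluation moves
        action=MiniMax_conn_eval(env,model,depth=3)
        state,reward,done,_=env.step(action)
        if done:
            # 1 for a win by MiniMax with evaluation, 0 for a tie
            results.append(abs(reward))
            break
        # MiniMax without position evaluation moves
        action=MiniMax_conn(env,depth=3)
        state,reward,done,_=env.step(action)
        if done:
            # -1 for a loss by MiniMax with evaluation, 0 for a tie
            results.append(-abs(reward))
            break
```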

In [9]:
```python
# count how many times MiniMax with evaluation won
wins=results.count(1)
print(f"MiniMax with evaluation won {wins} games")
# count how many times MiniMax with evaluation lost
losses=results.count(-1)
print(f"MiniMax with evaluation lost {losses} games")
# count tie games
ties=results.count(0)
print(f"the game was tied {ties} times")
```

```
MiniMax with evaluation won 7 games
MiniMax with evaluation lost 2 games
the game was tied 1 times
```
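Both experiments in this chapter follow the same pattern: the opponent moves first in even-numbered games, and each result is recorded from the tested agent's point of view. The sketch below factors that pattern into a hypothetical *play_match()* function. To stay self-contained, it runs on a toy take-away game (*TakeAway*, also hypothetical) instead of Connect Four, with a perfect-play agent and a random agent.

```python
import random


class TakeAway:
    """Hypothetical toy game with a conn()-like interface (reset(),
    validinputs, step() returning state, reward, done, info): players
    alternately remove 1-3 stones, and whoever takes the last stone wins."""

    def __init__(self, stones=10):
        self.start = stones
        self.stones = stones

    def reset(self):
        self.stones = self.start
        return self.stones

    @property
    def validinputs(self):
        return [m for m in (1, 2, 3) if m <= self.stones]

    def step(self, m):
        self.stones -= m
        done = self.stones == 0
        # this toy reports reward 1 to whoever just won; play_match()
        # applies abs() to the reward, as the chapter's loops do
        return self.stones, (1 if done else 0), done, {}


def perfect_agent(env):
    # leave the opponent a multiple of 4 stones whenever possible
    return env.stones % 4 or 1


def make_random_agent(seed):
    rng = random.Random(seed)
    return lambda env: rng.choice(env.validinputs)


def play_match(agent, opponent, env, n_games):
    """Results from agent's point of view (1 win, -1 loss, 0 tie); as in
    the chapter's loops, the opponent moves first in even-numbered games."""
    results = []
    for i in range(n_games):
        env.reset()
        if i % 2 == 0:
            _, reward, done, _ = env.step(opponent(env))
            if done:
                results.append(-abs(reward))
                continue
        while True:
            _, reward, done, _ = env.step(agent(env))
            if done:
                results.append(abs(reward))
                break
            _, reward, done, _ = env.step(opponent(env))
            if done:
                results.append(-abs(reward))
                break
    return results


env_demo = TakeAway(10)
print(play_match(perfect_agent, perfect_agent, env_demo, 4))  # prints [-1, 1, -1, 1]
```

With ten stones, whoever moves first can always win under perfect play, so two perfect agents simply trade wins. With the chapter's own objects, a call such as `play_match(lambda e: MiniMax_conn_eval(e, model, depth=3), lambda e: MiniMax_conn(e, depth=3), conn(), 10)` should mirror the loop in Section 3.2, assuming `conn()` and the two agents behave as they do in the cells above.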
