# Chapter 11: Deep Learning Game Strategies

In this chapter, you’ll continue with what you learned in Chapter 10 and apply deep learning to another game: Connect Four. 

You'll use simulated games as input data to feed into a deep neural network. The neural network consists of an input layer, some hidden layers, and an output layer. The output layer has three neurons, representing the three possible game outcomes: a win, a loss, or a tie. Essentially, this is a multi-class classification problem. The neural network we create includes both dense layers and a convolutional layer. You'll learn to treat the Connect Four game board as a two-dimensional image, extract spatial features from the board (such as four game pieces in a row horizontally, vertically, or diagonally), and associate these features with the game outcome. After the model is trained, you'll use it to design game strategies to play Connect Four. At each step of the game, you'll look at all possible next moves, and the model predicts the probability of winning the game with each hypothetical move. You'll pick the move with the highest probability of winning the game for the current player.

Finally, you'll test the game strategies against the rule-based AI players and see how strong the deep learning game strategies are. 

***
$\mathbf{\text{Create a subfolder for files in Chapter 11}}$<br>
***
We'll put all files in Chapter 11 in the subfolder files/ch11. Run the code in the cell below to create the subfolder.

***

In [1]:
import os

os.makedirs("files/ch11", exist_ok=True)

## 1. Deep Learning Game Strategies in Connect Four
In this section, you’ll learn how to use a deep neural network to train intelligent game strategies for Connect Four. In particular, you’ll use the kind of convolutional neural network that you used in image classification to train the game strategy. By treating the game board as a two-dimensional image instead of a one-dimensional vector, the model can pick up spatial patterns such as four pieces in a row, which greatly improves the strength of your game strategies.

You’ll learn how to prepare data to train the model, how to interpret the predictions from the model, how to use the predictions to play games, and how to check the efficacy of your strategies.

## 1.1. Summary of the Deep Learning Game Strategy
Here is a summary of what we’ll do to train the game strategy:

1.	We’ll let two computer players automatically play a game with random moves, and record the whole game history. The game history will contain all the game board positions from the very first move to the very last move.
2.	We then associate each board position with a game outcome (win, tie, or loss). The game board position plays the role of the features X in our image classification problem, and the outcome plays the role of the labels y.
3.	We’ll simulate 100,000 games. By using the board positions from these games and the corresponding outcomes as Xs and ys, we feed the data into a deep neural network. After the training is done, we have a trained model.
4.	We can now use the trained model to play a game. At each move of the game, we look at all possible next moves and feed each hypothetical game board into the trained model. The model tells us the probabilities of a win, a loss, and a tie.
5.	You select the move that the model predicts with the highest chance of winning.


## 1.2. Simulate Connect Four Games
You’ll learn how to generate data to train the deep neural network (DNN). The logic is as follows: you’ll generate 100,000 games in which both players use random moves. You’ll then record the board positions of all intermediate steps and associate each board position with the eventual outcome of its game (win, loss, or tie).
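The `conn` environment imported from `utils.conn_simple_env` is not listed in this chapter, so the sketch below is a hypothetical, self-contained stand-in that illustrates the same data-generation logic with NumPy only. It assumes the layout that the printed boards in this chapter suggest (a 7 by 6 array with one row per board column, bottom cell first; red plays 1 and moves first, yellow plays -1) and reports outcomes as 1, -1, or 0. The names `has_four()`, `random_game()`, and `simulate()` are illustrative and not part of the book's utilities.

```python
import random
import numpy as np

N_COLS, N_ROWS = 7, 6  # board[c, r]: column c, row r counted from the bottom

def has_four(board, player):
    """Return True if `player` (1 or -1) has four pieces in a row anywhere."""
    # Directions: up a column, across columns, and the two diagonals
    for dc, dr in ((0, 1), (1, 0), (1, 1), (1, -1)):
        for c in range(N_COLS):
            for r in range(N_ROWS):
                cells = [(c + k * dc, r + k * dr) for k in range(4)]
                if all(0 <= cc < N_COLS and 0 <= rr < N_ROWS
                       and board[cc, rr] == player for cc, rr in cells):
                    return True
    return False

def random_game(rng=random):
    """Play one game with random moves; return (history, outcome).

    outcome is 1 if red (who moves first) wins, -1 if yellow wins, 0 for a tie.
    """
    board = np.zeros((N_COLS, N_ROWS), dtype=int)
    history, player = [], 1
    while True:
        # A column is playable while its top cell is empty
        valid = [c for c in range(N_COLS) if board[c, -1] == 0]
        col = rng.choice(valid)
        row = int(np.argmax(board[col] == 0))  # lowest empty cell in the column
        board[col, row] = player
        history.append(board.copy())           # snapshot after every move
        if has_four(board, player):
            return history, player
        if np.all(board[:, -1] != 0):          # every column full: tie
            return history, 0
        player = -player

def simulate(n_games, seed=0):
    """Return (outcome, board) pairs for every board of every simulated game."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_games):
        history, outcome = random_game(rng)
        data.extend((outcome, b) for b in history)
    return data
```

For example, `simulate(1000)` returns a list of `(outcome, board)` pairs in the same form as the *results* list built later in this section.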

First, let's simulate one game. The code in the cell below accomplishes that.

In [2]:
from utils.conn_simple_env import conn
import time
import random
import numpy as np
from pprint import pprint

# Initiate the game environment
env=conn()

# Define the one_game() function
def one_game():
    history = []
    state=env.reset()   
    while True:   
        action = random.choice(env.validinputs)  
        new_state, reward, done, info = env.step(action)
        history.append(np.array(new_state).reshape(7,6))
        if done:
            break
    return history, reward

# Simulate one game and print out results
history, outcome = one_game()
pprint(history)
pprint(outcome)        

[array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0]]),
 array([[ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [-1,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0,  0]]),
 array([[ 0,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0,  0],
       [-1,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0,  0]]),
 array([[ 0,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0,  0],
       [-1,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 1, -1,  0,  0,  0,  0]]),
 array([[ 0,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0,  0],
       [-1,  0,  0,  0,  0,  0],
      

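Note that we reshape each game board into a 7 by 6 array so that it's easy for you to see the positions of the game pieces. Each row of the array holds one column of the board, starting with the bottom cell; for example, an array whose last row begins with 1 has a red piece at the bottom of column 7.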
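Later in this chapter, boards are printed with `state.T[::-1]` (In [11]) so they look like a real Connect Four board. Here is a minimal sketch of that conversion, assuming, as the printed output suggests, that each row of the 7 by 6 array holds one board column with the bottom cell first: transposing turns board columns into array columns, and `[::-1]` flips the rows so the bottom of the board is printed last. The function name `to_display()` is hypothetical.

```python
import numpy as np

def to_display(board_7x6):
    """Convert a 7x6 array (one row per board column, bottom cell first)
    into the usual 6x7 picture with the bottom row printed last."""
    return np.asarray(board_7x6).T[::-1]

board = np.zeros((7, 6), dtype=int)
board[6, 0] = 1    # a red piece dropped into column 7
board[6, 1] = -1   # a yellow piece stacked on top of it
print(to_display(board))
```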

Now let's simulate 100,000 games and save the data.

In [3]:
# simulate the game 100000 times and record all games
results = []        
for x in range(100000):
    history, outcome = one_game()
    # Associate each board with the game outcome
    for board in history:
        results.append((outcome, board))    

Now let's save the data on your computer for later use.

In [4]:
import pickle
# save the simulation data on your computer
with open('files/ch11/games_conn100K.p', 'wb') as fp:
    pickle.dump(results,fp)
# read the data and print out the first 10 observations       
with open('files/ch11/games_conn100K.p', 'rb') as fp:
    games = pickle.load(fp)
pprint(games[:10])

[(1,
  array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0]])),
 (1,
  array([[ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 1, -1,  0,  0,  0,  0]])),
 (1,
  array([[ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 1, -1,  0,  0,  0,  0]])),
 (1,
  array([[ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0,  0],
       [-1,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 1, -1,  0,  0,  0,  0]])),
 (1,
  array([[ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
     

Each observation has two values. The first is the outcome: 1 if the red player wins, -1 if the yellow player wins, and 0 for a tie. The second is the game board position as a 7 by 6 NumPy array. The data seem to have been stored correctly.
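Before training, it can help to check how the three outcomes are distributed among the observations, because a very uneven split changes how you should read the accuracy reported during training. The helper `outcome_shares()` below is a hypothetical sketch that works on any list of `(outcome, board)` pairs, such as *games*.

```python
from collections import Counter

def outcome_shares(observations):
    """Return the fraction of observations labeled 1 (red wins), 0 (tie), -1 (yellow wins)."""
    counts = Counter(outcome for outcome, _ in observations)
    total = sum(counts.values())
    return {label: counts[label] / total for label in (1, 0, -1)}

# Toy example with placeholder boards (None) instead of NumPy arrays
toy = [(1, None), (1, None), (-1, None), (0, None)]
print(outcome_shares(toy))   # {1: 0.5, 0: 0.25, -1: 0.25}
```

On the real data, you would call `outcome_shares(games)`.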

We have the data we need. You’ll learn how to train the model next.

## 2. A Deep Neural Network for Connect Four

We'll use Keras to create a deep neural network to train game strategies in Connect Four. Compared with the neural network we used in Chapter 10 to train game strategies for Tic Tac Toe, only a few small changes are needed.

## 2.1. Create the Connect Four Model
The model we create below has one convolutional layer and several dense layers.

In [5]:
from random import choice
import pickle
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Dense, Conv2D, Flatten
from tensorflow.keras.models import Sequential
import numpy as np

model = Sequential()
model.add(Conv2D(filters=128, kernel_size=(4, 4), padding="same",
                 activation="relu", input_shape=(7, 6, 1)))
model.add(Flatten())
model.add(Dense(units=64, activation="relu"))
model.add(Dense(units=64, activation="relu"))
model.add(Dense(3, activation="softmax"))
model.compile(loss="categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])

We first use a convolutional layer with 128 filters. The kernel size is 4 by 4, which is different from the kernel size we used in Chapter 10. In Connect Four, you need four pieces in a row to win the game, so we use a 4 by 4 kernel to scan over the game board and identify spatial patterns. Every winning line (horizontal, vertical, or diagonal) fits inside a 4 by 4 window, so the filters can learn to identify four pieces in a row and associate them with the game outcome. Because we set padding="same", the output of this layer keeps the 7 by 6 grid, with 128 channels.
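To see why a 4 by 4 kernel is a natural choice, the sketch below hand-crafts ten 4 by 4 kernels (four vertical, four horizontal, and two diagonal placements of a line of four) and slides them over a 7 by 6 board, computing the same kind of cross-correlation that a `Conv2D` filter computes (without bias or activation). A response of 4 means four of the player's pieces in a row. This is only an illustration under the board layout used in this chapter; the 128 filters in the model are learned from data, not set by hand. `line_kernels()` and `detect_four()` are hypothetical names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def line_kernels():
    """Ten hand-made 4x4 kernels, one for each way a line of four can sit
    inside a 4x4 window. Axis 0 = board column, axis 1 = height."""
    kernels = []
    for i in range(4):
        k = np.zeros((4, 4), dtype=int)
        k[i, :] = 1                                  # vertical line (one board column)
        kernels.append(k)
        k = np.zeros((4, 4), dtype=int)
        k[:, i] = 1                                  # horizontal line (one board row)
        kernels.append(k)
    kernels.append(np.eye(4, dtype=int))             # diagonal
    kernels.append(np.fliplr(np.eye(4, dtype=int)))  # anti-diagonal
    return kernels

def detect_four(board, player):
    """Return True if any kernel responds with 4 for `player`'s pieces."""
    mine = (np.asarray(board) == player).astype(int)
    windows = sliding_window_view(mine, (4, 4))      # shape (4, 3, 4, 4) for a 7x6 board
    for k in line_kernels():
        # Cross-correlation at every window position, as in a "valid" convolution
        response = np.einsum("abij,ij->ab", windows, k)
        if (response == 4).any():
            return True
    return False
```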

We then flatten the output from the convolutional layer into a vector and feed it to two hidden dense layers with 64 neurons each. The output layer has three neurons, representing the three possible game outcomes: a win, a tie, or a loss. The softmax activation ensures that the probabilities add up to 100%.
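As a quick sanity check on the architecture, the arithmetic below counts the trainable parameters layer by layer, assuming the default `use_bias=True` in every layer. With padding="same", the convolutional output keeps the 7 by 6 grid, so flattening yields 7 * 6 * 128 values. These totals should match what `model.summary()` reports for the model above.

```python
def conv2d_params(in_channels, filters, kernel_h, kernel_w):
    # one kernel_h x kernel_w x in_channels weight tensor per filter, plus one bias per filter
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(n_in, n_out):
    # weight matrix plus one bias per output neuron
    return n_in * n_out + n_out

flat_size = 7 * 6 * 128  # "same" padding keeps the 7x6 grid; 128 channels -> 5,376
counts = {
    "Conv2D": conv2d_params(1, 128, 4, 4),          # 2,176
    "Dense 64 (1st)": dense_params(flat_size, 64),  # 344,128
    "Dense 64 (2nd)": dense_params(64, 64),         # 4,160
    "Dense 3 (output)": dense_params(64, 3),        # 195
}
for name, n in counts.items():
    print(f"{name:>16}: {n:,}")
print(f"{'total':>16}: {sum(counts.values()):,}")   # 350,659
```

Nearly all of the roughly 350,000 parameters sit in the first dense layer, which connects the 5,376 flattened values to 64 neurons.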

## 2.2. Train the Deep Learning Model for Connect Four
We'll train the deep neural network we created in the last section. We first preprocess the data so that we can feed them into the model.

The outcome data is a variable with three possible values: -1, 0, and 1. We'll convert them into one-hot variables so that the deep neural network can process them.

In [6]:
import tensorflow as tf

labels=[0,1,-1]
one_hot=tf.keras.utils.to_categorical(labels,3)
print(one_hot)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In the example above, we have three labels: 0, 1, and -1. They represent a tie, a win for the red player, and a loss for the red player (i.e., a win for the yellow player).

We can use the *to_categorical()* function in TensorFlow to change them into one-hot variables. The second argument of *to_categorical()*, 3, indicates the number of classes. This means each one-hot variable will be a vector with a length of 3, with value 1 in one position and 0 in all others.

A tie, which has an initial label of 0, now becomes a one-hot label: a 3-value vector [1, 0, 0]. The first value (i.e., index 0) is turned on as 1, and the other two values are turned off as 0. Similarly, a win for the red player, which originally has a label of 1, now becomes a one-hot label of [0, 1, 0]. The second value (i.e., index 1) is turned on as 1, and the rest are turned off as 0. By the same logic, a loss for the red player, with an original value of -1, is now represented by [0, 0, 1].
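The printed output above shows that -1 ends up in the last position. *to_categorical()* appears to get this result by using the labels as array indices, and NumPy reads index -1 as the last position. Below is a minimal NumPy-only sketch that makes the mapping explicit instead (a swapped-in technique, not the chapter's code): `np.mod(labels, 3)` maps 0, 1, and -1 to 0, 1, and 2, and those indices pick rows of a 3 by 3 identity matrix. The function name `one_hot_outcomes()` is hypothetical.

```python
import numpy as np

def one_hot_outcomes(labels):
    """Map outcomes 0 (tie), 1 (red wins), -1 (yellow wins) to one-hot rows."""
    indices = np.mod(np.asarray(labels), 3)   # 0 -> 0, 1 -> 1, -1 -> 2
    return np.eye(3, dtype="float32")[indices]

print(one_hot_outcomes([0, 1, -1]))
```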

Next, we load the simulated game data and convert it into Xs and ys so that we can feed them into the deep neural network:

In [7]:
with open('files/ch11/games_conn100K.p', 'rb') as fp:
    games = pickle.load(fp)

boards = []
outcomes = []
for game in games:
    # each observation is (outcome, board)
    boards.append(game[1])
    outcomes.append(game[0])

X = np.array(boards).reshape((-1, 7, 6, 1))
# one-hot encoding of the three outcomes: 0 (tie), 1 (red wins), -1 (yellow wins)
y = tf.keras.utils.to_categorical(outcomes, 3)

Next, we train the model for 100 epochs and save the model on the computer.

In [8]:
# Train the model for 100 epochs
model.fit(X, y, epochs=100, verbose=0)
model.save('files/ch11/trained_conn.h5')

It takes several hours to train the model since we have close to a million observations. The trained model is saved on your computer. 

## 3. Use the Trained Model to Play Connect Four
Next, we’ll use the trained model to play a game.

The first player will use the best move from the trained model. The second player will randomly select a move. 

## 3.1. Best Moves Based on the Trained Model
First, we'll define a *best_move_red()* function for the red player. The function goes over each possible move hypothetically and uses the trained deep neural network to predict the probabilities of the red player winning and of the yellow player winning. It returns the move with the largest difference between these two probabilities, that is, the move with the best chance of the red player winning relative to losing.

What the function does is as follows:
1.	Look at the current board.
2.	Look at all possible next moves, and add a move to the current board to form a hypothetical board.
3.	Use the pretrained model to predict the chances of the red player winning and losing with the hypothetical board.
4.	Choose the move that produces the highest predicted chance of winning relative to losing.
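Step 2 is the only part that touches the game rules; the functions below handle it by calling `deepcopy()` on the environment. Here is a hypothetical NumPy-only sketch of the same idea, assuming the chapter's 7 by 6 layout (one row per board column, bottom cell first) and columns numbered 1 to 7 as in the printed games in Section 3.2: each candidate board is built on a copy, so the current board is never modified. The name `hypothetical_boards()` is illustrative.

```python
import numpy as np

def hypothetical_boards(board, player):
    """Return {column (1-7): board after `player` drops a piece there}."""
    candidates = {}
    for col in range(board.shape[0]):
        if board[col, -1] != 0:          # top cell taken: column is full
            continue
        new_board = board.copy()         # never modify the current board
        row = int(np.argmax(new_board[col] == 0))  # lowest empty cell
        new_board[col, row] = player
        candidates[col + 1] = new_board
    return candidates

board = np.zeros((7, 6), dtype=int)
board[3, :] = [1, -1, 1, -1, 1, -1]     # column 4 is full
options = hypothetical_boards(board, 1)
print(sorted(options))                  # [1, 2, 3, 5, 6, 7]
```

Each of these boards would then be reshaped to `(-1, 7, 6, 1)` and scored by the model.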

In [9]:
from copy import deepcopy

def best_move_red(env):
    # if there is only one valid move, take it
    if len(env.validinputs) == 1:
        return env.validinputs[0]
    # the score below lies in [-1, 1], so -2 is a safe starting value
    bestoutcome = -2
    bestmove = None
    # go through all possible moves hypothetically
    for move in env.validinputs:
        # copy the environment so the real game is not changed
        env_copy = deepcopy(env)
        state, reward, done, info = env_copy.step(move)
        state = state.reshape(-1, 7, 6, 1)
        # `reload` is the trained model loaded in In [11]
        prediction = reload.predict(state, verbose=0)
        # output is prob(red wins) - prob(yellow wins)
        win_lose_dif = prediction[0][1] - prediction[0][2]
        if win_lose_dif > bestoutcome:
            # Update the bestoutcome
            bestoutcome = win_lose_dif
            # Update the best move
            bestmove = move
    return bestmove

Similarly, we'll define a *best_move_yellow()* function for the yellow player. The function goes over each possible move hypothetically, uses the trained deep neural network to predict the probability of the yellow player winning minus that of the red player winning, and returns the move with the highest value.

In [10]:
from copy import deepcopy

def best_move_yellow(env):
    # if there is only one valid move, take it
    if len(env.validinputs) == 1:
        return env.validinputs[0]
    # the score below lies in [-1, 1], so -2 is a safe starting value
    bestoutcome = -2
    bestmove = None
    # go through all possible moves hypothetically
    for move in env.validinputs:
        # copy the environment so the real game is not changed
        env_copy = deepcopy(env)
        state, reward, done, info = env_copy.step(move)
        state = state.reshape(-1, 7, 6, 1)
        # `reload` is the trained model loaded in In [11]
        prediction = reload.predict(state, verbose=0)
        # output is prob(yellow wins) - prob(red wins)
        win_lose_dif = prediction[0][2] - prediction[0][1]
        if win_lose_dif > bestoutcome:
            # Update the bestoutcome
            bestoutcome = win_lose_dif
            # Update the best move
            bestmove = move
    return bestmove
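*best_move_red()* and *best_move_yellow()* differ only in the sign of the score. Below is a hypothetical sketch that factors the scoring out into one function, assuming the model's output order [P(tie), P(red wins), P(yellow wins)] from Section 2.2. A dictionary of made-up predictions stands in for `reload.predict()`, and `pick_move()` is an illustrative name.

```python
def pick_move(predictions, player):
    """predictions: {move: [p_tie, p_red_wins, p_yellow_wins]}.
    Returns the move maximizing P(player wins) - P(opponent wins)."""
    sign = 1 if player == "red" else -1
    scores = {move: sign * (p[1] - p[2]) for move, p in predictions.items()}
    # max() keeps the first move with the top score, like the strict ">" above
    return max(scores, key=scores.get)

fake_predictions = {
    1: [0.2, 0.5, 0.3],
    2: [0.1, 0.7, 0.2],
    3: [0.1, 0.2, 0.7],
}
print(pick_move(fake_predictions, "red"))     # 2
print(pick_move(fake_predictions, "yellow"))  # 3
```

Keeping the scoring separate from the environment copying also makes it easy to test the move-selection rule without loading the model.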

## 3.2. Play A Game with the Trained Model
Now let's use the *best_move_red()* function to choose moves for the red player and play a game against a player who makes random moves.

In [11]:
from utils.conn_simple_env import conn
import time
import random
from copy import deepcopy
import numpy as np
import tensorflow as tf

reload=tf.keras.models.load_model('files/ch11/trained_conn.h5')


# Initiate the game environment
env=conn()
state=env.reset()   
while True:
    # Use the best_move_red() function to select the next move
    action = best_move_red(env)
    print(f"Player red has chosen action={action}")    
    state, reward, done, info = env.step(action)
    print(f"the current state is \n{state.T[::-1]}")
    if done:
        if reward==1:
            print(f"Player red has won!") 
        else:
            print(f"It's a tie!") 
        break   
    action = random.choice(env.validinputs)
    print(f"Player yellow has chosen action={action}")    
    state, reward, done, info = env.step(action)
    print(f"the current state is \n{state.T[::-1]}")
    if done:
        if reward==-1:
            print(f"Player yellow has won!") 
        else:
            print(f"It's a tie!") 
        break     

Player red has chosen action=4
the current state is 
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 1 0 0 0]]
Player yellow has chosen action=7
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  1  0  0 -1]]
Player red has chosen action=3
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  1  1  0  0 -1]]
Player yellow has chosen action=3
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0 -1  0  0  0  0]
 [ 0  0  1  1  0  0 -1]]
Player red has chosen action=5
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0 -1  0  0  0  0]
 [ 0  0  1  1  1  0 -1]]
Player yellow has chosen action=5
the current 

The red player has connected four pieces horizontally and won the game.

## 4. Test the Efficacy of the DNN Model
Next, we’ll test how often the DNN-trained game strategy wins, first against a player who makes random moves and then against the think-three-steps-ahead AI agent.

## 4.1. Deep Learning versus Random Moves
We'll see how the deep learning game strategy fares against a random-move agent. We simulate 100 games. If the deep learning agent wins, we record an outcome of 1; if it loses, we record -1; and if the game is tied, we record 0.

In [12]:
results=[]
for i in range(100):
    state=env.reset() 
    if i%2==0:
        action=random.choice(env.validinputs)
        state, reward, done, info = env.step(action)
    while True:
        if env.turn=="red":
            action = best_move_red(env) 
        else:
            action = best_move_yellow(env)    
        state, reward, done, info = env.step(action)
        if done:
            # result is 1 if the DL agent wins
            if reward!=0:
                results.append(1) 
            else:
                results.append(0)    
            break  
        action = random.choice(env.validinputs)   
        state, reward, done, info = env.step(action)
        if done:
            # result is -1 if the DL agent loses
            if reward!=0:
                results.append(-1) 
            else:
                results.append(0)    
            break

In the 50 games with an even index i, the random-move agent moves first; in the remaining 50 games, the deep learning agent goes first. This way, neither player has a first-mover advantage overall. We first create an empty list *results*. Whenever the deep learning agent wins, we append a value of 1 to the list; when it loses, we append -1; and when the game is tied, we append 0.

Next, we count how many times the deep learning agent has won:

In [13]:
# count how many times the deep learning agent won
wins = results.count(1)
print(f"the deep learning agent has won {wins} games")
# count how many times the deep learning agent lost
losses = results.count(-1)
print(f"the deep learning agent has lost {losses} games")
# count how many times the game was tied
ties = results.count(0)
print(f"the game has tied {ties} times")

the deep learning agent has won 100 games
the deep learning agent has lost 0 games
the game has tied 0 times


The deep learning agent wins all 100 games against the random-move agent, so the deep learning game strategy is far stronger than random play.

## 4.2. Deep Learning versus Think-Three-Steps-Ahead AI
Next, we see how the deep learning agent fares against the think-three-steps-ahead AI agent that we developed in Chapter 5.

In [14]:
from utils.ch05util import conn_think3
# Initiate the game environment
env=conn()
results=[]
for i in range(100):
    state=env.reset() 
    if i%2==0:
        action=conn_think3(env)
        state, reward, done, info = env.step(action)
    while True:
        if env.turn=="red":
            action = best_move_red(env) 
        else:
            action = best_move_yellow(env)    
        state, reward, done, info = env.step(action)
        if done:
            # result is 1 if the deep learning agent wins
            if reward!=0:
                results.append(1) 
            else:
                results.append(0)    
            break  
        action = conn_think3(env)
        state, reward, done, info = env.step(action)
        if done:
            # result is -1 if the deep learning agent loses
            if reward!=0:
                results.append(-1) 
            else:
                results.append(0)    
            break
        


We test 100 games: in 50 of them, the think-three-steps-ahead agent goes first, and in the other 50, the deep learning agent moves first. We record the game outcomes in the list *results*: 1 if the deep learning agent wins, -1 if it loses, and 0 if the game is tied.

Next, we check how many times the deep learning agent has won:

In [15]:
# count how many times the deep learning agent won
wins = results.count(1)
print(f"the deep learning agent has won {wins} games")
# count how many times the deep learning agent lost
losses = results.count(-1)
print(f"the deep learning agent has lost {losses} games")
# count how many times the game was tied
ties = results.count(0)
print(f"the game has tied {ties} times")

the deep learning agent has won 84 games
the deep learning agent has lost 16 games
the game has tied 0 times


Results show that the deep learning agent won 84 games and lost 16 games out of 100. So the deep learning game strategy works well and appears to be stronger than the think-three-steps-ahead agent.
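One hundred games is a small sample, so it can help to attach an uncertainty range to a win rate. The sketch below computes a Wilson score interval (an approximate 95% interval for a proportion). It is an addition for illustration, not part of the chapter's test procedure, and it assumes each game is an independent trial; `wilson_interval()` is a hypothetical name.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win rate."""
    p = wins / games
    denom = 1 + z**2 / games
    center = (p + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

for wins in (100, 84):
    low, high = wilson_interval(wins, 100)
    print(f"{wins}/100 wins: roughly {low:.2f} to {high:.2f}")
```

For the results above, this gives roughly 0.96 to 1.00 against random moves and roughly 0.76 to 0.90 against the think-three-steps-ahead agent, so even the lower end of the second range is well above 50%.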