# Chapter 12: Deep Learning Game Strategies: Connect Four

In this chapter, you’ll combine what you have learned in Chapters 5 to 10 to design deep learning game strategies for the Connect Four game. You'll first create a game environment for Connect Four with all the features and methods of a typical OpenAI Gym game environment. The game environment also has a graphical interface. 

You'll use simulated games as input data to train a deep neural network. After the model is trained, you'll use it to play games. At each step of the game, you'll look at all possible next moves. The model predicts the probability of winning the game with each hypothetical move. You'll pick the move with the highest probability of winning the game.

Finally, you'll animate the decision-making process. You'll use the deep learning game strategy to play a full game. At each step, the animation will show the game board on the left, and the probability of winning for each next move on the right. The best move will be highlighted, as follows:
<img src="https://gattonweb.uky.edu/faculty/lium/ml/conn_DL_steps.gif" />

***
$\mathbf{\text{Create a subfolder for files in Chapter 12}}$<br>
***
We'll put all files in Chapter 12 in a subfolder /files/ch12. Run the code in the cell below to create the subfolder.

***

In [1]:
import os

os.makedirs("files/ch12", exist_ok=True)

## 1. Create the Connect Four Game Environment
We'll create a Connect Four game environment, using the ***turtle*** library to draw game boards. We’ll create all the features and methods that a typical OpenAI Gym environment has. 

### 1.1. Use A Python Class to Represent the Environment
We’ll create a Python class to represent the Connect Four game environment. The class will have various attributes, variables, and methods to replicate those in a typical OpenAI Gym game environment. 

#### Attributes
Specifically, our self-made Connect Four game environment will have the following attributes:
 
*	action_space: an attribute that provides the space of all actions that can be taken by the agent. The action space will have seven values, 1 to 7. This represents the 7 columns a player can drop discs in.
*	observation_space: an attribute that describes the shape of the states in the environment. We'll use a numpy array of shape (7, 6) to represent the 42 cells on a game board: the first index is the board column, and the second index is the row, counted from the bottom.
*	state: an attribute indicating which state the agent is currently in. Each of the 42 cells can take values -1 (occupied by player Yellow), 0 (empty), or 1 (occupied by player Red).
*	action: an attribute indicating the action taken by the agent. The action is a number between 1 and 7.
*	reward: an attribute indicating the reward that results from the action taken by the agent. The reward is 0 in each step, unless a player has won the game, in which case the reward is 1 if the red player has won and -1 if the yellow player has won. 
*	done: an attribute indicating whether the game has ended. This happens when one player wins or if the game is tied.
*	info: an attribute that provides information about the game. We'll set it as an empty string "". 
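
To make the *state* attribute concrete, here is a minimal sketch of the encoding. It assumes the layout used in the rest of this chapter, where the first index of the array is the board column and the second index is the row counted from the bottom. The variable names are for illustration only and are not part of the environment.

```python
import numpy as np

# 7 columns x 6 rows, all empty (0), mirroring the layout of the state
state = np.zeros((7, 6), dtype=int)

# Red (1) drops a disc in column 4; it lands in the bottom row (index 0)
state[4 - 1][0] = 1
# Yellow (-1) drops a disc in column 4 too; it lands on top of red's disc
state[4 - 1][1] = -1

# Transpose and flip so the printout looks like the physical board:
# 6 printed rows, 7 printed columns, with the bottom row printed last
board_view = state.T[::-1]
print(board_view)
```

The same `.T[::-1]` transformation is what the scripts later in this chapter use when they print the current state.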

#### Methods
Our self-made Connect Four game environment will have a few methods as well:
 
*	reset() is a method to set the game environment to the initial (that is, the starting) state. All cells on the board will be empty.
*	render() is a method showing the current state of the environment graphically.
*	step() is a method that returns the new state, the reward, the value of the *done* variable, and the variable *info* based on the action taken by the agent.
*	sample() is a method of the action space that randomly chooses an action from the action space.
*	close() is a method to end the game environment.
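
These methods are meant to be used in the usual Gym-style loop: reset once, then call step() until *done* is True. The sketch below shows that loop with a tiny stand-in environment, *ToyEnv*, which is hypothetical and is not the Connect Four class. It only mimics the four values that step() returns.

```python
import random

class ToyEnv:
    """A hypothetical stand-in environment, not the Connect Four class."""

    def reset(self):
        # Start from an initial state and return it
        self.state = 0
        return self.state

    def step(self, action):
        # Apply the action, then report state, reward, done, and info
        self.state += action
        done = self.state >= 10
        reward = 1 if done else 0
        return self.state, reward, done, ""

def run_episode(env, seed=0):
    rng = random.Random(seed)
    state = env.reset()
    steps = 0
    while True:
        action = rng.choice([1, 2, 3])
        state, reward, done, info = env.step(action)
        steps += 1
        if done:
            return state, reward, steps

final_state, final_reward, n_steps = run_episode(ToyEnv())
print(final_state, final_reward, n_steps)
```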

### 1.2. Create A Local Module for the Connect Four Game
We'll create a local module for the Connect Four game and place it inside the local package for this book: the package ***utils*** that we have created in Chapter 10.

Now let's code a self-made Connect Four game environment using a Python class. Save the script in the cell below as *Connect4_env.py* in the folder *utils* you created in Chapter 10. Alternatively, you can download it from my GitHub repository. 

In [None]:
import turtle as t
from random import choice
import numpy as np
import time

# Define an action_space helper class
class action_space:
    def __init__(self, n):
        self.n = n
    def sample(self):
        num = np.random.choice(range(self.n))
        # convert 0-6 to 1-7 to avoid confusion 
        action = 1+num
        return action
    
# Define an observation_space helper class    
class observation_space:
    def __init__(self, row, col):
        self.shape = (row, col)

class conn():
  
    def __init__(self): 
        # use the helper action_space class
        self.action_space=action_space(7)
        # use the helper observation_space class
        self.observation_space=observation_space(7,6)
        self.info=""   
        # The x-coordinates of the center of the 7 columns
        self.xs = [-300,-200,-100,0,100,200,300]
        # The y-coordinates of the center of the 6 rows
        self.ys = [-250,-150,-50,50,150,250]  
        self.showboard=False  
        self.game_piece=None 
            
    def reset(self):  
        # The red player moves first
        self.turn = "red"
        # Create a list of valid moves
        self.validinputs = [1,2,3,4,5,6,7]
        # Create a list of lists to track game pieces
        self.occupied = [list(),list(),list(),list(),list(),list(),list()]
        # Tracking the state
        self.state=np.array([[0,0,0,0,0,0],
                            [0,0,0,0,0,0],
                            [0,0,0,0,0,0],
                            [0,0,0,0,0,0],
                            [0,0,0,0,0,0],
                            [0,0,0,0,0,0],
                            [0,0,0,0,0,0]])
        self.done=False
        self.reward=0     
        return self.state        
        
    # step() function: place piece on board and update state
    def step(self, inp):
        # Remember the current game piece
        self.game_piece=[inp-1, len(self.occupied[inp-1]), self.turn]        
        # update the state: red is 1 and yellow is -1
        self.state[inp-1][len(self.occupied[inp-1])]=2*(self.turn=="red")-1       
        # Add the move to the occupied list 
        self.occupied[inp-1].append(self.turn)

        # Update the list of valid moves
        if len(self.occupied[inp-1]) == 6 and inp in self.validinputs:
            self.validinputs.remove(inp)  
        # check if the player has won the game
        if self.win_game(inp) == True:
            self.done=True
            # reward is 1 if red won; -1 if yellow won
            self.reward=2*(self.turn=="red")-1
            self.validinputs=[]
        # If all cells are occupied and no winner, it's a tie
        elif len(self.validinputs) == 0:
            self.done=True
            self.reward=0
        else:
            # Give the turn to the other player
            if self.turn == "red":
                self.turn = "yellow"
            else:
                self.turn = "red"             
        return self.state, self.reward, self.done, self.info
                     
    # Determine if a player has won the game
    # Define a horizontal4() function to check connecting 4 horizontally
    def horizontal4(self, x, y):
        win = False
        for dif in (-3, -2, -1, 0):
            try:
                if self.occupied[x+dif][y] == self.turn\
                and self.occupied[x+dif+1][y] == self.turn\
                and self.occupied[x+dif+2][y] == self.turn\
                and self.occupied[x+dif+3][y] == self.turn\
                and  x+dif >= 0:
                    win = True            
            except IndexError:
                pass
        return win         
    # Define a vertical4() function to check connecting 4 vertically
    def vertical4(self, x, y):
        win = False
        try:
            if self.occupied[x][y] == self.turn\
            and self.occupied[x][y-1] == self.turn\
            and self.occupied[x][y-2] == self.turn\
            and self.occupied[x][y-3] == self.turn\
            and y-3 >= 0:
                win = True     
        except IndexError:
            pass
        return win   
    # Define a forward4() function to check connecting 4 diagonally in / shape
    def forward4(self, x, y):
        win = False
        for dif in (-3, -2, -1, 0):
            try:
                if self.occupied[x+dif][y+dif] == self.turn\
                and self.occupied[x+dif+1][y+dif+1] == self.turn\
                and self.occupied[x+dif+2][y+dif+2] == self.turn\
                and self.occupied[x+dif+3][y+dif+3] == self.turn\
                and x+dif >=  0 and y+dif >= 0:
                    win = True            
            except IndexError:
                pass
        return win     
    # Define a back4() function to check connecting 4 diagonally in \ shape
    def back4(self, x, y):
        win = False
        for dif in (-3, -2, -1, 0):
            try:
                if self.occupied[x+dif][y-dif] == self.turn\
                and self.occupied[x+dif+1][y-dif-1] == self.turn\
                and self.occupied[x+dif+2][y-dif-2] == self.turn\
                and self.occupied[x+dif+3][y-dif-3] == self.turn\
                and x+dif >=  0 and y-dif-3 >= 0:
                    win = True            
            except IndexError:
                pass
        return win         
    
    # Define a win_game() function to check if someone wins the game
    def win_game(self, inp):
        win = False
        x = inp-1
        y = len(self.occupied[inp-1])-1
        # Check all winning possibilities
        if self.vertical4(x,y)==True:
            win = True
        if self.horizontal4(x,y)==True:
            win = True
        if self.forward4(x,y)==True:
            win = True
        if self.back4(x,y)==True:
            win = True
        return win

    def display_board(self):
        # Set up the screen
        try:
            t.setup(730,680, 10, 70)
        except:
            t.setup(730,680, 10, 70)
        t.hideturtle()
        t.tracer(False)
        t.title("Connect Four in Turtle Graphics")
        # Draw frame
        t.pensize(5)
        t.up()
        t.goto(-350,-300)
        t.down()
        t.begin_fill()
        t.color("black", "blue")
        t.forward(700)
        t.left(90)
        t.forward(600)
        t.left(90)
        t.forward(700)
        t.left(90)
        t.forward(600)
        t.left(90)
        t.end_fill()
        t.up()
        # Write column numbers on the board
        colnum = 1
        for x in range(-300, 350, 100):
            t.goto(x,300)
            t.write(colnum,font = ('Arial',20,'normal'))
            t.goto(x,-330)
            t.write(colnum,font = ('Arial',20,'normal'))
            colnum +=  1          
        # Show white cells
        for col in range(7):
            for row in range(6):
                t.up()
                t.goto(self.xs[col],self.ys[row])
                t.dot(80,"white") 
        t.update()                  
        # Create a second turtle to show disc falling
        self.fall = t.Turtle()
        self.fall.up()
        self.fall.hideturtle()
        
    def render(self):
        if self.showboard==False:
            self.display_board()
            self.showboard=True

        if self.game_piece is not None:
            # Show the disc fall from the top
            col, row, c = self.game_piece
            if row<7:
                for i in range(6,row+1,-1):
                    self.fall.goto(self.xs[col],self.ys[i-1])
                    self.fall.dot(80,c)
                    t.update()
                    time.sleep(0.05)
                    self.fall.clear()
            # Go to the cell and place a dot of the player's color
            t.up()
            t.goto(self.xs[col],self.ys[row])
            t.dot(80,c)            
            t.update() 
        
    def close(self):
        time.sleep(1)
        try:
            t.bye()
        except t.Terminator:
            print('exit turtle')

If you run the above cell, nothing will happen. The class simply creates a game environment. We need to initiate the game environment and start playing using Python programs, just as you do with an OpenAI Gym game environment. We'll do that in the next subsection.

### 1.3. Verify the Custom-Made Game Environment
Next, we'll check the attributes and methods of the self-made game environment and make sure it has all the elements that are provided by a typical OpenAI Gym game environment. 

First we'll initiate the game environment and show the game board.

In [33]:
from utils.Connect4_env import conn

env = conn()
env.reset()                    
env.render()

You should see a separate turtle window, with a game board as follows: 
<img src="https://gattonweb.uky.edu/faculty/lium/ml/conn_start.png" />

If you want to close the game board window, use the *close()* method, like so:

In [39]:
env.close()

Next, we'll check the attributes of the environment such as the observation space and action space. 

In [35]:
env=conn()
# check the action space
number_actions = env.action_space.n
print("the number of possible actions are", number_actions)
# sample the action space ten times
print("the following are ten sample actions")
for i in range(10):
   print(env.action_space.sample())
# check the shape of the observation space
print("the shape of the observation space is", env.observation_space.shape)

the number of possible actions are 7
the following are ten sample actions
7
1
4
1
7
2
4
6
4
6
the shape of the observation space is (7, 6)


The meanings of the actions in this game are as follows:
* 1: Placing a game piece in column 1
* 2: Placing a game piece in column 2
* ...
* 7: Placing a game piece in column 7


The state is a matrix that represents a board with 7 columns and 6 rows. Each cell takes one of three values: 
* 0 means the cell is empty; 
* -1 means the cell is occupied by the yellow player; 
* 1 means the cell is occupied by the red player.
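
Because the state matrix fully describes the board, it can also serve as an independent cross-check of the environment's win detection. The sketch below is one way to do that. The helper *has_four()* is hypothetical and is not part of *Connect4_env.py*; it assumes the state is indexed as state[column][row], with row 0 at the bottom.

```python
import numpy as np

def has_four(state, player):
    """Return True if `player` (1 or -1) has four in a row in `state`."""
    cols, rows = state.shape
    # (dc, dr) steps: horizontal, vertical, "/" diagonal, "\" diagonal
    directions = [(1, 0), (0, 1), (1, 1), (1, -1)]
    for c in range(cols):
        for r in range(rows):
            for dc, dr in directions:
                end_c, end_r = c + 3 * dc, r + 3 * dr
                # Skip lines of four that would run off the board
                if not (0 <= end_c < cols and 0 <= end_r < rows):
                    continue
                if all(state[c + k * dc][r + k * dr] == player
                       for k in range(4)):
                    return True
    return False

# Red occupies a "/" diagonal from the bottom-left corner
state = np.zeros((7, 6), dtype=int)
for col in range(4):
    state[col][col] = 1

print(has_four(state, 1), has_four(state, -1))
```

Unlike the methods in the class, this sketch scans the whole board rather than only the lines through the last disc, so it is slower but useful for testing.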

## 2. Play Games in the Connect Four Environment
Next, we'll play games in the custom-made environment. 

### 2.1. Play a full game

Here we'll play a full game, by randomly choosing an action from the action space each step.

In [None]:
import time
import random
import numpy as np
from utils.Connect4_env import conn

# Initiate the game environment
env = conn()
state=env.reset()   
env.render()
# Play a full game with random moves
while True:
    print(f"the current state is \n{np.array(state).T[::-1]}")    
    action = random.choice(env.validinputs)
    time.sleep(1)
    print(f"Player red has chosen action={action}")    
    new_state, reward, done, info = env.step(action)
    env.render()
    if done:
        print(f"the current state is \n{np.array(new_state).T[::-1]}")
        if reward==1:
            print(f"Player red has won!") 
        else:
            print(f"It's a tie!") 
        break
    print(f"the current state is \n{np.array(new_state).T[::-1]}")    
    action = random.choice(env.validinputs)
    time.sleep(1)
    print(f"Player yellow has chosen action={action}")    
    new_new_state, reward, done, info = env.step(action)
    env.render()
    if done:
        print(f"the current state is \n{np.array(new_new_state).T[::-1]}")
        if reward==-1:
            print(f"Player yellow has won!") 
        else:
            print(f"It's a tie!") 
        break
    else: 
        # play next round
        state=new_new_state
    
env.close()      

Note that the outcome is different each time you run it because the actions are randomly chosen.
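
If you want a run that you can repeat exactly, for example while debugging, one option is to seed the random number generator. The sketch below illustrates the idea with a separate generator object; the function name *sample_moves()* and the seed value 42 are arbitrary choices for illustration.

```python
import random

valid_moves = [1, 2, 3, 4, 5, 6, 7]

def sample_moves(seed, n=5):
    # A generator created with the same seed yields the same choices
    rng = random.Random(seed)
    return [rng.choice(valid_moves) for _ in range(n)]

first = sample_moves(seed=42)
second = sample_moves(seed=42)
print(first == second)
```

In the script above, calling `random.seed(42)` once before the loop should have the same effect on `random.choice()`.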

### 2.2. Play the Game Manually
Next, you’ll learn how to manually interact with the Connect Four game. You'll use the keyboard to enter a number between 1 and 7. The following lines of code show you how.

In [44]:
import time
import random
import numpy as np
from utils.Connect4_env import conn

# Initiate the game environment
env = conn()
state=env.reset()   
env.render()
print('enter a number between 1 and 7')
# Play a full game manually
while True:
    print(f"the current state is \n{np.array(state).T[::-1]}")    
    action = int(input("Player red, what's your move?"))
    time.sleep(1)
    print(f"Player red has chosen action={action}")    
    new_state, reward, done, info = env.step(action)
    env.render()
    if done:
        print(f"the current state is \n{np.array(new_state).T[::-1]}")
        if reward==1:
            print(f"Player red has won!") 
        else:
            print(f"It's a tie!") 
        break
    print(f"the current state is \n{np.array(new_state).T[::-1]}")    
    action = random.choice(env.validinputs)
    time.sleep(1)
    print(f"Player yellow has chosen action={action}")    
    new_new_state, reward, done, info = env.step(action)
    env.render()
    if done:
        print(f"the current state is \n{np.array(new_new_state).T[::-1]}")
        if reward==-1:
            print(f"Player yellow has won!") 
        else:
            print(f"It's a tie!") 
        break
    else: 
        # play next round
        state=new_new_state
    
env.close()      

enter a number between 1 and 7
the current state is 
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]]
Player red, what's your move?4
Player red has chosen action=4
the current state is 
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 1 0 0 0]]
Player yellow has chosen action=5
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  1 -1  0  0]]
Player red, what's your move?4
Player red has chosen action=4
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  1  0  0  0]
 [ 0  0  0  1 -1  0  0]]
Player yellow has chosen action=7
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  1  0  0  0]
 [ 0  0  0  1 -1  0 -1]]
Player red, what's your mo

I am the red player, and I have won by connecting four pieces vertically in column 4.
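
The script above trusts whatever you type. A reply that is not a number makes `int()` raise a ValueError, and a move into a full column indexes past the end of the state array. A minimal sketch of a more forgiving prompt follows. The helper *ask_move()* and its *read* parameter are hypothetical names introduced here so the function can be tried without a keyboard.

```python
def ask_move(valid_moves, read=input):
    """Keep asking until the reply is one of `valid_moves`."""
    while True:
        reply = read("Player red, what's your move?")
        try:
            move = int(reply)
        except ValueError:
            print("Please enter a whole number.")
            continue
        if move in valid_moves:
            return move
        print(f"Column {move} is not available; choose from {valid_moves}.")

# Demonstration with scripted replies instead of the keyboard
replies = iter(["x", "9", "4"])
move = ask_move([1, 2, 3, 4, 5, 6, 7], read=lambda prompt: next(replies))
print(move)
```

In the game loop, you could call `ask_move(env.validinputs)` in place of `int(input(...))`.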

## 3. Train the Deep Learning Game Strategy
In this section, you’ll learn how to use a deep neural network to train intelligent game strategies for Connect Four. In particular, you’ll use the convolutional neural network that you used in image classification to train the game strategy. By treating the game board as a two-dimensional image instead of a one-dimensional vector, you’ll greatly improve the intelligence of your game strategies.

You’ll learn how to prepare data to train the model; how to interpret the predictions from the model; how to use the prediction to play games; and how to check the efficacy of your strategies.

### 3.1. A Summary of the Deep Learning Game Strategy
Here is a summary of what we’ll do to train the game strategy:

1.	We’ll let two computer players automatically play a game with random moves, and record the whole game history. The game history will contain all the game board positions from the very first move to the very last move.
2.	We then associate each board position with a game outcome (win, tie, or lose). The game board position is similar to features X in our image classification problem, and the outcome is similar to labels y in our classification problem.
3.	We’ll simulate 100,000 games. Using the histories of the games and the corresponding outcomes as Xs and ys, we feed the data into a deep neural network. After the training is done, we have a trained model.
4.	We can now use the trained model to play a game. At each move of the game, we look at all possible next moves, and feed each hypothetical game board into the pretrained model. The model will tell us the probabilities of a win, a loss, and a tie.
5.	We select the move that the model predicts has the highest chance of winning.


### 3.2. Simulate Games
You’ll learn how to generate data to train the DNN. The logic is as follows: you’ll generate 100,000 games in which both players use random moves. You’ll then record the board positions of all intermediate steps and the eventual outcomes of each board position (win, lose, or tie). 

First, let's simulate one game. The code in the cell below accomplishes that.

In [None]:
from utils.Connect4_env import conn
import time
import random
import numpy as np
from pprint import pprint

# Initiate the game environment
env=conn()

# Define the one_game() function
def one_game():
    history = []
    state=env.reset()   
    while True:   
        action = random.choice(env.validinputs)  
        new_state, reward, done, info = env.step(action)
        history.append(np.array(new_state).reshape(7,6))
        if done:
            break
    return history, reward

# Simulate one game and print out results
history, outcome = one_game()
pprint(history)
pprint(outcome)        

Note here we have converted the game board to a 7 by 6 array so it's easy for you to see the positions of the game pieces. 

Now let's simulate 100,000 games and save the data.

In [49]:
# simulate the game 100000 times and record all games
results = []        
for x in range(100000):
    history, outcome = one_game()
    # Note here I associate each board with the game outcome
    for board in history:
        results.append((outcome, board))    


Now let's save the data on your computer for later use.

In [50]:
import pickle
# save the simulation data on your computer
with open('files/ch12/games_conn100K.p', 'wb') as fp:
    pickle.dump(results,fp)
# read the data and print out the first 10 observations       
with open('files/ch12/games_conn100K.p', 'rb') as fp:
    games = pickle.load(fp)
pprint(games[:10])

[(1,
  array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0]])),
 (1,
  array([[-1,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0,  0]])),
 (1,
  array([[-1,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0,  0]])),
 (1,
  array([[-1,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0],
       [-1,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0,  0]])),
 (1,
  array([[-1,  0,  0,  0,  0,  0],
       [ 1,  1,  0,  0,  0,  0],
     

Each observation has two values. The first is the outcome, in the form of -1, 0, or 1. The second is the game board position as a 7 by 6 numpy array. The data seem to have been stored correctly. 
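
Before training, it can be useful to check how the three outcomes are distributed in the data, since a heavily unbalanced data set affects how you read the accuracy. The sketch below counts outcomes with a small stand-in list, *toy_games*, which is made up for illustration; with the real data you would pass the *games* list instead.

```python
from collections import Counter
import numpy as np

# Hypothetical toy stand-in for the `games` list: (outcome, board) pairs
empty = np.zeros((7, 6), dtype=int)
toy_games = [(1, empty), (1, empty), (-1, empty), (0, empty), (1, empty)]

# Count how often each outcome appears as a label
counts = Counter(outcome for outcome, board in toy_games)
total = sum(counts.values())
for outcome in (1, 0, -1):
    print(outcome, counts[outcome], counts[outcome] / total)
```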

We now have the data we need. You’ll learn how to train the model next.

### 3.3. Train Your Connect Four Game Strategy Using A Deep Neural Network
The following neural network trains the game strategy using the data you just created.


In [None]:
from random import choice
import pickle
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Dense, Conv2D, Flatten
from tensorflow.keras.models import Sequential
import numpy as np
      
with open('files/ch12/games_conn100K.p', 'rb') as fp:
    games=pickle.load(fp)

boards = []
outcomes = []
for game in games:
    boards.append(game[1])
    outcomes.append(game[0])

X = np.array(boards).reshape((-1, 7, 6, 1))
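# one-hot encode the three outcomes: 0 (tie) -> index 0, 1 (red wins) -> index 1,
# and -1 (yellow wins) -> index 2 (-1 wraps around to the last index)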
y = to_categorical(outcomes, num_classes=3)

model = Sequential()
model.add(Conv2D(filters=128, kernel_size=(4, 4),padding="same", 
                 activation="relu", input_shape=(7,6,1)))
model.add(Flatten())
model.add(Dense(units=64, activation="relu"))
model.add(Dense(units=64, activation="relu"))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy',
                   optimizer='adam', 
                   metrics=['accuracy'])

# Train the model for 100 epochs
model.fit(X, y, epochs=100, verbose=1)
model.save('files/ch12/trained_conn100K.h5')

The model is now trained. Let's test how good it is.
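
One detail to keep in mind when reading the model's output is the order of the three classes: index 0 is a tie, index 1 is a red win, and index 2 is a yellow win. The label -1 ends up at index 2 only because it acts as a negative index during one-hot encoding. If you prefer not to depend on that behavior, the sketch below builds the same encoding explicitly with NumPy. The dictionary *class_index* is a name introduced here for illustration.

```python
import numpy as np

outcomes = np.array([1, -1, 0, 1])

# Explicit class indices: tie -> 0, red win -> 1, yellow win -> 2
class_index = {0: 0, 1: 1, -1: 2}
indices = np.array([class_index[int(o)] for o in outcomes])

# One-hot encode with plain NumPy (rows of the 3x3 identity matrix)
y = np.eye(3, dtype=int)[indices]
print(y)
```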

## 4. Use the Trained Model to Play Games
Next, we’ll use the trained model to play a game. 

The red player will use the best move from the trained model. The yellow player will randomly select a move. 

### 4.1. The Best Move Based on the Trained Deep Neural Network
First, we'll define a *best_move()* function for the red player. The function takes a board position as its first argument, and a list of possible next moves as its second argument. We also need the list *occupied* to calculate which row the falling piece will land in.

The function will go over each move hypothetically, and use the trained deep neural network to predict the probability of the red player winning the game. The function returns the move with the highest chance of winning.

In short, the computer does the following:
1.	Look at the current board.
2.	Look at all possible next moves, and add each move to the current board to form a hypothetical board.
3.	Use the pretrained model to predict the chance of winning with the hypothetical board.
4.	Choose the move that produces the highest chance of winning. 

In [56]:
def best_move(board, valids, occupied):
    # if there is only one valid move, take it
    if len(valids)==1:
        return valids[0]
    # Set the initial value of bestoutcome        
    bestoutcome = -2
    bestmove=None    
    #go through all possible moves hypothetically to predict outcome
    for col in valids:
        tooccupy=deepcopy(board)
        row = 1+len(occupied[col-1])
        tooccupy[col-1][row-1]=1
        prediction=reload.predict(np.array(tooccupy).reshape((-1, 7, 6, 1)),verbose=0)
        p_win=prediction[0][1]
        if p_win>bestoutcome:
            # Update the bestoutcome
            bestoutcome = p_win
            # Update the best move
            bestmove = col
    return bestmove

Now let's use the *best_move()* function to choose moves for the red player and play a game. The yellow player picks random moves.

In [66]:
from utils.Connect4_env import conn
import time
import random
from copy import deepcopy
import numpy as np
import tensorflow as tf

# You can either use your own trained model, or the one I provided in GitHub
#reload = tf.keras.models.load_model('files/ch12/trained_conn100K.h5')
reload = tf.keras.models.load_model('files/ch12/trained_conn_model_padding.h5')

# Initiate the game environment
env=conn()
state=env.reset()   
env.render()

print("enter a move in the form of 1 to 7")

# Play a full game: red uses the trained model, yellow moves randomly
while True:
    print(f"the current state is \n{np.array(state).T[::-1]}") 
    # Use the best_move() function to select the next move
    action = best_move(state, env.validinputs, env.occupied)
    print(f"Player red has chosen action={action}")    
    new_state, reward, done, info = env.step(action)
    env.render()
    if done:
        print(f"the current state is \n{np.array(new_state).T[::-1]}") 
        if reward==1:
            print(f"Player red has won!") 
        else:
            print(f"It's a tie!") 
        break
    print(f"the current state is \n{np.array(new_state).T[::-1]}")    
    action = random.choice(env.validinputs)
    print(f"Player yellow has chosen action={action}")    
    new_new_state, reward, done, info = env.step(action)
    env.render()
    if done:
        print(f"the current state is \n{np.array(new_new_state).T[::-1]}") 
        if reward==-1:
            print(f"Player yellow has won!") 
        else:
            print(f"It's a tie!") 
        break
    else: 
        # play next round
        state=new_new_state
    

enter a move in the form of 1 to 7
the current state is 
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]]
Player red has chosen action=4
the current state is 
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 1 0 0 0]]
Player yellow has chosen action=3
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0 -1  1  0  0  0]]
Player red has chosen action=4
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  1  0  0  0]
 [ 0  0 -1  1  0  0  0]]
Player yellow has chosen action=5
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  1  0  0  0]
 [ 0  0 -1  1 -1  0  0]]
Player red has chosen action=4
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0

In [65]:
env.close()

The computer player will look at each possible next move, and add that move to the current board to form a hypothetical board. We feed the hypothetical board into the model to make predictions. The prediction will have three values: the probability of a tie, the probability of the red player winning, and the probability of the yellow player winning. The computer will choose the move with the highest probability of the red player winning the game. 

Here is one example of the eventual outcome:

<img src="https://gattonweb.uky.edu/faculty/lium/ml/conn_win.png" /> 
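
Since each hypothetical board is scored independently, the boards can also be stacked and scored in a single call, which usually saves time when many games are played. The sketch below is one possible variant, not the function used in this chapter. The names *best_move_batched()*, *predict_fn*, and *fake_predict()* are hypothetical; *fake_predict()* is a made-up stand-in for the trained model that simply favors red discs in the center column.

```python
from copy import deepcopy
import numpy as np

def best_move_batched(board, valids, occupied, predict_fn):
    # Build one hypothetical board per valid move
    boards = []
    for col in valids:
        hypothetical = deepcopy(board)
        row = len(occupied[col - 1])
        hypothetical[col - 1][row] = 1
        boards.append(hypothetical)
    # Score all hypothetical boards in one call
    batch = np.array(boards).reshape((-1, 7, 6, 1))
    predictions = predict_fn(batch)
    # Column 1 holds the probability that red wins
    p_wins = predictions[:, 1]
    return valids[int(np.argmax(p_wins))]

def fake_predict(batch):
    # Stand-in for the model: more red discs in column 4 -> higher score
    red_in_center = (batch[:, 3, :, 0] == 1).sum(axis=1) / 6.0
    return np.stack([np.zeros_like(red_in_center),
                     red_in_center,
                     1.0 - red_in_center], axis=1)

board = np.zeros((7, 6), dtype=int)
occupied = [[] for _ in range(7)]
move = best_move_batched(board, [1, 2, 3, 4, 5, 6, 7], occupied, fake_predict)
print(move)
```

With the trained model loaded as *reload*, you could pass `lambda b: reload.predict(b, verbose=0)` as *predict_fn*.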

### 4.2. Test the Efficacy of the DNN Model
Next, we’ll test how often the DNN-trained game strategy wins against a player who makes random moves. 
The following script does that:

In [68]:
from utils.Connect4_env import conn
import time
import random
from copy import deepcopy
import numpy as np
import tensorflow as tf

#reload = tf.keras.models.load_model('files/ch12/trained_conn100K.h5')
reload = tf.keras.models.load_model('files/ch12/trained_conn_model_padding.h5')

# Initiate the game environment
env=conn()

def test_one_conn():
    state=env.reset()   
    while True:
        action = best_move(state, env.validinputs, env.occupied)  
        new_state, reward, done, info = env.step(action)
        if done:
            break 
        action = random.choice(env.validinputs)
        new_new_state, reward, done, info = env.step(action)
        if done:
            break
        else: 
            # play next round
            state=new_new_state
    return reward    

#repeat the game 1000 times and record all game outcomes
results=[]        
for x in range(1000):
    result=test_one_conn()
    results.append(result)    

#print out the number of winning games
print("the number of winning games is", results.count(1))

#print out the number of tying games
print("the number of tying games is", results.count(0))

#print out the number of losing games
print("the number of losing games is", results.count(-1))                 

the number of winning games is 1000
the number of tying games is 0
the number of losing games is 0


The trained model has won all 1000 games. 
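
A clean sweep of 1,000 games still leaves some uncertainty about the true win rate against a random player. As a rough worked calculation, and assuming the 1,000 games are independent, zero non-wins in n games is consistent at the 95% level with any non-win probability p satisfying (1 - p)^n >= 0.05. The sketch below solves for the largest such p; the variable names are for illustration only.

```python
n_games = 1000
confidence = 0.95

# Largest non-win probability p with (1 - p) ** n_games >= 1 - confidence
upper_bound = 1 - (1 - confidence) ** (1 / n_games)
print(f"{upper_bound:.4f}")
```

In other words, the result supports a win rate above roughly 99.7% against a random opponent, which says nothing yet about stronger opponents.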

## 5. Animate the Deep Learning Process
In this section, we'll create an animation to show how the agent makes a decision by getting the best move from the trained deep neural network at each step.

### 5.1. Print Out Probabilities of Winning for Each Next Move
In each stage of the game, we'll first draw the game board on the left of the screen. The red player will look at all possible next moves and use the trained deep neural network to predict the probability of winning for each hypothetical next move. We'll draw the probabilities on the right. Finally, we'll highlight the action with the highest probability of winning. This action is the red player's next move. We'll repeat this step by step until the game ends. 

This animation will let us look under the hood and understand how deep learning can help us design intelligent game strategies.
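
The highlighting step boils down to finding the column with the largest value in a dictionary of winning probabilities. Here is a minimal text-only sketch of that idea. The probabilities in *p_wins* are made-up numbers for illustration, not output from the trained model.

```python
# Hypothetical winning probabilities keyed by column number
p_wins = {1: 0.41, 2: 0.47, 3: 0.52, 4: 0.63, 5: 0.50, 6: 0.44, 7: 0.39}

# The best move is the column with the highest probability
best_col = max(p_wins, key=p_wins.get)

for col in sorted(p_wins):
    marker = "<== best" if col == best_col else ""
    print(f"column {col}: {p_wins[col]:.2f} {marker}")
```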

The next script will play a full game and record the game board and the winning probabilities in each step of the game.

In [3]:
from utils.Connect4_env import conn
import time
import random
from copy import deepcopy
import numpy as np
import tensorflow as tf
import turtle as t

reload = tf.keras.models.load_model('files/ch12/trained_conn_model_padding.h5')

# Initiate the game environment
env=conn()
state=env.reset()  
step=0
try:
    ts=t.getscreen() 
except t.Terminator:
    ts=t.getscreen()
t.hideturtle()
env.render()
ts.getcanvas().postscript(file=f"files/ch12/conn_step{step}.ps")

# Create a list to record game history
history=[]

def best_move(board, valids, occupied):
    # if there is only one valid move, take it; return an empty dictionary
    # as well so that the caller can always unpack (bestmove, p_wins)
    if len(valids)==1:
        return valids[0], {}
    # Set the initial value of bestoutcome        
    bestoutcome = -2
    bestmove=None 
    # record winning probabilities for all hypothetical moves
    p_wins={}    
    #go through all possible moves hypothetically to predict outcome
    for col in valids:
        tooccupy=deepcopy(board)
        row = 1+len(occupied[col-1])
        tooccupy[col-1][row-1]=1
        prediction=reload.predict(np.array(tooccupy).reshape((-1, 7, 6, 1)),verbose=0)
        p_win=prediction[0][1]
        p_wins[col]=p_win
        if p_win>bestoutcome:
            # Update the bestoutcome
            bestoutcome = p_win
            # Update the best move
            bestmove = col
    return bestmove, p_wins

# Play a full game 
while True:
    print(f"the current state is \n{np.array(state).T[::-1]}")  
    bestmove,p_wins=best_move(state, env.validinputs, env.occupied)
    action=bestmove
    print(f"Player red has chosen action={action}")  
    old_state=deepcopy(state)
    new_state, reward, done, info = env.step(action)
    history.append([old_state,p_wins,action,deepcopy(new_state),done])
    env.render()
    step += 1      
    ts.getcanvas().postscript(file=f"files/ch12/conn_step{step}.ps")
    if done:
        print(f"the current state is \n{np.array(new_state).T[::-1]}") 
        if reward==1:
            print(f"Player red has won!") 
        else:
            print(f"It's a tie!") 
        break
    print(f"the current state is \n{np.array(new_state).T[::-1]}")   
    action = random.choice(env.validinputs)
    print(f"Player yellow has chosen action={action}")    
    new_new_state, reward, done, info = env.step(action)
    env.render()
    step += 1      
    ts.getcanvas().postscript(file=f"files/ch12/conn_step{step}.ps")
    if done:
        print(f"the current state is \n{np.array(new_new_state).T[::-1]}") 
        if reward==-1:
            print(f"Player yellow has won!") 
        else:
            print(f"It's a tie!") 
        break
    else: 
        # play next round
        state=new_new_state
env.close()    

the current state is 
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]]
Player red has chosen action=4
the current state is 
[[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 1 0 0 0]]
Player yellow has chosen action=3
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0 -1  1  0  0  0]]
Player red has chosen action=4
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  1  0  0  0]
 [ 0  0 -1  1  0  0  0]]
Player yellow has chosen action=3
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0 -1  1  0  0  0]
 [ 0  0 -1  1  0  0  0]]
Player red has chosen action=4
the current state is 
[[ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0]
 [ 0  0  0  0  0  

In [4]:
p_wins_step0=history[0][1]
for key, value in p_wins_step0.items():
    print(f"If the red player chooses action {key}, the probability of winning is {value:.4f}.")

If the red player chooses action 1, the probability of winning is 0.3281.
If the red player chooses action 2, the probability of winning is 0.3718.
If the red player chooses action 3, the probability of winning is 0.4599.
If the red player chooses action 4, the probability of winning is 0.4793.
If the red player chooses action 5, the probability of winning is 0.4599.
If the red player chooses action 6, the probability of winning is 0.4495.
If the red player chooses action 7, the probability of winning is 0.4096.


The above results show that the probability of the red player winning the game is the highest, at 47.93%, if action 4 is taken. That's why the red player chooses column 4 in the first move. 

You can also print out the probability of the red player winning the game in later rounds, but I'll leave that for you to finish.
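
As a starting point for that exercise, here is one possible sketch. It assumes each entry of `history` has the layout used above: old state, winning probabilities, action, new state, and the done flag. The helper `summarize_round` and the list `demo_history` are hypothetical. The demo list holds only the first-round probabilities printed above, with the board states replaced by `None`, so that the example runs on its own.

```python
def summarize_round(round_number, p_wins, action):
    # one header line, then one line per column that was evaluated
    lines = [f"Round {round_number}: red chose column {action}"]
    for col in sorted(p_wins):
        marker = " <- highest" if p_wins[col] == max(p_wins.values()) else ""
        lines.append(f"  column {col}: Pr(win)={p_wins[col]:.4f}{marker}")
    return lines

# Stand-in for the real history list (first round only)
demo_history = [
    [None,
     {1: 0.3281, 2: 0.3718, 3: 0.4599, 4: 0.4793,
      5: 0.4599, 6: 0.4495, 7: 0.4096},
     4, None, False],
]

report = []
for i, (old_state, p_wins, action, new_state, done) in enumerate(demo_history, start=1):
    report.extend(summarize_round(i, p_wins, action))
print("\n".join(report))
```

To apply this to a real game, loop over `history` instead of `demo_history`.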

Let's save the game history data for later use. Run the code in the following cell:

In [8]:
import pickle

# save the game history on your computer
with open('files/ch12/conn_game_history.p','wb') as fp:
    pickle.dump((history, step),fp)
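
If you'd like to confirm that a tuple saved this way can be read back unchanged, the following sketch performs a round trip. It writes to a temporary folder rather than to files/ch12, and `demo_history` and `demo_step` are hypothetical placeholders for the real `history` and `step`.

```python
import os
import pickle
import tempfile

# Hypothetical placeholders for the real game history and step count
demo_history = [[None, {4: 0.4793}, 4, None, False]]
demo_step = 1

# Save the tuple to a temporary file
path = os.path.join(tempfile.mkdtemp(), "demo_history.p")
with open(path, "wb") as fp:
    pickle.dump((demo_history, demo_step), fp)

# Load it back and compare with the original objects
with open(path, "rb") as fp:
    loaded_history, loaded_step = pickle.load(fp)
print(loaded_history == demo_history, loaded_step == demo_step)
```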

### 5.2. Animate the Whole Game
Next, you'll combine the pictures created in the last subsection into an animation. As a result, you'll see the game board step by step for the whole game.

In [6]:
import imageio
from PIL import Image

frames=[]
for i in range(step+1):
    # Pillow relies on Ghostscript being installed to open .ps files
    im = Image.open(f"files/ch12/conn_step{i}.ps")
    frame=np.asarray(im)
    frames.append(frame) 
imageio.mimsave("files/ch12/conn_steps.gif", frames, fps=1) 

If you open the file conn_steps.gif, you'll see the following: 
<img src="https://gattonweb.uky.edu/faculty/lium/ml/conn_steps.gif"/>

The animation shows the game board at each stage of the game. 
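
To experiment with how frames become an animation without first generating the game pictures, you can try the sketch below. It is not the method used above: it builds a few synthetic frames as NumPy arrays and writes the GIF with Pillow's own `save` method instead of imageio. The helper `make_frames` and the file name demo_steps.gif are hypothetical.

```python
import os
import tempfile
import numpy as np
from PIL import Image

def make_frames(n, size=60):
    # each frame is a white square with a red band that grows wider
    frames = []
    for i in range(n):
        arr = np.full((size, size, 3), 255, dtype=np.uint8)
        arr[:, : (i + 1) * size // n] = (255, 0, 0)
        frames.append(Image.fromarray(arr))
    return frames

frames = make_frames(4)
gif_path = os.path.join(tempfile.mkdtemp(), "demo_steps.gif")
# duration is the display time per frame in milliseconds; loop=0 repeats forever
frames[0].save(gif_path, save_all=True, append_images=frames[1:],
               duration=1000, loop=0)

with Image.open(gif_path) as gif:
    n_saved = gif.n_frames
print(n_saved)
```

A duration of 1000 milliseconds per frame corresponds to the one frame per second used in the cell above.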

### 5.3. Animate the Decision Making
Next, we'll animate the decision making process of the red player in each stage of the game. We'll draw the probabilities of the red player winning the game for each hypothetical move. We'll highlight the move with the highest probability of winning the game. We'll animate this step by step until the game ends. 

In [9]:
import tensorflow as tf
from matplotlib import pyplot as plt
from matplotlib.patches import Rectangle
import pickle
import os
from PIL import Image
import numpy as np

# load up the history data
with open('files/ch12/conn_game_history.p', 'rb') as fp:
    history, num_step = pickle.load(fp)
# remember the best moves 
bests = []
for item in history:
    bests.append(item[2])
# Generate pictures
for stage in range(len(history)):
    fig = plt.figure(figsize=(10,6), dpi=200)
    ax = fig.add_subplot(111) 
    ax.set_xlim(0,10)
    ax.set_ylim(-2.5, 3.5)
    #plt.grid()
    plt.axis("off")
    plt.savefig(f"files/ch12/conn_stage{stage*2}step1.png") 
    xys = [[(4,-2.3),(2,0)],           
       [(4,-1.4),(2,0)],
       [(4,-0.5),(2,0)],
       [(4,0.4),(2,0)],
       [(4,1.3),(2,0)],
       [(4,2.2),(2,0)],
       [(4,3.1),(2,0)]]
    for xy in xys:
        ax.annotate("",xy=xy[0],xytext=xy[1],
        arrowprops=dict(arrowstyle = '->', color = 'g', linewidth = 2))  
    # add rectangle to plot
    ax.add_patch(Rectangle((0,-0.6), 2, 1.3,
                     facecolor = 'b',alpha=0.1))
    plt.text(0.2,-0.5,"Deep\nNeural\nNetwork",fontsize=20)        
    for m in range(7):
        plt.text(4.1, 3.1-0.9*m, f"Column {m+1}, \
        Pr(win)={history[stage][1].get(m+1,0):.4f}", fontsize=20, color="r")  
   

    plt.savefig(f"files/ch12/conn_stage{stage*2}step2.png") 
    
    # highlight the best action
    ax.add_patch(Rectangle((4,3.85-bests[stage]*0.9),
                           6, 0.5,facecolor = 'b',alpha=0.5))     
    plt.savefig(f"files/ch12/conn_stage{stage*2}step3.png")     
    plt.close(fig)

The above script highlights the decision-making process of the red player. For example, if you open the file conn_stage0step3.png, you'll see the following picture.
<img src="https://gattonweb.uky.edu/faculty/lium/ml/conn_stage0step3.png" /> It shows the probabilities of the red player winning the game with each hypothetical move. In particular, the probability is 47.93% if the red player chooses Column 4. The choice is highlighted in blue, and that is also the move made by the red player as a result. 
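
If you are curious why the blue rectangle lands on the right row, the short sketch below reproduces the arithmetic from the plotting script. The label for a column is drawn at height 3.1-0.9*(column-1), and the rectangle starts at 3.85-0.9*column with a height of 0.5. The function names `label_y` and `highlight_band` are hypothetical and serve only to check the numbers.

```python
def label_y(col):
    # vertical position of the text label for a 1-based column
    return 3.1 - 0.9 * (col - 1)

def highlight_band(col, height=0.5):
    # bottom and top edges of the highlight rectangle for that column
    bottom = 3.85 - col * 0.9
    return bottom, bottom + height

for col in range(1, 8):
    bottom, top = highlight_band(col)
    print(f"column {col}: label at {label_y(col):.2f}, band from {bottom:.2f} to {top:.2f}")

# the band should contain the label position for every column
covered = all(
    highlight_band(c)[0] <= label_y(c) <= highlight_band(c)[1]
    for c in range(1, 8)
)
print(covered)
```

The rectangle always starts 0.15 units below the label position, so the highlighted band wraps around the text of the chosen column.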

Next, we'll combine the pictures into an animation to show the decision-making process of the red player.

In [11]:
from PIL import Image
import imageio
import numpy as np

frames=[]

for stage in range(len(history)):
    for step in [1,2,3]:
        im = Image.open(f"files/ch12/conn_stage{stage*2}step{step}.png")
        f1=np.asarray(im)
        frames.append(f1)  
imageio.mimsave('files/ch12/conn_DL_probs.gif', frames, fps=2)

If you open the file conn_DL_probs.gif, you'll see the animation as follows.
<img src="https://gattonweb.uky.edu/faculty/lium/ml/conn_DL_probs.gif" /> Note that your results will likely be different from mine due to the random nature of the game.

### 5.4. Animate Game Board Positions and the Decision Making
Next, we'll combine the game board positions and the decision-making process of the red player in each stage of the game. On the left of the screen, we'll draw the game board. On the right of the screen, we'll draw the probabilities of the red player winning the game for each hypothetical move. We'll animate this step by step until the game ends. 
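
Placing two pictures side by side with NumPy only works when both arrays have the same height and the same number of color channels. The sketch below checks this for the figure sizes used in this chapter: a 6 by 6 inch board picture and a 10 by 6 inch probability picture, both at 200 dots per inch. The helper `frame_shape` is hypothetical, and the four color channels are an assumption about how the PNG files are stored.

```python
import numpy as np

def frame_shape(figsize, dpi, channels=4):
    # figsize is (width, height) in inches; image arrays are (rows, cols, channels)
    width, height = figsize
    return (int(height * dpi), int(width * dpi), channels)

# blank stand-ins for the board picture and the probability picture
board_img = np.zeros(frame_shape((6, 6), 200), dtype=np.uint8)
probs_img = np.ones(frame_shape((10, 6), 200), dtype=np.uint8)

# axis=1 joins the arrays left to right; the heights must match
combined = np.concatenate([board_img, probs_img], axis=1)
print(board_img.shape, probs_img.shape, combined.shape)
```

If you change the figure size or the dpi of one picture, adjust the other one as well so that the heights still match.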

In [12]:
for i in range(num_step+1):
    im = Image.open(f"files/ch12/conn_step{i}.ps")
    fig, ax=plt.subplots(figsize=(6,6), dpi=200)
    newax = fig.add_axes([0,0,1,1])
    newax.imshow(im)
    newax.axis('off')
    ax.set_xlim(-3,3)
    ax.set_ylim(-3,3)
    plt.axis("off")
    #plt.grid()
    plt.savefig(f"files/ch12/conn_step{i}plt.png")
    plt.close(fig)

frames=[]

for stage in range(len(history)):
    for step in [1,2,3]:
        im = Image.open(f"files/ch12/conn_step{stage*2}plt.png")
        f0=np.asarray(im)
        im = Image.open(f"files/ch12/conn_stage{stage*2}step{step}.png")
        f1=np.asarray(im)
        fs = np.concatenate([f0,f1],axis=1)
        frames.append(fs)
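        # step only takes the values 1, 2, and 3 here, so this check never
        # fires; comparing against 3 instead would hold the highlighted frame longer
        if step==0:
            frames.append(fs)            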
im = Image.open(f"files/ch12/conn_step{num_step}plt.png")
f0=np.asarray(im)
im = Image.open("files/ch12/conn_stage0step1.png")
f1=np.asarray(im)
fs = np.concatenate([f0,f1],axis=1)
frames.append(fs)
frames.append(fs)

imageio.mimsave('files/ch12/conn_DL_steps.gif', frames, fps=2)


If you open the gif file, you'll see the following animation:
<img src="https://gattonweb.uky.edu/faculty/lium/ml/conn_DL_steps.gif"/>