# Chapter 11: Deep Learning Game Strategies: Tic Tac Toe

In this chapter, you’ll combine what you have learned in Chapters 5 to 9 to design deep learning game strategies for the Tic Tac Toe game. You'll first create a game environment for Tic Tac Toe with all the features and methods of a typical OpenAI Gym game environment. The game environment also has a graphical interface. 

You'll use similated games as input data to feed into a deep neural network. After the model is trained, you'll use it to play games. At each step of the game, you'll look at all next moves. The model predicts the probability of winning the game with each hypothetical move. You'll pick the move with the highest probability of winning the game.

Finally, you'll animate the decision making process. You'll use the deep learning game strategy to play a full game. At each step, the animation will show the game board on the left, and the probability of winning for each next move on the right. The best move will be highlighted, as follows:
<img src="https://gattonweb.uky.edu/faculty/lium/ml/ttt_DL_steps.gif"/>

***
$\mathbf{\text{Create a subfolder for files in Chapter 11}}$<br>
***
We'll put all files in Chapter 11 in a subfolder /files/ch11. Run the code in the cell below to create the subfolder.

***

In [1]:
import os

os.makedirs("files/ch11", exist_ok=True)

## 1. Create the Tic Tac Toe Game Environment
We'll create a Tic Tac Toe game environment, using the ***turtle*** library to draw game boards. We’ll create all the features and methods that a typical OpenAI Gym environment has. 

### 1.1. Use A Python Class to Represent the Environment
We’ll create a Python class to represent the Tic Tac Toe game environment. The class will have various attributes, variables, and methods to replicate those in a typical OpenAI Gym game environment. 

#### Attributes
Specifically, our self-made Tic Tac Toe game environment will have the following attributes:
 
*	action_space: an attribute that provides the space of all actions that can be taken by the agent. The action space will have nine values, 1 to 9. We use 1 to 9 instead of 0 to 8 to avoid confusion.
*	observation_space: an attribute that provides the list of all possible states in the environment. We'll use a numpy array with 9 values to represent the nine cell on a game board.
*	state: an attribute indicating which state the agent is currently in. Each of the nine cells can take values -1 (occupied by player O), 0 (empty), or 1 (occupied by player X).
*	action: an attribute indicating the action taken by the agent. The action is a number between 1 and 9.
*	reward: is an attribute indicating the reward to the agent because of the action taken by the agent. The reward is 0 in each step, unless a player has won the game, in which case the winner has a reward of 1 and the lose a reward of -1. 
*	done: an attribute indicating whether the game has ended. This happens when one player wins or if the game is tied.
*	info: an attribute that provides information about the game. We'll set it as an empty string "". 

#### Methods
our self-made Tic Tac Toe game environment will have a few methods as well:
 
*	reset() is a method to set the game environment to the initial (that is, the starting) state. All cells on the board will be empty.
*	render() is a method showing the current state of the environment graphically.
*	step() is a method that returns the new state, the reward, the value of *done* variable, and the varibale *info* based on the action taken by the agent.
*	sample() is a method to randomly choose an action from all the action space.
*	close() is a method to end the game environment, including stop displaying the graph of the current state of the game board.

### 1.2. Create A Local Module for the Tic Tac Toe Game
We'll create a local module for the Tic Tac Toe game and place it inside the local package for this book: the package ***utils*** that we have created in Chapter 10.

Now let's code in a self-made Tic Tac Toe game environment using a Python class. Save the code in the cell below as *TicTacToe_env.py* in the folder *utils* you created in Chapter 10. Alternatively, you can download it from my GitHub repository. 

In [None]:
import turtle as t
from random import choice
import numpy as np
import time

# Define an action_space helper class
class action_space:
    def __init__(self, n):
        self.n = n
    def sample(self):
        num = np.random.choice(range(self.n))
        # covert to 1 to 9 in string format 
        action = str(1+num)
        return action
    
# Define an obervation_space helper class    
class observation_space:
    def __init__(self, n):
        self.shape = (n,)

class ttt():
    def __init__(self): 
        # use the helper action_space class
        self.action_space=action_space(9)
        # use the helper observation_space class
        self.observation_space=observation_space(9)
        self.info=""  
        self.showboard=False          
        # Create a dictionary to map cell number to coordinates
        self.cellcenter = {'1':(-200,-200), '2':(0,-200), '3':(200,-200),
                    '4':(-200,0), '5':(0,0), '6':(200,0),
                    '7':(-200,200), '8':(0,200), '9':(200,200)} 

    def reset(self):  
        # The X player moves first
        self.turn = "X"
        # Count how many rounds played
        self.rounds = 1
        # Create a list of valid moves
        self.validinputs = list(self.cellcenter.keys())
        # Create a dictionary of moves made by each player
        self.occupied = {"X":[],"O":[]}
        # Tracking the state
        self.state=np.array([0,0,0,0,0,0,0,0,0])
        self.done=False
        self.reward=0     
        return self.state        
        
    # step() function: place piece on board and update state
    def step(self, inp):
        # Add the move to the occupied list 
        self.occupied[self.turn].append(inp)
        # update the state: X is 1 and O is -1
        self.state[int(inp)-1]=2*(self.turn=="X")-1
        # Disallow the move in future rounds
        self.validinputs.remove(inp) 
        # check if the player has won the game
        if self.win_game() == True:
            self.done=True
            # reward is 1 if X won; -1 if O won
            self.reward=2*(self.turn=="X")-1
            self.validinputs=[]
        # If all cellls are occupied and no winner, it's a tie
        elif self.rounds == 9:
            self.done=True
            self.reward=0
            self.validinputs=[]
        else:
            # Counting rounds
            self.rounds += 1
            # Give the turn to the other player
            if self.turn == "X":
                self.turn = "O"
            else:
                self.turn = "X"             
        return self.state, self.reward, self.done, self.info
                     
    # Determine if a player has won the game
    def win_game(self):
        win = False
        if '1' in self.occupied[self.turn] and '2' in self.occupied[self.turn] and '3' in self.occupied[self.turn]:
            win = True
        if '4' in self.occupied[self.turn] and '5' in self.occupied[self.turn] and '6' in self.occupied[self.turn]:
            win = True
        if '7' in self.occupied[self.turn] and '8' in self.occupied[self.turn] and '9' in self.occupied[self.turn]:
            win = True
        if '1' in self.occupied[self.turn] and '4' in self.occupied[self.turn] and '7' in self.occupied[self.turn]:
            win = True
        if '2' in self.occupied[self.turn] and '5' in self.occupied[self.turn] and '8' in self.occupied[self.turn]:
            win = True
        if '3' in self.occupied[self.turn] and '6' in self.occupied[self.turn] and '9' in self.occupied[self.turn]:
            win = True
        if '1' in self.occupied[self.turn] and '5' in self.occupied[self.turn] and '9' in self.occupied[self.turn]:
            win = True
        if '3' in self.occupied[self.turn] and '5' in self.occupied[self.turn] and '7' in self.occupied[self.turn]:
            win = True
        return win

    def display_board(self):
        # Set up the screen
        try:
            t.setup(630,630,10,70) 
        except t.Terminator:
            t.setup(630,630,10,70)   
        t.tracer(False)
        t.hideturtle()
        t.bgcolor("azure")
        t.title("Tic-Tac-Toe in Turtle Graphics")
        # Draw horizontal lines and vertical lines to form grid
        t.pensize(5)
        t.color('blue')
        for i in (-300,-100,100,300):  
            t.up()
            t.goto(i,-300)
            t.down()
            t.goto(i,300)
            t.up()
            t.goto(-300,i)
            t.down()
            t.goto(300,i)
            t.up()
        # Go to the center of each cell, write down the cell number
        t.color('red')
        for cell, center in list(self.cellcenter.items()):
            t.goto(center[0]-80,center[1]-80)
            t.write(cell,font = ('Arial',20,'normal'))

    def render(self):
        if self.showboard==False:
            self.display_board()
            self.showboard=True   
        # Place X or O in occupied cells
        t.color('light gray')
        if len(self.occupied["X"])>0:
            for x in self.occupied["X"]:
                t.up()
                t.goto(self.cellcenter[x][0]-60,self.cellcenter[x][1]-60)
                t.down()               
                t.goto(self.cellcenter[x][0]+60,self.cellcenter[x][1]+60)
                t.up()
                t.goto(self.cellcenter[x][0]-60,self.cellcenter[x][1]+60)
                t.down()               
                t.goto(self.cellcenter[x][0]+60,self.cellcenter[x][1]-60)
                t.up()    
                t.update()
        if len(self.occupied["O"])>0:                
            for o in self.occupied["O"]:
                t.up()
                t.goto(self.cellcenter[o])
                t.dot(160,"light gray") 
                t.update()

    def close(self):
        time.sleep(1)
        try:
            t.bye()
        except t.Terminator:
            print('exit turtle')

If you run the above cell, nothing will happen. The class simply creates a game environment. We need to initiate the game environment and start playing using Python programs, just as you do with an OpenAI Gym game environment. We'll do that in the next subsection.

### 1.3. Verify the Custom-Made Game Environment
Next, we'll check the attributes and methods of the self-made game environment and make sure it has all the elements that are provided by a typical OpenAI Gym game environment. 

First we'll initiate the game environment and show the game board.

In [1]:
from utils.TicTacToe_env import ttt

env = ttt()
env.reset()                    
env.render()

You should see a separate turtle window, with a game board as follows: 
<img src="https://gattonweb.uky.edu/faculty/lium/ml/ttt_start.png" />

If you want to close the game board window, use the *close()* method, like so:

In [2]:
env.close()

Next, we'll check the attributes of the environment such as the observation space and action space. 

In [4]:
env=ttt()
# check the action space
number_actions = env.action_space.n
print("the number of possible actions are", number_actions)
# sample the action space ten times
print("the following are ten sample actions")
for i in range(10):
   print(env.action_space.sample())
# check the shape of the observation space
print("the shape of the observation space is", env.observation_space.shape)

the number of possible actions are 9
the following are ten sample actions
9
1
5
5
7
8
6
3
8
5
the shape of the observation space is (9,)


The meanings of the actions in this game as follows
* 1: Placing a game piece in cell 1
* 2: Placing a game piece in cell 2
* ...
* 9: Placing a game piece in cell 9
The state space is a vector with 9 values: 
* 0 means it's empty; 
* -1 means it's occupied by player O; 
* 1 means it's occupied by player X.

## 2. Play Games in the Tic Tac Toe Environment
Next, we'll play games in the custom-made environment. You'll learn to save each game board as a picture. Finally, you'll record all game boards in a full game, and convert them into an animation.

### 2.1. Play a full game

Here we'll play a full game, by randomly choosing an action from the action space each step.

In [4]:
import time
import random
from utils.TicTacToe_env import ttt

# Initiate the game environment
env = ttt()
state=env.reset()   
env.render()
# Play a full game manually
while True:
    print(f"the current state is state={state}")    
    action = random.choice(env.validinputs)
    time.sleep(1)
    print(f"Player X has chosen action={action}")    
    new_state, reward, done, info = env.step(action)
    env.render()
    if done:
        if reward==1:
            print(f"Player X has won!") 
        else:
            print(f"It's a tie!") 
        break
    print(f"the current state is state={new_state}")    
    action = random.choice(env.validinputs)
    time.sleep(1)
    print(f"Player O has chosen action={action}")    
    new_new_state, reward, done, info = env.step(action)
    env.render()
    if done:
        print(f"Player O has won!") 
        break
    else: 
        # play next round
        state=new_new_state
    
env.close()      

the current state is state=[0 0 0 0 0 0 0 0 0]
Player X has chosen action=4
the current state is state=[0 0 0 1 0 0 0 0 0]
Player O has chosen action=7
the current state is state=[ 0  0  0  1  0  0 -1  0  0]
Player X has chosen action=5
the current state is state=[ 0  0  0  1  1  0 -1  0  0]
Player O has chosen action=8
the current state is state=[ 0  0  0  1  1  0 -1 -1  0]
Player X has chosen action=9
the current state is state=[ 0  0  0  1  1  0 -1 -1  1]
Player O has chosen action=6
the current state is state=[ 0  0  0  1  1 -1 -1 -1  1]
Player X has chosen action=2
the current state is state=[ 0  1  0  1  1 -1 -1 -1  1]
Player O has chosen action=3
the current state is state=[ 0  1 -1  1  1 -1 -1 -1  1]
Player X has chosen action=1
Player X has won!


Note that the outcome is different each time you run it because the actions are randomly chosen.

### 2.2. Play the Game Manually
Next, you’ll learn how to manually interact with the Tic Tac Toe game. You'll use the key board to enter a number between 1 and 9. The following lines of code show you how.

In [7]:
# Initiate the game environment
print('''
enter 0 for left, 1 for down
2 for right, and 3 for up
''')
env=ttt()
state=env.reset()   
env.render()

print("enter a move in the form of 1 to 9")

# Play a full game manually
while True:
    print(f"the current state is state={state}")    
    action = input("Player X, what's your move?\n")
    print(f"Player X has chosen action={action}")    
    new_state, reward, done, info = env.step(action)
    env.render()
    if done:
        if reward==1:
            print(f"Player X has won!") 
        else:
            print(f"It's a tie!") 
        break
    print(f"the current state is state={new_state}")    
    action = input("Player O, what's your move?\n")
    print(f"Player O has chosen action={action}")    
    new_new_state, reward, done, info = env.step(action)
    env.render()
    if done:
        print(f"Player O has won!") 
        break
    else: 
        # play next round
        state=new_new_state
    
env.close()


enter 0 for left, 1 for down
2 for right, and 3 for up

enter a move in the form of 1 to 9
the current state is state=[0 0 0 0 0 0 0 0 0]
Player X, what's your move?
5
Player X has chosen action=5
the current state is state=[0 0 0 0 1 0 0 0 0]
Player O, what's your move?
7
Player O has chosen action=7
the current state is state=[ 0  0  0  0  1  0 -1  0  0]
Player X, what's your move?
9
Player X has chosen action=9
the current state is state=[ 0  0  0  0  1  0 -1  0  1]
Player O, what's your move?
1
Player O has chosen action=1
the current state is state=[-1  0  0  0  1  0 -1  0  1]
Player X, what's your move?
4
Player X has chosen action=4
the current state is state=[-1  0  0  1  1  0 -1  0  1]
Player O, what's your move?
6
Player O has chosen action=6
the current state is state=[-1  0  0  1  1 -1 -1  0  1]
Player X, what's your move?
2
Player X has chosen action=2
the current state is state=[-1  1  0  1  1 -1 -1  0  1]
Player O, what's your move?
8
Player O has chosen action=8
the cu

## 3. Train the Deep Learning Game Stratey
In this section, you’ll learn how to use deep neural network to train intelligent game strategies for Tic Tac Toe. In particular, you’ll use the convolutional neural network that you used in image classification to train the game strategy. By treating the game board as a two-dimensional graph instead of a one-dimensional vector, you’ll greatly improve the intelligence of your game strategies.

You’ll learn how to prepare data to train the model, how to interpret the prediction from the model. How to use the prediction to play games, and how to check the efficacy of your strategies.

### 3.1. A Summary of the Deep Learning Game Strategy
Here is a summary of what we’ll do to train the game strategy:

1.	We’ll let two computer players automatically play a game with random moves, and record the whole game history. The game history will contain all the game board positions from the very first move to the very last move.
2.	We then associate each board position with a game outcome (win, tie, or lose). The game board position is similar to features X in our image classification problem, and the outcome is similar to labels y in our classification problem.
3.	We’ll simulate 1,000,000 games. By using the histories of the games and the corresponding outcomes as Xs and ys, we feed the data into a Deep Neural Networks model. After the training is done, we have a trained model.
4.	We can now use the trained model to play a game. At each move of the game, we look at all possible next moves, and feed the hypothetical game board into the pretained model. The model will tell you the probabilities of win, lose, and tie.
5.	You select the move that the model predicts with the highest chance of winning.


### 3.2. Simulate Games
You’ll learn how to generate data to train the DNN. The logic is as follows: you’ll generate 1,000,000 games in which both players use random moves. You’ll then record the board positions of all intermediate steps and the eventual outcomes of each board position (win, lose, or tie). 

First, let's simulate one game. The code in the cell below accomplishes that.

In [11]:
from utils.TicTacToe_env import ttt
import time
import random
import numpy as np
from pprint import pprint

# Initiate the game environment
env=ttt()

# Define the one_game() function
def one_game():
    history = []
    state=env.reset()   
    while True:   
        action = random.choice(env.validinputs)  
        new_state, reward, done, info = env.step(action)
        history.append(np.array(new_state).reshape(3,3))
        if done:
            break
    return history, reward

# Simulate one game and print out results
history, outcome = one_game()
pprint(history)
pprint(outcome)        

[array([[0, 0, 0],
       [0, 0, 0],
       [1, 0, 0]]),
 array([[ 0, -1,  0],
       [ 0,  0,  0],
       [ 1,  0,  0]]),
 array([[ 0, -1,  0],
       [ 1,  0,  0],
       [ 1,  0,  0]]),
 array([[ 0, -1, -1],
       [ 1,  0,  0],
       [ 1,  0,  0]]),
 array([[ 0, -1, -1],
       [ 1,  0,  0],
       [ 1,  1,  0]]),
 array([[ 0, -1, -1],
       [ 1,  0, -1],
       [ 1,  1,  0]]),
 array([[ 1, -1, -1],
       [ 1,  0, -1],
       [ 1,  1,  0]])]
1


Note here we convert the game board to a 3 by 3 array so it's easy for you to see the positions of the game pieces. 

Now let's simulate 1,000,000 games and save the data.

In [17]:
# simulate the game 1000000 times and record all games
results = []        
for x in range(100000):
    history, outcome = one_game()
    # Note here I associate each board with the game outcome
    for board in history:
        results.append((outcome, board))    


Now let's save the data on your computer for later use

In [18]:
import pickle
# save the simulation data on your computer
with open('files/ch11/games_ttt10K.p', 'wb') as fp:
    pickle.dump(results,fp)
# read the data and print out the first 10 observations       
with open('files/ch11/games_ttt10K.p', 'rb') as fp:
    games = pickle.load(fp)
pprint(games[:10])

[(-1, array([[0, 0, 0],
       [0, 0, 1],
       [0, 0, 0]])),
 (-1, array([[ 0,  0,  0],
       [ 0, -1,  1],
       [ 0,  0,  0]])),
 (-1, array([[ 0,  0,  0],
       [ 0, -1,  1],
       [ 0,  1,  0]])),
 (-1, array([[ 0,  0,  0],
       [ 0, -1,  1],
       [-1,  1,  0]])),
 (-1, array([[ 0,  0,  1],
       [ 0, -1,  1],
       [-1,  1,  0]])),
 (-1, array([[ 0,  0,  1],
       [ 0, -1,  1],
       [-1,  1, -1]])),
 (-1, array([[ 0,  1,  1],
       [ 0, -1,  1],
       [-1,  1, -1]])),
 (-1, array([[-1,  1,  1],
       [ 0, -1,  1],
       [-1,  1, -1]])),
 (0, array([[1, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])),
 (0, array([[ 1,  0, -1],
       [ 0,  0,  0],
       [ 0,  0,  0]]))]


The first six observations are from teh first game in which player O won by occupying cells 7, 8, and 9. Therefore you see -1 as the first element of the first six observations. The data are stored correctly. 

We have the data we need. You’ll learn how to train the model next.

### 3.3. Train Your Tic Tac Toe Game Strategy Using Deep Neural Network
The following neural network trains the game strategy using the data you just created.


In [19]:
from random import choice
import pickle
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Dense, Conv2D, Flatten
from tensorflow.keras.models import Sequential
import numpy as np
      
with open('files/ch11/games_ttt10K.p', 'rb') as fp:
    tttgames=pickle.load(fp)

boards = []
outcomes = []
for game in tttgames:
    boards.append(game[1])
    outcomes.append(game[0])

X = np.array(boards).reshape((-1, 3, 3, 1))
# one_hot encoder, three outcomes: -1, 0, and 1
y = to_categorical(outcomes, num_classes=3)

model = Sequential()
model.add(Conv2D(filters=128, 
kernel_size=(3,3),padding="same",activation="relu",input_shape=(3,3,1)))
model.add(Flatten())
model.add(Dense(units=64, activation="relu"))
model.add(Dense(units=64, activation="relu"))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy',
                   optimizer='adam', 
                   metrics=['accuracy'])
  
# Train the model for 100 epochs
model.fit(X, y, epochs=100, verbose=1)
model.save('files/ch11/trained_ttt10K.h5')

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

You can simulate a larger number of games, but it will take longer to train.

## 4. Use the Trained Model to Play Games
Next, we’ll use the strategy to play a game. 

The player X will use the best move from the trained model. Player O will randomly select a move. 

### 4.1. Best Move Based on the Trained Deep Neural Network
First, we'll define a *best_move()* function for player X. The function takes a board position as its first argument, and a list of possible next moves as its second argument. 

The function will go over each move hypothetically, and use the trained deep neural network to predict the probability of player X winning the game. The function returns the move with the highest chance of winning.

We define a best_move() function for the computer to find best moves. 
What the computer does is as follows:
1.	Look at the current board.
2.	Look at all possible next moves, and add each move to the current board to form a hypothetical board
3.	Use the pretained model to predict the chance of winning with the hypothetical board
4.	Choose the move that produces the highest chance of winning. 

In [25]:
def best_move(board, valids):
    # if there is only one valid move, take it
    if len(valids)==1:
        return valids[0]
    # Set the initial value of bestoutcome        
    bestoutcome = -1;
    bestmove=None    
    #go through all possible moves hypothetically to predict outcome
    for move in valids:
        tooccupy=deepcopy(board).reshape(9,)
        tooccupy[int(move)-1]=1
        prediction=reload.predict(np.array(tooccupy).reshape(-1, 3,3,1), verbose=0)
        win_lose_dif=prediction[0][1]-prediction[0][2]
        if win_lose_dif>bestoutcome:
            # Update the bestoutcome
            bestoutcome = win_lose_dif
            # Update the best move
            bestmove = move
    return bestmove

Now let's use the *best_move()* function to choose moves for player X and play a game.

In [26]:
from utils.TicTacToe_env import ttt
import time
import random
from copy import deepcopy
import numpy as np
import tensorflow as tf

reload = tf.keras.models.load_model('files/ch11/trained_ttt10K.h5')


# Initiate the game environment
env=ttt()
state=env.reset()   
env.render()

print("enter a move in the form of 1 to 9")

# Play a full game manually
while True:
    print(f"the current state is state={state}") 
    # Use the best_move() function to select the next move
    action = best_move(state, env.validinputs)
    print(f"Player X has chosen action={action}")    
    new_state, reward, done, info = env.step(action)
    env.render()
    if done:
        if reward==1:
            print(f"Player X has won!") 
        else:
            print(f"It's a tie!") 
        break
    print(f"the current state is state={new_state}")    
    action = random.choice(env.validinputs)
    print(f"Player O has chosen action={action}")    
    new_new_state, reward, done, info = env.step(action)
    env.render()
    if done:
        print(f"Player O has won!") 
        break
    else: 
        # play next round
        state=new_new_state
    

enter a move in the form of 1 to 9
the current state is state=[0 0 0 0 0 0 0 0 0]
Player X has chosen action=5
the current state is state=[0 0 0 0 1 0 0 0 0]
Player O has chosen action=7
the current state is state=[ 0  0  0  0  1  0 -1  0  0]
Player X has chosen action=6
the current state is state=[ 0  0  0  0  1  1 -1  0  0]
Player O has chosen action=1
the current state is state=[-1  0  0  0  1  1 -1  0  0]
Player X has chosen action=4
Player X has won!


The computer player will look at each possible next move, and add that move to the current board to form a hypothetical board, tooccupy. We reshape the hypothetical board into a (3, 3, 1) shape to feed into the model to make predictions. 
There prediction will have three values: the probability of tying, player 1 winning, and player 2 winning. The computer will choose the move with the highest probability of winning the game. 
If we assume the computer player plays second, we simply use the third probability instead of the second probability in the prediction. 
If you play a game with the computer, you’ll find it’s impossible to win.
Here is one example of the eventual outcome:


Player X uses the best moves recommended by the trained model and wins the game by occupying celss 4, 5, and 6, as shown in this picture.
<img src="https://gattonweb.uky.edu/faculty/lium/ml/ttt_win_screen.png" /> 

### 4.2. Test the Efficacy of the DNN Model
Next, we’ll test how often the DNN trained game strategy wins against a player who makes random moves. 
The following script does that:

In [32]:

from utils.TicTacToe_env import ttt
import time
import random
from copy import deepcopy
import numpy as np
import tensorflow as tf

reload = tf.keras.models.load_model('files/ch11/trained_ttt10K.h5')

def best_move(board, valids):
    # if there is only one valid move, take it
    if len(valids)==1:
        return valids[0]
    # Set the initial value of bestoutcome        
    bestoutcome = -1;
    bestmove=None    
    #go through all possible moves hypothetically to predict outcome
    for move in valids:
        tooccupy=deepcopy(board).reshape(9,)
        tooccupy[int(move)-1]=1
        prediction=reload.predict(np.array(tooccupy).reshape(-1, 3,3,1), verbose=0)
        win_lose_dif=prediction[0][1]-prediction[0][2]
        if win_lose_dif>bestoutcome:
            # Update the bestoutcome
            bestoutcome = win_lose_dif
            # Update the best move
            bestmove = move
    return bestmove


env=ttt()

def test_one_game():
    state=env.reset()   
    while True:
        # Use the best_move() function to select the next move
        action = best_move(state, env.validinputs)   
        new_state, reward, done, info = env.step(action)
        if done:
            break 
        action = random.choice(env.validinputs)   
        new_new_state, reward, done, info = env.step(action)
        if done:
            break
        else: 
            # play next round
            state=new_new_state
    return reward

#repeat the game 1000 times and record all game outcomes
results=[]        
for x in range(1000):
    result=test_one_game()
    results.append(result)    

#print out the number of winning games
print("the number of winning games is", results.count(1))

#print out the number of tying games
print("the number of tying games is", results.count(0))

#print out the number of losing games
print("the number of losing games is", results.count(-1))                 

the number of winning games is 984
the number of tying games is 8
the number of losing games is 8


The player wins the game with the shortest possible path. So the deep learning game strategy works in the self-made environment as well!!!

## 5. Animate the Deep Learning Process
In this section, we'll create an animation to show how the agent makes a decision by getting the best move from the trained deep neural network at each step.

### 5.1. Print Out Probabilities of Winning for Each Next Move
In each stage of the game, we'll first draw the game board on the left of the screen. The player X will look at all possible next moves and use the trained deep neural network to predict the probability of winning for each hypothetical next move. We'll draw the probabilities on right. Finally, we'll highlight the action with the highest probability of winng. The action is player X's next move. We'll repeat this step by step until the game ends. 

This animation will let us look under the hood and understand how deep learning can help us design intelligent game strategies.

The next script will play a full game and record the game board and the winning probabilities in each step of the game.

In [2]:
from utils.TicTacToe_env import ttt
import time
import random
from copy import deepcopy
import numpy as np
import tensorflow as tf
import turtle as t

reload = tf.keras.models.load_model('files/ch11/trained_ttt100K.h5')

# Initiate the game environment
env=ttt()
state=env.reset()  
step=0
try:
    ts=t.getscreen() 
except t.Terminator:
    ts=t.getscreen()
t.hideturtle()
env.render()
ts.getcanvas().postscript(file=f"files/ch11/ttt_step{step}.ps")

# Create a list to record game history
history=[]

def best_move(board, valids):
    # if there is only one valid move, take it
    if len(valids)==1:
        return valids[0]
    # Set the initial value of bestoutcome        
    bestoutcome=-1;
    bestmove=None  
    # record winning probabilities for all hypothetical moves
    p_wins={}
    #go through all possible moves hypothetically to predict outcome
    for move in valids:
        tooccupy=deepcopy(board).reshape(9,)
        tooccupy[int(move)-1]=1
        prediction=reload.predict(np.array(tooccupy).reshape(-1, 3,3,1),verbose=0)
        p_win=prediction[0][1]
        p_wins[move]=p_win
        if p_win>bestoutcome:
            # Update the bestoutcome
            bestoutcome = p_win
            # Update the best move
            bestmove = move
    return bestmove, p_wins

# Play a full game 
while True:
    print(f"the current state is \n{state}") 
    bestmove,p_wins=best_move(state, env.validinputs)
    action=bestmove
    print(f"Player X has chosen action={action}")  
    old_state=deepcopy(state)
    new_state, reward, done, info = env.step(action)
    history.append([old_state,p_wins,action,deepcopy(new_state),done])
    env.render()
    step += 1      
    ts.getcanvas().postscript(file=f"files/ch11/ttt_step{step}.ps")
    if done:
        if reward==1:
            print(f"Player X has won!") 
        else:
            print(f"It's a tie!") 
        break
    print(f"the current state is state={new_state}")    
    action = random.choice(env.validinputs)
    print(f"Player O has chosen action={action}")    
    new_new_state, reward, done, info = env.step(action)
    env.render()
    step += 1      
    ts.getcanvas().postscript(file=f"files/ch11/ttt_step{step}.ps")
    if done:
        print(f"Player O has won!") 
        break
    else: 
        # play next round
        state=new_new_state
env.close()    

the current state is 
[0 0 0 0 0 0 0 0 0]
Player X has chosen action=5
the current state is state=[0 0 0 0 1 0 0 0 0]
Player O has chosen action=8
the current state is 
[ 0  0  0  0  1  0  0 -1  0]
Player X has chosen action=1
the current state is state=[ 1  0  0  0  1  0  0 -1  0]
Player O has chosen action=4
the current state is 
[ 1  0  0 -1  1  0  0 -1  0]
Player X has chosen action=9
Player X has won!


Before making the first move, the Player X has nine hypothetical next moves: 1 to 9. The trained neural network tells us what's the probabiltiy of winning the game for each hypothetical move. We can print out the nine probabilities by using the code below.

In [3]:
p_wins_step0=history[0][1]
for key, value in p_wins_step0.items():
    print(f"If Player X chooses action {key}, the probability of winning is {value:.4f}.")

If Player X chooses action 1, the probability of winning is 0.6460.
If Player X chooses action 2, the probability of winning is 0.5623.
If Player X chooses action 3, the probability of winning is 0.6474.
If Player X chooses action 4, the probability of winning is 0.5680.
If Player X chooses action 5, the probability of winning is 0.7345.
If Player X chooses action 6, the probability of winning is 0.5654.
If Player X chooses action 7, the probability of winning is 0.6453.
If Player X chooses action 8, the probability of winning is 0.5629.
If Player X chooses action 9, the probability of winning is 0.6471.


The above results show that the probability of Player X winning the game is the highest, at 73.45%, if action 5 is taken. That's why Player X occupies cell 5 in the first move. 

When making the second move, Player X faces seven choices. We can also print out the probability of winning for each move as follows:

In [4]:
p_wins_step1=history[1][1]
for key, value in p_wins_step1.items():
    print(f"If Player X chooses action {key}, the probability of winning is {value:.4f}.")

If Player X chooses action 1, the probability of winning is 0.8206.
If Player X chooses action 2, the probability of winning is 0.6159.
If Player X chooses action 3, the probability of winning is 0.7762.
If Player X chooses action 4, the probability of winning is 0.7598.
If Player X chooses action 6, the probability of winning is 0.7974.
If Player X chooses action 7, the probability of winning is 0.7929.
If Player X chooses action 9, the probability of winning is 0.7860.


The above results show that the probability of Player X winning the game is the highest, at 82.06%, if action 1 is taken. That's why Player X occupies cell 1 in the second move. 

You can also print out the the probability of Player X winning the game when making the third move, but I'll leave that for you to finish.

Let's save the game history data for later use. Run the code in the following cell:

In [11]:
import pickle

# save the game history on your computer
with open('files/ch11/ttt_game_history.p','wb') as fp:
    pickle.dump(history,fp)

### 5.2. Animate the Whole Game
Next, you'll combine the pictures created in the last subsection into an animation. As a result, you'll see the game board step by step for the whole game.

In [8]:
import imageio
from PIL import Image

frames=[]
for i in range(step+1):
    im = Image.open(f"files/ch11/ttt_step{i}.ps")
    frame=np.asarray(im)
    frames.append(frame) 
imageio.mimsave("files/ch11/ttt_steps.gif", frames, fps=1) 

If you open the file ttt_steps.gif, you'll see the following: 
<img src="https://gattonweb.uky.edu/faculty/lium/ml/ttt_steps.gif"/>

The animation shows the game board at each stage of the game. 

### 5.3. Animate the Decision Making
Next, we'll animate the decision making process of Player X in each stage of the game. We'll draw the probabilities of Player X winning the game for each hypothetical move. We'll highlight the move with the highest probability of winning the game. We'll animate this step by step until the game ends. 

In [12]:
import tensorflow as tf
from matplotlib import pyplot as plt
from matplotlib.patches import Rectangle
import pickle
import os
from PIL import Image
import numpy as np

# load up the history data
history = pickle.load(open('files/ch11/ttt_game_history.p', 'rb'))
# remember the best moves 
bests = [5, 1, 9]
# Generate pictures
for stage in range(3):
    fig = plt.figure(figsize=(10,10), dpi=200)
    ax = fig.add_subplot(111) 
    ax.set_xlim(0,8)
    ax.set_ylim(-4.5, 3.5)
    #plt.grid()
    plt.axis("off")
    plt.savefig(f"files/ch11/ttt_stage{stage*2}step1.png") 
    xys = [[(4,-4.1),(2,0)],
       [(4,-3.2),(2,0)],           
       [(4,-2.3),(2,0)],           
       [(4,-1.4),(2,0)],
       [(4,-0.5),(2,0)],
       [(4,0.4),(2,0)],
       [(4,1.3),(2,0)],
       [(4,2.2),(2,0)],
       [(4,3.1),(2,0)]]
    for xy in xys:
        ax.annotate("",xy=xy[0],xytext=xy[1],
        arrowprops=dict(arrowstyle = '->', color = 'g', linewidth = 2))  
    # add rectangle to plot
    ax.add_patch(Rectangle((0,-0.6), 2, 1.3,
                     facecolor = 'b',alpha=0.1))
    plt.text(0.2,-0.5,"Deep\nNeural\nNetwork",fontsize=20)        
    for m in range(9):
        plt.text(4.1, 3.1-0.9*m, f"Cell {m+1}, \
        Pr(win)={history[stage][1].get(str(m+1),0):.4f}", fontsize=20, color="r")  
   

    plt.savefig(f"files/ch11/ttt_stage{stage*2}step2.png") 
    
    # highlight the best action
    ax.add_patch(Rectangle((4,3.85-bests[stage]*0.9),
                           3.5, 0.5,facecolor = 'b',alpha=0.5))     
    plt.savefig(f"files/ch11/ttt_stage{stage*2}step3.png")     
    plt.close(fig)

The above script highlights the decision making proces of Player X. For example, if you open the file ttt_stage4step3.png, you'll see the following picture.
<img src="https://gattonweb.uky.edu/faculty/lium/ml/ttt_stage4step3.png" /> It shows the probabilities of Player X winning the game with each hypothetical move. In particular, the proability is 100% if Player X chooses Cell 9. The cell is highlighted in blue, and that is also the move made by Player X as a result. 

Next, we'll combine the pictures into an animation to show the decision-making process of Player X.

In [14]:
from PIL import Image
import imageio
import numpy as np

frames=[]

for stage in [0, 2, 4]:
    for step in [1,2,3]:
        im = Image.open(f"files/ch11/ttt_stage{stage}step{step}.png")
        f1=np.asarray(im)
        frames.append(f1)  
imageio.mimsave('files/ch11/ttt_DL_probs.gif', frames, fps=2)

If you open the file ttt_DL_probs.gif, you'll see the animation as follows.
<img src="https://gattonweb.uky.edu/faculty/lium/ml/ttt_DL_probs.gif" /> 

### 5.4. Animate Game Board Positions and the Decision Making
Next, we'll combine the game board positions and the decision making process of Player X in each stage of the game. On the left of the screen, we'll draw the game board. On the right of the screen, we'll draw the probabilities of Player X winning the game for each hypothetical move. We'll animate this step by step until the game ends. 

In [16]:
for i in range(6):
    im = Image.open(f"files/ch11/ttt_step{i}.ps")
    fig, ax=plt.subplots(figsize=(10,10), dpi=200)
    newax = fig.add_axes([0,0,1,1])
    newax.imshow(im)
    newax.axis('off')
    ax.set_xlim(-5,-5)
    ax.set_ylim(-5,-5)
    plt.axis("off")
    #plt.grid()
    plt.savefig(f"files/ch11/ttt_step{i}plt.png")
    plt.close(fig)

frames=[]

for stage in [0, 2, 4]:
    for step in [1,2,3]:
        im = Image.open(f"files/ch11/ttt_step{stage}plt.png")
        f0=np.asarray(im)
        im = Image.open(f"files/ch11/ttt_stage{stage}step{step}.png")
        f1=np.asarray(im)
        fs = np.concatenate([f0,f1],axis=1)
        frames.append(fs)
        if step==0:
            frames.append(fs)            
im = Image.open("files/ch11/ttt_step5plt.png")
f0=np.asarray(im)
im = Image.open("files/ch11/ttt_stage4step1.png")
f1=np.asarray(im)
fs = np.concatenate([f0,f1],axis=1)
frames.append(fs)
frames.append(fs)

imageio.mimsave('files/ch11/ttt_DL_steps.gif', frames, fps=2)

  ax.set_xlim(-5,-5)
  ax.set_ylim(-5,-5)


If you open the gif file, you'll see the following animation:
<img src="https://gattonweb.uky.edu/faculty/lium/ml/ttt_DL_steps.gif"/>