#  Debug neural MCTS

---

Author: S. Menary [sbmenary@gmail.com]

Date  : 2023-01-15, last edit 2023-01-19

Brief : Debug the behaviour of a bot that combines a neural network with Monte Carlo Tree Search (MCTS)

---

## Imports

---

In [1]:
##=====================================##
##  All imports should be placed here  ##
##=====================================##

##  Python core libs
import pickle, sys, time

##  PyPI libs
import numpy as np
from matplotlib import pyplot as plt

##  Local packages
from connect4.utils    import DebugLevel
from connect4.game     import BinaryPlayer, GameBoard, GameResult
from connect4.MCTS     import Node_NeuralMCTS, PolicyStrategy
from connect4.bot      import Bot_NeuralMCTS, Bot_VanillaMCTS
from connect4.parallel import generate_from_processes
from connect4.neural   import load_model
from connect4.methods  import get_training_data_from_bot_game


In [2]:
##=====================================##
##  Print version for reproducibility  ##
##=====================================##

print(f"{'Python'    .rjust(12)} version is {sys.version}")
print(f"{'Numpy'     .rjust(12)} version is {np.__version__}")


      Python version is 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:25:29) [Clang 14.0.6 ]
       Numpy version is 1.23.2


In [3]:
##============================##
##  Set global config values  ##
##============================##

model_idx = 6
model_name = f"../models/.neural_model_v{model_idx}.h5"

print(f"Using model: {model_name}")


Using model: ../models/.neural_model_v6.h5


##  Test neural model MCTS

- Test that we can propagate values and make decisions correctly with neural MCTS
- Find a good value for the `duration` parameter (the smallest value that still gives stable posteriors)
- These cells cannot be run as part of a regular run, since TensorFlow (tf) cannot be used in the main process before child processes are spawned
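One way to make "stable posteriors" concrete, offered as a sketch rather than the procedure used in this notebook: search from the same position with several `duration` values, compare each resulting posterior with the one from the longest run, and keep the smallest duration whose posterior lies within a tolerance in total variation distance. The helper names `total_variation` and `smallest_stable_duration`, the tolerance, and the toy posteriors are all hypothetical; they are not part of the `connect4` package and are not measured results.

```python
import numpy as np


def total_variation(p, q):
    """Total variation distance between two probability vectors: 0 if identical, 1 if disjoint."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * float(np.abs(p - q).sum())


def smallest_stable_duration(posteriors_by_duration, tol=0.05):
    """Hypothetical helper: smallest duration whose posterior is within `tol`
    (total variation) of the posterior obtained with the longest duration tried.

    posteriors_by_duration maps a duration (as passed to choose_action) to the
    posterior policy obtained from the same position with that duration.
    """
    durations = sorted(posteriors_by_duration)
    reference = posteriors_by_duration[durations[-1]]
    for d in durations:
        if total_variation(posteriors_by_duration[d], reference) < tol:
            return d
    return durations[-1]  # only reached when tol <= 0


# Toy posteriors for illustration only (not measured in this notebook)
toy = {
    0.5: [0.10, 0.10, 0.10, 0.40, 0.10, 0.10, 0.10],
    1.0: [0.02, 0.05, 0.05, 0.60, 0.08, 0.18, 0.02],
    5.0: [0.01, 0.04, 0.04, 0.62, 0.07, 0.20, 0.02],
}
print(smallest_stable_duration(toy, tol=0.05))  # prints 1.0
```

Comparing against the longest run treats it as the best available estimate of the converged posterior; repeating each duration a few times would also show how much of the shift is run-to-run noise.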


In [4]:
##============================##
##  Perform a few MCTS steps  ##
##============================##

##  Create game board
game_board = GameBoard()
print(f"\nInitial game board:\n{game_board}")

##  Create a root node at the current game state
model      = load_model(model_name)
root_node  = Node_NeuralMCTS(game_board, params=[model, 1.], label="ROOT")

##  Print the initial value tree (should be a ROOT node with no children)
print("Initial tree:")
print(root_node.tree_summary())
print()

##  Perform several MCTS steps with a MEDIUM debug level
root_node.multi_step_MCTS(num_steps=20, max_sim_steps=-1, discount=0.99, debug_lvl=DebugLevel.MEDIUM)

##  Print the updated value tree 
print("Updated tree:")
print(root_node.tree_summary())
print()



Initial game board:
+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Initial tree:
> [0: ROOT] N=0, T=0.000, E=nan, Q=-inf
     > None
     > None
     > None
     > None
     > None
     > None
     > None

Running MCTS step 0
Select unvisited action X:6
Simulation using prior value 0.1297
Node X:6 with parent=X, N=0, T=0.00 receiving score 0.13
Node ROOT with parent=NONE, N=0, T=0.00 receiving score 0.00

Running MCTS step 1
Select unvisited action X:5
Simulation using prior value 0.2291
Node X:5 with parent=X, N=0, T=0.00 receiving score 0.23
Node ROOT with parent=NONE, N=1, T=0.00 receiving score 0.00

Running MCTS step 2
Select unvisited action X:4
Simulation using prior value 0.0829
Node X:4 with parent=X, N=0, 
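The tree summaries above print, for each node, a visit count `N`, a total `T` and a mean `E` that is `nan` before the first visit, which is consistent with `E = T / N`. The log also shows the search expanding unvisited children one at a time (X:6, then X:5, then X:4), and ROOT receiving a score of 0.00 on each step. The sketch below mimics that bookkeeping under stated assumptions: `NodeStats` and `select_child` are hypothetical names, and once every child has been visited it falls back to a plain UCB1 score. UCB1 is a generic stand-in; the rule in `connect4.MCTS` may differ (for example an AlphaZero-style PUCT term weighted by the network's prior policy), and sign conventions between the two players are ignored here.

```python
import math


class NodeStats:
    """Hypothetical per-node statistics, mirroring the N, T and E columns of tree_summary()."""

    def __init__(self, label):
        self.label = label
        self.N = 0          # number of visits
        self.T = 0.0        # sum of all scores backed up through this node
        self.children = []

    @property
    def E(self):
        """Mean score T / N; NaN before the first visit, as in the printed tree."""
        return self.T / self.N if self.N > 0 else math.nan

    def update(self, score):
        """Record one backed-up score."""
        self.N += 1
        self.T += score


def select_child(node, c=1.4):
    """Return an unvisited child if one exists (last child first, like X:6, X:5, X:4 in the log);
    otherwise maximise the generic UCB1 score E + c * sqrt(ln N_parent / N_child)."""
    for child in reversed(node.children):
        if child.N == 0:
            return child
    log_n = math.log(node.N)
    return max(node.children, key=lambda ch: ch.E + c * math.sqrt(log_n / ch.N))


# Toy run: a root with seven legal moves and made-up scores (not network outputs)
root = NodeStats("ROOT")
root.children = [NodeStats(f"X:{a}") for a in range(7)]
for step in range(7):
    child = select_child(root)
    child.update(0.1 * step)  # placeholder score for this illustration
    root.update(0.0)          # the log above shows ROOT receiving 0.00
```

After the toy run every child has `N=1`, so the exploration bonus is identical for all of them and the next selection goes to the child with the highest mean (`X:0`, which received the largest placeholder score).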

In [5]:
##==========================================##
##  Play a game and generate training data  ##
##==========================================##

model_inputs, posteriors, values = get_training_data_from_bot_game(
    model, duration=1, discount=0.99, num_random_moves=2, base_policy=PolicyStrategy.NOISY_POSTERIOR_POLICY, 
    debug_lvl=DebugLevel.LOW)


Using bot <connect4.bot.Bot_NeuralMCTS object at 0x1440a9570>
+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting uniformly random action
Action values are:  0.008   -0.070  -0.090  0.075   0.014   0.078   0.019 
Visit counts are:   2       12      12      168     18      55      3     
Selecting action 1
+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | X | . | . | . | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting uniformly random action
Action values are:  -0.019  -0.143  -0.160  -0.060
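The visit counts printed for the first move (2, 12, 12, 168, 18, 55, 3; 270 in total) normalise to exactly the posterior stored for the empty board in the training data (0.01 0.04 0.04 0.62 0.07 0.20 0.01), and the visit counts and posteriors printed in the greedy game further down agree in the same way. This suggests the posterior policy is the root's child visit counts divided by their sum. Note that the move actually played here was 1, because the first `num_random_moves=2` moves are chosen uniformly at random, yet the stored posterior still comes from the search. A minimal sketch, using a hypothetical helper `posterior_from_visits`:

```python
import numpy as np


def posterior_from_visits(visit_counts):
    """Posterior policy inferred from the logs: root-child visit counts normalised to sum to 1."""
    counts = np.asarray(visit_counts, dtype=float)
    return counts / counts.sum()


# Visit counts printed for the first move of the game above
visits = [2, 12, 12, 168, 18, 55, 3]
posterior = posterior_from_visits(visits)
print("  ".join(f"{x:.2f}" for x in posterior))
# prints 0.01  0.04  0.04  0.62  0.07  0.20  0.01

# A greedy posterior policy would play the most-visited column instead of the random move 1
greedy_action = int(np.argmax(posterior))  # 3
```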

In [6]:
##====================================================##
##  Check the data generated by the game is sensible  ##
##====================================================##

for inp, pos, val in zip(model_inputs, posteriors, values) :
    print(inp[:,:,0], ",  posterior="+"  ".join([f"{x:.2f}" for x in pos]), f",  value = {val[0]:.3f}")

[[0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]] ,  posterior=0.01  0.04  0.04  0.62  0.07  0.20  0.01 ,  value = 0.923
[[ 0  0  0  0  0  0]
 [-1  0  0  0  0  0]
 [ 0  0  0  0  0  0]
 [ 0  0  0  0  0  0]
 [ 0  0  0  0  0  0]
 [ 0  0  0  0  0  0]
 [ 0  0  0  0  0  0]] ,  posterior=0.01  0.18  0.04  0.25  0.02  0.00  0.49 ,  value = -0.932
[[ 0  0  0  0  0  0]
 [ 1  0  0  0  0  0]
 [ 0  0  0  0  0  0]
 [ 0  0  0  0  0  0]
 [ 0  0  0  0  0  0]
 [ 0  0  0  0  0  0]
 [-1  0  0  0  0  0]] ,  posterior=0.00  0.04  0.06  0.88  0.00  0.00  0.00 ,  value = 0.941
[[ 0  0  0  0  0  0]
 [-1  0  0  0  0  0]
 [ 0  0  0  0  0  0]
 [-1  0  0  0  0  0]
 [ 0  0  0  0  0  0]
 [ 0  0  0  0  0  0]
 [ 1  0  0  0  0  0]] ,  posterior=0.01  0.00  0.52  0.23  0.14  0.04  0.05 ,  value = -0.951
[[-1  0  0  0  0  0]
 [ 1  0  0  0  0  0]
 [ 0  0  0  0  0  0]
 [ 1  0  0  0  0  0]
 [ 0  0  0  0  0  0]
 [ 0  0  0  0  0  0]
 [-1  0  0  0  0  0]] ,  posterior=0.0
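The value targets above alternate in sign and grow in magnitude by a factor of about 1/0.99 per move: 0.923, 0.932, 0.941 and 0.951 are 0.99**8, 0.99**7, 0.99**6 and 0.99**5 to three decimals. That is consistent with each target being ±`discount`**k, where k counts the plies from the position to the end of the game, and the sign is + when the player to move at that position went on to win. The sketch below reproduces the printed numbers under that assumption only: the function name is hypothetical, it handles decisive games only, and `num_moves=9` is inferred from the exponent 8 (assuming the position before the winning move gets exponent 0), not read from the truncated output.

```python
def discounted_value_targets(num_moves, discount=0.99):
    """Hypothetical reconstruction of the value targets for a decisive game.

    Position i (0-based) is the state before move i, and the player who made the
    final move (index num_moves - 1) won. Each target is +/- discount**k, with k the
    number of plies from position i to the final move: positive when the player to
    move at position i is the winner, negative otherwise.
    """
    targets = []
    for i in range(num_moves):
        k = num_moves - 1 - i
        sign = 1.0 if k % 2 == 0 else -1.0  # the winner is to move at even distances from the end
        targets.append(sign * discount ** k)
    return targets


targets = discounted_value_targets(num_moves=9)
print([f"{v:.3f}" for v in targets[:4]])
# ['0.923', '-0.932', '0.941', '-0.951']
```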

In [None]:
##===================================================================================================##
##  Use MCTS to search for an optimal action, and compare the prior policy/value with the posterior  ##
##===================================================================================================##

game_board = GameBoard()
bot = Bot_NeuralMCTS(model, policy_strategy=PolicyStrategy.GREEDY_POSTERIOR_POLICY)

while not game_board.get_result() :
    player = game_board.to_play
    action = bot.choose_action(game_board, duration=5, discount=0.99, debug_lvl=DebugLevel.LOW)
    print("Prior policy was :  " + "  ".join([f"{c:.2f}" for c in bot.root_node.child_priors]))
    print("Prior values were:  " + "  ".join([f"{player.value*c.prior_value:.2f}" for c in bot.root_node.children]))
    game_board.apply_action(action)
    print(game_board)


Selecting greedy action from posterior policy 0.01 0.03 0.02 0.57 0.04 0.30 0.01
Action values are:  -0.018  -0.082  -0.188  0.036   -0.072  0.025   -0.033
Visit counts are:   11      30      19      504     36      267     11    
Selecting action 3
Prior policy was :  0.02  0.11  0.13  0.50  0.12  0.11  0.02
Prior values were:  0.04  -0.05  0.08  0.17  0.08  0.23  0.13
+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | X | . | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting greedy action from posterior policy 0.00 0.00 0.00 0.99 0.00 0.00 0.00
Action values are:  -0.408  -0.300  -0.113  0.030   -0.132  -0.336  -0.405
Visit counts are:   1       1       3       860     2       1       1     
Selecting action 3
Prior policy was :  0.00  0.00  0.01  0.97  0.0

Selecting greedy action from posterior policy 0.01 0.97 0.00 0.00 0.01 0.01 0.00
Action values are:  -0.147  0.023   -0.561  -0.384  -0.277  -0.261  -0.182
Visit counts are:   5       854     1       1       12      5       1     
Selecting action 1
Prior policy was :  0.03  0.75  0.02  0.02  0.13  0.05  0.01
Prior values were:  0.29  0.11  -0.56  -0.38  -0.25  -0.41  -0.18
+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | O | . | X | . | . | . |
| . | X | . | O | . | . | . |
| . | O | . | X | . | . | . |
| X | O | . | O | . | . | . |
| X | O | . | X | . | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting greedy action from posterior policy 0.05 0.86 0.01 0.04 0.01 0.01 0.01
Action values are:  -0.082  0.028   -0.715  -0.110  -0.387  -0.335  -0.300
Visit counts are:   45      771     8 

Selecting greedy action from posterior policy 0.10 0.00 0.02 0.00 0.79 0.04 0.04
Action values are:  -0.215  -0.325  -0.193  -0.251  -0.240
Visit counts are:   95      23      752     38      40    
Selecting action 4
Prior policy was :  0.13  0.00  0.12  0.02  0.54  0.10  0.09
Prior values were:  0.53  -0.68  0.01  0.01  -0.18
+---+---+---+---+---+---+---+
| . | X | . | O | . | . | . |
| . | O | . | X | . | O | . |
| . | X | . | O | X | X | . |
| . | O | . | X | O | O | . |
| X | O | . | O | X | X | . |
| X | O | . | X | X | O | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting greedy action from posterior policy 0.13 0.00 0.03 0.00 0.57 0.25 0.00
Action values are:  0.147   -0.015  0.164   0.341   -0.338
Vi