# Connect 4

---

Author: S. Menary [sbmenary@gmail.com]

Date  : 2023-01-18, last edit 2023-01-18

Brief : Play a bot using Monte Carlo Tree Search (MCTS) with a neural policy/value function.

---

## Imports

In [1]:
###
###  All imports should be placed here
###

##  Python core libs
import sys, time

##  PyPI libs
import numpy as np

##  Local packages
from connect4.utils    import DebugLevel
from connect4.game     import BinaryPlayer, GameBoard
from connect4.MCTS     import PolicyStrategy
from connect4.bot      import Bot_NeuralMCTS
from connect4.neural   import load_model
from connect4.parallel import MonitorThread, WorkerThread, kill_threads


In [2]:
###
###  Print version for reproducibility
###

print(f"Python version is {sys.version}")
print(f"Numpy  version is {np.__version__}")

Python version is 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:25:29) [Clang 14.0.6 ]
Numpy  version is 1.23.2


##  MCTS

The `Bot_NeuralMCTS` class chooses the bot's moves by running MCTS guided by a loaded neural model, which supplies the policy priors and value estimates.
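To make "MCTS guided by a neural policy/value function" concrete, here is a minimal sketch of an AlphaZero-style PUCT selection rule. It shows how a policy prior can steer which child the search visits next. This is an assumption for illustration: `puct_scores`, `select_child` and the constant `c_puct=1.5` are hypothetical and are not confirmed to match what `Bot_NeuralMCTS` does internally.

```python
import math

import numpy as np


def puct_scores(q_values, visit_counts, priors, c_puct=1.5):
    """Score each child of a node with an AlphaZero-style PUCT rule (hypothetical sketch).

    q_values     : mean value of each child, from the perspective of the player to move
    visit_counts : number of times each child has been visited so far
    priors       : policy-network probability for each child
    c_puct       : exploration constant (illustrative value, not taken from connect4)
    """
    q = np.asarray(q_values, dtype=float)
    n = np.asarray(visit_counts, dtype=float)
    p = np.asarray(priors, dtype=float)
    # Exploration bonus grows with the parent's total visits and shrinks as a child is visited
    u = c_puct * p * math.sqrt(n.sum()) / (1.0 + n)
    return q + u


def select_child(q_values, visit_counts, priors, c_puct=1.5):
    """Return the index of the child with the highest PUCT score."""
    return int(np.argmax(puct_scores(q_values, visit_counts, priors, c_puct)))
```

For example, `select_child([0.1, 0.0, 0.2], [10, 0, 5], [0.2, 0.5, 0.3])` picks child 1: it has never been visited but carries the largest prior, so its exploration bonus dominates. As visits accumulate, the bonus shrinks and the mean values take over.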

##  Create a bot

In [3]:
###
###  Load a model
###

model_name = "../models/.neural_model_v2.h5"
model = load_model(model_name)


In [4]:
###
###  Ask bot to play a move
###

##  Create game board
game_board = GameBoard()
print(game_board)

##  Use the bot to choose an action (with duration=0 the move is taken greedily from the network's prior values)
bot = Bot_NeuralMCTS(model, policy_strategy=PolicyStrategy.GREEDY_PRIOR_VALUE)
bot.take_move(game_board, duration=0, debug_lvl=DebugLevel.LOW)

##  Show updated game state
print(game_board)


+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting greedy action from prior values
Action values are:  N/A     N/A     N/A     N/A     N/A     N/A     N/A   
Visit counts are:   N/A     N/A     N/A     N/A     N/A     N/A     N/A   
Selecting action 4
+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | X | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE

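With `duration=0`, the action values and visit counts print as N/A, which suggests that no tree search was run and the move came directly from the network's output (prior values here, prior policy in the games below). The sketch below shows what such a greedy choice might look like. It assumes the prior is one score per column and that full columns must be excluded. `greedy_action` is a hypothetical helper, not part of the `connect4` package.

```python
import numpy as np


def greedy_action(scores, legal_actions):
    """Pick the legal column with the highest score (hypothetical helper).

    scores        : one number per column, e.g. prior policy probabilities or prior values
    legal_actions : indices of columns that are not yet full
    """
    scores = np.asarray(scores, dtype=float)
    legal = np.asarray(legal_actions, dtype=int)
    # Restrict the argmax to legal columns, then map back to the original column index
    return int(legal[np.argmax(scores[legal])])
```

With policy `[0.05, 0.1, 0.15, 0.4, 0.15, 0.1, 0.05]` and every column open, this picks column 3. If column 3 were full, it would fall back to column 2, the first of the tied next-best columns.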

## Play a game

Play a game of Connect 4 against our bot!

To play, add a call to `game_board.apply_action(column_index)` to drop your piece in column `column_index`, followed by `bot.take_move(game_board, duration)` to let the bot respond. Increasing `duration` gives the bot more time to search, which should make it play more strongly.
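One common way a `duration` parameter trades time for strength is a wall-clock budget: keep running search simulations until the time runs out. The following is a minimal sketch under that assumption. It treats `duration` as seconds, and `search_for` and `run_simulation` are hypothetical names, not the package's actual internals.

```python
import time


def search_for(duration, run_simulation):
    """Run MCTS simulations until `duration` seconds have elapsed (hypothetical sketch).

    run_simulation : callable performing one select/expand/evaluate/backup pass
    Returns the number of simulations completed.
    """
    deadline = time.perf_counter() + duration
    n_simulations = 0
    # With duration=0 the deadline has already passed, so no simulations run
    while time.perf_counter() < deadline:
        run_simulation()
        n_simulations += 1
    return n_simulations
```

A longer budget completes more simulations, so visit counts and action-value estimates rest on more evidence. This matches the larger visit counts printed when `duration=1` in the bot-only game below.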

In [5]:
##  Create a new game

game_board = GameBoard()
bot        = Bot_NeuralMCTS(model, policy_strategy=PolicyStrategy.GREEDY_PRIOR_POLICY)
print(game_board)


+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE


In [6]:
##  Play a move in column index 3

game_board.apply_action(3)
print(game_board)

if not game_board.get_result() :
    bot.take_move(game_board, duration=0, debug_lvl=DebugLevel.LOW)
    print(game_board)


+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | X | . | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting greedy action from prior policy
Action values are:  N/A     N/A     N/A     N/A     N/A     N/A     N/A   
Visit counts are:   N/A     N/A     N/A     N/A     N/A     N/A     N/A   
Selecting action 3
+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | O | . | . | . |
| . | . | . | X | . | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE



... and so on until the game is complete!
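Rather than adding one cell per move, you could wrap the pattern from the cell above in a loop. This sketch relies only on the calls already used in this notebook (`apply_action`, `get_result`, `take_move`). `play_columns` itself is a hypothetical helper, not part of the `connect4` package.

```python
def play_columns(game_board, bot, columns, duration=0, **kwargs):
    """Play each human move in `columns`, letting the bot reply after each one (hypothetical helper).

    Stops as soon as game_board.get_result() reports a finished game,
    mirroring the `if not game_board.get_result()` check used above.
    Extra keyword arguments (e.g. debug_lvl) are passed through to bot.take_move.
    """
    for column in columns:
        game_board.apply_action(column)
        if game_board.get_result():
            break
        bot.take_move(game_board, duration=duration, **kwargs)
        if game_board.get_result():
            break
    return game_board.get_result()
```

For example, `play_columns(game_board, bot, [3, 2, 4], duration=1, debug_lvl=DebugLevel.LOW)` plays columns 3, 2 and 4 in turn, with a one-second bot reply after each, and stops early if either side wins.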


## Bot-only game

Let's watch the bot play itself!
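The loop below negates the prior values when `bot.root_node.player` is `BinaryPlayer.O`. This suggests that stored values are relative to one player and must be flipped to compare positions across turns. In two-player zero-sum MCTS, this sign flip is usually applied during backup (a negamax convention). The following is a minimal sketch, assuming each node stores its value from the perspective of the player to move at that node. `NodeStats` and `backup` are hypothetical names, not the package's classes.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Running statistics for one search node (hypothetical)."""
    visits: int = 0
    total_value: float = 0.0

    @property
    def mean_value(self):
        return self.total_value / self.visits if self.visits else 0.0


def backup(path, leaf_value):
    """Propagate a leaf evaluation up a search path, flipping sign at each ply.

    path       : NodeStats from root to leaf
    leaf_value : value of the leaf position from the perspective of the player to move there
    """
    value = leaf_value
    for node in reversed(path):
        node.visits += 1
        node.total_value += value
        # One ply up, the opponent is to move, so the same outcome is worth -value to them
        value = -value
```

Under this convention, a parent ranks its children by the negation of their stored mean values, because each child's value is expressed for the opponent.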

In [7]:
#  Play a bot game!

game_board = GameBoard()
print(game_board)

result = game_board.get_result()
while not result :
    bot.take_move(game_board, duration=1, debug_lvl=DebugLevel.LOW)
    prior_values = np.array([x.prior_value for x in bot.root_node.children])
    if bot.root_node.player == BinaryPlayer.O : prior_values = -prior_values
    print("Prior values:  " + "  ".join([f"{x:.3f}" for x in prior_values]))
    result = game_board.get_result()
    print(game_board)


+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting greedy action from prior policy
Action values are:  0.090   0.004   -0.035  0.315   -0.228  -0.090  -0.089
Visit counts are:   1       1       1       165     2       1       1     
Selecting action 3
Prior values:  0.090  0.004  -0.035  0.159  0.166  -0.090  -0.089
+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | X | . | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting greedy action from prior policy
Action values are:  -0.596  -0

Selecting greedy action from prior policy
Action values are:  -0.998  -0.958  -0.651  0.056   -0.337  -0.973  0.375 
Visit counts are:   1       2       3       5       1       1       153   
Selecting action 1
Prior values:  -0.998  -0.973  -0.427  0.629  -0.337  -0.973  0.997
+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | O | O | . | . | . |
| . | . | X | X | . | . | . |
| . | X | X | O | . | . | . |
| . | O | O | X | O | X | X |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting greedy action from prior policy
Action values are:  -0.995  0.385   0.422   0.448   -0.687  -0.460  0.396 
Visit counts are:   1       123     11      16      1       1       32    
Selecting action 1
Prior values:  -0.995  0.747  0.942  0.685  -0.687  -0.460  0.404
+---+-

Selecting greedy action from prior policy
Action values are:  -0.078  -0.232  0.123   0.027   -0.084
Visit counts are:   15      6       122     31      12    
Selecting action 0
Prior values:  -0.319  -0.275  -0.054  0.402  -0.656
+---+---+---+---+---+---+---+
| . | . | O | O | . | . | . |
| . | X | O | X | . | . | . |
| . | X | O | O | . | X | . |
| . | O | X | X | . | O | . |
| . | X | X | O | . | X | . |
| O | O | O | X | O | X | X |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting greedy action from prior policy
Action values are:  0.015   -0.367  0.028   0.213   -0.115
Visit counts are:   16      3       19      135     13    
Selecting action 6
Prior values:  -0.038  -0.133  -0.783  0.294  -0.101

Selecting greedy action from prior policy
Action values are:  -0.509  -0.117  -0.651  -0.556
Visit counts are:   10      528     12      6     
Selecting action 4
Prior values:  -0.941  -0.925  -0.743  -0.744
+---+---+---+---+---+---+---+
| . | X | O | O | . | . | . |
| . | X | O | X | . | O | O |
| . | X | O | O | X | X | X |
| X | O | X | X | O | O | O |
| X | X | X | O | O | X | X |
| O | O | O | X | O | X | X |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting greedy action from prior policy
Action values are:  -0.199  -0.072  0.153   0.139 
Visit counts are:   42      57      405     119   
Selecting a