#  Play bot (neural MCTS)

---

Author: S. Menary [sbmenary@gmail.com]

Date  : 2023-01-18, last edit 2023-01-19

Brief : Play a game of Connect 4 against a bot that uses MCTS with a learned policy/value function.

---

## Imports

- Import key Python and PyPI packages and print their versions for reproducibility.
- Import the required Connect 4 game, bot and utility objects from our framework.

---

In [1]:
##=====================================##
##  All imports should be placed here  ##
##=====================================##

##  Python core libs
import sys

##  PyPI libs
import numpy as np
import tensorflow as tf

##  Local packages
from connect4.utils  import DebugLevel
from connect4.game   import GameBoard
from connect4.MCTS   import PolicyStrategy
from connect4.bot    import Bot_NeuralMCTS
from connect4.neural import load_model


In [2]:
##======================================##
##  Print versions for reproducibility  ##
##======================================##

print(f"    Python version is {sys.version}")
print(f"     Numpy version is {np.__version__}")
print(f"Tensorflow version is {tf.__version__}")


    Python version is 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:25:29) [Clang 14.0.6 ]
     Numpy version is 1.23.2
Tensorflow version is 2.11.0


---

## Play a game vs bot

Play a game of Connect 4 against our bot!

The `GameBoard` object is used as a Connect 4 game environment. It may be updated using a command like `game_board.apply_action(column_index)`. We can create a simple ASCII representation of our game board using `print(game_board)`. See `help(GameBoard)` for more useful manipulation methods.

The `Bot_NeuralMCTS` object is used to apply bot actions using neural MCTS. Call `bot.take_move(game_board, duration)` to run MCTS for `duration` seconds and then play a bot move. Increasing `duration` should generally strengthen the bot, because it allows the search to run for longer before a move is chosen.
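
To make the role of `duration` concrete, here is a minimal sketch of a time-budgeted loop. We have not inspected how `take_move` is implemented, so this is only an illustration of the general idea: `run_for` and `fake_simulation` are hypothetical names that do not exist in our framework, and the "simulation" here is just a counter.

```python
import time

def run_for(duration, step):
    """Call step() repeatedly until `duration` seconds have elapsed.

    Hypothetical helper for illustration only; returns the number of calls made.
    """
    deadline = time.monotonic() + duration
    num_steps = 0
    while time.monotonic() < deadline:
        step()
        num_steps += 1
    return num_steps

def fake_simulation():
    """Stand-in for one search iteration (assumption: does no real work)."""
    pass

# A zero budget performs no iterations at all
steps_with_no_budget = run_for(0, fake_simulation)

# A small positive budget allows some iterations to run
steps_with_budget = run_for(0.05, fake_simulation)

print(f"Iterations with duration=0   : {steps_with_no_budget}")
print(f"Iterations with duration=0.05: {steps_with_budget}")
```

Under this sketch a budget of zero seconds never enters the loop, and a larger budget simply allows more iterations before the deadline.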

The option `PolicyStrategy.GREEDY_POSTERIOR_POLICY` commands the bot to choose the action with the highest posterior policy estimated using MCTS. To act greedily over the action values instead, use `PolicyStrategy.GREEDY_POSTERIOR_VALUES`. For a stochastic approach, use `PolicyStrategy.SAMPLE_POSTERIOR_POLICY`.
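
The three posterior strategies can be sketched with plain NumPy. This assumes the common AlphaZero-style convention that the posterior policy is proportional to the MCTS visit counts, which we have not confirmed against the `connect4.MCTS` source. The visit counts and action values below are made-up toy numbers, not the output of a real search.

```python
import numpy as np

# Toy search statistics for the 7 columns (made-up numbers for illustration)
visit_counts  = np.array([10, 25, 40, 120, 35, 15, 5])
action_values = np.array([-0.20, 0.05, 0.10, 0.30, 0.35, 0.00, -0.40])

# Assumption: posterior policy is the normalised visit-count distribution
posterior_policy = visit_counts / visit_counts.sum()

# Greedy over the posterior policy: pick the most-visited column
greedy_policy_action = int(np.argmax(posterior_policy))

# Greedy over the action values: pick the column with the highest value estimate
greedy_value_action = int(np.argmax(action_values))

# Stochastic: sample a column in proportion to the posterior policy
rng = np.random.default_rng(seed=0)
sampled_action = int(rng.choice(len(posterior_policy), p=posterior_policy))

print(f"Greedy posterior policy action: {greedy_policy_action}")
print(f"Greedy posterior values action: {greedy_value_action}")
print(f"Sampled posterior policy action: {sampled_action}")
```

In this toy example the two greedy strategies disagree: column 3 has the most visits, but column 4 has a slightly higher value estimate. This is the kind of situation in which the choice of strategy changes the move played.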

You may also skip the MCTS entirely by using `duration=0` and selecting actions using the _prior_ evaluations. In particular:
- `PolicyStrategy.GREEDY_PRIOR_POLICY` means we select the action that maximises the policy evaluated using a single neural network pass.
- `PolicyStrategy.GREEDY_PRIOR_VALUES` means we select the action that maximises the value evaluated using a single neural network pass on each of the child nodes.
- `PolicyStrategy.SAMPLE_PRIOR_POLICY` means we sample from the policy evaluated using a single neural network pass.
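
As a minimal sketch of the difference between the greedy and sampled prior-policy strategies, the snippet below uses the prior policy that the bot prints for the empty board in the bot-vs-bot game later in this notebook. The printed values are rounded to two decimal places, so we renormalise them before sampling. This is an illustration only and does not call into our framework.

```python
import numpy as np

# Prior policy for the empty board, as printed (rounded) in the bot-vs-bot log
prior_policy = np.array([0.02, 0.02, 0.02, 0.86, 0.02, 0.02, 0.02])

# The rounded values sum to 0.98, so renormalise to obtain a valid distribution
prior_policy = prior_policy / prior_policy.sum()

# Greedy: always choose the column with the highest prior probability
greedy_action = int(np.argmax(prior_policy))

# Sampling: draw columns in proportion to their prior probability
rng = np.random.default_rng(seed=0)
samples = rng.choice(len(prior_policy), size=1000, p=prior_policy)
sample_counts = np.bincount(samples, minlength=len(prior_policy))

print(f"Greedy action: {greedy_action}")
print(f"Sample counts per column: {sample_counts}")
```

The greedy choice is deterministic, whereas sampling will occasionally play the lower-probability columns, which adds variety between games.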

---

In [3]:
##=========================================##
##  Load our neural policy/value function  ##
##=========================================##

##  Configure which model to load
model_name = "../models/test_run/neural_model_v7.h5"

##  Load this model
model = load_model(model_name)


In [8]:
##=====================##
##  Create a new game  ##
##=====================##

game_board = GameBoard()
bot        = Bot_NeuralMCTS(model, policy_strategy=PolicyStrategy.GREEDY_PRIOR_POLICY)
print(game_board)


+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE


---

Now play moves until the game is complete! Set `column_idx` to your chosen column and re-run the cell below for each move.

---

In [25]:
##===============##
##  Take moves!  ##
##===============##

##  Select our move
column_idx = 1

##  Apply our move as Player X
game_board.apply_action(column_idx)
print(game_board)

##  Apply the responding bot move as Player O
if not game_board.get_result() :
    bot.take_move(game_board, duration=0, debug_lvl=DebugLevel.LOW)
    print(game_board)


+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | [31mX[0m | [31mX[0m | [31mX[0m | [31mX[0m | . | . |
| . | [34mO[0m | [34mO[0m | [31mX[0m | [34mO[0m | . | . |
| . | [34mO[0m | [34mO[0m | [34mO[0m | [31mX[0m | . | . |
| . | [31mX[0m | [31mX[0m | [31mX[0m | [34mO[0m | . | . |
| . | [34mO[0m | [34mO[0m | [31mX[0m | [34mO[0m | [31mX[0m | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: X


---

## Bot-vs-bot game

Let's watch the bot play itself!

---

In [26]:
##========================##
##  Play bot-vs-bot game  ##
##========================##


##  Set up and display game
game_board = GameBoard()
result     = game_board.get_result()
print(game_board)

##  Keep playing moves until the game concludes
while not result :
    bot.take_move(game_board, duration=0, debug_lvl=DebugLevel.LOW)
    result = game_board.get_result()
    print(game_board)


+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting greedy action from prior policy
Prior policy is 0.02 0.02 0.02 0.86 0.02 0.02 0.02
Action values are:  N/A     N/A     N/A     N/A     N/A     N/A     N/A   
Visit counts are:   N/A     N/A     N/A     N/A     N/A     N/A     N/A   
Selecting action 3
+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | [31mX[0m | . | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting greedy action from prior policy
Prior policy is 0.02 0.02 0.37 0.20 0.33 0.02