#  Play bot (vanilla MCTS)

---

Author: S. Menary [sbmenary@gmail.com]

Date  : 2023-01-03, last edit 2023-01-19

Brief : Play a game of Connect 4 against a bot that uses simple Monte Carlo tree search (MCTS) with no machine learning.

---

## Imports

- Import key Python and PyPI packages and print their versions for reproducibility.
- Import the required Connect 4 game, bot and utility objects from our framework.

---

In [1]:
##=====================================##
##  All imports should be placed here  ##
##=====================================##

##  Python core libs
import sys

##  PyPI libs
import numpy as np
import tensorflow as tf

##  Local packages
from connect4.utils import DebugLevel
from connect4.game  import GameBoard
from connect4.MCTS  import PolicyStrategy
from connect4.bot   import Bot_VanillaMCTS


In [2]:
##======================================##
##  Print versions for reproducibility  ##
##======================================##

print(f"    Python version is {sys.version}")
print(f"     Numpy version is {np.__version__}")
print(f"Tensorflow version is {tf.__version__}")


    Python version is 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:25:29) [Clang 14.0.6 ]
     Numpy version is 1.23.2
Tensorflow version is 2.11.0


---

## Play a game vs bot

Play a game of Connect 4 against our bot!

The `GameBoard` object is used as a Connect 4 game environment. It may be updated using a command like `game_board.apply_action(column_index)`. We can create a simple ASCII representation of our game board using `print(game_board)`. See `help(GameBoard)` for more useful manipulation methods.
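To make the idea of "apply an action to a column, then print the board" concrete, here is a minimal sketch of a toy board. `ToyBoard` is a hypothetical stand-in written for illustration only; it is not the framework's `GameBoard`, whose internals are not shown in this notebook. It assumes a 6x7 grid, that Player X moves first, and that a piece falls to the lowest empty cell of the chosen column.

```python
class ToyBoard:
    """Hypothetical stand-in for a Connect 4 board; NOT the framework's GameBoard."""

    def __init__(self, rows=6, cols=7):
        self.rows, self.cols = rows, cols
        # Row 0 is the top of the board, row (rows - 1) is the bottom
        self.grid = [["." for _ in range(cols)] for _ in range(rows)]
        self.to_play = "X"   # assumption: X moves first

    def apply_action(self, column_index):
        """Drop the current player's piece into the lowest empty cell of a column."""
        for row in range(self.rows - 1, -1, -1):
            if self.grid[row][column_index] == ".":
                self.grid[row][column_index] = self.to_play
                self.to_play = "O" if self.to_play == "X" else "X"
                return
        raise ValueError(f"column {column_index} is full")

    def __str__(self):
        """Render the grid as ASCII, with column indices underneath."""
        border = "+" + "---+" * self.cols
        lines  = [border]
        lines += ["| " + " | ".join(row) + " |" for row in self.grid]
        lines += [border, "| " + " | ".join(str(c) for c in range(self.cols)) + " |", border]
        return "\n".join(lines)


board = ToyBoard()
board.apply_action(3)   # X plays column 3
board.apply_action(3)   # O stacks on top of X
print(board)
```

The real `GameBoard` also tracks the game result, which this sketch leaves out.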

The `Bot_VanillaMCTS` object is used to apply bot actions using vanilla MCTS. Call `bot.take_move(game_board, duration)` to run MCTS for `duration` seconds and then play a bot move. Turning up the `duration` parameter will improve the bot by allowing it to search for longer.
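The effect of `duration` can be illustrated with a time-budgeted loop. The sketch below is not the framework's search code: `run_for_duration` and `fake_simulation` are hypothetical names, and the placeholder function stands in for one iteration of a generic MCTS (select, expand, simulate, backpropagate). It only shows that a larger time budget allows more iterations to run.

```python
import random
import time


def run_for_duration(duration, simulate_once):
    """Call simulate_once() repeatedly until `duration` seconds have elapsed."""
    deadline        = time.monotonic() + duration
    num_simulations = 0
    while time.monotonic() < deadline:
        simulate_once()
        num_simulations += 1
    return num_simulations


def fake_simulation():
    """Placeholder for one MCTS iteration; returns a random game outcome."""
    return random.choice([-1, 0, 1])


for budget in (0.05, 0.2):
    n = run_for_duration(budget, fake_simulation)
    print(f"budget = {budget:.2f}s, simulations = {n}")
```

More iterations give each candidate move more visits, so its value estimate is averaged over more simulated games.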

The option `PolicyStrategy.GREEDY_POSTERIOR_VALUE` commands the bot to choose the action that maximises the action values estimated using MCTS. To act greedily over the posterior policy instead, use `PolicyStrategy.GREEDY_POSTERIOR_POLICY`. For a stochastic approach, use `PolicyStrategy.SAMPLE_POSTERIOR_POLICY`.
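The three strategies can be compared on the numbers printed in the debug output of one bot move in this notebook. This is a sketch under stated assumptions, not the framework's implementation: it assumes the posterior policy is proportional to the MCTS visit counts (a common convention) and that ties are broken in favour of the lowest column index. The function names are hypothetical.

```python
import random

##  Figures copied from the debug output of one bot move in this notebook
action_values = [-0.091, 0.631, 0.640, 0.690, 1.000, 0.687, 1.000]
visit_counts  = [22, 130, 139, 168, 1661, 166, 1660]


def greedy_posterior_value(values):
    """Pick the action with the highest estimated value (first index wins ties)."""
    return max(range(len(values)), key=lambda a: values[a])


def greedy_posterior_policy(visits):
    """Pick the most-visited action, i.e. the mode of the assumed posterior policy."""
    return max(range(len(visits)), key=lambda a: visits[a])


def sample_posterior_policy(visits, rng):
    """Sample an action with probability proportional to its visit count (assumption)."""
    return rng.choices(range(len(visits)), weights=visits, k=1)[0]


value_choice  = greedy_posterior_value(action_values)
policy_choice = greedy_posterior_policy(visit_counts)
sample_choice = sample_posterior_policy(visit_counts, random.Random(0))
print(value_choice, policy_choice, sample_choice)
```

Under these assumptions both greedy rules select column 4 for this position, which agrees with the `Selecting action 4` line in the log. The sampled choice can differ from run to run unless the random generator is seeded.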

---

In [3]:
##=====================##
##  Create a new game  ##
##=====================##

game_board = GameBoard()
bot        = Bot_VanillaMCTS(policy_strategy=PolicyStrategy.GREEDY_POSTERIOR_VALUE)
print(game_board)


+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE


---

Now play moves until the game is complete!

---

In [14]:
##===============##
##  Take moves!  ##
##===============##

##  Select our move
column_idx = 2

##  Apply our move as Player X
game_board.apply_action(column_idx)
print(game_board)

##  Apply the responding bot move as Player O
if not game_board.get_result() :
    bot.take_move(game_board, duration=5, debug_lvl=DebugLevel.LOW)
    print(game_board)


+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | X | . | . | . | . |
| . | O | O | O | . | . | . |
| . | X | X | O | O | X | . |
| X | O | X | X | X | O | . |
| X | O | X | X | O | O | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting greedy action from posterior values
Action values are:  -0.091  0.631   0.640   0.690   1.000   0.687   1.000 
Visit counts are:   22      130     139     168     1661    166     1660  
Selecting action 4
+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | X | . | . | . | . |
| . | O | O | O | O | . | . |
| . | X | X | O | O | X | . |
| X | O | X

---

## Bot-vs-bot game

Let's watch the bot play itself!

---

In [15]:
##========================##
##  Play bot-vs-bot game  ##
##========================##


##  Set up and display game
game_board = GameBoard()
result     = game_board.get_result()
print(game_board)

##  Keep playing moves until the game concludes
while not result :
    bot.take_move(game_board, duration=5, debug_lvl=DebugLevel.LOW)
    result = game_board.get_result()
    print(game_board)


+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting greedy action from posterior values
Action values are:  -0.111  0.200   0.191   0.298   -0.014  0.258   0.047 
Visit counts are:   54      155     152     265     69      186     86    
Selecting action 3
+---+---+---+---+---+---+---+
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | . | . | . | . |
| . | . | . | X | . | . | . |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting greedy action from posterior values
Action values are:  -0.227  -0.311  -0.193  -0.220  -0.470  -0.330  -0.277
Visit counts 

Selecting greedy action from posterior values
Action values are:  -0.488  -0.696  -0.593  -0.050  -0.488  -0.520  -0.506
Visit counts are:   82      46      59      974     82      75      77    
Selecting action 3
+---+---+---+---+---+---+---+
| . | . | . | X | . | . | . |
| . | . | . | O | . | . | . |
| . | . | . | O | . | . | . |
| . | . | O | O | X | . | . |
| . | . | X | X | X | O | . |
| . | . | O | X | O | X | X |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting greedy action from posterior values
Action values are:  -0.038  -0.562  -0.105  0.182   0.023   -0.100
Visit counts are:   156     32      114     665     213     120   
Selecting action 4
+---+---+---+---+---+---+---+
| . | . | . | X | . | . | . |
| . | . | . | O | . | . | . |
| . | . | . | O

Selecting greedy action from posterior values
Action values are:  -0.852  -0.938  -0.895  -0.942
Visit counts are:   2337    802     1275    765   
Selecting action 0
+---+---+---+---+---+---+---+
| . | . | . | X | X | . | O |
| . | . | . | O | O | . | X |
| . | . | O | O | O | . | O |
| X | . | O | O | X | . | X |
| O | . | X | X | X | O | X |
| X | . | O | X | O | X | X |
+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
+---+---+---+---+---+---+---+
Game result is: NONE
Selecting greedy action from posterior values
Action values are:  0.933   -0.500  -0.268  -0.462
Visit counts are:   6205    24      41      26    
Selecting action 0
+---+---+---+---+---+---+---+
| . | . | . | X | X | . | O |
| . | . | . | 