# ConnectX - MCTS Bitboard + Bitsquares Heuristic

Monte Carlo Tree Search using a precached object-oriented tree + NumPy bitboard bitshifting game model + Bitsquares heuristic + precache training



# Bitboard

Focusing attention back on raw performance resulted in a bitboard implementation.
An 84-bit number was used, divided into 2 x 42-bit subsections.
The first half of the bitboard stored whether a square was empty or contained a piece (0==empty, 1==filled).
The second half of the bitboard stored which player's token was in each square (0==p1, 1==p2).
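
As a minimal sketch of this layout (not the kernel's NumPy implementation), the two halves can be packed into a plain Python integer. The helper names `place()` and `get()`, the row-major square ordering, and the choice of the low 42 bits as the "first half" are all assumptions made for illustration.

```python
ROWS, COLS = 6, 7
SIZE = ROWS * COLS  # 42 squares per half

def place(bitboard: int, row: int, col: int, player: int) -> int:
    """Return a new bitboard with a token for `player` (1 or 2) at (row, col)."""
    index = row * COLS + col                 # assumed row-major square index
    bitboard |= 1 << index                   # first half: square is filled
    if player == 2:
        bitboard |= 1 << (SIZE + index)      # second half: 0==p1, 1==p2
    return bitboard

def get(bitboard: int, row: int, col: int) -> int:
    """Return 0 for an empty square, otherwise the owning player (1 or 2)."""
    index = row * COLS + col
    if not (bitboard >> index) & 1:
        return 0
    return 2 if (bitboard >> (SIZE + index)) & 1 else 1

board = place(0, 5, 3, player=1)      # player 1 drops into the bottom of column 3
board = place(board, 4, 3, player=2)  # player 2 stacks on top
print(get(board, 5, 3), get(board, 4, 3), get(board, 0, 0))
```

Keeping occupancy and ownership in separate halves means a single mask can answer "is this line full?" and "who owns it?" with two bitwise ANDs.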

The gameover/utility function was implemented by creating bitmasks for all 69 possible win lines,
allowing fast iteration to check whether all squares in a line were filled, and if so, whether they were all filled by the same player.
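
The sketch below shows one way the 69 win-line masks could be generated and used for a gameover check. It is written in plain Python rather than the vectorized NumPy code in the kernel, and the function names, the row-major indexing, and the low-bits-first layout are assumptions.

```python
ROWS, COLS, INAROW = 6, 7, 4
SIZE = ROWS * COLS

def win_line_masks() -> list:
    """Build one 42-bit mask per horizontal, vertical and diagonal line of four."""
    masks = []
    for row in range(ROWS):
        for col in range(COLS):
            for d_row, d_col in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_row = row + (INAROW - 1) * d_row
                end_col = col + (INAROW - 1) * d_col
                if not (0 <= end_row < ROWS and 0 <= end_col < COLS):
                    continue  # line would run off the board
                mask = 0
                for i in range(INAROW):
                    mask |= 1 << ((row + i * d_row) * COLS + (col + i * d_col))
                masks.append(mask)
    return masks

WIN_MASKS = win_line_masks()

def winner(bitboard: int) -> int:
    """Return 1 or 2 if that player has four in a row, else 0."""
    occupied = bitboard & ((1 << SIZE) - 1)  # first half: filled squares
    owners = bitboard >> SIZE                # second half: 0==p1, 1==p2
    for mask in WIN_MASKS:
        if occupied & mask == mask:          # all four squares are filled
            owned = owners & mask
            if owned == 0:
                return 1                     # all four belong to p1
            if owned == mask:
                return 2                     # all four belong to p2
    return 0

print(len(WIN_MASKS))
```

On a 6x7 board this yields 24 horizontal, 21 vertical and 24 diagonal lines, matching the count of 69 given above.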

A simple but fast heuristic was designed that emulated the gameover/utility methodology, 
but also matched on empty squares, signifying the number of potential connect4 lines that could be created.

I have written a tutorial on the vectorized bitshifting implementation used in this kernel:
- https://www.kaggle.com/jamesmcguigan/connectx-vectorized-bitshifting-tutorial



# Monte Carlo Tree Search

Leaderboard Scores: 
- **1075 (top 6%)** | [MontyCarloPure + Cached Data](https://www.kaggle.com/c/connectx/submissions?dialog=episodes-submission-16816641)
- **1065 (top 7%)** | [MontyCarloPure](https://www.kaggle.com/c/connectx/submissions?dialog=episodes-submission-16779472)

Source:
- [agents/MontyCarlo/MontyCarloPure.py](https://github.com/JamesMcGuigan/ai-games/tree/master/games/connectx/agents/MontyCarlo/MontyCarloPure.py)

Whereas Minimax / Negamax searches the full width of the game state tree down to a fixed depth, Monte Carlo selectively expands
the tree deeper in areas where success is more probable. Tree/graph expansion follows a similar shape to A* search.

The algorithm starts with a root node with unexpanded children.

Expansion happens in two parts. First is node selection, where the tree is traversed from the root node,
choosing the child node with the highest UCB score, which includes an additional term to encourage exploration.
When an unexpanded leaf node is reached, a random simulation of the game is run from that position,
with the score and total counts backpropagated along the tree path.

Expansions are repeatedly run until the timeout expires, and the child of the root node with the highest score
is returned as the agent action.
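
The loop described above can be sketched as follows. This is a generic illustration, not the code in `MontyCarloPure.py`: it uses a tiny Nim game (take 1-3 stones, whoever takes the last stone wins) instead of ConnectX, a fixed iteration count instead of a timeout, and the standard UCB1 formula with an assumed exploration constant of `sqrt(2)`. All names are hypothetical.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state      # (stones_left, player_to_move)
        self.parent = parent
        self.children = {}      # action -> Node
        self.wins = 0.0         # wins for the player who moved into this node
        self.visits = 0

def actions(state):
    return [n for n in (1, 2, 3) if n <= state[0]]

def result(state, action):
    return (state[0] - action, 1 - state[1])

def rollout(state):
    """Play random moves to the end; return the winning player's index."""
    while state[0] > 0:
        state = result(state, random.choice(actions(state)))
    return 1 - state[1]         # the player who just moved took the last stone

def ucb1(child, parent_visits, exploration=math.sqrt(2)):
    return (child.wins / child.visits
            + exploration * math.sqrt(math.log(parent_visits) / child.visits))

def mcts(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: follow the highest UCB child while fully expanded
        while node.children and len(node.children) == len(actions(node.state)):
            parent = node
            node = max(parent.children.values(),
                       key=lambda child: ucb1(child, parent.visits))
        # Expansion: add one untried child (skipped at terminal nodes)
        untried = [a for a in actions(node.state) if a not in node.children]
        if untried:
            action = random.choice(untried)
            node.children[action] = Node(result(node.state, action), parent=node)
            node = node.children[action]
        # Simulation: random playout from the new node
        winning_player = rollout(node.state)
        # Backpropagation: update score and visit counts along the path
        while node is not None:
            node.visits += 1
            if winning_player == 1 - node.state[1]:
                node.wins += 1.0
            node = node.parent
    # Return the root action whose child has the highest average score
    return max(root.children,
               key=lambda a: root.children[a].wins / root.children[a].visits)

random.seed(0)
print(mcts((5, 0)))  # with 5 stones, taking 1 leaves the opponent a losing position
```

Replacing the iteration count with a wall-clock check is the only change needed to make the loop run until a turn timer expires.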


# Monte Carlo Heuristic Search

Leaderboard Scores: 
- **1110 (top 4%)** | [MontyCarloHeuristic + BitboardGameoversHeuristic + Cached Data](https://www.kaggle.com/c/connectx/submissions?dialog=episodes-submission-16832763)
- **1070 (top 6%)** | [MontyCarloHeuristic + BitboardGameoversHeuristic](https://www.kaggle.com/c/connectx/submissions?dialog=episodes-submission-16783457)

Source:
- [agents/MontyCarlo/MontyCarloPure.py](https://github.com/JamesMcGuigan/ai-games/tree/master/games/connectx/agents/MontyCarlo/MontyCarloPure.py)

This approach replaces random simulation (producing an integer score of 0|1) with a sigmoided version of 
`bitboard_gameovers_heuristic()` (producing a floating point score between 0 and 1). 

Scaling the sigmoid by a factor of 6 (dividing the heuristic value by 6 before applying the sigmoid) produced the highest winrate of the values tested.
This means a heuristic value of +6 returns a sigmoid score of 0.73, which indicates a clearly favourable position.
Smaller differences in heuristic score result in a value much closer to 0.5 (a draw).
If a terminal state in the game tree is reached, an integer score of 0|1 is returned.
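
The scaling arithmetic can be checked with a few lines of Python. The function name `sigmoid_score` and its `sigmoid_width` parameter are illustrative stand-ins rather than the kernel's actual API, and the `tanh` form is used only because it is numerically stable for large inputs.

```python
import math

def sigmoid_score(heuristic_value: float, sigmoid_width: float = 6.0) -> float:
    """Map a raw heuristic value onto the range 0..1, with 0.5 meaning balanced."""
    x = heuristic_value / sigmoid_width
    # Equivalent to 1 / (1 + exp(-x)), written with tanh to avoid overflow
    return 0.5 * (1.0 + math.tanh(x / 2.0))

for value in (-6, -1, 0, 1, 6):
    print(value, round(sigmoid_score(value), 3))
```

A wider sigmoid flattens the curve, so small heuristic differences stay close to 0.5 and only large advantages approach the 0 or 1 scores used for terminal states.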

Compared to random simulation, a heuristic provides more in-depth knowledge about a position, and it is significantly
faster to compute than running a full game simulation. This means many more expansions can be run within the same time limit.




# Cached Data
- [util/base64_file.py](https://github.com/JamesMcGuigan/ai-games/tree/master/games/connectx/util/base64_file.py)
- [core/PersistentCacheAgent.py](https://github.com/JamesMcGuigan/ai-games/tree/master/games/connectx/core/PersistentCacheAgent.py)

Much of the MontyCarlo runtime is spent expanding the tree and computing simulations/heuristics. 
Some of this could be precomputed to extend the depth of search possible during the 8 second turn timer.

The state tree needed to be persisted to disk, in a pickle.zip.base64 format suitable for embedding as text 
in a python script, then reloaded upon initialization. 
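
A minimal sketch of such a round trip is shown below. It assumes `pickle` plus `zlib` compression from the standard library; the actual `util/base64_file.py` may use a different compression format, and the function names here are hypothetical.

```python
import base64
import pickle
import zlib

def encode_base64(obj) -> str:
    """Pickle, compress and base64-encode an object into an ASCII string."""
    return base64.b64encode(zlib.compress(pickle.dumps(obj))).decode("ascii")

def decode_base64(text: str):
    """Reverse of encode_base64(): decode, decompress and unpickle."""
    return pickle.loads(zlib.decompress(base64.b64decode(text)))

# Hypothetical cached values keyed by board state
cache = {"state_a": (12.0, 20), "state_b": (3.5, 7)}
text = encode_base64(cache)
print(decode_base64(text) == cache)
```

The resulting string contains only ASCII characters, so it can be assigned to a variable inside a generated `.py` file and decoded when the agent starts.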

Several hours of localhost runtime were spent playing the MontyCarlo agents against themselves and other agents,
including hundreds of matches against random_agent. This effectively generated an opening book and precached endgame
values for some expected lines of play.

In theory, Kaggle allows 100MB of submission data, but in practice data files larger than 5-10MB
cause Kaggle submission errors. The original datafile generated by the above process was 47MB, which was too large.
By pruning the tree of any nodes that had not been fully expanded, the filesize was reduced to a workable 5MB.

This cached datafile significantly improved winrates on the leaderboard, both for Random Simulation 
and Heuristic versions of Monty Carlo Tree Search.



# Bitsquares Heuristic 

Leaderboard Scores: 
- **(1120)** | [AlphaBetaBitboard + bitsquares_heuristic(reward_bitsquares=1.75)](https://www.kaggle.com/c/connectx/submissions?dialog=episodes-submission-16964089)
- **(1120)** | [MontyCarloTreeSearch + bitsquares_heuristic(reward_bitsquares=1.75, sigmoid_width=6.0)](https://www.kaggle.com/c/connectx/submissions?dialog=episodes-submission-16964089)
- **(1140)** | [MontyCarloTreeSearch + bitsquares_heuristic(reward_bitsquares=1.75, sigmoid_width=7.0) + 1.25MB precache](https://www.kaggle.com/c/connectx/submissions?dialog=episodes-submission-16964089)



Code:
- [heuristics/BitsquaresHeuristic.py](https://github.com/JamesMcGuigan/ai-games/tree/master/games/connectx/heuristics/BitsquaresHeuristic.py)
 
This takes a similar approach to the Bitboard Gameovers Heuristic, in that it checks every possible connect 4
bitmask for lines containing only player-owned or empty squares, i.e. lines not blocked by opponent pieces.

The score is based on counting the number of player-owned bits in each line and squaring the bitcount.

Hyperparameter tuning discovered that using a power of 1.75 rather than 2 improved the winrate against
`bitboard_gameovers_heuristic()` from 48% to 94%, without using any `double_attack_score` logic.

Optimal reward values given `n ** 1.75`:
- 1-in-a-row = 1 
- 2-in-a-row = 3.4
- 3-in-a-row = 6.8
- 4-in-a-row = inf
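
The reward table above can be reproduced with the sketch below, which scores one player's open lines. It is a simplified illustration rather than the code in `BitsquaresHeuristic.py`: it takes two separate 42-bit boards (one per player) as input, uses plain Python instead of NumPy, and every function and parameter name is an assumption.

```python
import math

ROWS, COLS, INAROW = 6, 7, 4

def win_line_masks() -> list:
    """Build a 42-bit mask for every possible line of four on the board."""
    masks = []
    for row in range(ROWS):
        for col in range(COLS):
            for d_row, d_col in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_row = row + (INAROW - 1) * d_row
                end_col = col + (INAROW - 1) * d_col
                if 0 <= end_row < ROWS and 0 <= end_col < COLS:
                    masks.append(sum(
                        1 << ((row + i * d_row) * COLS + (col + i * d_col))
                        for i in range(INAROW)))
    return masks

WIN_MASKS = win_line_masks()

def bitsquares_score(player_bits: int, opponent_bits: int,
                     reward_power: float = 1.75) -> float:
    """Sum bitcount ** reward_power over every line the opponent has not blocked."""
    score = 0.0
    for mask in WIN_MASKS:
        if opponent_bits & mask:
            continue                          # line is blocked by the opponent
        count = bin(player_bits & mask).count("1")
        if count == INAROW:
            return math.inf                   # four in a row is a win
        score += count ** reward_power
    return score

print([round(n ** 1.75, 1) for n in (1, 2, 3)])
print(bitsquares_score(1 << 35, 0))  # single token in the bottom-left corner
```

A full heuristic would presumably subtract the opponent's score from the player's score; that step is omitted here because the source does not spell it out.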


I have tried writing more advanced heuristics, taking into account the odd/even height of specific squares. These however have not performed quite as well in practice.
The main issue is the additional CPU performance cost of the heuristic, which is called in a tight loop. Investing more of our CPU budget into a heuristic score for each node means that MCTS will be able to run fewer simulations and expand fewer nodes, thus allowing a cheap heuristic to win against a better but more expensive heuristic. The bitsquares heuristic is much faster than simulating an entire game using a random agent.
- [heuristics/BitboardOddEvenHeuristic.py](https://github.com/JamesMcGuigan/ai-games/blob/master/games/connectx/heuristics/BitboardOddEvenHeuristic.py)


A future idea for research is to combine MCTS with a heuristic designed using a neural network trained through self-play. This mirrors the underlying design of AlphaZero.
In preparation for this I have attempted to implement the `is_gameover()` function in pytorch.
- https://www.kaggle.com/jamesmcguigan/connectx-implementing-functions-in-pytorch/

# Buildchain

So here we can just fetch the commit from the github repo, and compile via kaggle_compile.py.
- https://www.kaggle.com/jamesmcguigan/kaggle-compile-py-python-ide-to-kaggle-compiler

NOTE: Kaggle notebooks only allow a maximum of 1MB of code in a notebook, so precached base64 data cannot simply be copy/pasted into a notebook, but must be generated via a runtime command.

We can take the existing saved data from the github repo, and make use of additional compute time on the Kaggle servers by rerunning the training script for additional runtime.

In [None]:
!apt-get install time dos2unix -y -qq  # NOTE: -qq requires using apt-get not apt

In [None]:
!pip install fastcache

In [None]:
!rm -rf /ai-games/
!git clone https://github.com/JamesMcGuigan/ai-games.git /ai-games/
#!git checkout e275d53b40ce499ee670b0c1dd00bef17235affe
!cd /ai-games/; git log -n1 

# Codebase

This can also be viewed on github: https://github.com/JamesMcGuigan/ai-games/tree/master/games/connectx

In [None]:
!cd /ai-games/games/connectx; /ai-games/kaggle_compile.py agents/MontyCarlo/MontyCarloBitsquares.py | dos2unix | sed 's/^# *@njit/@numba.njit/g' > /MontyCarloBitsquares.compiled.py

In [None]:
import IPython

def display_source(code):
    def _jupyterlab_repr_html_(self):
        from pygments import highlight
        from pygments.formatters import HtmlFormatter

        fmt = HtmlFormatter(style='tango')  # https://overiq.com/pygments-tutorial/#style
        style = "<style>{}\n{}</style>".format(
            fmt.get_style_defs(".output_html"), fmt.get_style_defs(".jp-RenderedHTML")
        )
        return style + highlight(self.data, self._get_lexer(), fmt)

    # Replace _repr_html_ with our own version that adds the 'jp-RenderedHTML' class
    # in addition to 'output_html'.
    IPython.display.Code._repr_html_ = _jupyterlab_repr_html_
    return IPython.display.Code(data=code, language="python3")

In [None]:
display_source('/MontyCarloBitsquares.compiled.py')


# Tests
## Unit Tests
- [./heuristics/LibertiesHeuristic_test.py](https://github.com/JamesMcGuigan/ai-games/tree/master/games/connectx/./heuristics/LibertiesHeuristic_test.py)
- [./core/ConnectXBBN_test.py](https://github.com/JamesMcGuigan/ai-games/tree/master/games/connectx/./core/ConnectXBBN_test.py)
- [./core/ConnectX_test.py](https://github.com/JamesMcGuigan/ai-games/tree/master/games/connectx/./core/ConnectX_test.py)
- [./core/ConnectXBitboard_test.py](https://github.com/JamesMcGuigan/ai-games/tree/master/games/connectx/./core/ConnectXBitboard_test.py)
- [./util/base64_file_test.py](https://github.com/JamesMcGuigan/ai-games/tree/master/games/connectx/./util/base64_file_test.py)

Unit tests validate that individual functions work as expected.


## Integration Tests
- [tests/test_board_positions.py](https://github.com/JamesMcGuigan/ai-games/tree/master/games/connectx/tests/test_board_positions.py)
- [tests/test_can_win_against.py](https://github.com/JamesMcGuigan/ai-games/tree/master/games/connectx/tests/test_can_win_against.py)

Integration testing can be applied to AI algorithms by giving them game puzzles to solve, 
especially in positions where a human can verify that there is only one (or two) winning/losing moves.

The simplest is being one move away from connect 4 and seeing if the agent can either find the winning move, 
or block the opponent from that square. More complicated positions include being able to spot a double attack, 
which requires a Minimax search depth of 4, or knowing which column to play during an endgame. 

A second form of integration tests is a live matchup against the inbuilt kaggle agents: random and negamax.
Any leaderboard-worthy agent should be able to score a near 100% winrate against these opponents,
so a logic mistake in the algorithm will show up as a test failure.

In [None]:
# Disable tests for other agents
!perl -p -i -e 's/^[# ]*(.*(?<!MontyCarloBitsquares)\(\).*)$/#   $1/' /ai-games/games/connectx/tests/fixtures/agents.py
!cd /ai-games/games/connectx/; find /ai-games/games/connectx/tests -type f -name '*.py' | xargs -I{} bash -c "echo -e '\n\n### {}\n\n'; cat {}" > /kaggle/working/tests.py

In [None]:
display_source('/kaggle/working/tests.py')

In [None]:
!cd /ai-games/games/connectx; pytest

# Precache Training

Rerun the training loop to generate additional cached data. 

Ideally this should be on a timer, but coincidentally the current training config takes 8.3 hours, which is almost perfect.

BUGFIX: `base64_file_load() Exception: No module named 'agents'`: There is an issue with pickle being unable to re-resolve modules after being compiled into a single script. As a workaround, the training script itself had to be run through kaggle_compile.py and the training done within a single-file namespace. This can instead be solved by using dill rather than pickle.

In [None]:
# Copy datafile from previous notebook run | comment out prefer github datafiles
!cp -f /kaggle/input/*/data/*base64.py /ai-games/games/connectx/data/

In [None]:
### NOTE: we are using dill rather than pickle now, so we should (hopefully) be able to train outside of kaggle_compile.py
# !cd /ai-games/games/connectx; python3 /ai-games/kaggle_compile.py ./training_montycarlo.py > /kaggle/working/training.py
# !perl -p -i -e 's/^(for timeout in).*$/$1 [0.25]:/' /kaggle/working/training.py                  # Shorten training times during development
# !cd /ai-games/games/connectx; time -p python3 /kaggle/working/training.py | grep 'save\|load';   # CWD still relative to data directory

!cd /ai-games/games/connectx; time -p python3 ./training_montycarlo.py | grep 'save\|load';        # CWD still relative to data directory

# Submission

This is how we export to a submissions.py file via [kaggle_compile.py](https://www.kaggle.com/jamesmcguigan/kaggle-compile-py-python-ide-to-kaggle-compiler).

NOTE: kaggle_compile.py assumes all codebase import statements are relative to the current directory.

In [None]:
# BUGFIX: windows lineendings when generated inside kaggle notebook
!dos2unix /ai-games/games/connectx/data/*base64.py 2> /dev/null

In [None]:
!cd /ai-games/games/connectx; /ai-games/kaggle_compile.py agents/MontyCarlo/MontyCarloBitsquares.py ./data/MontyCarloBitsquaresNode_base64.py > /kaggle/working/submission.py

# BUGFIX: AttributeError: module 'numba' has no attribute 'config'  
# NOTE:   This doesn't happen when running kaggle_compile.py on localhost, only when running from inside a notebook
!perl -p -i -e 's/^(import numba|bitboard_type)/#$&/' /kaggle/working/submission.py

!ls -lah /kaggle/working/submission.py

Test we have no compilation errors

BUG: Notebook is currently experiencing out of memory errors on commit when training is included, so removing the runtime element of this notebook

In [None]:
%run submission.py

Export data files to the working directory so they are downloadable from the Kaggle notebook output

In [None]:
!cp -rf /ai-games/games/connectx/data /kaggle/working/
!rm -f  /kaggle/working/data/__init__.py

In [None]:
from kaggle_environments import evaluate, make, utils

%load_ext autoreload
%autoreload 2

# Versus Self

In [None]:
# ### Play against yourself without an ERROR or INVALID.
# ### Note: The first episode in the competition will run this to weed out erroneous agents.

env = make("connectx", debug=True)
env.run(["/kaggle/working/submission.py", "/kaggle/working/submission.py"])
print("\nEXCELLENT SUBMISSION!" if env.toJSON()["statuses"] == ["DONE", "DONE"] else "MAYBE BAD SUBMISSION?")
env.render(mode="ipython", width=500, height=450)

# Versus Negamax

In [None]:
env = make("connectx", debug=True)
env.run(["negamax", "/kaggle/working/submission.py"])
print("\nEXCELLENT SUBMISSION!" if env.toJSON()["statuses"] == ["DONE", "DONE"] else "MAYBE BAD SUBMISSION?")
env.render(mode="ipython", width=500, height=450)

# Versus Human

Under leaderboard game conditions, the computer has 8 seconds per move, but this seems very slow from a human interaction standpoint. Thus the timer vs human play has been reduced to 2 seconds.

NOTE: this only seems to work inside the Kaggle Editor, and not in the published HTML version

In [None]:
# env = make("connectx")
# env.configuration.timeout = 2  # Don't make a human wait 8 seconds between moves
 
# print('Human plays first against the computer')
# env.play([None, MontyCarloBitsquares()], width=500, height=450)  

# print('Human plays second against the computer')
# env.play([MontyCarloBitsquares(), None], width=500, height=450)  