In [1]:
!pip install git+https://github.com/mattdeak/mcts.git

Skipping mcts as it is not installed.
Processing /notebooks
Collecting sortedcontainers>=1.5.9 (from mcts==0.4)
  Downloading https://files.pythonhosted.org/packages/cb/53/fe764fc8042e13245b50c4032fb2f857bc1e502aaca83063dcdf6b94d223/sortedcontainers-2.0.4-py2.py3-none-any.whl
Collecting logwood>=3.1.0 (from mcts==0.4)
  Downloading https://files.pythonhosted.org/packages/1c/7c/1294695c7f53d6101adb88c98aacb8cfa27a68adf29debc129b7a51d88e5/logwood-3.1.0.tar.gz
Collecting keras>=2.1.4 (from mcts==0.4)
  Downloading https://files.pythonhosted.org/packages/34/7d/b1dedde8af99bd82f20ed7e9697aac0597de3049b1f786aa2aac3b9bd4da/Keras-2.2.2-py2.py3-none-any.whl (299kB)
    100% |################################| 307kB 6.3MB/s ta 0:00:01
Collecting xxhash>=1.0.1 (from mcts==0.4)
  Downloading https://files.pythonhosted.org/packages/f5/34/86bb696206293afc33b62aaa72443f2c8344048b72a1e67c3cfe6caca1dc/xxhash-1.2.0-cp35-cp35m-manylinux1_x86_64.whl (47kB)
    100% |#######

# Step 1: Build Environment

In [2]:
from mcts.environments import TicTacToe, DotsAndBoxes

env = DotsAndBoxes()

In [3]:
env.board()

.    .    .    .    .
                    
.    .    .    .    .
                    
.    .    .    .    .
                    
.    .    .    .    .
                    
.    .    .    .    .
Player 1 Score is 0
Player 2 Score is 0



# Step 2: Build Neural Network
I've built some utility scripts to aid in this. All that's required for a working model is to have both a policy output and a value output. We'll use the `load_zeronet` utility to load a neural-net architecture similar to the AlphaGo Zero architecture.

In [4]:
import tensorflow as tf
import keras.backend as K
from keras.models import load_model
from mcts.nn.utils import load_zeronet

from mcts.nn.model import Model
keras_model = load_zeronet(env.state.shape, env.action_space, lr=0.001, residual_layers=2)
mcts_model = Model(keras_model) # Takes a Keras/TF Model

  from ._conv import register_converters as _register_converters
Using TensorFlow backend.


Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.



# Step 3: Configuring Policies
There are several policy types to choose from, but only a few are required for MCTS to run.
1. Selection - Policy that chooses an action during the selection phase of MCTS.
2. Expansion - Policy that expands a leaf node in MCTS.
3. Update - Policy that determines how nodes are updated at the end of an MCTS simulation.
4. Action - Policy that chooses which action to play based on the results of MCTS.

Building a config is pretty straightforward! Just use a JSON-like dictionary. You can check the supported policies as described under "Building the MCTS" below.

A simple config dictionary is shown below. If you want to add keyword arguments, which some policies take, just add `_kwargs` after the policy type (for example, `selection_kwargs` for the selection policy) and put the keyword arguments in a dictionary.

In [5]:
config = {
    'model' : mcts_model,
    'action' : 'proportional-to-visit-count',
    'selection' : 'puct',
    'selection_kwargs' : {'C' : 1.14},
    'expansion' : 'neural',
    'update' : 'value'
}
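To give a feel for what the `C` value in `selection_kwargs` controls, here is a minimal sketch of a PUCT-style selection rule in the form popularised by AlphaGo Zero. This is an illustration only: the function names, the `children` data layout and the exact formula are assumptions, and the library's own `puct` policy may differ in its details.

```python
import math

def puct_score(q_value, prior, parent_visits, child_visits, C=1.14):
    # Exploitation term (mean value) plus a prior-weighted exploration bonus.
    # A larger C favours moves with a high prior and few visits.
    exploration = C * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + exploration

def select_action(children, C=1.14):
    # children maps action -> (q_value, prior, visit_count); hypothetical layout.
    parent_visits = sum(visits for _, _, visits in children.values())
    return max(
        children,
        key=lambda a: puct_score(
            children[a][0], children[a][1], parent_visits, children[a][2], C
        ),
    )

children = {
    "a": (0.5, 0.2, 10),  # well explored, decent value
    "b": (0.1, 0.7, 1),   # barely explored, strong prior
    "c": (0.0, 0.1, 0),   # unexplored, weak prior
}
best = select_action(children)
```

With these numbers the rule picks `"b"`: its high prior and low visit count outweigh the better average value of `"a"`.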

### Building the MCTS

If you don't care about actually training your model, then you can build the MCTS with just a config dictionary. Specify the policy _type_ as the key and the policy (given by name in the config above) as the value.
You can check the supported policy types by using `mcts.SUPPORTED_POLICY_TYPES`.

In [6]:
from mcts.mcts import MCTS

m = MCTS(env, calculation_time=3)
m.build(config)

In [7]:
m.act()
env.board()

.    .    .    .    .
                    
.----.    .    .    .
                    
.    .    .    .    .
                    
.    .    .    .    .
                    
.    .    .    .    .
Player 1 Score is 0
Player 2 Score is 0



# Step 4: Building the Replay Table, Trainer, Evaluator and Terminal Callback
However, we don't have a pretrained neural net. In order to _train_ the neural net, we'll need some extra classes.
1. A Replay Table - This is just data storage for our training data.
2. An Evaluator - This class lets us pit old models against new models in a tournament. This is how we determine if the model we're training is ready to take over in guiding the MCTS.
3. A Trainer - This class handles the legwork in actually training the neural net.

The trainer we'll be using is `StagedModelTrainer` - this will load game results into a replay table and, once a certain number of games have been reached, train the model and evaluate it.

### The Replay Table
The replay table stores the training data. In order to format itself efficiently, it needs the dimensions of the state space and action space. 

In [8]:
from mcts.nn.replay import BasicReplay
from keras.callbacks import TensorBoard
replay = BasicReplay(env.state.shape, env.action_space, capacity=10000)
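The idea behind the `capacity` argument can be sketched with a small fixed-size buffer that drops the oldest samples once it is full. The `RingReplay` class below is hypothetical and is not how `BasicReplay` is implemented (in particular, it ignores the state and action dimensions); it only illustrates the storage behaviour.

```python
import random
from collections import deque

class RingReplay:
    """Hypothetical fixed-capacity store of (state, policy, value) samples."""

    def __init__(self, capacity):
        # deque with maxlen discards the oldest entry when a new one overflows it.
        self._data = deque(maxlen=capacity)

    def add(self, state, policy_target, value_target):
        self._data.append((state, policy_target, value_target))

    def sample(self, batch_size):
        # Sample without replacement, capped at the number of stored items.
        k = min(batch_size, len(self._data))
        return random.sample(list(self._data), k)

    def __len__(self):
        return len(self._data)

buffer = RingReplay(capacity=3)
for i in range(5):
    buffer.add(state=[i], policy_target=[1.0], value_target=0.0)

batch = buffer.sample(10)  # only the 3 most recent samples remain
```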

You can save a replay table to a file in its current state by using the `save()` method. This comes in handy if you want to keep all the data your MCTS generates.

In [9]:
import os
# Create the target directory if it does not exist yet
# (assumes save() does not create it for us).
os.makedirs('replay/dotsandboxes', exist_ok=True)
replay.save('replay/dotsandboxes/test')

You can load the saved replay table by using the `load_replay` function.

In [10]:
from mcts.nn.replay import load_replay
replay2 = load_replay('replay/dotsandboxes/test')

### The Evaluator
The evaluator is used to determine if one MCTS model is better than another. The NNEvaluator runs a tournament between two MCTS instances that are identical except that each uses a different neural network.
To instantiate the evaluator, we only need a config dictionary that contains some MCTS policies and our model.

We'll just use the `most-visited` action policy here for demonstration. This action policy simply chooses the action that has been explored the most.
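The difference between the two action policies used in this notebook can be sketched in a few lines. The function names and the `visit_counts` dictionary are assumptions made for illustration; the library's own policy classes may be structured differently.

```python
import random

def most_visited(visit_counts):
    # Greedy: pick the action with the highest visit count.
    return max(visit_counts, key=visit_counts.get)

def proportional_to_visit_count(visit_counts, rng=random):
    # Stochastic: sample an action with probability proportional to its visits.
    actions = list(visit_counts)
    weights = [visit_counts[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

visit_counts = {"left": 12, "up": 30, "right": 8}
greedy = most_visited(visit_counts)
sampled = proportional_to_visit_count(visit_counts, random.Random(0))
```

The greedy choice is deterministic, which suits evaluation games, while sampling keeps self-play games varied during data generation.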

In [11]:
from mcts.policies.action import MostVisited
from mcts.evaluators import NNEvaluator

evaluation_config = {
    'model' : mcts_model,
    'selection' : 'puct',
    'expansion' : 'neural',
    'update' : 'value',
    'action' : 'most-visited'
}
    
evaluator = NNEvaluator(env, evaluation_config)

Usually we won't have to run the evaluator manually, since the Trainer will handle that. However, if we ever __do__ want to run the evaluator manually, we can simply use the `.evaluate()` method. The NNEvaluator takes `incumbent_model` and `challenger_model`. We'll just test this briefly using the exact same model to see how it works.

In [12]:
results = evaluator.evaluate(mcts_model, mcts_model, games=1)
results.winner

'Challenger'
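With a single game between identical models, the winner is simply whoever won that one game. A rough sketch of how a multi-game tournament might be tallied is shown below. Everything here is hypothetical: the `play_game` callable, the alternation of who moves first and the rule that ties favour the incumbent are assumptions, not the behaviour of `NNEvaluator`.

```python
def run_tournament(play_game, games):
    # play_game(challenger_first) must return "Incumbent", "Challenger" or "Draw".
    wins = {"Incumbent": 0, "Challenger": 0, "Draw": 0}
    for i in range(games):
        # Alternate the starting side so neither model always moves first.
        challenger_first = (i % 2 == 0)
        wins[play_game(challenger_first)] += 1
    # Assumption: the challenger must win strictly more games to take over.
    if wins["Challenger"] > wins["Incumbent"]:
        winner = "Challenger"
    else:
        winner = "Incumbent"
    return winner, wins

def first_mover_wins(challenger_first):
    # Toy stand-in for a real game: whoever moves first wins.
    return "Challenger" if challenger_first else "Incumbent"

single_winner, single_tally = run_tournament(first_mover_wins, games=1)
even_winner, even_tally = run_tournament(first_mover_wins, games=4)
```

In this toy setup a one-game tournament goes to the challenger, while four games split evenly and the incumbent is kept, which is why more than one evaluation game is usually worthwhile.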

### The Trainer
The trainer is the thing that actually allows you to train a neural net with MCTS. To instantiate it, we require:
1. The game environment
2. The config for our mcts (including the model)
3. A replay table
4. An evaluator
5. Any Keras callbacks that we want, such as TensorBoard. It is imported in the cell below as an example, although this walkthrough does not pass any callbacks to the trainer. (optional)
6. A model directory. The staged model trainer will save our model every time it gets updated. If no model directory is specified, then it just won't save the model. (optional)
7. A replay directory. The trainer will save the replay table to this directory at the end of every "data generation" stage. (optional)

Here, we're going to use a `StagedModelTrainer`. This trainer will go through three stages to train the neural network.

The first stage is our data generation stage. In this stage, the trainer will have the MCTS play against itself and record data from all the games. The data is stored in the replay table.

The second stage is the training stage. In this stage, the trainer will use the data in the replay table to train the neural network.

The third stage is the evaluation stage. In this stage, the trainer will evaluate the pre-training model and the post-training model. If the post-training model does better, then the model is updated and the cycle repeats.
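The three stages can be summarised as a simple loop. The sketch below is not the `StagedModelTrainer` source: the three callables are hypothetical stand-ins for self-play, a training pass and the evaluation tournament, and the argument names are borrowed from the `train()` call used later in this notebook.

```python
def staged_training(epochs, generation_steps, training_steps, evaluation_steps,
                    play_self_play_game, train_once, challenger_wins):
    log = []
    for epoch in range(epochs):
        # Stage 1: generate data through self-play.
        for game in range(generation_steps):
            play_self_play_game()
            log.append(("generate", epoch, game))
        # Stage 2: train the network on the replay data.
        for step in range(training_steps):
            train_once()
            log.append(("train", epoch, step))
        # Stage 3: evaluate; the new model is promoted only if it wins.
        promoted = challenger_wins(evaluation_steps)
        log.append(("evaluate", epoch, promoted))
    return log

# Dummy callables so the control flow can be run on its own.
log = staged_training(
    epochs=2, generation_steps=2, training_steps=10, evaluation_steps=1,
    play_self_play_game=lambda: None,
    train_once=lambda: None,
    challenger_wins=lambda games: True,
)
```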


In [13]:
from mcts.nn.trainers import StagedModelTrainer
from keras.callbacks import TensorBoard

# Make the directory used to save models (no-op if it already exists)
os.makedirs('models/dotsandboxes', exist_ok=True)

trainer = StagedModelTrainer(env, config, replay, evaluator,  
                             model_dir='models/dotsandboxes',
                             replay_dir='replay/dotsandboxes')

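# Step 5: Initiate Self-Play
You can simply use the `trainer.train()` method. Just set the number of epochs and the number of steps for each stage and it'll do the rest! As the log below shows, each of the 2 epochs plays 2 self-play games (`generation_steps`), trains for 10 Keras epochs (`training_steps`), and then runs the evaluation stage (`evaluation_steps=1`).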

In [14]:
trainer.train(epochs=2, generation_steps=2, training_steps=10, evaluation_steps=1)

[1534980394.1441956][localhost][/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py][StagedModelTrainer][INFO] Starting epoch 0
[1534980394.1510608][localhost][/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py][StagedModelTrainer][INFO] Entering Generation Phase
[1534980394.1529856][localhost][/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py][StagedModelTrainer][INFO] Playing Generation Game 0
[1534980434.2182453][localhost][/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py][StagedModelTrainer][INFO] Playing Generation Game 1
[1534980474.3330061][localhost][/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py][StagedModelTrainer][INFO] Saving Replay Table to replay/dotsandboxes/replay0
[1534980474.426401][localhost][/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py][StagedModelTrainer][INFO] Entering Training Phase


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


[1534980476.839971][localhost][/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py][StagedModelTrainer][INFO] Entering Evaluation Phase
[1534980554.9165542][localhost][/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py][StagedModelTrainer][INFO] Challenger model wins - updating model...
[1534980554.9599597][localhost][/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py][StagedModelTrainer][INFO] Saving Model to models/dotsandboxes/model0
[1534980555.0970201][localhost][/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py][StagedModelTrainer][INFO] Starting epoch 1
[1534980555.097556][localhost][/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py][StagedModelTrainer][INFO] Entering Generation Phase
[1534980555.0979748][localhost][/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py][StagedModelTrainer][INFO] Playing Generation Game 0
[1534980595.225479][localhost

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


[1534980635.6362994][localhost][/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py][StagedModelTrainer][INFO] Entering Evaluation Phase
[1534980715.780857][localhost][/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py][StagedModelTrainer][INFO] Challenger model wins - updating model...
[1534980715.7909248][localhost][/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py][StagedModelTrainer][INFO] Saving Model to models/dotsandboxes/model1


As of version 0.4, only the StagedModelTrainer has been implemented. There are plans for more sophisticated training methods in future versions.