## Demonstration of MCTS library usage

#### Advanced usage: a previously saved predictor and a web-based predictor

As the predictor only needs to provide a "get" method, many implementations are possible. Here we demonstrate the generic way of using the interface.
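
The "only needs a get method" requirement can be written down as a structural type. The sketch below is an illustration, not part of the MCTS library: the names `Predictor` and `EchoPredictor` are hypothetical, and the return value of `get` is a placeholder because the expected return format is defined by the library, not by this notebook.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Predictor(Protocol):
    """Structural type: any object with a get(state) method qualifies."""

    def get(self, state: Any) -> Any:
        ...


class EchoPredictor:
    """Hypothetical toy predictor, used only to show the duck-typed interface."""

    def get(self, state: Any) -> Any:
        # Placeholder: a real predictor would compute a prediction from the state.
        return state


toy_predictor = EchoPredictor()
has_interface = isinstance(toy_predictor, Predictor)   # checks that "get" exists
lacks_interface = isinstance(object(), Predictor)
```

Note that `isinstance` against a runtime-checkable protocol only checks that the method exists, not its signature or return type.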

In [1]:
# import libraries
from MCTS.utils import *
from MCTS.Agent import *
from MCTS.MCTS import *

### Import Game Rule

In [2]:
import rules.Othello as Othello
OthelloGame   = Othello.OthelloGame   # shorthand
OthelloHelper = Othello.OthelloHelper # shorthand

### Import Data Structure

In [3]:
class OthelloDataNode(GameDataNode):
    def __init__(self, name, Game=OthelloGame, player=1):
        super().__init__(Game=Game, name=name, player=player)
    # end def

    def str_encode(self):
        '''encode into a string representation'''
        import hashlib, shlex
        temp = []
        for state in self.state:
            data_str = ' '.join(shlex.split(str(state.data)))
            temp.append(str({'data': data_str, 'player': self.player}))
        # end for
        return hashlib.md5(str(temp).encode()).hexdigest()
    # end def
# end class
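
To see why `str_encode` normalizes whitespace before hashing, here is a standalone sketch of the same idea. It is a simplification: it hashes one board string rather than a list of states, and `state_key` is a hypothetical helper, not a library function.

```python
import hashlib
import shlex


def state_key(board_text: str, player: int) -> str:
    """Hash a board printout and a player id into a 32-character hex key."""
    # shlex.split breaks the text on any run of whitespace (spaces, newlines),
    # so differently wrapped printouts of the same board normalize identically.
    normalized = ' '.join(shlex.split(board_text))
    payload = str({'data': normalized, 'player': player})
    return hashlib.md5(payload.encode()).hexdigest()


key_wrapped = state_key("[[0 1]\n [1 0]]", player=1)
key_flat = state_key("[[0 1]   [1 0]]", player=1)
key_other_player = state_key("[[0 1]\n [1 0]]", player=-1)
```

Here `key_wrapped` and `key_flat` are equal because only the whitespace differs, while `key_other_player` differs because the player is part of the hashed payload.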

### Import and Inherit Agent class

In [4]:
# import agent class here
from MCTS.Agent import Agent
class OthelloAgent(Agent):
    def __init__(self, predictor):
        super(OthelloAgent, self).__init__(
            Game = OthelloGame,
            predictor = predictor
        ) # end super
    # end def
# end class

### Simulation parameter setting

Parameters need to be set for both players. We use the same set of values here.

In [5]:
state_memory_n = 3
tree_depth   = 3
allowed_time = 10

# for player P
tree_depth_p   = tree_depth
allowed_time_p = allowed_time

# for player Q
tree_depth_q   = tree_depth
allowed_time_q = allowed_time

### Agent initialization
Initialize "intelligent" agents to play the game.

#### Use a previously saved object (*.dill) as the predictor

In [6]:
import dill
filename = './test_predictors/greedy_predictor.dill'
with open(filename, 'rb') as fin:
    rand_predictor = dill.load(fin)
# end with

In [7]:
# illustrate that predictor has a "get" method
callable(rand_predictor.get)

True

Make an agent with the predictor

In [8]:
agent_p = Agent(OthelloGame, rand_predictor)

#### Define a web-based predictor

In a more general use case, a predictor can request predictions from a web server. With this approach the prediction work can be parallelized, and the model can be modified in real time on the server side.

To use a test server for this demo, please boot it up with
```
$ cd test_predictors/
$ ../venv/bin/python greedy_pred_server.py
```

In [9]:
# make sure the server is running
import requests
r = requests.get('http://localhost:8080')
print(r.content)

b'This is a MCTS predictor interface server. Use /get to predict the next state.'


In [10]:
class WebPredictor:
    def __init__(self, url, name=None):
        self.url  = url
        self.name = name
    # end def

    def get(self, state):
        import json
        import base64
        import dill # use dill to serialize objects here for the demo. In production this should be optimized.
        import requests
        
        # encode state
        state_repr = dill.dumps(state)
        state_repr = base64.encodebytes(state_repr).decode('ascii')
        # make prediction request
        r = requests.post(self.url, json={'data':state_repr})
        # note: error handling usually follows a web call; it is omitted here.
        # decode results
        action_repr = r.json()['data']['action']
        action_repr = base64.decodebytes(action_repr.encode('ascii'))
        action = dill.loads(action_repr)
        return action
    # end def
# end class
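
The wire format used by `WebPredictor.get` (serialize, base64-encode, wrap in JSON, then reverse the steps on the reply) can be exercised without a server. The sketch below is a minimal illustration under stated assumptions: the standard-library `pickle` stands in for `dill`, the helper names `encode_payload` and `decode_action` are hypothetical, and the reply layout `{'data': {'action': ...}}` is taken from the class above.

```python
import base64
import pickle


def encode_payload(obj) -> dict:
    """Serialize an object into a JSON-safe request body."""
    raw = pickle.dumps(obj)                          # pickle stands in for dill
    text = base64.encodebytes(raw).decode('ascii')   # bytes -> ASCII text
    return {'data': text}


def decode_action(body):
    """Extract and deserialize the action from a reply body."""
    try:
        text = body['data']['action']
    except (KeyError, TypeError) as err:
        # The reply did not have the expected {'data': {'action': ...}} layout.
        raise ValueError('malformed predictor response: %r' % (body,)) from err
    return pickle.loads(base64.decodebytes(text.encode('ascii')))


# Simulate a server reply that carries the action (2, 3).
fake_reply = {'data': {'action': encode_payload((2, 3))['data']}}
decoded_action = decode_action(fake_reply)
```

Because deserializing with `pickle` or `dill` can execute arbitrary code, this format should only be used with a server you trust.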

In [11]:
url = 'http://localhost:8080/get'
web_predictor = WebPredictor(url=url)

In [12]:
# illustrate that predictor has a "get" method
callable(web_predictor.get)

True

Make an agent with the predictor

In [13]:
agent_q = Agent(OthelloGame, web_predictor)

(note: the material below is the same as in the previous demo)

### Define MCTS Runners
For both players P and Q

In [14]:
# Instantiate runners with predictors
runner_p = MCTS_Runner(OthelloDataNode, agent=agent_p)
runner_q = MCTS_Runner(OthelloDataNode, agent=agent_q)

### Instantiate a game record object

In [15]:
game_record = GameRecord(name='tutorial')

#### Define the initial state of the game to simulate

In [16]:
new_board = OthelloHelper.new_board()
init_state = [stateType(data=new_board, player=1) for _ in range(state_memory_n)]

Record the initial state in the game record

In [17]:
game_record.append(None, init_state)

### Actually start the playing sequence

Black (player P) goes first. Set runner_p to the initial state.

In [18]:
runner_p.set_state(init_state)

In [19]:
# no prior node (will explain later)
prior_node = None

Start sampling

In [20]:
sampled_node = runner_p.start_mcts(
  tree_depth   = tree_depth_p, 
  allowed_time = allowed_time_p,
  prior_node   = prior_node
)

Get the child nodes

In [21]:
child_nodes = sampled_node.children

In [22]:
print('number of child nodes: %d' % (len(child_nodes),))
print('namely: %s' % str([_.name for _ in child_nodes]))

number of child nodes: 4
namely: ['(2, 3)', '(3, 2)', '(4, 5)', '(5, 4)']


#### (Advanced) Simulation reuse
The idea is that we should not throw away previous simulations.

In player P's simulation just now, player Q's possible future actions are also already simulated (as a sub-tree). These simulation results can be stored as prior nodes for reuse in the next (or future) steps.

Here we demonstrate storing only the first-layer child nodes.

(note: while this technique makes sense for self-play, one might want to consider whether player P's simulation results are suitable as a prior for player Q's strategy if the two players differ.)

Note that for this technique to work, every node must have a key that uniquely represents its state, so that it can be stored and identified.

In [23]:
child_nodes[0].str_encode()

'138e357d0df24da3a6ee71b3122a1abd'

In [24]:
for node in child_nodes:
    print('name:', node.name, '|', 'key:', node.str_encode())

name: (2, 3) | key: 138e357d0df24da3a6ee71b3122a1abd
name: (3, 2) | key: 2ab92d0403cf6032cfdeb989c6515fd9
name: (4, 5) | key: d0de1149a3873df6b98960b1835be433
name: (5, 4) | key: 88c62fa98159de64e06e6d5828dbfb02


#### Initialize a repo (ordered dictionary) to record sampled nodes

In [25]:
from collections import OrderedDict
prior_nodes_repo = OrderedDict()

Add the child nodes to the repo

In [26]:
for _node in child_nodes:
    key = _node.str_encode()
    prior_nodes_repo.update({key:  _node})
# end for
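
The notebook does not say why an ordered dictionary is used. One possible benefit, assumed here purely for illustration, is that insertion order makes it easy to cap the repo size by evicting the oldest entries. `BoundedRepo` is a hypothetical helper and strings stand in for nodes.

```python
from collections import OrderedDict


class BoundedRepo:
    """Key-to-node store that keeps at most max_size most recently stored entries."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._items = OrderedDict()

    def put(self, key, node):
        if key in self._items:
            self._items.move_to_end(key)      # refresh the position of a known key
        self._items[key] = node
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)   # evict the oldest entry

    def get(self, key, default=None):
        return self._items.get(key, default)

    def __len__(self):
        return len(self._items)


repo = BoundedRepo(max_size=2)
for key in ('a', 'b', 'c'):
    repo.put(key, 'node-' + key)

evicted = repo.get('a')    # 'a' was the oldest entry and has been dropped
kept = repo.get('c')
```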

#### Choose a node using the tree search result

In [27]:
# by default, choose_most_visited
sel_node = runner_p.choose_node(child_nodes)

In [28]:
sel_node.action

(2, 3)
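
A "most visited" selection rule can be sketched in a few lines. This is an illustration of the general idea, not the library's implementation: `ToyNode` and `most_visited` are hypothetical names, and the success/visit counts are borrowed from the second layer of the tree printout shown later in this notebook.

```python
from dataclasses import dataclass


@dataclass
class ToyNode:
    """Hypothetical stand-in for a search node."""
    action: tuple
    success: int
    visits: int


def most_visited(nodes):
    """Return the node that the search visited most often."""
    return max(nodes, key=lambda node: node.visits)


toy_children = [
    ToyNode(action=(2, 2), success=10, visits=27),
    ToyNode(action=(2, 4), success=12, visits=30),
    ToyNode(action=(4, 2), success=29, visits=53),
]
best = most_visited(toy_children)
```

Selecting by visit count rather than by raw success ratio is a common choice in MCTS because a high ratio from only a few visits is a noisy estimate.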

#### Record the action-state info in the game record

In [29]:
game_record.append(sel_node.action, sel_node.state)

#### Check if end-game has been reached

In [30]:
OthelloGame.get_winner(game_record)

Not yet (winner is None)

### A small summary section here

Confidence that the selected action will win the game

In [31]:
confidence = sel_node.success / sel_node.visits
print('confidence of winning: %4.1f%%' % (confidence*100,))

confidence of winning: 50.0%


Also have a look at all the other options

In [32]:
mcts_prob = {node.action: node.success / node.visits for node in child_nodes}
for action, val in mcts_prob.items():
    print('action=%s confidence=%4.1f%%' % (action, val*100,))

action=(2, 3) confidence=50.0%
action=(3, 2) confidence=36.2%
action=(4, 5) confidence=38.5%
action=(5, 4) confidence=38.8%
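
The ratio `success / visits` raises `ZeroDivisionError` for a node that was never visited. A small guarded helper avoids that; `confidence` and `format_confidence` are hypothetical names used only for this sketch.

```python
def confidence(success, visits):
    """Return the win ratio of a node, or None if it was never visited."""
    if visits == 0:
        return None
    return success / visits


def format_confidence(success, visits):
    """Render the ratio as a percentage string, or 'n/a' without visits."""
    value = confidence(success, visits)
    if value is None:
        return 'n/a'
    return '%4.1f%%' % (value * 100,)


selected_text = format_confidence(55, 110)   # counts of the selected node above
unvisited_text = format_confidence(0, 0)
```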


### Now it is White's turn
Set the state to the selected node (by player P)

In [33]:
runner_q.set_state(sel_node.state)

#### See whether this node has been sampled before

In [34]:
# get the key
key = sel_node.str_encode()

In [35]:
# find in the repo
prior_node = prior_nodes_repo.get(key, None) 

In [36]:
prior_node

OthelloDataNode('/root/(2, 3)', Game=<class 'rules.Othello.OthelloGame'>, action=(2, 3), data={}, player=-1, state=[<MCTS.utils.stateType object at 0x7ff30f8c3240>, <MCTS.utils.stateType object at 0x7ff30f8c32e8>, <MCTS.utils.stateType object at 0x7ff30f800438>], success=55, visits=110)

In [37]:
print(prior_node.print_tree())

-(2, 3): 55/110
 -(2, 2): 10/27
  -(2, 1): 3/6
  -(3, 2): 1/4
  -(4, 5): 7/9
  -(5, 4): 5/8
 -(2, 4): 12/30
  -(1, 5): 3/6
  -(2, 5): 5/7
  -(3, 5): 1/4
  -(4, 5): 4/6
  -(5, 5): 5/7
 -(4, 2): 29/53
  -(5, 1): 7/13
  -(5, 2): 5/12
  -(5, 3): 3/9
  -(5, 4): 5/12
  -(5, 5): 1/7


This node was already sampled in the previous MCTS run. Use it as the prior node.

Start sampling for player Q

In [38]:
sampled_node = runner_q.start_mcts(
  tree_depth   = tree_depth_q, 
  allowed_time = allowed_time_q,
  prior_node   = prior_node
)

Repeat the same sequence for Q ...

In [39]:
child_nodes = sampled_node.children
for _node in child_nodes:
    key = _node.str_encode()
    prior_nodes_repo.update({key:  _node})
# end for
sel_node = runner_q.choose_most_visited(child_nodes)
game_record.append(sel_node.action, sel_node.state)

#### See if end-game has been reached

In [40]:
winner = OthelloGame.get_winner(game_record)
winner

Not yet


Pass the state back to player P, and repeat until the end of the game is reached.

In [41]:
runner_p.set_state(sel_node.state)

In [42]:
## ...
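
The alternation between the two players can be wrapped in a loop. The sketch below shows only the control flow and makes several assumptions: `StubRunner` and `StubNode` are toy stand-ins that mimic the method names used in this notebook (`set_state`, `start_mcts`, `choose_node`), the game is a toy countdown rather than Othello, and `get_winner` here takes a state and a player name, unlike `OthelloGame.get_winner`, which takes the game record.

```python
from itertools import cycle


class StubNode:
    """Toy node exposing the attributes used in this notebook."""

    def __init__(self, action, state):
        self.action = action
        self.state = state
        self.children = []


class StubRunner:
    """Stand-in for a runner; it plays a countdown game, not Othello."""

    def set_state(self, state):
        self.state = state

    def start_mcts(self, tree_depth, allowed_time, prior_node=None):
        root = StubNode(None, self.state)
        # Legal moves subtract 1 or 2 from the counter without going below zero.
        root.children = [StubNode(k, self.state - k)
                         for k in (1, 2) if self.state - k >= 0]
        return root

    def choose_node(self, nodes):
        return nodes[-1]  # toy policy: take the largest legal step


def get_winner(state, mover):
    """Toy rule: whoever brings the counter to zero wins; None otherwise."""
    return mover if state == 0 else None


def play(runners, init_state, max_turns=100):
    """Alternate between the runners until a winner appears or turns run out."""
    state = init_state
    history = [(None, init_state)]
    for _, (name, runner) in zip(range(max_turns), cycle(runners.items())):
        runner.set_state(state)
        sampled = runner.start_mcts(tree_depth=3, allowed_time=10, prior_node=None)
        node = runner.choose_node(sampled.children)
        state = node.state
        history.append((node.action, state))
        winner = get_winner(state, name)
        if winner is not None:
            return winner, history
    return None, history


winner, history = play({'P': StubRunner(), 'Q': StubRunner()}, init_state=5)
```

In the real notebook each turn would also update the prior-node repo and look up the prior node for the next player, as shown in the cells above.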