Game-playing AIs make use of an abstraction called a game tree. Each node in the tree represents a possible game state, and the root of the game tree represents the current state of the game as presented to the AI. Each of the available choices from the root leads to a new child node, and each of these nodes can then be further expanded, and so on.
For most games, the game tree is far too large to explore fully, so the objective of the AI is to explore a subset of the game tree to determine the best move to make.
Monte-Carlo Tree Search (MCTS) is a relatively new game-playing algorithm, introduced in 2006. Compared to the traditional Minimax algorithm, MCTS has the advantages that it does not need any domain knowledge, works well on games with large branching factors, and can be made to run in a fixed amount of time. However, MCTS can miss intricate lines of play that would lead to a win.
Each iteration of MCTS consists of the following four steps:
- selection: move down the game tree until we reach a node with an unexplored child
- simulation: "explore" the tree by playing a random game (both players make moves at random until someone loses)
- expansion: add one node to the game tree
- backpropagation: update the nodes visited based on the result of the simulation
We execute as many iterations as time permits and then select the best move using a final move selection strategy.
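The overall loop can be sketched as follows. This is only one possible shape: the five helpers (select, simulate, expand, backpropagate, best_child) are hypothetical placeholders for the strategies described in the sections below, passed in as functions so the skeleton stands on its own.

```python
import time

def mcts(root, select, simulate, expand, backpropagate, best_child,
         time_budget=0.1):
    """Run MCTS iterations until the time budget (in seconds) is spent,
    then pick the final move. Helper interface is assumed, not fixed."""
    deadline = time.monotonic() + time_budget
    while time.monotonic() < deadline:
        path, leaf = select(root)        # selection
        result = simulate(leaf)          # simulation
        expand(leaf)                     # expansion
        backpropagate(path, result)      # backpropagation
    return best_child(root)              # final move selection
```

Using a deadline rather than a fixed iteration count is what lets MCTS run in a fixed amount of time.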
In the selection step, we decide on the part of the tree to be explored. Starting from the root of the tree, we select one of its children and iteratively move down the tree until we reach a node that has an unexplored child. From a given node P, we follow the UCT strategy of selecting the child C of P that maximizes C.v + E x sqrt(ln(P.n) / C.n), where C.v is the value at node C, C.n is the visit count of node C, P.n is the visit count of node P, and E is a constant that is used to vary the trade-off between exploration (exploring more widely) and exploitation (exploring in greater depth the "good" parts of the tree).
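The UCT formula translates directly into a small helper. The function name and the default value of E are illustrative, not prescribed by the text:

```python
import math

def uct(child_value, child_visits, parent_visits, exploration=1.4):
    """UCT score for a child: its value C.v plus an exploration bonus
    E * sqrt(ln(P.n) / C.n) that shrinks as the child is visited more."""
    if child_visits == 0:
        return float("inf")  # always try unvisited children first
    return child_value + exploration * math.sqrt(
        math.log(parent_visits) / child_visits)
```

Raising the exploration constant E spreads visits more widely across children; lowering it concentrates them on the children with the best values so far.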
In the simulation step, we make use of a simulation strategy to play the game until completion, returning 1 if the AI player wins and -1 otherwise. Since the game engine already eliminates choices that are not useful, our simulation strategy simply selects one of the possible choices randomly with equal probability. We can also make use of some domain knowledge in the simulation step to select promising moves more frequently.
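A uniformly random playout might look like the sketch below. The state interface (legal_moves, play, is_terminal, winner) is an assumption, and the Nim class is only a toy stand-in for the real game engine:

```python
import random

def simulate(state, ai_player):
    """Random playout: both players pick uniformly among the legal moves
    until the game ends. Returns 1 if ai_player won, -1 otherwise."""
    while not state.is_terminal():
        state = state.play(random.choice(state.legal_moves()))
    return 1 if state.winner() == ai_player else -1

class Nim:
    """Toy game for illustration: players alternately take 1 or 2
    stones; whoever takes the last stone wins."""
    def __init__(self, stones, to_move=0):
        self.stones, self.to_move = stones, to_move
    def legal_moves(self):
        return [m for m in (1, 2) if m <= self.stones]
    def play(self, move):
        return Nim(self.stones - move, 1 - self.to_move)
    def is_terminal(self):
        return self.stones == 0
    def winner(self):
        return 1 - self.to_move  # the player who just moved
```

Biasing random.choice toward promising moves is where the optional domain knowledge would enter.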
After the simulation, the previously unexplored child now has a visit count of 1 and it is added to the game tree. In each iteration, we expand the game tree by a single node.
From the result of the simulation, we increase the visit count of every node visited during the selection step, and we update each node's value to the average of the results of all simulated games played through that node.
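The running average can be maintained incrementally, so no per-node history is needed. A minimal sketch, assuming each node stores a value v and a visit count n:

```python
class Node:
    """Minimal node: v is the average simulation result, n the visit count."""
    def __init__(self):
        self.v, self.n = 0.0, 0

def backpropagate(path, result):
    """Bump the visit count of every node on the selection path and fold
    the new simulation result into the running average."""
    for node in path:
        node.n += 1
        node.v += (result - node.v) / node.n  # incremental mean update
```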
For the final move selection, we select the child of the root node that has the highest visit count. Other methods include the Secure Child strategy used in MC-LOA, which selects the child C that maximizes C.v + 1/sqrt(C.n).
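Both final-move rules in code, over a hypothetical record carrying each root child's value and visit count:

```python
import math
from collections import namedtuple

Child = namedtuple("Child", "move v n")

def most_visited_child(children):
    """Pick the child of the root with the highest visit count."""
    return max(children, key=lambda c: c.n)

def secure_child(children):
    """Secure Child rule from MC-LOA: maximize C.v + 1/sqrt(C.n)."""
    return max(children, key=lambda c: c.v + 1 / math.sqrt(c.n))
```

The two rules can disagree: a high-value but lightly-visited child may win under Secure Child while losing on visit count.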
Root parallelization is a simple and easy way to utilize multiple cores. The idea is to create several instances of the MCTS algorithm, each of which builds up its own game tree independently. At the end, we combine the data gathered for the children of the root and select the best move.
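The final merge step might look like this. Each dict maps a root move to the visit count one independent MCTS instance accumulated for it (the instances themselves could run in separate processes, e.g. via multiprocessing); the dict-per-tree representation is an assumption for illustration:

```python
from collections import Counter

def merge_and_pick(per_tree_visits):
    """Root parallelization, final step: sum the per-move visit counts
    across the independent trees and return the most-visited move."""
    totals = Counter()
    for visits in per_tree_visits:  # one {move: visit_count} per tree
        totals.update(visits)
    return totals.most_common(1)[0][0]
```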
Instead of always playing till the end of the game in a simulation, we can make use of an evaluation function (such as the one developed in the Minimax implementation) to terminate early when the result of the evaluation function rises above or falls below a certain threshold. Alternatively, we can simulate random moves for a fixed number of steps (e.g., 3 turns) and then apply the evaluation function. This helps to reduce the time taken for a single simulation and increases the total number of simulations we can execute.
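A truncated-playout sketch combining both ideas. Here evaluate() stands in for a hypothetical evaluation function returning a score in [-1, 1] from the AI's point of view, and the state interface is the same assumed one as in the plain simulation:

```python
import random

def simulate_with_cutoff(state, ai_player, evaluate,
                         max_steps=6, threshold=0.9):
    """Play random moves for at most max_steps plies. Stop early if the
    game ends or the evaluation is already decisive; otherwise fall back
    to the sign of the evaluation at the cutoff."""
    for _ in range(max_steps):
        if state.is_terminal():
            return 1 if state.winner() == ai_player else -1
        score = evaluate(state)
        if abs(score) >= threshold:  # decisive: stop the playout early
            return 1 if score > 0 else -1
        state = state.play(random.choice(state.legal_moves()))
    return 1 if evaluate(state) > 0 else -1  # cutoff reached
```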