This repository was archived by the owner on Dec 11, 2020. It is now read-only.

Answer some questions about batchsize. #25

@yuandong-tian

Description

Recently we have seen a lot of questions about ELF OpenGo in many forums. I will try to answer some of them here.

A Chinese version is available here.

First, we sincerely thank the LeelaZero team for converting our pre-trained v0 model to a LeelaZero-compatible format, so that the Go community can immediately verify its strength by playing against it interactively via LeelaZero. This shows that our experiments are reproducible and can truly benefit the community, and we are very happy about that.

One issue the LeelaZero team found is that OpenGo-v0 might not perform that well when the number of rollouts is small (e.g., 800 or 1600). This is because we use batching in MCTS: the network is only evaluated once a batch of rollouts (e.g., 8 or 16) has accumulated. This substantially improves GPU efficiency (on an M40, roughly 5.5s -> 1.2s per 1600 rollouts), at the price of weakening the bot, in particular when the number of rollouts is small. MCTS is intrinsically a sequential algorithm: to maximize its strength, each rollout should be played only after all previous rollouts have finished and the Q values in each node have been updated. Batching, on the other hand, introduces parallel evaluation and reduces the effective number of total rollouts.
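To make the trade-off concrete, here is a minimal sketch of batched leaf evaluation in MCTS (not the actual ELF code; `select_leaf`, `backup`, and `net.forward_batch` are hypothetical stand-ins for the real tree operations and network call):

```python
# Minimal sketch of batched MCTS rollouts, assuming hypothetical helpers
# select_leaf()/backup() and a network object exposing forward_batch().

def run_rollouts(root, net, total_rollouts=1600, rollouts_per_batch=8):
    done = 0
    while done < total_rollouts:
        # 1) Select several leaves before evaluating any of them.
        #    All of these selections see the same (stale) Q values, which is
        #    what weakens play relative to fully sequential MCTS.
        leaves = [select_leaf(root) for _ in range(rollouts_per_batch)]

        # 2) Evaluate them in a single forward pass: this is where the GPU
        #    speedup comes from.
        values, priors = net.forward_batch([leaf.state for leaf in leaves])

        # 3) Only now are the statistics backed up, so only the *next* batch
        #    of selections benefits from these rollouts.
        for leaf, value, prior in zip(leaves, values, priors):
            backup(leaf, value, prior)

        done += rollouts_per_batch
```

With rollouts_per_batch=1 this degenerates to standard sequential MCTS; with larger batches, more selections are made against stale statistics.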

The solution is simple: reduce the batchsize when the number of rollouts is small. We suggest batchsize=4 when the total number of rollouts is 800 or 1600, although this makes the thinking time longer. The default setting batchsize=16 is only good when the total number of rollouts is large (e.g., 80k); note that an even larger batchsize might not help. The batchsize can be changed via the switches --mcts_rollout_per_batch and --batchsize. For now, please specify the same number for both switches (this is research code, so bear with us).
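For example, with 800 or 1600 total rollouts you would add the following to your usual command line (both switches set to the same value):

```
--mcts_rollout_per_batch 4 --batchsize 4
```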


Some people might wonder what happens for self-play in our setting. Indeed, there seems to be a dilemma if we only use 1.6k rollouts for self-play: a small batchsize leads to GPU inefficiency, while a large batchsize weakens the moves. We solve it with an ELF-specific design. For each self-play process we spawn 32 concurrent games with a maximal batchsize of 128. Each concurrent game runs its own MCTS without any batching. When a rollout reaches a leaf, the game sends the current situation to ELF, which dynamically batches situations from multiple games together and hands the batch to PyTorch for network forwarding. This makes the batchsize variable: during self-play the average batch size is around 90, which gives good overall GPU utilization.
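The idea can be illustrated with a minimal Python sketch (the real ELF implementation is in C++; the `forward_batch` network call and the queue-based setup here are assumptions for illustration only):

```python
# Minimal sketch of ELF-style dynamic batching (illustrative; not the actual ELF code).
# Each game thread runs its own un-batched MCTS and calls evaluate() whenever a
# rollout reaches a leaf. A single batcher loop collects whatever requests are
# currently pending (up to max_batchsize) and runs one forward pass for all of them.

import queue

class DynamicBatcher:
    def __init__(self, net, max_batchsize=128):
        self.net = net                      # assumed to expose forward_batch(list_of_states)
        self.max_batchsize = max_batchsize
        self.requests = queue.Queue()

    def evaluate(self, state):
        """Called from a game thread: submit one leaf position and wait for its result."""
        reply = queue.Queue(maxsize=1)
        self.requests.put((state, reply))
        return reply.get()                  # blocks until the batcher answers

    def serve_forever(self):
        """Batcher loop: group together whatever is pending right now."""
        while True:
            batch = [self.requests.get()]   # wait for at least one request
            while len(batch) < self.max_batchsize:
                try:
                    batch.append(self.requests.get_nowait())
                except queue.Empty:
                    break                   # batch size therefore varies with load
            outputs = self.net.forward_batch([state for state, _ in batch])
            for (_, reply), out in zip(batch, outputs):
                reply.put(out)
```

With 32 concurrent games each blocking on `evaluate()` at their leaves, the batch size floats below the 128 cap and averages around 90 during self-play, as described above.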
