Alpha Zero General (any game, any framework!)

A simplified, highly flexible, commented and (hopefully) easy-to-understand implementation of self-play based reinforcement learning, based on the AlphaGo Zero paper (Silver et al.). It is designed to be easy to adapt to any two-player, turn-based, adversarial game and any deep learning framework of your choice. A sample implementation has been provided for the game of Othello in PyTorch, Keras, TensorFlow and Chainer. An accompanying tutorial can be found here. We also have implementations for GoBang and TicTacToe.

To use a game of your choice, subclass the classes in Game.py and NeuralNet.py and implement their functions. Example implementations for Othello can be found in othello/OthelloGame.py and othello/{pytorch,keras,tensorflow,chainer}/NNet.py.
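As a minimal sketch of what such a subclass looks like, here is a toy TicTacToe game following the method names used by Game.py in this repo. It is shown as a standalone class rather than an actual Game subclass, and some conventions (e.g. returning a small nonzero value for a draw) are assumptions based on the Othello example:

```python
import numpy as np

# Toy TicTacToe illustrating the Game.py interface. Board: 3x3 numpy array,
# 1 = player one, -1 = player two, 0 = empty. Action = flat cell index 0..8.
class TicTacToeGame:
    n = 3

    def getInitBoard(self):
        return np.zeros((self.n, self.n), dtype=int)

    def getBoardSize(self):
        return (self.n, self.n)

    def getActionSize(self):
        return self.n * self.n  # one action per cell (no pass move here)

    def getNextState(self, board, player, action):
        b = board.copy()  # never mutate the caller's board
        b[action // self.n, action % self.n] = player
        return b, -player  # play the move, then hand the turn over

    def getValidMoves(self, board, player):
        return (board.reshape(-1) == 0).astype(int)  # 1 where cell is empty

    def getGameEnded(self, board, player):
        # 0 if unfinished, +1/-1 for a win/loss from `player`'s perspective,
        # and a small nonzero value for a draw.
        lines = (*board, *board.T, board.diagonal(), np.fliplr(board).diagonal())
        for line in lines:
            s = line.sum()
            if abs(s) == self.n:
                return 1 if np.sign(s) == player else -1
        return 1e-4 if (board != 0).all() else 0

    def getCanonicalForm(self, board, player):
        return board * player  # board as seen by the current player

    def stringRepresentation(self, board):
        return board.tobytes()  # hashable key for the MCTS dictionaries
```

A real implementation would subclass Game and also provide getSymmetries for data augmentation, as othello/OthelloGame.py does.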

Coach.py contains the core training loop and MCTS.py performs the Monte Carlo Tree Search. The parameters for the self-play can be specified in main.py. Additional neural network parameters are in othello/{pytorch,keras,tensorflow,chainer}/NNet.py (cuda flag, batch size, epochs, learning rate etc.).
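At the heart of the search in MCTS.py is the PUCT selection rule from the AlphaGo Zero paper. The following single-step sketch uses our own variable names, not the repo's, to show how the cpuct parameter trades off exploitation against exploration:

```python
import math

# PUCT score for one action a at state s:
#   U(s, a) = Q(s, a) + cpuct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a))
# q: mean action value, p: prior from the neural net,
# n_sa: visit count of (s, a), n_s_total: total visits of s.
def puct_score(q, p, n_sa, n_s_total, cpuct=1.0):
    return q + cpuct * p * math.sqrt(n_s_total) / (1 + n_sa)
```

During each simulation the search descends into the child maximizing this score; unvisited actions (n_sa == 0) receive a large exploration bonus proportional to their prior p.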

To start training a model for Othello:

python main.py

Choose your framework and game in main.py.
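The self-play parameters in main.py are a plain dictionary of hyperparameters. The snippet below is illustrative; the key names mirror the kind used in this repo but should be checked against your copy of main.py:

```python
# Illustrative self-play hyperparameters of the kind configured in main.py.
args = {
    'numIters': 1000,        # training iterations (self-play + train + arena)
    'numEps': 100,           # self-play games per iteration
    'numMCTSSims': 25,       # MCTS simulations per move
    'cpuct': 1.0,            # exploration constant in the PUCT formula
    'tempThreshold': 15,     # move number after which play turns greedy
    'updateThreshold': 0.6,  # arena win rate needed to accept the new net
}
```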

Docker Installation

For easy environment setup, we can use nvidia-docker. Once you have nvidia-docker set up, we can then simply run:


to set up a (default: PyTorch) Jupyter Docker container. We can now open a new terminal and enter:

docker exec -ti pytorch_notebook python main.py


We trained a PyTorch model for 6x6 Othello (~80 iterations, 100 episodes per iteration and 25 MCTS simulations per turn). This took about 3 days on an NVIDIA Tesla K80. The pretrained model (PyTorch) can be found in pretrained_models/othello/pytorch/. You can play a game against it using pit.py. Below is the performance of the model against a random and a greedy baseline as a function of the number of training iterations.

A concise description of our algorithm can be found here.


While the current code is fairly functional, we could benefit from the following contributions:

  • Game logic files for more games that follow the specifications in Game.py, along with their neural networks
  • Neural networks in other frameworks
  • Pre-trained models for different game configurations
  • An asynchronous version of the code: parallel processes for self-play, neural net training and model comparison
  • Asynchronous MCTS as described in the paper

Contributors and Credits

Thanks to pytorch-classification and progress.