# Introducing AlphaGo Zero
AlphaGo Zero is a computer program developed by DeepMind to play the board game Go. It is the successor to AlphaGo, the first computer program to defeat a professional human Go player and the first to defeat a Go world champion. AlphaGo Zero became a [stronger player](https://en.wikipedia.org/wiki/Go_ranks_and_ratings#Comparison_of_rank_and_skill_level) than any human, and arguably the strongest Go player in history, after just a few days of self-training. The cost of the hardware and computing power used to train AlphaGo Zero was estimated at over US$25 million.


This introduction is based on DeepMind's Nature paper ["Mastering the game of Go without human knowledge"](https://www.nature.com/articles/nature24270.epdf?author_access_token=VJXbVjaSHxFoctQQ4p2k4tRgN0jAjWel9jnR3ZoTv0PVW4gB86EEpGqTRDtpIz-2rmo8-KG06gqVobU5NSCFeHILHcVFUeMsbvwS-lxjqQGg98faovwjxeTUgZAUMnRQ).

## AlphaGo Zero's differences from AlphaGo and AlphaGo Lee
AlphaGo Zero differs from the earlier AlphaGo versions (AlphaGo Fan and AlphaGo Lee) in several aspects. First and foremost, it is trained solely by self-play reinforcement learning, starting from random play, without any supervision or use of human data. Its neural network is trained from scratch, starting only from the rules of Go. Note that AlphaGo Zero plays only Go; the extension to shogi and chess came later with AlphaZero.

AlphaGo Zero also simplifies the board representation by using only the black and white stones on the board as input features. It uses a single neural network, rather than separate policy and value networks. The network takes a representation of the board position as input and outputs a vector of move probabilities and a scalar evaluation of the current position. It is trained to predict both the move probabilities produced by the search and the eventual winner of the game, while the move actually played is selected by a Monte Carlo Tree Search (MCTS) algorithm. The search no longer performs Monte Carlo rollouts; instead, the neural network itself evaluates the positions reached during look-ahead.
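The rollout-free search described above can be sketched as follows. This is a minimal illustration, not DeepMind's implementation: the `Node` class, the `evaluate` callable (standing in for the neural network), and the default `c_puct` value are assumptions made for the example. Child selection follows the Q + U rule from the paper, where U grows with the network's prior probability and shrinks as an edge is visited more often.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                      # P(s, a): move probability from the policy output
    visit_count: int = 0              # N(s, a)
    value_sum: float = 0.0            # W(s, a): sum of backed-up values
    children: dict = field(default_factory=dict)

    def q(self) -> float:
        """Mean action value Q(s, a); 0 for an unvisited edge."""
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def select_child(node: Node, c_puct: float = 1.0):
    """Return the (move, child) pair that maximizes Q + U."""
    total_visits = sum(c.visit_count for c in node.children.values())

    def score(item):
        _, child = item
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        return child.q() + u

    return max(node.children.items(), key=score)


def expand(node: Node, evaluate, position) -> float:
    """Expand a leaf with a single network call instead of a rollout.

    `evaluate` is a hypothetical stand-in for the network: it returns
    a dict of move probabilities and a scalar value for `position`.
    """
    move_probs, value = evaluate(position)
    for move, p in move_probs.items():
        node.children[move] = Node(prior=p)
    return value  # this value is what gets backed up along the search path
```

In a full search, the value returned by `expand` would be propagated back up the visited path, updating `visit_count` and `value_sum` on each edge.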

## AlphaGo Zero's training
Training is done exclusively by self-play reinforcement learning, starting from random play, without any supervision or use of human data. From each position $s$, an MCTS search is conducted, guided by the move probabilities generated by the neural network. The search usually finds much stronger moves than the raw move probabilities of the neural network alone. The reinforcement learning used is a form of policy iteration, in which MCTS is used for both policy evaluation and policy improvement.
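To make the training targets concrete, here is a small sketch of two pieces described in the paper: turning MCTS visit counts into search probabilities $\pi$, and the loss that pulls the network outputs $(p, v)$ toward $(\pi, z)$, where $z$ is the game outcome. The function names and the `eps` guard are assumptions for illustration, and the L2 weight regularization term of the paper's loss is omitted here.

```python
import math


def search_probabilities(visit_counts, temperature=1.0):
    """Search probabilities: pi(a) proportional to N(s, a)^(1 / temperature)."""
    powered = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(powered)
    return [x / total for x in powered]


def loss(p, v, pi, z, eps=1e-12):
    """Squared value error plus policy cross-entropy: (z - v)^2 - pi . log p.

    The regularization term on the network weights is left out of this sketch.
    `eps` only guards against log(0) and is not part of the paper's formula.
    """
    value_loss = (z - v) ** 2
    policy_loss = -sum(t * math.log(q + eps) for t, q in zip(pi, p))
    return value_loss + policy_loss
```

A lower temperature sharpens $\pi$ toward the most-visited move, and the loss is smallest when the network's move probabilities match the search probabilities and its value matches the game outcome.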

### Policy iteration
Policy iteration is a general reinforcement learning method that iteratively improves a policy $\pi$ until it converges to an optimal policy $\pi^*$. Each iteration consists of two steps: policy evaluation and policy improvement. In the policy evaluation step, the value function $v_\pi$ of the current policy $\pi$ is computed. In the policy improvement step, the policy is improved by acting greedily with respect to $v_\pi$. For a finite Markov decision process with exact policy evaluation, this procedure is guaranteed to converge to the optimal policy $\pi^*$. AlphaGo Zero uses an approximate variant, for which no such guarantee holds.
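The two steps can be illustrated with classic tabular policy iteration on a toy problem. This is the textbook algorithm, not AlphaGo Zero's approximate, search-based variant; the two-state deterministic MDP, the discount factor, and the tolerance below are made up for the example.

```python
def policy_iteration(transitions, gamma=0.9, tol=1e-10):
    """Tabular policy iteration for a small deterministic MDP.

    transitions[s][a] is a (next_state, reward) pair.
    Returns the final policy (one action index per state) and its values.
    """
    n_states = len(transitions)
    policy = [0] * n_states
    values = [0.0] * n_states

    def action_value(s, a):
        next_state, reward = transitions[s][a]
        return reward + gamma * values[next_state]

    while True:
        # Policy evaluation: sweep until the values of the current policy settle.
        while True:
            delta = 0.0
            for s in range(n_states):
                new_value = action_value(s, policy[s])
                delta = max(delta, abs(new_value - values[s]))
                values[s] = new_value
            if delta < tol:
                break

        # Policy improvement: act greedily with respect to the evaluated values.
        stable = True
        for s in range(n_states):
            best = max(range(len(transitions[s])), key=lambda a: action_value(s, a))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return policy, values


# Toy MDP: staying in state 1 pays 2 per step, moving from 0 to 1 pays 1.
toy_mdp = [
    [(0, 0.0), (1, 1.0)],  # state 0: stay (reward 0) or move to state 1 (reward 1)
    [(0, 0.0), (1, 2.0)],  # state 1: move to state 0 (reward 0) or stay (reward 2)
]
optimal_policy, optimal_values = policy_iteration(toy_mdp)
```

On this toy problem the loop stops once a greedy improvement step leaves the policy unchanged, which is the convergence condition for the tabular case.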

## AlphaGo Zero's performance
Over the course of training, 29 million games of self-play were generated, and from these games the program rediscovered much of the Go knowledge that human players have accumulated over thousands of years. AlphaGo Zero reached an Elo rating of 5185, higher than the 4858 rating of AlphaGo Master. In a 100-game match with 2-hour time controls, AlphaGo Zero beat AlphaGo Master by 89 games to 11.
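As a rough sanity check on these numbers, the standard Elo formula converts a rating gap into an expected score. The sketch below assumes the usual logistic form with a 400-point scale; the function name is chosen for illustration.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings quoted above: AlphaGo Zero 5185, AlphaGo Master 4858.
expected = elo_expected_score(5185, 4858)
```

The 327-point gap corresponds to an expected score of roughly 0.87, which is in line with the 89 to 11 match result.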

## AlphaZero's differences from AlphaGo Zero
AlphaZero generalizes AlphaGo Zero. First, AlphaZero learns to master not just one game, but three: Go, shogi, and chess. Second, like AlphaGo Zero, it does so without any human knowledge or data beyond the rules of each game. After only 34 hours of training, AlphaZero defeated the version of AlphaGo Zero trained for three days by 60 games to 40 in a 100-game match. AlphaZero also played a 100-game match against the world champion chess program Stockfish 8, winning 28 games, losing none, and drawing 72. In these matches each program was given one minute of thinking time per move.

## Leela Zero
[Leela Zero](https://zero.sjeng.org/) is an open-source, community-based project attempting to replicate the approach of AlphaGo Zero. It was started in late 2017 by Belgian programmer Gian-Carlo Pascutto, the author of the Go engine Leela. The project relies on computing time contributed by the community, and its code is available under the GPLv3 license. Training is distributed: volunteers' machines play self-play games with the current network, and the resulting games are used to train new networks.
