### Evaluation of the implemented players

This document explains how the different players have been evaluated, and presents the results of that evaluation.

#### Scoring mechanism

The players are scored using a variant of the Elo rating system. Each player $i$ is assigned a
score $\beta_i$ such that the probability that player $i$ wins against player $j$ can be estimated
as $w_{ij} = \sigma \left(s (b + \beta_i - \beta_j) \right)$, where $\sigma$ is the logistic sigmoid
function. The scale $s$ can be chosen arbitrarily; for these evaluations it was set to
$s = \frac{\ln(3)}{200}$, so that (ignoring the bias $b$) a 200 point difference results in a 75%
win probability for the stronger player. The bias parameter $b$ accounts for the fact that the
player who moves first generally has a higher chance of winning. Because only score differences
enter the model, the scores are determined only up to a constant offset; to fix it, the random
player is assigned a score of $0$.
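
As a quick check of the formula above, the following sketch evaluates the win probability for a given pair of scores. The function name `win_probability` and the convention that player $i$ is the one moving first are assumptions made for illustration; they are not taken from `score.py`.

```python
import math

# Scale from the text: a 200 point gap corresponds to a 75% win probability.
S = math.log(3) / 200


def sigmoid(x: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def win_probability(beta_i: float, beta_j: float, bias: float = 0.0, s: float = S) -> float:
    """Estimated probability that player i beats player j.

    Assumption: `bias` is added in favour of player i, i.e. i moves first.
    """
    return sigmoid(s * (bias + beta_i - beta_j))


# With no bias, a 200 point lead gives the stronger player a 75% chance.
p = win_probability(200.0, 0.0)
print(round(p, 4))  # 0.75
```

With `bias=0.0` the estimate is symmetric, so `win_probability(a, b) + win_probability(b, a)` equals 1; a positive bias shifts both orderings in favour of whoever moves first.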

The scoring was carried out using the `score.py` script. It reads the game results generated by the
`evaluate.py` script and fits the parameters $\beta$ and $b$ by logistic regression: PyTorch's Adam
optimizer is used to minimize the binary cross entropy loss, which is equivalent to maximizing the
likelihood of the observed results.
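
The fitting step can be illustrated with a small, dependency-free sketch. It is not the code from `score.py`: it replaces PyTorch and the Adam optimizer with plain gradient descent written in pure Python, and the function name `fit_scores`, the layout of the game tuples, and the toy data are all hypothetical. It minimizes the same binary cross entropy loss and keeps one player fixed at a score of 0.

```python
import math

S = math.log(3) / 200  # scale from the text: 200 points <-> 75% win probability


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_scores(games, n_players, fixed=0, steps=5000, lr=0.5):
    """Fit scores and first-player bias by gradient descent on the mean BCE.

    games: list of (first, second, first_won) tuples, where `first` and
    `second` are player indices and `first_won` is 1 or 0.
    Returns (betas, bias) in rating points; betas[fixed] stays at 0.
    """
    theta = [0.0] * n_players  # scores in logit units (s * beta)
    c = 0.0                    # bias in logit units (s * b)
    for _ in range(steps):
        grad_theta = [0.0] * n_players
        grad_c = 0.0
        for first, second, first_won in games:
            p = sigmoid(c + theta[first] - theta[second])
            err = (p - first_won) / len(games)  # d(mean BCE) / d(logit)
            grad_theta[first] += err
            grad_theta[second] -= err
            grad_c += err
        grad_theta[fixed] = 0.0  # keep the anchor player at score 0
        theta = [t - lr * g for t, g in zip(theta, grad_theta)]
        c -= lr * grad_c
    # Convert from logit units back to rating points.
    return [t / S for t in theta], c / S


# Hypothetical toy data: player 1 wins 9 of 10 games when moving first,
# and 5 of 10 games when player 0 moves first.
games = (
    [(1, 0, 1)] * 9 + [(1, 0, 0)] * 1
    + [(0, 1, 1)] * 5 + [(0, 1, 0)] * 5
)
betas, bias = fit_scores(games, n_players=2, fixed=0)
print(round(betas[1]), round(bias))  # 200 200
```

The toy data has an exact solution: the two observed win rates (90% and 50%) give the equations $s(b + \beta_1) = \ln 9$ and $s(b - \beta_1) = 0$, so $\beta_1 = b = 200$. The sketch optimizes in logit units and divides by $s$ only at the end, which keeps the step size independent of the chosen rating scale.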

To better ground the scores of the players, the intermediate training results of the AlphaZero
algorithm are also included in the evaluation. To obtain more stable results, they are grouped into
bins of 10k training steps.

#### Results

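The scores were computed with the command below. In its output, each entry gives the win rate of the
row player against the column player: the `target` table holds the rates observed in the recorded
games (`nan` where a pairing was not played), and the `pred` table holds the rates predicted by the
fitted model.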

In [1]:
!python ../score.py --log data/games.log --no-progress --out data/scores.json \
    --fuse 0 1000 2000 5000 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 110000 120000 130000 140000 150000 160000 170000 180000 190000 200000  \
    --fix random=0 \
    -- azero200000 policynn200000 valuenn200000 mcts minimax minimax2 random human


target    azero200 policynn valuenn2     mcts minimax2  minimax    human   random
 azero200      nan  0.50000  0.75000  1.00000  1.00000  1.00000      nan  1.00000
 policynn  0.50000      nan  0.31944  1.00000  0.85714  1.00000  1.00000  1.00000
 valuenn2  0.25000  0.68056      nan  1.00000  1.00000  0.85714  1.00000  1.00000
     mcts  0.00000  0.00000  0.00000      nan  0.87500  1.00000      nan  1.00000
 minimax2  0.00000  0.14286  0.00000  0.12500      nan  0.70000      nan  1.00000
  minimax  0.00000  0.00000  0.14286  0.00000  0.30000      nan      nan  1.00000
    human      nan  0.00000  0.00000      nan      nan      nan      nan  1.00000
   random  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000      nan

pred      azero200 policynn valuenn2     mcts minimax2  minimax    human   random
 azero200  0.50000  0.58255  0.73689  0.95674  0.96770  0.98105  0.99595  0.99994
 policynn  0.41745  0.50000  0.66743  0.94065  0.95550  0.97376  0.99435  0.99992
 valuenn2  0.2