# Content
* [Model Architecture](#Model-Architecture)
* [Optimizations](#Optimizations)
* [DEMO](#DEMO)

## MiniGo

MiniGo is an open-source, minimalist Go engine modeled after AlphaGo Zero, a system that learns to play Go at a superhuman level given only the rules of the game.

A brief history of DeepMind's AlphaGo, AlphaGo Zero, and AlphaZero:
* AlphaGo (trained with a dataset of human games) --> AlphaGo Zero (no training dataset, self-play only) --> AlphaZero (one algorithm generalized to Go, chess, and shogi)

<img src="./img/minigo_intro.JPG" width="600"/><figure>AlphaGo Zero</figure>

Reference: https://www.newyorker.com/science/elements/how-the-artificial-intelligence-program-alphazero-mastered-its-games

## Model Architecture
### Overall train loop architecture of MiniGo
<img src="./img/minigo_model_arch.png" width="800"/><figure>MiniGo Model Architecture</figure>

### Deep network architecture based on residual blocks
<img src="./img/minigo_nn_arch.png" width="800"/><figure>Neural Network Architecture</figure>

### MCTS architecture (Select->Expand->Evaluate->Backup/Backpropagation)
<img src="./img/minigo_mcts_arch.png" width="800"/><figure>Monte Carlo Tree Search in MiniGo</figure>
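
The four phases named in the heading can be illustrated on a toy problem. The following is a minimal sketch, not MiniGo's implementation (which is written in C++, uses a neural network for evaluation, and applies virtual losses). The names `Node`, `toy_evaluate`, `C_PUCT`, and `MAX_DEPTH` are hypothetical, the "game" is a single-agent sequence of binary choices, and the evaluator is a stand-in with uniform priors.

``` python
import math
from dataclasses import dataclass, field

C_PUCT = 1.5      # exploration constant (illustrative value, an assumption)
MAX_DEPTH = 4     # length of a toy episode
ACTIONS = (0, 1)  # two possible moves at every step


@dataclass
class Node:
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def q(self):
        """Mean value of all readouts that passed through this node."""
        return self.value_sum / self.visits if self.visits else 0.0


def toy_evaluate(path):
    """Stand-in for the network: uniform priors, value = share of 1-moves."""
    priors = {a: 1.0 / len(ACTIONS) for a in ACTIONS}
    value = sum(path) / MAX_DEPTH
    return priors, value


def select_child(node):
    """Select: pick the child that maximizes Q + U (PUCT rule)."""
    sqrt_total = math.sqrt(max(1, node.visits))

    def score(item):
        _, child = item
        u = C_PUCT * child.prior * sqrt_total / (1 + child.visits)
        return child.q() + u

    return max(node.children.items(), key=score)


def run_readout(root):
    node, path, visited = root, [], [root]
    # Select: descend until a node without children is reached.
    while node.children:
        action, node = select_child(node)
        path.append(action)
        visited.append(node)
    # Evaluate the leaf, then expand it unless the episode is over.
    priors, value = toy_evaluate(path)
    if len(path) < MAX_DEPTH:
        node.children = {a: Node(prior=p) for a, p in priors.items()}
    # Backup: propagate the value to every node on the path.
    for n in visited:
        n.visits += 1
        n.value_sum += value


def search(num_readouts):
    """Run num_readouts readouts and return the visit count per root move."""
    root = Node(prior=1.0)
    for _ in range(num_readouts):
        run_readout(root)
    return {a: c.visits for a, c in root.children.items()}
```

Calling `search(200)` returns the visit counts of the two root moves; the move with the higher long-run value accumulates more visits. A two-player version would additionally flip the sign of the value at each level during backup.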

## Optimizations

### Distributed Monte Carlo Tree Search
* Use the rsync tool to synchronize the signal, model, and flag files across nodes
* Make full use of the physical cores of all nodes in HOSTLIST for distributed MCTS

``` bash
# Continuously mirror flags, signals, and models from the root node to every other node.
for NODE in $HOSTLIST; do
  if [ "$NODE" != "$ROOTNODE" ]; then
    # Redirect stdout first and stderr second, so that both streams are discarded.
    ml_perf/scripts/loop.sh rsync -r --append --delete-before /root/mlperf/results-19/flags "root@$NODE:/root/mlperf/results-19/" > /dev/null 2>&1 &
    ml_perf/scripts/loop.sh rsync -r --append --delete-before /root/mlperf/results-19/signal "root@$NODE:/root/mlperf/results-19/" > /dev/null 2>&1 &
    ml_perf/scripts/loop.sh rsync -r --append --delete-before /root/mlperf/results-19/models "root@$NODE:/root/mlperf/results-19/" > /dev/null 2>&1 &
  fi
done
```
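
The per-node fan-out in the script above can also be expressed as a small helper that only builds the command lines, which makes the node filtering easy to check before anything is launched. This is a sketch under assumptions: `build_sync_commands` and `render` are hypothetical names, the `loop.sh` wrapper and the output redirection are left out, and the rsync options are copied from the script.

``` python
import shlex

# Subdirectories mirrored from the root node, as in the script above.
SYNC_SUBDIRS = ("flags", "signal", "models")


def build_sync_commands(hostlist, rootnode,
                        base_dir="/root/mlperf/results-19", user="root"):
    """Return one rsync argv per (non-root node, subdirectory) pair."""
    commands = []
    for node in hostlist:
        if node == rootnode:
            continue  # the root node already holds the source files
        for sub in SYNC_SUBDIRS:
            commands.append([
                "rsync", "-r", "--append", "--delete-before",
                f"{base_dir}/{sub}",
                f"{user}@{node}:{base_dir}/",
            ])
    return commands


def render(commands):
    """Render argv lists as shell-quoted strings for logging or review."""
    return [shlex.join(argv) for argv in commands]


if __name__ == "__main__":
    for line in render(build_sync_commands(["sr141", "sr142"], "sr141")):
        print(line)
```

Building argv lists instead of shell strings avoids quoting problems if a host name or path ever contains unusual characters.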

### Enable early stop and evaluate target metric during train loop
``` python
def post_train(state):
    model_path = os.path.join(FLAGS.model_dir, state.train_model_name)
    selfplay_pb = model_path + '.pb'

    if FLAGS.hostlist is not None:
        golden_chunk_dir = FLAGS.golden_chunk_local_dir
    else:
        golden_chunk_dir = FLAGS.golden_chunk_dir

    dual_net.optimize_graph(model_path + '.pb',
        model_path,
        FLAGS.quantization,
        golden_chunk_dir + '/*.zz')

    dst_minigo_file = model_path + '.minigo'
    wait(checked_run([
         'python3', 'convert.py',
         '--flagfile={}'.format(os.path.join(FLAGS.flags_dir, 'architecture.flags')),
         '--input_graph={}'.format(selfplay_pb),
         '--dst={}'.format(dst_minigo_file)]))

    # Evaluate when explicitly requested, or once training has passed iteration 50.
    if FLAGS.eval or state.iter_num > 50:
        logging.info('started evaluation minigo model for iter %s of total %s, last iter winrate is %s', state.iter_num, FLAGS.iterations, state.win_rate)
        # Model files use six-digit, zero-padded iteration numbers (e.g. 000018.minigo).
        minigo_model_path = f"/root/mlperf/results-19/models/{state.iter_num:06d}.minigo"
        state.win_rate = evaluate_model(minigo_model_path)
        logging.info('finished evaluation minigo model for iter %s, winrate is %s', state.iter_num, state.win_rate)
        if state.win_rate >= FLAGS.winrate:
            logging.info('we have found minigo model better than target!!! iter %s winrate %s', state.iter_num, state.win_rate)
            # Persist the win rate that met the target.
            with open("../../result/metric.txt", "wt") as metric_file:
                metric_file.write(str(state.win_rate))

    # Resume selfplay by removing the pause file.
    os.remove(FLAGS.pause)


def evaluate_model(eval_model_path):
    processes = []
    for i, device in enumerate(FLAGS.devices):
        # Worker i plays the games in [a, b), so the shares sum to FLAGS.num_games.
        a = i * FLAGS.num_games // len(FLAGS.devices)
        b = (i + 1) * FLAGS.num_games // len(FLAGS.devices)
        num_games = b - a
        
        env = os.environ.copy()
        env['CUDA_VISIBLE_DEVICES'] = device
        processes.append(checked_run([
            'numactl',
            '--physcpubind={}'.format(i),
            'bazel-bin/cc/eval',
            '--flagfile={}'.format(os.path.join(FLAGS.flags_dir, 'eval.flags')),
            '--eval_model={}'.format(eval_model_path),
            '--target_model={}'.format(FLAGS.target),
            '--sgf_dir={}'.format(FLAGS.sgf_dir),
            '--parallel_games={}'.format(num_games),
            '--eval_device=cpu',
            '--target_device=cpu',
            '--verbose=false'], env, False))
    all_output = wait(processes)

    total_wins = 0
    total_num_games = 0
    for output in all_output:
        lines = output.split('\n')

        eval_stats, target_stats = parse_win_stats_table(lines[-7:])
        num_games = eval_stats.total_wins + target_stats.total_wins
        total_wins += eval_stats.total_wins
        total_num_games += num_games

    win_rate = total_wins / total_num_games
    logging.info('Win rate %s vs %s: %.3f', eval_stats.model_name,
                 target_stats.model_name, win_rate)
    
    return win_rate

def parse_win_stats_table(lines):
    result = []
    while True:
        # Find the start of the win stats table.
        assert len(lines) > 1
        if 'Black' in lines[0] and 'White' in lines[0] and 'passes' in lines[1]:
            break
        lines = lines[1:]

    # Parse the expected number of lines from the table.
    for line in lines[2:4]:
        result.append(WinStats(line))

    return result
```
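
Two pieces of arithmetic in `evaluate_model` are easy to check in isolation: the integer split of games across workers, and the aggregation of per-worker results into one win rate. The sketch below reproduces both under assumptions; `split_games` and `aggregate_win_rate` are hypothetical helper names, and the per-worker results are passed in as plain `(eval_wins, target_wins)` tuples rather than parsed from the eval binary's output.

``` python
def split_games(num_games, num_workers):
    """Split num_games across workers with the same integer arithmetic
    as evaluate_model: worker i plays the games in [a, b)."""
    shares = []
    for i in range(num_workers):
        a = i * num_games // num_workers
        b = (i + 1) * num_games // num_workers
        shares.append(b - a)
    return shares


def aggregate_win_rate(results):
    """Combine per-worker (eval_wins, target_wins) pairs into one win rate."""
    results = list(results)
    total_wins = sum(eval_wins for eval_wins, _ in results)
    total_games = sum(eval_wins + target_wins for eval_wins, target_wins in results)
    if total_games == 0:
        raise ValueError("no finished games to aggregate")
    return total_wins / total_games


if __name__ == "__main__":
    shares = split_games(100, 8)
    print(shares, sum(shares))
```

The floor-division split never loses or duplicates a game: the shares always sum to `num_games` and differ from each other by at most one.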

### Fine-tune MCTS hyperparameters to trade off exploration and exploitation
* Trade off MCTS exploration and exploitation. If `fastplay_frequency` > 0, tree search is modified as follows:
  - Each move is either a "low-readout" fast move or a full, slow move. The fraction of fast moves is set by `fastplay_frequency`.
  - A "fast" move will:
    - Reuse the tree.
    - Not mix in noise at the root.
    - Perform only `fastplay_readouts` readouts.
    - Not be used as a training target.
  - A "slow" move will:
    - Clear the tree (*not* the cache).
    - Mix in Dirichlet noise.
    - Perform `num_readouts` readouts.
    - Be noted in the Game object, to be written as a training example.
* Main application class of MiniGo MCTS Selfplayer:
``` cpp
  Selfplayer(
          /* Inference flags */
          std::string n_model,
          int32_t cache_size_mb,
          int32_t cache_shards,
          /* Tree search flags */
          int32_t num_readouts,
          double fastplay_frequency,
          int32_t fastplay_readouts,
          int32_t virtual_losses,
          double dirichlet_alpha,
          double noise_mix,
          double value_init_penalty,
          bool target_pruning,
          double policy_softmax_temp,
          bool allow_pass,
          int32_t restrict_pass_alive_play_threshold,
          /* Threading flags. */
          int32_t num_selfplay_threads,
          int32_t num_parallel_search,
          int32_t num_parallel_inference,
          int32_t num_concurrent_games_per_thread,
          /* Game flags. */
          uint64_t seed,    
          double min_resign_threshold,
          double max_resign_threshold,
          double disable_resign_pct,
          int32_t num_games,
          bool run_forever,
          std::string abort_file,
          /* Output flags. */
          double holdout_pct,
          std::string output_dir,
          std::string holdout_dir,
          std::string sgf_dir,
          bool verbose,
          int32_t num_output_threads,
          /* Sample Fraction for output. */
          double sample_frac);
```
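
The fast/slow rule described in this section can be summarized as a per-move decision. The following is a minimal sketch of that rule only, assuming the move type is drawn independently at random with probability `fastplay_frequency`; `MovePlan` and `plan_move` are hypothetical names and are not part of the Selfplayer class.

``` python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MovePlan:
    fast: bool             # True for a low-readout fast move
    readouts: int          # number of MCTS readouts to perform
    reuse_tree: bool       # keep the existing search tree
    inject_noise: bool     # mix Dirichlet noise in at the root
    training_target: bool  # record the move as a training example


def plan_move(rng, fastplay_frequency, num_readouts, fastplay_readouts):
    """Decide how the next move is searched, following the rule above."""
    fast = fastplay_frequency > 0 and rng.random() < fastplay_frequency
    if fast:
        return MovePlan(fast=True, readouts=fastplay_readouts,
                        reuse_tree=True, inject_noise=False,
                        training_target=False)
    return MovePlan(fast=False, readouts=num_readouts,
                    reuse_tree=False, inject_noise=True,
                    training_target=True)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    plans = [plan_move(rng, 0.82, 600, 60) for _ in range(10)]
    print([p.readouts for p in plans])
```

Only slow moves become training targets, so raising `fastplay_frequency` lowers the search cost per game but also lowers the number of training examples each game produces.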


## HPO with SDA (Smart Democratization Advisor)
SDA configuration:
```
model_parameter:
  project: sda
  experiment: minigo
  parameters:
    - name: train_batch_size
      grid:
        - 512
        - 1024
        - 2048
        - 4096
        - 8192
      type: int
  metrics:
  - name: winrate
    objective: maximize
  - name: training_time
    strategy: optimize
    objective: minimize
```

Request a suggestion from SDA:

```python
suggestion = self.conn.experiments(self.experiment.id).suggestions().create()
```
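
The configuration above declares two metrics, `winrate` (maximize) and `training_time` (minimize). One common way to compare trials under two such objectives is to keep only the trials that no other trial beats on both metrics (the Pareto front). The sketch below is a local illustration of that idea, not SDA's API or its internal selection logic; `pareto_front` is a hypothetical helper and each trial is assumed to be a plain dict.

``` python
def pareto_front(trials):
    """Return the trials that are not dominated by any other trial.

    Each trial is a dict with the keys 'train_batch_size', 'winrate',
    and 'training_time'. Trial a dominates trial b if a is at least as
    good on both metrics and strictly better on at least one.
    """
    def dominates(a, b):
        no_worse = (a["winrate"] >= b["winrate"]
                    and a["training_time"] <= b["training_time"])
        strictly_better = (a["winrate"] > b["winrate"]
                           or a["training_time"] < b["training_time"])
        return no_worse and strictly_better

    return [t for t in trials
            if not any(dominates(other, t) for other in trials)]
```

In use, one dict would be appended per completed trial (one per `train_batch_size` value in the grid), and `pareto_front` would be called once all trials have reported both metrics.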

## Framework related optimization
### Enable parallel MCTS (Monte Carlo Tree Search) across all physical cores in the cluster
Remark:
* Use rsync to synchronize the signal files, the trained model, and the MCTS-generated dataset
* Enable NUMA binding to fully utilize all physical cores in the cluster

<img src="./img/minigo_optimized_system_arch.JPG" width="800"/><figure>Optimized MiniGo System Architecture</figure>

### Enable early stop during the train loop to take advantage of fast convergence
Remark:
* The number of training iterations is reduced from 71 to 55
* Baseline
  * Runs a fixed number of iterations and cannot guarantee convergence
* Our optimized MiniGo
  * Guarantees convergence and stops early as soon as the target metric is met
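
The early-stop behavior can be sketched as a loop that evaluates the model after each iteration and returns as soon as the target win rate is reached. This is a simplified illustration under assumptions: `train_loop`, `train_one_iteration`, and `evaluate` are hypothetical names, and the `eval_after` default of 50 mirrors the `state.iter_num > 50` condition in `post_train`.

``` python
def train_loop(train_one_iteration, evaluate, target_winrate,
               max_iterations, eval_after=50):
    """Train until the evaluated win rate reaches target_winrate.

    train_one_iteration(i) runs selfplay and training for iteration i.
    evaluate(i) returns the win rate of the model from iteration i.
    Returns (iterations_run, last_win_rate, converged).
    """
    win_rate = None
    for iteration in range(1, max_iterations + 1):
        train_one_iteration(iteration)
        # Skip evaluation during the early iterations to save time.
        if iteration > eval_after:
            win_rate = evaluate(iteration)
            if win_rate >= target_winrate:
                return iteration, win_rate, True
    return max_iterations, win_rate, False
```

With a fixed iteration budget the loop always runs `max_iterations` times; with the check in place it ends at the first evaluated iteration that meets the target, and the `converged` flag reports whether the target was actually reached.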

### Fine-tuning MCTS hyperparameters
Remark:
* Fine-tune the MCTS hyperparameters, including concurrent_games_per_thread, fastplay_frequency, num_readouts, and fastplay_readouts
* Best MCTS hyperparameters found:
  * concurrent_games_per_thread=6
  * fastplay_frequency=0.82
  * num_readouts=600
  * fastplay_readouts=60
* MCTS performance without fine-tuning:
<img src="./img/mcts_baseline_speed.JPG" width="800"/><figure>Baseline MCTS</figure>
* MCTS performance with fine-tuning:
<img src="./img/mcts_tuned_speed.JPG" width="800"/><figure>Fine-tuned MCTS</figure>
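
To see how these values affect search cost, the average number of readouts per move can be computed from `fastplay_frequency`, `num_readouts`, and `fastplay_readouts`. The sketch below assumes that each move is independently fast with probability `fastplay_frequency`; the helper names are hypothetical.

``` python
def expected_readouts_per_move(fastplay_frequency, num_readouts, fastplay_readouts):
    """Average readouts per move when a fraction of the moves is fast."""
    return (fastplay_frequency * fastplay_readouts
            + (1.0 - fastplay_frequency) * num_readouts)


def training_target_fraction(fastplay_frequency):
    """Share of moves that are slow and therefore recorded as training examples."""
    return 1.0 - fastplay_frequency


if __name__ == "__main__":
    # Tuned values listed above: 0.82 * 60 + 0.18 * 600
    print(round(expected_readouts_per_move(0.82, 600, 60), 1))  # 157.2
    print(round(training_target_fraction(0.82), 2))             # 0.18
```

Under this assumption, the tuned settings average about 157 readouts per move instead of 600 for an all-slow search, while 18% of the moves are still recorded as training examples.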


# DEMO
* [Environment Setup](#Environment-setup)
* [Launch training](#Launch-training)

## Environment setup

* First, ensure that the Intel oneAPI HPC Toolkit (oneapi-hpckit) and the minigo conda environment are installed on the server.
* Second, enter the AIOK repo directory.
* Third, start the Jupyter notebook service.

``` bash
source /opt/intel/oneapi/setvars.sh --force
conda activate minigo
cd e2eAIOK
pip install jupyterlab
jupyter notebook --notebook-dir=./ --ip=0.0.0.0 --port=8888 --allow-root
```
* You can now open the AIOK MiniGo demo at http://${hostname}:8888/

Remark:
* MiniGo is a reinforcement learning model: it generates its own training data in each iteration of the train loop, so it does not need an external training dataset. We evaluate the win rate against a target model (based on the MLPerf submission) and require a final winrate >= 0.5; all of our optimizations meet that target metric.

* Public reference on AlphaZero: https://arxiv.org/abs/1712.01815

* Public reference on MiniGo: https://openreview.net/forum?id=H1eerhIpLV

## Launch training

Run the following command in a notebook cell; the log output below is abridged:
!cd ../../.. && source /opt/intel/oneapi/setvars.sh --force && python run_e2eaiok.py --data_path /root/dataset/minigo --model_name minigo --conf conf/e2eaiok_defaults_minigo_example.conf

 
:: initializing oneAPI environment ...
   bash: BASH_VERSION = 4.4.20(1)-release
   args: Using "$@" for setvars.sh arguments: --force
:: clck -- latest
:: compiler -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: inspector -- latest
:: itac -- latest
:: mpi -- latest
:: tbb -- latest
:: oneAPI environment initialized ::
 
2022-09-15 16:48:11,938 - HYDRO.AI.SDA - INFO - ### Ready to submit current task  ###
2022-09-15 16:48:11,938 - HYDRO.AI.SDA - INFO - Model Advisor created
2022-09-15 16:48:11,938 - HYDRO.AI.SDA - INFO - Start to init sigopt
experiment: minigo
metrics:
- name: winrate
  objective: maximize
- name: training_time
  objective: minimize
  strategy: optimize
observation_budget: 1
parameters:
- grid:
  - 512
  - 1024
  - 2048
  - 4196
  - 8192
  name: train_batch_size
  type: int
project: hydro.ai
2022-09-15 16:48:12,810 - HYDRO.AI.SDA - INFO - model parameter initialized
2022-09-15 16:48:12,810 - HYDRO.AI.SDA - INFO - start to launch training
2022-09-15 16:

Loading: 
Loading: 0 packages loaded
Analyzing: 3 targets (0 packages loaded, 0 targets configured)
INFO: Analysed 3 targets (0 packages loaded, 0 targets configured).
INFO: Found 3 targets...
Loading: 
Loading: 0 packages loaded
Analyzing: 3 targets (0 packages loaded, 0 targets configured)
INFO: Analysed 3 targets (0 packages loaded, 0 targets configured).
INFO: Found 3 targets...
[0 / 1] [-----] BazelWorkspaceStatusAction stable-status.txt
INFO: Elapsed time: 0.336s, Critical Path: 0.01s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
[0 / 1] [-----] BazelWorkspaceStatusAction stable-status.txt
INFO: Elapsed time: 0.374s, Critical Path: 0.01s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
+++ for var_name in flag_dir golden_chunk_dir golden_chunk_tmp_dir holdout_dir log_dir model_dir selfplay_dir sgf_dir work_dir signal_dir
+++ clean_dir 

Make dir /root/mlperf/results-19/data/selfplay/000016/7/2019-12-24-13
Copying 1129 files from /root/zheng/dataset/minigo/checkpoints/mlperf07/data/selfplay/000016/7/2019-12-24-13 to /root/mlperf/results-19/data/selfplay/000016/7/2019-12-24-13
Make dir /root/mlperf/results-19/data/selfplay/000016/1/2019-12-24-13
Copying 1269 files from /root/zheng/dataset/minigo/checkpoints/mlperf07/data/selfplay/000016/1/2019-12-24-13 to /root/mlperf/results-19/data/selfplay/000016/1/2019-12-24-13
Make dir /root/mlperf/results-19/data/selfplay/000016/3/2019-12-24-13
Copying 1067 files from /root/zheng/dataset/minigo/checkpoints/mlperf07/data/selfplay/000016/3/2019-12-24-13 to /root/mlperf/results-19/data/selfplay/000016/3/2019-12-24-13
Make dir /root/mlperf/results-19/data/selfplay/000017/0
Make dir /root/mlperf/results-19/data/selfplay/000017/1
Make dir /root/mlperf/results-19/data/selfplay/000017/2
Make dir /root/mlperf/results-19/data/selfplay/000017/3
Make dir /root/mlperf/results-19/data/selfplay/

Copying 636 files from /root/zheng/dataset/minigo/checkpoints/mlperf07/data/selfplay/000014/7/2019-12-24-11 to /root/mlperf/results-19/data/selfplay/000014/7/2019-12-24-11
Make dir /root/mlperf/results-19/data/selfplay/000014/1/2019-12-24-11
Make dir /root/mlperf/results-19/data/selfplay/000014/1/2019-12-24-12
Copying 564 files from /root/zheng/dataset/minigo/checkpoints/mlperf07/data/selfplay/000014/1/2019-12-24-12 to /root/mlperf/results-19/data/selfplay/000014/1/2019-12-24-12
Copying 693 files from /root/zheng/dataset/minigo/checkpoints/mlperf07/data/selfplay/000014/1/2019-12-24-11 to /root/mlperf/results-19/data/selfplay/000014/1/2019-12-24-11
Make dir /root/mlperf/results-19/data/selfplay/000014/3/2019-12-24-11
Make dir /root/mlperf/results-19/data/selfplay/000014/3/2019-12-24-12
Copying 469 files from /root/zheng/dataset/minigo/checkpoints/mlperf07/data/selfplay/000014/3/2019-12-24-12 to /root/mlperf/results-19/data/selfplay/000014/3/2019-12-24-12
Copying 605 files from /root/zhe

Make dir /root/mlperf/results-19/data/selfplay/000015/7/2019-12-24-12
Make dir /root/mlperf/results-19/data/selfplay/000015/7/2019-12-24-13
Copying 930 files from /root/zheng/dataset/minigo/checkpoints/mlperf07/data/selfplay/000015/7/2019-12-24-12 to /root/mlperf/results-19/data/selfplay/000015/7/2019-12-24-12
Copying 172 files from /root/zheng/dataset/minigo/checkpoints/mlperf07/data/selfplay/000015/7/2019-12-24-13 to /root/mlperf/results-19/data/selfplay/000015/7/2019-12-24-13
Make dir /root/mlperf/results-19/data/selfplay/000015/1/2019-12-24-12
Make dir /root/mlperf/results-19/data/selfplay/000015/1/2019-12-24-13
Copying 1085 files from /root/zheng/dataset/minigo/checkpoints/mlperf07/data/selfplay/000015/1/2019-12-24-12 to /root/mlperf/results-19/data/selfplay/000015/1/2019-12-24-12
Copying 186 files from /root/zheng/dataset/minigo/checkpoints/mlperf07/data/selfplay/000015/1/2019-12-24-13 to /root/mlperf/results-19/data/selfplay/000015/1/2019-12-24-13
Make dir /root/mlperf/results-1

+ tee /root/mlperf/results-19/logs/selfplay/sr142/sr142_selfplay_11.log
++ hostname
++ hostname
++ hostname
+ tee /root/mlperf/results-19/logs/selfplay/sr142/sr142_selfplay_17.log
++ hostname
++ hostname
++ hostname
++ hostname
++ hostname
++ hostname
+ tee /root/mlperf/results-19/logs/selfplay/sr141/sr141_selfplay_7.log
+ tee /root/mlperf/results-19/logs/selfplay/sr142/sr142_selfplay_16.log
++ hostname
+ tee /root/mlperf/results-19/logs/selfplay/sr142/sr142_selfplay_14.log
++ hostname
+ tee /root/mlperf/results-19/logs/selfplay/sr142/sr142_selfplay_18.log
++ hostname
++ hostname
++ hostname
++ hostname
++ hostname
+ tee /root/mlperf/results-19/logs/selfplay/sr142/sr142_selfplay_15.log
+ tee /root/mlperf/results-19/logs/selfplay/sr141/sr141_selfplay_14.log
+ tee /root/mlperf/results-19/logs/selfplay/sr142/sr142_selfplay_20.log
+ tee /root/mlperf/results-19/logs/selfplay/sr141/sr141_selfplay_16.log
++ hostname
+ tee /root/mlperf/results-19/logs/selfplay/sr141/sr141_selfplay_17.log
++ ho

[2022-09-15 16:49:00] sample_records finished: 12.389 seconds
[2022-09-15 16:49:00] /tmp/golden_chunks_tmp/000018-*-of-*.tfrecord.zz
[2022-09-15 16:49:00] ['/tmp/golden_chunks_tmp/000018-00000-of-00004.tfrecord.zz', '/tmp/golden_chunks_tmp/000018-00001-of-00004.tfrecord.zz', '/tmp/golden_chunks_tmp/000018-00002-of-00004.tfrecord.zz', '/tmp/golden_chunks_tmp/000018-00003-of-00004.tfrecord.zz']
--helpfull: No such file or directory
[2022-09-15 16:49:00] Running: scp  /tmp/golden_chunks_tmp/000018-00000-of-00004.tfrecord.zz  sr141:/tmp/golden_chunks
[2022-09-15 16:49:00] Running: scp  /tmp/golden_chunks_tmp/000018-00001-of-00004.tfrecord.zz  sr141:/tmp/golden_chunks
[2022-09-15 16:49:00] Running: scp  /tmp/golden_chunks_tmp/000018-00002-of-00004.tfrecord.zz  sr142:/tmp/golden_chunks
[2022-09-15 16:49:00] Running: scp  /tmp/golden_chunks_tmp/000018-00003-of-00004.tfrecord.zz  sr142:/tmp/golden_chunks
[2022-09-15 16:49:01] scp finished: 1.269 seconds
[2022-09-15 16:49:01] scp finished: 1.29

I0915 17:03:43.310590 139882633475904 graph_util_impl.py:524] Converted 77 variables to const ops.INFO:root:Training on 1 records: /tmp/golden_chunks/000018-00000-of-00004.tfrecord.zz to /tmp/golden_chunks/000018-00000-of-00004.tfrecord.zz
INFO:tensorflow:Using config: {'_model_dir': '/root/mlperf/results-19/work_dir', '_tf_random_seed': None, '_save_summary_steps': 128, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': intra_op_parallelism_threads: 17
inter_op_parallelism_threads: 2
gpu_options {
  allow_growth: true
}
, '_keep_checkpoint_max': 100, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 50, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_ma

[2022-09-15 17:03:44] numactl -N 0 -l python3 produce_min_max_log.py --input_graph={} --flagfile=/root/mlperf/results-19/flags/architecture.flags --data_location=/tmp/golden_chunks/*.zz --num_steps=5 --batch_size=16 --random_rotation=True
2022-09-15 17:03:44.995955: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance-critical operations:  AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-15 17:03:45.010873: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2600000000 Hz
2022-09-15 17:03:45.017198: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55747f1fc5e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-09-15 17:03:45.017243: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-09-1

[2022-09-15 17:03:58] Converted graph file is saved to: /root/mlperf/results-19/models/000018.pb
[2022-09-15 17:03:59] Running: python3  convert.py  --bool_features=1  --conv_width=64  --dst=/root/mlperf/results-19/models/000018.minigo  --fc_width=64  --input_features=mlperf07  --input_graph=/root/mlperf/results-19/models/000018.pb  --input_layout=nhwc  --summary_steps=128  --trunk_layers=6  --value_cost_weight=0.5
[2022-09-15 17:04:01] convert finished: 1.714 seconds
;conv2d_1/Conv2D_eightbit_requant_range__print__;__requant_min_max:[0][2.69726038]
;conv2d_2/Conv2D_eightbit_requant_range__print__;__requant_min_max:[0][4.28749228]
;conv2d_3/Conv2D_eightbit_requant_range__print__;__requant_min_max:[0][3.34066153]
;conv2d_4/Conv2D_eightbit_requant_range__print__;__requant_min_max:[0][6.16337395]
;conv2d_5/Conv2D_eightbit_requant_range__print__;__requant_min_max:[0][3.46617103]
;conv2d_6/Conv2D_eightbit_requant_range__print__;__requant_min_max:[0][7.54678202]
;conv2d_7/Conv2D_eightbit_req

[2022-09-15 17:19:08] Waiting for 8192 games in /tmp/selfplay/000018 (found 1580)
[2022-09-15 17:19:40] Waiting for 8192 games in /tmp/selfplay/000018 (found 1636)
[2022-09-15 17:20:12] Waiting for 8192 games in /tmp/selfplay/000018 (found 1694)
[2022-09-15 17:20:44] Waiting for 8192 games in /tmp/selfplay/000018 (found 1765)
[2022-09-15 17:21:15] Waiting for 8192 games in /tmp/selfplay/000018 (found 1818)
[2022-09-15 17:21:47] Waiting for 8192 games in /tmp/selfplay/000018 (found 1880)
[2022-09-15 17:22:19] Waiting for 8192 games in /tmp/selfplay/000018 (found 1939)
[2022-09-15 17:22:51] Waiting for 8192 games in /tmp/selfplay/000018 (found 2000)
[2022-09-15 17:23:23] Waiting for 8192 games in /tmp/selfplay/000018 (found 2067)
[2022-09-15 17:23:55] Waiting for 8192 games in /tmp/selfplay/000018 (found 2124)
[2022-09-15 17:24:26] Waiting for 8192 games in /tmp/selfplay/000018 (found 2195)
[2022-09-15 17:24:58] Waiting for 8192 games in /tmp/selfplay/000018 (found 2257)
[2022-09-15 17:2