
Leela Training


Network Links

[My networks at first used the T# naming scheme from the main training runs, but that got confusing, so they are now renamed J# with an index that typically corresponds to thousands of training steps; e.g. J7-10 is my 7th training run at its 10,000th training step.]


Scroll down to find the newest T40B nets - T40B.2 is now posting and B.3 is coming soon.


256x20b: T40B.1 is a continuation of T40 training under experimental training conditions. The starting net is 42770. All T40 training games from LR 0.0007 onward were combined into one huge training window (about 12M games) and trained with a starting LR of 0.002 gradually decreasing to 0.00005. The hypothesis is that normal training on progressive windows learns positions from later games better than earlier ones, and that this megawindow approach will learn them evenly (though it discards the benefit of reinforcement enjoyed in the parent T40 run). I also converted to a WDL value head and did my usual partial 7-man tablebase rescoring, so these are possible confounders for interpretation. Nets are labeled T40B.1 to indicate branch 1 of T40 training. Pure zero (all training on T40 games). Low-node Elo estimates are now complete and shown below the net links. I removed the older nets - ping me on Discord if you want them (you don't, they aren't as good).
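For concreteness, here is a minimal Python sketch of that megawindow LR schedule. The text says only that the LR decreased "gradually" from 0.002 to 0.00005; the exponential decay shape and the step count below are assumptions for illustration (the step count is inferred from the final net being numbered 106, i.e. ~106k steps), not the actual training configuration.

```python
# Hypothetical sketch of the T40B.1 "megawindow" LR schedule.
# The source says only "starting LR 0.002 gradually decreasing to 0.00005";
# the exponential decay shape here is an assumption for illustration.

LR_START = 0.002
LR_END = 0.00005
TOTAL_STEPS = 106_000  # assumed from the final net being numbered 106 (~106k steps)

def megawindow_lr(step: int) -> float:
    """Exponentially interpolate the LR from LR_START down to LR_END."""
    frac = min(step / TOTAL_STEPS, 1.0)
    return LR_START * (LR_END / LR_START) ** frac

# megawindow_lr(0) -> 0.002; megawindow_lr(TOTAL_STEPS) -> 5e-05
```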

T40B.1 training is finished; net 106 is the last net.

Matches at 800 fixed nodes per move for T40B.1 nets against the parent net 42770 (~best T40):

   # PLAYER              :  RATING  ERROR   POINTS  PLAYED   (%)     W      L      D
   1 lc0.net.T40B-86     :    15.3    8.0   1564.5    3000    52   638    509   1853
   2 lc0.net.T40B-94     :    14.7    8.0   1562.0    3000    52   662    538   1800
   3 lc0.net.T40B-82     :    14.6    7.9   1561.5    3000    52   652    529   1819
   4 lc0.net.T40B-106    :    14.1    7.7   1559.5    3000    52   661    542   1797
   5 lc0.net.T40B-40     :    11.4    7.8   1548.0    3000    52   636    540   1824
   6 lc0.net.T40B-78     :    10.9    7.9   1546.0    3000    52   651    559   1790
   7 lc0.net.T40B-98     :     9.7    8.1   1541.0    3000    51   644    562   1794
   8 lc0.net.T40B-102    :     7.5    7.8   1531.5    3000    51   651    588   1761
   9 lc0.net.T40B-90     :     7.1    7.7   1530.0    3000    51   621    561   1818
  .... (trimmed)
  17 lc0.net.42770       :     0.0   ----  24824.0   50800    49  9473  10625  30702


T40B.2 also used the megawindow approach of T40B.1, but with a cosine-annealed cyclical LR. The method and logic are close to those described in https://towardsdatascience.com/https-medium-com-reina-wang-tw-stochastic-gradient-descent-with-restarts-5f511975163, except that the peak LR at each cycle is gradually decreased. Each cycle is 12k steps long, run for 8 cycles, followed by 10k steps at the lowest LR. Only the final net is posted, as it appears to be the best based on 800-node tests. No obvious improvement on B.1.
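A minimal Python sketch of that schedule is below. The cycle length (12k), cycle count (8), and 10k-step flat tail come from the text; note that 8 x 12k + 10k = 106k steps, matching the final net number 106. The LR floor, starting peak, and per-cycle decay factor (PEAK_DECAY) are assumptions for illustration, with the floor and peak borrowed from the B.1 endpoints.

```python
import math

CYCLE_LEN = 12_000   # steps per cosine cycle (from the text)
NUM_CYCLES = 8       # cycles before the final flat phase (from the text)
TAIL_LEN = 10_000    # final steps at the lowest LR (from the text)
LR_MIN = 0.00005     # assumed floor, matching the B.1 endpoint
PEAK_START = 0.002   # assumed initial peak, matching the B.1 start
PEAK_DECAY = 0.7     # assumed per-cycle reduction of the peak LR

TOTAL_STEPS = NUM_CYCLES * CYCLE_LEN + TAIL_LEN  # 106k, matching net T40B.2-106

def cyclical_lr(step: int) -> float:
    """Cosine-annealed LR with warm restarts and a decaying peak."""
    cycle = step // CYCLE_LEN
    if cycle >= NUM_CYCLES:
        return LR_MIN  # the 10k-step flat tail
    peak = PEAK_START * PEAK_DECAY ** cycle
    pos = (step % CYCLE_LEN) / CYCLE_LEN  # 0 -> 1 within the cycle
    return LR_MIN + 0.5 * (peak - LR_MIN) * (1.0 + math.cos(math.pi * pos))
```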

Matches at 800 fixed nodes per move for the best T40B.2 net against the parent net 42770 (~best T40):

   # PLAYER                :  RATING  ERROR  POINTS  PLAYED   (%)    W    L     D  CFS(%)
   1 lc0.net.T40B.2-106    :    15.2    7.6  1564.0    3000    52  642  514  1844     100
   2 lc0.net.42770         :     0.0   ----  1436.0    3000    48  514  642  1844     ---

T40B.3 was similar to B.1 except for a bigger training batch size. It did not appear to improve on B.1, and no nets are posted.


T40B.4 is similar to B.1 except for a smaller training batch size. Now complete. [N.B. there was a clerical error in the Elo assignments, now fixed.]


192x16b: 192 filters, 16 blocks, SE (ratio 6), WDL value head, conv policy head, trained on T40 games


320x24b: 320 filters, 24 blocks, SE (ratio 10), WDL value head, conv policy head, trained on T40 games. Also known as the 'Terminator' series.


J13 last net at LR 0.2: J13-50


256x16b: 256 filters, 16 blocks, SE (ratio 8), WDL value head, conv policy head, trained on T40 games (training on hold to favor other experiments)
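The nets above all share the same residual-tower design with squeeze-and-excitation (SE) blocks. Below is a minimal tf.keras sketch of one SE residual block, assuming the SE bottleneck width is filters // se_ratio; this is an illustration only, and the actual lczero training code differs in detail (e.g. its SE unit also produces a channel-wise bias alongside the gate).

```python
import tensorflow as tf
from tensorflow.keras import layers

def se_residual_block(x, filters: int, se_ratio: int):
    """One residual block with a squeeze-and-excitation (SE) unit."""
    skip = x
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    # Squeeze: global average pool to one value per channel.
    s = layers.GlobalAveragePooling2D()(y)
    # Excite: bottleneck to filters // se_ratio, then per-channel sigmoid gates.
    s = layers.Dense(filters // se_ratio, activation="relu")(s)
    s = layers.Dense(filters, activation="sigmoid")(s)
    s = layers.Reshape((1, 1, filters))(s)
    y = layers.Multiply()([y, s])
    y = layers.Add()([y, skip])
    return layers.ReLU()(y)

# e.g. for the 192x16b nets: 16 of these blocks with filters=192, se_ratio=6
```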
