Update NNUE architecture to SFNNv8: L1-2560 nn-ac1dbea57aa3.nnue #4795

Closed

Conversation

linrock
Contributor

@linrock linrock commented Sep 21, 2023

Creating this net involved:
- a 6-stage training process from scratch. The datasets used in stages 1-5 were fully minimized.
- permuting L1 weights with official-stockfish/nnue-pytorch#254

A strong epoch after each training stage was chosen for the next. The 6 stages were:

```
1. 400 epochs, lambda 1.0, default LR and gamma
   UHOx2-wIsRight-multinet-dfrc-n5000 (135G)
     nodes5000pv2_UHO.binpack
     data_pv-2_diff-100_nodes-5000.binpack
     wrongIsRight_nodes5000pv2.binpack
     multinet_pv-2_diff-100_nodes-5000.binpack
     dfrc_n5000.binpack

2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12
   LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G)
     T60T70wIsRightFarseerT60T74T75T76.binpack
     test78-junjulaug2022-16tb7p.no-db.min.binpack
     test79-mar2022-16tb7p.no-db.min.binpack
     test80-dec2022-16tb7p.no-db.min.binpack

3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20
   leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack
     leela93-filt-v1.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test78-janfeb2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-apr2022-16tb7p.min.binpack
     test80-may2022-16tb7p.min.binpack

4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24
   leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack
     leela96-filt-v2.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test79-may2022-16tb7p.filter-v6-dd.min.binpack
     test80-jun2022-16tb7p.filter-v6-dd.min.binpack
     test80-sep2022-16tb7p.filter-v6-dd.min.binpack
     test80-nov2022-16tb7p.filter-v6-dd.min.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-mar2023-2tb7p.v6-sk16.min.binpack
     test60-novdec2021-16tb7p.min.binpack
     test77-dec2021-16tb7p.min.binpack
     test78-aprmay2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-may2023-2tb7p.min.binpack

5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 960 near the end of the first 800 epochs
   5af11540bbfe dataset: official-stockfish#4635

6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 1000 near the end of the first 800 epochs
   1ee1aba5ed dataset: official-stockfish#4782
```
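
For readers unfamiliar with these hyperparameters, the sketch below illustrates how a per-epoch geometric LR decay (gamma) and a ramp from a starting lambda down to the listed end-lambda are typically realized. The function names, the start-lambda of 1.0, and the linear ramp are assumptions for illustration only, not the trainer's actual code.

```python
# Illustrative sketch only (assumed schedules, not nnue-pytorch internals):
# "LR 4.375e-4, gamma 0.995" is read here as a geometric per-epoch LR decay,
# and "end-lambda 0.7" as a linear ramp of the eval-vs-game-result mixing
# weight from an assumed start of 1.0 down to 0.7 over the run.

def lr_at_epoch(epoch: int, base_lr: float = 4.375e-4, gamma: float = 0.995) -> float:
    """Learning rate after `epoch` epochs of geometric decay."""
    return base_lr * gamma ** epoch

def lambda_at_epoch(epoch: int, max_epoch: int = 800,
                    start_lambda: float = 1.0, end_lambda: float = 0.7) -> float:
    """Linear interpolation from start_lambda to end_lambda across the run."""
    t = min(epoch / max_epoch, 1.0)
    return start_lambda + (end_lambda - start_lambda) * t

print(f"{lr_at_epoch(800):.2e}")   # ~7.9e-06 after 800 epochs
print(lambda_at_epoch(400))        # 0.85 halfway through an 800-epoch run
```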

L1 weights permuted with:
```bash
python3 serialize.py $nnue $nnue_permuted \
  --features=HalfKAv2_hm \
  --ft_optimize \
  --ft_optimize_data=/data/fishpack32.binpack \
  --ft_optimize_count=10000
```
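
The permutation found by `--ft_optimize` reorders the feature transformer's output neurons. The sketch below only shows why such a reordering is output-preserving when the FT weight columns, FT biases, and the next layer's input columns are permuted consistently; the single-perspective shapes and names are simplified assumptions, not serialize.py internals.

```python
import numpy as np

# Minimal sketch (simplified single-perspective shapes, not serialize.py
# internals): permuting the feature transformer's output slots leaves the
# network output unchanged as long as the FT weight columns, FT biases, and
# the next layer's input columns are permuted with the same permutation.
def permute_l1(ft_w, ft_b, next_w, perm):
    # ft_w: (num_features, L1), ft_b: (L1,), next_w: (out, L1)
    return ft_w[:, perm], ft_b[perm], next_w[:, perm]

rng = np.random.default_rng(0)
ft_w = rng.normal(size=(8, 4))
ft_b = rng.normal(size=4)
next_w = rng.normal(size=(2, 4))
x = rng.integers(0, 2, size=8)          # toy active-feature vector

perm = rng.permutation(4)
ft_w2, ft_b2, next_w2 = permute_l1(ft_w, ft_b, next_w, perm)

# Same output before and after the permutation (ignoring clipping/quantization).
assert np.allclose(next_w @ (x @ ft_w + ft_b), next_w2 @ (x @ ft_w2 + ft_b2))
```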

Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2:
```
sf_base =  1329051 +/-   2224 (95%)
sf_test =  1163344 +/-   2992 (95%)
diff    =  -165706 +/-   4913 (95%)
speedup = -12.46807% +/- 0.370% (95%)
```
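
These four figures are paired nodes-per-second statistics from the base and test binaries. As a rough illustration of how such a diff/speedup and its 95% interval can be derived from two lists of bench results (this is not the exact measurement script used), consider:

```python
import statistics as st

def speed_summary(base_nps, test_nps):
    """Illustrative paired-speedup summary with an approximate 95% interval
    (1.96 x standard error of the mean); not the exact script used above."""
    n = len(base_nps)
    diffs = [t - b for b, t in zip(base_nps, test_nps)]
    mean_base = st.mean(base_nps)
    mean_diff = st.mean(diffs)
    ci_diff = 1.96 * st.stdev(diffs) / n ** 0.5
    return {
        "diff": (mean_diff, ci_diff),
        "speedup": (mean_diff / mean_base, ci_diff / mean_base),
    }

# Example with made-up nps samples:
base = [1_329_000, 1_330_100, 1_328_200, 1_329_900]
test = [1_163_000, 1_164_200, 1_162_900, 1_163_700]
print(speed_summary(base, test))
```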

Training data can be found at:
https://robotmoon.com/nnue-training-data/

Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue)
ep959 : 16.2 +/- 2.3

Failed 10+0.1 STC:
https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21
LLR: -2.92 (-2.94,2.94) <0.00,2.00>
Total: 13184 W: 3285 L: 3535 D: 6364
Ptnml(0-2): 85, 1662, 3334, 1440, 71

Failed 180+1.8 VLTC:
https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 64248 W: 16224 L: 16374 D: 31650
Ptnml(0-2): 26, 6788, 18640, 6650, 20

Passed 60+0.6 th 8 VLTC SMP (STC bounds):
https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 90630 W: 23372 L: 23033 D: 44225
Ptnml(0-2): 13, 8490, 27968, 8833, 11

Passed 60+0.6 th 8 VLTC SMP:
https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 137804 W: 35764 L: 35276 D: 66764
Ptnml(0-2): 31, 13006, 42326, 13522, 17

bench 1246812
@Sopel97
Member

Sopel97 commented Sep 21, 2023

So I see this wasn't trained with official-stockfish/nnue-pytorch#259? I don't see the benefits at this point. The process stays complicated, the net gets larger, but the gains are within noise. I'd be in favor of this only if it simplifies the training process, even if it's as little as getting rid of these large interleaved binpacks. Could we maybe simplify with the current arch?

@linrock
Contributor Author

linrock commented Sep 21, 2023

So I see this wasn't trained with official-stockfish/nnue-pytorch#259?

Correct, this net comes from a training run that started in May, about two months before that PR was opened.

The process stays complicated, the net gets larger, but the gains are within noise.

These are simplifications vs. the current L1-2048 master training run:

  • uses fully minimized binpacks for the first 5 stages
  • no longer uses 800GB+ shuffled binpacks
  • uses the new ftperm.py for permuting L1 weights
  • no longer collects activations from a custom Stockfish binary for a 2-step weight permutation

It's possible there would have been more elo gains if this training had instead used larger/more-randomized binpacks and the previous complicated weight permutation process.

I'd be in favor of this only if it simplifies the training process, even if it's as little as getting rid of these large interleaved binpacks. Could we maybe simplify with the current arch?

If a simpler training process can't pass SPRT, what is the criterion for whether it can be accepted?

There's a balance between simplifying training and maximizing elo; we have to pick two of three:

  • maximize elo (best for users)
  • simplify training (best for trainers)
  • time-efficient research

As long as gaining elo is the top priority, training simplifications will naturally follow sometime later, unless one is willing to spend significant time optimizing both at once.

@Sopel97
Member

Sopel97 commented Sep 21, 2023

At this pace we will end up with an 8192-wide L1 before anyone else is able to reproduce the network.

@vondele
Member

vondele commented Sep 22, 2023

Let me post an additional measurement Sopel did (https://tests.stockfishchess.org/tests/view/650c77c6fb151d43ae6d51dd) showing the master net is roughly 30 Elo stronger than an old master net with a simpler training procedure. I believe that shows significant progress has indeed been made: the training protocol is complex and the datasets are large, but the Elo results are quite impressive.

The larger network sizes have so far shown quite consistently good scaling with TC, i.e. a seemingly growing benefit at longer TC, which is consistent with intuition, and they are clearly strong at fixed nodes. This could be contributing to the good performance in some of the ongoing tournaments. Reducing nps is actually also a good thing when it comes to hash pressure, i.e. less hash is needed for the same analysis time.

Having said all these positive things about the evolution of the nets, picking up training is clearly pretty difficult for new contributors, or for people who had a break in training (like myself). It is essential that we keep the process reproducible, and simple enough that we can improve on it. While I think linrock does a great job of describing in words what the process is and providing the needed data, this really is a software engineering task. Ideally, the whole process could be reproduced starting from a single declarative file (e.g. a JSON that documents all datasets and parameters). Our easy_train.py is a first step, and I know we have pending PRs on nnue-pytorch that make good steps in that direction (e.g. official-stockfish/nnue-pytorch#257). I can only encourage this effort, and I will pick up training again in a couple of months.
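
To make that idea concrete, here is a purely hypothetical sketch of what such a declarative run description might capture, written as a Python dict mirroring the stage parameters documented in this PR; none of these keys correspond to an existing nnue-pytorch format.

```python
# Hypothetical sketch only: a declarative run description capturing datasets
# and hyperparameters per stage. The keys are invented for illustration and
# do not correspond to any existing nnue-pytorch configuration format.
run_description = {
    "arch": "SFNNv8, L1-2560",
    "stages": [
        {
            "epochs": 400,
            "lambda": 1.0,
            "datasets": [
                "nodes5000pv2_UHO.binpack",
                "dfrc_n5000.binpack",
                # ... remaining stage-1 binpacks
            ],
        },
        {
            "epochs": 800,
            "end_lambda": 0.75,
            "lr": 4.375e-4,
            "gamma": 0.995,
            "skip": 12,
            "datasets": ["test78-junjulaug2022-16tb7p.no-db.min.binpack"],
            # ... and so on for stages 3-6
        },
    ],
}
```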

@Disservin Disservin added bench-change Changes the bench 🚀 gainer Gains elo labels Sep 22, 2023
@vondele vondele added the to be merged Will be merged shortly label Sep 22, 2023
@vondele vondele closed this in 70ba9de Sep 22, 2023
@mstembera
Contributor

I am probably not the first to have this idea, but we could have a second small/fast net to use for our simple eval when the material advantage already looks decisive.

@vondele
Member

vondele commented Sep 23, 2023

Yes, the idea is around, but nobody has implemented and tried it.

@Sopel97
Member

Sopel97 commented Sep 28, 2023

  4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24
    leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack
    leela96-filt-v2.min.binpack
    dfrc99-16tb7p-filt-v2.min.binpack
    test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
    test79-may2022-16tb7p.filter-v6-dd.min.binpack
    test80-jun2022-16tb7p.filter-v6-dd.min.binpack
    test80-sep2022-16tb7p.filter-v6-dd.min.binpack
    test80-nov2022-16tb7p.filter-v6-dd.min.binpack
    test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
    test80-mar2023-2tb7p.v6-sk16.min.binpack
    test60-novdec2021-16tb7p.min.binpack
    test77-dec2021-16tb7p.min.binpack
    test78-aprmay2022-16tb7p.min.binpack
    test79-apr2022-16tb7p.min.binpack
    test80-may2023-2tb7p.min.binpack

Like, none of these files exist. How do I form this dataset?

@linrock
Contributor Author

linrock commented Sep 28, 2023

leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack
I believe it is currently composed from subsets of these Kaggle datasets:

https://www.kaggle.com/datasets/linrock/leela96-filt-v2-min
https://www.kaggle.com/datasets/linrock/dfrc99-16tb7p-filt-v2-min
leela96-filt-v2.min.binpack
dfrc99-16tb7p-filt-v2.min.binpack

https://www.kaggle.com/datasets/linrock/t80augtooctt79aprt78aprtosep-v6-mar2023min
test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack

https://www.kaggle.com/datasets/linrock/0dd1cebea57-misc-v6-dd
test79-may2022-16tb7p.filter-v6-dd.min.binpack
test80-jun2022-16tb7p.filter-v6-dd.min.binpack

https://www.kaggle.com/datasets/linrock/0dd1cebea57-test80-v6-dd/versions/2
test80-sep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
test80-nov2022-16tb7p.filter-v6-dd.min.binpack
test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack

https://www.kaggle.com/datasets/linrock/test80-mar2023-2tb7p-v6-sk16
test80-mar2023-2tb7p.v6-sk16.min.binpack

https://www.kaggle.com/datasets/linrock/nn-1e7ca356472e-t60-t79
test60-novdec2021-16tb7p.min.binpack

https://www.kaggle.com/datasets/linrock/test77-dec2021-16tb7p-84p
test77-dec2021-16tb7p.min.binpack

https://www.kaggle.com/datasets/linrock/test78-aprmayjunjul2022-16tb7p
test78-aprmay2022-16tb7p.min.binpack

https://www.kaggle.com/datasets/linrock/test79-apr2022-16tb7p
test79-apr2022-16tb7p.min.binpack

https://www.kaggle.com/datasets/linrock/1ee1aba5ed-test80-martojul2023-2tb7p
test80-may2023-2tb7p.min.binpack

The filenames may vary a bit between this description and whatever was uploaded to Kaggle. Aside from small differences in filenames, the main things to notice are (a rough parsing sketch follows below):

  • the Leela test run and the month (e.g. test80-may2023)
  • whether or not the dataset was filtered (e.g. filter-v6, v6, v6-dd)
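
```python
import re

# Illustrative only: pull the Leela test run, the month range, and an optional
# filter tag out of a binpack filename. The pattern is an assumption and will
# not match every dataset file.
NAME_RE = re.compile(
    r"(?P<run>test\d+)-(?P<months>[a-z]+\d{4})-.*?"
    r"(?P<filter>filter-v6-dd|filter-v6|v6-dd|v6-sk16|v6)?\.min"
)

for name in [
    "test80-may2023-2tb7p.min.binpack",
    "test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack",
]:
    m = NAME_RE.search(name)
    print(name, "->", m.groupdict() if m else None)
```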

@linrock
Contributor Author

linrock commented Sep 28, 2023

Also, I know the dataset situation is quite messy. It would be amazing if we could host public datasets by simply rsync'ing them onto a remote server. That would free up a lot of time for keeping the datasets tidy.

Unfortunately, having to manually manage data for uploading to Kaggle is kind of a grind. It's currently hard to prioritize keeping the datasets simple vs. elo-gainer research, since I'm handling the datasets mostly manually and large portions of the data are constantly changing.

linrock added a commit to linrock/Stockfish that referenced this pull request Sep 29, 2023
This is a later epoch from the same experiment that led to the previous
master net. In training stage 6, max-epoch was raised to 1,200 near the
end of the first 1,000 epochs.

For more details, see official-stockfish#4795

Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue)
ep1079 : 15.6 +/- 1.2

Passed STC:
https://tests.stockfishchess.org/tests/view/651503b3b3e74811c8af1e2a
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 29408 W: 7607 L: 7304 D: 14497
Ptnml(0-2): 97, 3277, 7650, 3586, 94

Passed LTC:
https://tests.stockfishchess.org/tests/view/651585ceb3e74811c8af2a5f
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 73164 W: 18828 L: 18440 D: 35896
Ptnml(0-2): 30, 7749, 20644, 8121, 38

bench 1306282
Disservin pushed a commit that referenced this pull request Sep 29, 2023
This is a later epoch from the same experiment that led to the previous
master net. In training stage 6, max-epoch was raised to 1,200 near the
end of the first 1,000 epochs.

For more details, see #4795

Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue)
ep1079 : 15.6 +/- 1.2

Passed STC:
https://tests.stockfishchess.org/tests/view/651503b3b3e74811c8af1e2a
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 29408 W: 7607 L: 7304 D: 14497
Ptnml(0-2): 97, 3277, 7650, 3586, 94

Passed LTC:
https://tests.stockfishchess.org/tests/view/651585ceb3e74811c8af2a5f
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 73164 W: 18828 L: 18440 D: 35896
Ptnml(0-2): 30, 7749, 20644, 8121, 38

closes #4810

Bench: 1453057