Update default net to nn-60fa44e376d9.nnue #4314
Conversation
Since this network was not trained on positions with
Or, for research purposes, without books at all?
DFRC startpos test in progress https://tests.stockfishchess.org/tests/view/63ae135a5bd1e5f27f13d9d2
That is an excellent question. As per Discord, that test is now running, rescheduling the LTC passing test to an STC test on noob_2moves: https://tests.stockfishchess.org/tests/view/63aff90a8585daa0aef17b36

In addition, for the record, it has been established that this PR net, which I'll call "linrock2", has significant problems at DFRC relative to current master "linrock1": https://tests.stockfishchess.org/tests/view/63ae135a5bd1e5f27f13d9d2

That said, another test yesterday compared linrock1 vs the previous master net, "pre-linrock", and that test showed that linrock1 gained nearly as much DFRC elo as classical (1-2) compared to pre-linrock: https://tests.stockfishchess.org/tests/view/63aeb94f331d5fca51138029

So these latter two tests make it clear that whatever training linrock2 had, especially filtering on earlygame positions, cost some DFRC elo, but also that no such effect occurred in linrock1, i.e. the DFRC loss is unique to linrock2.

I look forward to seeing the result of Sopel's test suggestion. It should shine further light on the effects of skipping earlygame positions in training... hopefully there's no real problem. I suppose that even if this noob_2moves test fails, it should be merged regardless, but I guess we'll have to wait and see what comes of it.
Actually, for what it's worth, linrock had two other nets pass STC at the ~same time and elo as this PR, at least one of which had less aggressive earlygame ply skipping:
So, summary of tests:
I think 1) is good progress, and is what we measure. 2) and 3) can be expected from the fact that game plies below 28 are skipped in training, i.e. typical opening knowledge is not presented during training. It is quite interesting that this is Elo neutral even at STC. If we let the engine think a little longer, i.e. such that depth 28 is reached, I fully expect this net to be stronger also from the opening position. DFRC is not a priority for the project, and I feel that, as long as the DFRC training data is part of the training data set, we're giving it enough weight. I'll thus proceed with merging this net.
I agree that merging is best, although I suspect we should be a bit more hesitant in the future about early ply skipping as aggressive as in this one. As stated, the other two nets also passed STC and are apparently neutral to this PR at LTC, and they had less early ply skipping, so the early ply skipping was not an essential part of the elo gained here. So hopefully the next batch or two of nets will have no more early ply skipping than this one.
Sure, I would have nothing against reducing that a bit for the next Elo-gaining net :-)
This is a later epoch (epoch 859) from the same experiment run that trained yesterday's master net nn-60fa44e376d9.nnue (epoch 779). The experiment was manually paused around epoch 790 and unpaused with max epoch increased to 900, mainly to get more local elo data without letting the GPU idle.

nn-60fa44e376d9.nnue is from official-stockfish#4314
nn-335a9b2d8a80.nnue is from official-stockfish#4295

Local elo vs. nn-335a9b2d8a80.nnue at 25k nodes per move:
experiment_leela93-dfrc99-filt-only-T80-oct-nov-skip28
run_0/nn-epoch779.nnue (nn-60fa44e376d9.nnue) : 5.0 +/- 1.2
run_0/nn-epoch859.nnue (nn-a3dc078bafc7.nnue) : 5.6 +/- 1.6

Passed STC vs. nn-335a9b2d8a80.nnue
https://tests.stockfishchess.org/tests/view/63ae10495bd1e5f27f13d94f
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 37536 W: 10088 L: 9781 D: 17667
Ptnml(0-2): 110, 4006, 10223, 4325, 104

An LTC test vs. nn-335a9b2d8a80.nnue was paused due to nn-60fa44e376d9.nnue passing LTC first:
https://tests.stockfishchess.org/tests/view/63ae5d34331d5fca5113703b

Passed LTC vs. nn-60fa44e376d9.nnue
https://tests.stockfishchess.org/tests/view/63af1e41465d2b022dbce4e7
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 148704 W: 39672 L: 39155 D: 69877
Ptnml(0-2): 59, 14443, 44843, 14936, 71

closes official-stockfish#4319

bench 3984365
Can you provide a way to get the exact dataset you used for training?
This?
But how is the data combined?
Combined with interleave_binpacks.py
Would there be a way for you to provide the respective unfiltered datasets? I want to try and change the encoding to maximize compression. Also, how were the individual parts from each dataset combined? (It shows as ~100-300MB pieces on Kaggle; if also interleaved, was it a 1- or 2-step process?)
Sure, I can upload the exact unfiltered portion later. It's the
The Leela T80 data was effectively unfiltered. The part of this process that breaks binpack compression is the use of
I believe I used a 2-step process here. I've used both and have forgotten the exact one for this experiment run. I think whatever leads to a more shuffled binpack will yield slightly higher elo. The 2-step process would be:

# Re-combine filtered pieces of T60T70wIsRightFarseerT60T74T75T76.binpack and dfrc_n5000.binpack
python interleave_binpacks.py filtered-leela-data/*.binpack leela93-filtered-only.binpack
python interleave_binpacks.py filtered-dfrc-data/*.binpack dfrc99-filtered-only.binpack

# Combine each distinct dataset segment into one
python interleave_binpacks.py \
    leela93-filtered-only.binpack \
    dfrc99-filtered-only.binpack \
    T80.oct2022.bestmove.binpack \
    T80.nov2022.bestmove.binpack \
    leela93-dfrc99-filt-only-T80-oct-nov.binpack
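For intuition, interleaving here mixes self-contained pieces from several sources into a single output, drawing from each source roughly in proportion to how much data it has left, so that training batches see every source throughout an epoch. A minimal Python sketch of that idea (not the actual interleave_binpacks.py, which works on binpack chunk boundaries; names here are illustrative):

import random

def interleave(sources, seed=0):
    """Merge lists of data pieces into one stream, picking the next piece from a
    source with probability proportional to how many pieces it still has."""
    rng = random.Random(seed)
    remaining = [list(pieces) for pieces in sources]
    merged = []
    while any(remaining):
        weights = [len(pieces) for pieces in remaining]
        i = rng.choices(range(len(remaining)), weights=weights)[0]
        merged.append(remaining[i].pop(0))
    return merged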
Hmm... that's what I was afraid of. Ideally we would still store played moves and set the score to 30002, but that requires doing the filtering in the lc0 rescorer. Edit: I think it can be fixed with enough work; if I have all the data available I'll try to.
Uploaded the unfiltered training data and posted 2 links at the bottom here:

Each of the 1,000 data pieces in the unfiltered datasets should map 1-to-1 with data pieces in the master datasets, which contain a mix of filtered and unfiltered pieces. Otherwise,
The raw T80 oct+nov data I used includes all the tar files containing
Created by retraining the master net on the previous best dataset with additional filtering. No new data was added.
More of the Leela-dfrc_n5000.binpack part of the dataset was pre-filtered with depth6 multipv2 search to remove bestmove captures. About 93% of the previous Leela/SF data and 99% of the SF dfrc data was filtered. Unfiltered parts of the dataset were left out. The new Leela T80 oct+nov data is the same as before. All early game positions with ply count <= 28 were skipped during training by modifying the training data loader in nnue-pytorch.
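As a rough illustration of the ply-skipping rule (the actual change was made inside the nnue-pytorch training data loader; this standalone Python sketch only shows the condition, assuming each training position's FEN is available):

SKIP_MAX_PLY = 28  # early game positions with ply count <= 28 are not used for training

def ply_from_fen(fen: str) -> int:
    # FEN fields: piece placement, side to move, castling, en passant, halfmove clock, fullmove number
    fields = fen.split()
    side_to_move = fields[1]
    fullmove = int(fields[5])
    return 2 * (fullmove - 1) + (1 if side_to_move == "b" else 0)

def skip_position(fen: str) -> bool:
    """True if the position is too early in the game to be emitted as a training sample."""
    return ply_from_fen(fen) <= SKIP_MAX_PLY

A loader applying such a predicate simply drops early positions while streaming the binpack and leaves the rest of the training pipeline unchanged.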
Trained in a similar way as recent master nets, with a different nnue-pytorch branch for early ply skipping:
For the exact training data used: https://robotmoon.com/nnue-training-data/
Details about the previous best dataset: #4295
Local testing at a fixed 25k nodes:
experiment_leela93-dfrc99-filt-only-T80-oct-nov-skip28
Local Elo: run_0/nn-epoch779.nnue : 5.1 +/- 1.5
Passed STC
https://tests.stockfishchess.org/tests/view/63adb3acae97a464904fd4e8
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 36504 W: 9847 L: 9538 D: 17119
Ptnml(0-2): 108, 3981, 9784, 4252, 127
Passed LTC
https://tests.stockfishchess.org/tests/view/63ae0ae25bd1e5f27f13d884
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 36592 W: 10017 L: 9717 D: 16858
Ptnml(0-2): 17, 3461, 11037, 3767, 14
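For orientation, the Elo difference implied by raw W/L/D totals like those above can be approximated with the usual logistic conversion; fishtest's reported bounds come from the pentanomial statistics, so this is only a rough cross-check:

import math

def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Estimate the Elo difference from raw game counts via the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return -400.0 * math.log10(1.0 / score - 1.0)

print(elo_from_wld(10017, 9717, 16858))  # LTC totals above, roughly +2.8 Elo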
bench 4015511