
Update default net to nn-60fa44e376d9.nnue #4314


Closed


@linrock (Contributor) commented Dec 30, 2022

Created by retraining the master net on the previous best dataset with additional filtering. No new data was added.

More of the Leela-dfrc_n5000.binpack part of the dataset was pre-filtered with depth6 multipv2 search to remove bestmove captures. About 93% of the previous Leela/SF data and 99% of the SF dfrc data was filtered. Unfiltered parts of the dataset were left out. The new Leela T80 oct+nov data is the same as before. All early game positions with ply count <= 28 were skipped during training by modifying the training data loader in nnue-pytorch.
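
For context, the early-ply skip amounts to a per-position predicate applied while streaming training data. Below is a minimal Python sketch of that idea, assuming each training entry exposes its game ply; the actual change lives in the data loader of the nnue-pytorch branch listed in the training command below, so all names here are illustrative.

# Illustrative sketch only; the real skip is implemented in the nnue-pytorch
# data loader (branch misc-fixes-skip-ply-lteq-28), not in this Python form.

SKIP_PLY_THRESHOLD = 28  # positions with ply <= 28 were skipped during training

def keep_for_training(entry) -> bool:
    """Return True if a training position should be fed to the trainer."""
    # 'entry.ply' is assumed to be the game ply stored alongside each position.
    return entry.ply > SKIP_PLY_THRESHOLD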

Trained in a similar way as recent master nets, with a different nnue-pytorch branch for early ply skipping:

python3 easy_train.py \
  --experiment-name=leela93-dfrc99-filt-only-T80-oct-nov-skip28 \
  --training-dataset=/data/leela93-dfrc99-filt-only-T80-oct-nov.binpack \
  --start-from-engine-test-net True \
  --nnue-pytorch-branch=linrock/nnue-pytorch/misc-fixes-skip-ply-lteq-28 \
  --gpus="0," \
  --start-lambda=1.0 \
  --end-lambda=0.75 \
  --gamma=0.995 \
  --lr=4.375e-4 \
  --tui=False \
  --seed=$RANDOM \
  --max_epoch=800 \
  --network-testing-threads 20 \
  --num-workers 6

For the exact training data used: https://robotmoon.com/nnue-training-data/
Details about the previous best dataset: #4295

Local testing at a fixed 25k nodes:
experiment_leela93-dfrc99-filt-only-T80-oct-nov-skip28
Local Elo: run_0/nn-epoch779.nnue : 5.1 +/- 1.5

Passed STC
https://tests.stockfishchess.org/tests/view/63adb3acae97a464904fd4e8
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 36504 W: 9847 L: 9538 D: 17119
Ptnml(0-2): 108, 3981, 9784, 4252, 127

Passed LTC
https://tests.stockfishchess.org/tests/view/63ae0ae25bd1e5f27f13d884
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 36592 W: 10017 L: 9717 D: 16858
Ptnml(0-2): 17, 3461, 11037, 3767, 14

bench 4015511

@Sopel97 (Member) commented Dec 31, 2022

Since this network was not trained on positions with ply <= 28, which excludes a sizeable portion of the early game, I'd like to see how it does from very shallow books.

@XInTheDark (Contributor) commented:

> Since this network was not trained on positions with ply <= 28, which excludes a sizeable portion of the early game, I'd like to see how it does from very shallow books.

Or, for research purposes, without books at all?

@Sopel97 (Member) commented Dec 31, 2022

DFRC startpos test in progress https://tests.stockfishchess.org/tests/view/63ae135a5bd1e5f27f13d9d2

@dubslow (Contributor) commented Dec 31, 2022

> Since this network was not trained on positions with ply <= 28, which excludes a sizeable portion of the early game, I'd like to see how it does from very shallow books.

That is an excellent question. As per Discord, that test is now running: the LTC-passing test has been rescheduled as an STC test on noob_2moves: https://tests.stockfishchess.org/tests/view/63aff90a8585daa0aef17b36

In addition, for the record, it has been established that this PR net, which I'll call "linrock2", has significant problems at DFRC relative to the current master net "linrock1": https://tests.stockfishchess.org/tests/view/63ae135a5bd1e5f27f13d9d2

That said, another test yesterday compared linrock1 against the previous master net, "pre-linrock", and showed that linrock1 gained nearly as much DFRC Elo as classical (1-2) compared to pre-linrock: https://tests.stockfishchess.org/tests/view/63aeb94f331d5fca51138029

So these latter two tests make it clear that something in linrock2's training, presumably the filtering of early-game positions, cost some DFRC Elo, and that no such effect occurred in linrock1, i.e. the DFRC loss is unique to linrock2.

I look forward to seeing the result of Sopel's suggested test. It should shed further light on the effects of skipping early-game positions in training; hopefully there's no real problem. I suppose that even if this noob_2moves test fails, the net should be merged regardless, but we'll have to wait and see what comes of it.

@dubslow (Contributor) commented Dec 31, 2022

Actually, for what it's worth, linrock had two other nets pass STC at about the same time and Elo as this PR, at least one of which used less aggressive early-game ply skipping:

> cool, this one [290] has somewhat less d6 pv2 filtering on the dataset and skips early plies less aggressively (skip first 16) whereas 60f skips the first 28 plies

@vondele (Member) commented Jan 1, 2023

So, summary of tests:

  1. Clear Elo gainer on our standard book (both STC and LTC)
  2. Clear loss on the DFRC book (STC: -8.99 [-12.57,-5.44])
  3. No win on noob_2moves at STC (-0.44 [-1.31,0.45]), but essentially Elo neutral.

I think 1) is good progress, and is what we measure. 2) and 3) can be expected from the fact that game plies below 28 are skipped in training, i.e. typical opening knowledge is not presented during training.

It is quite interesting that this is Elo neutral even at STC. If we let the engine think a little longer, i.e. such that depth 28 is reached, I fully expect this net to be stronger also from the opening position. DFRC is not a priority for the project, and I feel that, as long as the DFRC training data is part of the training data set, we're giving it enough weight.

I'll thus proceed with merging this net.

@vondele closed this in be9bc42 on Jan 1, 2023
@dubslow (Contributor) commented Jan 1, 2023

I agree that merging is best, although I suspect we should be a bit more hesitant in the future about ply skipping as aggressive as this net's. As stated, the other two nets also passed STC and are apparently neutral relative to this PR on LTC, and they had less early ply skipping, so the early ply skipping was not an essential part of the Elo gained here. Hopefully the next batch or two of nets will have no more early ply skipping than this one.

@vondele (Member) commented Jan 1, 2023

Sure, I would have nothing against reducing that a bit for the next Elo-gaining net :-)

vondele pushed a commit to vondele/Stockfish that referenced this pull request on Jan 2, 2023:
This is a later epoch (epoch 859) from the same experiment run that trained yesterday's master net nn-60fa44e376d9.nnue (epoch 779). The experiment was manually paused around epoch 790 and unpaused with max epoch increased to 900 mainly to get more local elo data without letting the GPU idle.

nn-60fa44e376d9.nnue is from official-stockfish#4314
nn-335a9b2d8a80.nnue is from official-stockfish#4295

Local elo vs. nn-335a9b2d8a80.nnue at 25k nodes per move:
experiment_leela93-dfrc99-filt-only-T80-oct-nov-skip28
run_0/nn-epoch779.nnue (nn-60fa44e376d9.nnue) : 5.0 +/- 1.2
run_0/nn-epoch859.nnue (nn-a3dc078bafc7.nnue) : 5.6 +/- 1.6

Passed STC vs. nn-335a9b2d8a80.nnue
https://tests.stockfishchess.org/tests/view/63ae10495bd1e5f27f13d94f
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 37536 W: 10088 L: 9781 D: 17667
Ptnml(0-2): 110, 4006, 10223, 4325, 104

An LTC test vs. nn-335a9b2d8a80.nnue was paused due to nn-60fa44e376d9.nnue passing LTC first:
https://tests.stockfishchess.org/tests/view/63ae5d34331d5fca5113703b

Passed LTC vs. nn-60fa44e376d9.nnue
https://tests.stockfishchess.org/tests/view/63af1e41465d2b022dbce4e7
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 148704 W: 39672 L: 39155 D: 69877
Ptnml(0-2): 59, 14443, 44843, 14936, 71

closes official-stockfish#4319

bench 3984365
@Sopel97 (Member) commented Jan 6, 2023

Can you provide a way to get the exact dataset you used for training?

@dav1312 (Contributor) commented Jan 6, 2023

> Can you provide a way to get the exact dataset you used for training?

This?

> For the exact training data used: robotmoon.com/nnue-training-data

@Sopel97 (Member) commented Jan 6, 2023

But how is the data combined?

@linrock (Contributor, Author) commented Jan 6, 2023

Combined with interleave_binpacks.py

@Sopel97 (Member) commented Jan 6, 2023

Would there be a way for you to provide the respective unfiltered datasets? I want to try to change the encoding to maximize compression. Also, how were the individual parts from each dataset combined? (It shows as ~100-300MB pieces on Kaggle; if also interleaved, was it a 1- or 2-step process?)

@linrock (Contributor, Author) commented Jan 6, 2023

> Would there be a way for you to provide the respective unfiltered datasets?

Sure, I can upload the exact unfiltered portion later. It's the Leela-dfrc_n5000.binpack dataset from #4100 with the Leela/Farseer and DFRC binpacks each split into 1000 pieces. Something like this:

python shuffle_binpack.py T60T70wIsRightFarseerT60T74T75T76.binpack leela-data/data-piece.binpack 1000
python shuffle_binpack.py dfrc_n5000.binpack dfrc-data-piece/dfrc-piece.binpack 1000

The Leela T80 data was effectively unfiltered. The part of this process that breaks binpack compression is the use of --nnue-best-move=true when using the lc0 rescorer to convert Leela data into .plain data files. Removing positions with castling flags significantly reduced the final binpack data file size (by ~20-30%), I suppose by removing contiguous sequences of positions from each training game.
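
For illustration only, a castling-flag filter over .plain records might look like the sketch below. It assumes the usual record layout (fen/move/score/ply/result lines terminated by an "e" line) and is a rough approximation of the idea, not the exact tooling that was used.

# Rough sketch (not the actual tooling): drop .plain records whose FEN still
# carries castling rights, keeping only positions with no castling flags ("-").

def drop_castling_positions(in_path, out_path):
    with open(in_path) as fin, open(out_path, "w") as fout:
        record, keep = [], True
        for line in fin:
            record.append(line)
            if line.startswith("fen "):
                # Field 3 of a FEN string is the castling-rights field.
                keep = line.split()[3] == "-"
            if line.strip() == "e":  # "e" terminates one training record
                if keep:
                    fout.writelines(record)
                record, keep = [], True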

> Also, how were the individual parts from each dataset combined? (It shows as ~100-300MB pieces on Kaggle; if also interleaved, was it a 1- or 2-step process?)

I believe I used a 2-step process here. I've used both and forget which one was used for this experiment run. I think whatever leads to a more shuffled binpack will yield slightly higher Elo.

The 2-step process would be:

# Re-combine filtered pieces of T60T70wIsRightFarseerT60T74T75T76.binpack and dfrc_n5000.binpack
python interleave_binpacks.py filtered-leela-data/*.binpack leela93-filtered-only.binpack
python interleave_binpacks.py filtered-dfrc-data/*.binpack dfrc99-filtered-only.binpack

# Combine each distinct dataset segment into one
python interleave_binpacks.py \
  leela93-filtered-only.binpack \
  dfrc99-filtered-only.binpack \
  T80.oct2022.bestmove.binpack \
  T80.nov2022.bestmove.binpack \
  leela93-dfrc99-filt-only-T80-oct-nov.binpack

@Sopel97 (Member) commented Jan 6, 2023

> The part of this process that breaks binpack compression is the use of --nnue-best-move=true when using the lc0 rescorer to convert Leela data into .plain data files.

Hmm... that's what I was afraid of. Ideally we would still store played moves and set the score to 30002, but that requires doing the filtering in the lc0 rescorer.

Edit: I think it can be fixed with enough work; if I have all the data available I'll try to.

@linrock (Contributor, Author) commented Jan 11, 2023

Uploaded the unfiltered training data and posted 2 links at the bottom here:
https://robotmoon.com/nnue-training-data/

Each of the 1,000 data pieces in the unfiltered datasets should map 1-to-1 with data pieces in the master datasets, which contain a mix of filtered and unfiltered pieces. Otherwise, Leela-dfrc_n5000.binpack alone should be equivalent to the unfiltered (not d6pv2 filtered) parts of the datasets.

> Edit: I think it can be fixed with enough work; if I have all the data available I'll try to.

The raw T80 oct+nov data I used includes all the tar files with 202210 and 202211 in their filenames here:
https://storage.lczero.org/files/training_data/test80/
