-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix dynamic contempt for MultiPV #1491
Fix dynamic contempt for MultiPV #1491
Conversation
Well, that was fast ! |
Use `rootMoves[PVIdx].previousScore` instead of `bestValue` for dynamic contempt. This is equivalent for MultiPV=1 (bench remained the same, even for higher depths), but more correct for MultiPV. STC (MultiPV=3): LLR: 2.95 (-2.94,2.94) [0.00,5.00] Total: 2657 W: 1079 L: 898 D: 680 http://tests.stockfishchess.org/tests/view/5aaa47cb0ebc590297330403 LTC (MultiPV=3): LLR: 2.95 (-2.94,2.94) [0.00,5.00] Total: 2390 W: 874 L: 706 D: 810 http://tests.stockfishchess.org/tests/view/5aaa593a0ebc59029733040b No functional change for MultiPV=1
0863cde
to
a09b817
Compare
Aha, I had no idea dyn. contempt might not be working correctly with Multi PV even though I have been using it for a while. Seems like a good catch, thanks! |
Approved, under the condition that we do extra verification steps to catch any decrease of quality in the multiPV moves and MultiPV lines. See for instance this thread in early February about a MultiPV bug not caught by a standard test in the framework: #1392 So we have to be extra careful |
I will repeat what I said last time : I think the easiest way to do that is to run MultiPV tests at a reduced ["Skill"] setting, for example 17 , so that the secondary lines are sometimes played. |
@snicolet I agree. I know very little about MultiPV search (and actually chess programming in general :) but since I didn't change any inner logic of the search, just tweaked the evaluation, I think SPRT tests should cover it. Either way, I welcome anyone to do/suggest any tests that seem relevant (@syzygy1, perhaps?). Due to the extremely low draw rate with MultiPV, I was wondering if I should also test at higher TC (say, 180+1.8) to verify that the test conclusiveness wasn't influenced by the low draw rate... @Hanamuke Good idea, I'll try that at LTC. |
@WOnder93 Just so you know, enabling skill sets MultiPV at 4 if it was lower, so your test actually runs at MultiPV=4. Of course it's not a problem since it's the same for base and test. |
@Hanamuke Oh, you're right! I resubmitted the test to avoid confusion (since the test wasn't running yet). |
@WOnder93 joergoser is probably right, it would explain the observation. Also it's normal that skill would reduce the observed Elo worth of a patch, maybe skill=19 would bring enough diversity i don't know. |
Ah, so it is not just 'not needed', but 'not working', I didn't realize that... I will decrease the priority of the running test (for now) and start a new one. |
OK, so once I fixed the misquoted option, the test passed really fast:
http://tests.stockfishchess.org/tests/view/5aabccee0ebc5902997ff006 |
To be on the safe side, I ran a VLTC test with MultiPV=3 at 240+2.4" and it also got green quickly:
http://tests.stockfishchess.org/tests/view/5aaf983e0ebc5902a182131f As for the concerns about breaking the PV output, I will just repeat that this patch only tweaks the dynamic contempt value, and thus any regression should be revealed by the usual SPRT tests. @snicolet I am now waiting for your verdict :) |
@WOnder93 • from the starting position for White This is what Ronald did in #1392 to catch the bug of the previous MultiPV improvement :-) |
@snicolet I don't know what exactly I should look for but at least I don't see any obvious problem in the output: Start position, depth 12, master:
Start position, depth 12, patch:
Start position, depth 24, master:
Start position, depth 24, patch:
Full output attached for further inspection: |
@WOnder93 Thanks! |
FYI, this fishcooking thread made me curious so I ran a few more tests around dynamic contempt & MultiPV. It turns out that dynamic contempt (as implemented in master) is a significant regression for MultiPV (test). With the fix in this PR, it changes from regression to improvement (test). The same holds for multithreaded MultiPV (test for master, test for this PR). I found another significant improvement in multi-threaded MultiPV (per-thread dynamic contempt) – I will submit it in a separate PR after this one is merged. Note: although the ELO differences seem huge, they are probably inflated by the nature of Skill Level / MultiPV search, so I don't think they can be reasonably compared with classic ELO strength. |
Use rootMoves[PVIdx].previousScore instead of bestValue for dynamic contempt. This is equivalent for MultiPV=1 (bench remained the same, even for higher depths), but more correct for MultiPV. STC (MultiPV=3): LLR: 2.95 (-2.94,2.94) [0.00,5.00] Total: 2657 W: 1079 L: 898 D: 680 http://tests.stockfishchess.org/tests/view/5aaa47cb0ebc590297330403 LTC (MultiPV=3): LLR: 2.95 (-2.94,2.94) [0.00,5.00] Total: 2390 W: 874 L: 706 D: 810 http://tests.stockfishchess.org/tests/view/5aaa593a0ebc59029733040b VLTC 240+2.4 (MultiPV=3): LLR: 2.96 (-2.94,2.94) [0.00,5.00] Total: 2399 W: 861 L: 694 D: 844 http://tests.stockfishchess.org/tests/view/5aaf983e0ebc5902a182131f LTC (MultiPV=4, Skill Level=17): LLR: 2.95 (-2.94,2.94) [0.00,5.00] Total: 747 W: 333 L: 175 D: 239 http://tests.stockfishchess.org/tests/view/5aabccee0ebc5902997ff006 Note: although the ELO differences seem huge, they are inflated by the nature of Skill Level / MultiPV search, so I don't think they can be reasonably compared with classic ELO strength. See #1491 for some verifications searches with MultiPV = 10 at depths 12 and 24 from the starting position and the position after 1.e4, comparing the outputs of the full PV by the old master and by this patch. No functional change for MultiPV=1
Merged via 367304e, thanks! |
Use rootMoves[PVIdx].previousScore instead of bestValue for dynamic contempt. This is equivalent for MultiPV=1 (bench remained the same, even for higher depths), but more correct for MultiPV. STC (MultiPV=3): LLR: 2.95 (-2.94,2.94) [0.00,5.00] Total: 2657 W: 1079 L: 898 D: 680 http://tests.stockfishchess.org/tests/view/5aaa47cb0ebc590297330403 LTC (MultiPV=3): LLR: 2.95 (-2.94,2.94) [0.00,5.00] Total: 2390 W: 874 L: 706 D: 810 http://tests.stockfishchess.org/tests/view/5aaa593a0ebc59029733040b VLTC 240+2.4 (MultiPV=3): LLR: 2.96 (-2.94,2.94) [0.00,5.00] Total: 2399 W: 861 L: 694 D: 844 http://tests.stockfishchess.org/tests/view/5aaf983e0ebc5902a182131f LTC (MultiPV=4, Skill Level=17): LLR: 2.95 (-2.94,2.94) [0.00,5.00] Total: 747 W: 333 L: 175 D: 239 http://tests.stockfishchess.org/tests/view/5aabccee0ebc5902997ff006 Note: although the ELO differences seem huge, they are inflated by the nature of Skill Level / MultiPV search, so I don't think they can be reasonably compared with classic ELO strength. See official-stockfish#1491 for some verifications searches with MultiPV = 10 at depths 12 and 24 from the starting position and the position after 1.e4, comparing the outputs of the full PV by the old master and by this patch. No functional change for MultiPV=1
Use
rootMoves[PVIdx].previousScore
instead ofbestValue
fordynamic contempt. This is equivalent for MultiPV=1 (bench remained the
same, even for higher depths), but more correct for MultiPV.
STC (MultiPV=3):
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 2657 W: 1079 L: 898 D: 680
http://tests.stockfishchess.org/tests/view/5aaa47cb0ebc590297330403
LTC (MultiPV=3):
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 2390 W: 874 L: 706 D: 810
http://tests.stockfishchess.org/tests/view/5aaa593a0ebc59029733040b
VLTC 240+2.4 (MultiPV=3):
LLR: 2.96 (-2.94,2.94) [0.00,5.00]
Total: 2399 W: 861 L: 694 D: 844
http://tests.stockfishchess.org/tests/view/5aaf983e0ebc5902a182131f
LTC (MultiPV=4, Skill Level=17):
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 747 W: 333 L: 175 D: 239
http://tests.stockfishchess.org/tests/view/5aabccee0ebc5902997ff006
No functional change for MultiPV=1