New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhanced verify search for better zugzwang detecting #1338
Enhanced verify search for better zugzwang detecting #1338
Conversation
by disabling null-move-pruning for the side to move for first part of the remaining search tree. This helps to better recognize zugzwang. STC: LLR: 2.96 (-2.94,2.94) [-3.00,1.00] Total: 18220 W: 3379 L: 3253 D: 11588 http://tests.stockfishchess.org/tests/view/5a2fa6460ebc590ccbb8bc2f LTC: LLR: 2.96 (-2.94,2.94) [-3.00,1.00] Total: 41899 W: 5359 L: 5265 D: 31275 http://tests.stockfishchess.org/tests/view/5a2fcf440ebc590ccbb8bc47 For further detail see commit notes and discussion at 6401a80 bench: 5776193
This patch is clearly not a simplification, indeed it sensibly complicates the code. So I suppose you consider this a bug fix (otherwise you would have run the test with [0, 5]). This opens a (yet another) discussion on what is a bug fix. Adding a specific endgame because in a bunch of positions it works better then master is it a bug fix? Adding an ad-hoc rule in evaluation because in some reported positions SF plays better is it a bug fix? Traditionally in SF development we reserved the tag bugfix to patches that address a very specific (in many case) technical and code related fix. We purposely avoided tagging as bug fix patches that change evaluation/search behavior to address a position or a set of positions. In particular we traditionally require a test for the latter kind of patches to prove the change in behavior yields to a measurable gain. Measurable ELO gain is really the key here and the pillar above which all the SF development has been based along these years. Before SF was common for authors to add rules/changes supported by intuition or by set of position tests. I think the biggest improvement that SF brought to the field has been exactly this: only real games tests and only tests results as patch metric. It was not SF the first engine development using this (maybe Rybka), but I think was among the firsts and clearly has been instrumental in setting this method as the current standard and accepted one. If I let this patch to go in with a [-3, 1] than anything can go in and we open the door to a dangerous approach where patches are evaluated by consensus/intuition and not by objective and standard metrics. I really would like to keep the sound and time proven approach that if a patch is not a simplification than it has to prove itself with a real games test according to standard rules. |
The tests with the suite prove that the patch is an improvement in objective way. In this specific case, it is known that you will not have around 5 elo patch to pass a sprt with standard boundaries. |
In my opinion null move verification is like parts of the endgame code. It will pass a [-3,1] test if we remove the code (as has been shown), but an improvement will probably never pass a [0,5] or even a [0,4] test, because the code is only useful in specific situations and at best neutral in most cases. Therefore we should test for these specific situations while ensuring it does not cause overal regression. |
Marco, I don't see how the existence of verification is compatible with your argument. It adds code and doesn't gain elo. So either it shouldn't be there, or, if it should be there because it performs a function, then this patch improves that very function. |
Agree with @crossbr . If SF has to have the code (by exception) just implement the enhanced one already tested as better...if it is not the case make the non-regression sprt and remove it all together. |
As long as this pull request is open, personally I would like it if Joona and/or Gary could take a look at the patch? No rush of course, unless Marco intends to close this soon. Personally also, I would not mind myself if the patch was not approved -in some form!- but I would feel sorry for Guenther and all those people that actually use Stockfish. But it would make other versions of Stockfish less interesting so that is a very selfish reason... In other words, I really think it is time we do something about this before pushing for a new release. I just opened a thread to collect some testpositions https://groups.google.com/d/msg/fishcooking/3di82Xke8nc/_CTOCkv0CQAJ |
Verification search itself can be removed with [-3;1], as was tested by Joost. |
@mcostalba on Feb 12, 2014 you (re)-inserted verification-search without [0,5] test. From that I deduct that verification-search by itself as whole thing is indeed considered a bug fix. Please allow me to foreplay you following hypothetical scenario: 1. The old verification search 2. The proposed patch Which solution is better in the end is a matter of taste IMO. I understand that in your role as a maintainer you always have to pinpoint at the governing rules and have to be reluctant and skeptical about such a patch. I fully respect the guidelines and I agree that they are the pillars for success. Solely I don't think the standard rules are the proper tool to take a correct decision on this specific case, I also don't think commiting this patch would open a new can of worms, but of course it's not up to me to judge the consequences ;-) |
Also intuitively speaking this patch should scale better with longer TC because it mostly is effective in endgame. |
Ok I will merge the patch, I don't want to make the commit of a patch a personal thing, that's why there are rules. But maybe with 2018 it is time to start a more liberal approach at what will be committed. personally I think that in this way a lot of stuff that should remain out, will go in and the quality of the source code will get worse, but OTH a relaxing in the committing rules is maybe beneficial. It is anyhow a new thing. |
@mcostalba |
Thanks, I will fix bench at next commit. |
Amazing leadership skills and judgment exhibited by @mcostalba here. As a developer myself I always look forward to reading and learning from comments from you here in all the PRs that you analyze and commit. Being flexible in a way that leads to improvement of Stockfish is a good thing and this approach shud be embraced whenever a patch makes the engine better from the previous state. |
Hi @mcostalba , I liked that you commited the patch, as it solves a real problem. I am bit surprised by your other statements; SF has been doing quite well in the past years, so which problems are you trying to solve by becoming more liberal in committing? |
@Stefano80 in some way going more liberal will increase problems not reduce them :-) I see that the level of discussion here on github on any new patch is greatly increased compared to the past. Almost any new patch gets commented and many of them are also deeply analyzed and people share their opinions with valid arguments. This is a good thing (that was not existing in the past where I or Joona simply merged or drop a PR: any discussion was mainly between maintainers and author and also in that case there were very few discussions). I think now is better than in the past, and I want to encourage this. This is my main reason to go more liberal: it is not a technical thing, it is a community thing. |
1. avoid recursive call of verification. For the interested side to move recursion makes no sense. For the other side it could make sense in case of mutual zugzwang, but I was not able to figure out any concrete problematic position. Allows the removal of 2 local variables. 2. avoid further reduction by removing R += ONE_PLY; Benchmark with zugzwang-suite (see official-stockfish#1338), max 45 secs per position: Patch solves 33 out of 37 Master solves 31 out of 37 STC: LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 76188 W: 13866 L: 13840 D: 48482 http://tests.stockfishchess.org/tests/view/5a5612ed0ebc590297da516c LTC: LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 40479 W: 5247 L: 5152 D: 30080 http://tests.stockfishchess.org/tests/view/5a56f7d30ebc590299e4550e bench: 5340015
1. avoid recursive call of verification. For the interested side to move recursion makes no sense. For the other side it could make sense in case of mutual zugzwang, but I was not able to figure out any concrete problematic position. Allows the removal of 2 local variables. 2. avoid further reduction by removing R += ONE_PLY; Benchmark with zugzwang-suite (see official-stockfish#1338), max 45 secs per position: Patch solves 33 out of 37 Master solves 31 out of 37 STC: LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 76188 W: 13866 L: 13840 D: 48482 http://tests.stockfishchess.org/tests/view/5a5612ed0ebc590297da516c LTC: LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 40479 W: 5247 L: 5152 D: 30080 http://tests.stockfishchess.org/tests/view/5a56f7d30ebc590299e4550e bench: 5340015
1. avoid recursive call of verification. For the interested side to move recursion makes no sense. For the other side it could make sense in case of mutual zugzwang, but I was not able to figure out any concrete problematic position. Allows the removal of 2 local variables. 2. avoid further reduction by removing R += ONE_PLY; Benchmark with zugzwang-suite (see official-stockfish#1338), max 45 secs per position: Patch solves 33 out of 37 Master solves 31 out of 37 STC: LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 76188 W: 13866 L: 13840 D: 48482 http://tests.stockfishchess.org/tests/view/5a5612ed0ebc590297da516c LTC: LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 40479 W: 5247 L: 5152 D: 30080 http://tests.stockfishchess.org/tests/view/5a56f7d30ebc590299e4550e bench: 5340015
1. avoid recursive call of verification. For the interested side to move recursion makes no sense. For the other side it could make sense in case of mutual zugzwang, but I was not able to figure out any concrete problematic position. Allows the removal of 2 local variables. 2. avoid further reduction by removing R += ONE_PLY; Benchmark with zugzwang-suite (see official-stockfish#1338), max 45 secs per position: Patch solves 33 out of 37 Master solves 31 out of 37 STC: LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 76188 W: 13866 L: 13840 D: 48482 http://tests.stockfishchess.org/tests/view/5a5612ed0ebc590297da516c LTC: LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 40479 W: 5247 L: 5152 D: 30080 http://tests.stockfishchess.org/tests/view/5a56f7d30ebc590299e4550e bench: 5340015
1. avoid recursive call of verification. For the interested side to move recursion makes no sense. For the other side it could make sense in case of mutual zugzwang, but I was not able to figure out any concrete problematic position. Allows the removal of 2 local variables. 2. avoid further reduction by removing R += ONE_PLY; Benchmark with zugzwang-suite (see #1338), max 45 secs per position: Patch solves 33 out of 37 Master solves 31 out of 37 STC: LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 76188 W: 13866 L: 13840 D: 48482 http://tests.stockfishchess.org/tests/view/5a5612ed0ebc590297da516c LTC: LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 40479 W: 5247 L: 5152 D: 30080 http://tests.stockfishchess.org/tests/view/5a56f7d30ebc590299e4550e bench: 5340015
by disabling null-move-pruning for the side to move for first part of the remaining search tree. This helps to better recognize zugzwang. STC: LLR: 2.96 (-2.94,2.94) [-3.00,1.00] Total: 18220 W: 3379 L: 3253 D: 11588 http://tests.stockfishchess.org/tests/view/5a2fa6460ebc590ccbb8bc2f LTC: LLR: 2.96 (-2.94,2.94) [-3.00,1.00] Total: 41899 W: 5359 L: 5265 D: 31275 http://tests.stockfishchess.org/tests/view/5a2fcf440ebc590ccbb8bc47 For further detail see commit notes and discussion at pb00068@6401a80 bench: 5776193
1. avoid recursive call of verification. For the interested side to move recursion makes no sense. For the other side it could make sense in case of mutual zugzwang, but I was not able to figure out any concrete problematic position. Allows the removal of 2 local variables. 2. avoid further reduction by removing R += ONE_PLY; Benchmark with zugzwang-suite (see official-stockfish#1338), max 45 secs per position: Patch solves 33 out of 37 Master solves 31 out of 37 STC: LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 76188 W: 13866 L: 13840 D: 48482 http://tests.stockfishchess.org/tests/view/5a5612ed0ebc590297da516c LTC: LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 40479 W: 5247 L: 5152 D: 30080 http://tests.stockfishchess.org/tests/view/5a56f7d30ebc590299e4550e bench: 5340015
This reverts commit b53239d.
by disabling null-move-pruning for the side to move for first part of the remaining search tree. This helps to better recognize zugzwang. STC: LLR: 2.96 (-2.94,2.94) [-3.00,1.00] Total: 18220 W: 3379 L: 3253 D: 11588 http://tests.stockfishchess.org/tests/view/5a2fa6460ebc590ccbb8bc2f LTC: LLR: 2.96 (-2.94,2.94) [-3.00,1.00] Total: 41899 W: 5359 L: 5265 D: 31275 http://tests.stockfishchess.org/tests/view/5a2fcf440ebc590ccbb8bc47 For further detail see commit notes and discussion at pb00068@6401a80 bench: 5776193
1. avoid recursive call of verification. For the interested side to move recursion makes no sense. For the other side it could make sense in case of mutual zugzwang, but I was not able to figure out any concrete problematic position. Allows the removal of 2 local variables. 2. avoid further reduction by removing R += ONE_PLY; Benchmark with zugzwang-suite (see official-stockfish#1338), max 45 secs per position: Patch solves 33 out of 37 Master solves 31 out of 37 STC: LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 76188 W: 13866 L: 13840 D: 48482 http://tests.stockfishchess.org/tests/view/5a5612ed0ebc590297da516c LTC: LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 40479 W: 5247 L: 5152 D: 30080 http://tests.stockfishchess.org/tests/view/5a56f7d30ebc590299e4550e bench: 5340015
Update default net to nn-c3ca321c51c9.nnue
Preface:
A recent test and an older one both suggest that verification search brings nothing in terms of Elo's and if we stick to standard rules it could be simplified away.
However it looks like the community was afraid of SF being prone to zugzwang, so verification search was introduced as kind of remedy against this weakness.
Current verification search implementation is weak.
Anyhow we continue to see loosing SF due to zugzwang from time to time since the current verification search implementation IMO is weak. Why is it weak?
It suppresses null-move-pruning in current ply but nothing is done to avoid its recursive reuse in deeper plies for the interested side to move, so this simply allows SF to move the problem behind it's search-horizon when the zugzwang motif implies a larger sequence of forced 'worsening' moves.
This patch encounters this by disabling null-move-pruning for the side to move for first part of
the remaining search tree systematically. This helps to better recognize zugzwang as shown in following EPD suite Zugzwang-Allgeuer.zip which is an extended suite of the one reported at chessprogramming wiki.
Benchmarks made with pgo-builds, 1 thread, 512 MB hash, max. 40 seconds per position:
The 6 critical positions the patch is able to resolve within 40 secs (and the master not) are the following ones:
For further details about the patch and choosen test bounds please see commit notes and discussion at this commit .
Tested with no-regression bounds
STC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 18220 W: 3379 L: 3253 D: 11588
http://tests.stockfishchess.org/tests/view/5a2fa6460ebc590ccbb8bc2f
LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 41899 W: 5359 L: 5265 D: 31275
http://tests.stockfishchess.org/tests/view/5a2fcf440ebc590ccbb8bc47
bench: 5776193