Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verification Formula Simplification #5026

Closed
wants to merge 1 commit into from

Conversation

FauziAkram
Copy link
Contributor

@FauziAkram FauziAkram commented Jan 31, 2024

Simplifying the verification formula, which was added in order to better detect zugzwang positions.

Passed STC:
LLR: 2.93 (-2.94,2.94) <-1.75,0.25>
Total: 60160 W: 15529 L: 15336 D: 29295
Ptnml(0-2): 241, 6702, 16005, 6887, 245
https://tests.stockfishchess.org/tests/view/65b78dffc865510db027530e

Passed LTC:
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 159414 W: 39970 L: 39891 D: 79553
Ptnml(0-2): 95, 17369, 44694, 17460, 89
https://tests.stockfishchess.org/tests/view/65b80835c865510db0275b73

Benchmark with zugzwang-suite (see #1338 ), single core, max 45 secs per position:
Patch solves 31 out of 37 - (After the study that follows changed to 32 out of 37)
Master solves 32 out of 37

However, there is alot to be said about this epd suite after I studied it and our master and this patch interaction with it.

Below are the important notes that I was able to deduct.

Position #17: Both variations were able to solve it, but tbh master solved it much quicker than patch.
Position #18 : None of the two variations were able to solve it
Position #19 : None of the two variations were able to solve it
Position #21 : The epd solution here is wrong, as it states that Ke3 and Kd3 are both valid solutions, but in reality, only Kd3 leads to white to win, while Ke3 leads only to draw, and by the way both master and patch didn't see the win, master opted for Ke3 with eval of 0.00 and patch opted for Nc3 also with eval 0.00, but none of them was able to see that Kd3 wins (And epd should be adjusted to remove Ke3 as a valid solution)
Position #28 : Both variations saw the win, but both of them opted to waste two extra plys with a repetition, opting for Nc3+ Kb4 Nd5+ Ka4, and then going for the valid solution with f4, and to be honest I don't see anything wrong with it, as long as the win was foreseen.
Position #33 : Both variations similarly opted for Kd7, which similarly to position 28, also leads to a win, but with a few extra plys, and both of them get stuck for a long time when analyzing this position (@peregrineshahin so, since you are investigating this SF issue, you can also use this position as testing material)
Position #36: Both engines similarly opted for Kb2, with fail high evaluations, and tbh even though I didn't do a deep study, I have the feeling that also here the epd solutions are not correct, as they only give Kb3 and Rf5 as valid solutions, but based on a quick check, it seems that also Kb2 is a valid solution after all.

Anyway in conclusion, I didn't observe huge differences in zugzwang detection for the proposed patch, the only two negative points I observed were:

  1. Position number 17 slower solution
  2. Position number 33 the patch version seems to me that it got stuck for a longer time than master, but I'm not sure also here.

Files:
Analyses.log
Analyses.txt

bench: 1492957
@FauziAkram FauziAkram changed the title Simplification Verification Formula Simplification Feb 1, 2024
@pb00067
Copy link

pb00067 commented Feb 2, 2024

Hm, I'm a bit reluctant to see this as a simplification.
Sure it's a simplification for computation, but from logical point of view it is not simplier IMO.
Both in master and patch the verification search is done with search depth = depth - R (search.cpp line 806)
So currently in master there's a simple and understandable rule:
-during verification search the nm-pruning gets disabled for the first 3/4 plies of the nominal search tree.
The patch turns this rule into something more complicated imo.
Anyhow i'm OK with this patch, important is that the zugzwang detection improves with higher search depth and this seem still to be guaranteed.

@vondele
Copy link
Member

vondele commented Feb 3, 2024

I'm also thinking along the lines of @pb00067 this is not really simplifying the logic here. NMP is using a depth - R as the base variable. So what remains is just a parameter change.

@peregrineshahin
Copy link
Contributor

peregrineshahin commented Feb 3, 2024

I have some concerns about the logic in master for depth - R getting negative rather in the patch the value is always theoretically positive, I wonder about the reason how practically this doesn't happen.. as I'm getting a dbg_hit_on of 0% for R > depth on that particular line with very high depths suggesting that somehow for such high R relative to depth we practically never do the verification probably because of

            if (thisThread->nmpMinPly || depth < 15)
                return nullValue;

It would be probably more robust to have something like the bellow to prevent it from becoming negative
thisThread->nmpMinPly = ss->ply + 3 * (std::max(0, depth - R)) / 4;
but I'm not very sure if my concern is practically valid.

@FauziAkram
Copy link
Contributor Author

FauziAkram commented Feb 3, 2024

That's ok for me, if this PR get closed, I will try again the removal of the *3/4 parameter in the formula, as cj also suggested in Discord.

However, one important thing that I learned while diving into this, is that the zugzwang test suite that we have is not very useful if you just run it blindly because:

  1. It contains some mistakes.
  2. In other cases SF chooses the right solution, but without realizing that it's a winning line.
  3. In other cases SF chooses a different move, going for a single repetition before going to the correct move, but in the analysis it shows that he is able to identify the winning strategy.

So running an engine through the test suite reporting only the number of final solutions found given by the GUI is not a good practice.
The best would be to observe SF behavior on each position for the two different patches that we are testing (master, and the new potential patch).

@vondele vondele closed this Feb 3, 2024
@pb00067
Copy link

pb00067 commented Feb 5, 2024

@peregrineshahin

I wonder about the reason how practically this doesn't happen.

because it's mathematically prof:
-verification is only done with depth >= 16
-the first summand in std::min(int(eval - beta) / 144, 6) + depth / 3 + 4 is maximal 6

so we have 6 + 16/3 + 4
which is 15 meaning that R can reach maximal value of 15 while depth is minimum 16
so depth - R is always >= 1

@peregrineshahin
Copy link
Contributor

peregrineshahin commented Feb 5, 2024

@peregrineshahin

I wonder about the reason how practically this doesn't happen.

because it's mathematically prof: -verification is only done with depth >= 16 -the first summand in std::min(int(eval - beta) / 144, 6) + depth / 3 + 4 is maximal 6

so we have 6 + 16/3 + 4 which is 15 meaning that R can reach maximal value of 15 while depth is minimum 16 so depth - R is always >= 1

Makes sense, probably the tune going from 14 to 15 to 16 helped you in proving this :P

@pb00067
Copy link

pb00067 commented Feb 5, 2024

@FauziAkram

to point 1:
the zugzwang test suite we use is AFAIK basically an expansion of the allgeuer-zugzwang testsuite which I found linked on chessprogramming wiki. An yes is contains some mistakes or at least some positions which imo aren't really based on zugzwang motif. We could build up a more correct and meaningful testsuite but it requires some effort.

to point 2:
generally speaking engines aim to find the best move and that's also enough.
we can't pretend the engine to predict the exact PV until the game is converted.
But I agree the suite, should only consist of positions where the zugzwang motiv is decisive to convert the game.

to point 3:
these positions have to be figured out one by one.
In a epd-position is it also possible to define multiple best continuations, so these epd-positions can probably be adapted to allow alternative moves.

@pb00067
Copy link

pb00067 commented Feb 7, 2024

@peregrineshahin

Makes sense, probably the tune going from 14 to 15 to 16 helped you in proving this :P
By the way, I hate it when tune sessions exploit this parameter by increasing it to help their patches passing.
Imo tuning sessions should leave this parameter as it is.

@peregrineshahin
Copy link
Contributor

I do also think that two tunes going from 14 to 15 to 16 have helped passing those VLTC tunes

By the way, I hate it when tune sessions exploit this parameter by increasing it to help their patches passing.
Imo tuning sessions should leave this parameter as it is.

@FauziAkram
Copy link
Contributor Author

which parameter are talking about, that changed with time from 14 to 15 to 16 ?

@peregrineshahin
Copy link
Contributor

if (thisThread->nmpMinPly || depth < 16) return nullValue:

Which means do not do verification search for depth < 16 and return the null move search value immediatly.

@pb00067
Copy link

pb00067 commented Feb 7, 2024

which parameter are talking about, that changed with time from 14 to 15 to 16 ?

Yes, that one. No hard feelings, but I think this parameter should be left out in tuning sessions,
otherwise it will be shifted up and up with time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants