Quantize eval to multiples of 16. #2733

Closed
wants to merge 1 commit into from

vondele
Member

@vondele vondele commented Jun 12, 2020

Remove some excess precision, which helps search.

Effectively reintroduces 45dbd9c,
with a slightly different context.

passed STC
LLR: 2.97 (-2.94,2.94) {-0.50,1.50}
Total: 197032 W: 37938 L: 37462 D: 121632
Ptnml(0-2): 3359, 22994, 45446, 23246, 3471
https://tests.stockfishchess.org/tests/view/5ee0c228f29b40b0fc95ae53

passed LTC
LLR: 2.94 (-2.94,2.94) {0.25,1.75}
Total: 77696 W: 9970 L: 9581 D: 58145
Ptnml(0-2): 530, 7075, 23311, 7340, 592
https://tests.stockfishchess.org/tests/view/5ee21426f29b40b0fc95af43

passed LTC SMP
LLR: 2.96 (-2.94,2.94) {0.25,1.75}
Total: 64136 W: 7425 L: 7091 D: 49620
Ptnml(0-2): 345, 5416, 20228, 5718, 361
https://tests.stockfishchess.org/tests/view/5ee387bbf29b40b0fc95b04c

Bench: 4562134
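
For readers unfamiliar with the idea, quantizing the eval to multiples of 16 can be sketched as below. This is a minimal illustration, not the merged patch: the names `Grain` and `quantize_eval` are hypothetical, and rounding by integer division (truncation toward zero) is an assumption.

```cpp
// Hypothetical helper, not the actual Stockfish source: snap an evaluation
// to a multiple of Grain using integer division (truncation toward zero).
constexpr int Grain = 16;  // grain size taken from the PR title

constexpr int quantize_eval(int v) {
    // C++11 and later guarantee that integer division truncates toward zero,
    // so positive values round down and negative values round up.
    return (v / Grain) * Grain;
}
```

Under this assumption `quantize_eval(37)` gives 32 and `quantize_eval(-37)` gives -32, so every eval from -15 to 15 collapses to 0.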

Remove some excess precision, which helps search.

Effectively reintroduces 45dbd9c,
with a slightly different context.

passed STC
LLR: 2.97 (-2.94,2.94) {-0.50,1.50}
Total: 197032 W: 37938 L: 37462 D: 121632
Ptnml(0-2): 3359, 22994, 45446, 23246, 3471
https://tests.stockfishchess.org/tests/view/5ee0c228f29b40b0fc95ae53

passed LTC
LLR: 2.94 (-2.94,2.94) {0.25,1.75}
Total: 77696 W: 9970 L: 9581 D: 58145
Ptnml(0-2): 530, 7075, 23311, 7340, 592
https://tests.stockfishchess.org/tests/view/5ee21426f29b40b0fc95af43

running LTC SMP
https://tests.stockfishchess.org/tests/view/5ee0c228f29b40b0fc95ae53

Bench: 4562134
@ssj100

ssj100 commented Jun 12, 2020

Correct link for the "running LTC SMP":
https://tests.stockfishchess.org/tests/view/5ee387bbf29b40b0fc95b04c

@jhellis3
Contributor

I think the question is: are the consequences worth 1.5 Elo?
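
As a rough cross-check on the size of the gain, a point estimate of the Elo difference can be derived from the win/loss/draw totals quoted in the PR description. The sketch below uses the standard logistic Elo formula; the function name `elo_from_wld` is hypothetical, and this is not necessarily how the testing framework computes its own figures.

```cpp
#include <cmath>

// Logistic Elo difference implied by a win/loss/draw record.
// Assumes the score fraction is strictly between 0 and 1.
double elo_from_wld(long wins, long losses, long draws) {
    const double games = static_cast<double>(wins + losses + draws);
    const double score = (wins + 0.5 * draws) / games;  // fraction of points won
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

With the totals above, `elo_from_wld(37938, 37462, 121632)` comes to roughly 0.8 for the STC run and `elo_from_wld(9970, 9581, 58145)` to roughly 1.7 for the LTC run. These are point estimates only and carry sizeable uncertainty.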

@adentong

adentong commented Jun 12, 2020

@jhellis3 Technically speaking, every single patch has consequences. Curious why you think the consequences of this are severe enough to overlook the apparent Elo gain?

@jhellis3
Contributor

Well, anyone who gives it some genuine thought should be able to come up with at least a few legitimate concerns....

@vondele
Member Author

vondele commented Jun 12, 2020

Feel free to share your concerns. Some data especially would help us understand what the problem is.

@jhellis3
Contributor

I'm not the one who makes the decision on what gets merged.

@jhellis3
Contributor

jhellis3 commented Jun 12, 2020

And I also can't provide data from the future.... What I can say is Stockfish has gained considerable Elo in the last 6+ years.

@AlexandreMasta

AlexandreMasta commented Jun 12, 2020

And I also can't provide data from the future.... What I can say is Stockfish has gained considerable Elo in the last 6+ years.

I think that what Jhellis wants to say is that obfuscating the eval by rounding it can damage search gains in the future. In the long run, the more accurate the eval is, the more impactful search speedups and improvements become. BTW, what is the final goal of an engine? Isn't it to achieve a probably perfect evaluation? How will you achieve this goal by adding noise to it? At some point this "trick" will have to be removed to reach new limits.

Maybe this is the rationale. Maybe I'm totally wrong. But whatever. I just tried to understand what he was saying.

@ddugovic

For candidate move-ordering purposes, this complication sounds similar in effect to adding a small pseudo-random number to each evaluation, but without the advantage of being able to seed the PRNG (in order to expose butterfly effects).

@adentong

@AlexandreMasta I can't say I agree. An engine should be able to play the perfect GAME, regardless of what the eval looks like.

@adentong

Even the SMP test passed now. Solid elogainer this is.

@snicolet snicolet closed this in 4d65761 Jun 13, 2020
@snicolet
Member

Merged via 4d65761, congrats!

@vdbergh
Contributor

vdbergh commented Jun 13, 2020

For candidate move-ordering purposes, this complication sounds similar in effect to adding a small pseudo-random number to each evaluation,

@ddugovic Yes that's right. Quantization noise is usually modeled as white noise. I wonder though if 16 is already so large that there might be a non-negligible correlation between the signal and the noise.
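
To make the "quantization noise" picture concrete, the sketch below computes the error introduced by a grain of 16. It assumes rounding by integer division (truncation toward zero), which may not match the merged code; `quantization_error` and `mean_abs_error` are hypothetical names.

```cpp
#include <cstdlib>

constexpr int Grain = 16;  // assumed grain, from the PR title

// Quantization error under truncation toward zero:
// the original value minus its quantized value.
constexpr int quantization_error(int v) {
    return v - (v / Grain) * Grain;  // equals v % Grain in C++11 and later
}

// Mean absolute error over all evals in [-limit, limit],
// as a rough measure of the noise scale.
double mean_abs_error(int limit) {
    long total = 0;
    for (int v = -limit; v <= limit; ++v)
        total += std::abs(quantization_error(v));
    return static_cast<double>(total) / (2 * limit + 1);
}
```

Under this rounding choice the error always has the same sign as the eval itself (for example `quantization_error(37)` is 5 and `quantization_error(-37)` is -5), which is one simple way the noise can end up correlated with the signal.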

snicolet pushed a commit that referenced this pull request Jun 13, 2020
Tuned search constants after many search patches since the last
successful tune.

1st LTC @ 60+0.6 th 1 :
LLR: 2.97 (-2.94,2.94) {0.25,1.75}
Total: 57656 W: 7369 L: 7036 D: 43251
Ptnml(0-2): 393, 5214, 17336, 5437, 448
https://tests.stockfishchess.org/tests/view/5ee1e074f29b40b0fc95af19

SMP LTC @ 20+0.2 th 8 :
LLR: 2.95 (-2.94,2.94) {0.25,1.75}
Total: 83576 W: 9731 L: 9341 D: 64504
Ptnml(0-2): 464, 7062, 26369, 7406, 487
https://tests.stockfishchess.org/tests/view/5ee35a21f29b40b0fc95b008

The changes were rebased on top of a successful patch by Viz (see #2734)
and two different ways of doing this were tested. The successful test
modified the constants in the patch by Viz in a similar manner to the
tuning run:

LTC (rebased) @ 60+0.6 th 1 :
LLR: 2.94 (-2.94,2.94) {0.25,1.75}
Total: 193384 W: 24241 L: 23521 D: 145622
Ptnml(0-2): 1309, 17497, 58472, 17993, 1421
https://tests.stockfishchess.org/tests/view/5ee43319ca6c451633a995f9

Further work: the recent patch to quantize eval #2733 affects search
quite a bit, so doing another tune in, say, three months' time might be a
good idea.

closes #2735

Bench 4246971
@Rocky640

I think that future SPSA tuning of evaluate.cpp, pawns.cpp or material.cpp will have to first disable that line.
In general, it is hard to imagine any new small eval bonus or tweak passing. 16 seems quite large.

@vondele
Member Author

vondele commented Jun 14, 2020

@Rocky640 I actually don't think so, but it would be worth testing. The reason is that these small terms will still contribute, i.e. tip the quantization one way or the other. Said differently, small terms increase the accuracy of the eval function (i.e. move it in the right direction), but don't improve the precision of the eval function (which is mostly the largest error) (useful picture). So I think small terms will pass equally well. Actually, the experiment is easy: let's try to remove a bunch of eval terms with simplification bounds... I don't think we'll be able to remove any.
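
The argument that small terms still "tip" the quantization can be illustrated with a toy count. This sketch assumes a grain of 16 applied by integer division to non-negative evals; `quantize`, `count_tipped` and `total_shift` are hypothetical helpers, not engine code.

```cpp
constexpr int Grain = 16;  // assumed grain, from the PR title

constexpr int quantize(int v) { return (v / Grain) * Grain; }

// Count the base evals in [lo, hi) whose quantized value changes
// when a small bonus is added before quantization.
int count_tipped(int bonus, int lo, int hi) {
    int tipped = 0;
    for (int v = lo; v < hi; ++v)
        if (quantize(v + bonus) != quantize(v))
            ++tipped;
    return tipped;
}

// Sum of the changes in quantized eval over the same range.
long total_shift(int bonus, int lo, int hi) {
    long shift = 0;
    for (int v = lo; v < hi; ++v)
        shift += quantize(v + bonus) - quantize(v);
    return shift;
}
```

For a bonus of 3 over the base evals 0..159, `count_tipped(3, 0, 160)` finds 30 evals that jump to the next multiple of 16, and `total_shift(3, 0, 160)` is 480, i.e. an average shift of 3 per position. In this toy setting the bonus survives on average even though each individual eval is coarse.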

@vondele
Member Author

vondele commented Jun 14, 2020

So, as a test, I started 6 tests to remove small eval terms as simplifications:

  constexpr Score BishopPawns         = S(  3,  7);
  constexpr Score BishopXRayPawns     = S(  4,  5);
  constexpr Score FlankAttacks        = S(  8,  0);
  constexpr Score BishopKingProtector = S(  6,  9);
  constexpr Score KnightKingProtector = S(  8,  9);
  constexpr Score RestrictedPiece     = S(  7,  7);
  constexpr Score RookOnQueenFile     = S(  5,  9);

@vondele
Member Author

vondele commented Jun 14, 2020

None of the terms could be removed (5 failed at STC, RookOnQueenFile at LTC).

@Rocky640

Rocky640 commented Jun 14, 2020

That was quick! And thank you for the explanation. (Useful picture with text here: https://chemistrygod.com/accuracy-and-precision-in-chemistry)

The bonuses you tested are usually multiplied by some factor.
RookOnQueenFile will use a factor of 2 at most, the others often more.

I'm still concerned about features which are scored usually only once.
For example psqt, or some material imbalance.

I'm not against the quantization idea; time will tell if 16 is the best value.
Just curious... Was 8 tried? Was 4 tried? Or more than 16?

@vondele
Member Author

vondele commented Jun 14, 2020

Yes, other values were tried:
8: https://tests.stockfishchess.org/tests/view/5ee106aef29b40b0fc95ae9c
32: https://tests.stockfishchess.org/tests/view/5ee106aaf29b40b0fc95ae9a
as well as other locations (with respect to tempo and 50-move-rule (50mrc) scaling).

Yes, I'm aware these bonus terms are often scaled, but typically the smallest change is still on the order of the bonus. However, feel free to test other terms. Things would be very different if we rounded after each eval term is added, but that's not what we do.
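
The difference between rounding once at the end and rounding after each term can be sketched as follows. This is an illustration under assumptions: a grain of 16 applied by integer division, plain integer terms, and the hypothetical helpers `round_once` and `round_each_step`; the real eval combines scores in a more involved way.

```cpp
#include <vector>

constexpr int Grain = 16;  // assumed grain, from the PR title

constexpr int quantize(int v) { return (v / Grain) * Grain; }

// Sum all terms at full precision, then quantize the total once.
int round_once(const std::vector<int>& terms) {
    int sum = 0;
    for (int t : terms)
        sum += t;
    return quantize(sum);
}

// Quantize the running total after every term is added.
int round_each_step(const std::vector<int>& terms) {
    int sum = 0;
    for (int t : terms)
        sum = quantize(sum + t);
    return sum;
}
```

Taking the first components of the seven constants listed earlier in the thread as illustrative inputs, `round_once({3, 4, 8, 6, 8, 7, 5})` yields 32 (the sum is 41), while `round_each_step` on the same list yields 0, because each term smaller than the grain is discarded as soon as it is added.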

@Rocky640

Another result:
At LTC, quantum 8 was not better than master's quantum 16.
https://tests.stockfishchess.org/tests/view/5ee6aab587586124bc2c109c

mstembera pushed a commit to mstembera/Stockfish that referenced this pull request Jun 15, 2020
vondele pushed a commit that referenced this pull request Jun 15, 2020
The last search tune patch was tested before the implementation of #2733 which
presumably changed the search characteristics noticeably. Another tuning run was
done, see https://tests.stockfishchess.org/tests/view/5ee5b434ca6c451633a9a08c
and the updated values passed these tests:

STC:
LLR: 2.93 (-2.94,2.94) {-0.50,1.50}
Total: 34352 W: 6600 L: 6360 D: 21392
Ptnml(0-2): 581, 3947, 7914, 4119, 615
https://tests.stockfishchess.org/tests/view/5ee62f05ca6c451633a9a15f

LTC 60+0.6 th 1 :
LLR: 2.97 (-2.94,2.94) {0.25,1.75}
Total: 11176 W: 1499 L: 1304 D: 8373
Ptnml(0-2): 69, 933, 3403, 1100, 83
https://tests.stockfishchess.org/tests/view/5ee6205bca6c451633a9a147

SMP LTC 20+0.2 th 8 :
LLR: 2.93 (-2.94,2.94) {0.25,1.75}
Total: 54032 W: 6126 L: 5826 D: 42080
Ptnml(0-2): 278, 4454, 17280, 4698, 306
https://tests.stockfishchess.org/tests/view/5ee62f25ca6c451633a9a162

Closes #2742

Bench 4957812
MichaelB7 pushed a commit to MichaelB7/Stockfish that referenced this pull request Jun 16, 2020
Removes some excess precision, which helps search.

Effectively reintroduces evaluation grain, with a slightly different context.
official-stockfish@45dbd9c

passed STC
LLR: 2.97 (-2.94,2.94) {-0.50,1.50}
Total: 197032 W: 37938 L: 37462 D: 121632
Ptnml(0-2): 3359, 22994, 45446, 23246, 3471
https://tests.stockfishchess.org/tests/view/5ee0c228f29b40b0fc95ae53

passed LTC
LLR: 2.94 (-2.94,2.94) {0.25,1.75}
Total: 77696 W: 9970 L: 9581 D: 58145
Ptnml(0-2): 530, 7075, 23311, 7340, 592
https://tests.stockfishchess.org/tests/view/5ee21426f29b40b0fc95af43

passed LTC SMP
LLR: 2.96 (-2.94,2.94) {0.25,1.75}
Total: 64136 W: 7425 L: 7091 D: 49620
Ptnml(0-2): 345, 5416, 20228, 5718, 361
https://tests.stockfishchess.org/tests/view/5ee387bbf29b40b0fc95b04c

closes official-stockfish#2733

Bench: 4939103
9 participants