Update Elo estimates for terms in search. #2401

vondele · 2019-11-09T05:46:35Z

This updates the estimates from 1.5 years ago and adds missing terms.
All tests were run at 10+0.1 (STC) with 20000 games; error bars are ±3 Elo.
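
For readers wondering where an error bar of roughly ±3 Elo for 20000 games comes from, here is a minimal sketch. It assumes the usual logistic Elo model and a normal approximation at 95% confidence. The function names and the win/draw/loss counts are hypothetical and are not taken from fishtest.

```python
import math


def elo_from_score(score: float) -> float:
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_error(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, half_width) for a match result, using a normal approximation."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the outcome (win = 1, draw = 0.5, loss = 0).
    var = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / n
    stderr = math.sqrt(var / n)
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return elo_from_score(score), (high - low) / 2.0


# Hypothetical 20000-game match with 60% draws and equal wins and losses.
elo, err = elo_with_error(4000, 12000, 4000)
print(f"{elo:+.1f} Elo, error bar about {err:.1f}")
```

With these assumed counts the half-width comes out at about 3 Elo, in line with the error bars quoted above. A different draw ratio would shift it slightly.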

Noteworthy changes are step 7 (futility pruning) going from ~30 to ~49 Elo and step 14 (pruning at shallow depth) going from ~170 to ~204 Elo.

@Rocky640 suggested looking at the time control dependence of these terms.
I picked two large terms (early futility pruning and singular extension), because
their relative error is small. The result is actually quite interesting (see figure 1).
Contrary to my expectation, the Elo gain for early futility pruning is quite sensitive
to time control, while the singular extension gain is not.

Figure 1: TC dependence of two search terms
![elo_search_tc](http://cassio.free.fr/divers/elo_search_tc.png)

Going back to the old measurement of futility pruning (30 Elo vs today 50 Elo),
the code is actually identical but the margins have changed. It seems like a nice
example of how connected terms in search really are, i.e. the value of early futility
pruning increased significantly due to changes elsewhere in search.

No functional change.

Rocky640 · 2019-11-10T12:50:29Z

Would it make sense to run some of those measurements at LTC? This would highlight the depth-sensitive areas of search.

vondele · 2019-11-10T13:19:18Z

Interesting, but I suspect that for the terms with small Elo impact we would need much more accurate estimates to do such a test. Are there any large Elo terms that you would expect to be TC sensitive?
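
To make the accuracy concern concrete, here is a rough sketch of how many games a small term would need. It assumes the statistical error shrinks as 1/sqrt(games) with unchanged per-game variance, and it takes the 20000 games and ±3 Elo from this PR as the reference point. The 5 Elo term and the helper names are hypothetical.

```python
def relative_error(elo: float, error_bar: float) -> float:
    """Error bar expressed as a fraction of the measured Elo value."""
    return error_bar / abs(elo)


def games_for_error(target_error: float, ref_games: int = 20000,
                    ref_error: float = 3.0) -> int:
    """Games needed to reach target_error, assuming error ~ 1/sqrt(games)."""
    return round(ref_games * (ref_error / target_error) ** 2)


# A large term: 49 Elo measured with an error bar of 3 Elo.
large = relative_error(49.0, 3.0)

# A hypothetical 5 Elo term measured to the same relative accuracy.
target = 5.0 * large
print(f"relative error of the large term: {large:.1%}")
print(f"games needed for the small term: {games_for_error(target)}")
```

Under these assumptions the small term would need on the order of two million games, which is why only the large terms are practical candidates for a TC comparison.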

Generally, I'm a bit critical of TC sensitivity. For me, the relevant numbers are:
https://github.com/glinscott/fishtest/wiki/UsefulData#elo-change-with-respect-to-tc

Counter question... would you (or any eval expert) be interested in doing something similar for eval?
This https://github.com/glinscott/fishtest/wiki/UsefulData#elo-contributions-from-various-evaluation-terms has gone stale, and I think only documentation in the code can survive over time.

vondele · 2019-11-10T13:40:40Z

So, I've added two LTC measurements to the queue: futility and singular extension. Both have large contributions, and futility applies only at low depth (<7) while SE applies at high depth (>=6)... let's see.

FauziAkram · 2019-11-10T14:26:03Z

@vondele I have created something similar for eval terms, you can find it here:
https://onedrive.live.com/edit.aspx?cid=7d656668e4e2c5e8&page=view&resid=7D656668E4E2C5E8!635&parId=7D656668E4E2C5E8!105&app=Excel

It may be a bit outdated by now and might need some updates.

Rocky640 · 2019-11-10T18:04:50Z

Such tests might help discover a simplification or two.

To start with, we could run at least a rough estimate of threats(), passed(), space(), initiative().
For the king, any change will break the kingDanger calculation, so it is better to estimate it as a whole too.

It would also be interesting to see the impact of using
S((mg+eg)/2, (mg+eg)/2) before scaling. This would measure the value of the "tapered eval".
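
A minimal sketch of what this flattening experiment would do, assuming a tapered evaluation that interpolates linearly between the midgame and endgame scores by game phase. The constant PHASE_MIDGAME = 128 and the helper names are assumptions for illustration, not Stockfish's actual code.

```python
PHASE_MIDGAME = 128  # assumed phase scale: 128 = full midgame, 0 = pure endgame


def tapered(mg: int, eg: int, phase: int) -> int:
    """Interpolate between midgame and endgame scores by game phase."""
    return (mg * phase + eg * (PHASE_MIDGAME - phase)) // PHASE_MIDGAME


def flattened(mg: int, eg: int) -> tuple[int, int]:
    """Replace both components by their average, as in S((mg+eg)/2, (mg+eg)/2)."""
    avg = (mg + eg) // 2
    return avg, avg


# Hypothetical score: worth 100 in the midgame and 20 in the endgame.
mg, eg = 100, 20
for phase in (128, 64, 0):
    normal = tapered(mg, eg, phase)
    flat = tapered(*flattened(mg, eg), phase)
    print(f"phase {phase:3d}: tapered {normal:3d}, flattened {flat:3d}")
```

With the flattened score the result no longer depends on the phase, so the Elo difference between the two versions would estimate what tapering itself contributes. Note that Python's `//` floors while C++ integer division truncates, so negative scores could differ by one unit from a C++ implementation.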

A third set of tests would be to disable the respective piece eval in piece(), or the pawn eval contribution, or psqt, or mobility.
But instead of completely disabling such a feature, it would be more informative to replace it with an average value, computed with a short bench run against more midgame positions (to find the average mg) and more endgame positions (to find the average eg), using some dbg_mean_of.

A fourth set of tests would disable each individual bonus.

Another area of research would be to test each individual bonus at 50% and at 150% of its value. It is quite possible that, despite all the tuning, some bonuses are stuck at a local maximum
and we are missing the global maximum. Such tests would also give more data about the sensitivity of each bonus.

Looking back at Fauzi's results, it seems that
anything below 3 Elo could be a candidate for removal if proper adjustments are found.
For example, RookOnPawn, ThreatByRank, and HinderPassedPawn were all between 1 and 3 Elo
and have been removed since then.

One bonus which we still have is MinorBehindPawn. Removing it "as is" will not work, but it might be removable if we cleverly adjust some mg psqt values and a few other bonuses.

Rocky640 · 2019-11-10T18:04:54Z

Here is a more direct link to Fauzi's work, which is about one year old, in case someone knows how to replace the dead link
https://github.com/glinscott/fishtest/wiki/UsefulData#elo-contributions-from-various-evaluation-terms with this:

Stockfish Feature's Estimated Elo worth (1).xlsx

A few bonuses have been introduced or modified since then.

vondele · 2019-11-10T20:25:50Z

I've updated the link on the wiki (just click 'Edit' on the top of the page).

If you find the time, please submit the eval tests. I think that would be useful.

FauziAkram · 2019-11-11T08:14:47Z

Has anyone tried to remove these two?
http://tests.stockfishchess.org/tests/view/5dc58b4f0ebc5902562bbd45
http://tests.stockfishchess.org/tests/view/5dc58b170ebc5902562bbd3d

vondele · 2019-11-11T08:17:35Z

@FauziAkram yes
http://tests.stockfishchess.org/tests/view/5dc6bdcf0ebc5902562bd3c0
http://tests.stockfishchess.org/tests/view/5dc6553c0ebc5902562bcd42
http://tests.stockfishchess.org/tests/view/5dc655310ebc5902562bcd3f

ttruscott · 2019-11-11T14:15:56Z

I'd like an Elo estimate for the has_game_cycle() check in search.cpp.

snicolet · 2019-11-12T00:46:33Z

Thanks for running the tests!

My suggestion would be to keep the same pattern for Elo estimates in the code as in current master, using a scale of ~2, ~5, ~10, ~15, ~20, ~30, ~40, ~50, etc. instead of writing last-digit accuracy which we don't have.

This is to avoid people running Elo experiments every two weeks to see if the last digits have changed...

vondele · 2019-11-12T07:18:42Z

@snicolet, let's keep the result as obtained from the tests. Rounding numbers needlessly increases the error. Not rerunning these tests often should be just a policy (and hasn't been a problem so far).

vondele · 2019-11-12T07:23:48Z

@Rocky640 suggested looking at the TC dependence of these terms. I picked two large terms, because their relative error is small. The result is actually quite interesting. Contrary to my expectation, early futility pruning is quite TC sensitive, while singular extension is not.

Going back to the old measurement of futility pruning (30 Elo vs today 49 Elo), the code is actually identical. It seems like a nice example of how connected terms in search really are, i.e. the value of early futility pruning increased significantly due to changes elsewhere in search.

Alayan-stk-2 · 2019-11-20T22:37:09Z

Could you do a measurement for the multicut part of singular extension search?

Vizvezdenec · 2019-12-07T04:45:20Z

The code in futility pruning is identical, but the futility margin itself is vastly different.
#2270 and the following PR by proton changed this quite a lot.

@Rocky640

This updates estimates from 1.5 year ago, and adds missing terms.

All estimates from tests run on fishtest at 10+0.1 (STC), 20000 games, error bars +- 3 Elo, see the original message in the pull request for the full list of tests.

Noteworthy changes are step 7 (futility pruning) going from ~30 to ~50 Elo and step 13 (pruning at shallow depth) going from ~170 to ~200 Elo.

Full list of tests: #2401

@Rocky640 made the suggestion to look at time control dependence of these terms. I picked two large terms (early futility pruning and singular extension), so with small relative error. It turns out it is actually quite interesting (see figure 1). Contrary to my expectation, the Elo gain for early futility pruning is pretty time control sensitive, while singular extension gain is not.

Figure 1: TC dependence of two search terms
![elo_search_tc](http://cassio.free.fr/divers/elo_search_tc.png)

Going back to the old measurement of futility pruning (30 Elo vs today 50 Elo), the code is actually identical but the margins have changed. It seems like a nice example of how connected terms in search really are, i.e. the value of early futility pruning increased significantly due to changes elsewhere in search.

No functional change.

snicolet · 2020-01-10T02:33:47Z

Merged via 114ddb7, thanks :-)

This updates estimates from 2yr ago official-stockfish#2401, and adds missing terms. All tests run at 10+0.1 (STC), 20000 games, error bars +- 1.8 Elo, book 8moves_v3.png. A table of Elo values with the links to the corresponding tests can be found at the PR closes official-stockfish#3868 Non-functional Change

snicolet closed this Jan 10, 2020

BM123499 mentioned this pull request May 12, 2021

Simplify LMR #3460

Closed

BM123499 mentioned this pull request Dec 20, 2021

Update Elo estimates for terms in search. #3868

Closed
