Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove razoring #3278

Closed
wants to merge 1 commit into from
Closed

Remove razoring #3278

wants to merge 1 commit into from

Conversation

unaiic
Copy link
Contributor

@unaiic unaiic commented Dec 26, 2020

STC https://tests.stockfishchess.org/tests/view/5fe653403932f79192d3981a
LLR: 2.95 (-2.94,2.94) {-1.25,0.25}
Total: 63448 W: 5965 L: 5934 D: 51549
Ptnml(0-2): 230, 4738, 21769, 4745, 242

LTC https://tests.stockfishchess.org/tests/view/5fe6f0f03932f79192d39856
LLR: 2.93 (-2.94,2.94) {-0.75,0.25}
Total: 65368 W: 2485 L: 2459 D: 60424
Ptnml(0-2): 33, 2186, 28230, 2192, 43

bench: 4184328

Simplify razoring.

@FauziAkram
Copy link
Contributor

Can you please also update the elo estimation for each step, since you calculated it (Maybe in a different PR)

@unaiic
Copy link
Contributor Author

unaiic commented Dec 26, 2020

@FauziAkram Yeah, I'm still doing more elo estimations, so I think I'll wait until I finish with all of them. But sure, good idea :)

STC https://tests.stockfishchess.org/tests/view/5fe653403932f79192d3981a
LLR: 2.95 (-2.94,2.94) {-1.25,0.25}
Total: 63448 W: 5965 L: 5934 D: 51549
Ptnml(0-2): 230, 4738, 21769, 4745, 242

LTC https://tests.stockfishchess.org/tests/view/5fe6f0f03932f79192d39856
LLR: 2.93 (-2.94,2.94) {-0.75,0.25}
Total: 65368 W: 2485 L: 2459 D: 60424
Ptnml(0-2): 33, 2186, 28230, 2192, 43

bench: 4184328

Simplify razoring.
@anshulongithub
Copy link

@joergoster cud you elaborate on why you think that removing Razoring is not a good idea?

@Vizvezdenec
Copy link
Contributor

I think that updating elo estimations should be done separately.
Also IIRC they were done at LTC - and some of this is pretty TC sensitive.

@unaiic
Copy link
Contributor Author

unaiic commented Dec 26, 2020

@Vizvezdenec You suggest testing them at LTC then? It'd be done separately, of course :)

@Vizvezdenec
Copy link
Contributor

well if we will go pretty idle otherwise it's not a priority.

@joergoster
Copy link
Contributor

@anshulongithub Why does one want to remove it in the first place?

It is a well known pruning technique and gains elo. The fact that it can be removed is only because of the current strength of Stockfish and the fact, that at this strength elo gets compressed and small gains or losses are hardly measurable. That's also the reason why it is much easier to get simplifications passed than elo gaining patches.

However, I do no longer care that much about the development of SF. Too much to my dislike in the past ...
I occasionally jump in if there is something that catches my interest and that's all.

@ddobbelaere
Copy link
Contributor

ddobbelaere commented Dec 27, 2020

It is a well known pruning technique and gains elo.

Both linked tests seem to indicate only the slightest of elo regression: estimated elo = -0.07/-0.02 (STC/LTC). 95% confidence intervals are [-1.37,1.14] and [-0.84,0.74], respectively.

Note that funnily enough W > L for both tests, probably this is a weird artifact of the pentanomial model (related to asymmetry of outcomes of the game pairs) that I don't understand?
For trinomial, I think W > L should always imply that estimated elo > 0.

@vdbergh
Copy link
Contributor

vdbergh commented Dec 27, 2020

It is a well known pruning technique and gains elo.

Both linked tests seem to indicate only the slightest of elo regression: estimated elo = -0.07/-0.02 (STC/LTC). 95% confidence intervals are [-1.37,1.14] and [-0.84,0.74], respectively.

Note that funnily enough W > L for both tests, probably this is a weird artifact of the pentanomial model (related to asymmetry of outcomes of the game pairs) that I don't understand?
For trinomial, I think W > L should always imply that estimated elo > 0.

No this is not true. The Elo estimate takes into account the length of the test. It is not so easy to
reason about it intuitively. The formal definition is

P(elo estimate<=true elo)=50%.

This is called a median unbiased estimator (if you repeat the same test many times the elo estimate
will be too low in half the cases).

@ddobbelaere
Copy link
Contributor

ddobbelaere commented Dec 27, 2020

@vdbergh Thanks! Oh wow this is subtle, I didn't even think about length.

Is the following intuitive reasoning correct? Assume a trinomial model. Take test 1 with final outcome W=L and some D. Take test 2 with final outcome W'=L'=2W and D'=2D. Then, as the estimated elo is based on the brownian motion paper, intuitively "estimated elo 2 < estimated elo 1" should hold for [-1.25; 0.25] bounds as test 2 needs double amount of games so chances are higher that true elo lies more closely to -0.5.

@Vizvezdenec
Copy link
Contributor

Vizvezdenec commented Dec 27, 2020

Razoring is known to get almost nothing for ages, in fact previous attempts were close to like 2.5 LLR twice (when we tried to remove it) and now it seems to be even less useful.
Maybe we can try to reintroduce a stronger form of it if it will be removed with losing like 0,2 elo...
For example on of my tests on depth 3 razoring was close to passing somewhere between sf11 and 12. Maybe it can be good to pass from scratch now?

@AlexandreMasta
Copy link

Razoring is a so extended tested technique. It is proved to gain elo in all A/B engines. I can´t imagine why this is being discussed. Ok...you want one more loss-elo patch from Unaiic ok...go ahead...remove one or 2 lines of code for a regressive patch as is being done recently. SF is so ahead no one will notice.

@ianfab
Copy link

ianfab commented Dec 28, 2020

Razoring is known to get almost nothing for ages, in fact previous attempts were close to like 2.5 LLR twice (when we tried to remove it) and now it seems to be even less useful.
Maybe we can try to reintroduce a stronger form of it if it will be removed with losing like 0,2 elo...
For example on of my tests on depth 3 razoring was close to passing somewhere between sf11 and 12. Maybe it can be good to pass from scratch now?

Exactly. Something similar happened with IID already. If a search/pruning technique no longer gains Elo, simplifying it away opens up the room for something better to replace it. If there are doubts that the passed tests provide sufficient evidence that the change is not a (considerable) regression or that the code simplification is not worth the potential minor regression, then there should be a discussion about the testing conditions in general (not in this thread), not about the patch.

@vondele vondele added the to be merged Will be merged shortly label Dec 31, 2020
@vondele vondele closed this in 8ec97d1 Dec 31, 2020
@unaiic unaiic deleted the pm branch April 9, 2021 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
to be merged Will be merged shortly
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants