Change Default Contempt from C=24 to C=20 #2073

Closed
wants to merge 1 commit into from


SFisGOD
Contributor

@SFisGOD SFisGOD commented Apr 3, 2019

Stockfish's default contempt is set to the highest value that does not regress against master with contempt=0. Since PawnValueEg increased, the highest non-regressive contempt might have changed because of the following dependency in line 310 of search.cpp:

int ct = int(Options["Contempt"]) * PawnValueEg / 100; // From centipawns

The default contempt of 24 passed the STC non-regression test but failed the LTC one. The proposed new contempt is therefore C=20, which passed both the STC and LTC non-regression tests.
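
To make the dependency concrete, here is a minimal sketch of the scaling done by the quoted line. The helper names and the two PawnValueEg values (208 before the change, 213 after) are assumptions for illustration only and are not taken from this PR; check them against types.h before relying on them.

```cpp
#include <cassert>

// Mirrors the scaling in the quoted search.cpp line:
//   int ct = int(Options["Contempt"]) * PawnValueEg / 100;
// `scaled_contempt` is a hypothetical helper, not a Stockfish function.
inline int scaled_contempt(int contempt_cp, int pawn_value_eg) {
    assert(pawn_value_eg > 0);
    // Integer arithmetic as in the original: multiply first, then truncate.
    return contempt_cp * pawn_value_eg / 100;
}

// Assumed values, for illustration only.
const int kOldPawnValueEg = 208;
const int kNewPawnValueEg = 213;

// Change in internal contempt caused purely by the PawnValueEg change,
// with the Contempt option itself left untouched.
inline int contempt_shift(int contempt_cp) {
    return scaled_contempt(contempt_cp, kNewPawnValueEg)
         - scaled_contempt(contempt_cp, kOldPawnValueEg);
}
```

With these assumed values, the same option setting maps to a larger internal contempt after the change, which is why the highest non-regressive setting may have moved.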

Contempt 24
Passed STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 30255 W: 6038 L: 5933 D: 18284
http://tests.stockfishchess.org/tests/view/5ca104260ebc5925cfffec6b

Failed LTC
LLR: -2.95 (-2.94,2.94) [-3.00,1.00]
Total: 71069 W: 10037 L: 10287 D: 50745
http://tests.stockfishchess.org/tests/view/5ca1e1050ebc5925cf000493

Contempt 20
Passed STC
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 65905 W: 12642 L: 12601 D: 40662
http://tests.stockfishchess.org/tests/view/5ca472480ebc5925cf002a24

Passed LTC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 12668 W: 1847 L: 1715 D: 9106
http://tests.stockfishchess.org/tests/view/5ca4bf250ebc5925cf002fab
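
For readers unfamiliar with the result lines above, the following sketch shows where an interval like (-2.94,2.94) can come from. It assumes Wald's standard SPRT thresholds with alpha = beta = 0.05, which is inferred from the 2.94 figure rather than stated in this PR; the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct SprtBounds {
    double lower;  // stop and fail at or below this LLR
    double upper;  // stop and pass at or above this LLR
};

// Wald's stopping thresholds on the log-likelihood ratio (LLR).
// alpha: accepted chance of passing a patch sitting at the lower Elo bound.
// beta:  accepted chance of failing a patch sitting at the upper Elo bound.
inline SprtBounds wald_llr_bounds(double alpha, double beta) {
    assert(alpha > 0.0 && alpha < 1.0 && beta > 0.0 && beta < 1.0);
    return { std::log(beta / (1.0 - alpha)), std::log((1.0 - beta) / alpha) };
}

enum class SprtState { Running, Passed, Failed };

// Classifies a reported LLR against the stopping thresholds.
inline SprtState classify(double llr, const SprtBounds& b) {
    if (llr >= b.upper) return SprtState::Passed;
    if (llr <= b.lower) return SprtState::Failed;
    return SprtState::Running;
}
```

Under that assumption the upper threshold is ln(19), so a reported LLR of 2.96 means the test stopped as a pass and -2.95 means it stopped as a fail.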

Against Stockfish 10, C=20 is about equal to C=24.

Contempt 20 Master vs Stockfish 10
ELO: 17.19 +-1.8 (95%) LOS: 100.0%
Total: 40000 W: 6424 L: 4446 D: 29130
http://tests.stockfishchess.org/tests/view/5ca4b62c0ebc5925cf002f7e

Contempt 24 Master vs Stockfish 10
ELO: 16.58 +-1.8 (95%) LOS: 100.0%
Total: 40000 W: 6649 L: 4742 D: 28609
http://tests.stockfishchess.org/tests/view/5ca294f90ebc5925cf000e4d
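
The ELO and +- figures in the two fixed-games results above can be approximated from the W/L/D counts alone. The sketch below uses the standard logistic Elo formula and a normal approximation for the 95% interval; it is not necessarily the exact computation fishtest performs, and the names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct EloEstimate {
    double elo;       // logistic Elo difference implied by the score
    double margin95;  // half-width of the approximate 95% interval, in Elo
};

// Estimates Elo and a 95% margin from win/loss/draw counts.
inline EloEstimate elo_from_wld(long wins, long losses, long draws) {
    const double n = static_cast<double>(wins + losses + draws);
    assert(n > 0.0);
    const double w = wins / n, l = losses / n, d = draws / n;
    const double score = w + 0.5 * d;

    // Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    const double variance = w * (1.0 - score) * (1.0 - score)
                          + d * (0.5 - score) * (0.5 - score)
                          + l * score * score;
    const double std_error = std::sqrt(variance / n);

    const double lo = score - 1.96 * std_error;
    const double hi = score + 1.96 * std_error;
    assert(lo > 0.0 && hi < 1.0);

    // Logistic model: score = 1 / (1 + 10^(-elo / 400)).
    auto to_elo = [](double s) { return -400.0 * std::log10(1.0 / s - 1.0); };
    return { to_elo(score), (to_elo(hi) - to_elo(lo)) / 2.0 };
}
```

Calling `elo_from_wld(6424, 4446, 29130)` and `elo_from_wld(6649, 4742, 28609)` gives estimates close to the reported values, and the gap between the two is smaller than either margin.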

Bench: 3490352

@Vizvezdenec
Contributor

I honestly tried to push this a lot of times, but...
We all know that even an Elo-neutral patch fails [-3;1] with 30+% probability.
So, imho, the contempt value should be tested with more "forgiving" bounds.
What I want to do here is set contempt to the maximum value that will pass [-4;0] vs 0.
Why [-4;0]? Because it's the reverse of [0;4].
With these bounds we ensure that decreasing contempt is not an Elo gainer if we treat it as a parameter tweak.
Otherwise we fall into "retest the same contempt value vs 0 ten times, and it passes 7 times and fails 3", which basically sets it to a lucky value rather than a truly non-regressing one.
Imho, losing 0.9 Elo, as we had in the C=24 test, is nothing critical, but it fails [-3;1].
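
The relationship between the bounds and the chance of failing can be sketched with Wald's approximate operating characteristic for an SPRT. This is an idealised estimate: it assumes the LLR drifts linearly with the true Elo, ignores overshoot at the thresholds, takes alpha = beta = 0.05, and treats the bounds and the true difference as being in the same Elo units. The function name is hypothetical, and real fishtest runs may behave differently.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability that an SPRT with bounds [elo0, elo1] ends in
// "fail" (accepts H0) when the true strength difference is `true_elo`.
inline double sprt_fail_probability(double true_elo, double elo0, double elo1,
                                    double alpha = 0.05, double beta = 0.05) {
    assert(elo1 > elo0);
    const double log_a = std::log((1.0 - beta) / alpha);  // upper LLR threshold
    const double log_b = std::log(beta / (1.0 - alpha));  // lower LLR threshold

    // Wald's exponent: +1 at elo0, -1 at elo1, 0 at the midpoint.
    const double h = (elo0 + elo1 - 2.0 * true_elo) / (elo1 - elo0);
    if (std::fabs(h) < 1e-12)
        return log_a / (log_a - log_b);  // limiting value as h -> 0

    const double a_h = std::exp(h * log_a);
    const double b_h = std::exp(h * log_b);
    return (a_h - 1.0) / (a_h - b_h);
}
```

By construction, `sprt_fail_probability(0, -4, 0)` equals alpha, the mirror image of an Elo-neutral patch passing [0;4] with probability alpha. Evaluating the same function with `(0, -3, 1)` gives the corresponding idealised estimate for the current bounds.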

@vondele
Member

vondele commented Apr 3, 2019

I honestly tried to push this a lot of times, but...

:-) reminds me of https://xkcd.com/882/

@Vizvezdenec
Contributor

Well, the thing is that I didn't get any proper reply on why this is worse than what we currently have ;) @vondele
I think it's pretty logical to set contempt to the highest value that can't be reversed as an Elo gain, which requires [-4;0].
Also worth mentioning: SF contempt now gains Elo vs any non-NN engine, because they have become relatively weak.

@vondele
Member

vondele commented Apr 4, 2019

@Vizvezdenec I personally like the idea of maximizing contempt, mostly for the data that you summarized here: https://github.com/glinscott/fishtest/wiki/UsefulData#contempt-measurements
Maximizing contempt with [-4,0] is very similar to maximizing with [-3, 1], so for simplicity of the rules, we should just stick to [-3, 1].

My xkcd link was just a little joke, similar to your remark: if we test sufficiently often, we will see any contempt value in the range 0..24 fail. (Quoting your reply was not quite right; the link was not referring to testing a lot of times.)

@mcostalba

This seems a parameter tweak to me. Why do you assume it is not a parameter tweak (and tested with [-3, 1]), just because you like a lower contempt better? Perhaps because of the comparison with SF 10? That does not seem very sensible to me: apart from the fact that the error bars overlap, the main point is that there is no reason why we would want to optimize against SF 10. Why not SF 7, or even another engine?

Please, if possible, I'd suggest to focus time and resources (you spent a lot of them) on improving current master. That's our target.

@ElbertoOne

ElbertoOne commented Apr 4, 2019

In my opinion this is an important data point especially with regards to TCEC (Leela). A high contempt value against a strong competitor (in this case SF10) may not be beneficial, at least that's what the failure of the C=24 test points to. Maybe we could re-run the C=24 test. If it fails again, then for the TCEC finals we could opt to lower the contempt value to, for instance, C=20.

@SFisGOD
Contributor Author

SFisGOD commented Apr 4, 2019

This seems a parameter tweak to me.

@mcostalba I updated the PR for more info.

See also 2a7213f
In that commit, snicolet was the one who wrote the explanation and not Vizvezdenec.

Discussion before the above PR was committed #1806

Sorry, I thought it was common knowledge that we set the default contempt to the highest non-regressive value against contempt=0, so I did not explain further.

@xoto10
Contributor

xoto10 commented Apr 4, 2019

The change in PawnValueEg made Contempt higher, so an automatic reduction to 23 to compensate seems reasonable to me. If we want to stick to multiples of 4 for simplicity, then a case could be made for either 20 or 24. I am happy with the lower value since Leela appears to be a strong rival nowadays and it seems reasonable to tend towards slightly more conservative play rather than let contempt rise giving more risky play.
Anywhere in the 20-24 range makes only a subtle difference so I don't see it as a huge deal.
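
The compensation arithmetic can be sketched as a small search for the option value that keeps the internal contempt closest to what it was before. The PawnValueEg values used in the usage note (208 old, 213 new) and the 0..100 search range are assumptions for illustration, and both helpers are hypothetical names rather than Stockfish functions.

```cpp
#include <cassert>
#include <cstdlib>

// Same scaling as the search.cpp line quoted in the PR description.
inline int scaled_contempt(int contempt_cp, int pawn_value_eg) {
    return contempt_cp * pawn_value_eg / 100;
}

// Returns the Contempt option value whose internal (scaled) contempt under
// new_pawn_eg is closest to what old_contempt produced under old_pawn_eg.
// Ties are resolved in favour of the lower option value.
inline int compensating_contempt(int old_contempt, int old_pawn_eg, int new_pawn_eg) {
    assert(old_contempt >= 0 && old_pawn_eg > 0 && new_pawn_eg > 0);
    const int target = scaled_contempt(old_contempt, old_pawn_eg);

    int best = 0;
    int best_diff = std::abs(scaled_contempt(0, new_pawn_eg) - target);
    for (int c = 1; c <= 100; ++c) {  // assumed search range for the option
        const int diff = std::abs(scaled_contempt(c, new_pawn_eg) - target);
        if (diff < best_diff) {
            best = c;
            best_diff = diff;
        }
    }
    return best;
}
```

With the assumed values, `compensating_contempt(24, 208, 213)` selects 23, in line with the reduction suggested above.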

@xoto10
Contributor

xoto10 commented Apr 4, 2019

Regarding the tests, I thought the standard was to do them with the 8moves book, and I think that is more appropriate than the 2moves one. (I'm not suggesting we should run them again.)

@SFisGOD
Contributor Author

SFisGOD commented Apr 4, 2019

I thought the standard was to do them with the 8moves book

@xoto10

The 8moves book is used for fixed-number-of-games regression tests.

For SPRT regression tests, snicolet used the 2moves book, so I just followed what he did in his tests.

@xoto10
Contributor

xoto10 commented Apr 4, 2019

@SFisGOD
Ah, perhaps I am wrong then. Ok.

@mstembera
Contributor

Regarding the TCEC and Leela contempt comments... We can submit any non-default parameters to TCEC for any round, just as we already do with, say, "Move Overhead". Therefore those concerns should not be taken into account here.

@Alayan-stk-2

Alayan-stk-2 commented Apr 4, 2019

In my opinion this is an important data point especially with regards to TCEC (Leela).

I disagree. Leela has a different set of strengths and weaknesses compared to SF, so a test against SF10 tells us little about what to expect against Leela.

If the goal is to send a version optimized to do better against Leela in the SuFi (which won't be useful or necessary in divP), then this should be based on test results against Leela directly. For example, it may be worth investigating whether this ThothFish setup is beneficial in conditions where SFdev (default) and Leela are about equally matched: http://talkchess.com/forum3/viewtopic.php?f=2&t=70316

As @mstembera mentioned, this doesn't have to affect the default value.

A high contempt value against a strong competitor (in this case SF10) may not be beneficial, at least that's what the failure of the C=24 test points to.

The results of the C24 and C20 regression tests against SF10 differ by an amount well within the error bars.

I am happy with the lower value since Leela appears to be a strong rival nowadays and it seems reasonable to tend towards slightly more conservative play rather than let contempt rise giving more risky play.

The core issue is that there isn't one single optimal contempt value.

Want to do better in rating lists? Crank up contempt, as it allows SF to crush more weaker engines. Want to do better in divP? Also raise contempt. Want the highest self-play or anti-Leela strength? Lower it a notch. Want more "objective" evaluations? Lower it, or correct for half the contempt in the output eval (half of the contempt is there only to refuse 3-fold repetitions; the other half is also there to prevent trading down).

Want a contempt value that helps good patches pass at fishtest? I don't think we know which value is best for this.

@vondele
Member

vondele commented Apr 12, 2019

So, after yet another test, the current master value of 24 shows non-regression vs 0:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 210238 W: 29834 L: 29980 D: 150424
http://tests.stockfishchess.org/tests/view/5ca92dab0ebc5925cf008a72

and reducing to 20 doesn't pass [0,4] (kind of obvious after the above):
LLR: -2.95 (-2.94,2.94) [0.00,4.00]
Total: 33542 W: 7175 L: 7208 D: 19159
http://tests.stockfishchess.org/tests/view/5cb09bad0ebc5925cf012fd2

so, I propose we close this PR?

@SFisGOD SFisGOD closed this Apr 15, 2019

8 participants