Classic contempt effect on NNUE #3168

Closed
mstembera opened this issue Oct 3, 2020 · 28 comments

Comments

@mstembera
Contributor

The belief that current classic contempt has little to no effect on NNUE is wrong.
https://tests.stockfishchess.org/tests/view/5f763b224386996a8d4f0d75
shows that it fails regression. I would prefer the maintainers propose a way to address this instead of myself so that it has a better chance of being accepted.

@syzygy1
Contributor

syzygy1 commented Oct 4, 2020

So contempt can be simplified away? ;-)

I guess the question is if it still helps against weaker engines. Contempt was always supposed to be weaker in selfplay (even though for - I believe - unexplained reasons tests showed otherwise).

@SFisGOD
Contributor

SFisGOD commented Oct 4, 2020

Contempt 24 vs. Initial NNUE commit
ELO: 106.35 +-2.1 (95%) LOS: 100.0%
Total: 40000 W: 14339 L: 2464 D: 23197
Ptnml(0-2): 69, 1209, 7881, 8460, 2381
https://tests.stockfishchess.org/tests/view/5f79c2913c4dc0ae679047f6

Contempt 0 vs. Initial NNUE commit
ELO: 102.44 +-2.0 (95%) LOS: 100.0%
Total: 40000 W: 13786 L: 2322 D: 23892
Ptnml(0-2): 61, 1126, 8136, 8642, 2035
https://tests.stockfishchess.org/tests/view/5f79c28b3c4dc0ae679047f4
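
For readers checking these numbers: the Elo figures follow from the W/L/D totals through the usual logistic formula. The sketch below is a minimal illustration; the function name `elo_from_wld` is made up for this example and is not taken from fishtest.

```python
import math


def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Logistic Elo difference implied by a win/loss/draw record."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # average points per game
    return 400.0 * math.log10(score / (1.0 - score))


# W/L/D totals copied from the two tests above.
contempt_24 = elo_from_wld(14339, 2464, 23197)
contempt_0 = elo_from_wld(13786, 2322, 23892)
print(f"Contempt 24: {contempt_24:.2f}  Contempt 0: {contempt_0:.2f}")
```

Both values should land very close to the reported 106.35 and 102.44.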

@syzygy1
Contributor

syzygy1 commented Oct 4, 2020

Wow, over 100 Elo gained since the initial NNUE commit? Impressive...

@mstembera
Contributor Author

@syzygy1 I don't think so. AFAIK it still works as before for classical. The result just shows that it has an effect on NNUE (due to the hybrid evaluation, I presume), which was not known until now.

@xoto10
Contributor

xoto10 commented Oct 4, 2020

I ran some tests using current master, with Slow Mover providing the weaker engine:

Contempt 24 vs Contempt 24 handicapped with Slow Mover=25 : ELO: 100.40 +-2.7 (95%)
https://tests.stockfishchess.org/tests/view/5f740246ee3cd7deb4746898

Contempt 0 vs Contempt 24 handicapped with Slow Mover=25 : ELO: 95.44 +-2.7 (95%)
https://tests.stockfishchess.org/tests/view/5f7480bad930428c36d34c45

Only 20k games each so wide error bars, but similar result to SFisGOD tests, about +5 Elo for Contempt=24. (SFisGOD tests currently showing +3 Elo)

@ssj100

ssj100 commented Oct 8, 2020

@mstembera Thanks for this. I had no idea contempt influenced SF in any way when NNUE was used. The last time I tested this (eg. contempt 24 vs 0 vs 100), it had no impact whatsoever on SF in analysis. I wonder when this changed?

@Technologov

Note that contempt can backfire: SF9 actually plays very weakly against SF12, even worse than SF8, and the only explanation for that is that contempt is enabled.

@Fanael
Contributor

Fanael commented Oct 9, 2020

For what it's worth, I've tried removing static contempt entirely, but leaving dynamic contempt intact, and the results are… unexpected, because it should be equivalent to setting static contempt to 0, but apparently is not:

Passed STC https://tests.stockfishchess.org/tests/view/5f7f0d345b3847b5d41f906c
LLR: 2.93 (-2.94,2.94) {-1.25,0.25}
Total: 26520 W: 2707 L: 2613 D: 21200
Ptnml(0-2): 117, 2061, 8816, 2143, 123

Passed LTC https://tests.stockfishchess.org/tests/view/5f7f3ca05b3847b5d41f9088
LLR: 2.94 (-2.94,2.94) {-0.75,0.25}
Total: 33640 W: 1431 L: 1375 D: 30834
Ptnml(0-2): 19, 1212, 14304, 1264, 21

Master vs master with slow mover=25 https://tests.stockfishchess.org/tests/view/5f80088f5b3847b5d41f90f1
ELO: 98.76 +-2.6 (95%) LOS: 100.0%
Total: 20000 W: 6189 L: 652 D: 13159
Ptnml(0-2): 8, 406, 4527, 4159, 900

No static contempt vs master with slow mover=25 https://tests.stockfishchess.org/tests/view/5f8008de5b3847b5d41f90f3
ELO: 100.71 +-2.7 (95%) LOS: 100.0%
Total: 20000 W: 6218 L: 578 D: 13204
Ptnml(0-2): 11, 361, 4551, 4131, 946
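
As a rough check on the error bars, the `Ptnml(0-2)` line can be read as counts of game pairs scoring 0, 0.5, 1, 1.5 and 2 points. The sketch below estimates Elo and a 95% half-width from those counts. It assumes a plain normal approximation on the per-pair score; fishtest's exact procedure may differ, and `pentanomial_elo` is a name invented for this example.

```python
import math


def elo(score: float) -> float:
    """Logistic Elo difference for an average score in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


def pentanomial_elo(counts):
    """Return (Elo estimate, approximate 95% half-width) from pair counts."""
    # Per-game score of a pair that earned 0, 0.5, 1, 1.5 or 2 points.
    pair_scores = [0.0, 0.25, 0.5, 0.75, 1.0]
    pairs = sum(counts)
    mean = sum(n * s for n, s in zip(counts, pair_scores)) / pairs
    var = sum(n * (s - mean) ** 2 for n, s in zip(counts, pair_scores)) / pairs
    half = 1.959964 * math.sqrt(var / pairs)  # 95% half-width on the score
    # Map the score interval to Elo and report half its width.
    return elo(mean), (elo(mean + half) - elo(mean - half)) / 2.0


# Master vs master with Slow Mover=25, from the results above.
estimate, error = pentanomial_elo([8, 406, 4527, 4159, 900])
print(f"ELO: {estimate:.2f} +-{error:.1f}")
```

For the master test this should come out close to the reported 98.76 +-2.6.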

No static contempt should be about 5 elo weaker than master against SM=25, but it measures as very slightly stronger here; even assuming we got unlucky and hit the far ends of the 95% confidence intervals, the real elo would then be 101.4 in the first test and 98 in the other, which is not quite the ~5 elo difference in favor of C=24.
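
The interval arithmetic in this paragraph can be written out explicitly. The sketch below uses only the Elo values and 95% half-widths quoted above; treating the two tests as independent, so that their errors add in quadrature, is an assumption of this example.

```python
import math

# (Elo, 95% half-width) as reported in the two Slow Mover=25 tests above.
master = (98.76, 2.6)
no_static = (100.71, 2.7)

# Most favourable case for master: its upper end vs. the other's lower end.
master_high = master[0] + master[1]
no_static_low = no_static[0] - no_static[1]
worst_case_gap = master_high - no_static_low

# Measured difference and its combined 95% half-width (assumes independence).
diff = no_static[0] - master[0]
diff_error = math.hypot(master[1], no_static[1])

print(f"ends: {master_high:.1f} vs {no_static_low:.1f}, gap {worst_case_gap:.2f}")
print(f"difference: {diff:.2f} +-{diff_error:.2f}")
```

Even the most favourable case for master leaves a gap of about 3.4 Elo, short of the roughly 5 Elo expected in favor of C=24.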

For comparison, the test removing all contempt never finished, but scored much worse: https://tests.stockfishchess.org/tests/view/5f749894d930428c36d34c50
LLR: -0.62 (-2.94,2.94) {-1.25,0.25}
Total: 22776 W: 2265 L: 2326 D: 18185
Ptnml(0-2): 113, 1822, 7576, 1767, 110
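
A note on reading the `LLR` lines: the pair in parentheses looks like the standard Wald stopping bounds of a sequential probability ratio test, and the pair in braces appears to be the Elo bounds of the two hypotheses being compared. The sketch below reproduces the (-2.94, 2.94) bounds under the assumption that both error rates are 0.05; `sprt_bounds` is a hypothetical helper name, not a fishtest function.

```python
import math
from typing import Tuple


def sprt_bounds(alpha: float = 0.05, beta: float = 0.05) -> Tuple[float, float]:
    """Wald's approximate stopping bounds for the log-likelihood ratio."""
    lower = math.log(beta / (1.0 - alpha))   # stop and accept H0 at or below this
    upper = math.log((1.0 - beta) / alpha)   # stop and accept H1 at or above this
    return lower, upper


lower, upper = sprt_bounds()
print(f"({lower:.2f},{upper:.2f})")

# The unfinished test above stopped at LLR = -0.62, inside the bounds,
# so under this reading it had reached neither decision.
undecided = lower < -0.62 < upper
```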

@MichaelB7
Contributor

MichaelB7 commented Oct 9, 2020

Logically, some contempt is good, especially when ahead in score, and that has been true for almost every engine, as the best moves are those that keep the pieces on the board and still maintain pressure. I think it becomes a hair-splitting exercise: the difference between, say, 24 and 14 may not be great in self-play (two roughly equal engines), whereas 24 is clearly better than 14 against weaker engines. Hence the desire to have the highest contempt that does not lose Elo in self-play, when the engines are equal in strength.

Also, as someone else pointed out, against a stronger engine contempt can cost Elo, as it will lose games that it should draw when a draw is the best outcome.

@vondele
Member

vondele commented Oct 14, 2020

In my opinion, static contempt with NNUE doesn't really work in the current implementation (i.e. at best a small Elo gain against weaker engines). Contempt with Use NNUE set to false has become less useful, as there are now much stronger (NNUE) engines in general, against which contempt is not helpful. I wouldn't be opposed to removing it completely until we find a really good implementation of contempt for NNUE. Possibly there are other opinions. @snicolet ?

@SFisGOD
Contributor

SFisGOD commented Oct 14, 2020

I think it's better to just set it to 0 rather than remove it completely.

@syzygy1
Contributor

syzygy1 commented Oct 14, 2020

I suppose someone has already tried to remove the "dynamic contempt" from SF-NNUE? (Otherwise that might be a nice exercise ;-))

@snicolet
Member

snicolet commented Oct 16, 2020

There were two main ideas in the current implementation of static contempt: shifting the draw value and avoiding exchanges of material. It seems that the second idea is covered in NNUE now, and no good implementation for the first is available today for NNUE.

So I am not averse to removing the static contempt entirely and the UCI option called "contempt".

We can keep the dynamic part for the moment (I would suggest to rename it to something more neutral, for instance "rootTrendBonus" or just "trend"). Once we do that we can calmly examine the trend part and judge its Elo effect to see if it can be improved/simplified?

@locutus2 @Stefano80 Opinions too?

@mstembera
Contributor Author

I would just like to note that it is still quite useful for classical.

@syzygy1
Contributor

syzygy1 commented Oct 16, 2020

@snicolet yes, I fully agree dynamic contempt is a separate issue to be tested separately (so admittedly off-topic here).

@mstembera Perhaps at some point it should be considered whether it makes sense to have two separate goals (improve NNUE, improve classical) for the same code base. Why accept improvements to the classical evaluation function that have only been tested classical (and may well hurt NNUE), but simplify away other features that still help classical. I understand that maintaining two branches and e.g. testing each search change separately for both branches would use up a lot of resources, but the current approach doesn't seem ideal either in the long run.
(But in the particular case of contempt I agree you have a good argument for keeping it as an option, even if defaulting to 0.)

@MichaelB7
Contributor

FWIW, dynamic contempt could be called "initiative".

@Mr-Twave

What is a good measure of the evaluation stability near trades for NNUE and Classical evaluation respectively?

@Vizvezdenec
Contributor

Who even uses classical nowadays, honestly? Especially when it comes to contempt, since it had two different uses: 1) extra points in rating lists; 2) extra points in round-robin tournaments. Nowadays SF wouldn't participate there in classical mode anyway.
I'm all for completely removing static contempt unless we find an implementation that actually works.

@ssj100

ssj100 commented Jan 14, 2021

@mstembera Could you consider running a test with Elo-gaining bounds for contempt 0 against default master? If it passes STC and LTC, @vondele suggested he may make it the new default.

@mstembera
Contributor Author

@ssj100

ssj100 commented Jan 14, 2021

@mstembera Thanks for running, it failed yellow STC. I wonder if a low priority LTC run would be reasonable, at least for documentation's sake.

@mstembera
Contributor Author

@ssj100 @vondele doesn't like speculative LTC so I won't run it unless he suggests it.

@ghost

ghost commented Jun 23, 2021

With contempt removed from Stockfish in the latest dev version, is this issue still necessary?

@vondele closed this as completed Jun 24, 2021

@syzygy1
Contributor

syzygy1 commented Jul 1, 2021

Did we ever try adding a gamephase-tapered contempt component to the NNUE/hybrid eval?

The way a contempt Score was added to the classical eval is very elegant (I remember being very impressed by the simple and elegant implementation), but unless I am very mistaken it is not fundamental at all. You can just as well first calculate the classical eval Value and only then add a gamephase-tapered contempt Value. With NNUE you can only do the latter, but it is still possible.
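
To make the idea concrete, here is a minimal sketch of adding a gamephase-tapered contempt term after the evaluation has been computed. It is not Stockfish code: the function names, the phase scale of 0 to 128, and the example midgame/endgame contempt values of 24 and 12 are all assumptions chosen for illustration.

```python
PHASE_MIDGAME = 128  # assumption: phase runs from 0 (endgame) to 128 (midgame)


def tapered_contempt(contempt_mg: int, contempt_eg: int, phase: int) -> int:
    """Interpolate a contempt bonus between its midgame and endgame values."""
    # Floor division here; a C++ engine would truncate toward zero instead,
    # which only differs when the weighted sum is negative.
    weighted = contempt_mg * phase + contempt_eg * (PHASE_MIDGAME - phase)
    return weighted // PHASE_MIDGAME


def eval_with_contempt(base_eval: int, contempt_mg: int, contempt_eg: int,
                       phase: int) -> int:
    """Add the tapered bonus to an eval given from the engine's point of view."""
    return base_eval + tapered_contempt(contempt_mg, contempt_eg, phase)


# A dead-equal position (base eval 0) at three different game phases.
midgame = eval_with_contempt(0, 24, 12, PHASE_MIDGAME)
halfway = eval_with_contempt(0, 24, 12, PHASE_MIDGAME // 2)
endgame = eval_with_contempt(0, 24, 12, 0)
print(midgame, halfway, endgame)
```

Because the bonus is added to a finished Value, `base_eval` could in principle come from either the classical or the NNUE evaluation.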

@syzygy1
Contributor

syzygy1 commented Jul 1, 2021

I guess it was tried:
locutus2/Stockfish@1e62a04...4314eab
But how can it not work against weaker engines?

@vdbergh
Contributor

vdbergh commented Jul 1, 2021

My feeling has always been that the contempt values that are used are too small. I believe the evaluation should be a measure of the expected score in a position (for practical evaluation functions this needs to be corrected by game phase and possibly other factors). If you are playing against an opponent that is more than 100 Elo weaker, then a contempt value of 50 in internal SF units seems too little to represent the increase in expected score.
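
For scale, the standard logistic Elo model gives the expected score that corresponds to a given rating gap. The sketch below only covers that step; how an expected score would map to internal SF units is not stated in this thread and is deliberately left out. `expected_score` is a name chosen for this example.

```python
def expected_score(elo_gap: float) -> float:
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))


even = expected_score(0.0)         # equal opponents
score_100 = expected_score(100.0)  # opponent 100 Elo weaker
print(f"{even:.3f} {score_100:.3f}")
```

A 100 Elo gap therefore corresponds to an expected score of roughly 64%, compared with 50% between equal opponents.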

@vondele
Member

vondele commented Jul 1, 2021

Optimal contempt values were measured a few years ago: https://github.com/glinscott/fishtest/wiki/UsefulData#contempt-measurements

@xoto10
Contributor

xoto10 commented Jul 1, 2021

> My feeling has always been that the contempt values that are used are too small. I believe the evaluation should be a measure of the expected score in a position (for practical evaluation functions this needs to be corrected by game phase and possibly other factors). If you are playing against an opponent that is more than 100 Elo weaker, then a contempt value of 50 in internal SF units seems too little to represent the increase in expected score.

Nice argument. Thinking along these lines, Stockfish (classical or NNUE) won't know how strong the opponent is, so this is where a user-supplied contempt number is needed. Could it be the estimated Elo gap? This contempt option would then be combined with the current game phase and current score to adjust the draw value (/current score) and the material taper. As I understand it, out of a potential 3 inputs and 2 outputs, we currently only use contempt in the classical eval, where we look at one input (current score) and adjust the 2 outputs (score and material taper). I'll think some more and maybe try some tests ...
