Dynamic contempt #1394

Stefano80 · 2018-02-05T06:39:43Z

Adjust contempt based on current score.

STC
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 110052 W: 24614 L: 23938 D: 61500

LTC
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 16470 W: 2896 L: 2705 D: 10869

Stefano80 · 2018-02-05T06:40:12Z

Let me open a PR.

There has been some discussion going on here

Stefano80 · 2018-02-05T09:16:06Z

The reprosearch testing is failing (locally too), and I do not understand why. Any ideas?

Stefano80 · 2018-02-05T09:20:43Z

Ah, I found out. I have to initialize the contempt.

mcostalba · 2018-02-05T09:31:49Z

Interesting idea.

Maybe the next step is to remove static contempt and use only dynamic contempt. This, tested as [-3, 1], would, if it succeeds, get rid of combo boxes and other fancy stuff in one go, simplifying the UI a lot.

Stefano80 · 2018-02-05T10:18:13Z

Thx @mcostalba. I think your suggestion is smart indeed.

http://tests.stockfishchess.org/tests/view/5a782de40ebc5902971a98cf

snicolet · 2018-02-05T10:36:34Z

If we use this formula:

    contempt =  bestValue / 10;

then it is a simplification compared to current master (it removes the UCI Contempt option)

If we use

    contempt =  Options["Contempt"] * bestValue / 128;

then it is a complexification compared to current master. Users will want to automatically set the UCI Contempt to zero, etc., and the combo box will still be there :-)

Stefano80 · 2018-02-05T10:41:34Z

Are you sure? I think this implementation does not suffer from the asymmetry of the current one since it is centered on zero and covaries with best value. But maybe I am wrong.

In which case you would not need the combo box but just the old contempt value.

syzygy1 · 2018-02-05T11:05:04Z

I will say again that this patch lets different threads modify the same global Eval::Contempt variable. That should be cause for concern.

mcostalba · 2018-02-05T11:21:03Z

The best would be to remove the contempt UCI option altogether. Maybe contempt is just a special case of a more general formula, C = c0 + c1*bestValue, where the current contempt is c0.

mcostalba · 2018-02-05T11:24:34Z

Maybe some tuning sessions would help. I foresee two sessions: one standard, in self-play, and another against a fixed lower-level opponent such as SF8. Of course, only master would have the tuning code enabled in the latter case.

vdbergh · 2018-02-05T11:35:33Z

I already asked this elsewhere.

If contempt depends on bestValue doesn't that create a positive feedback loop (as bestValue again depends on contempt)? It is not meant as a criticism. I just want to make sure I understand things correctly.

mcostalba · 2018-02-05T11:59:51Z

Test will tell, I see no other way to know.

vdbergh · 2018-02-05T12:04:24Z

@mcostalba If you have nothing to say then please keep quiet.

"Test will tell, I see no other way to know."

What kind of an answer is that?? How can a test show if there is a positive feedback loop or not? This is a property of the algorithm.

syzygy1 · 2018-02-05T14:08:59Z

Yes, there is some feedback. I had wanted to say that contempt increases by 10% per iteration, but that is probably not true. The base contempt setting is constant. In a neutral position, about 10% is added in the first iteration. In the second iteration another 10% of that 10% is added, so 11% total. Then 11.1% etc. So the amount of feedback seems to stay within limits. Unless I am overlooking something.

Stefano80 · 2018-02-05T14:14:36Z

I think @syzygy1 got it right. The dynamic contempt contribution is a fraction of the best value depending on the search iteration in the way Ronald described it. This is the geometric series over and over again.

vdbergh · 2018-02-05T15:00:55Z

@syzygy1 @Stefano80 Thanks. Now I did the exercise myself (I should have done it before but was busy). Assume u is the true value of the position and c is the static contempt. Let b be bestValue and let cc be the dynamic contempt. In first approximation the recursion is given by b <- cc+u, cc <- c+b/10. The fixed point is given by b=(c+u)*1.111... (as Ronald was saying).

So the expected effect of this patch would be to gradually increase the value of bestValue (if c+u>0), but not in an uncontrolled way. It is not clear to me if any conclusions can be drawn from this. Maybe there is some interaction with A-B search resulting in faster cutoffs?
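
The recursion above can be checked numerically. What follows is a minimal sketch that assumes only the first-order model b <- cc+u, cc <- c+b/10 from this comment; the function names are hypothetical and are not part of Stockfish.

```cpp
// Hypothetical helper, not Stockfish code: iterate the first-order model
//   b  <- cc + u       (bestValue = dynamic contempt + true value)
//   cc <- c + b / 10   (dynamic contempt = static contempt + bestValue / 10)
// and return bestValue after the given number of iterations.
double bestValueAfter(double u, double c, int iterations) {
    double cc = c;    // dynamic contempt starts at the static setting
    double b  = 0.0;
    for (int i = 0; i < iterations; ++i) {
        b  = cc + u;
        cc = c + b / 10.0;
    }
    return b;
}

// Closed-form fixed point of the recursion: b = (c + u) / (1 - 1/10).
double fixedPoint(double u, double c) {
    return (c + u) * 10.0 / 9.0;
}
```

With u = 100 and c = 20, successive iterations give 120, 132, 133.2, and so on, approaching 133.33. Under this model the feedback is a converging geometric series rather than a runaway loop.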

Stefano80 · 2018-02-05T15:05:02Z

Maybe, maybe not. But be careful: the effect is to gradually increase the fraction of the best value which is added to contempt, not best value itself.

vdbergh · 2018-02-05T15:08:01Z

@Stefano80 Since ultimately bestValue (in first approximation) is the sum of the true value and contempt, bestValue increases as well.

Stefano80 · 2018-02-05T15:09:33Z

Depends whether best value is positive or negative for the root colour, in one case it will increase, in the other decrease.

vdbergh · 2018-02-05T15:10:32Z

@Stefano80 Yes this is what I wrote in my original post. Sorry that I did not repeat it in my reply.

vondele · 2018-02-05T17:31:52Z

@Stefano80 there are two failures in your CI travis test. As @snicolet was pointing out, the first one is a trailing '.' in your bench, the second one is a race condition in your code (as @syzygy1 has pointed out).

Stefano80 · 2018-02-05T19:01:22Z

I fixed the commit message, rebased and squashed all together. The race condition is most probably harmless, but the decision is yours in the end. If you want to merge but only after fixing the race condition, I will need some help.

At the moment, the SMP test is inconclusive, although slightly positive:

http://tests.stockfishchess.org/tests/view/5a7802290ebc5902971a98c4

What I do not understand: I fixed the bench and now a test that was previously successful is failing.

syzygy1 · 2018-02-05T19:46:44Z

As long as the SMP test is inconclusive, there is reason to think that letting all threads modify Eval::Contempt independently does not work very well.

Perhaps it makes sense to let only the main thread adjust Eval::Contempt?

To remove the data race, make Eval::Contempt an atomic variable and read and write to it with "relaxed" ordering.
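
A minimal sketch of that suggestion, assuming the contempt value is held as a plain int; the namespace and accessor names are hypothetical stand-ins and do not reproduce the actual Eval code.

```cpp
#include <atomic>

namespace Sketch {  // hypothetical namespace, standing in for Eval

// Shared between search threads; brace-initialized to zero.
std::atomic<int> Contempt{0};

// Written at the root, once per iteration.
inline void setContempt(int value) {
    Contempt.store(value, std::memory_order_relaxed);
}

// Read on every evaluation. Relaxed ordering keeps the access atomic
// but places no ordering constraints on surrounding memory operations.
inline int contempt() {
    return Contempt.load(std::memory_order_relaxed);
}

} // namespace Sketch
```

Relaxed ordering removes the data race in the language sense, but it does not decide which thread's value wins when several threads write.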

Kingdefender · 2018-02-05T19:59:48Z

If contempt is only changed at the root at every iteration, would that (race condition) really be a problem? I mean, could there be some slowdown or something because of the contempt variable being accessed by multiple threads? I don't know much about that, but it seems a bit improbable. And it would quickly become less of a problem once the time between iterations increases? With more threads, the effect of re-searching that you get from contempt is expected to be smaller, because all the threads already do this re-searching...

mcostalba · 2018-02-05T20:09:37Z

Make Eval::Contempt atomic, you don't need anything else. Even a relaxed write is useless overkill, considering that this is far from the fast path. Just define it as atomic and that's all. It also serves as self-documentation that this is a shared variable.

syzygy1 · 2018-02-05T20:58:24Z

Eval::Contempt is read in each call to evaluate().
Making the read relaxed gives the compiler more freedom to reorder instructions.

syzygy1 · 2018-02-05T21:04:55Z

@Kingdefender
It's indeed unlikely to be a performance bottleneck (though perhaps it could slow down the search a little bit on NUMA machines in early iterations). But the developers already made an effort to remove other races, so it's not so nice to introduce a new one.

A more serious question is what this all means for the effect of the patch on SMP.
With 1 thread, the Eval::Contempt value (which affects all evaluations) is changed only at the beginning of an iteration. With more threads, it is changed by all threads in an essentially random order at random times. That is quite different.

vdbergh · 2018-02-07T10:23:30Z

@Stefano80 The idea is to have independent evidence of elo gain (science is based on repeatability of experiments). If the test is inconclusive then there is no independent evidence of elo gain.

But I wonder why you reject a serious test ?? I agree 100000 games is a lot, but given the novelty of the patch I think it is warranted. SPRT(-3,1) tests routinely run for 100000 games and nobody complains about this.

Stefano80 · 2018-02-07T10:29:42Z

@vdbergh I don't understand what you are actually arguing about. I am just testing to check whether this capping mechanism is actually as safe as I think it is. We already have independent evidence of Elo gain from a lot of different fishtest contributors. This is the reason why we have a distributed test framework with SPRT tests.

You seem to be extremely convinced that dynamic contempt is no good. I will not reply to you anymore until you adopt a more realistic position.

vdbergh · 2018-02-07T10:39:50Z

@Stefano80

"We already have independent evidence of Elo gain from a lot of different fishtest contributors."

Sorry, I do not know where you get that from. There is only one single test that has shown an elo gain for contempt in self play with a statistically significant result and that is yours. Given that so many tests have been run, and given the unknown principle by which the contempt mechanism works (in self play), asking for an independent confirmation to remove all doubts that your test was not a statistical fluke (which of course happens) is just standard scientific practice. Moreover such an independent confirmation can be trivially achieved.

But as I said. I am just recording my personal opinion so that I can refer to it later.

vdbergh · 2018-02-07T11:14:32Z

Also, whereas Stefano's test convincingly suggests that "static contempt"+"dynamic adjustment" > "static contempt" when "static contempt"=20, the recent failed test

http://tests.stockfishchess.org/tests/view/5a7986450ebc5902971a9979

does not confirm this for "static contempt"=0. This is at least somewhat puzzling as generally an idea - in this case the dynamic adjustment - should be able to stand on its own feet, especially if it would give such a convincing elo gain as suggested by Stefano's test.

NKONSTANTAKIS · 2018-02-07T11:57:50Z

This is helping the contempt, but only if contempt exists.
The yellow test of dynamic0 vs 0 just shows that it has no effect on 0.
The red test of dynamic0 vs 20 is just another evidence of the power of 20.

We are not going to doubt any (0,5) passed test on principle, as @mcostalba has said many times we are gladly accepting flukes because they happen 1 time out of 20.

But this is clearly not something that can replace base contempt as hoped.

I think that because contempt manipulation is a new thing, many implementations will come to replace the initial ones and by accepting this it does not mean that it will stay forever. We will have in mind that in future unified contempt tries this code can be removed or altered.

Moving on is better than debating. Just accept and thus enable outside testing of the world and then proceed accordingly with the reports.

@vdbergh Contempt is a new concept which is puzzling for many, nothing to worry about. Things will get clear in due time.

vdbergh · 2018-02-07T12:31:02Z

Not doubting SPRT tests is good practice for patches that follow the standard pattern of improving eval and tweaking search. The working of such patches is clearly understood - both from a theoretical and a practical point of view - and an occasional false positive is totally harmless.

IMHO one should be more strict about patches whose working is not understood. Before expending a lot of energy on them one should make absolutely sure that they really do gain elo. Scientists say: extraordinary claims require extraordinary proof.

Recall that the dynamic contempt attempt from 2014 also first passed STC and LTC testing but then ultimately turned out not to work.

atumanian · 2018-02-07T14:41:51Z

@Stefano80,

Ah ok, I will try it. But why it is so?

Such an initialization of an object of type std::atomic<Score> requires use of the copy constructor. But it is deleted in this class: http://en.cppreference.com/w/cpp/atomic/atomic/atomic

Stefano80 · 2018-02-07T14:45:10Z

Thanks @atumanian for pointing to the right resource.

atumanian · 2018-02-07T15:09:11Z

@Stefano80, you are welcome. C++17 may have changed this. In the new version of C++ this initialization may work due to copy elision. But I haven't tried this.
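
The difference between the two initialization forms can be sketched as follows. The Score enum here is a simplified stand-in for the Stockfish type, and the variable names are hypothetical.

```cpp
#include <atomic>

enum Score : int { SCORE_ZERO = 0 };  // simplified stand-in for Stockfish's Score

// Copy-initialization: rejected before C++17, because it nominally needs
// the deleted copy constructor of std::atomic.
// std::atomic<Score> contemptA = SCORE_ZERO;

// Direct-initialization calls the converting constructor atomic(T) and
// compiles in C++11 and later.
std::atomic<Score> contemptB(SCORE_ZERO);
std::atomic<Score> contemptC{SCORE_ZERO};
```

Using the parenthesized or braced form avoids depending on C++17 copy elision.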

leesailer · 2018-02-07T20:32:59Z

By coincidence, I just watched a CPPCON youtube video about atomic that showed some surprising things that you cannot do. Initialization syntax was one of them. Maybe it will help.

https://www.youtube.com/watch?v=ZQFzMfHIxng

lee

Stefano80 · 2018-02-08T06:57:47Z

Thx to all, now it looks like the CI tests work, although two of them are taking very long. I scheduled a LTC multithreading test.

mcostalba · 2018-02-08T10:09:09Z

@Stefano80 Some coding style notes:

Please insert #include in alphabetic order.
No need for #include in the .cpp because it is already in the header.
Add spaces: bestValue/10; -> bestValue / 10;
I would not use the 'modification' temporary, but the code below:

              contempt  = bestValue >  500?  50:
                          bestValue < -500? -50:
                          bestValue / 10;
              contempt += Options["Contempt"] * PawnValueEg / 100; // From centipawns

mcostalba · 2018-02-08T10:20:54Z

...or even better, the opposite:

              contempt = Options["Contempt"] * PawnValueEg / 100; // From centipawns
              contempt += bestValue >  500?  50:  // Dynamic contempt
                          bestValue < -500? -50:
                          bestValue / 10;

mcostalba · 2018-02-08T10:23:20Z

...finally, I would express the bestValue limit in terms of PawnValueEg, something like:

              contempt += bestValue >  2 * PawnValueEg ?  50:  // Dynamic contempt
                          bestValue < -2 * PawnValueEg ? -50:
                          bestValue / 10;
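
As a self-contained illustration, the snippets above can be wrapped in a free function. This is a sketch under stated assumptions: PawnValueEg = 240 is inferred from the [-48..48] cap quoted later in the thread, and the function and parameter names are hypothetical.

```cpp
// Assumed constant: 240 is inferred from the [-48..48] cap quoted later in
// the thread (2 * PawnValueEg / 10 = 48); treat it as illustrative.
constexpr int PawnValueEg = 240;

// Hypothetical free function; in the patch this logic sits inside the search.
// staticContemptCp plays the role of Options["Contempt"], in centipawns.
int dynamicContempt(int staticContemptCp, int bestValue) {
    int contempt = staticContemptCp * PawnValueEg / 100;  // from centipawns
    contempt += bestValue >  2 * PawnValueEg ?  50 :      // dynamic contempt, capped
                bestValue < -2 * PawnValueEg ? -50 :
                bestValue / 10;
    return contempt;
}
```

With these assumptions, a bestValue of 100 and no static contempt gives 10, while any bestValue above 480 saturates the dynamic part at 50. Note that the cap jumps from 48 to 50 at the threshold.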

Stefano80 · 2018-02-08T11:21:42Z

Thx Marco, will do soon.

Stefano80 · 2018-02-08T18:34:02Z

Hi @snicolet, all subsequent tests after the initial SPRT are at around +1 Elo.

Are you going to commit?

If not I will save the time and close the PR before implementing Marco's suggestions.

snicolet · 2018-02-08T22:28:05Z

@Stefano80 My current inclination is to commit this idea, it looks promising. But there is no need to rush, I want us to take the time to get it right. We can wait for the tests to finish.

You can implement Marco's style suggestions on top of this dynamicContempt branch, they are good ones and we shall keep this branch as our working branch for now.

Concerning the multithreaded Contempt, it is good that we have managed to make it atomic and change it in search(). Isn't there an uninitialized variable problem, however, if we use a code path where we call evaluate(pos) outside of search()? For instance, if I type the following debugging sequence in a terminal:

./stockfish
position startpos
d
eval
go depth 20
d
eval

The first eval will call Eval::trace(pos), which will in turn call Evaluation<TRACE>(pos).value() with an uninitialized Eval::Contempt variable (or is it implicitly initialized to SCORE_ZERO by the atomic class? anyway not our default contempt of 18 centipawns), while the second eval will use the contempt left by the search. So we have two debugging eval commands which output different values, not a good thing.

So some details to get right, but again, on the grand scale, I am very pleased that the idea works, you imagine :-)

Nordlandia · 2018-02-08T22:32:50Z

Will dynamic contempt backfire in various imbalanced positions, which usually favour Komodo?

Assuming SF vs K.

snicolet · 2018-02-09T00:20:00Z

The bug for the 'eval' debugging command is fixed in master in this commit: 211ebc5

Stefano80 · 2018-02-09T06:51:35Z

Yes, I noticed the problem, the eval contempt is initialized with 0 it seems.

The problem with the value not being initialized with the default contempt is already present in master, right?

Stefano80 · 2018-02-09T10:21:48Z

So, I implemented the changes suggested by @mcostalba (with one small difference), squashed, rebased and updated the bench.

mcostalba · 2018-02-09T11:48:14Z

Why mixing PawnValueEg and PawnValueMg?

This could raise some 'hmmm' in people reading the code.

Stefano80 · 2018-02-09T12:02:53Z

Oops, no reason. I am just so used to code referencing the MG value that I did not notice that we have PawnValueEg here. Fixed.

snicolet · 2018-02-09T18:22:37Z

Merged via this commit: cb13243. Time for a code freeze for a couple of days so that people can prepare optimized versions for TCEC if they feel like it :-)

I have merged the version of "Dynamical Contempt" which was currently running with good results at LTC and in multithread mode (so using the [-50..50] cap instead of Marco's [-48..48] resulting from the cleaner code with PawnValueEg), because it was safer to use a tested version.

During the code freeze we can test the interval [-2 * PawnValueEg .. 2 * PawnValueEg] for the cap instead of [-500..500] as a SPRT(-3..1), and I am sure that it will pass easily.

Bravo à tous :-)

Bench: 5791090

Stefano80 referenced this pull request in Stefano80/Stockfish Feb 5, 2018

Cut the bull's head

b641229

Stefano80 mentioned this pull request Feb 5, 2018

Don’t score and sort all captures in RECAPTURES stage. #1395

Closed

Stefano80 force-pushed the dynamicContempt branch from 9ed5f64 to cee1828 Compare February 5, 2018 18:52

Stefano80 force-pushed the dynamicContempt branch from 0eb9ea3 to 6334237 Compare February 9, 2018 10:20

Make contempt dependent on current score and reduce default contempt by 10%. Include suggestions from Marco Costalba, Aram Tumanian, Ronald de Man, Stephane Nicolet. Bench: 5791090

2e4f1b3

Stefano80 force-pushed the dynamicContempt branch from 6334237 to 2e4f1b3 Compare February 9, 2018 12:02

snicolet closed this Feb 9, 2018

snicolet referenced this pull request in snicolet/Stockfish Feb 10, 2018

Use PawnValueEg to express the caping interval

5ea5fa6

Bench: 5791090

NKONSTANTAKIS mentioned this pull request Nov 13, 2018

Contempt 24 #1806

Closed
