Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic contempt #1394

Closed

Conversation

Stefano80
Copy link
Contributor

Adjust contempt based on current score.

STC
LLR: 2.95 (-2.94,2.94) [0.00,5.00]
Total: 110052 W: 24614 L: 23938 D: 61500

LTC
LLR: 2.97 (-2.94,2.94) [0.00,5.00]
Total: 16470 W: 2896 L: 2705 D: 10869

@Stefano80
Copy link
Contributor Author

Let me open a PR.

There has been some discussion going on here

@Stefano80
Copy link
Contributor Author

The reprosearch testing is failing (locally too), and I do not understand why. Any ideas?

@Stefano80
Copy link
Contributor Author

Ah, I found out. I have to initialize the contempt.

@mcostalba
Copy link

Interesting idea.

Maybe next step is to remove static contempt and use only dynamic contempt, this [-3, 1], if succeeds would get rid of combo boxes and other fancy stuff in one go, simplifying a lot UI.

@Stefano80
Copy link
Contributor Author

Thx @mcostalba. I think your suggestion is smart indeed.

http://tests.stockfishchess.org/tests/view/5a782de40ebc5902971a98cf

Stefano80 referenced this pull request in Stefano80/Stockfish Feb 5, 2018
@snicolet
Copy link
Member

snicolet commented Feb 5, 2018

If we use this formula:

    contempt =  bestValue / 10;

then it is a simplification compared to current master (it removes the UCI Contempt option)

If we use

   contempt =  Options["Contempt"] * bestValue / 128;

then is is a complexification compared to current master. Users will want to automatically set the UCI Contempt to zero, etc, and the combo box will still be there :-)

@Stefano80
Copy link
Contributor Author

Are you sure? I think this implementation does not suffer from the asymmetry of the current one since it is centered on zero and covaries with best value. But maybe I am wrong.

In which case you would not need the combo box but just the old contempt value.

@syzygy1
Copy link
Contributor

syzygy1 commented Feb 5, 2018

I will say again that this patch lets different threads modify the same global Eval::Contempt variable. That should be cause for concern.

@mcostalba
Copy link

mcostalba commented Feb 5, 2018 via email

@mcostalba
Copy link

mcostalba commented Feb 5, 2018 via email

@vdbergh
Copy link
Contributor

vdbergh commented Feb 5, 2018

I already asked this elsewhere.

If contempt depends on bestValue doesn't that create a positive feedback loop (as bestValue again depends on contempt)? It is not meant as a criticism. I just want to make sure I understand things correctly.

@mcostalba
Copy link

mcostalba commented Feb 5, 2018 via email

@vdbergh
Copy link
Contributor

vdbergh commented Feb 5, 2018

@mcostalba If you have nothing to say then please keep quiet.

Test will tell, I see no other way to know.

What kind of an answer is that?? How can a test show if there is a positive feedback loop or not? This is a property of the algorithm.

@syzygy1
Copy link
Contributor

syzygy1 commented Feb 5, 2018

Yes, there is some feedback. I had wanted to say that contempt increases by 10% per iteration, but that is probably not true. The base contempt setting is constant. In a neutral position, about 10% is added in the first iteration. In the second iteration another 10% of that 10% is added, so 11% total. Then 11.1% etc. So the amount of feedback seems to stay within limits. Unless I am overlooking something.

@Stefano80
Copy link
Contributor Author

I think @syzygy1 got it right. The dynamic contempt contribution is a fraction of the best value depending on the search iteration in the way Ronald described it. This it the geometric series over and over again.

@vdbergh
Copy link
Contributor

vdbergh commented Feb 5, 2018

@syzygy1 @Stefano80 Thanks. Now I did the excercise myself (I should have done it before but was busy). Assume u is the true value of the position and c is the static contempt. Let b be bestValue and let cc be the dynamic content. In first approximation the recursion is given by b <- cc+u, cc <- c+b/10. The fix point is given by b=(c+u)*1.111... (as Ronald was saying).

So the expected effect of this patch would be to gradually increase the value of bestValue (if c+u>0), but not in an uncontrolled way. It is not clear to me if any conclusions can be drawn from this. Maybe there is some interaction with A-B search resulting in faster cutoffs?

@Stefano80
Copy link
Contributor Author

Maybe, maybe not. But be careful: the effect is to gradually increase the fraction of the best value which is added to contempt, not best value itself.

@vdbergh
Copy link
Contributor

vdbergh commented Feb 5, 2018

@Stefano80 Since ultimately bestValue (in first approximation) is the sum to the true value and contempt, bestValue increases as well.

@Stefano80
Copy link
Contributor Author

Depends whether best value is positive or negative for the root colour, in one case it will increase, in the other decrease.

@vdbergh
Copy link
Contributor

vdbergh commented Feb 5, 2018

@Stefano80 Yes this is what I wrote in my original post. Sorry that I did not repeat it in my reply.

@vondele
Copy link
Member

vondele commented Feb 5, 2018

@Stefano80 there are two failures in your CI travis test. As @snicolet was pointing out, the first one is a trailing '.' in your bench, the second one is a race condition in your code (as @syzygy1 has pointed out).

@Stefano80
Copy link
Contributor Author

I fixed the commit message, rebased and squashed all together. The race is condition is most probably harmless, but the decision is yours in the end. If you want to merge but only after fixing the race condition, I will need some help.

At the moments, the SMP test is inconclusive, although slightly positive,

http://tests.stockfishchess.org/tests/view/5a7802290ebc5902971a98c4

What I do not understand: I fixed the bench and now a test that was previously successful is failing.

@syzygy1
Copy link
Contributor

syzygy1 commented Feb 5, 2018

As long as the SMP test is inconclusive, there is reason to think that letting all threads modify Eval::Contempt independently does not work very well.

Perhaps it makes sense to let only the main thread adjust Eval::Contempt?

To remove the data race, make Eval::Contempt an atomic variable and read and write to it with "relaxed" ordering.

@Kingdefender
Copy link

If contempt is only changed at the root at every iteration, would that (race condition) really be a problem, I mean could the be some slowdown or something because of the contempt variable being accessed by multiple threads? Don't know much about that but seems a bit improbable. And it would quickly become less of a problem, once the time between iterations increases? With more threads, the effect of re-searching that you get from contempt is expected to be less because all the threads already do this re-searchng...

@mcostalba
Copy link

mcostalba commented Feb 5, 2018

Make Eval::Contempt atomic, you don't need anything else. Even relaxed write is a useless overkill considering that this is far from fast path. Just define atomic and that's all. It is used also as self documenting that is a shared variable.

@syzygy1
Copy link
Contributor

syzygy1 commented Feb 5, 2018

Eval::Contempt is read in each call to evaluate().
Making the read relaxed gives the compiler more freedom to reorder instructions.

@syzygy1
Copy link
Contributor

syzygy1 commented Feb 5, 2018

@Kingdefender
It's indeed unlikely to be a performance bottleneck (though perhaps it could slowdown the search a little bit on NUMA machines in early iterations). But the developers already made an effort to remove other races, so it's not so nice to introduce a new one.

A more serious question is what this all means for the effect of the patch on SMP.
With 1 thread, the Eval::Contempt value (which affects all evaluations) is changed only at the beginning of an iteration. With more threads, it is changed by all threads in an essentially random order at random times. That is quite different.

@vdbergh
Copy link
Contributor

vdbergh commented Feb 7, 2018

@Stefano80 The idea is to have independent evidence of elo gain (science is based on repeatability of experiments). If the test is inconclusive then there is no independent evidence of elo gain.

But I wonder why you reject a serious test ?? I agree 100000 games is a lot, but given the novelty of the patch I think it is warranted. SPRT(-3,1) tests routinely run for 100000 games and nobody complains about this.

@Stefano80
Copy link
Contributor Author

@vdbergh I don't understand what you are actually arguing about. I am just testing to check whether this capping mechanism is actually as safe as I think it is. We already have independent evidence of Elo gain from a lot of different fishtest contributors. This is the reason why we have a distributed test framework with SPRT tests.

You seem to be extremely convinced that dynamic contempt is no good. I will not reply to you anymore until you adopt a more realistic position.

@vdbergh
Copy link
Contributor

vdbergh commented Feb 7, 2018

@Stefano80

We already have independent evidence of Elo gain from a lot of different fishtest contributors.

Sorry, I do not know where you get that from. There is only one single test that has shown an elo gain for contempt in self play with a statistically significant result and that is yours. Given that so many tests have been run, and given the unknown principle by which the contempt mechanism works (in self play), asking for an independent confirmation to remove all doubts that your test was not a statistical fluke (which of course happens) is just standard scientific practice. Moreover such an independent confirmation can be trivially achieved.

But as I said. I am just recording my personal opinion so that I can refer to it later.

@vdbergh
Copy link
Contributor

vdbergh commented Feb 7, 2018

Also, whereas Stefano's test suggests that convincingly "static contempt"+"dynamic adjustment" > "static contempt" if "static contempt"=20 the recent failed test

http://tests.stockfishchess.org/tests/view/5a7986450ebc5902971a9979

does not confirm this for "static contempt"=0. This is at least somewhat puzzling as generally an idea - in this case the dynamic adjustment - should be able to stand on its own feet, especially if it would give such a convincing elo gain as suggested by Stefano's test.

@NKONSTANTAKIS
Copy link

This is helping the contempt, but only if contempt exists.
The yellow test of dynamic0 vs 0 just shows that it has no effect on 0.
The red test of dynamic
0 vs 20 is just another evidence of the power of 20.

We are not going to doubt any (0,5) passed test on principle, as @mcostalba has said many times we are gladly accepting flukes because they happen 1 time out of 20.

But this is clearly not something that can replace base contempt as hoped.

I think that because contempt manipulation is a new thing, many implementations will come to replace the initial ones and by accepting this it does not mean that it will stay forever. We will have in mind that in future unified contempt tries this code can be removed or altered.

Moving on is better than debating. Just accept and thus enable outside testing of the world and then proceed accordingly with the reports.

@vdbergh Contempt is a new concept which is puzzling for many, nothing to worry about. Things will get clear in due time.

@vdbergh
Copy link
Contributor

vdbergh commented Feb 7, 2018

Not doubting SPRT tests is good practice for patches that follow the standard pattern of improving eval and tweaking search. The working of such patches is clearly understood - both from a theoretical and a practical point of view - and an occasional false positive is totally harmless.

IMHO one should be more strict about patches whose working is not understood. Before expending a lot of energy on them one should make absolutely sure that they really do gain elo. Scientists say: extraordinary claims require extraordinary proof.

Recall that the dynamic contempt attempt from 2014 also first passed STC and LTC testing but then ultimately turned out not to work.

@atumanian
Copy link

atumanian commented Feb 7, 2018

@Stefano80,

Ah ok, I will try it. But why it is so?

Such an initialization of an object of type std::atomic<Score> requires use of the copy constructor. But it is deleted in this class: http://en.cppreference.com/w/cpp/atomic/atomic/atomic

@Stefano80
Copy link
Contributor Author

Thanks @atumanian for pointing to the right resource.

@atumanian
Copy link

atumanian commented Feb 7, 2018

@Stefano80, you are welcome. C++17 may have changed this. In the new version of C++ this initialization may work due to copy elision. But I haven't tried this.

@leesailer
Copy link

By coincidence, I just watched a CPPCON youtube video about atomic that showed some surprising things that you cannot do. Initialization syntax was one of them. Maybe it will help.

https://www.youtube.com/watch?v=ZQFzMfHIxng

lee

@Stefano80
Copy link
Contributor Author

Thx to all, now it looks like the CI tests work, although two of them are taking very long. I scheduled a LTC multithreading test.

@mcostalba
Copy link

@Stefano80 Some coding style notes:

  1. please insert #include in alphabetic order
  2. Don't need #include in .cpp becuase it is already in header
  3. Add a space bestValue/10; -> bestValue / 10;
  4. I would not use 'modification' temporary but the below
              contempt  = bestValue >  500?  50:
                          bestValue < -500? -50:
                          bestValue / 10;
              contempt += Options["Contempt"] * PawnValueEg / 100; // From centipawns

@mcostalba
Copy link

mcostalba commented Feb 8, 2018

...or even better, the opposite:

              contempt = Options["Contempt"] * PawnValueEg / 100; // From centipawns
              contempt += bestValue >  500?  50:  // Dynamic contempt
                          bestValue < -500? -50:
                          bestValue / 10;

@mcostalba
Copy link

..finally, I would express the bestvalue limit in terms of PawnValueEg , something like:

              contempt += bestValue >  2 * PawnValueEg ?  50:  // Dynamic contempt
                          bestValue < -2 * PawnValueEg ? -50:
                          bestValue / 10;

@Stefano80
Copy link
Contributor Author

Thx Marco, will do soon.

@Stefano80
Copy link
Contributor Author

Hi @snicolet, all subsequent tests after the initial SPRT are at around +1 Elo.

Are you going to commit?

If not I will save the time and close the PR before implementing Marco's suggestions.

@snicolet
Copy link
Member

snicolet commented Feb 8, 2018

@Stefano80 My current inclination is to commit this idea, it looks promising. But there is no need to rush, I want that we take the time to get it right. We can wait for the tests to finish.

You can implement Marco's style suggestions on top of this dynamicContempt branch, they are good ones and we shall keep this branch as our working branch for now.

Concerning the multithreaded Contempt, it is good that we have managed to make it atomic and change it in search(). Isn't there an uninitialized variable problem however, if we use a code path where we call evaluate(pos) outside of search(). For instance, if I type the following debugging sequence in a terminal:

./stockfish
position startpos
d
eval
go depth 20
d
eval

The first eval will call Eval::trace(pos), which will in turn call Evaluation<TRACE>(pos).value()with an uninitialized Eval::Contempt variable (or is it implicitly initialized to SCORE_ZERO by the atomic class? anyway not our default contempt of 18 centipawns), while the second eval will use the contempt left by the search. So we have two debugging eval commands which output different values, not a good thing.

So some details to get right, but again, on the grand scale, I am very pleased that the idea works, you imagine :-)

@Nordlandia
Copy link

Will dynamic contempt backfire in various imbalance position which usually favour Komodo?

Assuming SF vs K.

@snicolet
Copy link
Member

snicolet commented Feb 9, 2018

The bug for the 'eval' debugging command is fixed in master in this commit: 211ebc5

@Stefano80
Copy link
Contributor Author

Yes, I noticed the problem, the eval contempt is initialized with 0 it seems.

The problem with the value not being initialized with the default contempt is already present in master, right?

@Stefano80
Copy link
Contributor Author

So, I implemented the changes suggested by @mcostalba (with one small difference), squashed, rebased and updated the bench.

@mcostalba
Copy link

Why mixing PawnValueEg and PawnValueMg?

This could raise some 'hmmm' in people reading the code.

…by 10%. Include suggestions from Marco Costalba, Aram Tumanian, Ronald de Man, Stephane Nicolet.

Bench: 5791090
@Stefano80
Copy link
Contributor Author

Oops, no reasons. I just am so used to code referencing the MG value that I did not notice that we have PawnValueEg here. Fixed.

@snicolet
Copy link
Member

snicolet commented Feb 9, 2018

Merged via this commit : cb13243 . Time for a code freeze during a couple of days so that people can prepare optimized version for TCEC is they feel like :-)

I have merged the version of "Dynamical Contempt" which was currently running with good results at LTC and in multithread mode (so using the [-50..50] cap instead of Marco's [-48..48] resulting from the cleaner code with PawnValueEg), because it was safer to use a tested version.

During the code freeze we can test the interval [-2 * PawnValueEg .. 2 * PawnValueEg] for the cap instead of [-500..500] as a SPRT(-3..1), and I am sure that it will pass easily.

Bravo à tous :-)

@snicolet snicolet closed this Feb 9, 2018
snicolet referenced this pull request in snicolet/Stockfish Feb 10, 2018
@NKONSTANTAKIS NKONSTANTAKIS mentioned this pull request Nov 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet