Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Threefold repetition detection #925

Closed
wants to merge 3 commits into from

Conversation

saproj
Copy link

@saproj saproj commented Dec 13, 2016

Implement a threefold repetition detection. Below are the examples of problems fixed by this change.

  1. Loosing move in a drawn position.
    position fen 8/k7/3p4/p2P1p2/P2P1P2/8/8/K7 w - - 0 1 moves a1a2 a7a8 a2a1
    The old code suggested a loosing move "bestmove a8a7", the new code suggests "bestmove a8b7" leading to a draw.

  2. Incorrect evaluation (happened in a real game in TCEC Season 9).
    position fen 4rbkr/1q3pp1/b3pn2/7p/1pN5/1P1BBP1P/P1R2QP1/3R2K1 w - - 5 31 moves e3d4 h8h6 d4e3
    The old code evaluated it as "cp 0", the new code evaluation is around "cp -50" which is adequate.

Brings 0.5-1 ELO gain. Passes [-3.00,1.00].

STC: http://tests.stockfishchess.org/tests/view/584ece040ebc5903140c5aea
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 47744 W: 8537 L: 8461 D: 30746

LTC: http://tests.stockfishchess.org/tests/view/584f134d0ebc5903140c5b37
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 36775 W: 4739 L: 4639 D: 27397

@MichaelB7
Copy link
Contributor

A nice effort with a satisfactory result. Congrats.

@Stefano80
Copy link
Contributor

Why -3, 1? It adds code, so it should be 0, 5

@saproj
Copy link
Author

saproj commented Dec 13, 2016

Can be considered a bug fix, therefore tested for no-regression.

@joergoster
Copy link
Contributor

@Stefano80 Similar tries were also tested with bugfix bounds in the past. But I admit this patch adds a LOT of code. OTOH, this is the first one that passed STC and LTC!
I don't envy the maintainer(s) for the decision they have to make.
@saproj Congrats!

@atumanian
Copy link

Good work! The 3-fold repetition blindness has been always annoying.
But why does this change the bench signature? The bench test doesn't have pre-root positions, so the patch shouldn't affect it.

@saproj
Copy link
Author

saproj commented Dec 13, 2016

Because the root position itself is reenterable unless it is a repetition (like in the case of bench). Why? I tested the alternative and it was weaker.

@syzygy1
Copy link
Contributor

syzygy1 commented Dec 14, 2016

It does not seem to make sense to allow a first repetition of the root position. Did keeping the old behaviour for the root position fail on fishtest?

A possible way to remove the st->draw flags: after setting up the position in uci.cpp, walk from the root back into history and set all non-repeated st->key values to zero.

@saproj
Copy link
Author

saproj commented Dec 14, 2016

It does not seem to make sense to allow a first repetition of the root position. Did keeping the old behaviour for the root position fail on fishtest?

Yes, it did.

A possible way to remove the st->draw flags: after setting up the position in uci.cpp, walk from the root back into history and set all non-repeated st->key values to zero.

Interesting idea. I'll try it.

@vdbergh
Copy link
Contributor

vdbergh commented Dec 14, 2016

@saproj The fact that the first version failed SPRT(-3,1) does not mean it is weaker. You cannot draw any conclusion from a failed SPRT(-3,1).

Also in case of a small regression (like -1 elo as could be the case here) you can make a SPRT(-3,1) pass by repeating it enough times (possibly combining it with placebo changes a la "Take 1", "Take 2", "optimized version" etc...).

I think in this case the maintainers should simply be ready to (possibly) sacrifice 1 elo in order to repair what is obviously a bug in Stockfish.

@MichaelB7
Copy link
Contributor

I know I don't have a vote - but I would vote for this patch. There are many users of Stockfish that would appreciate the nuance corrected here.

@Stefano80
Copy link
Contributor

I really don't understand why it should be tested with [-3, 1]. It it adds more than 30 lines of code. It does not fix anything, and for certain not a bug. It has not been approved in advance by maintainers. It is just an arbitrary abuse of the framework. The guidelines say:

These tests are also used for bug fixes and other special cases, but only after being discussed and approved in advance to avoid people testing with no-regression mode becoming their preferred toy, instead of using the stricter standard mode.

Please either stick to the guidelines or propose to change them.

@vdbergh : it has been repeated hundreds of times that conter-intuitive eval in an arbitrary class of positions is not considered a bug.

@Mindbreaker1
Copy link

I think 30 lines is an exaggeration. Some is documentation, spacing, and brackets. I see 13 without that. It fixes a problem and it does not appear to hurt. And is it possible that this might help when contempt is present achieving more at a lower setting? That is just a guess, not an assertion.

At the very least, it means less grumbles on the forum. That has to be worth something.

@saproj
Copy link
Author

saproj commented Dec 14, 2016

@vdbergh I did direct comparison locally. Not allowing to reenter the root position is, surprisingly, worse than allowing it. By several ELO points. Why? It's a mystery.

@saproj
Copy link
Author

saproj commented Dec 14, 2016

@Stefano80 Just to clarify my position: I believe I'm fixing a bug. Surprised by your "for certain not a bug".

@ddugovic
Copy link

Regardless we can agree, "It has not been approved in advance by maintainers."

The old code evaluated it as "cp 0", the new code evaluation is around "cp -50" which is adequate.

How is "cp 0" a bug? (In my fork I do address this issue, however I cannot prove "cp 0" is a bug according to terminology used by maintainers.)

@jhellis3
Copy link
Contributor

To understand why this is a "bug" is fairly straightforward - the current behavior simply does not adhere to the actual rules of chess. Using a small eval difference is a bit disingenuous. Say the eval is -10, but human players repeated the previous position. Showing an eval of draw is clearly in error, since the first repetition does not result in an immediate draw under the actual rules of the game. While it may not make any difference for Stockfish in playing a game, it does cause a clear discontinuity in evaluation when used for analysis. As analysis is the primary use case for the engine, ensuring correct output should at least be worthy of consideration, IMHO.

@saproj
Copy link
Author

saproj commented Dec 15, 2016

@syzygy1 Zeroing non-repeated st->key (at least the way I implemented it) is not better. See http://tests.stockfishchess.org/tests/view/58514cd80ebc5903140c5c04

@saproj
Copy link
Author

saproj commented Dec 15, 2016

@ddugovic,
"cp 0" is a bug because it is a wrong evaluation. This 0 eval may be assigned to a position which is in fact very different from equality.

You mentioned you address this issue in your fork. I do not know about it. So, is there also a fix for the 1st problem from my list? I mean the problem when SF makes a loosing move in a drawn position.

@ddugovic
Copy link

My fork's change probably looks familiar to some; unfortunately I forget who the original author is that I copied the idea from.

People keep using the words "bug" and "fix" and while I only endorse the word "problem" my fork does not play a8a7 in that study position.

@saproj
Copy link
Author

saproj commented Dec 15, 2016

@ddugovic Familiar idea. It introduces an ELO-stealing slowdown because of additional loop iterations.

@Rocky640
Copy link

Rocky640 commented Dec 15, 2016

Congrats to the author !!!

There is one helper function, and a few line changes.
This looks quite reasonable to me.

Let me take some examples:
Let's say that engine would systematically output cp=0 whenever the last move was e.p.
I think we would want to fix this.

If search would never consider promotions to bishop, (hello old Rybka),
I think we would want to fix this. No ELO improvement,
but we want to design an engine which plays according to the rules of chess.

If I have a car which runs perfectly, but would stop working each time I "leave from work" (but would be ok if I leave from all the other places), I would say that this is a problem.

As it is, each time current master "leaves from a repetition", it virtually stops doing its duty which is
to keep trying to output the most realistic evaluation according to the available information.

It's not like a specific funny position which is misevaluated. This is a systemic error which does not really depend of the position. It depends only to the fact that there was a repetition in the game history.

We expect an engine to properly handle e.p., castling rights, 50 moves rule, (which all relates to the game history) AND also repetitions.

It might even be a small ELO gainer. In a time limited game, SF might go for a first repetition, and on the next move. using the info in the TT, and with the time allocated by the time management, it might go for a better fighting continuation or simply keep trying with something else.

@lantonov
Copy link

Having tried myself to fix this problem (bug?) and failing badly, I know how complicated it is if you dig deeper. The author of this patch did a good job, improving on his previous idea.

@syzygy1
Copy link
Contributor

syzygy1 commented Dec 15, 2016

@saproj
I would not expect zeroing st->key instead of adding st->draw flags to do measurably better, but it cannot do worse and, arguably, it is a slightly simpler solution (not needing an additional field in StateInfo, for example).

@atumanian
Copy link

@syzygy1, but your version treats the root position differently: it doesn't allow repeating it even once during a search, unlike saproj's version. He has already written about this here.

@saproj
Copy link
Author

saproj commented Dec 15, 2016

@syzygy1 This idea of zeroing st->key looked appealing to me at first. But now that I have tested it, I want to stick to adding st->draw. By the way, this new field resides in an alignment gap (under x86_64) and StateInfo's size did not change.

@ElbertoOne
Copy link

I'm in favor of this patch, even if it adds some complexity. To remove all doubt, we might consider a reverse simplification test (master against patch instead of patch versus master).

@jcalovski
Copy link

jcalovski commented Dec 16, 2016

If I were a maintainer i'd list a few "exceptions":

  • 3 fold rep
  • fortress detection
  • reworked contempt
  • and so on...

For which i'd relax the rules slightly.

@stockfishdeveloper
Copy link

I guess this patch cuts to the root of Stockfish's purpose.
The question it brings up is:
Should Stockfish be made for playing or analyzing?

@joergoster
Copy link
Contributor

@mcostalba Nice! But this again excludes the root position to be repeated once without directly assigning a draw score ...

@vdbergh
Copy link
Contributor

vdbergh commented Dec 25, 2016

@mcostalba If I read correctly this is precisely the version that failed several times in the past. It seemed to be a -1 elo regression. So if you keep on testing it, it will eventually pass... which would be a good thing IMHO.

@mcostalba
Copy link

mcostalba commented Dec 25, 2016 via email

@vdbergh
Copy link
Contributor

vdbergh commented Dec 25, 2016

@mcostalba AFAICS your version does not contain the optimization "calc_draw". This is probably a placebo but one cannot be sure.

@atumanian
Copy link

atumanian commented Dec 25, 2016

This test shows that it's better to allow repetition of the root position: http://tests.stockfishchess.org/tests/view/58514cd80ebc5903140c5c04
@syzygy1, I understand the difference between repeating the root position and repeating positions after the root. My only argument for repeating the root is the results of tests.

@vdbergh
Copy link
Contributor

vdbergh commented Dec 25, 2016

@mcostalba
Copy link

mcostalba commented Dec 26, 2016 via email

@mcostalba
Copy link

mcostalba commented Dec 26, 2016

@vdbergh regarding the calc_draw(), well if the patch passed just because of that then it is better to fail and do not commit, it means it passed by pure luck. In real games the cases where we reach plies before root are very very few (measured with debug instrumentation) and even in those cases, the extra search to reach the game ply base is absolutely unmeasurable. Again, if this is the case then it is much more probable to explain the successful test by a statistical fluke.

@vdbergh
Copy link
Contributor

vdbergh commented Dec 26, 2016

@mcostalba Thanks for repeating my points (your last post). As for your earlier post. I think the old version from 2014 is equivalent to your "new" one (there is nothing tricky about this). I have not checked it carefully however (it was not my patch).

mcostalba added a commit to mcostalba/Stockfish that referenced this pull request Dec 26, 2016
mcostalba added a commit to mcostalba/Stockfish that referenced this pull request Dec 26, 2016
@vdbergh
Copy link
Contributor

vdbergh commented Dec 26, 2016

BTW I was simply responding to your claim: "FYI this is fully equivalent to the original patch but the recursion at root.". This statement is false as the (probably placebo) optimization is dropped. Also: please do not use "FYI". I am perfectly capable of reading your code.

@mcostalba
Copy link

Ok, I have submitted a couple of test: with root position included and excluded from extended draw repetition.

@ddugovic
Copy link

I apologize for the untimeliness of the following idea as I do not fully understand HGM's comment as it applies to Stockfish:

One way to solve [threefold repetition] is to remove all positions from the game history that occurred only once, during loading of a position in AnalyzeMode. This is code that is unlikely to be patched very frequently. E.g. you could spoil the incrementally updated hash key before parsing the moves by XOR'ing it with some constant, and repair it afterwards. And repair the key of any encountered position that would get stored in the move-history table whenever you detect it is a repetition.

Given that the position ... moves command reconstructs the entire game history I wonder, after that command would it be safe (for positions only existing once in the game history) to spoil stp->key (with or without "repair" code)?

@atumanian
Copy link

atumanian commented Dec 26, 2016

@ddugovic, I've actually tried this idea on Fishtest: http://tests.stockfishchess.org/tests/view/58223a4b0ebc5910626b9c64
My version isn't perfect, so you shouldn't take it literally, but you can look at it.
Before every search it reconstructs pre-root states so that all positions which were repeated only once are deleted. The patch adds a lot of code, but the code which tests for a draw is almost unchanged.

@mcostalba
Copy link

mcostalba commented Dec 31, 2016

Both tests passed, so I think I will commit the one with root position excluded from extended draw repetition because it seems the most logical to me.

@atumanian
Copy link

@mcostalba, but only the version with the root position included works correctly with MultiPV > 1, as I have said earlier.

@atumanian
Copy link

We can also make a match between the two versions to decide which is stronger.

mcostalba added a commit to mcostalba/Stockfish that referenced this pull request Jan 1, 2017
mcostalba added a commit to mcostalba/Stockfish that referenced this pull request Jan 1, 2017
@mcostalba
Copy link

@atumanian The argument on MultiPV is not clear, in particular has been tentatively refuted by @syzygy1 and I think his arguments make sense. Overall to include root because of MultiPV leads to a questionable and subjective argumentation for both parts, not to a clear black or white condition. so because it is not clear the benefit in MultiPV case and because, apart form this very specific and debatable point, there is no logical/measurable added benefit, I will commit the version with root excluded.

mcostalba added a commit to mcostalba/Stockfish that referenced this pull request Jan 1, 2017
mcostalba added a commit to mcostalba/Stockfish that referenced this pull request Jan 1, 2017
mcostalba pushed a commit that referenced this pull request Jan 1, 2017
Implement a threefold repetition detection. Below are the examples of
problems fixed by this change.

    Loosing move in a drawn position.
    position fen 8/k7/3p4/p2P1p2/P2P1P2/8/8/K7 w - - 0 1 moves a1a2 a7a8 a2a1
    The old code suggested a loosing move "bestmove a8a7", the new code suggests "bestmove a8b7" leading to a draw.

    Incorrect evaluation (happened in a real game in TCEC Season 9).
    position fen 4rbkr/1q3pp1/b3pn2/7p/1pN5/1P1BBP1P/P1R2QP1/3R2K1 w - - 5 31 moves e3d4 h8h6 d4e3
    The old code evaluated it as "cp 0", the new code evaluation is around "cp -50" which is adequate.

Brings 0.5-1 ELO gain. Passes [-3.00,1.00].

STC: http://tests.stockfishchess.org/tests/view/584ece040ebc5903140c5aea
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 47744 W: 8537 L: 8461 D: 30746

LTC: http://tests.stockfishchess.org/tests/view/584f134d0ebc5903140c5b37
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 36775 W: 4739 L: 4639 D: 27397

Patch has been rewritten into current form for simplification and
logic slightly changed so that return a draw score if the position
repeats once earlier but after or at the root, or repeats twice
strictly before the root. In its original form, repetition at root
was not returned as an immediate draw.

After retestimng testing both version with SPRT[-3, 1], both passed
succesfully, but this version was chosen becuase more natural. There is
an argument about MultiPV in which an extended draw at root may be sensible.
See discussion here:

   #925

For documentation, current version passed both at STC and LTC:

STC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 51562 W: 9314 L: 9245 D: 33003

LTC
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 115663 W: 14904 L: 14906 D: 85853

bench: 5468995
@mcostalba
Copy link

Merged with 881a9df

@mcostalba mcostalba closed this Jan 1, 2017
@atumanian
Copy link

@mcostalba, I've created an issue where I've tried to explain my point with an example position: #948.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet