TCEC Season 8, game 22: Eval dropped From +26.13 to 0.00 #501

zamar · 2015-11-15T13:06:05Z

Stockfish lost track of the winning line.

Eval dropped from +26.13 to 0.00.

Potentially this is caused by the combination of the following:

3-fold repetitions
Lazy SMP
Transposition table

cuddlestmonkey · 2015-11-15T13:11:56Z

This PGN posted by amhijo does show something not quite right with the handling of the hash entries. I'll reproduce it below together with my description of what is going on.

[Event "?"] 
[Site "?"] 
[Date "????.??.??"] 
[Round "?"] 
[White "New game"] 
[Black "?"] 
[Result "*"] 
[PlyCount "137"] 

1. d4 Nf6 2. c4 e6 3. Nf3 b6 4. g3 Ba6 5. Nbd2 c5 6. e4 cxd4 7. e5 Ng4 8. h3 
Nh6 9. Bg2 Nc6 10. O-O Be7 11. Qa4 Bb7 12. Nxd4 Nxd4 13. Bxb7 Rb8 14. Be4 Qc7 
15. Qd1 Nhf5 16. Re1 Qxe5 17. Nb3 Rd8 18. Bf4 Qf6 19. Qd3 Bc5 20. Rad1 Nxb3 21. 
axb3 Nd4 22. Kg2 Nc6 23. h4 a5 24. Qe2 Qe7 25. Qh5 g6 26. Qf3 Nd4 27. Qc3 Qf6 
28. Bd5 Bb4 29. Qd3 O-O 30. Be5 Qf5 31. Qxd4 Bxe1 32. Rxe1 d6 33. Bf6 e5 34. 
Qxb6 Qxf6 35. Qxa5 Kh8 36. b4 g5 37. Rh1 gxh4 38. Rxh4 Qg6 39. Qa3 f5 40. Qf3 
Qg7 41. b5 Rb8 42. b4 Rf6 43. Rh5 Qg6 44. Qe2 f4 45. Be4 Qg7 46. Qf3 Rh6 47. 
Rxh6 Qxh6 48. Qe2 fxg3 49. fxg3 Qg5 50. c5 Rg8 51. Qe1 dxc5 52. bxc5 Rd8 53. b6 
Rd2+ 54. Kg1 Qd8 55. Qe3 Rb2 56. Bf3 Rb1+ 57. Kg2 Rb2+ 58. Kh3 Qf6 59. b7 Qe6+ 
60. g4 h5 61. c6 hxg4+ 62. Bxg4 Qd6 63. Bf5 Qf6 64. Kg4 Rg2+ 65. Kh3 Rc2 66. 
Be4 Rb2 67. Bc2 Rb4 68. Be4 Rb2 69. Bc2 *

Using latest master, single-threaded.

Select move 69 and engage infinite analysis. Immediately evaluated as 0.00 to depth 30+ since Black can repeat the position after move 67 with 69... Rb4. I don't think this is a true 3-fold yet though.

Now, the effect of that is to load the hash with a 0.00 eval for that position with a depth of 30+.

If I now click on the position after 63...Qf6, leaving the analysis running (so no hash clear), Bc2 is now discounted as a candidate move because of this deep 0.00 stored in the hash. Given enough time for Bc2 to be searched to a high enough depth (bearing in mind it's now way down the move order, so it will be reduced), it will recover, but that would take a long time.

So the 0.00 repetition-based eval is being stored in the hash and then being applied to a position where that move would not be a repetition.

Obviously this particular sequence is not an actual game - it's another case of something that interferes with analysis - but is it possible that something similar can happen wherein one thread analyses a line, marks a position (with a rep) as 0.00, and then another thread incorrectly applies that TT score to the non-repped position elsewhere in the tree?
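To make the mechanism described above concrete, here is a minimal Python sketch. It is a toy model, not Stockfish code: the position names `P`, `Q`, `L`, the score constants and the `search` function are all hypothetical. `P` stands in for the position after Bc2 (Black to move), `Q` for the position after ...Rb4, and `L` for a line that is simply won for White. The table is keyed by position only, and any earlier occurrence of a position on the path is scored as a draw.

```python
DRAW, WIN = 0, 100

# Toy position graph (hypothetical names). Scores are from the side to move.
MOVES = {
    "P": ["Q"],       # Black to move, e.g. after Bc2: the only try is ...Rb4
    "Q": ["P", "L"],  # White to move: shuffle back to P, or play the winning line
}
LEAF = {"L": -WIN}    # Black to move at L and lost


def search(pos, history, tt):
    """Negamax with 2-fold repetition scoring and a position-only hash key."""
    if pos in history:
        return DRAW              # any earlier occurrence counts as a draw
    if pos in LEAF:
        return LEAF[pos]
    if pos in tt:
        return tt[pos]           # the hit ignores how we got here
    path = history + (pos,)
    best = max(-search(child, path, tt) for child in MOVES[pos])
    tt[pos] = best
    return best


if __name__ == "__main__":
    tt = {}
    print(search("P", ("Q",), tt))  # Q already occurred in the game -> 0
    print(search("P", (), tt))      # same table, no Q in the history -> 0 (stale)
    print(search("P", (), {}))      # cleared table -> -100 (lost for Black)
```

The second call is the analogue of jumping back to move 64 without clearing the hash: the stored 0 was only valid for a path that already contained `Q`, but the table cannot tell the two paths apart.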

zamar · 2015-11-15T13:39:28Z

@cuddlestmonkey: This is a very old known problem: Bad interaction between Transposition table and 2-fold repetition. It can't be solved without slowing down the engine massively.

cuddlestmonkey · 2015-11-15T13:58:03Z

@zamar Maybe, but it's worth noting that I don't get the same issue using Komodo (another 2-fold rep engine). Komodo 9 evaluates move 69 as 0.00, but evidently isn't using that 0.00 to short-circuit the eval of Bc2 when I switch to move 64.

syzygy1 · 2015-11-15T15:00:20Z

I strongly suspect the graph-history interaction problem also to be the cause of the game 22 problems.

Although I have reproduced the problem with YBW Stockfish (assuming I am correct in thinking that the SF binary from abrok of Thu Oct 15 21:27:52, timestamp 1444969672 is still YBW!!), it would not surprise me if the problem occurs more often with lazy smp. This is because, as I understand, lazy smp lets some threads search deeper than other threads. This could result in one thread searching a position X deeper in the tree (with some key positions for the position X already flagged in history) with relatively high depth, resulting in that position being stored in hash with relatively high depth and a "too low" score (due to the flagged key positions being scored as draw when encountered below in the search of X). When X is then encountered closer to the root by another thread searching less deeply, that thread will accept the "too low" score even though that is wrong.

To reduce the bad effects of the graph-history interaction problem, it seems important to not let threads search at different depths in the endgame.
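The "deeper entry wins" effect described above can be sketched in a few lines of Python. This assumes a generic depth-preferred table shared between threads; the `SharedTable` class and its methods are hypothetical and do not reproduce Stockfish's actual replacement scheme or entry layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    depth: int
    score: int


class SharedTable:
    """Hypothetical depth-preferred table shared by all search threads."""

    def __init__(self) -> None:
        self._slots: Dict[str, Entry] = {}

    def save(self, key: str, depth: int, score: int) -> None:
        old = self._slots.get(key)
        # Keep the deeper result; a shallower one never overwrites it.
        if old is None or depth >= old.depth:
            self._slots[key] = Entry(depth, score)

    def probe(self, key: str, depth: int) -> Optional[int]:
        entry = self._slots.get(key)
        # A hit is usable only if it was searched at least as deep as needed.
        if entry is not None and entry.depth >= depth:
            return entry.score
        return None


if __name__ == "__main__":
    table = SharedTable()
    # A thread far from the root stores X with a repetition-tainted draw score.
    table.save("X", depth=30, score=0)
    # A thread closer to the root needs only depth 20 and accepts the entry.
    print(table.probe("X", depth=20))
```

Under these assumptions the shallower thread has no way to repair the entry either: its own depth-20 result for X would be rejected by `save` as long as the depth-30 entry is in the table.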

A paper from 1985 on this problem:
http://wiki.cs.pdx.edu/wurzburg2009/nfp/campbell-ghi.pdf

From the conclusion: "The key in avoiding most occurrences of GHI appears to be iterative deepening". If a position occurs multiple times in the search tree, it should be attempted to first search the occurrence of it that is closest to the root.

cuddlestmonkey · 2015-11-15T15:56:56Z

Another paper:
http://webdocs.cs.ualberta.ca/~mmueller/ps/aaai-ghi.pdf

"The Graph History Interaction (GHI) Problem occurs when the same game position behaves differently when reached via different paths. For example, after following one path a move m may be legal in position p, while after following another path the same move is illegal in p.
Our efficient solution to GHI was instrumental in developing the world's strongest tsume Go solver, and in solving checkers."

syzygy1 · 2015-11-15T17:25:27Z

Unfortunately that "general" solution is only general in a very specific sense. Basically it is of no value for a game-playing engine, but only for game solvers. See http://www.open-chess.org/viewtopic.php?p=17480#p17480

Before I place all the blame on lazy smp threads that search at different depths (even though the problem can be reproduced with YBW versions), the main trigger of the problem might be a particular combination of reductions and extensions that may result in, say, a position P being searched at depth N at a node further away from the root (with a larger position history) before that same position P is searched at depth N at a node closer to the root (with a smaller position history).

joergoster · 2015-11-15T19:35:58Z

Based on the linked talkchess thread (link given by Vince in the forum),
I just created a branch no_drawscore_to_tt, where I don't save a draw score into the transposition table. joergoster@bec4d09

So far, I was not able to get a draw score for the position at move 64.
Maybe this at least helps to lower the probability of it happening too frequently.
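The idea of the branch, as described here, can be sketched roughly as follows in Python. This is only an illustration of "don't save a draw score": the function name `save_unless_draw`, the dict-based table and the `DRAW` constant are hypothetical, and the actual change in the linked commit is not shown in this thread.

```python
DRAW = 0


def save_unless_draw(table, key, depth, score):
    """Store (depth, score) under key, but never store a draw score.

    Returns True if the entry was written, False if it was skipped.
    """
    if score == DRAW:
        return False  # a draw may be path-dependent (repetition), so skip it
    table[key] = (depth, score)
    return True


if __name__ == "__main__":
    table = {}
    save_unless_draw(table, "P", 30, DRAW)   # skipped
    save_unless_draw(table, "P", 12, 2613)   # stored
    print(table)
```

The trade-off in such a scheme is that genuinely drawn positions are no longer cached either, so they have to be searched again each time they are reached.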

cuddlestmonkey · 2015-11-15T20:31:46Z

@syzygy1 Shame. With regard to reductions and extensions, there was a position given by Uli that exhibited a similar strange "reset to zero" behaviour, and lowering the amount of reductions in that case removed the problem, so you may well be right.

joergoster · 2015-11-16T16:49:03Z

Just one example with my patch.

info depth 41 seldepth 122 multipv 1 score cp 3327 upperbound nodes 28564409809 nps 14653036 hashfull 999 tbhits 93687414 time 1949385 pv f5c2 b2b4
info depth 41 currmove f5c2 currmovenumber 1
info depth 41 seldepth 122 multipv 1 score cp 3737 lowerbound nodes 28612741641 nps 14650383 hashfull 999 tbhits 94030620 time 1953037 pv f5c2
info depth 41 currmove f5c2 currmovenumber 1
info depth 41 seldepth 122 multipv 1 score cp 4512 lowerbound nodes 30264961927 nps 14595990 hashfull 999 tbhits 104153884 time 2073512 pv f5c2
info depth 41 currmove f5c2 currmovenumber 1

It looks like as soon as the fail-low cycle begins, my patch breaks this and SF starts to fail-high again. I don't pretend my patch 'solves' anything, but it really seems to help.

The other thing to consider is the search instability. I think it would also help to open the aspiration window a bit faster, not allowing so many fail-lows in sequence.
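As a rough illustration of the aspiration window point, the Python sketch below counts how many re-searches are needed before a window centred on the previous score contains the new score. It is a toy: the real search is replaced by a comparison against a known `true_score`, and the function name, the starting `delta` of 18 and the growth factors are assumptions made for the example, not Stockfish's values.

```python
def count_researches(true_score, guess, delta, growth):
    """Count re-searches until the window around `guess` contains `true_score`.

    Each failed search (fail-low or fail-high) widens the half-width `delta`
    by the factor `growth` before the next attempt.
    """
    researches = 0
    while not (guess - delta < true_score < guess + delta):
        delta = max(delta + 1, int(delta * growth))
        researches += 1
    return researches


if __name__ == "__main__":
    # Previous iteration said +26.13 (2613 cp), the new score is 0.00.
    slow = count_researches(true_score=0, guess=2613, delta=18, growth=1.5)
    fast = count_researches(true_score=0, guess=2613, delta=18, growth=4.0)
    print(slow, fast)
```

With a larger growth factor the window reaches the new score in fewer re-searches, at the cost of searching with a wider (less selective) window sooner.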

lucasart · 2016-09-23T02:49:50Z

This kind of issue is not very useful. After almost one year, it still can't be reproduced. If it can't be reproduced, it can't be understood or fixed.

mcostalba · 2016-09-23T06:49:21Z

@lucasart I agree. Closing.

nmrugg mentioned this issue Jan 14, 2016

Stockfish gives score of 0 when position has many moves (caused by two-fold repetition) #566

Closed

Atahan-Turkoglu mentioned this issue Jan 18, 2016

Depth margin parameter-tweak in TT#save #575

Closed

pb00068 referenced this issue in ajithcj/Stockfish Jun 2, 2016

Restrict to 5 PLY instead of 6 PLY

ca07ec1

ajithcj referenced this issue in ajithcj/Stockfish Jun 10, 2016

insert only best pv into tt

fe695db

mcostalba closed this as completed Sep 23, 2016

niklasf pushed a commit to niklasf/Stockfish that referenced this issue Mar 16, 2018

Merge pull request official-stockfish#501 from ianfab/grid_passed3

24c24fe

Mostly skip passed pawn bonus for grid chess

Kingdefender mentioned this issue Apr 12, 2018

Stockfish loses Winning score in Rook endgame. #1544

Closed

vondele mentioned this issue Dec 15, 2019

Major blunder (bug?) in CCC11 semis game 33 #2451

Closed
