Introduce anti-suicide feature #2666

Closed

snicolet
Member

snicolet commented May 8, 2020

In some recent tournament games, Stockfish exhibited the following
self-destructing behaviour. Stockfish was suffering in a long shuffle
session, having a bad evaluation in a blocked or semi-blocked position
for about 40 moves and yet the eval was sort of flatlined, indicating
that the opponent engine (Leela) had trouble converting the position.
Then, not long before the 50-move draw rule would be reached,
the opponent would play its pieces to some strange places and SF would
push a pawn, thinking she would get a slightly "less worse" evaluation.
However, the slightly less worse evaluation would prove to be delusional
and the position with a sacrificed pawn crackable, so SF eventually lost
these games.

This issue was discussed in the following thread:
#2620

This commit is our best attempt to patch this issue, so that SF gets
more patient in worse positions and tries to play for 50 moves as much
as possible and not suicide. The implementation uses pure evaluation
methods rather than search, damping down the eval after 25 moves of
shuffling (damping factor is linear, starting from 1.0 after 25 shuffling
moves and reaching 0.04 after 50 moves of shuffling). This damping
puts the burden on the attacking player to prove that he can break
the fortress, as now the search will get more and more optimistic
for the defending player to be able to reach a draw by the 50-move rule.
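
As a rough illustration of the linear damping described above, here is a minimal sketch. It only uses the numbers quoted in this message (1.0 at 25 shuffling moves, 0.04 at 50). The function names, the constants and the choice to count in full moves are assumptions for the example and are not taken from the actual patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Illustrative only: these names and the move-based counter are assumptions,
// not the identifiers used in the actual patch.
constexpr int    DampStart = 25;    // shuffling moves before damping begins
constexpr int    DampEnd   = 50;    // 50-move rule horizon
constexpr double FactorMin = 0.04;  // factor reached at DampEnd

// Linear damping factor: 1.0 up to DampStart, falling to FactorMin at DampEnd.
double shuffle_damping(int shufflingMoves) {
    assert(shufflingMoves >= 0);
    int m = std::min(std::max(shufflingMoves, DampStart), DampEnd);
    double t = double(m - DampStart) / (DampEnd - DampStart);  // 0 at start, 1 at end
    return 1.0 + t * (FactorMin - 1.0);
}

// Scale a centipawn evaluation by the damping factor, rounding to nearest.
int damp_eval(int evalCp, int shufflingMoves) {
    return static_cast<int>(std::lround(evalCp * shuffle_damping(shufflingMoves)));
}
```

Under these assumptions an evaluation of -100 cp is left untouched for the first 25 shuffling moves and shrinks to -4 cp at 50, which is why the defending side sees less and less reason to change the structure as the draw gets closer.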

This solution seems to work as intended for the few cases extracted
from tournament losses, according to tests done by @vondele in the
following comments:
#2620 (comment)
snicolet/Stockfish@a66d3c0#commitcomment-38963042

In Fishtest, the best result we managed to get after extensive testing
was a double yellow with Elo-gaining bounds (this patch), maybe because
the problem is quite rare at the short time controls we use in our tests
compared to the longer time controls used in tournament games:

STC:
LLR: -2.97 (-2.94,2.94) {-0.50,1.50}
Total: 201928 W: 38274 L: 38174 D: 125480
Ptnml(0-2): 3452, 23520, 46844, 23772, 3376
https://tests.stockfishchess.org/tests/view/5eb281dd2326444a3b6d3499

LTC:
LLR: -2.94 (-2.94,2.94) {0.25,1.75}
Total: 90232 W: 11446 L: 11353 D: 67433
Ptnml(0-2): 631, 8421, 26967, 8418, 679
https://tests.stockfishchess.org/tests/view/5eb34a862326444a3b6d37ff

Bench: 4834675
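
For readers who want to relate the W/L/D totals above to an Elo difference, the following sketch applies the standard logistic Elo formula to a win/loss/draw record and checks that the pentanomial (Ptnml) counts, which are recorded per game pair, add up to half the number of games. The helper names are hypothetical, and this is not how Fishtest computes its SPRT statistics (the LLR and its bounds are not reproduced here).

```cpp
#include <cassert>
#include <cmath>

// Logistic Elo difference implied by a win/loss/draw record.
// Requires a score strictly between 0 and 1.
double elo_from_wld(long wins, long losses, long draws) {
    const double games = double(wins + losses + draws);
    assert(games > 0);
    const double score = (wins + 0.5 * draws) / games;
    assert(score > 0.0 && score < 1.0);
    return -400.0 * std::log10(1.0 / score - 1.0);
}

// Pentanomial counts are per game pair, so they should sum to half the games.
bool ptnml_consistent(const long (&ptnml)[5], long totalGames) {
    long pairs = 0;
    for (long n : ptnml) pairs += n;
    return 2 * pairs == totalGames;
}
```

Fed with the STC and LTC totals above, this gives a small positive value in both cases, well under half an Elo point, which fits the description of a double yellow.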

@vondele
Member

vondele commented May 9, 2020

@snicolet I've done a more extensive analysis of all FENs posted in the issue. For each FEN, I've done a rather deep MultiPV search (200s), as well as 200 short (1s) searches to get a distribution of bestmoves (on 250 threads, 80GB hash). Can you have a look at the results, to help judge whether the patch brings an improvement beyond the one FEN:

======= 8/p3kp2/Pp2p3/1n2PpP1/5P2/1Kp5/8/R7 b - - 68 143 =======

Deep MultiPV

master: 
info depth 70 seldepth 81 multipv 1 score cp -93 nodes 56024645224 nps 280121825 hashfull 798 tbhits 0 time 200001 pv e7d7 a1h1 d7e7 b3c4 b5c7 c4c3 c7a6 h1h7 e7e8 g5g6 f7g6 h7a7 a6
info depth 70 seldepth 88 multipv 2 score cp -481 upperbound nodes 56024645224 nps 280121825 hashfull 798 tbhits 0 time 200001 pv e7d8 b3b4
info depth 69 seldepth 83 multipv 3 score cp -315 nodes 56024645224 nps 280121825 hashfull 798 tbhits 0 time 200001 pv e7e8 b3b4 b5c7 b4c3 e8f8 c3d4 c7d5 a1c1 d5b4 c1c8 f8g7 c8a8 b
patch: 
info depth 67 seldepth 79 multipv 1 score cp -86 nodes 54046883492 nps 270231715 hashfull 784 tbhits 0 time 200002 pv e7d7 a1h1 d7e8 b3b4 b5c7 b4c3 c7a6 h1h8 e8e7 h8h7 e7e8 g5g6 f7
info depth 67 seldepth 82 multipv 2 score cp -374 nodes 54046883492 nps 270231715 hashfull 784 tbhits 0 time 200002 pv e7d8 b3c4 b5c7 c4c3 d8e7 c3c4 e7e8 a1h1 c7a6 h1h8 e8e7 h8a8 b
info depth 67 seldepth 34 multipv 3 score cp -519 nodes 54046883492 nps 270231715 hashfull 784 tbhits 0 time 200002 pv e7f8 b3b4

bestmove distribution

master: 
      6 bestmove b5c7 
      6 bestmove e7e8 
     12 bestmove e7d8 
    176 bestmove e7d7 
patch: 
      3 bestmove e7d8 
    197 bestmove e7d7 
=================================================================
======= 4B3/2k5/2nb4/1pr2p2/p4P2/P7/1P1Q4/1K6 b - - 45 98 =======

Deep MultiPV

master: 
info depth 51 seldepth 82 multipv 1 score cp -317 nodes 41177013961 nps 205883010 hashfull 884 tbhits 0 time 200002 pv c6a7 d2e3 a7c6 e3e6 c6b8 e6f7 c7b6 f7g6 b6c7 g6e6 d6f4 e8b5 c
info depth 51 seldepth 87 multipv 2 score cp -317 nodes 41177013961 nps 205883010 hashfull 884 tbhits 0 time 200002 pv c5c4 d2d5 c4c5 d5e6 c6b8 e6f7 c7b6 f7g6 b6c7 g6e6 d6f4 e8b5 c
info depth 51 seldepth 111 multipv 3 score cp -689 nodes 41177013961 nps 205883010 hashfull 884 tbhits 0 time 200002 pv b5b4 a3b4
patch: 
info depth 54 seldepth 76 multipv 1 score cp -595 nodes 43932798647 nps 219661796 hashfull 880 tbhits 0 time 200002 pv c5c4 d2d5 c4c5 d5e6 d6f4 e8c6 c5c6 e6f5 f4c1 f5b5 c1b2 b5a4 b
info depth 54 seldepth 94 multipv 2 score cp -595 nodes 43932798647 nps 219661796 hashfull 880 tbhits 0 time 200002 pv c6a7 d2e3 a7c6 e3e6 d6f4 e8c6 c5c6 e6f5 f4c1 f5b5 c1b2 b5a4 b
info depth 54 seldepth 26 multipv 3 score cp -677 nodes 43932798647 nps 219661796 hashfull 880 tbhits 0 time 200002 pv b5b4 a3b4

bestmove distribution

master: 
     12 bestmove c6a7 
    188 bestmove b5b4 
patch: 
      1 bestmove c5c4 
     11 bestmove c6a7 
    188 bestmove b5b4 
=================================================================
======= 3nq3/r2n2p1/1k2p1N1/1p1pP1NP/p1pP2Q1/P1P5/1P3RK1/8 b - - 64 87 =======

Deep MultiPV

master: 
info depth 37 seldepth 62 multipv 1 score cp -588 nodes 37489403706 nps 187446081 hashfull 927 tbhits 0 time 200001 pv b5b4 c3b4
info depth 36 seldepth 72 multipv 2 score cp -584 nodes 37489403706 nps 187446081 hashfull 927 tbhits 0 time 200001 pv b6a5 g4f3 e8g8 g5f7 a7c7 g2g3 a5a6 f3f4 d7f8 f7d6 f8h7 f4g4 c
info depth 36 seldepth 63 multipv 3 score cp -615 nodes 37489403706 nps 187446081 hashfull 927 tbhits 0 time 200001 pv a7b7 g4f4 b6a7 g2g3 e8g8 f4f3 a7a6 g5f7 b7c7 f3f4 d8b7 f4g4 g
patch: 
info depth 37 seldepth 75 multipv 1 score cp -519 nodes 36057894679 nps 180288571 hashfull 907 tbhits 0 time 200001 pv a7c7 g6f4 d7f8 g4f3 b6a6 f4d5 e6d5 f3f8 e8h5 f8d8 h5g4 g2h1 c
info depth 37 seldepth 70 multipv 2 score cp -580 nodes 36057894679 nps 180288571 hashfull 907 tbhits 0 time 200001 pv b6a5 g4f4 a7b7 g2g3 e8g8 g5f7 d7f6 e5f6 d8f7 f6g7 g8g7 f2e2 b
info depth 36 seldepth 66 multipv 3 score cp -571 nodes 36057894679 nps 180288571 hashfull 907 tbhits 0 time 200001 pv b5b4 c3b4 a7c7 g6f4 d7f8 f4e2 d8f7 f2f7 c7f7 g5f7 e8f7 e2c3 f

bestmove distribution

master: 
     12 bestmove a7c7 
     27 bestmove a7b7 
     52 bestmove b6a5 
    109 bestmove b6a6 
patch: 
      2 bestmove a7c7 
     21 bestmove a7b7 
     59 bestmove b6a5 
    118 bestmove b6a6 
=================================================================
======= 5r2/1k2b2p/2q1p3/Pp1bPrpB/2pP4/6QP/2RB1PP1/5RK1 w - - 2 33 =======

Deep MultiPV

master: 
info depth 54 seldepth 5 multipv 1 score cp 0 nodes 37124777499 nps 185622031 hashfull 992 tbhits 0 time 200002 pv h5g4 f5f7 g4h5 f7f5
info depth 54 seldepth 41 multipv 2 score cp 0 nodes 37124777499 nps 185622031 hashfull 992 tbhits 0 time 200002 pv c2b2 b7a8 h5e2 h7h6 e2g4 f5f4 d2f4 g5f4 g3c3 f8b8 f1b1 b5b4 b2b4
info depth 54 seldepth 81 multipv 3 score cp -54 upperbound nodes 37124777499 nps 185622031 hashfull 992 tbhits 0 time 200002 pv h5e2 f8b8
patch: 
info depth 52 seldepth 6 multipv 1 score cp 0 nodes 35934308481 nps 179669745 hashfull 985 tbhits 0 time 200002 pv h5g4 f5f7 g4h5 f7f5
info depth 52 seldepth 71 multipv 2 score cp 0 nodes 35934308481 nps 179669745 hashfull 985 tbhits 0 time 200002 pv h5e2 b7a8 c2b2 h7h6 e2g4 f5f4 d2f4 g5f4 g3c3 h6h5 g4h5 d5g2 f1b1
info depth 51 seldepth 51 multipv 3 score cp 0 nodes 35934308481 nps 179669745 hashfull 985 tbhits 0 time 200002 pv c2b2 b7a8 g1h2 h7h6 h5g4 f5f7 g4e2 c6b7 e2h5 f7f4 f1b1 f4f2 b2b5

bestmove distribution

master: 
      4 bestmove h5e2 
      7 bestmove c2b2 
    189 bestmove h5g4 
patch: 
      1 bestmove h5e2 
      3 bestmove c2b2 
    196 bestmove h5g4 
=================================================================
======= r1rb2k1/1bpnqn2/pp1p4/3Pp1p1/PP2PpPp/NQNB1P1P/5B2/R1R3K1 b - - 10 30 =======

Deep MultiPV

master: 
info depth 45 seldepth 48 multipv 1 score cp 0 nodes 36151336345 nps 180755777 hashfull 951 tbhits 0 time 200001 pv c8b8 b3d1 d7f6 a4a5 c7c5 d5c6 b7c6 a3c4 b6b5 c4b6 d8b6 a5b6 f6d7
info depth 45 seldepth 59 multipv 2 score cp -38 nodes 36151336345 nps 180755777 hashfull 951 tbhits 0 time 200001 pv d7f6 a4a5 c8b8 a5b6 c7b6 b4b5 f6d7 g1g2 a6a5 c3a4 g8h8 b3b2 d7
info depth 45 seldepth 50 multipv 3 score cp -47 upperbound nodes 36151336345 nps 180755777 hashfull 951 tbhits 0 time 200001 pv g8g7 g1g2
patch: 
info depth 39 seldepth 52 multipv 1 score cp -37 nodes 36574678525 nps 182870649 hashfull 963 tbhits 0 time 200003 pv c8b8 b3d1 b7c8 a4a5 d7f6 b4b5 f6d7 b5a6 c8a6 d3b5 a6b5 a3b5 d7
info depth 39 seldepth 61 multipv 2 score cp -54 nodes 36574678525 nps 182870649 hashfull 963 tbhits 0 time 200003 pv g8g7 g1g2 c8b8 b3d1 e7e8 d1e2 b7c8 a1b1 d7f6 a4a5 b6a5 b4a5 b8
info depth 39 seldepth 58 multipv 3 score cp -71 upperbound nodes 36574678525 nps 182870649 hashfull 963 tbhits 0 time 200003 pv d7f6 a4a5

bestmove distribution

master: 
      4 bestmove e7e8 
     18 bestmove d7f6 
     23 bestmove f7h8 
     56 bestmove g8g7 
     99 bestmove c8b8 
patch: 
      2 bestmove e7e8 
     14 bestmove d7f6 
     16 bestmove f7h8 
     63 bestmove g8g7 
    105 bestmove c8b8 
=================================================================
======= 5k2/1R6/4p1n1/4PpP1/3K4/8/8/8 b - - 10 163 =======

Deep MultiPV

master: 
info depth 98 seldepth 75 multipv 1 score mate -37 nodes 76466291027 nps 382329543 hashfull 155 tbhits 0 time 200001 pv f8g8 b7c7 g8f8 d4c5 g6e5 c5d6 e5f7 d6e6 f7g5 e6f6 g5e4 f6f5 
info depth 97 seldepth 63 multipv 2 score mate -31 nodes 76466291027 nps 382329543 hashfull 155 tbhits 0 time 200001 pv g6f4 d4c5 f4h3 g5g6 h3g5 g6g7 f8g8 c5d6 g8h7 b7b8 h7g7 d6e7 
info depth 97 seldepth 59 multipv 3 score mate -29 nodes 76466291027 nps 382329543 hashfull 155 tbhits 0 time 200001 pv g6h4 d4c5 h4f3 b7b8 f8g7 c5d6 f3g5 d6e7 f5f4 b8f8 g5f3 f8f7 
patch: 
info depth 93 seldepth 75 multipv 1 score mate -37 nodes 79586690291 nps 397931461 hashfull 176 tbhits 0 time 200001 pv f8g8 b7c7 g8f8 d4c5 g6e5 c5d6 e5f7 d6e6 f7g5 e6f6 g5e4 f6f5 
info depth 93 seldepth 63 multipv 2 score mate -31 nodes 79586690291 nps 397931461 hashfull 176 tbhits 0 time 200001 pv g6f4 d4c5 f4h3 g5g6 h3g5 g6g7 f8g8 c5d6 g8h7 d6e7 h7g7 b7b8 
info depth 92 seldepth 59 multipv 3 score mate -29 nodes 79586690291 nps 397931461 hashfull 176 tbhits 0 time 200001 pv g6h4 b7b8 f8g7 d4c5 h4f3 c5d6 f3g5 d6e7 f5f4 b8f8 g5f3 f8f7 

bestmove distribution

master: 
     30 bestmove g6h4 
    170 bestmove f8g8 
patch: 
     29 bestmove g6h4 
    171 bestmove f8g8 
=================================================================
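
The bestmove distributions above are simple counts over the final `bestmove` line of each short search. A minimal sketch of such a tally, assuming the engine output has already been collected as a list of text lines (the function name and the input format are assumptions, this is not the script that produced the numbers above):

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Count how often each move appears in UCI "bestmove <move> [ponder <move>]" lines.
// Any other line (for example "info ...") is ignored.
std::map<std::string, int> tally_bestmoves(const std::vector<std::string>& lines) {
    std::map<std::string, int> counts;
    for (const std::string& line : lines) {
        std::istringstream in(line);
        std::string keyword, move;
        if (in >> keyword >> move && keyword == "bestmove")
            ++counts[move];
    }
    return counts;
}
```

Running this over the 200 short searches for one FEN would give a per-move count that can be compared between master and patch.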

@NKONSTANTAKIS

I have always considered that the ability to solve blind spots has a special value, which is hard to measure as Elo gain. Locating, targeting and removing them one by one makes the engine more complete, which not only makes chess analysts and CC players happy, but is bound to scale well.

I leave it to others to judge if the degree of mitigation of the problematic subset this pull offers is worth it. As a principle I would suggest that any solution of a problematic subset (that also does not introduce another one) be treated as a bug-fix. A few lines of code and cpu cycles is a small price to pay for an eventually blunderfree & blindfree SF. The tricky part is measuring and assessing the amount of help they offer.

@vondele
Member

vondele commented May 13, 2020

@snicolet I'm reluctant to commit the patch in this form, as it adds code, failed Elo gainer tests, and shows clear benefit only on this one specific fen (AFAIK).

However, a variant of this patch (just the initiative term) actually is a simplification with respect to master, and is very simple overall: vondele/Stockfish@66ed8b6...537d51d

It tested nicely:
passed STC
LLR: 2.94 (-2.94,2.94) {-1.50,0.50}
Total: 50168 W: 9508 L: 9392 D: 31268
Ptnml(0-2): 818, 5873, 11616, 5929, 848
https://tests.stockfishchess.org/tests/view/5ebb07287dd5693aad4e680b

passed LTC
LLR: 2.93 (-2.94,2.94) {-1.50,0.50}
Total: 7520 W: 981 L: 870 D: 5669
Ptnml(0-2): 49, 647, 2256, 760, 48
https://tests.stockfishchess.org/tests/view/5ebbff747dd5693aad4e6858

and it performs essentially equally well on the test FEN 8/p3kp2/Pp2p3/1n2PpP1/5P2/1Kp5/8/R7 b - - 68 143

master: 
      6 bestmove b5c7 
      6 bestmove e7e8 
     12 bestmove e7d8 
    176 bestmove e7d7 
patch:
      3 bestmove b5c7 
      5 bestmove e7d8 
    192 bestmove e7d7 

I propose we merge that one instead. Agree?

@vondele added the "to be merged" (Will be merged shortly) label and removed the "discussion needed" label on May 14, 2020
@Vizvezdenec
Contributor

I honestly dislike scale factor as a concept; it basically says "we are failing to evaluate these endgames properly, let's multiply their eval by something".
The more scale factor goes into initiative the better it is, imho :)

@vondele closed this in cca6436 on May 14, 2020
@vondele
Member

vondele commented May 14, 2020

Thanks!
