Misevaluated endgame patterns #2288

Closed
1 of 8 tasks
Alayan-stk-2 opened this issue Aug 31, 2019 · 46 comments
Comments

@Alayan-stk-2

Alayan-stk-2 commented Aug 31, 2019

Endgame positions are used from far away during search and can have a big impact on finding the good moves in the middlegame.

So, just as having TBs gives a small strength boost, additional endgame knowledge should result in a strength increase.

We currently have only 0.00 as a value for draws, instead of a range within which the eval would sit for theoretical draws. Such a range would still allow preferring the "good side" of a forced draw (and playing more challenging moves even in a known forced draw). This is a limitation, but it doesn't make additional draw/win pattern detection useless.

I think it is much more practical to maintain a comprehensive list here than over at fishcooking.

  • 1st example - KRPKBPP

The bishop protected by a pawn and blocking an enemy pawn

For example, the following position is dead drawn but has a static eval around -1.8, and searching to depth 50 or 60 with 6-men TB doesn't fix the blindness, as the engine just shuffles (Leela correctly sees it as dead drawn):
8/8/5kBp/4rP1P/8/3K4/8/8 w - -
This 6-men position is even worse when it comes to static blindness (the resulting KPK endgame is drawn) :
8/8/5kBp/2r4P/8/3K4/8/8 w - - 0 1

However, there are positions where the resulting KPK endgame is won for the side with the rook, e.g. 8/8/8/6k1/6Bp/4K2P/7r/8 w - - 0 1, so this needs some care to detect properly.

  • 2nd example KBPPPKNPP

Update: Viz's initiative patch makes things better here.

All pawns on the same side facing each other. The 2v1 and 3v2 pawn setups on the same side are usually very drawish.

In this dead drawn position : 6k1/8/6p1/2n2p2/7P/2B2PP1/5K2/8 b - - 1 69

Redfish, running on monster hardware and with 6-men TB, still evaluates the position over +1

  • 3rd example KRPPKRP

There are many problematic patterns in these endgames.

Here is one that is significantly overevaluated, with one side having a "passed pawn" that can in practice never be pushed : 8/2k5/4R3/Pp6/1P6/6r1/1K6/8 b - - 0 1

Latest fish (without TBs) gives me +1.17 for white there at depth 65.

This pattern of a passed pawn supported by a blocked pawn and which can never be pushed occurs in a number of other misevaluated endgames.

  • 4th example R vs BNN imbalance with no pawn for the minor side

KNN is drawn and KNNKP is often drawn too, so this makes trading off the rook for the bishop a very potent threat for the weak side. KBNNKR is drawn in 80% of the positions in syzygy tables. Sadly, I can't check all the cases that are winning, but the king being in the corner and vulnerable, or the rook being quickly capturable, seem to be the most relevant cases.

Example drawn position : k7/2BK4/3N4/1N6/8/8/7r/8 w - -

TB Draw, SF depth 63/104 says +3.24 still.

  • 5th example KNNKP

Fixed with #2553

There is no check whatsoever on the pawn/king combined positions, which means that without TB, SF frequently has extremely inflated evals.

Example position : 8/3k4/8/3N4/8/6K1/1p6/1N6 b - - 4 6

+7 for the side with the knights (!) with seldepth 100 (it stops at the 50mr barrier)

  • 6th example KBPPKRPP

All pawns on the same side touching the edge, the edge pawn being ahead for the bishop side, and both pawns on the same color as the bishop. Draw exploiting a fortress.

Position : 8/6pk/4Kb1p/7P/6P1/2R5/8/8 w - - 0 1

SF evaluates this as +2.69 at depth 100 with 5-men TB on.

Note that a similar position having 3 pawns for each side is completely winning, so a fix for the previous fortress should not damage the evaluation of that sort of position in the process.

Position : 5k2/5p2/3Kb1p1/7p/7P/2R3P1/5P2/8 w - -

SF's static eval evaluates this as winning for white, and a deep search confirms this is right.

  • 7th example KRPPKR

Often winning, but can sometimes be drawn with pawns on both flanks.

Position : 8/6k1/p3R3/P7/7P/5r2/2K5/8 w - -

Similar positions arise with 7 or 8 pieces on the board

  • 8th example KRNKRNP

A case of "being up a passed pawn is not enough". The weak side needs accurate play to hold.

Position : 8/8/5NK1/2r2nP1/1k6/8/4R3/8 w - -

  • 9th example SCB endgames with no passer and weak pawns

8/4k3/4p3/3pPp2/1b1P1P2/4K1B1/8/8 w - -

Black is winning because there are two weak blocked pawns on the same color as the bishops. If there were only one, it would be a draw. SF's static eval absolutely sucks at telling them apart, and while it can find the correct continuations once the position is on the board, it won't be able to guide search from far away towards such a position.

@gonzalezjo

gonzalezjo commented Sep 3, 2019

According to syzygy-tables.info, only 28.1% of KBPPvKRP positions are draws. What kind of rule(s?) would you use to tell if such a position is drawing?

@Alayan-stk-2
Author

Looking at the overall win/draw rates for a material setup is not very informative unless this setup is close to 100% won or drawn, especially as syzygy bases contain a lot of weird positions which are extremely unlikely to happen in a game.

The result of that kind of endgame depends on how advanced the pawns are, the files they are in, how pieces can defend each other...

But while we can't catch all cases, there are some simple heuristics which can improve things significantly.

@ddugovic

ddugovic commented Sep 4, 2019

While 8/8/5kBp/4rP1P/8/3K4/8/8 w - - 0 1 takes a long time to resolve to evaluation 0.00, 8/8/5kBp/4rP1P/8/3K4/8/8 w - - 80 1 resolves much faster (because only 20 plies remain until the 50-move rule takes effect).

Perhaps some tool could identify endgame positions (maybe even in a tablebase) where pawn breakthroughs are impossible and Stockfish yields different evaluations depending on the halfmove counter.

Separately... perhaps in an endgame if pawn breakthroughs are impossible and the capture history indicates that captures are bad, something could be done to the endgame scaling factor (EDIT: or #2298 is an even better idea, and it even gains Elo).
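The comparison described above needs the same position with different halfmove counters. A minimal sketch of that step, assuming standard six-field FEN strings; the helper names `with_halfmove_clock` and `plies_until_fifty_move_rule` are hypothetical, and actually running an engine on each variant is left out:

```python
def with_halfmove_clock(fen: str, clock: int) -> str:
    """Return `fen` with its halfmove-clock field set to `clock`.

    A full FEN has six space-separated fields; the fifth is the halfmove
    clock and the sixth the fullmove number. Four-field FENs (as used in
    parts of this thread) get a default fullmove number of 1.
    """
    if not 0 <= clock <= 100:
        raise ValueError("halfmove clock must be in 0..100")
    fields = fen.split()
    if len(fields) < 4:
        raise ValueError("FEN needs at least 4 fields")
    board, side, castling, en_passant = fields[:4]
    fullmove = fields[5] if len(fields) >= 6 else "1"
    return " ".join([board, side, castling, en_passant, str(clock), fullmove])


def plies_until_fifty_move_rule(fen: str) -> int:
    """Number of plies left before the 50-move rule can be claimed."""
    fields = fen.split()
    clock = int(fields[4]) if len(fields) >= 5 else 0
    return max(0, 100 - clock)


# Build a few variants of the first example position.
base = "8/8/5kBp/4rP1P/8/3K4/8/8 w - -"
variants = [with_halfmove_clock(base, c) for c in (0, 40, 80)]
```

Each variant could then be searched to a fixed depth; positions whose score moves strongly towards 0.00 as the counter grows would be candidates for the kind of blindness discussed here.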

@Vizvezdenec
Contributor

Well, at least some of these patterns are partially hit by my latest patch.
Not every single one of them, and not very strongly, but it's something :)

@miguel-l
Contributor

I'm not really sure how to test these kinds of patches, but it seems even in relatively simple rook endgames, there can be some improvements found: http://tests.stockfishchess.org/tests/view/5d8333840ebc5971531d3abb

But alas, LTC failed:
http://tests.stockfishchess.org/tests/view/5d8348ad0ebc5971531d3bb6

@Alayan-stk-2
Author

I hoped the LTC would pass, but yeah it didn't... I made a simpler change for NNB vs R(Ps) endgames, which didn't make enough of a difference to pass at fishtest.

These endgame knowledge improvements are hard to pass through fishtest, as you'd need a big chunk together to make enough difference. I also suspect that the effect of those in middlegame move selection would be more important at longer TC (ultra-bullet doesn't get enough depth for this), but this is too impractical to test.

How much adding these additional conditionals hurts nps for those positions isn't all that clear to me either.

So, how to proceed... ? I don't know yet, but I'll keep adding patterns to the list as I see them.

@kelvinwop

8/5p2/8/2b2p2/2N2k2/5p2/8/5K2 b - - 5 51
here's another one

@kelvinwop

the drawn position arose from 8/5p2/8/5p2/Nb2pk2/5P2/5K2/8 b - - 0 48, and the winning move was only found when infinite analysis was turned on... otherwise it goes for what it thinks is a +4.1 but is actually just a drawn endgame

@MichaelB7
Contributor

MichaelB7 commented Sep 21, 2019

@kelvinwop, I do not see what you see. Even with just one thread, Ba3 wins and is found in 3 seconds on my machine (one core, 64M hash). With EGTB it's found even faster.

Also, you state the "...winning move was only found ..." and later say "...is actually just a drawn endgame ..", which sounds contradictory to me. The position has 10 moves that win easily; did you post an incorrect FEN perhaps?

 46	+24.34!	214.4M	1:20.51	Ba3! 
 46	+23.84!	212.7M	1:19.90	Ba3! 
 46	+23.46!	210.9M	1:19.19	Ba3! 
 46	+23.16!	209.5M	1:18.68	Ba3! 
 46	+22.95!	207.8M	1:18.04	Ba3! 
 46	+22.79!	205.7M	1:17.25	Ba3! 
 46	+22.69!	202.9M	1:16.15	Ba3! 
 45	+22.58 	202.0M	1:15.80	Ba3 Ke2 Kg3 Nc3 Bb4 Nd5 Bc5 Nf6 Kg2 f4 Kg3 Nd5 Kg4 Kd2 Kf3 Kd1 Bd6 Nc3 Kxf4 Kc1 Kf3 Kc2 e3 Kd3 Be5 Ne2 f4 Nc1 Kg2 Nb3 
 45	+46.53!	202.0M	1:15.80	Ba3! 
 45	+41.15!	201.9M	1:15.77	Ba3! 
 45	+36.87!	201.7M	1:15.68	Ba3! 
 45	+33.46!	201.4M	1:15.57	Ba3! 
 45	+30.75!	200.9M	1:15.41	Ba3! 
 45	+28.60!	199.8M	1:15.00	Ba3! 
 45	+26.90!	196.1M	1:13.65	Ba3! 
 45	+25.55!	187.1M	1:10.22	Ba3! 
 45	+24.49!	178.0M	1:06.77	Ba3! 
 45	+23.66!	175.3M	1:05.84	Ba3! 
 45	+23.01!	174.3M	1:05.48	Ba3! 
 45	+22.51!	172.2M	1:04.70	Ba3! 
 45	+22.12!	169.6M	1:03.76	Ba3! 
 45	+21.83!	165.7M	1:02.38	Ba3! 
 45	+21.61!	160.0M	1:00.26	Ba3! 
 45	+21.46!	151.9M	0:57.13	Ba3! 
 45	+21.35!	143.0M	0:53.70	Ba3! 
 44	+21.24 	135.6M	0:50.89	Ba3 Ke2 Kg3 Nc3 Bb4 Nd5 Bc5 Nf6 Kg2 f4 Kg3 Nd5 Kg4 Kd2 Kf3 Kd1 Bd6 Nc3 Kxf4 Kc1 Kf3 Kc2 Bc5 Na2 e3 Nc1 e2 Nd3 Be3 Ne1+ Ke4 Kb2 f4 Nc2 Bd4+ Ka3 f3 Ka2 f2 Kb3 f1=Q 
 44	+24.28!	134.6M	0:50.51	Ba3! 
 44	+22.57!	131.1M	0:49.23	Ba3! 
 44	+21.23!	122.4M	0:45.99	Ba3! 
 44	+20.16!	113.1M	0:42.48	Ba3! 
 44	+19.33!	108.2M	0:40.72	Ba3! 
 44	+18.69!	106.2M	0:39.96	Ba3! 
 44	+18.18!	105.4M	0:39.65	Ba3! 
 44	+17.80!	104.4M	0:39.27	Ba3! 
 44	+17.51!	102.9M	0:38.72	Ba3! 
 44	+17.29!	99.6M  	0:37.42	Ba3! 
 44	+17.14!	93.7M  	0:35.15	Ba3! 
 44	+17.03!	86.2M  	0:32.31	Ba3! 
 43	+16.92 	81.4M  	0:30.48	Ba3 Ke2 Kg3 Nc3 Bb4 Nd5 Bc5 Nc7 Kg2 f4 Kg3 Na6 Bd4 Nb4 Kxf4 Nc6 Bc5 Nd8 Ke5 Kf1 f4 Kg2 f3+ Kg3 f5 Kh3 e3 Nc6+ Kf4 Nd8 f2 Kh4 Ke5 Nc6+ Ke4 Nd8 e2 Nc6 f4 
 43	+18.19!	80.5M  	0:30.12	Ba3! 
 43	+16.49!	76.3M  	0:28.56	Ba3! 
 43	+15.14!	72.1M  	0:26.89	Ba3! 
 43	+14.08!	66.4M  	0:24.64	Ba3! 
 43	+13.25!	62.2M  	0:23.05	Ba3! 
 43	+12.60!	59.5M  	0:22.06	Ba3! 
 43	+12.10!	57.3M  	0:21.29	Ba3! 
 43	+11.71!	55.7M  	0:20.73	Ba3! 
 43	+11.42!	53.9M  	0:20.09	Ba3! 
 43	+11.21!	52.0M  	0:19.36	Ba3! 
 43	+11.05!	49.1M  	0:18.29	Ba3! 
 43	+10.94!	46.0M  	0:17.12	Ba3! 
 42	+10.84 	44.3M  	0:16.47	Ba3 Ke2 Kg3 Nc3 Bb4 Nd5 Bc5 Nc7 Kg2 f4 Kg3 Nd5 Kg4 Kd2 Kf3 Kd1 Bd6 Nc3 Kxf4 Kc2 Kf3 Nb5 Bc5 Kc3 e3 Kc4 e2 Kxc5 e1=Q Nc7 Qe5+ Kc6 Qd4 Kb7 Kf4 Na6 Qc4 Ka7 Qe6 
 42	+12.52!	42.1M  	0:15.60	Ba3! 
 42	+11.46!	39.1M  	0:14.50	Ba3! 
 42	+10.63!	35.8M  	0:13.35	Ba3! 
 42	+9.98!	32.8M  	0:12.31	Ba3! 
 42	+9.48!	29.6M  	0:11.16	Ba3! 
 42	+9.09!	27.0M  	0:10.27	Ba3! 
 42	+8.80!	25.5M  	0:09.74	Ba3! 
 42	+8.59!	24.0M  	0:09.17	Ba3! 
 42	+8.43!	22.1M  	0:08.49	Ba3! 
 42	+8.32!	20.4M  	0:07.82	Ba3! 
 41	+8.22 	19.3M  	0:07.38	Ba3 Ke2 Kg3 Nc3 Bb4 Nd5 Bc5 fxe4 fxe4 Nf6 Kf4 Nd7 Bd4 Nb8 f5 Nc6 Bc5 Na5 Ke5 Nb3 Bb4 Nc1 f4 Kf1 f3 Kf2 Kf4 Kf1 Kg4 Nb3 Ba3 Kf2 Kf4 Kf1 e3 
 41	+8.18!	17.4M  	0:06.70	Ba3! 
 41	+7.35!	15.1M  	0:05.83	Ba3! 
 41	+6.70!	13.3M  	0:05.13	Ba3! 
 41	+6.20!	12.4M  	0:04.81	Ba3! 
 41	+5.82!	11.7M  	0:04.56	Ba3! 
 41	+5.53!	11.4M  	0:04.44	Ba3! 
 41	+5.31!	10.9M  	0:04.28	Ba3! 
 41	+5.15!	10.7M  	0:04.20	Ba3! 
 41	+5.05!	10.4M  	0:04.07	Ba3! 
 40	+4.94 	10.0M  	0:03.94	Ba3 Nc3 Bc5+ Ke2 exf3+ Kf1 Ke5 Nb5 Kd5 Nc3+ Kd4 Nd1 Ba7 Ke1 Kc4 Nb2+ Kc3 Nd1+ Kb3 Kd2 Kc4 Nb2+ Kd5 Nd3 Bd4 Ke1 Kc4 Nf4 Bc5 Nh5 Kd4 Nf6 Ke5 Nd7+ Kd6 Nxc5 Kxc5 Kf2 Kd5 Kxf3 Ke5 Ke2 f4 Kf2 
 40	+5.35!	8.98M  	0:03.54	Ba3! 
 40	+4.97!	8.63M  	0:03.41	Ba3! 
 40	+4.68!	8.45M  	0:03.34	Ba3! 
 40	+4.46!	8.31M  	0:03.29	Ba3! 
 40	+4.30!	8.22M  	0:03.26	Ba3! 
 40	+4.20!	8.13M  	0:03.22	Ba3! 
 39	+4.09 	6.93M  	0:02.77	Ba3 Ke2 Kg3 Nc3 exf3+ Kf1 Bc5 Nd1 Bd4 Nf2 Kf4 Nd3+ Ke4 Nb4 Bc3 Nc6 Bf6 Na5 Be5 Nc4 Bg3 Nb6 Kd3 Nd7 Ke3 Nf6 Be5 Nd5+ Kd4 Ne7 Ke4 Nc8 Bf4 Nb6 Bc1 Nc4 Kd5 Nb6+ Kd4 Kf2 Ke4 Kf1 f4 Nc4 Kd5 Na5 Be3 Ke1 f6 Kf1 Kd4 Nc6+ Ke4 
 38	+4.09 	6.48M  	0:02.59	Ba3 Ke2 Kg3 Nc3 exf3+ Kf1 Bc5 Nd1 Bd4 Nf2 Kf4 Nd3+ Ke4 Nb4 Bc3 Nc6 Bf6 Na5 Be5 Nc4 Bg3 Nb6 Kd3 Nd7 Ke3 Nf6 Be5 Nd5+ Ke4 Nb6 Kd3 Nc8 Bg3 Nb6 Bd6 Na4 Ke3 Nb2 Bb4 Na4 Kf4 Nb2 Ke4 Na4 Bd2 Nb2 Bf4 Nc4 Kd4 Na5 Be3 Ke1 Kd5 Nb3 Ke4 Na 
 37	+4.09 	6.12M  	0:02.45	Ba3 Ke2 Kg3 Nc3 exf3+ Kf1 Bc5 Nd1 Bd4 Nf2 Kf4 Nd3+ Ke4 Nb4 Bc3 Nc6 Bf6 Na5 Be5 Nc4 Bg3 Nb6 Kd3 Nd7 Ke3 Nf6 Be5 Nd5+ Ke4 Nb6 Kd3 Nc8 Ke3 Ne7 Ke4 Nc8 Bc7 Kf2 Kf4 Kf1 f6 Ne7 Ke4 Nc8 
 36	+4.30 	2.65M  	0:01.06	Ba3 Ke2 exf3+ Kf1 Bc1 Nb6 Ke4 Nc4 Kd5 Nb6+ Kd4 Kf2 Ke4 Kf1 Bf4 Nc8 Bc7 Kf2 Be5 Kf1 Bf4 Nb6 Kd3 Nd7 Bg3 Nb6 Be5 Nc8 f4 Kf2 Ke4 Kf1 f5 Nb6 f2 
 35	+3.71 	1.69M  	0:00.69	exf3 Nb2 Bc5+ Kf1 Ke4 Nc4 Bb4 Kf2 f6 Kf1 Kd3 Nb2+ Ke3 Nc4+ Ke4 Nb6 Bc5 Nc4 Kd3 Nd2 f2 Nb3 Be3 Na5 Kd4 Nc6+ Ke4 Nd8 Kd5 Nf7 Bc5 Nh8 f4 Nf7 Ke6 Nh8 Bb6 Ng6 Kf5 Ne7+ Ke5 Ng6+ Ke4 Ne7 Bd4 Ng8 f3 
 34	+3.79 	1.24M  	0:00.51	exf3 Nb2 Bc5+ Kf1 Ke4 Nc4 Bb4 Kf2 f6 Kf1 f4 Kf2 Bc5+ Kf1 Be3 Nd6+ Kd5 Nb5 Bd4 Nc7+ Ke4 Nb5 Ke3 Nd6 Kd3 Nb5 Ke4 Nd6+ Kd5 Nb5 Be3 Nc3+ Ke5 Nb5 Bb6 Na3 Kd5 Nb5 Kc4 Nd6+ Kd3 Ne8 Bd4 Nc7 Ke3 Nb5 f2 Nd6 Kd3 
 33	+3.79 	1.12M  	0:00.47	exf3 Nb2 Bc5+ Kf1 Ke4 Nc4 Bb4 Kf2 f6 Kf1 f4 Kf2 Bc5+ Kf1 Be3 Nd6+ Kd5 Nb5 Bd4 Nc7+ Ke4 Nb5 Ke3 Nd6 Kd3 Nb5 Ke4 Nd6+ Kd5 Nb5 Be3 Nc3+ Ke5 Nb5 Bc5 Nc7 f5 Nb5 Ke4 Nc7 f2 Ne6 Bb6 Ng5+ Ke3 
 32	+3.83 	946040	0:00.39	exf3 Nb2 Bc5+ Kf1 Ke4 Nc4 Bb4 Kf2 f6 Kf1 f4 Kf2 Bc5+ Kf1 Be3 Nd6+ Kd5 Nb5 Bd4 Nc7+ Ke4 Nb5 Ke3 Nd6 Kd3 Nb5 Ke4 Nd6+ Kd5 Nb5 Be3 Na3 Bc5 Nb5 f5 Nc7+ Ke5 Nb5 Ke4 Nc7 f2 Ne6 Bb6 Ng5+ Ke3 Nf7 Bc5 
 31	+3.81 	860127	0:00.36	exf3 Nb2 Bc5+ Kf1 Ke4 Nc4 Bb4 Kf2 f6 Kf1 f4 Kf2 Bc5+ Kf1 Be3 Nd6+ Kd5 Nb5 Bd4 Nc7+ Ke4 Nb5 Ke3 Nd6 Kd3 Nb5 Ke4 Nd6+ Kd5 Nb5 Be3 Na3 Bc5 Nb5 f5 Nc7+ Ke4 Ne6 Be7 Nc7 Bd6 Nb5 Bc5 Nc7 f2 Ne8 Ke5 
 30	+3.83 	674645	0:00.28	exf3 Kf1 Be7 Nb6 Bc5 Nc4 Bb4 Kg1 Ke4 Kf2 f6 Kf1 f4 Kf2 Bc5+ Kf1 Be3 Nd6+ Kd5 Nb5 Bd4 Nc7+ Ke4 Nb5 Ke3 Nd6 Be5 Nf7 Ke4 Kf2 Bd4+ Kf1 Kd5 Nh6 Ke5 Nf7+ Ke6 Nh6 Kd5 Ke1 Ke4 Kf1 Kd3 Nf7 Be3 Nh6 Kd4 Ng4 f5 Nh6 Ke4 
 29	+3.83 	615375	0:00.26	exf3 Kf1 Be7 Nb6 Bc5 Nc4 Bb4 Kg1 Ke4 Kf2 f6 Kf1 f4 Kf2 Bc5+ Kf1 Be3 Nd6+ Kd5 Nb5 Bd4 Nc7+ Ke4 Nb5 Ke3 Nd6 Be5 Nf7 Ke4 Kf2 Bd4+ Kf1 Bc5 Nh6 Be3 Nf7 Kd3 Nh6 Kd4 Ng4 f5 Nh6 Ke4 
 28	+3.83 	543843	0:00.23	exf3 Kf1 Be7 Nb6 Bc5 Nc4 Bb4 Kg1 Ke4 Kf2 f6 Kf1 f4 Kf2 Bc5+ Kf1 Be3 Nd6+ Kd5 Nb5 Bd4 Nc7+ Ke4 Nb5 Ke3 Nd6 Be5 Nf7 Ke4 Kf2 Bd4+ Kf1 Bc5 Nh6 Be3 Nf7 Kd3 Nh6 Kd4 Ng4 f5 Nh6 Ke4 
 27	+3.79 	479116	0:00.21	exf3 Kf1 Be7 Nb6 Bc5 Nc4 Bb4 Kg1 Ke4 Kf2 f6 Kf1 f4 Kf2 Bc5+ Kf1 Be3 Nd6+ Kd5 Nb5 Bd4 Na3 Bc5 Nb5 Ke5 Nc7 Ke4 Nb5 Bd4 
 26	+4.01 	394949	0:00.17	exf3 Kf1 Bd2 Nb6 Ke4 Nc4 Bb4 Kf2 Bc5+ Kf1 f6 Nd2+ Ke3 Nc4+ Kd3 Nd2 f2 Nb3 Be3 Na5 Kd4 Nb7 Kd5 Nd8 Bb6 Nf7 Ke6 Nh8 f4 Ng6 Be3 Nxf4+ Bxf4 Kxf2 Kf5 Kf3 Be5 
 25	+3.87 	381930	0:00.17	exf3 Kf1 Bd2 Nb6 Ke4 Nc4 Bb4 Kf2 Bc5+ Kf1 f6 Nd2+ Ke3 Nc4+ Kd3 Nd2 f2 Nb3 Be3 Na5 Kd4 Nb7 Kd5 Nd8 Bb6 Nf7 Ke6 Nh8 f4 Ng6 Be3 Nxf4+ Bxf4 Kxf2 Kf5 Kf3 Be5 Kf2 Bd4+ Kf3 
 24	+3.99 	332388	0:00.15	exf3 Kf1 Bd2 Nb6 Ke4 Nc4 Bb4 Kf2 Bc5+ Kf1 f6 Nd2+ Ke3 Nc4+ Kd3 Nd2 f2 Nb3 Be3 Na5 Kd4 Nb7 Kd5 Nd8 Bb6 Nf7 Ke6 Nh8 f4 Ke2 f3+ Kf1 Kd5 Nf7 
 23	+3.99 	306971	0:00.14	exf3 Kf1 Bd2 Nb6 Ke4 Nc4 Bb4 Kf2 Bc5+ Kf1 f6 Nd2+ Ke3 Nc4+ Kd3 Nd2 f2 Nb3 Be3 Na5 Kd4 Nb7 Kd5 Nd8 Bb6 Nf7 Ke6 Nh8 f4 Ke2 
 22	+4.05 	238179	0:00.11	exf3 Kf1 Ke4 Kf2 f4 Kf1 Bd2 Nc5+ Kd5 Nd7 Be3 Nf6+ Ke5 Ng4+ Ke4 Nh6 f6 Nf7 Kd5 Nh6 Bd4 Ke1 Ke4 Kf1 f5 Nf7 Bc5 Ng5+ Ke3 
 21	+4.21 	225005	0:00.10	exf3 Kf1 Ke4 Nb2 Bc5 Nc4 f6 Nd2+ Ke3 Nc4+ Kd3 Nd2 f2 Nb3 Be3 Na5 f4 Nb7 Bd4 Nd6 Bc5 Ne8 f5 Ng7 Ke4 Ne6 Bb6 
 20	+3.88 	210798	0:00.10	exf3 Kf1 Ke4 Kf2 f4 Kf1 Bd2 Nc5+ Kd5 Na4 Be3 Nc3+ Ke5 Nb5 Bc5 Nc3 Bd4 Nb5 Be3 Na3 Ke4 Nb5 f6 Nd6+ Kd4 Nb5+ Ke5 Nc3 Bc5 Nb5 Bd4 Na3 Be3 Nc4+ Kd5 
 19	+3.97 	178588	0:00.08	exf3 Kf1 Ke4 Kf2 f4 Kf1 Bd2 Nc5+ Kd5 Na4 Be3 Nc3+ Ke5 Nb5 Bc5 Nc3 Bd4 Nb5 Be3 Na3 Ke4 Nb5 f6 Nd6+ Kd4 Nb5+ Kd3 
 18	+4.00 	161140	0:00.08	exf3 Kf1 Ke4 Kf2 f4 Kf1 Bd2 Nb6 Be3 Nc4 f6 Nd6+ Kd4 Nb5+ Ke5 Nc3 Bb6 Nb5 f2 
 17	+4.01 	118666	0:00.06	exf3 Nb2 Bc5+ Kf1 Ke4 Nc4 f6 Nd2+ Ke3 Nc4+ Kd3 Nb2+ Ke4 Nc4 f4 Nd2+ Ke3 Nc4+ Kd3 Nb2+ Ke4 Nd1 Kd5 Ke1 Ke5 Kf1 Bd4 Ke1 Ke4 Nf2+ Bxf2+ Kxf2 f5 Kg1 Ke3 
 16	+4.01 	101217	0:00.05	exf3 Nb2 Ba3 Nc4 Bc5+ Kf1 Bb4 Kf2 Ke4 Kf1 f6 Nb2 Bc5 Nc4 f4 Nd2+ Ke3 Nc4+ Kd3 Nb2+ Kd2 Nc4+ Kc3 
 15	+4.16 	80831  	0:00.04	exf3 Nb2 Bc5+ Kf1 Ke3 Nc4+ Ke4 Nd2+ Kf4 Nc4 Kg3 Ne5 f6 Nd3 Bd4 Nb4 f4 Nd5 f2 Ne7 Bb6 Nd5 Bd8 
 14	+4.28 	59714  	0:00.03	exf3 Nb2 Bc5+ Kf1 Ke3 Nc4+ Ke4 Ke1 Be7 Nd2+ Ke3 Nc4+ Kd4 Nb2 Bh4+ Kf1 Bg3 Nd1 Kd3 Nb2+ Ke4 Nd1 f4 Nc3+ Ke3 Nd5+ Kd3 Nb4+ Ke3 
 13	+4.70 	34551  	0:00.02	exf3 Nb2 Bc5+ Kf1 Kg3 Nc4 f6 Nb2 f4 Nd1 Bd4 Nf2 f5 Nh1+ Kg4 
 12	+4.35 	28399  	0:00.02	exf3 Nb2 Bc5+ Kf1 Kg3 Nc4 f4 Nd2 f5 Nc4 f2 Ne5 
 11	+4.22 	14502  	0:00.01	exf3 Nb2 Ke4 Kf1 Bc5 Nc4 f6 Nd2+ Ke3 Nb3 Bd6 Na1 Bb8 Nc2+ Ke4 Nb4 Bg3 
 10	+4.20 	7258    	0:00.00	exf3 Nb2 Ke4 Na4 f4 Kf1 Bd2 Kf2 Be3+ Ke1 f5 Nc3+ Kd4 
  9	+8.74 	2174    	0:00.00	exf3 Nb6 Bc5+ Kf1 Bxb6 Ke1 Ke3 Kf1 Ba5 
  8	+8.67 	1042    	0:00.00	exf3 Nb6 Bc5+ Kf1 Bxb6 Ke1 Ke3 
  7	+6.79 	805      	0:00.00	exf3 Nb6 Bc5+ Kf1 Bxb6 Ke1 Bd4 Kf1 
  6	+6.71 	678      	0:00.00	exf3 Nb6 Bc5+ Kf1 Bxb6 
  5	+3.18 	546      	0:00.00	exf3 Nb6 Ba3 Nc4 Be7 
  4	+4.41 	204      	0:00.00	exf3 Nb6 Kg4 Nd5 Bc5+ Ke1 
  3	+4.85 	98        	0:00.00	exf3 Kf1 Ke3 
  2	+4.57 	71        	0:00.00	exf3 Kf1 
  1	+5.56 	21        	0:00.00	e3+ Kf1 Kxf3 

And the position has 10 moves that win easily; did you post an incorrect FEN perhaps?

 45	+100.00 	561.4M	0:16.01	Ba5 Nb2 Bb6+ Ke2 exf3+ Kf1 Ke4 Nc4 Bc7 Nd2+ Ke3 Nc4+ Kd3 Nb2+ Kc3 Nd1+ Kd2 Nb2 Bg3 Na4 Kd3 Nb2+ Kc3 Na4+ Kd4 Kg1 Kd3 Kf1 Be5 Nb6 Bd4 Nd5 Ke4 Nc7 Bc5 Ne8 Kd5 Nf6+ Ke6 Ne8 Bd4 Nc7+ Kd6 Nb5+ Kd5 Nc7+ Kc6 Na6 Bc5 Nb8+ Kd6 Ke1 Ba3 
 45	+100.00 	561.2M	0:16.00	Bd2 Nb2 Be3+ Ke2 exf3+ Ke1 Ke4 Nc4 Bg5 Nb6 Bf4 Kf1 Kd3 Nc8 Bc7 Ke1 Ke4 Kf2 Kf4 Ne7 Bb6+ Kf1 Ke5 Nc6+ Ke4 Nb4 Bc5 Nc6 Kd5 Nd8 Be7 Nb7 Bf6 Na5 Bd4 Nb3 Bc3 Nc1 Kc4 Kf2 Bd2 Na2 Bf4 Kg1 Be3+ Kf1 Kb3 Nc1+ Bxc1 Kf2 
 45	+100.00 	560.7M	0:15.99	Ke5 Ke2 Kd4 Nb6 Bc5 Nd7 Kd5 f4 Be7 Ke3 Ke6 Ne5 Bc5+ Ke2 Bd6 Nc6 Kd5 Na7 Bxf4 Nb5 Be5 Kd2 f4 Ke2 Bd6 Nc3+ Ke5 Kf2 Bc5+ Kg2 f3+ Kg3 f2 Kg2 f1=Q+ Kxf1 
 45	+100.00 	560.3M	0:15.98	Be7 Ke2 Bd6 Nc3 Kg3 Nd5 Bc5 Nf6 Kg2 f4 Kg3 Nd7 Bd6 Nb6 Bxf4 Nc8 Kg2 Ne7 Bg5 Nxf5 
 45	+100.00 	549.4M	0:15.66	f6 Ke2 Kg3 Nb6 Kg2 f4 Kg3 Nd5 Bc5 Nc7 Bd6 Ne6 Bxf4 Nd4 Bg5 Nxf5+ 
 45	+100.00 	430.6M	0:12.16	Bf8 Ke2 Kg3 Nc3 Kg2 fxe4 f4 Nd1 f3+ Kd3 Bh6 Kc4 f2 Nxf2 
 45	+100.00 	404.2M	0:11.37	Ba3 Ke2 Kg3 Nb6 Kg2 f4 Bc5 Nc4 Kg3 Ne5 Kxf4 Nxf7 
 45	+100.00 	380.2M	0:10.65	Bd6 Nb2 Bc5+ Ke2 Kg3 Nc4 Kg2 f4 Kg3 Ne5 Kxf4 Nxf7 
 45	+78.64 	576.5M	0:16.44	exf3 Nb2 Bc5+ Kf1 Ke4 Nc4 f6 Nd2+ Ke3 Nc4+ Kd3 Nd2 f2 Nb3 Be3 Na5 f4 Nc6 Ke4 Ke2 Bb6 Kf1 Bc5 Nb8 Kf5 Nc6 Ke6 Nd8+ Kd5 Nf7 Bb6 Nh8 Be3 Ng6 Ke4 Nh4 Bd4 Ng6 Kf3 Ne5+ Kg3 Nc4 Bc5 Na5 Kg4 Nc4 Kf5 Ke2 Be3 Nd6+ Ke5 Ne8 Bd4 Nc7 Ke4 Ne 
 45	+71.47 	580.8M	0:16.57	e3+ Ke2 Bd2 Nc5 Kg5 Nb3 f4 Kd3 Bb4 Nc1 Bd6 Nb3 Be5 Nc5 Bc7 Ne4+ Kf5 Ke2 Ke6 Ng5+ Kf6 Ne4+ Kf5 Nc5 Be5 Nd3 Bd6 Ne1 Ke5 Kd3 Bc5 Nc2 Kd6 Ke2 Kd5 Kd3 Bd6 Nd4 Bc7 Nc2 Ke5 Nd4 Bb6 Ne2 Kf5 Nc3 Bc5 Ne4 Bb4 Ke2 Kg6 

Also, your first position is not a draw either:

"8/5p2/8/2b2p2/2N2k2/5p2/8/5K2 b - - 5 51
here's another one"

5 different moves win

56 +100.00 3.72G 1:01.36 f6 Na5 Kg4 Nc4 f4 Nd2 Be3 Ne4 Bd4 Nd6 f2 Ne4 Kf5 Nd6+ Ke6 Nb5 Kd5 Nc7+ Ke4 Ke2 Bb6 Nb5 Bc5 Nc3+ Kd4 Nb5+ Kd5 Nc7+ Kd6 Nb5+ Kc6 Nc3 Bd4 Ne4 Kd5 Nd2 Ke5 Nc4+ Kf5 Nd6+ Kg6 Ne4 Bb6 Nd6 Bc5 Ne4 Bd4 Kf1 Bb6 Nd6 Be3 Nc4 Bc5 Nd2 Kf5 K
56 +100.00 3.71G 1:01.09 Bf8 Kf2 f6 Na5 Bb4 Nc4 Bc5+ Kf1 Bd4 Nd2 Kg3 Nc4 f2 Nd6 Kg4 Nf7 f4 Nd6 Kf3 Nc4 Bc5 Ne5+ Kg3 Nc4 Bb4 Nb6 Bc3 Nc4 Be5 Nd2 Bd4 Ne4+ Kh4 Ke2 Kg4 Kf1 Kf5 Nd6+ Ke6 Nb5 Kd5 Nc7+ Ke4 Ke2 Bb6 Nb5 Bc5 Nc3+ Kd4 Nb5+ Kd5 Nc7+ Kd6 Nb5+ Kc6
56 +100.00 3.70G 1:01.00 Ke4 Nd2+ Ke3 Nc4+ Kf4 Nd2 Bb6 Nc4 Bc7 Kf2 Be5 Nd2 Bd4+ Kf1 Bc3 Nc4 Bb4 Kg1 Ke4 Kf2 f4 Kf1 Kd4 Nb6 Bd2 Nd7 Ke4 Nf6+ Kf5 Ne8 Be3 Nd6+ Ke6 Ne4 f6 Nc3 Kd6 Ne4+ Ke5 Nc3 Bd4 Nb1 Ke4 Nd2+ Ke3 Nc4+ Kd3 Nd6 f2 Nf5 Ke4 Nd6+ Kd5 Nb5 Ke5
56 +100.00 3.70G 1:00.94 Bb4 Kg1 Ke4 Kf2 f4 Kf1 Kd4 Nb6 Bd2 Nd7 Ke4 Nf6+ Kf5 Ne8 Be3 Nd6+ Ke6 Ne4 f6 Nc3 Kd6 Ne4+ Ke5 Nc3 Bd4 Nb1 Ke4 Nd2+ Ke3 Nc4+ Kd3 Nd6 f2 Nf5 Ke4 Nd6+ Kd5 Nb5 Ke5 Ke2 Be3 Nc3 Bc5 Kf1 Bd4 Nb1 Ke4 Nd2+ Kd3 Nf3 Bb6 Nh4 Bc5 Nf3 Ke3 Nh
56 +100.00 3.67G 1:00.54 Be7 Kf2 Bh4+ Kg1 Bg3 Nd2 Be1 Nc4 f6 Kf1 Bg3 Nd6 Kg4 Nc4 Be5 Ne3+ Kg5 Nc4 Bc7 Ne3 Bb6 Nc4 Bc5 Nd2 Kg4 Nc4 f4 Nd2 Be3 Ne4 Kf5 Nd6+ Ke5 Nb5 Bc5 Nc7 f2 Nb5 Ke4 Nc7 Bb6 Ne8 Ke5 Ke2 Bc5 Ng7 Ke4 Nh5 Bd4 Ng7 Bb6 Ne8 Ke5 Ng7 Bc5 Ne8 Ke

@kelvinwop

kelvinwop commented Sep 21, 2019

right, I was recommended bishop A5 for some reason and then after following it for about 100 moves or so, the evaluation eventually dropped to 0

@kelvinwop

kelvinwop commented Sep 21, 2019

5 different moves win

56 +100.00 3.72G 1:01.36 f6 Na5 Kg4 Nc4 f4 Nd2 Be3 Ne4 Bd4 Nd6 f2 Ne4 Kf5 Nd6+ Ke6 Nb5 Kd5 Nc7+ Ke4 Ke2 Bb6 Nb5 Bc5 Nc3+ Kd4 Nb5+ Kd5 Nc7+ Kd6 Nb5+ Kc6 Nc3 Bd4 Ne4 Kd5 Nd2 Ke5 Nc4+ Kf5 Nd6+ Kg6 Ne4 Bb6 Nd6 Bc5 Ne4 Bd4 Kf1 Bb6 Nd6 Be3 Nc4 Bc5 Nd2 Kf5 K

image

... is it really though??

@kelvinwop

I'm following these lines you sent, but even with a search depth of 70 half-moves the evaluation is still at -3.4.

@kelvinwop

kelvinwop commented Sep 21, 2019

56 +100.00 3.71G 1:01.09 Bf8 Kf2 f6 Na5 Bb4 Nc4 Bc5+ Kf1 Bd4 Nd2 Kg3 Nc4 f2 Nd6 Kg4 Nf7 f4 Nd6 Kf3 Nc4 Bc5 Ne5+ Kg3 Nc4 Bb4 Nb6 Bc3 Nc4 Be5 Nd2 Bd4 Ne4+ Kh4 Ke2 Kg4 Kf1 Kf5 Nd6+ Ke6 Nb5 Kd5 Nc7+ Ke4 Ke2 Bb6 Nb5 Bc5 Nc3+ Kd4 Nb5+ Kd5 Nc7+ Kd6 Nb5+ Kc6

image

this second line is also stuck at -3.4

Yeah it seems impossible for black to win. His bishop can't force the white king off the promotion square so he can only move back and forth forever.

I guess it's possible to make a generalized checker for this pattern:

  • Given you have extra pawn/pawns
  • If your bishop doesn't cover the promotion square
  • The evaluated "advantage" doesn't change for at least 10-15 moves
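The second condition in the list above reduces to a square-colour comparison. A minimal sketch, assuming squares in algebraic notation; the function names are hypothetical and not taken from any engine. For the position discussed here (black bishop on c5, black pawns on the f-file) the check reports that the bishop does not cover f1. Note that this condition on its own does not prove a draw: the same position is a tablebase win.

```python
def square_is_dark(square: str) -> bool:
    """True if an algebraic square such as 'a1' is a dark square."""
    file_idx = ord(square[0]) - ord("a")  # a..h -> 0..7
    rank_idx = int(square[1]) - 1         # 1..8 -> 0..7
    return (file_idx + rank_idx) % 2 == 0  # a1 is dark


def bishop_covers_promotion_square(bishop_sq: str, pawn_file: str,
                                   white_pawn: bool) -> bool:
    """True if the bishop moves on the colour of the pawn's promotion square."""
    promotion_sq = pawn_file + ("8" if white_pawn else "1")
    return square_is_dark(bishop_sq) == square_is_dark(promotion_sq)


# Black bishop on c5, black f-pawns promoting on f1.
covers = bishop_covers_promotion_square("c5", "f", white_pawn=False)
```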

@kelvinwop

Another idea I had is generally when you're winning, the opponent doesn't really have counterplay and your advantage keeps increasing. Perhaps looking at d/dx advantage could be useful, as an example d/dx advantage in these drawn positions is always zero or slightly negative (I've evaluated the picture above to 76 moves, and it says move 78 is now at -3.3 instead of -3.4) so maybe that can be used to generalize the drawn endgame patterns.
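The "d/dx advantage" idea can be sketched as a least-squares slope over the recent eval history. This is only an illustration under stated assumptions: evals are from the stronger side's point of view, one value per move, and the names `eval_trend` and `looks_stalled` as well as the window and threshold values are hypothetical. A flat trend can also occur in slowly won positions, so it is at best one signal among others.

```python
from typing import Sequence


def eval_trend(evals: Sequence[float]) -> float:
    """Least-squares slope of the eval, in eval units per move."""
    n = len(evals)
    if n < 2:
        raise ValueError("need at least two evals")
    mean_x = (n - 1) / 2
    mean_y = sum(evals) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(evals))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_stalled(evals: Sequence[float], window: int = 15,
                  min_slope: float = 0.01) -> bool:
    """True if the advantage has not grown over the last `window` moves."""
    if len(evals) < window:
        return False  # not enough history to judge
    return eval_trend(evals[-window:]) < min_slope


# A flat +3.4 for 15 moves is flagged; a steadily growing eval is not.
flat = [3.4] * 15
growing = [0.5 * i for i in range(15)]
```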

@Alayan-stk-2
Author

Alayan-stk-2 commented Sep 21, 2019

Your position is winning through zugzwang. The knight can't flee forever, and once he's down, the white king will be forced to leave the promotion square.

This is a tablebase win, there is not any doubt about the result or the best moves.

@MichaelB7
Contributor

MichaelB7 commented Sep 21, 2019

@kelvinwop Stockfish will eventually see the win here in your examples, which is correct. Cyclic zugzwang positions are one of the most difficult concepts to understand in chess, as they are often mistaken for fortresses where no fortress exists. Through a series of repetitive-looking moves, the winning side can force the losing side to make a move that breaks the fortress, and the winning side then wins easily. Here's another example of a cyclic zugzwang position where the solution is not quite as far out, in the number of moves required to break the fortress, as it is in your examples, which might be useful for you to study.


8/2p5/5k2/3P1P2/4P1K1/8/1pB4P/4b3 w - - 0 1

1. e5 Kxe5 2. Kg5 Kxd5 3. f6 Ke5 4. Kg6 Bh4 5. f7 Be7 6. h4 Bf8 7. h5 Ke6 8. Bb1
Ke7 9. Bf5 c6 10. Bb1 Ke6 11. Bc2 Ke7 12. Bf5 c5 13. Bb1 Ke6 14. Bc2 Ke7
15. Bf5 c4 16. Bb1 Ke6 17. Bc2 Ke7 18. Bf5 c3 19. Bb1 Ke6 20. Bc2 Ke7 21. Bf5
c2 22. Bxc2 Ke6 23. Bb1 Ke7 24. Bf5 b1=Q 25. Bxb1 Ke6 26. Ba2 Ke7 27. h6 1-0

@kelvinwop

ah yes, I had black follow syzygy tablebase moves and eventually stockfish got with the program
image
Probably the analysis board doesn't use the table which was why its moves couldn't win the game

@MichaelB7
Contributor

I’m not sure what your settings are, but Stockfish without EGTB and with default settings does find the winning moves given enough time.

@protonspring

fyi, I wrote an endgame position generator some time ago and can generate thousands of games for a given endgame. Just let me know what you'd like.

@protonspring

I can pick one and work on it. How about KNNKP? This seems like a draw unless the pawn can promote before a knight gets it. A strict eval on piece values probably doesn't do anything.

@Vizvezdenec
Contributor

I don't really think there is any use in improving KNNKP.
It's not even a 6-men but a 5-men TB, and in any real games/analysis Stockfish plays with them.
I mean it's all cool and stuff but... SF is usually good enough there even in synthetic tests, and in any real games this is 100% covered by TBs.

@Alayan-stk-2
Author

TBs aren't used at fishtest, so though endgame knowledge is not very relevant in tournaments using TBs, it can influence testing games (plus some people don't use TBs, etc).

The difficulty is that any single game pattern probably doesn't bring enough elo to do a clean pass at fishtest.

Today, I taught Ethereal that a single minor piece has virtually no hope of winning against one or more enemy pawns. This covers KNKP, KBKP, KNKPP, KBKPP... where Ethereal frequently displayed eval between +1 and +2.5.

Elo gain ? About +2 elo at STC and LTC.
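The material pattern described above (a lone minor piece against a king with one or more pawns) can be detected from the piece counts alone. A minimal sketch that reads the board field of a FEN; it only illustrates the pattern match, the function names are hypothetical, and it says nothing about how Ethereal implements or scores it:

```python
from collections import Counter


def material(fen: str):
    """Return (white, black) piece counters keyed by upper-case piece letter."""
    board = fen.split()[0]
    white = Counter(c for c in board if c.isupper())
    black = Counter(c.upper() for c in board if c.islower())
    return white, black


def lone_minor_vs_pawns(fen: str) -> bool:
    """True for K + one minor piece versus K + pawns only (KNKP, KBKPP, ...)."""
    white, black = material(fen)
    for minor_side, pawn_side in ((white, black), (black, white)):
        has_lone_minor = (
            minor_side["K"] == 1
            and minor_side["N"] + minor_side["B"] == 1
            and sum(minor_side.values()) == 2
        )
        has_only_pawns = (
            pawn_side["K"] == 1
            and pawn_side["P"] >= 1
            and sum(pawn_side.values()) == 1 + pawn_side["P"]
        )
        if has_lone_minor and has_only_pawns:
            return True
    return False


# Illustrative positions (the first two are made up for this sketch).
knkp = "8/8/8/4k3/4p3/8/3N4/4K3 w - - 0 1"
kbkpp = "8/8/4k3/3pp3/8/8/3B4/4K3 w - - 0 1"
knnkp = "8/3k4/8/3N4/8/6K1/1p6/1N6 b - - 4 6"
```

An eval would typically scale the score towards a draw when the pattern matches; how strongly is a tuning question not settled in this thread.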

Most 5-6-7 men patterns that we can directly code rules for are not as frequent, or as clear-cut, or as egregious misevals. For KNNKP, the way SF overevaluates the drawn positions is poor (the current code is good to do the correct moves to win if it's winning, not to see from afar if the position is good or not), but a complete fix might gain 0.5 elo or so, something too small to be measured alone at fishtest. Even though good endgame knowledge helps to make better middlegame moves, fishtest games are often decided by the time a depth 15 search has many "hits" in 5 or 6-men positions.

So, one would need to do reliable code for different endgames through individual specialized testing (there, your position generation @protonspring is very useful), then ensure that if the eval transition is not smooth, it's only because of an almost-certain win/draw (i.e., if the specialized code isn't able to clearly determine the status of a position, you don't want to have a brutal eval jump from previous positions in a search tree), then test a bundle in fishtest hoping for the small parts to all add together for a measurable gain.

I have been wondering for a time too if some derivative of this approach could yield benefits for endgame knowledge:

A while ago I tried to improve the syzygy WDL+ compression by using decision trees to predict the outcome based on the position. The decision tree computes a permutation of the 5 possible WDL+ values, where the first value corresponds to the most likely result, the second value corresponds to the second most likely result, and so on. The table then stores the index in the permutation instead of storing the WDL+ value directly. This means that if the decision tree always predicts the correct WDL+ value, the table will contain only 0 values and compresses to almost nothing.

This works really well in some simple cases that the decision tree is able to handle efficiently. For example, KBBBvK is predicted almost perfectly, because I have a predicate that tells if there is a bishop on white squares and another predicate that tells if there is a bishop on black squares. Therefore the KBBBvK table can be compressed from 723K to about 1K.

The problem though is that the simple cases are mostly irrelevant to the overall quality of the compression scheme. The simple cases are taking up an insignificant part of the total disk space in the current syzygy implementation, so compressing them even better is mostly useless.

For the complicated cases, such as KRNPvKQ which is the largest 6 man WDL table, the decision tree is a lot less efficient at reducing the compressed size. The KRNPvKQ table is reduced by about 30% by the decision tree prediction. Other "complicated" tables are reduced by around 30% to 50%.

I did not finish my implementation because I currently do not think it is likely that I will be able to come up with significantly better predictors, which would be required to reduce the size further.
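The permutation-index idea in the quoted passage can be sketched in a few lines. This is an illustration, not the quoted author's code: the numeric WDL+ values (-2 loss, -1 blessed loss, 0 draw, 1 cursed win, 2 win) follow the usual Syzygy convention, and `toy_ranking` is a hypothetical two-predicate predictor for KBBBvK in the spirit of the example given.

```python
from typing import Sequence, Tuple

# Assumed WDL+ values: loss, blessed loss, draw, cursed win, win.
WDL_VALUES = (-2, -1, 0, 1, 2)


def toy_ranking(has_light_bishop: bool, has_dark_bishop: bool) -> Tuple[int, ...]:
    """Rank WDL+ values from most to least likely for KBBBvK (toy predictor)."""
    if has_light_bishop and has_dark_bishop:
        return (2, 0, 1, -1, -2)  # bishops on both colours: expect a win
    return (0, 2, 1, -1, -2)      # all bishops on one colour: expect a draw


def encode(actual_wdl: int, ranking: Sequence[int]) -> int:
    """Store the position of the true value in the predicted ranking."""
    return list(ranking).index(actual_wdl)


def decode(index: int, ranking: Sequence[int]) -> int:
    """Recover the true value from the stored index and the same ranking."""
    return ranking[index]


# A correct prediction is stored as 0, which compresses very well.
stored = encode(2, toy_ranking(True, True))
```

When the predictor is right the table is a long run of zeros; when it is wrong the stored index is 1 to 4 and the data remains lossless, because decoding recomputes the same ranking from the position.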

@kelvinwop

What if, when running fishtest, instead of starting from the normal starting position, you start from positions you know can simplify into these "bad" endgames?

@Vizvezdenec
Contributor

The problem with this is that these heuristics will still affect other games that cannot be simplified into such endgames.

@protonspring

protonspring commented Oct 31, 2019

I wrote a different KNNKP ending and it beats master 3 to 1. So far, only 50k games, but it looks good. I will keep going and post my own PR after I get my "best" version. I will also include how others can test to verify my results.

Either way, it looks like there is MUCH room for improvement in these endgames.

EDIT: My numbers were wrong because I had resign on.

@protonspring

fyi, #2386

@protonspring

Here is another improved KNNKP. #2553

@vondele
Member

vondele commented Mar 13, 2020

Just so this doesn't get lost: some of these endgames also came up at https://groups.google.com/forum/#!topic/fishcooking/B9kp77iiGdE, with e.g. KRPvKBP and KRPPvKRP being misplayed often.

I did some testing, and at short TC (1+0.01), starting from books with just these two endgames in a roughly 50/50 win/draw mix as given by TB, master is 50 Elo worse than master with TB.

I did work a bit on KRPvKBP and it is possible to come up with a version that is 15-20 Elo better than master on these endgames:

Score of patch vs master: 12895 - 11123 - 15982  [0.522] 40000
Elo difference: 15.4 +/- 2.6, LOS: 100.0 %, DrawRatio: 40.0 %

but that's not enough to pass STC and LTC. Corresponding tests are here:
http://tests.stockfishchess.org/tests/view/5e6776b6e42a5c3b3ca2e392
http://tests.stockfishchess.org/tests/view/5e677ed7e42a5c3b3ca2e395
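The Elo figure in the match result above follows from the win/loss/draw counts through the usual logistic formula. A minimal sketch; the error margin uses a normal approximation (1.96 standard errors, propagated through the derivative of the Elo formula), which is an assumption and may differ slightly from what the match tool actually computes:

```python
import math


def elo_from_wld(wins: int, losses: int, draws: int):
    """Return (score, elo, margin95, draw_ratio) for a match result."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    elo = 400 * math.log10(score / (1 - score))
    # Per-game variance of the score (1, 0.5 or 0 per game).
    var = (wins * (1 - score) ** 2
           + losses * score ** 2
           + draws * (0.5 - score) ** 2) / n
    std_err = math.sqrt(var / n)
    # d(elo)/d(score) = 400 / (ln(10) * score * (1 - score))
    margin95 = 1.96 * std_err * 400 / (math.log(10) * score * (1 - score))
    return score, elo, margin95, draws / n


# Counts taken from the "patch vs master" line above.
score, elo, margin95, draw_ratio = elo_from_wld(12895, 11123, 15982)
```

With these counts the score is about 0.522 and the Elo difference about 15.4, matching the quoted line.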

There was also an interesting test (@joergoster) on the value of current endgame knowledge:
http://tests.stockfishchess.org/tests/view/5e6508f8e42a5c3b3ca2e2d2
http://tests.stockfishchess.org/tests/view/5e64d119e42a5c3b3ca2e2af
showing it is worth ~25 Elo, or ~60 Elo on the normal and endgames book respectively.

In this context, there is data showing that full 6-men syzygy on top of master is only about 20 Elo at STC conditions: https://github.com/glinscott/fishtest/wiki/UsefulData#elo-gain-using-syzygy
which implies that adding endgame knowledge will be really difficult in our current setup, as even adding perfect knowledge on ~500 endgames is just 20 Elo.

On the other hand, I feel that with the availability of all this data (TB, played games, ...), somehow it must be possible to extract some knowledge in a way that should benefit gameplay.

@protonspring

protonspring commented Mar 13, 2020 via email

@Vizvezdenec
Contributor

tbh I think that this endgame stuff means less and less with bigger depth and especially when TBs are added.
Sure it's cool to improve it and stuff but every dog uses TBs for any serious analysis anyway.

@Alayan-stk-2
Author

Alayan-stk-2 commented Mar 15, 2020

7-men is not accessible for the vast majority of users, but 7-8-9 men positions are extremely relevant for analysis as they often end up as leaf nodes of deep searches and guide critical choices in the middlegame.

It is conceivable that, say, improving play in KRPPKRP can also be done with eval methods that would benefit rook endgames with even more pawns on the board. For example, something evaluating how many tempi a rook needs to attack a pawn could be very useful. Many winning positions involve a rook being one tempo too late to stop promotion without being given up for the pawn, while in many drawn positions that are wrongly evaluated as very good, the weak side isn't too late.
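One way to read "how many tempi a rook needs to attack a pawn" is a breadth-first search over rook moves. The sketch below is a toy under strong assumptions: it counts only the rook's own moves, treats every other piece as a static blocker, and ignores checks, captures and whether the squares are safe. All names are hypothetical.

```python
from collections import deque
from typing import Optional, Set, Tuple

Square = Tuple[int, int]  # (file 0..7, rank 0..7)
DIRECTIONS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def sq(name: str) -> Square:
    """Convert algebraic notation such as 'g3' to (file, rank)."""
    return ord(name[0]) - ord("a"), int(name[1]) - 1


def rook_attacks(rook: Square, target: Square, occupied: Set[Square]) -> bool:
    """True if a rook on `rook` attacks `target` with no blocker in between."""
    for df, dr in DIRECTIONS:
        f, r = rook[0] + df, rook[1] + dr
        while 0 <= f < 8 and 0 <= r < 8:
            if (f, r) == target:
                return True
            if (f, r) in occupied:
                break
            f, r = f + df, r + dr
    return False


def rook_tempi_to_attack(rook: Square, target: Square,
                         occupied: Set[Square]) -> Optional[int]:
    """Fewest rook moves (to empty squares only) until the rook attacks `target`.

    `occupied` holds every piece except the rook itself, including the target.
    """
    seen = {rook}
    queue = deque([(rook, 0)])
    while queue:
        square, dist = queue.popleft()
        if rook_attacks(square, target, occupied):
            return dist
        for df, dr in DIRECTIONS:
            f, r = square[0] + df, square[1] + dr
            while 0 <= f < 8 and 0 <= r < 8 and (f, r) not in occupied:
                if (f, r) not in seen:
                    seen.add((f, r))
                    queue.append(((f, r), dist + 1))
                f, r = f + df, r + dr
    return None


# 3rd example position: 8/2k5/4R3/Pp6/1P6/6r1/1K6/8, black rook g3 vs pawn b4.
others = {sq(s) for s in ("c7", "e6", "a5", "b5", "b4", "b2")}
tempi = rook_tempi_to_attack(sq("g3"), sq("b4"), others)
```

Comparing such a count with the number of moves the pawn needs to promote would give the "one tempo too late" test described above; a real eval term would also have to account for the kings.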

A specialized eval function for a single combination of pieces has the downside of being rather narrow. Generally speaking, the fewer pawns are left on the board, the more we find erratic patterns that deviate a lot from regular eval. In theory, though, an upside of a specialized term is that when the main eval isn't tasked with not being too wrong in peculiar positions, it can reach a better optimum elsewhere, much as there is some Elo in tuning the related PSQTs after a new eval term is introduced.

If we consider that SF is used without TB as a chess tool by many websites, specialized knowledge can also make it give better advice to players.

Whenever possible, changes that can target a whole class of material combinations rather than a single one are more interesting.

@vondele
Member

vondele commented Mar 15, 2020

I agree that 7men is certainly not available to most users (let alone on SSD). Yes, ideally, one can improve eval in a way that is also suitable for more than 7 men. Just to illustrate that there is some room there, I've made histograms of the endgame score over ~1M KRPPvKRP positions, in two categories: those that are TB wins and those that are TB draws. While the endgame score can differentiate between the two, it is clearly far from perfect:
[image: KRPPvKRP_score_vs_tb]

@Vizvezdenec
Contributor

It's easier said than done.
I read quite a lot of chess material and I can tell you that even 6-men rook endgames can't really be evaluated by GMs on the fly in quite a lot of cases, even with some calculation: there are a lot of weird wins and weird draws there. By GMs I mean players rated around 2650.
So if GMs can't really do it reliably, it would be even harder to code this into a simple static eval.
Honestly, probably the most reliable way to do this would be to create a tiny neural network and train it to return a "good" static eval on these types of positions. :)

@vondele
Member

vondele commented Mar 15, 2020

Sure, some machine learning might really be useful, not necessarily NN. The patch I mentioned above for KRPvKBP was written using some simple ML. I'll try to do something similar for KRPPvKRP in the coming weeks.

@joergoster
Contributor

> There was also an interesting test (@joergoster) on the value of current endgame knowledge:
> http://tests.stockfishchess.org/tests/view/5e6508f8e42a5c3b3ca2e2d2
> http://tests.stockfishchess.org/tests/view/5e64d119e42a5c3b3ca2e2af
> showing it is worth ~25Elo, or ~60Elo on the normal and endgames book respectively.

It looks like even the 6-man bases have a hard time catching up with all the endgame knowledge!
(The stripped-down version vs. itself with 6-man syzygy bases, endgames.epd, tc 15+0.15, 500 games)

Finished game 500 (SF-NoEG2 vs SF-6-man): 1/2-1/2 {Draw by 3-fold repetition}
Score of SF-6-man vs SF-NoEG2: 126 - 53 - 321  [0.573] 500
Elo difference: 51.1 +/- 18.0, LOS: 100.0 %, DrawRatio: 64.2 %
Finished match
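For readers who want to check such summaries by hand, here is a small Python sketch of how the score, Elo difference, error margin and draw ratio can be derived from a win/loss/draw count. The logistic Elo formula is standard; the error margin (a normal approximation on the per-game score at 95% confidence) is an assumption about how the match tool computes it, although it lands close to the figures printed above. The function names are made up for the example.

```python
import math
from statistics import NormalDist


def elo_from_score(score):
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_summary(wins, losses, draws, confidence=0.95):
    """Score, Elo difference, error margin and draw ratio from W/L/D counts."""
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mu = w + d / 2.0
    # Variance of the per-game result (1, 0 or 0.5) around the mean score.
    var = w * (1.0 - mu) ** 2 + l * (0.0 - mu) ** 2 + d * (0.5 - mu) ** 2
    stdev = math.sqrt(var / n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    lo, hi = mu - z * stdev, mu + z * stdev
    return {
        "score": mu,
        "elo": elo_from_score(mu),
        "margin": (elo_from_score(hi) - elo_from_score(lo)) / 2.0,
        "draw_ratio": d,
    }


# W/L/D from the SF-6-man vs SF-NoEG2 match above.
summary = match_summary(126, 53, 321)
print(f"score {summary['score']:.3f}, Elo {summary['elo']:.1f} "
      f"+/- {summary['margin']:.1f}, draws {summary['draw_ratio']:.1%}")
```

Note that the formula breaks down for a score of exactly 0 or 1, where the Elo difference is unbounded.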

@protonspring

protonspring commented Mar 15, 2020 via email

@vondele
Member

vondele commented Mar 18, 2020

To test individual endgames, which I think is best done at home right now, we need some books and stats.

https://www.dropbox.com/s/b2i63tzgwi1h39v/material_key_books.zip?dl=0

contains a collection of FENs, ordered by material key. These FENs have been extracted from a few million Stockfish LTC testing games, collecting positions with a given material key, 9 pieces or fewer, one position per key per game, and only for keys that are on the board for 6 plies or more. They could serve as testing books for certain material counts, and give an indication of the importance of certain combinations. There is a README.txt with full statistics (i.e. counts), but a summary is below:

====== 2 =====
====== 3 =====
KPvK 78603
====== 4 =====
KPvKP 84144
KRvKR 62640
KRvKP 30401
KQvKQ 29124
KRvKN 27004
KRvKB 25984
KBvKP 25908
KNvKP 25193
KNvKN 17323
KBPvK 12495
====== 5 =====
KRPvKR 156903
KRNvKR 46519
KRBvKR 42938
KNPvKN 23886
KQPvKQ 23612
KBPvKB 22023
KPPvKP 19888
KPPvKR 19730
KNPvKB 17721
KNPvKR 16866
====== 6 =====
KRPvKRP 209365
KPPvKPP 51289
KRPPvKR 50869
KBPvKNP 37997
KRNvKRP 34333
KRBvKRP 34161
KNPvKNP 30643
KQPvKQP 27005
KBPvKBP 26727
KRPvKBP 15524
====== 7 =====
KRPPvKRP 297797
KBPPvKBP 55704
KQPPvKQP 46384
KNPPvKNP 39945
KRBPvKRB 31618
KNPPvKBP 30076
KBPPvKRP 24190
KBPPvKNP 23503
KRPPvKRN 22838
KRPPvKRB 22717
====== 8 =====
KRPPvKRPP 245752
KRPPPvKRP 60410
KBPPvKNPP 58577
KBPPvKBPP 46016
KPPPvKPPP 44663
KQPPvKQPP 44092
KRBPvKRNP 43996
KNPPvKNPP 41787
KRBPvKRBP 34610
KRNPvKRNP 24680
====== 9 =====
KRPPPvKRPP 277860
KRBPPvKRBP 85042
KQPPPvKQPP 62193
KBPPPvKBPP 58985
KRNPPvKRBP 52590
KRNPPvKRNP 52306
KRBPPvKRNP 45318
KNPPPvKNPP 38940
KRRPPvKRRP 34505
KNPPPvKBPP 31339
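To make the notion of a material key concrete, here is a minimal Python sketch that derives a key such as KRPvKBP from a FEN. This is not the script used to build the books. The ordering rule (side with more pieces first, ties broken by piece rank in the order K, Q, R, B, N, P) is an assumption inferred from the keys listed above, and the function names are hypothetical.

```python
PIECE_ORDER = "KQRBNP"


def side_key(pieces):
    """Sort one side's pieces into the conventional K, Q, R, B, N, P order."""
    return "".join(sorted(pieces, key=PIECE_ORDER.index))


def material_key(fen):
    """Material key such as 'KRPPvKRP' for the position given by a FEN string."""
    board = fen.split()[0]
    white = side_key(c for c in board if c in "KQRBNP")
    black = side_key(c.upper() for c in board if c in "kqrbnp")

    def strength(key):
        # Assumed rule: more pieces first, then the higher-ranked pieces first.
        return (-len(key), [PIECE_ORDER.index(c) for c in key])

    first, second = sorted((white, black), key=strength)
    return f"{first}v{second}"


def piece_count(fen):
    """Total number of pieces on the board, e.g. to keep 9 pieces or fewer."""
    return sum(c.isalpha() for c in fen.split()[0])


# The two drawn examples from the top of this issue.
print(material_key("8/8/5kBp/4rP1P/8/3K4/8/8 w - -"))      # KBPPvKRP
print(material_key("8/8/5kBp/2r4P/8/3K4/8/8 w - - 0 1"))   # KRPvKBP
```

Grouping the FENs of a game by this key, and keeping one position per key, would give books of the kind described above.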

@vondele
Member

vondele commented Mar 19, 2020

For the endgames above, I used these positions to have master play against master+table bases (6men + relevant 7men), 1000 games at short TC (1.0+0.01). It does show that this set of positions is quite biased towards drawn positions, but more importantly, it gives an idea of which endgames might have the most potential for improvement:

=================== KPvKP ===============
Score of tb vs master: 0 - 0 - 1000  [0.500] 1000
Elo difference: 0.0 +/- 0.0, LOS: nan %, DrawRatio: 100.0 %
=================== KRvKR ===============
Score of tb vs master: 3 - 0 - 997  [0.501] 1000
Elo difference: 1.0 +/- 1.2, LOS: 95.8 %, DrawRatio: 99.7 %
=================== KRvKP ===============
Score of tb vs master: 7 - 0 - 993  [0.503] 1000
Elo difference: 2.4 +/- 1.8, LOS: 99.6 %, DrawRatio: 99.3 %
=================== KQvKQ ===============
Score of tb vs master: 0 - 0 - 1000  [0.500] 1000
Elo difference: 0.0 +/- 0.0, LOS: nan %, DrawRatio: 100.0 %
=================== KRvKN ===============
Score of tb vs master: 20 - 0 - 980  [0.510] 1000
Elo difference: 6.9 +/- 3.0, LOS: 100.0 %, DrawRatio: 98.0 %
=================== KRvKB ===============
Score of tb vs master: 67 - 0 - 933  [0.533] 1000
Elo difference: 23.3 +/- 5.4, LOS: 100.0 %, DrawRatio: 93.3 %
=================== KBvKP ===============
Score of tb vs master: 0 - 0 - 1000  [0.500] 1000
Elo difference: 0.0 +/- 0.0, LOS: nan %, DrawRatio: 100.0 %
=================== KNvKP ===============
Score of tb vs master: 1 - 0 - 999  [0.500] 1000
Elo difference: 0.3 +/- 0.7, LOS: 84.1 %, DrawRatio: 99.9 %
=================== KNvKN ===============
Score of tb vs master: 0 - 0 - 1000  [0.500] 1000
Elo difference: 0.0 +/- 0.0, LOS: nan %, DrawRatio: 100.0 %
=================== KBPvK ===============
Score of tb vs master: 0 - 0 - 1000  [0.500] 1000
Elo difference: 0.0 +/- 0.0, LOS: nan %, DrawRatio: 100.0 %


=================== KRPvKR ===============
Score of tb vs master: 51 - 21 - 928  [0.515] 1000
Elo difference: 10.4 +/- 5.7, LOS: 100.0 %, DrawRatio: 92.8 %
=================== KRNvKR ===============
Score of tb vs master: 40 - 0 - 960  [0.520] 1000
Elo difference: 13.9 +/- 4.2, LOS: 100.0 %, DrawRatio: 96.0 %
=================== KRBvKR ===============
Score of tb vs master: 247 - 0 - 753  [0.624] 1000
Elo difference: 87.6 +/- 9.9, LOS: 100.0 %, DrawRatio: 75.3 %
=================== KNPvKN ===============
Score of tb vs master: 5 - 0 - 995  [0.502] 1000
Elo difference: 1.7 +/- 1.5, LOS: 98.7 %, DrawRatio: 99.5 %
=================== KQPvKQ ===============
Score of tb vs master: 160 - 24 - 816  [0.568] 1000
Elo difference: 47.5 +/- 8.9, LOS: 100.0 %, DrawRatio: 81.6 %
=================== KBPvKB ===============
Score of tb vs master: 8 - 1 - 991  [0.503] 1000
Elo difference: 2.4 +/- 2.0, LOS: 99.0 %, DrawRatio: 99.1 %
=================== KPPvKP ===============
Score of tb vs master: 36 - 4 - 960  [0.516] 1000
Elo difference: 11.1 +/- 4.2, LOS: 100.0 %, DrawRatio: 96.0 %
=================== KPPvKR ===============
Score of tb vs master: 39 - 1 - 960  [0.519] 1000
Elo difference: 13.2 +/- 4.2, LOS: 100.0 %, DrawRatio: 96.0 %
=================== KNPvKB ===============
Score of tb vs master: 7 - 0 - 993  [0.503] 1000
Elo difference: 2.4 +/- 1.8, LOS: 99.6 %, DrawRatio: 99.3 %
=================== KNPvKR ===============
Score of tb vs master: 26 - 1 - 973  [0.512] 1000
Elo difference: 8.7 +/- 3.5, LOS: 100.0 %, DrawRatio: 97.3 %
=================== KRPvKRP ===============
Score of tb vs master: 26 - 4 - 970  [0.511] 1000
Elo difference: 7.6 +/- 3.7, LOS: 100.0 %, DrawRatio: 97.0 %


=================== KPPvKPP ===============
Score of tb vs master: 9 - 0 - 991  [0.504] 1000
Elo difference: 3.1 +/- 2.0, LOS: 99.9 %, DrawRatio: 99.1 %
=================== KRPPvKR ===============
Score of tb vs master: 366 - 230 - 404  [0.568] 1000
Elo difference: 47.5 +/- 16.7, LOS: 100.0 %, DrawRatio: 40.4 %
=================== KBPvKNP ===============
Score of tb vs master: 6 - 0 - 994  [0.503] 1000
Elo difference: 2.1 +/- 1.7, LOS: 99.3 %, DrawRatio: 99.4 %
=================== KRNvKRP ===============
Score of tb vs master: 48 - 0 - 952  [0.524] 1000
Elo difference: 16.7 +/- 4.6, LOS: 100.0 %, DrawRatio: 95.2 %
=================== KRBvKRP ===============
Score of tb vs master: 192 - 0 - 808  [0.596] 1000
Elo difference: 67.5 +/- 8.8, LOS: 100.0 %, DrawRatio: 80.8 %
=================== KNPvKNP ===============
Score of tb vs master: 6 - 0 - 994  [0.503] 1000
Elo difference: 2.1 +/- 1.7, LOS: 99.3 %, DrawRatio: 99.4 %
=================== KQPvKQP ===============
Score of tb vs master: 67 - 13 - 920  [0.527] 1000
Elo difference: 18.8 +/- 6.0, LOS: 100.0 %, DrawRatio: 92.0 %
=================== KBPvKBP ===============
Score of tb vs master: 1 - 0 - 999  [0.500] 1000
Elo difference: 0.3 +/- 0.7, LOS: 84.1 %, DrawRatio: 99.9 %
=================== KRPvKBP ===============
Score of tb vs master: 307 - 94 - 599  [0.607] 1000
Elo difference: 75.2 +/- 13.4, LOS: 100.0 %, DrawRatio: 59.9 %


=================== KRPPvKRP ===============
Score of tb vs master: 169 - 44 - 787  [0.563] 1000
Elo difference: 43.7 +/- 9.7, LOS: 100.0 %, DrawRatio: 78.7 %
=================== KBPPvKBP ===============
Score of tb vs master: 58 - 15 - 927  [0.521] 1000
Elo difference: 14.9 +/- 5.7, LOS: 100.0 %, DrawRatio: 92.7 %
=================== KQPPvKQP ===============
Score of tb vs master: 167 - 39 - 794  [0.564] 1000
Elo difference: 44.7 +/- 9.5, LOS: 100.0 %, DrawRatio: 79.4 %
=================== KNPPvKNP ===============
Score of tb vs master: 123 - 32 - 845  [0.545] 1000
Elo difference: 31.7 +/- 8.3, LOS: 100.0 %, DrawRatio: 84.5 %
=================== KRBPvKRB ===============
Score of tb vs master: 71 - 6 - 923  [0.532] 1000
Elo difference: 22.6 +/- 5.8, LOS: 100.0 %, DrawRatio: 92.3 %
=================== KNPPvKBP ===============
Score of tb vs master: 92 - 30 - 878  [0.531] 1000
Elo difference: 21.6 +/- 7.4, LOS: 100.0 %, DrawRatio: 87.8 %
=================== KBPPvKRP ===============
Score of tb vs master: 163 - 38 - 799  [0.563] 1000
Elo difference: 43.7 +/- 9.4, LOS: 100.0 %, DrawRatio: 79.9 %
=================== KBPPvKNP ===============
Score of tb vs master: 133 - 45 - 822  [0.544] 1000
Elo difference: 30.7 +/- 8.9, LOS: 100.0 %, DrawRatio: 82.2 %
=================== KRPPvKRN ===============
Score of tb vs master: 35 - 1 - 964  [0.517] 1000
Elo difference: 11.8 +/- 4.0, LOS: 100.0 %, DrawRatio: 96.4 %
=================== KRPPvKRB ===============
Score of tb vs master: 116 - 1 - 883  [0.557] 1000
Elo difference: 40.1 +/- 7.0, LOS: 100.0 %, DrawRatio: 88.3 %
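With this many result blocks, it can help to parse them and sort by Elo difference. The Python sketch below does that for text in the format shown above. The regular expressions are assumptions based only on the lines in this comment, and the sample contains just three of the blocks, copied verbatim.

```python
import re

HEADER = re.compile(r"^=+\s*([KQRBNPv]+)\s*=+$")
SCORE = re.compile(r"^Score of .*?: (\d+) - (\d+) - (\d+)")
ELO = re.compile(r"^Elo difference: (-?[\d.]+) \+/- ([\d.]+)")


def parse_results(text):
    """Return one dict per endgame block: key, (wins, losses, draws), elo, err."""
    results = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if m := HEADER.match(line):
            current = {"key": m.group(1)}
            results.append(current)
        elif current is not None and (m := SCORE.match(line)):
            current["wld"] = tuple(int(x) for x in m.groups())
        elif current is not None and (m := ELO.match(line)):
            current["elo"] = float(m.group(1))
            current["err"] = float(m.group(2))
    return results


SAMPLE = """\
=================== KRvKB ===============
Score of tb vs master: 67 - 0 - 933  [0.533] 1000
Elo difference: 23.3 +/- 5.4, LOS: 100.0 %, DrawRatio: 93.3 %
=================== KRBvKR ===============
Score of tb vs master: 247 - 0 - 753  [0.624] 1000
Elo difference: 87.6 +/- 9.9, LOS: 100.0 %, DrawRatio: 75.3 %
=================== KRPvKBP ===============
Score of tb vs master: 307 - 94 - 599  [0.607] 1000
Elo difference: 75.2 +/- 13.4, LOS: 100.0 %, DrawRatio: 59.9 %
"""

ranked = sorted(parse_results(SAMPLE), key=lambda r: r["elo"], reverse=True)
for r in ranked:
    print(f"{r['key']:10s} {r['elo']:6.1f} +/- {r['err']:.1f}")
```

Sorting by Elo alone ignores how often each endgame occurs; combining it with the counts from the material-key summary is one possible refinement.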

vondele referenced this issue in vondele/Stockfish Mar 22, 2020
@vondele
Member

vondele commented Mar 22, 2020

I tried to quantify the effect of using selected 7men TB for playing games (tb7 = complete 6men +
KBPPvKBP KBPPvKNP KBPPvKRP KNPPvKBP KNPPvKNP KQPPvKQP KRBPvKRB KRPPPvKR
KRPPvKRB KRPPvKRN KRPPvKRP). The WDL files are on a fast SSD (Corsair MP600, Gen4 PCIe), while the DTZ files are on a spinning disk. TC is short (10+0.1), master(tb7) vs master(tb6). The machine has 64GB RAM, which is of course not enough to cache all the WDL files in use (380GB). The result is not very impressive:

Score of tb7 vs tb6: 2667 - 2671 - 10271  [0.500] 15609
Elo difference: -0.1 +/- 3.2, LOS: 47.8 %, DrawRatio: 65.8 %

I'm wondering if this can be reproduced (@joergoster ?), but my setup seems legit.

Earlier results indicated that at STC the benefit of TB is larger than at LTC (6men, but in RAM, see https://github.com/glinscott/fishtest/wiki/UsefulData#elo-gain-using-syzygy).
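The LOS (likelihood of superiority) figures in the match summaries above depend only on wins and losses. Here is a short Python sketch of the usual normal-approximation formula. That the match tool uses exactly this formula is an assumption, but it reproduces the printed values, including nan when there are no decisive games.

```python
import math


def los(wins, losses):
    """Likelihood of superiority from decisive games only (draws do not matter)."""
    decisive = wins + losses
    if decisive == 0:
        return math.nan  # no decisive games: undefined, printed as "nan"
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


# tb7 vs tb6 above: 2667 wins, 2671 losses.
print(f"LOS: {los(2667, 2671):.1%}")
```

A value near 50% like this one means the match gives essentially no evidence that either side is stronger.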

@miguel-l
Contributor

Hi, I'm curious if there's an update on this issue.

Are there any plans, for example, to make endgame books available on fishtest or to have a separate method (testing against TBs?), or do we still rely on SPRT for now?

@protonspring

#2745

@kelvinwop

[image]
The lichess Stockfish analysis board correctly identifies this position as winning, but the deeper the search goes, the more it gets evaluated as drawn... is this a lichess analysis board quirk or a Stockfish bug?

@vondele
Member

vondele commented Jun 28, 2020

current master finds mate 10 quite quickly, same as sf11

@dsmsgms
Contributor

dsmsgms commented Jul 4, 2020

@kelvinwop Your screenshot crops the FEN string. I think you might have been hit by the 50-move rule, but you would have to verify when the last capture or pawn push was.

@protonspring

@miguel-l You can test with the endgames.epd book.

@vondele
Member

vondele commented Aug 12, 2020

I'll close this, needs a fresh investigation after NNUE has been tuned.

@vondele vondele closed this as completed Aug 12, 2020