[RFC] closed positions book. #2646

Closed
vondele opened this issue Apr 25, 2020 · 73 comments

Comments

@vondele
Member

vondele commented Apr 25, 2020

I have made a pull request to the official book repo with a closed positions book.
official-stockfish/books#8
this still needs some testing, but should eventually be available.

I first want to do some testing comparing this to the noob_3moves book on fishtest before we possibly start using this, so that we have a feeling for its quality. My initial impression is rather good.

There are several options we can first discuss here before I decide on this.

  • Allow patches to be tested against this book, normal stc and ltc. Leave choice up to the submitter
  • test all patches against this book, just switch for a couple of weeks.
  • first retest a couple of patches that were aiming at closed positions but didn't pass.
  • etc.
@MJZ1977
Contributor

MJZ1977 commented Apr 25, 2020

This is a very good idea, and one I have suggested many times before!

But for me, pe->blockedcount() >= 4 on its own is not enough. Many of the positions in the book (>80%) are not blocked. Can we add some French and King's Indian positions by hand and remove clearly open positions?

Edit: we can allow patches with this book and test STC non-regression with the initial book.

@vondele
Member Author

vondele commented Apr 25, 2020

@MJZ1977 , thanks! Some related observations/notes:

  • the positions in the book are from before the blocked position is reached, i.e. the position in the game out of which this position is extracted becomes more blocked as SF plays.
  • I hope that this still allows for some variety, and that improvements will come both from avoiding blocked positions when they are not advantageous and from playing blocked positions well
  • Roughly only 1 out of 50 games currently played on fishtest matched the criterion 'blocked', so this is already 'a massive change' compared to the current state.
  • Adding by hand is not so easy, I had no way to get the ECO code of a game (fishtest games start from a fen nowadays, not from moves), and one needs ~50k different positions to make a reasonable book. I assume a few people have more advanced tools, and could contribute another book constructed with a different strategy.
  • Very narrow opening books (e.g. just French) might be a bit risky, overfitting could be lurking there.

@NKONSTANTAKIS

Thanks for this exciting incentive!

Both strategies should be valid, the specialized one would indeed require a non-regression step.
This is a versatile book with a stronger closed position signal, imo safe to use as normal book.
Probably more universal, due to the heavy underrepresentation of closed positions in the default book.
The distribution is evened out with regard to opening type instead of opening availability.

Another point is that for open positions search is a nifty tool, so it's closed positions that need elements.

@vondele
Member Author

vondele commented Apr 25, 2020

Influence of the book on Elo difference. noob_3moves.epd vs closedpos.epd.
Basically, books have a similar Elo performance, for both SF10 - SF11,
as well as SF11 - SFdev.

  • SF11 vs master (STC)

closed:

ELO: 17.94 +-1.7 (95%) LOS: 100.0%
Total: 60000 W: 13779 L: 10684 D: 35537
Ptnml(0-2): 880, 6085, 13460, 8210, 1365
https://tests.stockfishchess.org/tests/view/5ea415c913fcd4bb2f00a0e4

noob:

ELO: 17.91 +-1.7 (95%) LOS: 100.0%
Total: 60000 W: 13292 L: 10202 D: 36506
Ptnml(0-2): 814, 6166, 13525, 8106, 1389
https://tests.stockfishchess.org/tests/view/5ea415c913fcd4bb2f00a0e4

  • SF10 vs SF11 (STC):

closed:

ELO: 50.59 +-1.8 (95%) LOS: 100.0%
Total: 60000 W: 17819 L: 9143 D: 33038
Ptnml(0-2): 586, 4917, 12288, 9653, 2556
https://tests.stockfishchess.org/tests/view/5ea413e913fcd4bb2f00a0d3

noob:

ELO: 48.18 +-1.8 (95%) LOS: 100.0%
Total: 60000 W: 17306 L: 9038 D: 33656
Ptnml(0-2): 619, 5006, 12298, 9642, 2435
https://tests.stockfishchess.org/tests/view/5ea415ac13fcd4bb2f00a0e1

  • SF11 vs master (LTC, Edit: final values)

closed:

ELO: 20.12 +-1.8 (95%) LOS: 100.0%
Total: 40000 W: 7149 L: 4835 D: 28016
Ptnml(0-2): 211, 3221, 11101, 4977, 490 
https://tests.stockfishchess.org/tests/view/5ea45e85b908f6dd28f34ada

noob:

ELO: 17.45 +-1.7 (95%) LOS: 100.0%
Total: 40000 W: 6357 L: 4350 D: 29293
Ptnml(0-2): 224, 3109, 11590, 4590, 487 
https://tests.stockfishchess.org/tests/view/5ea45e72b908f6dd28f34ad7

I think this indicates that the book is pretty general purpose.

I will now reschedule a few of the recent yellow LTCs that presumably target
closed positions with the new book
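
For readers who want to check the numbers above, here is a minimal sketch of how an Elo estimate and a 95% error bar can be derived from the Ptnml(0-2) game-pair counts. It assumes the logistic Elo model and a delta-method error bar; the function name `elo_from_pentanomial` is made up for this illustration, and fishtest's own code may compute the interval slightly differently.

```python
import math

def elo_from_pentanomial(ptnml):
    """Return (elo, 95% half-width) from game-pair counts.

    ptnml holds the number of game pairs that scored 0, 0.5, 1, 1.5 and 2 points.
    """
    pairs = sum(ptnml)
    # Per-game score of each pair category: 0, 0.25, 0.5, 0.75, 1.
    scores = [i / 4 for i in range(5)]
    mean = sum(s * n for s, n in zip(scores, ptnml)) / pairs
    var = sum((s - mean) ** 2 * n for s, n in zip(scores, ptnml)) / pairs
    stderr = math.sqrt(var / pairs)
    # Logistic Elo model: score = 1 / (1 + 10^(-elo / 400)).
    elo = -400 * math.log10(1 / mean - 1)
    # Delta method: d(elo)/d(score) = 400 / (ln(10) * score * (1 - score)).
    slope = 400 / (math.log(10) * mean * (1 - mean))
    return elo, 1.96 * stderr * slope

# Ptnml of the "closed" SF11 vs master STC run quoted above.
elo, err = elo_from_pentanomial([880, 6085, 13460, 8210, 1365])
print(f"ELO: {elo:.2f} +-{err:.1f} (95%)")
```

Working on game pairs rather than single games matters because the two games of a pair share an opening and are therefore correlated.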

@vondele
Member Author

vondele commented Apr 25, 2020

Can I ask authors of recent yellow LTC patches (e.g. @Vizvezdenec @xoto10 @locutus2 @MJZ1977 @Lolligerhans) that target closed positions to resubmit them LTC, with the new closedpos.epd book, putting closedbook in the info field as well? Looks like a few of them will need rebasing so I can't easily reschedule.

I've rescheduled 2 that were based on current master:
https://tests.stockfishchess.org/tests/view/5ea49685b908f6dd28f34b85
https://tests.stockfishchess.org/tests/view/5ea4969ab908f6dd28f34b87

@locutus2
Member

I will retest my pawn chain patches with the closed book. I had three similar versions, which all passed STC and failed LTC yellow.

xoto10 referenced this issue in SFisGOD/Stockfish Apr 25, 2020
@Lolligerhans
Contributor

@vondele I had no such patch. I kept track of yellows so I am pretty sure. :)

@adentong

Unrelated to the current topic, but the last regression test showed only ~11 Elo, while @vondele's LTC tests are showing 18/20 Elo respectively for closed book/noob book. I know we use a different book for regression, but it is still a bit surprising.

@xoto10
Contributor

xoto10 commented Apr 25, 2020

Very interesting results! Am I right in thinking this book is about the same size as noob_3moves?

So we've used noob_3moves to play a lot of games, then sampled games we're interested in after 8 plies - is that 14 plies from startpos then? That might be a concern for long-term use as the standard book, but given the performance tests give very similar results to noob_3moves, I'm happy to test it out for a couple of weeks. Definitely a plus point to just update the main book instead of having a choice, and having to do non-regression tests against the main book, I just hadn't expected this to be an option. Interesting ...

@Vizvezdenec
Contributor

Well, side note: the last RT used a different master that was behind by 2 Elo patches and one simplification.
Also it's kinda expected I guess with 2 space/blocked positions interacting patches...

@adentong

Yea well usually I wouldn't expect a 7-9 elo difference with just two elo gaining patches lol...

@NKONSTANTAKIS

@adentong RT's use 8_moves book, which has the lowest elo spread (around 10% less). This makes the +50 elo between versions more meaningful. On top of that are the 3 patches, an undefined small effect of book optimization, and double error-bars.

@vondele
Member Author

vondele commented Apr 26, 2020

I indeed wouldn't focus too much on the comparison to the RT: it is not exactly the same version of the code, and the 8moves_v3 book is known to yield less Elo difference. The draw rate is slightly different with the books as well: 8moves 0.74, noob_3moves 0.73, closedpos 0.70.
This all looks good IMO.

There have been a number of tests overnight using the new book (on old yellow LTCs):
https://tests.stockfishchess.org/tests/view/5ea49685b908f6dd28f34b85
https://tests.stockfishchess.org/tests/view/5ea4b95ab908f6dd28f34bde
https://tests.stockfishchess.org/tests/view/5ea4a0dcb908f6dd28f34ba4
https://tests.stockfishchess.org/tests/view/5ea4969ab908f6dd28f34b87
https://tests.stockfishchess.org/tests/view/5ea4a14cb908f6dd28f34bab
https://tests.stockfishchess.org/tests/view/5ea4a0efb908f6dd28f34ba7
none of them passed, and IIRC one yellow.... probably not too surprising.

So let's get the expectations right. The closedpos book is not a magic bullet, and it will remain a real challenge to get patches passed.

@vondele
Member Author

vondele commented Apr 26, 2020

Based on the data collected, my proposal is to switch the default book to closedpos.epd relatively soon, used for essentially all tests (but not RT), and just continue testing as before. In particular, after passed STC and LTC tests on closedpos, PRs can be made, no need for additional non-regression tests. After a couple of weeks (June?) this strategy is reassessed.

Give thumbs up or down if you agree or disagree with this proposal.

@locutus2
Member

@vondele
I would prefer more to do a non-regression against noob book but more in the sense of monitoring to be alarmed if it goes really bad. Here we can probably use weaker bounds like [-2;0].

But the best approach, it seems to me, is a mixed book: 50% positions from the closed book and 50% positions from the noob book. That way we would have the best of both worlds: closed position testing but no overfitting to this type of position, IMO.

@vondele
Member Author

vondele commented Apr 26, 2020

@locutus2 I plan to do the monitoring based on the usual 8moves RT runs.

My argument against doing additional non-regression tests is that I want to keep our procedure as simple as possible. I'm also pretty confident that regressions are unlikely. But if there is a strong feeling in favor of the additional testing on passed patches, I'm fine with it. So, let's see what the vibes are.

I'm not in favor of mixing the books. Let's try to get a clean signal. Again, the book is not extreme, and there will be opinions going in either direction (e.g. @MJZ1977 would like to see it more closed, you prefer a little more open).

@locutus2
Member

locutus2 commented Apr 26, 2020

@vondele
About the clean signal point:
OK, I understand that from a scientific standpoint it is good to get clean data about the closed book in order to assess it (here I'm with you). But it's important how we go on from there. Say the closed book seems good: do we then take it further, or do we mix it with, for example, the noob book (which has also worked until now)? Only the second option seems to avoid biased development, and I think it is not good to go from one extreme (unusual open positions) to another (nearly closed positions), so mixing seems the best approach.

@vondele
Member Author

vondele commented Apr 26, 2020

@locutus2 long term I can indeed see the point, and we can reassess.

Short term, let's figure out if the book actually matters much. I think this is an experiment to try and see if the perceived weakness in closed positions can actually be more easily fixed with a closed book (if one looks at the positions, it really is not that closed). We might find that this is not as important as we think.

This is in part an old discussion, the many years of development with the 2moves book, which really was not very sophisticated, illustrated that the book might not be the key ingredient to progress.

@MJZ1977
Contributor

MJZ1977 commented Apr 26, 2020

I think we can keep the 2 books for now and change the default once things are clear. It would be interesting to find a patch that shows a big gap between the 2 books: green with the closed book and red with the noob book. Then we can conclude.

@xoto10
Contributor

xoto10 commented Apr 26, 2020

Last night I was thinking this was a big development ... now seeing the results of the reruns, it seems it doesn't make much difference at all. Perhaps there is a subtle change that we will become aware of over time. At the moment (very early of course), it seems the lower draw rate is perhaps the main change (benefit?) of this.

My main concern if we switch to using this book for the medium term remains the beginning of the game. If we want sf to get better at the early moves, surely we need a test book that includes small ply openings (say 0-5) as well as longer ones?

@miguel-l
Contributor

The way I understand it, we get positions from games in which Stockfish closes the position (please correct me if I misunderstood something). But what about games that Stockfish fails to close the position? For example, when searching from root, very commonly we see the exchange French, etc. Something feels off about it.

@NKONSTANTAKIS

NKONSTANTAKIS commented Apr 26, 2020

I believe that the beginning of the game is too vague to be helped by eval, due to very high availability of viable options and different setups. But as the midgame eval becomes more accurate, it will show at openings via better steering of search.

This book should not be regarded as a specialized closed position book, but as an attempt at a more balanced general book with regard to position type. The conditioning is soft and leads to open positions too. The problem with typical books is that they are balanced with regard to viable opening availability, and thus carry only a tiny signal from truly closed positions. SF has a problem with those for 3 reasons:

  • Rarity of occurrence, as explained
  • Vastly different characteristics
  • Inefficiency of search (due to their long-term nature, where 1 pawn move can ruin the prospects forever by entering a distant dead end)

Search inefficiency (and unfortunate setup selection) has partly to do with seeking generically favorable evals: a highly valued bonus in a static position acts like a black hole for the search. It sucks up all the resources in that direction, because it "believes" it's something supreme, blinding it to alternatives. An example is a very deep knight outpost on a totally blocked flank plus a space advantage. Totally useless at a glance for chess players, but SF aims for it even from the early game.

Removing those black-holes completely will require "alien" tech like pattern-recognition, MCTS, NN, or a detailed categorization of cases. But an increased representation of black-hole situations will surely boost long-term health.

I don't believe SF needs training at positions that are very easy for it, nor is it in danger of regressing. At tactical cases the various paths are narrow and concrete and search shines.

@xoto10
Contributor

xoto10 commented Apr 26, 2020

> But what about games that Stockfish fails to close the position?

Good question. I guess there will be a few d4/e5 French advance structures in this book; perhaps this can be an iterative process and the book can be recreated occasionally? If we can improve sf's blocked position play a little, then it will choose more blocked positions ... then we can improve its play a little more ... etc

Edit: or we could just get some games from somewhere else, no reason to only use fishtest? e.g. http://data.lczero.org/files/match_pgns/1/

@vondele
Member Author

vondele commented Apr 28, 2020

I believe there have been some valid concerns raised in this thread, enough so that we should consider alternatives. I have now built a new book with a very different approach based on these comments. I'll again do some testing on fishtest later. The major concerns I have seen raised are:

  • balance between closed and open lines (e.g. closedpos.epd vs noob_3moves.epd)
  • need for short lines (2moves, noob_3moves)
  • need for long lines (8moves)
  • presence of particular openings like french advance, KID, etc (8moves)
  • absence of 'strange/rare openings' (2moves, noob_*)
  • Elo resolution

To address this, I made a book based on the frequency of FENs in games played at lichess (restricted to Elo > 1800, TC > 60). I retained the 200k most frequent FENs out of >8M games. (see official-stockfish/books#9)

This has the following advantages:

  • lines closed and open are balanced, reflecting human choice
  • short lines are present (e.g. startpos is the most frequent position)
  • long lines are present (i.e. popular deep lines are played relatively often).
  • has all named openings
  • 'strange/rare' openings are absent or a very small fraction (e.g. no grob in the top 200'000)
  • Elo resolution needs to be measured on fishtest.

Of course, the choice of the initial database will somewhat influence the resulting FENs, but I think that's more or less secondary.
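
The frequency-based selection described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the input shape (one iterable of FEN strings per game), the function name `most_frequent_positions`, and the choice to count a position at most once per game are all assumptions, and the Elo/TC filtering of the source games is not shown. The short strings stand in for real FENs.

```python
from collections import Counter

def most_frequent_positions(games, keep):
    """Return the `keep` positions that occur in the largest number of games.

    games: iterable of games, each an iterable of FEN strings (assumed shape).
    """
    counts = Counter()
    for fens in games:
        # Assumption: count a position once per game, so that repetitions
        # inside a single game do not inflate its frequency.
        counts.update(set(fens))
    return [fen for fen, _ in counts.most_common(keep)]

# Toy input: placeholder labels instead of real FEN strings.
games = [
    ["start", "e4", "e4e5"],
    ["start", "e4", "e4c5"],
    ["start", "d4"],
]
book = most_frequent_positions(games, keep=2)
print(book)
```

With real data, the start position would naturally come out on top, consistent with the remark that startpos is the most frequent position.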

Edit: the Elo testing yielded the following:

SF11 -> master (STC)
ELO: 11.89 +-1.6 (95%) LOS: 100.0%
Total: 60000 W: 13791 L: 11738 D: 34471
Ptnml(0-2): 763, 6016, 14647, 7553, 1021 
https://tests.stockfishchess.org/tests/view/5ea7e0a953a4548a0348ecb1

SF11 -> master (LTC)
ELO: 14.61 +-1.6 (95%) LOS: 100.0%
Total: 40000 W: 7331 L: 5650 D: 27019
Ptnml(0-2): 181, 3045, 11987, 4486, 301 
https://tests.stockfishchess.org/tests/view/5ea7e0d653a4548a0348ecb5

SF10 -> SF11 (STC)
ELO: 43.35 +-1.7 (95%) LOS: 100.0%
Total: 60000 W: 17566 L: 10119 D: 32315
Ptnml(0-2): 531, 4776, 13411, 9279, 2003 
https://tests.stockfishchess.org/tests/view/5ea7e0c353a4548a0348ecb3

So the Elo spread is somewhat small on this book.

Does anybody have a pointer to another pgn database of high quality games (e.g. master level, ICCF)? It would need to be > 2M games to be suitable for building a book, I would say.

Alternatively, a subset of high quality leela training games (again >2M) ?

@xoto10
Contributor

xoto10 commented Apr 29, 2020

noob_2/3moves books were selected to avoid drawish openings IIRC, but the closedpos book just turned out to have a good Elo spread without any explicit drawish checks. (I wonder why?)

Do you have any info on how many of these popularpos lines qualify as closed under the closedpos tests? Maybe we need a not-drawish test if we want to consider these popular and more open lines?

@vdbergh
Contributor

vdbergh commented Apr 29, 2020

> noob_2/3moves books were selected to avoid drawish openings IIRC,

No, they were not. In fact their draw ratio is rather high. Note: for the same Elo you want the highest possible draw ratio (= least amount of noise). If you want to lower the draw ratio, convert every draw into a win or loss using a coin.
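
The point about draws and noise can be checked with a few lines of arithmetic. The sketch below assumes independent games with outcomes 1, 0.5 and 0; under that assumption the per-game variance works out to score * (1 - score) - draw_ratio / 4, so at a fixed expected score more draws mean less noise. The function name and the example numbers (score 0.55, draw ratio 0.60) are chosen for illustration only.

```python
def per_game_variance(score, draw_ratio):
    """Variance of one game result (1, 0.5 or 0) for a given expected score and draw ratio."""
    win = score - draw_ratio / 2
    loss = 1 - win - draw_ratio
    assert win >= 0 and loss >= 0, "score and draw ratio are inconsistent"
    return (win * (1 - score) ** 2
            + draw_ratio * (0.5 - score) ** 2
            + loss * score ** 2)

s = 0.55
with_draws = per_game_variance(s, 0.60)   # 60% draws
coin_flipped = per_game_variance(s, 0.0)  # every draw replaced by a coin flip
print(with_draws, coin_flipped)
```

Replacing draws by coin flips keeps the expected score at 0.55 but raises the variance from about 0.0975 to about 0.2475, which is why a lower draw ratio by itself is not a benefit.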

@vondele
Member Author

vondele commented Apr 30, 2020

I ran a second test on a book popularpos_lichess_v2.epd, which was constructed retaining games from >2200 Elo players only. The result, however, is nearly identical:

ELO: 43.41 +-1.7 (95%) LOS: 100.0%
Total: 59896 W: 16875 L: 9430 D: 33591
Ptnml(0-2): 492, 4789, 13408, 9300, 1959 
https://tests.stockfishchess.org/tests/view/5eab03cb09d25e8e5058169b

the noob_3moves book was not selected specifically to avoid drawish openings, but it might be a side effect of how the database has been constructed.

@noobpwnftw
Contributor

My books were built from one simple rule: pick moves that are top N and not worse than a score threshold.
I find it interesting that the result converges with a book built with human games.
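
As a rough illustration of the rule stated above, here is one way it could be read. This is not the actual code used to build the noob books: the function name, the interpretation of the threshold as an absolute minimum score, and the example scores are all assumptions made for this sketch.

```python
def select_book_moves(scored_moves, top_n, min_score):
    """Keep at most top_n best moves whose score is not below min_score.

    scored_moves maps a move to its score from the side to move's point of view.
    """
    ranked = sorted(scored_moves.items(), key=lambda kv: kv[1], reverse=True)
    return [move for move, score in ranked[:top_n] if score >= min_score]

# Made-up scores in centipawns, for illustration only.
scores = {"e2e4": 30, "d2d4": 25, "g2g4": -80, "a2a3": 5}
book_moves = select_book_moves(scores, top_n=3, min_score=0)
print(book_moves)
```

Applying such a rule recursively for a fixed number of plies would give a book whose positions are all reachable through reasonably scored moves.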

@vondele
Member Author

vondele commented Apr 30, 2020

I did a quick analysis (depth 13) of the score of the book moves, and that highlights quite some difference between the 2 classes of books:
[Image: opening_book_score]
basically, the human games, even in these 'popular positions' have a much broader range of scores, i.e. essentially won or lost. This improves only very little with Elo of the players. I think the main problem is that these human games are mostly very short TC (>60s, but typically 180s). So, if anybody has a clean database of long TC games between good players...

@vondele
Member Author

vondele commented May 2, 2020

So, the average number of nodes needed to reach depth 13:

book            nodes
noob_3moves     81385
closedpos       123145
popularpos      113054
popularpos_v2   111785
popularpos_v3   115037

@noobpwnftw
Contributor

Weird, so the theory is right, but the result went the opposite...

@dorzechowski

dorzechowski commented May 3, 2020

Out of curiosity I checked depth 13 nodes in the 2moves_v2 book. The book is relatively small (12k positions), so I analyzed the whole book. The average is 134673 and the histogram looks like this:
[Image: 2moves_v2_depth13_nodes_histogram]

Perft 5 nodes vs depth 13 nodes scatter plot looks like below. There is no correlation at all (R=0.14).
[Image: nodes_d13_p5_2moves]

Position with max depth 13 nodes (385505):
rnbqkbnr/p1pp1ppp/1p2p3/8/3P4/4P3/PPP2PPP/RNBQKBNR w KQkq -

Position with min depth 13 nodes (28154):
rnbqkbnr/p1pp1ppp/4p3/1p6/5P2/2N5/PPPPP1PP/R1BQKBNR w KQkq -

All with latest SF (2 May 2020).

@noobpwnftw
Contributor

It makes sense now: Elo spread is related to the percentage of positions in the book that can be reached by playing SF topN moves. This is why closedpos had a good spread but popularpos didn't.

@dorzechowski

dorzechowski commented May 4, 2020

I'm not sure. For example, the book 2moves_v1 contained basically random sequences of moves and had the same spread as noob_3moves. We measured it at the end of December and the results were as below. It looks like books constructed differently, and even with vastly different RMS bias, may give the same sensitivity.

book          Elo spread   draw ratio   RMS
2moves_v1     44.50        0.513        73.85
noob_3moves   44.90        0.566        31.47
noob_2moves   40.75        0.562        33.02

@noobpwnftw
Contributor

noobpwnftw commented May 4, 2020

Well as for 2moves there are just 2 moves, so pretty much anything not losing a pawn's worth is within topN, and it did remove some outright bad moves.

@dorzechowski

dorzechowski commented May 4, 2020

I added noob_2moves to the table above. Both 2moves books have very little in common it seems.

Actually, I now want to test the hypothesis that positions with bigger depth 13 node counts are more complex. I'm going to sort the 12k positions from 2moves_v2 by depth 13 nodes, split them into 3 equal parts and then use the 1st and 3rd parts as new books to play 8000-game matches between SF11 and SF10. If it's true that a bigger node count means more complexity, then the book made from the 3rd part should give a significantly bigger spread than the first one. It would be interesting to either confirm or debunk it. Unfortunately I have only a measly laptop, so it may take some time before I get back with the results.
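
The sort-and-split step of this experiment can be sketched as below. The function name `split_by_nodes`, the (fen, nodes) input shape, and the toy data are assumptions for illustration; the real input would be the 12k positions with their measured depth 13 node counts.

```python
def split_by_nodes(positions, parts=3):
    """Sort (fen, nodes) pairs by node count and split them into `parts` chunks.

    Chunks are ordered from fewest to most nodes; the last chunk takes any remainder.
    """
    ordered = sorted(positions, key=lambda p: p[1])
    size = len(ordered) // parts
    chunks = [ordered[i * size:(i + 1) * size] for i in range(parts - 1)]
    chunks.append(ordered[(parts - 1) * size:])
    return chunks

# Toy data: placeholder FEN labels with made-up node counts.
positions = [("fen_a", 300), ("fen_b", 100), ("fen_c", 200),
             ("fen_d", 600), ("fen_e", 400), ("fen_f", 500)]
low, mid, high = split_by_nodes(positions)
print([fen for fen, _ in low], [fen for fen, _ in high])
```

The `low` and `high` chunks would then be written out as two separate books, with the middle chunk discarded to sharpen the contrast.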

@noobpwnftw
Contributor

noobpwnftw commented May 4, 2020

The only difference between my 2moves and 3moves books is making one more move that is not too bad, and my scores are back-propagated. Still, I think the coverage ratio among topN matters; the spread of 2moves_v1 might be because a higher RMS matters only for the first few moves, but not beyond.

@vondele
Member Author

vondele commented May 4, 2020

I have #W #L #D (White POV) for the noob_3moves positions from fishtest LTCs. It typically looks like:

  "rn1qkbnr/ppp2ppp/3p4/4pb2/2PP1P2/8/PP2P1PP/RNBQKBNR w KQkq -": [
    59,
    48,
    215
  ],
  "rnbqkb1r/pp1pppp1/2p2n1p/8/3P1P2/8/PPPBP1PP/RN1QKBNR w KQkq -": [
    38,
    27,
    186
  ],
  "rnbqkbnr/2pp1ppp/1p6/p3p3/8/3P4/PPPNPPPP/1RBQKBNR w Kkq -": [
    25,
    44,
    233
  ],
  "rn1qkb1r/pbpppppp/5n2/1p6/8/PP4P1/2PPPP1P/RNBQKBNR w KQkq -": [
    39,
    35,
    226
  ],

So, openings appear winnable from both sides. I don't directly see a pattern. @vdbergh do you think that this data could be used to select good positions for a book?

bookstats_noob_3moves.json.zip

@snicolet snicolet added the books label May 4, 2020
@NKONSTANTAKIS

NKONSTANTAKIS commented May 5, 2020

A lot of 150K-350K eval yellows recently. Maybe check them on closedpos?
I am thinking it's getting harder and harder to get 1 elo with a single patch.
As most of those should be around +0.5 to +1.3, I like the idea of a standardized decider.
Different environment + excellent spread scaling of book...how about at a bit higher LTC?
It feels wasteful to throw them away after having spent so many LTC games.
The higher the game count, the closer they are to +1. Well probably around 0.9, due to selection bias.

Also with too many tests + low success rate, eventually some will pass out of luck. With a closer examination of the best performers the harvesting will be safer.

Atm it seems to me that too many resources are used on an extreme amount of different versions on very low pass rate, and thus a higher confidence would be logical.

@noobpwnftw
Contributor

noobpwnftw commented May 5, 2020

closedpos will not make them pass. The LTC bounds are very narrow, so it is expected to take a large number of games to resolve patches that fall within this elo diff range. This is the price to pay so that fewer patches pass by luck. Low success rate and too many similar tests cannot be solved by lowering the bar, and I'm colorblind, so I cannot tell the difference between a yellow and a red SPRT test.

@dorzechowski

@vondele I think we could calculate SNR of each book position by normalized Elo formula or just check z=(w-l)/sqrt(w+l) and get rid of positions with z close to zero as they don't give any signal. But it would be also good to get confirmation from @vdbergh of course.
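
A minimal sketch of the z computation suggested here, applied to the four positions quoted earlier in this thread (counts are #W, #L, #D from White's point of view). The function name is made up for this illustration, and returning 0.0 when there are no decisive games is an assumption.

```python
import math

def z_score(wins, losses):
    """z = (w - l) / sqrt(w + l); draws do not enter the formula."""
    decisive = wins + losses
    return (wins - losses) / math.sqrt(decisive) if decisive else 0.0

# (W, L, D) per position, copied from the noob_3moves excerpt above.
stats = {
    "rn1qkbnr/ppp2ppp/3p4/4pb2/2PP1P2/8/PP2P1PP/RNBQKBNR w KQkq -": (59, 48, 215),
    "rnbqkb1r/pp1pppp1/2p2n1p/8/3P1P2/8/PPPBP1PP/RN1QKBNR w KQkq -": (38, 27, 186),
    "rnbqkbnr/2pp1ppp/1p6/p3p3/8/3P4/PPPNPPPP/1RBQKBNR w Kkq -": (25, 44, 233),
    "rn1qkb1r/pbpppppp/5n2/1p6/8/PP4P1/2PPPP1P/RNBQKBNR w KQkq -": (39, 35, 226),
}
z_values = {fen: z_score(w, l) for fen, (w, l, _) in stats.items()}
for fen, z in z_values.items():
    print(f"{z:+.2f}  {fen}")
```

With only a few hundred games per position, most z values stay within a couple of units of zero, so per-position estimates of this kind are noisy.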

@NKONSTANTAKIS

NKONSTANTAKIS commented May 5, 2020

@noobpwnftw I want fewer patches to pass by luck, not more. Atm the pass rates are extremely low, but the number of tested patches is huge, so inevitably the quality decreases & resources are wasted. For colorblind purposes the yellow can be regarded as red without lowering the elo bar but with an even higher number of games. A higher spread will enable better performance.

closedpos had equal spread at STC but +2.7 at LTC, a very good indication.

So it might not make them pass as you say, but it can make them fail faster!

@noobpwnftw
Contributor

noobpwnftw commented May 5, 2020

I hope so but with the large number of games their elo measurement is actually very accurate, they do fall around +0.5 range and they would still cost similar resources to conclude, and book probably won't change that.
In fact, if it does, then I see trouble.

@NKONSTANTAKIS

Well at this point maybe even a +0.5 at worst is nice. Using millions of LTC games for little gain feels ineffective. What if without you? I also think that testing many versions of same patches with slight changes is bad practice. One might get lucky in the end, worth 0.5, but at a very high price.
The beast needs to be fed I guess...so why not to get our +0.5 in a smarter way?

Btw I like the system more than ever, but I think it's very beneficial to keep evolving it, not only SF.

@noobpwnftw
Contributor

For that then I think it is important to understand how to manipulate elo spread.

This is my scored list of all unique positions after 2 moves without any filtering:
https://www.chessdb.cn/downloads/2moves_scores.zip

I think I have calculated scores for any position up to 4 moves but the data is quite large.

@vondele
Member Author

vondele commented May 5, 2020

@noobpwnftw could you make that scores data available for 3moves ? Either all if less than a few GB, or just for the positions in the noob_3moves book ? That will be interesting to correlate with ' z=(w-l)/sqrt(w+l)'

@vondele
Member Author

vondele commented May 5, 2020

[Image: snr out]
Apart from a 'feature' near zero (not sure where this is coming from), the distribution of (w - l) / sqrt(w + l) is very Gaussian for the noob_3moves book. This could be because of the limited statistics for each of the openings. It might nevertheless be interesting to try to split the positions into two sets.

@vondele
Member Author

vondele commented May 5, 2020

So, I locally did a test, splitting the noob_3moves according to the abs( (w-l) / sqrt(w+l)) > 0.167 (roughly 1 sigma), and there is no measurable difference (60k games) between the low and high parts of the book. So I start suspecting the broad Gaussian is just the noise, and the feature near 0 is the signal.... this is using the results of 44M LTC fishtest games using the noob_3moves book.

@noobpwnftw
Contributor

@vondele Full scores of positions after 3 moves: https://www.chessdb.cn/downloads/3moves_scores.zip

@vondele
Member Author

vondele commented May 6, 2020

Interesting distribution of the scores of all positions after 3 moves...

[Image: 3moves_scores]

@noobpwnftw
Contributor

The features around -15 and 0 are probably caused by the way I calculate things; the distribution might actually be smooth, but that doesn't matter when you sample moves with a wider range.

@dorzechowski

No difference in my tests between book created from positions with low or high node count on depth 13 (TC 10+0.1).
Low:
Score of Stockfish_11 vs Stockfish_10: 2296 - 1236 - 4468 [0.566] 8000
High:
Score of Stockfish_11 vs Stockfish_10: 2276 - 1276 - 4448 [0.562] 8000

@vondele vondele mentioned this issue May 10, 2020
@vondele
Member Author

vondele commented May 11, 2020

so, with #2670 we have a first patch that resulted from the closedpos book. Let's call this a success :-)

I don't think we have particular evidence to change the default book, but I'm sure we now know that we still don't know quite a few things about opening books.

I'll thus close this issue, keeping noob_3moves the default book. The other books can be used as non-default books, either for experimenting or to create Elo gainers, but we'll test patches for non-regression against noob_3moves to gather experience with this setup, asserting that we prefer generic solutions rather than specialized ones.

@vondele vondele closed this as completed May 11, 2020
@xoto10
Contributor

xoto10 commented May 11, 2020

See also: https://tests.stockfishchess.org/tests/view/5eb1e2dd2326444a3b6d33f9 #2662 :)
Although the stc was with noob_3moves, don't remember why I made those choices. Probably intended to use closedpos with the stc but forgot to set it, then made sure I did for the ltc.

@vondele
Member Author

vondele commented May 11, 2020

OK, I overlooked that... should have been in the PR a little more clearly ;-). Extra credit for the book.
