many trained NNs? #82
The nets should be getting better, but it is non-trivial. Every net that makes it to master passed statistical tests on OpenBench ( http://chess.grantnet.us/index/ ), meaning it wins in a head-to-head against the prior master net. Unfortunately, this doesn't mean it is necessarily stronger, but based on my tests this is mostly true. Note that when I did the switch, I think Winter got significantly worse in regular (not Chess960) play. So version v0.9.9 was something like 60 Elo stronger than v0.9.10 in regular chess, in my rough tests. I have started an OpenBench regression test to compare v1.09 with v1.0, which can be found here: http://chess.grantnet.us/test/29557/

There are so many newer nets because I essentially reworked my entire evaluation approach. The nets are no longer based on the same features or data as in Winter versions up to v0.9.9. In this new approach, Winter initially relied on some 350k CCRL Fischer random games to train a network. This is an extremely small dataset relative to what top engines like Stockfish use (Stockfish has billions of positions in its training dataset). Therefore, I cannot train networks of the size Stockfish uses without massive overfitting issues.

I am generating new double Fischer random games based on Winter self-play to train stronger networks. In the long term I would like to drop the CCRL games and train nets exclusively on the self-play games. Hopefully double Fischer random games will result in more diverse middle- and endgame positions, requiring less data to generalize. Furthermore, I like having the guarantee that Winter cannot memorize openings, so I intend to remove games from the standard starting position from the training set.

Winter's net architecture is changing from version to version, but at the moment the architecture is 772x224x3. The 772 inputs consist of the bitboards (64 binary values each) for the 6 piece types of each side, plus 4 additional inputs encoding the castling rights for each side. The 3 outputs correspond to the probabilities of the side to move winning, drawing, and losing respectively. If I recall correctly, the Stockfish architecture, in contrast, has an input dimension which is more than 100 times larger and multiple intermediate layers, but only a single output value.
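For anyone curious about the scale this implies, below is a minimal PyTorch sketch of a 772x224x3 WDL network matching the description above. Only the layer sizes and the win/draw/loss softmax output come from the post; the ReLU activation, optimizer-free setup, and everything else are my own assumptions, not Winter's actual code.

```python
# Minimal sketch of a 772x224x3 WDL network as described above.
# The ReLU activation is an assumption; only the layer sizes and the
# win/draw/loss output follow the description in this thread.
import torch
import torch.nn as nn

class WinterLikeNet(nn.Module):
    def __init__(self):
        super().__init__()
        # 772 inputs: 2 sides x 6 piece types x 64 squares = 768 bitboard
        # features, plus 4 castling-right flags.
        self.hidden = nn.Linear(772, 224)
        self.out = nn.Linear(224, 3)  # P(win), P(draw), P(loss) for side to move

    def forward(self, x):
        h = torch.relu(self.hidden(x))            # activation assumed
        return torch.softmax(self.out(h), dim=-1)

net = WinterLikeNet()
params = sum(p.numel() for p in net.parameters())
print(params)  # 772*224 + 224 + 224*3 + 3 = 173,827 weights and biases
# At 4 bytes per float that is under 0.7 MB, which is consistent with the
# ~1 MB binary (net included) mentioned in the original question.
```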
Thanks for this explanation. I once read that self-play games are not optimal for training an NN, so why not play against a set of other engines? Their styles may differ, which could be an advantage. And what about the classic eval: does Winter still use it and consider both evals to decide on a best move? How?
But when an engine prefers to answer 1.e4 with 1...e6 (the French Defence), you could train an NN on French games covering the known best replies like the Exchange, Advance, Tarrasch, Winawer variations etc., each with its own structures and ideas. What's your opinion on this reasoning?
Another question that comes to my mind as an outsider (I'm not programming any engine myself yet): what about consulting several (small) NNs and deciding on a best move by combining/comparing their outcomes, and then maybe also judging the classic eval? Did you ever experiment with that? I always wonder: how can we distinguish their strength/style? Do tests exist for NNs to see their best move (at max depth/time) in a FEN position? Maybe my thoughts are too wild for you :-)
I am not familiar with this.
It might be better. There are a few games against Cheng in older parts of the dataset. There are a couple of reasons why I am not generally doing it. TCEC has originality constraints and I have some further idealistic views on them. One of the bigger reasons is purely practical: if I rely on games against other engines, I have to keep finding and adding other engines around Winter's level. Relying on self-play against prior Winter versions means I have a steady pool of opponents and can do some light regression testing. At the moment, v1.05 is actually doing reasonably well against v1.09. That one has a bit of a different network architecture, which may be why it is performing better than some of the later versions in this direct matchup.
Not at this time. I would like to try adding some features as network inputs, as was the case in previous Winter releases. One of the main issues to solve there is that I want to allow other people to train Winter nets, without the complicated reliance on Winter binaries that existed previously.
The hope is that structures that occur from the regular start position are not unique to the regular start position. On the other hand, DFRC games definitely have some structures which are not common in regular chess. This means Winter is probably a bit weaker overall in regular chess than it could be, but in some positions it will be better.
That is generally correct. Large neural networks like those in Leela can to some degree memorize openings as positions are repeatedly encountered in games and thus end up in the training dataset. Such "opening books" cannot be removed without altering the training data, as I am doing in Winter.
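As a rough illustration of that kind of filtering (not Winter's actual tooling), here is a sketch using the python-chess library that keeps only games declaring a non-standard starting FEN; the file name is a placeholder of my own.

```python
# Hypothetical sketch: drop games starting from the regular start position so
# the training set contains no standard-chess opening lines to memorize.
# "training_games.pgn" is a placeholder file name, not part of Winter.
import chess
import chess.pgn

kept = []
with open("training_games.pgn") as pgn:
    while (game := chess.pgn.read_game(pgn)) is not None:
        # Games from the standard start position usually carry no FEN header.
        start_fen = game.headers.get("FEN", chess.STARTING_FEN)
        if start_fen != chess.STARTING_FEN:
            kept.append(game)

print(len(kept), "non-standard-start games kept")
```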
What advantage would that have over a single larger network?
Yes, I think that is an avenue that may be worth exploring.
It can be argued I did that with the mixture models I used in Winter versions before I switched to neural networks in 2019.
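Just to make the ensemble idea from the question concrete, here is a toy illustration of averaging the WDL outputs of several small nets; the numbers are invented and this is not how Winter's old mixture models worked.

```python
# Toy example: combine (win, draw, loss) estimates from three hypothetical
# small nets for the same position by averaging, then reduce the result to
# an expected score. All numbers are made up for illustration.
import numpy as np

net_outputs = np.array([
    [0.42, 0.40, 0.18],   # net A
    [0.38, 0.45, 0.17],   # net B
    [0.47, 0.35, 0.18],   # net C
])

combined = net_outputs.mean(axis=0)               # ensemble WDL estimate
expected_score = combined[0] + 0.5 * combined[1]  # win = 1, draw = 0.5, loss = 0
print(combined, expected_score)
```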
I am not an expert on strength/style. There are more knowledgeable people over at the computer chess club. I am not exactly sure what you are asking regarding the "test for NNs to see their best move". There are tons of test datasets of positions where engines are tested on finding the best move in some amount of time.
I mean, I can imagine a (web) interface to input a FEN and choose e.g. 3 NNs to see which best move they show: in many positions several moves are OK and their evals differ only slightly, but their styles can be different: aggressive, tending to sacrifice material, or defensive, tending towards closed positions, etc. It might be interesting and fun to see how different engines/NNs approach a certain (puzzle) position. I have worked with .epd test suites with their 'bm' and 'am' solutions.
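For what it's worth, that comparison idea could be prototyped with the python-chess library: load an EPD test position with its 'bm' solution and ask several UCI engines for their move. The engine paths and the example EPD line below are placeholders of my own, not something from this repository.

```python
# Sketch: compare several UCI engines on one EPD test position.
# Engine paths and the EPD line are illustrative placeholders.
import chess
import chess.engine

epd = ("r1bqkbnr/pppp1ppp/2n5/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - "
       "bm O-O; id \"example position\";")
engine_paths = ["./winter", "./stockfish"]  # adjust to your local binaries

board = chess.Board()
ops = board.set_epd(epd)  # parses the bm/am/id opcodes into a dict
print("Expected best move(s):", [board.san(m) for m in ops.get("bm", [])])

for path in engine_paths:
    engine = chess.engine.SimpleEngine.popen_uci(path)
    result = engine.play(board, chess.engine.Limit(time=1.0))
    print(path, "plays", board.san(result.move))
    engine.quit()
```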
Another idea: when several moves have an almost equal eval (e.g. using MultiPV), choose the one which results in the "most harmonious" position. This may be vague, because how would one determine harmony? Never mind.
It seems you train NN files almost every day; I regularly receive an email message about it. Now I wonder: are these nets getting better? Or are you just testing? I managed to compile your latest source and it gave me v1.09, but it's a rather small file of 1 MB, and I guess the NN is included. Are those NNs that small? Most (SF) NNs are about 20 MB or even 45 MB. Can you explain?
Winter runs fine in CuteChess and it's strong! Thanks.
[I'm on Xubuntu 22.04]