
many trained NNs ? #82

Open
tissatussa opened this issue Nov 15, 2022 · 6 comments

@tissatussa

It seems you train NN files almost every day; I regularly receive an email message about it. Now I wonder: are these nets getting better? Or are you just testing? I managed to compile your latest source and it gave me v1.09, but it's a rather small file of about 1 MB and I guess the NN is included ... are those NNs really that small? Most (SF) NNs are about 20 MB or even 45 MB ... can you explain?

Winter runs fine in CuteChess and it's strong! Thanks.
[I'm on Xubuntu 22.04]

@rosenthj
Owner

rosenthj commented Nov 16, 2022

The nets should be getting better, but it is non-trivial. Every net that makes it to master passed statistical tests on OpenBench ( http://chess.grantnet.us/index/ ), meaning it won a head-to-head match against the prior master net. Unfortunately, this doesn't necessarily mean it is stronger overall, but based on my tests this is mostly true. Note that when I did the switch to the new evaluation approach, I think Winter got significantly worse in regular (not Chess960) play. So version v0.9.9 was something like 60 Elo stronger than v0.9.10 in regular chess, in my rough tests. I have started an OpenBench regression test to compare v1.09 with v1.0, which can be found here: http://chess.grantnet.us/test/29557/
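
For illustration only (my own sketch, not OpenBench's code): a head-to-head win/draw/loss result can be turned into a logistic Elo estimate with a rough 95% interval, which is essentially the quantity such head-to-head tests reason about.

```python
import math

def elo_from_results(wins, draws, losses):
    """Estimate the Elo difference (and a rough 95% interval) from a
    head-to-head match, using the standard logistic Elo model."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games          # mean score per game
    elo = -400 * math.log10(1 / score - 1)        # logistic Elo estimate

    # Per-game variance of the score, used for a rough error margin.
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / games
    stderr = math.sqrt(var / games)
    lo = -400 * math.log10(1 / (score - 1.96 * stderr) - 1)
    hi = -400 * math.log10(1 / (score + 1.96 * stderr) - 1)
    return elo, (lo, hi)

# Example: scoring 52% over 4000 games is roughly +14 Elo.
print(elo_from_results(wins=1200, draws=1760, losses=1040))
```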

There are so many new nets because I essentially reworked my entire evaluation approach. The nets are no longer based on the same features or data as in Winter versions up to v0.9.9.

In this new approach, Winter initially relied on some 350k CCRL Fischer random games to train a network. This is an extremely small dataset relative to what top engines like Stockfish use (SF has billions of positions in its training dataset). Therefore, I cannot train networks as large as those Stockfish uses without massive overfitting issues. I am generating new double Fischer random (DFRC) games from Winter self-play to train stronger networks. In the long term I would like to drop the CCRL games and train nets exclusively on the self-play games.

Hopefully, these DFRC games will result in more diverse middlegame and endgame positions, requiring less data to generalize. Furthermore, I like having the guarantee that Winter cannot memorize openings, so I intend to remove games from the standard starting position from the training set.

Winter's net architecture changes from version to version, but at the moment it is 772x224x3. The 772 inputs consist of the bitboards (64 binary values each) for the 6 piece types of each respective side, with 4 additional inputs encoding the castling rights for each side. The 3 outputs correspond to the probabilities of the side to move winning, drawing, and losing, respectively. In contrast, if I recall correctly, the Stockfish architecture has an input dimension more than 100 times larger and multiple intermediate layers, but only a single output value.
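
As a rough illustration of the encoding described above (my own sketch, assuming python-chess and NumPy; the feature ordering and the hidden activation are guesses, not necessarily Winter's actual layout):

```python
# 2 sides x 6 piece types x 64 squares = 768 binary features, plus 4
# castling-rights bits, gives the 772 inputs. The ordering here is arbitrary.
import numpy as np
import chess  # pip install python-chess

def encode(board: chess.Board) -> np.ndarray:
    x = np.zeros(772, dtype=np.float32)
    for sq, piece in board.piece_map().items():
        side = 0 if piece.color == chess.WHITE else 1
        idx = (side * 6 + piece.piece_type - 1) * 64 + sq
        x[idx] = 1.0
    x[768] = float(board.has_kingside_castling_rights(chess.WHITE))
    x[769] = float(board.has_queenside_castling_rights(chess.WHITE))
    x[770] = float(board.has_kingside_castling_rights(chess.BLACK))
    x[771] = float(board.has_queenside_castling_rights(chess.BLACK))
    return x

def forward(x, W1, b1, W2, b2):
    """772 -> 224 -> 3 with a softmax over (win, draw, loss).
    W1: (224, 772), b1: (224,), W2: (3, 224), b2: (3,). ReLU is assumed."""
    h = np.maximum(0.0, W1 @ x + b1)   # hidden layer of 224 units
    z = W2 @ h + b2
    p = np.exp(z - z.max())
    return p / p.sum()                 # win/draw/loss probabilities for the side to move
```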

@tissatussa
Author

Thanks for this explanation.

I once read that self-play games are not optimal for training an NN ... why not play against a set of other engines? Their styles may differ, which could be an advantage!?

And what about the classical eval: does Winter still use it and consider both evals to decide on a best move? How?

I like having the guarantee that Winter cannot memorize openings,..

But when an engine prefers to answer 1.e4 with 1...e6 (the French Defence), you could train an NN on French games covering the known main replies like the Exchange, Advance, Tarrasch, and Winawer variations, each with its own structures and ideas ... what's your opinion on this reasoning?
Btw, I recall engines are not allowed to use an opening book at tournaments. Indeed, when letting engines play in CuteChess I always disable the books, because I think it's not fair.

@tissatussa
Author

Another question that comes to mind, as an outsider (I'm not programming any engine myself yet): what about consulting several (small) NNs and deciding on a best move by combining / comparing their outcomes, and then maybe also weighing in the classical eval ... did you ever experiment with that? I always wonder: how can we distinguish their strength / style? Do tests exist for NNs to show their best move (at maximum depth / time) in a FEN position? Maybe my thoughts are too wild for you :-)

@rosenthj
Owner

I once read that self-play games are not optimal for training an NN ...

I am not familiar with this.

why not play against a set of other engines? Their styles may differ, which could be an advantage!?

It might be better. There are a few games against Cheng in older parts of the dataset. There are a couple of reasons why I am not generally doing it: TCEC has originality constraints, and I have some further idealistic views on the matter. One of the bigger reasons is purely practical: if I rely on games against other engines, I have to keep finding and adding other engines around Winter's level.

Relying on self-play against prior Winter versions means I have a steady pool of opponents and can do some light regression testing. At the moment, v1.05 is actually doing reasonably well against v1.09. That version has a somewhat different network architecture, which may be why it is performing better than some of the later versions in this direct matchup.

And what about the classical eval: does Winter still use it and consider both evals to decide on a best move? How?

Not at this time. I would like to try adding some features as network inputs, as was the case in previous Winter releases. One of the main issues to solve there is that I want to allow other people to train Winter nets, without the complicated reliance on Winter binaries that existed previously.

But when an engine prefers to answer 1.e4 with 1...e6 (the French Defence), you could train an NN on French games covering the known main replies like the Exchange, Advance, Tarrasch, and Winawer variations, each with its own structures and ideas ... what's your opinion on this reasoning?

The hope is that structures that occur from the regular start position are not unique to the regular start position. On the other hand, DFRC games definitely have some structures which are not common in regular chess. This means Winter is probably a bit weaker overall in regular chess than it could be, but in some positions it will be better.

Btw, I recall engines are not allowed to use an opening book at tournaments. Indeed, when letting engines play in CuteChess I always disable the books, because I think it's not fair.

That is generally correct. Large neural networks like those in Leela can, to some degree, memorize openings, as positions are repeatedly encountered in games and thus end up in the training dataset. Such "opening books" cannot be removed without altering the training data, which is what I am doing in Winter.

what about consulting several (small) NNs and deciding on a best move by combining / comparing their outcomes

What advantage would that have over a single larger network?

and then maybe also weighing in the classical eval ...

Yes, I think that is an avenue that may be worth exploring.

did you ever experiment with that ?

It can be argued I did that with the mixture models I used in Winter versions before I switched to neural networks in 2019.

I always wonder: how can we distinguish their strength / style? Do tests exist for NNs to show their best move (at maximum depth / time) in a FEN position? Maybe my thoughts are too wild for you :-)

I am not an expert on strength / style. There are more knowledgeable people over at the computer chess club.

I am not exactly sure what you are asking regarding the "test for NNs to see their best move". There are tons of test datasets of positions where engines are tested on finding the best move within some amount of time.

@tissatussa
Author

test for NNs to see their best move

I mean, I can imagine a (web) interface where you input a FEN and choose e.g. 3 NNs to see which best move each shows: in many positions several moves are OK and their evals differ only slightly, but their styles can be different: aggressive, tending to sacrifice material, or defensive, tending towards closed positions, etc. It might be interesting and fun to see how different engines / NNs approach a certain (puzzle) position. I have worked with .epd test suites and their 'bm' and 'am' solutions.
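
For readers unfamiliar with the format, a small illustrative sketch of running such a suite against a UCI engine with python-chess; the engine path and file name below are placeholders, not anything from this project.

```python
import chess
import chess.engine

ENGINE_PATH = "./Winter"   # placeholder: path to any UCI engine binary
EPD_FILE = "suite.epd"     # placeholder: EPD test suite with bm/am opcodes

def run_suite(engine_path, epd_file, movetime=1.0):
    solved = total = 0
    with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
        with open(epd_file) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                board = chess.Board()
                ops = board.set_epd(line)   # parses the position plus opcodes like bm/am/id
                result = engine.play(board, chess.engine.Limit(time=movetime))
                ok = True
                if "bm" in ops:             # engine move must be among the best moves
                    ok = result.move in ops["bm"]
                if ok and "am" in ops:      # engine move must avoid the listed moves
                    ok = result.move not in ops["am"]
                if ok:
                    solved += 1
                total += 1
    print(f"solved {solved}/{total}")

run_suite(ENGINE_PATH, EPD_FILE)
```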

@tissatussa
Author

Another idea: when several moves have an almost equal eval (e.g. using MultiPV), then choose the one which results in the "most harmonious" position ... this may be vague, because how do you determine harmony? Never mind ...
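
A tiny, purely illustrative sketch of that idea: collect near-equal MultiPV candidates with python-chess and break the tie with a stand-in "harmony" metric (here simply limiting the opponent's mobility, which is not a real definition of harmony).

```python
import chess
import chess.engine

def pick_harmonious(engine_path, board, margin_cp=15, multipv=5, movetime=1.0):
    """Among near-equal MultiPV candidates, break the tie with a secondary metric."""
    with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
        infos = engine.analyse(board, chess.engine.Limit(time=movetime), multipv=multipv)

    best_cp = infos[0]["score"].relative.score(mate_score=100000)
    candidates = [info["pv"][0] for info in infos
                  if best_cp - info["score"].relative.score(mate_score=100000) <= margin_cp]

    def harmony(move):
        child = board.copy()
        child.push(move)
        return -child.legal_moves.count()   # stand-in: fewer replies for the opponent
    return max(candidates, key=harmony)

# Example (hypothetical engine path):
# print(pick_harmonious("./Winter", chess.Board()))
```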
