ChessAlpha Zero development #10
Hi, guys. I refactored my APV_MCTS.py; it now uses a hash table the way you guys did. But I found something confusing: |
Hi @Zeta36
It's great!! Especially making it in just one day!
Oh, it is similar to my case.
I agree with your method of using another trusted judgement.
Unfortunately, I don't have free GPU time now... FYI: |
Hi @yhyu13 Thanks for the message!
Yes, that's right. It is unnecessary.
Oh, it is wonderful!
No, I haven't. |
Hello, @mokemokechicken and @yhyu13. Today I've finished a new version of the Reversi Zero project. This time I adapted it to the game Connect4: https://github.com/Zeta36/connect4-alpha-zero @mokemokechicken I'm really in love with your implementation. You did it (and DeepMind thought it) in a way that I can easily apply it to any new environment I imagine. Moreover, Connect4 is an easier game and I could train the model without a GPU. The results are amazing: the model is able to learn to play well in only 3 generations, in a couple of hours (with just an Intel i5 CPU). I insist you've done a GREAT work, friend. It's a pity I don't have a powerful enough machine to check whether the chess version is able to learn to play well. |
@Zeta36 good job! Just one thought: maybe Connect4 is easy enough that it doesn't even need an accurate NN? Would MCTS alone make the AI strong enough? Besides, Connect4 is a very unbalanced game. Whoever plays first should always win, which makes AI training much easier. So how about Connect5 with a specific opening pattern, such as "11D (瑞星): equal" in https://en.wikipedia.org/wiki/Renju_opening_pattern? That seems a more persuasive game for the AlphaGo Zero algorithm. |
Great! I was thinking about trying it with that game too!
That's interesting and a nice property. Just thinking: |
Hello, @mokemokechicken and @Yuhang.
As I promised, I've done (in just one day, I had no more time) an adaptation of the reversi-zero project @mokemokechicken made into a chess version: https://github.com/Zeta36/chess-alpha-zero
The project is already functional (in the sense that it doesn't fail and the three workers do their job), but unfortunately I have no GPU (just an Intel i5 CPU), nor money to spend on an AWS server or similar.
So I could only check the self-play with the toy config "--type mini". Moreover, I had to lower self.simulation_num_per_move to 2 and self.parallel_search_num to 2.
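To make the change concrete, here is a hypothetical sketch of such a pared-down config. It only mirrors the two parameter names mentioned above; the class name and any other fields are illustrative assumptions, not the project's actual config class.

```python
# Hypothetical mini self-play config. Only simulation_num_per_move and
# parallel_search_num are taken from the post; everything else is assumed.
class MiniPlayConfig:
    def __init__(self):
        self.simulation_num_per_move = 2  # MCTS simulations per move (toy value)
        self.parallel_search_num = 2      # parallel MCTS searches (toy value)


config = MiniPlayConfig()
print(config.simulation_num_per_move, config.parallel_search_num)  # 2 2
```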
In this way I was able to generate the 100 games needed for the optimization worker to start. The optimization process seemed to work perfectly, and the model was able to reach a loss of ~0.6 after 1000 steps. I guess the model was able to overfit the 100 games from the earlier self-play.
Then I executed the evaluation process, and it worked fine. The overfitted model was able to defeat the random original model from the beginning 100% of the time (coincidence??).
Finally, I checked the ASCII way to play against the best model. It worked as expected. To indicate our moves, we have to use UCI notation: a1a2, b3b8, etc. More info here: https://chessprogramming.wikispaces.com/Algebraic+Chess+Notation
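For anyone unfamiliar with the notation, a UCI move string is just the from-square, the to-square, and an optional promotion piece. The project itself relies on a chess library for move handling; this tiny parser is only an illustration of the format, with a hypothetical `parse_uci` helper:

```python
# Minimal, illustrative UCI-coordinate parser (not the project's code).
def parse_uci(move: str):
    """Split a UCI string like 'e2e4' into (from_square, to_square, promotion)."""
    files, ranks = "abcdefgh", "12345678"
    frm, to = move[:2], move[2:4]
    promo = move[4:] or None  # e.g. 'q' in 'a7a8q' (promotion to queen)
    assert frm[0] in files and to[0] in files, "bad file letter"
    assert frm[1] in ranks and to[1] in ranks, "bad rank digit"
    return frm, to, promo


print(parse_uci("e2e4"))   # ('e2', 'e4', None)
print(parse_uci("a7a8q"))  # ('a7', 'a8', 'q')
```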
By the way, the model output is now of size 8128 (instead of the 64 of Reversi and the 362 of Go), and the entries correspond to all possible legal UCI moves in a chess game. I generate these new labels in the config.py file.
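The general idea of such a flat move labeling can be sketched as follows. This is only an assumption about the approach (every from-to square pair plus straight-push pawn promotions); it deliberately does not reproduce the project's exact 8128-entry list from config.py, whose construction is not shown here.

```python
# Illustrative enumeration of a flat UCI move-label space.
# NOTE: this is an assumed sketch, not the chess-alpha-zero labeling;
# its count (4096) differs from the 8128 labels mentioned in the post.
def all_uci_labels():
    files, ranks = "abcdefgh", "12345678"
    squares = [f + r for f in files for r in ranks]
    # Every ordered from->to pair of distinct squares: 64 * 63 = 4032 labels.
    labels = [a + b for a in squares for b in squares if a != b]
    # Straight-push promotions, e.g. 'a7a8q': 8 files * 4 pieces * 2 colors = 64.
    for f in files:
        for piece in "qrbn":
            labels.append(f + "7" + f + "8" + piece)  # white promotion push
            labels.append(f + "2" + f + "1" + piece)  # black promotion push
    return labels


labels = all_uci_labels()
print(len(labels))  # 4096
```

The network's policy head then has one output per label, and illegal moves are masked out at search time.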
I should also note that the board state (and the player turn) is tracked using FEN chess notation: https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation
Here, for example, is the FEN for the starting position: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 (the "w" means white to move).
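A FEN string is six space-separated fields: piece placement (8 ranks, from rank 8 down to rank 1), side to move, castling rights, en-passant square, halfmove clock, and fullmove number. A minimal field-splitting sketch (illustrative only, not the project's parser):

```python
# Split the starting-position FEN into its six fields.
fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
placement, turn, castling, en_passant, halfmove, fullmove = fen.split()

print(turn)                       # 'w' -> white to move
print(len(placement.split("/")))  # 8 ranks, listed from rank 8 down to rank 1
print(castling)                   # 'KQkq' -> both sides may castle either way
```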
I also changed the resign function a little. Chess is not like Go or Reversi, where you always finish the game in more or less the same number of moves. In chess the game can end in many ways (checkmate, stalemate, etc.), and a self-play game could run for more than 200 moves before reaching an ending position (normally a draw). So I decided to cut off play once one player has more than 13 points of advantage (this score is computed as usual from the value of the pieces: the queen is worth 10, rooks 5.5, etc.).
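The material cut-off described above can be sketched like this. The queen = 10 and rook = 5.5 values and the 13-point threshold come from the post; the remaining piece values, the function names, and the use of the FEN piece-placement field as input are my own illustrative assumptions.

```python
# Sketch of a material-advantage cut-off (assumed values except q=10, r=5.5).
PIECE_VALUES = {"p": 1.0, "n": 3.0, "b": 3.0, "r": 5.5, "q": 10.0}


def material_score(placement: str) -> float:
    """Positive when white is ahead; input is a FEN piece-placement field."""
    score = 0.0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is not None:  # digits, '/', and kings are skipped
            score += value if ch.isupper() else -value
    return score


def should_cut_off(placement: str, threshold: float = 13.0) -> bool:
    """Stop self-play once one side's material lead exceeds the threshold."""
    return abs(material_score(placement)) > threshold


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
print(material_score(start))  # 0.0 -- the starting position is balanced
print(should_cut_off(start))  # False
```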
As you can imagine, with my poor machine I could not fully test the project beyond these tiny functionality checks. So I'd really appreciate it if you could spare some of your GPU time to test this implementation more seriously. You can both, of course, be collaborators on the project if you wish.
Also, I don't know whether I introduced some theoretical bugs in this adaptation to chess, and I'd appreciate any comments from you on that front too.
Best regards!!