Libtorch C++ AlphaZero #319
Conversation
Hi @christianjans, this looks really great. Just a heads-up that we still plan to look into this... we just got busy with vacations and an influx of small PRs. Planning to get to this in the coming weeks.
Hi @lanctot, no worries! There's no rush on my end.
@christianjans regarding your note about the policy loss: seems rather important to be sure to get this right. Can you point us specifically to the place you're not sure about? Tagging a few PyTorch users I know: @michalsustr @ssokota
Tagging @elkhrt because he's also taking a look with me.
From a brief look, it seems okay to me re: consistency. One thing I did notice is that it looks like the code computes the cross entropy from the policy rather than the logits, which is numerically unstable. PyTorch's cross entropy function doesn't accept soft labels, but it has a stable log softmax function that you could use, i.e., do log_softmax(logits) rather than log(softmax(logits)). Alternatively, you could use KL divergence, which has the same gradient as cross entropy.
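For concreteness, here is a minimal sketch of what that stable soft-label cross entropy could look like in LibTorch. This is not the PR's actual code; the function name, variable names, and tensor shapes are illustrative assumptions.

```cpp
#include <torch/torch.h>

// Hypothetical policy loss: cross entropy with soft labels, computed
// from the raw logits via the numerically stable log_softmax.
// `logits` and `policy_targets` are assumed to be [batch, num_actions].
torch::Tensor PolicyLoss(const torch::Tensor& logits,
                         const torch::Tensor& policy_targets) {
  // log_softmax(logits) avoids the underflow that log(softmax(logits))
  // can hit when a softmax probability rounds to zero.
  torch::Tensor log_probs = torch::log_softmax(logits, /*dim=*/1);
  // Soft-label cross entropy: -sum(targets * log_probs) per example,
  // averaged over the batch.
  return -(policy_targets * log_probs).sum(/*dim=*/1).mean();
}
```

Since the targets here are full MCTS visit-count distributions rather than one-hot labels, the KL-divergence alternative mentioned above would yield the same gradient, differing only by the (constant) entropy of the targets.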
Hi @christianjans, can you see the comments about the policy loss? Thanks.
Sorry for the late reply, but thank you for the advice! I have been implementing the log softmax solution and am working on testing it (I have encountered a build error with Clang, "unsupported option '-fopenmp'", which I will try to investigate soon). Unfortunately, I finish up midterms this upcoming week and am working on completing a research project in the next few weeks. I will try to dedicate as much time as I can to this in the coming weeks, although it may be sparse. However, if it would be helpful, I can commit the untested implementation for review/revision. With regards to the softmax_cross_entropy_with_logits_v2 function confusion, I have confirmed its implementation with a quick example, and this Torch version should follow the same implementation.
@christianjans there is no rush; let's wait until you have time to properly test it, at least such that it passes the same tests remarked in the original PR (i.e., it passes vpnet_test and solves Tic-Tac-Toe).
@christianjans, just checking in to see if you still plan to do the final bits necessary to pull this in? My understanding is that people are already using it, so it'd be great to merge it.
Hi @lanctot, sorry, yes, I was actually planning to resume work on it this weekend. I apologize for the lack of updates, but I should have more time to dedicate to this now. Glad to hear people are using it!
Okay great, it looks like it's still passing the VPNet test:

```
(venv) open_spiel $ ./build/algorithms/alpha_zero_torch/torch_vpnet_test
TestModelCreation: resnet
TestModelLearnsSimple: resnet
states: 7
0: Losses(total: 3.334, policy: 2.062, value: 1.249, l2: 0.023)
1: Losses(total: 2.062, policy: 1.556, value: 0.482, l2: 0.023)
...
33: Losses(total: 0.102, policy: 0.054, value: 0.000, l2: 0.048)
34: Losses(total: 0.096, policy: 0.048, value: 0.000, l2: 0.048)
TestModelLearnsOptimal: resnet
states: 4520
0: Losses(total: 1.983, policy: 1.226, value: 0.735, l2: 0.023)
1: Losses(total: 1.771, policy: 1.076, value: 0.672, l2: 0.023)
...
65: Losses(total: 0.203, policy: 0.122, value: 0.024, l2: 0.057)
66: Losses(total: 0.171, policy: 0.093, value: 0.021, l2: 0.057)
```

and it is still learning in games like Tic-Tac-Toe:

```
./build/examples/alpha_zero_torch_example --nn_width=64 --nn_depth=2 --game=tic_tac_toe --replay_buffer_size=16384 --replay_buffer_reuse=4 --checkpoint_freq=25 --max_simulations=50 --actors=2 --evaluators=1 --max_steps=50
```

The error I was getting before regarding the '-fopenmp' flag and Clang had something to do with the Libtorch library that was downloaded. I believe this is because in...

Thanks again for your patience with this PR, let me know what else can be changed/added!
Thanks @christianjans, this is looking great!
Can you add another URL in the list of alternative URLs below in install.sh, something like "For C++ PyTorch AlphaZero on MacOS we recommend this URL: https://download.pytorch.org/libtorch/cpu/libtorch-macos-1.8.0.zip"?
Good idea! It has been added.
@christianjans Hello Christian, ...
Hi @selfsim! Sorry I didn't see your comment earlier. But yes, absolutely! There is a script (https://github.com/deepmind/open_spiel/blob/master/open_spiel/python/algorithms/alpha_zero/analysis.py) that you can run that will do all this analysis for you. Essentially, when you run the Python AlphaZero or the Libtorch AlphaZero, an experiment directory will be created that contains the neural network checkpoints, logs, configs, and analysis data. You can specify this directory with the --path flag.

Once you finish training, you can view the results by running the analysis script:

```
$ python3 open_spiel/python/algorithms/alpha_zero/analysis.py --path=<path-to-the-az-directory>
```

Let me know if anything was unclear or if you run into any problems. More information about this can be found here: https://github.com/deepmind/open_spiel/blob/master/docs/alpha_zero.md#analysis
@christianjans Thanks for the reply; if everything goes well, I will be looking at some cool graphs tonight. I am currently training an agent on a game implementation I created. I have a few questions; any help is greatly appreciated. How does one load checkpoints to continue training at a later point? How could I set up a 1v1 game between a player (human or otherwise) and the trained agent? Thank you, and great work!
Hi @selfsim, thanks, and great questions! There is, unfortunately, no way to continue from checkpoints with the C++ AlphaZero version without some sort of weird behaviour. The checkpoints do save the model and optimizer, which could be used to start from again; however, there is currently no functionality (such as a command-line flag) to provide a checkpoint to resume training from. Additionally, the data that is recorded for analysis will be overwritten (or at least corrupted) when training resumes from a checkpoint. Resuming training from a checkpoint is definitely a good feature to add, though, and I think I should have some time to look into it. I had tried doing this for the Python TensorFlow AlphaZero available in OpenSpiel: I was able to add the functionality to resume training from a checkpoint, and the analysis data continued to record from that checkpoint. However, after continuing to train the model, there were two issues I could identify.
The first issue, I think, has to do with the Python TensorFlow AlphaZero perhaps not saving the state of the optimizer at each checkpoint. This is fixed in the C++ Libtorch AlphaZero, as both the model and the state of the optimizer are saved for each checkpoint. However, I was unable to identify a potential cause for the second issue (perhaps the two issues are related?). Anyway, here is some of my work on restarting from a checkpoint in the Python version: https://github.com/christianjans/open_spiel_resume_files. And, like resuming training, there isn't currently a way to set up a 1v1 game with other players, but this is definitely another good thing to add.
Hi @christianjans, thanks for the insights! I will look into setting up a 1v1 game and extracting a bot from the checkpoints.
@christianjans Could you also maybe comment on the output of the analysis script? In particular, what do the multiple lines in the MCTS solver graphs mean? Is it the number of simulations? Also, for the outcome graph, I haven't been able to figure out exactly who player 1 and player 2 are. Thank you!
@michalsustr I noticed in a different PR you mentioned that you have been working only with the LibTorch AZ implementation as opposed to TF. Have you played against a bot that you have trained?
Hi @selfsim,
No worries, I have been looking into the C++ side of things, and I have a basic implementation in the works where you can play your Libtorch AlphaZero against a random player or MCTS player. See this branch on my personal OpenSpiel repository. After installing and building (with Libtorch on, of course), you can run it using the command:

```
$ ./build/examples/alpha_zero_torch_game_example --game=<game_to_play> --player1=random --player2=az_torch --az_path=<your_libtorch_az_experiment_directory>
```

I will try to continue updating this branch, but will also look into restarting training from checkpoints.
Yes, definitely. The MCTS solver graph is certainly confusing, but I can try to explain it here. During training, your AlphaZero player is constantly being evaluated by playing against MCTS players of varying "levels". The level of an opponent MCTS player is determined by the number of simulations it is allowed to run for each move it makes: a higher-level MCTS opponent runs more simulations than a lower-level one. There are 7 such levels in the example graph, and the simulation budget of the MCTS opponent grows with its level. So what we should expect to see is that we are able to beat lower-level MCTS opponents faster (earlier on in training) than higher-level MCTS opponents, and that is what the graph example shows.
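Purely as an illustration of that idea (the geometric growth below is an assumption for the sketch, not the exact schedule the evaluators use), the mapping from level to simulation budget could look like this:

```cpp
#include <cmath>
#include <cstdio>

int main() {
  // Assumed values for the sketch: the base budget mirrors the
  // --max_simulations=50 flag from the training command above.
  const int base_simulations = 50;
  const int num_levels = 7;
  for (int level = 0; level < num_levels; ++level) {
    // Each level multiplies the budget, so a level-6 opponent searches
    // far deeper per move than a level-0 opponent.
    const int sims = static_cast<int>(
        base_simulations * std::pow(10.0, level / 2.0));
    std::printf("MCTS opponent level %d: %d simulations per move\n",
                level, sims);
  }
  return 0;
}
```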
The outcome graph shows the results of your AlphaZero player playing against itself in self-play, so player 1 and player 2 are both your AlphaZero algorithm, just from different perspectives. Feel free to let me know if anything is still confusing or if there are any other questions.
@christianjans Thanks a bunch for the clarification and for your patience with this barrage of questions. I looked at your branch and it looks like a great start. Are you working on a human-vs-bot option currently? If not, I could see what I can do; I'd like to test out some of my training runs. Really appreciate the work!
No worries, @selfsim, happy to help. And yeah, that would be a great addition! I was not planning to add a human bot, but I think I could dedicate some time to it too. If you get around to it first, feel free to submit a pull request!
There is now an initial implementation of a human bot on the branch.
@christianjans, this is awesome! Can you submit a PR so we can add it to the master branch? Seems like a wonderful thing to have for people using your code. BTW, regarding the previous discussion about checkpoints: I think for AlphaZero to properly restore its full state, you would need to store/retain all of the data in the current replay buffer in addition to everything else.
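To make that concrete, here is a rough sketch of the state a full-restore checkpoint would have to capture. Every name below is hypothetical; the only point taken from the comment above is that the replay buffer contents belong in it alongside the model and optimizer.

```cpp
#include <string>
#include <vector>

// Placeholder for whatever sample type the replay buffer stores
// (observations, policy targets, and game outcomes).
struct TrainInput {};

// Hypothetical full checkpoint: restoring only the first two fields
// (what the current checkpoints hold) loses the in-flight training data.
struct FullCheckpoint {
  std::string model_path;       // network weights (already saved today)
  std::string optimizer_path;   // optimizer state (already saved today)
  std::vector<TrainInput> replay_buffer;  // the missing piece
  int learner_step;             // so logs and analysis can continue
};
```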
And of course I would absolutely love to see any checkpointing code merged into master too if you manage to get it to work! :) @selfsim, if you see any opportunity to improve the docs regarding the formats of the visualization tools, please flag it and/or submit PRs. This code has really been a wonderful addition to OpenSpiel and it'd be great if we can maintain it well!
@lanctot, yes for sure! There's still some cleaning up to do, but I should be able to submit a pull request soon. Were you thinking just the C++ human bot implementation? Or also the ability to play a trained Libtorch AlphaZero in a game?
And oh right, good call 👍🏻. Will keep this in mind as well.
Oh, I didn't realize you had both; I was only referring to the ability to play a trained Libtorch agent in a game, but both would be great. No rush, of course!
A Libtorch version of C++ AlphaZero.

Notes:

- There is an explicit learning option (the --explicit_learning flag) that can be used when multiple devices are available. It ensures that the GPU holding the model responsible for learning does not also take inference requests when it is supposed to be learning from the replay buffer. This seems to speed up the learning process when there are quite a few actors and evaluators.

Results:

Design decisions:

- The policy loss is not computed with TensorFlow's cross entropy function (softmax_cross_entropy_with_logits_v2), but the policy loss version implemented here should be similar and produces loss around the same order of magnitude.

Let me know if you have thoughts or suggestions on this!