How do I run SAD inside this repo? #10
Comments
Do you mind sharing how you call evaluate and what the error message is? |
Eh, the SAD agents were trained in the old repo and the compatibility is not great. Maybe you can get some hints from this file: https://github.com/facebookresearch/off-belief-learning/blob/main/pyhanabi/legacy_agent.py The easiest way would be to train a new SAD agent, or any agent for that matter, using this new repo. The new implementation is more efficient and the performance of all agents should go up. |
Oh wait, I am confused: are you saying this script, https://github.com/facebookresearch/off-belief-learning/blob/main/models/download.sh, downloads SAD agents? It should only download OBL agents. |
Oops, my memory failed me, please ignore what I said before. I actually downloaded the SAD model from the SAD repo and copied it across to this repo. I also copied across some extra files (eval_model.py, eval.py, r2d2.py, and some other required code) and attempted to run it. I'm getting the following error at this line:
Thanks for this tip, I'm training a SAD agent in the OBL directory right now. This is the training script I wrote:

python selfplay.py \
--save_dir exps/sad_1 \
--num_thread 24 \
--num_game_per_thread 80 \
--method iql \
--sad 1 \
--lr 6.25e-05 \
--eps 1.5e-05 \
--grad_clip 5 \
--gamma 0.999 \
--seed 1 \
--burn_in_frames 10000 \
--replay_buffer_size 100000 \
--batchsize 128 \
--epoch_len 1000 \
--num_epoch 2000 \
--num_player 2 \
--net lstm \
--num_lstm_layer 1 \
--multi_step 3 \
--train_device cuda:0 \
--act_device cuda:1,cuda:2

I wrote this script by combining iql.sh from this repo and dev.sh from the SAD repo. Does this look correct to you? There were a few parameters I wasn't completely sure about: |
Thanks for the advice! I'm successfully training SAD agents in the repo now. As a follow-up, I assume this repo is also capable of training an OP agent? When I compare dev.sh and op_2player.sh from the SAD repo, the only difference seems to be the |
Yes, the flag to turn on OP is shuffle_color, as it forces the learned policy to be color-invariant. VDN may always be (marginally) better than IQL.
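For concreteness, a minimal sketch of what an OP run built from the SAD command above might look like. The exact --shuffle_color spelling and its 0/1 value are assumptions here (worth confirming with python selfplay.py --help), and switching --method to vdn only reflects the VDN-vs-IQL remark rather than being required for OP; every other flag is copied unchanged from the SAD script:

# Sketch only: the OP toggle is assumed to be spelled --shuffle_color and to
# take a 0/1 value like the other switches; all remaining flags come from the
# SAD training command shown earlier in this thread.
python selfplay.py \
--save_dir exps/op_1 \
--num_thread 24 \
--num_game_per_thread 80 \
--method vdn \
--sad 1 \
--shuffle_color 1 \
--lr 6.25e-05 \
--eps 1.5e-05 \
--grad_clip 5 \
--gamma 0.999 \
--seed 1 \
--burn_in_frames 10000 \
--replay_buffer_size 100000 \
--batchsize 128 \
--epoch_len 1000 \
--num_epoch 2000 \
--num_player 2 \
--net lstm \
--num_lstm_layer 1 \
--multi_step 3 \
--train_device cuda:0 \
--act_device cuda:1,cuda:2

This keeps --sad 1, since the comparison of dev.sh and op_2player.sh above suggests OP is layered on top of the same SAD settings. |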
Hey Hengyuan, I've got OP working now, but I've just realised that my SAD agents aren't training as expected. I've trained 5 SAD agents using the script described above, with seeds 1, 2, 3, 4, 5. I've run them all together in cross-play, and I'm getting the following scores:

         SAD 1   SAD 2   SAD 3   SAD 4   SAD 5
SAD 1     23.8    20.8    20.3    22.0    17.9
SAD 2     20.8    23.8    20.9    19.6    14.6
SAD 3     20.3    20.9    23.8    21.8    19.5
SAD 4     22.0    19.6    21.8    23.8    20.2
SAD 5     17.9    14.6    19.5    20.2    23.8

My aim was to train a set of SAD agents that don't perform well together in cross-play, but it seems like these agents are performing well. My first thought is that maybe I'm accidentally training with AUX, but I'm not sure. Would you be able to provide me with some insight? What might be causing these SAD agents to all be performing well together? |
A couple of things that we have previously observed.

Overall this type of “diversity” cannot be reliably reproduced. It depends on seeds, implementation details, or even the running speed of the algorithm (an implementation that generates more data faster is more likely to produce more consistent policies, for example). I don’t think research should depend on this type of diversity.

Some more specific comments:
1. Your self-play results seem to be low. 23.8 is not high in the current Hanabi world. With this implementation, we can easily produce self-play agents that get 24.4. Q-learning agents with higher self-play scores (this does not necessarily apply to policy-gradient methods) have sharper policies, and they will be more likely to fail.
2. You can vary the network architecture to add more bias towards different policies.
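As a concrete, untested illustration of point 2, the sketch below varies only flags that already appear in the training script above: each agent gets a different seed and alternates between a 1-layer and a 2-layer LSTM, on the assumption that --num_lstm_layer accepts values other than 1:

# Sketch: vary both the seed and the LSTM depth per agent, so the policies
# are biased apart by architecture as well as by initialisation.
common_flags=(
  --num_thread 24 --num_game_per_thread 80 --method iql --sad 1
  --lr 6.25e-05 --eps 1.5e-05 --grad_clip 5 --gamma 0.999
  --burn_in_frames 10000 --replay_buffer_size 100000 --batchsize 128
  --epoch_len 1000 --num_epoch 2000 --num_player 2 --net lstm --multi_step 3
  --train_device cuda:0 --act_device cuda:1,cuda:2
)
for seed in 1 2 3 4 5; do
  layers=$(( (seed % 2) + 1 ))   # alternate between 1- and 2-layer LSTMs
  python selfplay.py "${common_flags[@]}" \
    --save_dir "exps/sad_seed${seed}_lstm${layers}" \
    --seed "${seed}" \
    --num_lstm_layer "${layers}"
done

Changing --net or the network width would push the policies further apart, but any flags beyond the ones shown above would need to be checked against selfplay.py first.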
|
Thanks for your advice @hengyuan-hu! I've figured out how to train the SAD agents now :) |
I have trained my own agent using this repo, and would like to evaluate it when it plays a game with the SAD agent.
The download.sh script downloads a pre-trained SAD model, and I would like to run this model using evaluate(); however, I'm getting a number of errors. How do I run the SAD model in this repo?
Thanks in advance!