How do I run SAD inside this repo? #10
Comments
Do you mind sharing how you call evaluate and what the error message is? |
Eh, the SAD agents were trained in the old repo and the compatibility is not great. Maybe you can get some hints from this file: https://github.com/facebookresearch/off-belief-learning/blob/main/pyhanabi/legacy_agent.py The easiest way would be to train a new SAD agent, or any agent for that matter, using this new repo. The new implementation is more efficient and the performance of all agents should go up. |
Oh wait, I am confused: are you saying this script, https://github.com/facebookresearch/off-belief-learning/blob/main/models/download.sh, downloads SAD agents? It should only download OBL agents. |
Oops, my memory failed me, please ignore what I said before. I actually downloaded the SAD model from the SAD repo and copied it across to this repo. I also copied across some extra files (eval_model.py, eval.py, r2d2.py, and some other required code) and attempted to run it. I'm getting the following error at this line:
Thanks for this tip, I'm training a SAD agent in the OBL directory right now. This is the training script I wrote:

python selfplay.py \
--save_dir exps/sad_1 \
--num_thread 24 \
--num_game_per_thread 80 \
--method iql \
--sad 1 \
--lr 6.25e-05 \
--eps 1.5e-05 \
--grad_clip 5 \
--gamma 0.999 \
--seed 1 \
--burn_in_frames 10000 \
--replay_buffer_size 100000 \
--batchsize 128 \
--epoch_len 1000 \
--num_epoch 2000 \
--num_player 2 \
--net lstm \
--num_lstm_layer 1 \
--multi_step 3 \
--train_device cuda:0 \
--act_device cuda:1,cuda:2

I wrote this script by combining iql.sh from this repo and dev.sh from the SAD repo. Does this look correct to you? There were a few parameters I wasn't completely sure about: |
Thanks for the advice! I'm successfully training SAD agents in the repo now. As a follow-up, I assume this repo is also capable of training an OP agent? When I compare dev.sh and op_2player.sh from the SAD repo, the only difference seems to be the |
Yes, the flag to turn on OP is shuffle_color, as it forces the learned policy to be color-invariant. VDN may always be (marginally) better than IQL.
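For concreteness, a minimal sketch of what an OP run built from the SAD command above might look like. The exact --shuffle_color spelling and its 0/1 value are assumptions here (worth confirming with python selfplay.py --help), and switching --method to vdn only reflects the VDN-vs-IQL remark rather than being required for OP; every other flag is copied unchanged from the SAD script:

# Sketch only: the OP toggle is assumed to be spelled --shuffle_color and to
# take a 0/1 value like the other switches; all remaining flags come from the
# SAD training command shown earlier in this thread.
python selfplay.py \
--save_dir exps/op_1 \
--num_thread 24 \
--num_game_per_thread 80 \
--method vdn \
--sad 1 \
--shuffle_color 1 \
--lr 6.25e-05 \
--eps 1.5e-05 \
--grad_clip 5 \
--gamma 0.999 \
--seed 1 \
--burn_in_frames 10000 \
--replay_buffer_size 100000 \
--batchsize 128 \
--epoch_len 1000 \
--num_epoch 2000 \
--num_player 2 \
--net lstm \
--num_lstm_layer 1 \
--multi_step 3 \
--train_device cuda:0 \
--act_device cuda:1,cuda:2

This keeps --sad 1, since the comparison of dev.sh and op_2player.sh above suggests OP is layered on top of the same SAD settings. |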
Hey Hengyuan, I've got OP working now, but I've just realised that my SAD agents aren't training as expected. I've trained 5 SAD agents using the script described above, with seeds 1, 2, 3, 4, 5. I've run them all together in cross-play, and I'm getting the following scores:

         SAD 1   SAD 2   SAD 3   SAD 4   SAD 5
SAD 1     23.8    20.8    20.3    22.0    17.9
SAD 2     20.8    23.8    20.9    19.6    14.6
SAD 3     20.3    20.9    23.8    21.8    19.5
SAD 4     22.0    19.6    21.8    23.8    20.2
SAD 5     17.9    14.6    19.5    20.2    23.8

My aim was to train a set of SAD agents that don't perform well together in cross-play, but it seems like these agents are performing well. My first thought is that maybe I'm accidentally training with AUX, but I'm not sure. Would you be able to provide me with some insight? What might be causing these SAD agents to all be performing well together? |
A couple of things that we have previously observed.

Overall this type of “diversity” cannot be reliably reproduced. It depends on seeds, implementation details, or even the running speed of the algorithm (an implementation that generates more data faster is more likely to produce more consistent policies, for example). I don’t think research should depend on this type of diversity.

Some more specific comments:
1. Your self-play results seem to be low. 23.8 is not high in the current Hanabi world. With this implementation, we can easily produce self-play agents that get 24.4. Q-learning agents with higher self-play scores (this does not necessarily apply to policy-gradient methods) have sharper policies, and they will be more likely to fail.
2. You can vary the network architecture to add more bias towards different policies.
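As a concrete, untested illustration of point 2, the sketch below varies only flags that already appear in the training script above: each agent gets a different seed and alternates between a 1-layer and a 2-layer LSTM, on the assumption that --num_lstm_layer accepts values other than 1:

# Sketch: vary both the seed and the LSTM depth per agent, so the policies
# are biased apart by architecture as well as by initialisation.
common_flags=(
  --num_thread 24 --num_game_per_thread 80 --method iql --sad 1
  --lr 6.25e-05 --eps 1.5e-05 --grad_clip 5 --gamma 0.999
  --burn_in_frames 10000 --replay_buffer_size 100000 --batchsize 128
  --epoch_len 1000 --num_epoch 2000 --num_player 2 --net lstm --multi_step 3
  --train_device cuda:0 --act_device cuda:1,cuda:2
)
for seed in 1 2 3 4 5; do
  layers=$(( (seed % 2) + 1 ))   # alternate between 1- and 2-layer LSTMs
  python selfplay.py "${common_flags[@]}" \
    --save_dir "exps/sad_seed${seed}_lstm${layers}" \
    --seed "${seed}" \
    --num_lstm_layer "${layers}"
done

Changing --net or the network width would push the policies further apart, but any flags beyond the ones shown above would need to be checked against selfplay.py first.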
|
Thanks for your advice @hengyuan-hu! I've figured out how to train the SAD agents now :) |
I have trained my own agent using this repo, and would like to evaluate it when it plays a game with the SAD agent.
The download.sh script downloads a pre-trained SAD model, and I would like to run this model using evaluate(); however, I'm getting a number of errors. How do I run the SAD model in this repo?
Thanks in advance!