
How do I run SAD inside this repo? #10

Closed
ravihammond opened this issue Jul 13, 2022 · 10 comments

@ravihammond

I have trained my own agent using this repo, and would like to evaluate it when it plays a game with the SAD agent.
The download.sh script downloads a pre-trained SAD model, and I would like to run this model using evaluate(); however, I'm getting a number of errors. How do I run the SAD model in this repo?

Thanks in advance!

@hengyuan-hu
Contributor

Do you mind sharing how you call evaluate and what the error message is?

@hengyuan-hu
Contributor

Eh, the SAD agents were trained in the old repo and the compatibility is not great. Maybe you can get some hints from this file https://github.com/facebookresearch/off-belief-learning/blob/main/pyhanabi/legacy_agent.py

The easiest way would be to train a new SAD agent, or any agent for that matter, using this new repo. The new implementation is more efficient and the performance of all agents should go up.

@hengyuan-hu
Contributor

Oh wait, I am confused. Are you saying this script downloads SAD agents: https://github.com/facebookresearch/off-belief-learning/blob/main/models/download.sh? It should only download OBL agents.

@ravihammond
Author

ravihammond commented Jul 14, 2022

> Do you mind sharing how you call evaluate and what the error message is?

Oops, my memory failed me; please ignore what I said before. I actually downloaded the SAD model from the SAD repo and copied it across to this repo. I also copied across some extra files (eval_model.py, eval.py, r2d2.py, and some other required code) and attempted to run it. I'm getting the following error at this line:

terminate called after throwing an instance of 'torch::jit::JITException'
terminate called recursively
  what():  The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "/app/pyhanabi/r2d2_sad.py", line 69, in act
        self, priv_s: torch.Tensor, hid: Dict[str, torch.Tensor]
    ) -> Tuple[torch.Tensor, Dict[str, torch.Tensor]]:
        assert priv_s.dim() == 2, "dim should be 2, [batch, dim], get %d" % priv_s.dim()
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    
        priv_s = priv_s.unsqueeze(0)
RuntimeError: AssertionError: dim should be 2, [batch, dim], get 1

Aborted (core dumped)
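
For context on what the assertion is complaining about: act() expects a 2-D [batch, dim] tensor but is receiving a single 1-D observation. A minimal PyTorch sketch of the same shape check, for illustration only (512 is an arbitrary placeholder feature size, not the real SAD input dimension, and this is not a fix for the repo's batching code):

import torch

# a single observation vector, shape [dim] -> dim() == 1, which trips the assert
priv_s = torch.rand(512)
print(priv_s.dim())         # 1

# what act() expects: a leading batch axis, shape [batch, dim] -> dim() == 2
batched = priv_s.unsqueeze(0)
print(batched.shape)        # torch.Size([1, 512])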

> Eh, the SAD agents were trained in the old repo and the compatibility is not great. Maybe you can get some hints from this file https://github.com/facebookresearch/off-belief-learning/blob/main/pyhanabi/legacy_agent.py

> The easiest way would be to train a new SAD agent, or any agent for that matter, using this new repo. The new implementation is more efficient and the performance of all agents should go up.

Thanks for this tip! I'm training a SAD agent in the OBL directory right now. This is the training script I wrote:

python selfplay.py \
       --save_dir exps/sad_1 \
       --num_thread 24 \
       --num_game_per_thread 80 \
       --method iql \
       --sad 1 \
       --lr 6.25e-05 \
       --eps 1.5e-05 \
       --grad_clip 5 \
       --gamma 0.999 \
       --seed 1 \
       --burn_in_frames 10000 \
       --replay_buffer_size 100000 \
       --batchsize 128 \
       --epoch_len 1000 \
       --num_epoch 2000 \
       --num_player 2 \
       --net lstm \
       --num_lstm_layer 1 \
       --multi_step 3 \
       --train_device cuda:0 \
       --act_device cuda:1,cuda:2 

I wrote this script by combining iql.sh from this repo and dev.sh from the SAD repo. Does this look correct to you? There were a few parameters I wasn't completely sure about: --net, --num_lstm_layer, and --multi_step. Could you let me know if there is anything I'm missing to train SAD properly?

@hengyuan-hu
Contributor

--net -> depends on what you want to use the policy for. If you want to do learned belief search or something similar, you need publ-lstm. Otherwise it does not matter. However, the network does have an impact on the learned policy when using DQN-family methods.

--num_lstm_layer, we used 2, which is the default value if you do not specify it.

--multi_step, does not matter; 3, 2, and 1 all work fine.

--method, if you want "slightly" better training, you should use vdn instead of iql. But honestly the difference is not that big in this repo, as long as the program runs fast and generates abundant data.
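
Putting this advice back into the command posted above, a possible revision might look like this (an untested sketch: --method and --num_lstm_layer follow the notes above, --net is left as lstm since learned belief search was not mentioned, and every other flag is copied unchanged):

python selfplay.py \
       --save_dir exps/sad_1 \
       --num_thread 24 \
       --num_game_per_thread 80 \
       --method vdn \
       --sad 1 \
       --lr 6.25e-05 \
       --eps 1.5e-05 \
       --grad_clip 5 \
       --gamma 0.999 \
       --seed 1 \
       --burn_in_frames 10000 \
       --replay_buffer_size 100000 \
       --batchsize 128 \
       --epoch_len 1000 \
       --num_epoch 2000 \
       --num_player 2 \
       --net lstm \
       --num_lstm_layer 2 \
       --multi_step 3 \
       --train_device cuda:0 \
       --act_device cuda:1,cuda:2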

@ravihammond
Author

ravihammond commented Jul 18, 2022

Thanks for the advice! I'm successfully training SAD agents in the repo now.

As a follow-up, I assume this repo is also capable of training an OP agent? When I compare dev.sh and op_2player.sh from the SAD repo, the only difference seems to be the --shuffle_color flag set to 1, so it looks like setting this flag is the only requirement to train OP. Also, for OP, is it still fine to use either 'iql' or 'vdn' for the --method flag?

@hengyuan-hu
Contributor

Yes, the flag to turn on OP is shuffle_color, as it forces the learned policy to be color-invariant. VDN may always be (marginally) better than IQL.
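
As a concrete sketch (untested, and assuming selfplay.py in this repo accepts the same --shuffle_color flag; exps/op_1 is just an illustrative save directory), an OP run would be the SAD training command from above with --shuffle_color 1 added:

python selfplay.py \
       --save_dir exps/op_1 \
       --num_thread 24 \
       --num_game_per_thread 80 \
       --method vdn \
       --sad 1 \
       --shuffle_color 1 \
       --lr 6.25e-05 \
       --eps 1.5e-05 \
       --grad_clip 5 \
       --gamma 0.999 \
       --seed 1 \
       --burn_in_frames 10000 \
       --replay_buffer_size 100000 \
       --batchsize 128 \
       --epoch_len 1000 \
       --num_epoch 2000 \
       --num_player 2 \
       --net lstm \
       --num_lstm_layer 2 \
       --multi_step 3 \
       --train_device cuda:0 \
       --act_device cuda:1,cuda:2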

@ravihammond
Author

Hey Hengyuan, I've got OP working now, but I've just realised that my SAD agents aren't training as expected.

I've trained 5 SAD agents using the script described above, with seeds 1, 2, 3, 4, 5.

I've run them all together in cross-play, and I'm getting the following scores:

|       | SAD 1 | SAD 2 | SAD 3 | SAD 4 | SAD 5 |
|-------|-------|-------|-------|-------|-------|
| SAD 1 | 23.8  | 20.8  | 20.3  | 22.0  | 17.9  |
| SAD 2 | 20.8  | 23.8  | 20.9  | 19.6  | 14.6  |
| SAD 3 | 20.3  | 20.9  | 23.8  | 21.8  | 19.5  |
| SAD 4 | 22.0  | 19.6  | 21.8  | 23.8  | 20.2  |
| SAD 5 | 17.9  | 14.6  | 19.5  | 20.2  | 23.8  |

My aim was to train a set of SAD agents that don't perform well together in cross-play, but it seems like these agents are performing well. My first thought is that maybe I'm accidentally training with AUX, but I'm not sure.
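
For reference, here is a small numpy sketch (values copied from the table above) that separates the self-play diagonal from the cross-play off-diagonal, just to quantify the gap:

import numpy as np

# cross-play matrix from the table above (rows/columns are SAD 1..5)
scores = np.array([
    [23.8, 20.8, 20.3, 22.0, 17.9],
    [20.8, 23.8, 20.9, 19.6, 14.6],
    [20.3, 20.9, 23.8, 21.8, 19.5],
    [22.0, 19.6, 21.8, 23.8, 20.2],
    [17.9, 14.6, 19.5, 20.2, 23.8],
])

self_play = np.diag(scores)                  # the diagonal, all 23.8
cross_play = scores[~np.eye(5, dtype=bool)]  # the 20 off-diagonal entries

print("self-play mean :", self_play.mean())   # 23.8
print("cross-play mean:", cross_play.mean())  # ~19.8
print("cross-play min :", cross_play.min())   # 14.6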

Would you be able to provide me with some insight? What might be causing these SAD agents to all be performing well together?

@hengyuan-hu
Contributor

hengyuan-hu commented Jul 20, 2022 via email

@ravihammond
Author

Thanks for your advice @hengyuan-hu!

I've figured out how to train the SAD agents now :)
