
Playing with other agents and questions regarding the code/experiments #7

51616 opened this issue Mar 19, 2020 · 2 comments

51616 commented Mar 19, 2020

I really appreciate your work in this area, which I am also interested in.

My question is about modifying the code to play with other agents (having two different agents play the same game). As far as I know, this code uses self-play with a single shared agent.
I want to do some experiment similar to other-play where I can choose the partners in the environment.

I've been playing around with the code but can't seem to figure out where to make this change. Any suggestions on where I should start?

Edit0: Also, I am curious how you did hyperparameter tuning, since it is quite expensive to run the 5k-epoch training to evaluate each hyperparameter setting. What heuristics did you use for this?

Edit1: My guess is to change the ThreadLoop part, which is in thread_loop.h, to handle multiple actors? Is this correct? Is there a better way to approach this?

Edit2: I can't find the auxiliary loss in the code. Is it provided in this version? Also, what is the difference between pyhanabi/tools/dev.sh and pyhanabi/tools/sad_2player.sh? Do they run the same experiment?

Edit3: If two experiments were run on different machines (with different hardware speeds), will this affect the results? From what I see in the code, training is asynchronous while the actors do self-play; if training takes longer, the actors will keep adding observations to the replay_buffer using a stale agent.

@51616 51616 changed the title Playing with other agents Playing with other agents and questions regarding the code/experiments Mar 21, 2020
@hengyuan-hu
Contributor

You can modify this line to take 2 different weight files:

```python
weight_files = [args.weight for _ in range(args.num_player)]
```

You might need to hack some settings for loading models trained with/without greedy action input but that should be relatively easy. Our internal repo for the other-play paper has a lot of modifications & tools for running matches between various agents but that is not ready for release yet. Sorry about that.
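
For concreteness, a minimal sketch of that change (the `args.weight2` flag is my own assumption, not an existing option in this repo):

```python
# Hypothetical second flag: load a different checkpoint for the second
# seat instead of repeating args.weight for every player.
weight_files = [args.weight, args.weight2]
assert len(weight_files) == args.num_player, "one checkpoint per player"
```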

Q0: We started with the hyper-parameters provided in the R2D2 paper and those worked quite well. The most important hyper-parameter for us was the data generation/consumption ratio, i.e. act_speed/train_speed. Other parameters mattered less.
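
To make that ratio concrete, a back-of-the-envelope helper (my own sketch, not code from this repo):

```python
def gen_consume_ratio(transitions_added, train_steps, batchsize):
    """Transitions generated by the actors per transition consumed by the
    learner; a value well below 1 means each sample is replayed many times."""
    consumed = train_steps * batchsize
    return transitions_added / max(consumed, 1)

# e.g. actors produced 1M transitions while the learner ran 2000 batches
# of 512 samples: ratio ~0.98, i.e. generation and consumption are balanced.
print(gen_consume_ratio(1_000_000, 2000, 512))
```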

Q1: Answered above. You don't have to change anything in the threadloop. The IQL threadloop takes N agents by default and should work for your task.
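
Schematically, that loop looks like this (an illustrative Python sketch; the actual ThreadLoop lives in C++, and the env/actor interfaces here are assumptions):

```python
class IQLThreadLoop:
    """Sketch of an IQL-style loop: one independent actor per seat,
    so cross-play only requires passing actors built from different models."""

    def __init__(self, env, actors):
        self.env = env
        self.actors = actors  # one actor per player; weights need not be shared

    def run_episode(self):
        obs = self.env.reset()
        terminal = False
        while not terminal:
            seat = self.env.current_player()
            action = self.actors[seat].act(obs)  # each seat queries its own agent
            obs, _, terminal = self.env.step(action)
```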

Q2: dev.sh is for fast debugging. It uses less compute and starts training faster. I guess you meant sad_2player vs vdn_2player? sad_2player takes the extra greedy action input, which was the main idea of our SAD paper.
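
For reference, the greedy action input in toy form (my own sketch of the SAD mechanism, not the repo's actual encoding):

```python
import numpy as np

def sad_input(obs, partner_greedy_action, num_actions):
    """Append a one-hot of the partner's greedy action to the observation,
    so exploration noise in the executed action does not hide the partner's
    intended (greedy) choice."""
    one_hot = np.zeros(num_actions, dtype=np.float32)
    one_hot[partner_greedy_action] = 1.0
    return np.concatenate([obs, one_hot])
```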

Q3: Yes, it may have an impact, but as long as train_speed/act_speed is close to 1, performance should be fine. You can experiment with different num_of_threads, different num_of_games_per_thread, and multiple act_device settings; those parameters have a strong effect on simulation (act) speed.

51616 (Author) commented Mar 25, 2020

Thanks for the informative answer!

> You might need to hack some settings for loading models trained with/without greedy action input but that should be relatively easy. Our internal repo for the other-play paper has a lot of modifications & tools for running matches between various agents but that is not ready for release yet. Sorry about that.

I guess my question was a little unclear. In the code, the VDN agents are trained with shared parameters, but is it possible to train VDN where each agent has its own set of parameters?

> Q1: Answered above. You don't have to change anything in the threadloop. The IQL threadloop takes N agents by default and should work for your task.

What will happen if I use VDN or SAD agents with the IQL threadloop? Also, I can't find the auxiliary loss in the code. Is it provided here? @hengyuan-hu
