
Questions about multiwalker #314

Closed
jkterry1 opened this issue Sep 14, 2021 · 7 comments
Labels: question (Further information is requested)

Comments

jkterry1 commented Sep 14, 2021

Hey, a quick question: how many timesteps did you train multiwalker for with MAD4PG a few months ago, when you were able to learn it so effectively that the environment broke and you created an issue with us?

jkterry1 (Author):
The code you referenced is here, but unless I'm losing my mind, I'm not seeing where you defined timesteps in it: https://github.com/instadeepai/Mava/blob/develop/examples/petting_zoo/sisl/multiwalker/feedforward/decentralised/run_mad4pg.py

Also this is the GitHub issue I was referring to by the way: Farama-Foundation/PettingZoo#376

@arnupretorius arnupretorius added the question Further information is requested label Sep 15, 2021
KaleabTessera (Contributor):

Hi @jkterry1 👋

It took ~1,000,000 (1e6) executor steps (logged as `evaluator/ExecutorSteps`), roughly 1 hour, to reach a return (logged as `evaluator/RawEpisodeReturn` or `evaluator/MeanEpisodeReturn`) of ~40. I think the previous PettingZoo multiwalker env broke around then.

We used this config:

| Config Param | Value |
| --- | --- |
| batch_size | 1024 |
| critic_optimizer | Adam |
| critic_optimizer_lr | 0.0001 |
| discount | 0.99 |
| executor_variable_update_period | 1000 |
| max_gradient_norm | None |
| policy_optimizer | Adam |
| policy_optimizer_lr | 0.0001 |
| shared_weights | True |
| sigma | 0.3 |
| target_averaging | False |
| target_update_period | 100 |
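For reference, the hyperparameters above can be collected into a plain Python dict. This is just a sketch of the values in the table; the actual Mava system builder may use different argument names:

```python
# Hyperparameters reported for the MAD4PG multiwalker run, as a plain dict.
# Key names mirror the table above; Mava's actual constructor may differ.
mad4pg_config = {
    "batch_size": 1024,
    "critic_optimizer": "Adam",
    "critic_optimizer_lr": 1e-4,
    "discount": 0.99,
    "executor_variable_update_period": 1000,
    "max_gradient_norm": None,   # no gradient clipping
    "policy_optimizer": "Adam",
    "policy_optimizer_lr": 1e-4,
    "shared_weights": True,      # agents share network weights
    "sigma": 0.3,                # exploration noise scale
    "target_averaging": False,
    "target_update_period": 100,
}
```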

Relating to timesteps, each system has a `max_executor_steps` parameter:

```python
max_executor_steps: int = None,
```

which limits how many steps/timesteps our executors (more on executors here) run for. If this is `None`, we let the experiment run until it is manually cancelled. Not sure if this answers your question?
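As an illustration (not Mava's actual implementation), a `max_executor_steps` cutoff typically behaves like this hypothetical loop, where `env_step` is a stand-in for one executor/environment step:

```python
# Illustrative sketch of a max_executor_steps cutoff: run until the step
# budget is exhausted, or indefinitely when the budget is None. The
# run_executor/env_step names are made up for this example.
def run_executor(env_step, max_executor_steps=None):
    """env_step() advances the environment once and returns True while
    the experiment should keep going (e.g. until manually cancelled)."""
    steps = 0
    while max_executor_steps is None or steps < max_executor_steps:
        if not env_step():
            break  # experiment was cancelled externally
        steps += 1
    return steps
```

With a budget of 5, the loop stops after exactly 5 steps even if `env_step` would happily keep going.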

jkterry1 commented Sep 16, 2021

That answers most of it, thank you. One follow-up question: is 1 million executor steps a step for each individual agent in the env, or a step for all agents at once?

KaleabTessera (Contributor):

@jkterry1 It is a step for all agents at once (parallel) since our executors are a collection of agents.
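To make that counting convention concrete, here is a toy illustration (not Mava or PettingZoo code; `ToyParallelEnv` and the agent names are made up): one call to `step()` with a joint action dict for all agents counts as a single executor step, no matter how many agents act.

```python
# Toy illustration of "a step for all agents at once": one step() call with
# a joint action dict counts as ONE executor step, regardless of agent count.
class ToyParallelEnv:
    def __init__(self, agents):
        self.agents = list(agents)
        self.executor_steps = 0

    def step(self, actions):
        # actions maps every agent to its action; all agents act in parallel
        assert set(actions) == set(self.agents)
        self.executor_steps += 1  # one joint step, not one per agent
        return {a: 0.0 for a in self.agents}  # dummy per-agent rewards

env = ToyParallelEnv(["walker_0", "walker_1", "walker_2"])
for _ in range(10):
    env.step({a: None for a in env.agents})
# 10 joint steps with 3 agents -> executor_steps is 10, not 30
```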

jkterry1 (Author):

Thanks a ton!

jkterry1 (Author):

One additional question- what preprocessing did you do and why?

jkterry1 commented Oct 1, 2021

And do you know how much average total reward you were getting when multiwalker broke?
