
[RLlib] Trainer sub-class PPO/DDPPO (instead of build_trainer()). #20571

Merged
9 commits merged into ray-project:master on Nov 23, 2021

Conversation

sven1977 (Contributor) commented on Nov 19, 2021

Trainer sub-class for PPOTrainer and DDPPOTrainer (instead of build_trainer()).
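For context, the move described in the title looks roughly like the sketch below. This is a simplified illustration of the two styles (hook names follow RLlib's Trainer API of that time), not the actual diff in this PR:

# Old style: the Trainer class is generated by the build_trainer() helper.
# PPOTrainer = build_trainer(
#     name="PPO",
#     default_config=DEFAULT_CONFIG,
#     default_policy=PPOTFPolicy,
#     validate_config=validate_config,
#     execution_plan=execution_plan,
# )

# New style: an explicit Trainer sub-class (method bodies omitted).
from ray.rllib.agents.trainer import Trainer


class PPOTrainer(Trainer):
    @classmethod
    def get_default_config(cls) -> dict:
        return DEFAULT_CONFIG

    def validate_config(self, config: dict) -> None:
        ...  # the config checks previously passed to build_trainer()

    @staticmethod
    def execution_plan(workers, config, **kwargs):
        ...  # the same execution plan, now defined on the class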

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

gjoliver (Member) left a comment

Looks totally awesome, love it.
Just to double check, you didn't make any logic changes with this PR, right?

@@ -21,14 +21,16 @@
import time

import ray
-from ray.rllib.agents.ppo import ppo
+from ray.rllib.agents.ppo.ppo import DEFAULT_CONFIG as PPO_DEFAULT_CONFIG, \
+    PPOTrainer
gjoliver (Member):

this doesn't fit in the last line?

sven1977 (Contributor, Author):

Not sure what you mean. The line is too long for the LINTer and needs to be split by a \
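For illustration only: a too-long import either gets a backslash continuation (as in this diff) or parentheses; the two forms are equivalent:

# Backslash continuation, as used in the diff:
from ray.rllib.agents.ppo.ppo import DEFAULT_CONFIG as PPO_DEFAULT_CONFIG, \
    PPOTrainer

# Equivalent parenthesized form:
from ray.rllib.agents.ppo.ppo import (
    DEFAULT_CONFIG as PPO_DEFAULT_CONFIG,
    PPOTrainer,
)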

gjoliver (Member):

ah sorry, it looked really strange on my laptop. my bad.

@@ -41,8 +43,8 @@

# Adds the following updates to the `PPOTrainer` config in
# rllib/agents/ppo/ppo.py.
-DEFAULT_CONFIG = ppo.PPOTrainer.merge_trainer_configs(
-    ppo.DEFAULT_CONFIG,
+DEFAULT_CONFIG = PPOTrainer.merge_trainer_configs(
gjoliver (Member):

Kinda feel like merge_trainer_configs() should be a util on the Trainer class, so all these agents would just do:

Trainer.merge_trainer_configs(
  ....
)

sven1977 (Contributor, Author):

It is defined in Trainer (not PPOTrainer), but since PPOTrainer is-a Trainer, it works like this, too. But yeah, we should probably call it like Trainer.merge_trainer_configs().
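For reference, the pattern being discussed looks roughly like this (a sketch using the names from this diff; the override values are illustrative, not DDPPO's actual defaults):

from ray.rllib.agents.trainer import Trainer
from ray.rllib.agents.ppo.ppo import DEFAULT_CONFIG as PPO_DEFAULT_CONFIG

# Build DDPPO's default config by merging DDPPO-specific overrides into the
# PPO defaults, calling the helper on the Trainer base class directly.
DEFAULT_CONFIG = Trainer.merge_trainer_configs(
    PPO_DEFAULT_CONFIG,
    {
        # Illustrative override: all optimization happens on the workers.
        "num_gpus": 0,
    },
)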

sven1977 (Contributor, Author) on Nov 22, 2021:

done

    raise ValueError("Only gloo, mpi, or nccl is supported for "
                     "the backend of PyTorch distributed.")
# `num_gpus` must be 0/None, since all optimization happens on Workers.
if config["num_gpus"]:
gjoliver (Member):

I am just curious, if my eval worker runs on head node and needs gpu, which param do I use to configure it?

sven1977 (Contributor, Author):

config:
   evaluation_config:
      num_gpus_per_worker: ...
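In plain Python config form, that would look roughly like this (a sketch; the GPU count and number of evaluation workers are illustrative values):

config = {
    # DDPPO: no GPUs for the driver/trainer process itself.
    "num_gpus": 0,
    # Regular rollout workers stay CPU-only.
    "num_gpus_per_worker": 0,
    # Evaluation workers get their own overrides:
    "evaluation_num_workers": 1,
    "evaluation_config": {
        "num_gpus_per_worker": 1,
    },
}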

gjoliver (Member):

👌

    .batch_across_shards()  # List[(grad_info, count)]
    .for_each(RecordStats()))

train_op = train_op.for_each(update_worker_global_vars)
gjoliver (Member):

minor minor question, maybe just chain this call right above?

sven1977 (Contributor, Author):

Didn't actually touch this code, just moved it into the method. Not sure about too much chaining. Honestly, we should probably chain rather less than more as it makes the already complex execution plans even more confusing.

gjoliver (Member):

ok, I can't argue against it 😆

config["rollout_fragment_length"]
if config["train_batch_size"] > 0 and \
config["train_batch_size"] % calculated_min_rollout_size != 0:
new_rollout_fragment_length = config["train_batch_size"] // (
gjoliver (Member):

High-level question: should we do this for all on-policy agents? Train batch size is such a mysterious thing for us, and logic like this living in a specific agent makes things less consistent. I know some agents don't work like this, but for the ones that do, should we put this in a util function so they can all do it at the beginning of a run?

This definitely doesn't belong in this PR, just curious what you think.

sven1977 (Contributor, Author):

I completely agree with you that train batch sizes should be handled not by individual trainers, but in a more generic way, as we discussed offline. This just bubbled up here b/c I had to move that block of code. But yeah, we should make a separate PR in which we implement "guaranteed batch sizes" for all algos.
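As a rough sketch of what such a shared utility could look like (a hypothetical helper, not part of this PR or of RLlib):

import logging

logger = logging.getLogger(__name__)


def align_rollout_fragment_length(config: dict) -> None:
    """Hypothetical helper: derive `rollout_fragment_length` from
    `train_batch_size` when the two don't line up, so that one sampling round
    across all workers and envs covers (at least) one train batch."""
    sample_slots = max(config["num_workers"], 1) * config["num_envs_per_worker"]
    min_rollout_size = sample_slots * config["rollout_fragment_length"]
    if config["train_batch_size"] > 0 and \
            config["train_batch_size"] % min_rollout_size != 0:
        # Ceil-divide so one round of sampling yields >= `train_batch_size` steps.
        new_len = -(-config["train_batch_size"] // sample_slots)
        logger.warning(
            "Adjusting `rollout_fragment_length` from %s to %s to match "
            "`train_batch_size`.",
            config["rollout_fragment_length"], new_len)
        config["rollout_fragment_length"] = new_len

Individual on-policy trainers could then call such a helper from their validate_config() instead of re-implementing the check.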

sven1977 (Contributor, Author):

Hey @gjoliver, correct, no logic change. I just sometimes have to make a few "adjustments" to keep things backward compatible w.r.t. build_trainer() vs. sub-classing Trainer.

gjoliver (Member) left a comment

Thanks, thanks!

sven1977 merged commit 49cd7ea into ray-project:master on Nov 23, 2021