
[RLlib] Trainer sub-class PPO/DDPPO (instead of build_trainer()). #20571

Merged
9 commits merged into ray-project:master on Nov 23, 2021

Conversation

sven1977 (Contributor) commented on Nov 19, 2021

Trainer sub-class for PPOTrainer and DDPPOTrainer (instead of build_trainer()).
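For context, the move described in the title looks roughly like the sketch below. This is a simplified illustration of the two styles (hook names follow RLlib's Trainer API of that time), not the actual diff in this PR:

# Old style: the Trainer class is generated by the build_trainer() helper.
# PPOTrainer = build_trainer(
#     name="PPO",
#     default_config=DEFAULT_CONFIG,
#     default_policy=PPOTFPolicy,
#     validate_config=validate_config,
#     execution_plan=execution_plan,
# )

# New style: an explicit Trainer sub-class (method bodies omitted).
from ray.rllib.agents.trainer import Trainer


class PPOTrainer(Trainer):
    @classmethod
    def get_default_config(cls) -> dict:
        return DEFAULT_CONFIG

    def validate_config(self, config: dict) -> None:
        ...  # the config checks previously passed to build_trainer()

    @staticmethod
    def execution_plan(workers, config, **kwargs):
        ...  # the same execution plan, now defined on the class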

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

gjoliver (Member) left a comment

Looks totally awesome, love it.
Just to double check, you didn't make any logic changes with this PR, right?

@@ -21,14 +21,16 @@
import time

import ray
-from ray.rllib.agents.ppo import ppo
+from ray.rllib.agents.ppo.ppo import DEFAULT_CONFIG as PPO_DEFAULT_CONFIG, \
+    PPOTrainer
gjoliver (Member):

this doesn't fit in the last line?

sven1977 (Contributor, Author):

Not sure what you mean. The line is too long for the LINTer and needs to be split by a \
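For illustration only: a too-long import either gets a backslash continuation (as in this diff) or parentheses; the two forms are equivalent:

# Backslash continuation, as used in the diff:
from ray.rllib.agents.ppo.ppo import DEFAULT_CONFIG as PPO_DEFAULT_CONFIG, \
    PPOTrainer

# Equivalent parenthesized form:
from ray.rllib.agents.ppo.ppo import (
    DEFAULT_CONFIG as PPO_DEFAULT_CONFIG,
    PPOTrainer,
)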

gjoliver (Member):

ah sorry, it looked really strange on my laptop. my bad.

@@ -41,8 +43,8 @@

# Adds the following updates to the `PPOTrainer` config in
# rllib/agents/ppo/ppo.py.
-DEFAULT_CONFIG = ppo.PPOTrainer.merge_trainer_configs(
-    ppo.DEFAULT_CONFIG,
+DEFAULT_CONFIG = PPOTrainer.merge_trainer_configs(
gjoliver (Member):

Kinda feel like merge_trainer_configs() should be a util on the Trainer class, so all these agents would just do:

Trainer.merge_trainer_configs(
  ....
)

sven1977 (Contributor, Author):

It is defined in Trainer (not PPOTrainer), but since PPOTrainer is-a Trainer, it works like this, too. But yeah, we should probably call it like Trainer.merge_trainer_configs().
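For reference, the pattern being discussed looks roughly like this (a sketch using the names from this diff; the override values are illustrative, not DDPPO's actual defaults):

from ray.rllib.agents.trainer import Trainer
from ray.rllib.agents.ppo.ppo import DEFAULT_CONFIG as PPO_DEFAULT_CONFIG

# Build DDPPO's default config by merging DDPPO-specific overrides into the
# PPO defaults, calling the helper on the Trainer base class directly.
DEFAULT_CONFIG = Trainer.merge_trainer_configs(
    PPO_DEFAULT_CONFIG,
    {
        # Illustrative override: all optimization happens on the workers.
        "num_gpus": 0,
    },
)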

sven1977 (Contributor, Author) on Nov 22, 2021:

done

    raise ValueError("Only gloo, mpi, or nccl is supported for "
                     "the backend of PyTorch distributed.")
# `num_gpus` must be 0/None, since all optimization happens on Workers.
if config["num_gpus"]:
gjoliver (Member):

I am just curious, if my eval worker runs on head node and needs gpu, which param do I use to configure it?

sven1977 (Contributor, Author):

config:
   evaluation_config:
      num_gpus_per_worker: ...
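In plain Python config form, that would look roughly like this (a sketch; the GPU count and number of evaluation workers are illustrative values):

config = {
    # DDPPO: no GPUs for the driver/trainer process itself.
    "num_gpus": 0,
    # Regular rollout workers stay CPU-only.
    "num_gpus_per_worker": 0,
    # Evaluation workers get their own overrides:
    "evaluation_num_workers": 1,
    "evaluation_config": {
        "num_gpus_per_worker": 1,
    },
}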

gjoliver (Member):

👌

    .batch_across_shards()  # List[(grad_info, count)]
    .for_each(RecordStats()))

train_op = train_op.for_each(update_worker_global_vars)
gjoliver (Member):

minor minor question, maybe just chain this call right above?

sven1977 (Contributor, Author):

Didn't actually touch this code, just moved it into the method. Not sure about too much chaining. Honestly, we should probably chain rather less than more as it makes the already complex execution plans even more confusing.

gjoliver (Member):

ok, I can't argue against it 😆

config["rollout_fragment_length"]
if config["train_batch_size"] > 0 and \
config["train_batch_size"] % calculated_min_rollout_size != 0:
new_rollout_fragment_length = config["train_batch_size"] // (
gjoliver (Member):

High-level question: should we do this for all on-policy agents? Train batch size is such a mysterious thing for us, and logic like this living in a specific agent makes things less consistent. I know some agents don't work like this, but for the ones that do, should we put this in a util function so they can all do it at the beginning of a run?

This definitely doesn't belong in this PR, just curious what you think.

sven1977 (Contributor, Author):

I completely agree with you that train batch sizes should be handled not by individual trainers, but in a more generic way, as we discussed offline. This just bubbled up here b/c I had to move that block of code. But yeah, we should make a separate PR in which we implement "guaranteed batch sizes" for all algos.
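As a rough sketch of what such a shared utility could look like (a hypothetical helper, not part of this PR or of RLlib):

import logging

logger = logging.getLogger(__name__)


def align_rollout_fragment_length(config: dict) -> None:
    """Hypothetical helper: derive `rollout_fragment_length` from
    `train_batch_size` when the two don't line up, so that one sampling round
    across all workers and envs covers (at least) one train batch."""
    sample_slots = max(config["num_workers"], 1) * config["num_envs_per_worker"]
    min_rollout_size = sample_slots * config["rollout_fragment_length"]
    if config["train_batch_size"] > 0 and \
            config["train_batch_size"] % min_rollout_size != 0:
        # Ceil-divide so one round of sampling yields >= `train_batch_size` steps.
        new_len = -(-config["train_batch_size"] // sample_slots)
        logger.warning(
            "Adjusting `rollout_fragment_length` from %s to %s to match "
            "`train_batch_size`.",
            config["rollout_fragment_length"], new_len)
        config["rollout_fragment_length"] = new_len

Individual on-policy trainers could then call such a helper from their validate_config() instead of re-implementing the check.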

sven1977 (Contributor, Author):

Hey @gjoliver, correct, no logic change. I just sometimes have to make a few "adjustments" to keep things backward compatible w.r.t. build_trainer() vs. sub-classing Trainer.

gjoliver (Member) left a comment

Thanks, thanks!

sven1977 merged commit 49cd7ea into ray-project:master on Nov 23, 2021