[RLlib] DD-PPO training iteration fn #23906
Conversation
rllib/agents/ppo/ddppo.py
Outdated
sample_and_update_results = asynchronous_parallel_requests(
    remote_requests_in_flight=self.remote_requests_in_flight,
    actors=self.workers.remote_workers(),
    ray_wait_timeout_s=1000.0,  # 0.0
huh?
Ah, sorry, was just trying some stuff. Basically, this makes it synchronous :) Will revert. ...
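For context, a minimal standalone sketch (not code from this PR; the actor below is hypothetical) of why a huge `ray_wait_timeout_s` makes the request loop effectively synchronous:

```python
import ray

ray.init()

@ray.remote
class Worker:
    def sample(self):
        return 42  # stand-in for a rollout result

workers = [Worker.remote() for _ in range(2)]
pending = [w.sample.remote() for w in workers]

# timeout=0.0 polls and returns immediately with whatever is ready
# (asynchronous); a very large timeout blocks until everything finishes,
# making the surrounding loop behave synchronously.
ready, not_ready = ray.wait(pending, num_returns=len(pending), timeout=1000.0)
print(ray.get(ready))
```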
done
in a future commit?
Huh? Ok, now it's fixed ... Forgot to push.
env: Pendulum-v1
run: DDPPO
stop:
    episode_reward_mean: -300
debugging?
Which part?
- reward: It's able to get to -300.
- timesteps: Yeah, it does sometimes need up to 1M. DD-PPO doesn't seem to be a very stable algo, especially on continuous-action tasks. Even on Atari I have yet to find a good choice of hyperparams.
Oh for some reason I read this as CartPole and not Pendulum, my b
Haha, yeah -300 would be pretty bad for CartPole :)
@@ -249,6 +249,16 @@ py_test(
    args = ["--yaml-dir=tuned_examples/ppo"]
)

py_test(
This task is now working properly, thanks to proper hyperparam tuning.
awesome! How difficult did you find it to tune hparams?
It was pretty hard, actually.
It's good to start with just one worker, using the exact same hparams as the respective PPO version. Then increase `num_workers` and at the same time carefully adjust:
- `rollout_fragment_length`
- `num_envs_per_worker`
- `sgd_minibatch_size`
- `num_sgd_iter`

In short, anything that affects the per-worker batch size and the time each worker spends on a decentralized update. A config sketch follows below.
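To illustrate the recipe (values are hypothetical starting points, not taken from this PR):

```python
from ray import tune

# Hypothetical single-worker starting config. As num_workers grows,
# re-balance the knobs below: together they determine the per-worker
# batch size and the duration of each decentralized update.
config = {
    "env": "Pendulum-v1",
    "num_workers": 1,                # start with 1, then scale up
    "framework": "torch",            # DD-PPO is torch-only
    "rollout_fragment_length": 200,  # per-worker samples per iteration
    "num_envs_per_worker": 1,        # more envs => larger per-worker batch
    "sgd_minibatch_size": 128,       # must fit within the per-worker batch
    "num_sgd_iter": 10,              # SGD passes per decentralized update
}
tune.run("DDPPO", config=config, stop={"episode_reward_mean": -300})
```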
This looks pretty much good to me.
I think we can find good Atari hparams, but we'll probably need some more logging info (e.g. stddev and entropy in the case of PPO); then we should be able to get a sufficiently well-working DDPPO agent. A possible logging hook is sketched below.
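One possible way to surface such stats, sketched with RLlib's `DefaultCallbacks` API (the class name and metric keys below are illustrative assumptions, not part of this PR):

```python
from ray.rllib.agents.callbacks import DefaultCallbacks

class ExtraStatsCallbacks(DefaultCallbacks):
    def on_train_result(self, *, trainer, result, **kwargs):
        # PPO reports per-policy learner stats under result["info"]["learner"].
        learner = result.get("info", {}).get("learner", {})
        for policy_id, stats in learner.items():
            entropy = stats.get("learner_stats", {}).get("entropy")
            if entropy is not None:
                result.setdefault("custom_metrics", {})[
                    f"{policy_id}/entropy"
                ] = entropy
```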
The DDPPO LR scheduler test is broken because the learner info dictionary returned by the training iteration function does not consistently contain learner info for every training iteration, but the test expects that it does. We'll need to fix the test and then re-merge. Reverts #23906.
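A hedged sketch of the mismatch described above (the config values and the guard are illustrative assumptions, not the actual test code):

```python
from ray.rllib.agents.ppo import DDPPOTrainer
from ray.rllib.policy.sample_batch import DEFAULT_POLICY_ID

trainer = DDPPOTrainer(
    config={"env": "Pendulum-v1", "num_workers": 1, "framework": "torch"}
)
result = trainer.train()

# With the training-iteration fn, "learner" may be missing or empty on
# iterations in which no decentralized update completed, so a test that
# unconditionally indexes into it will break; guard for that instead.
learner_info = result["info"].get("learner") or {}
if DEFAULT_POLICY_ID in learner_info:
    cur_lr = learner_info[DEFAULT_POLICY_ID]["learner_stats"]["cur_lr"]
```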
DD-PPO training iteration fn implementation:
Why are these changes needed?
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.