[RLlib] PPO runs with EnvRunner w/o old Policy API (also solves KL issues with PPORLModules). #39732
Conversation
Commits (all signed off by Simon Zehnder <simon.zehnder@gmail.com>; messages truncated by the page):

- …_weights, get_weights from EnvRunner such that EnvRunner can be used equivalently to RolloutWorker in Algorithm.
- …w. _Episode needs some fixing for some algorithms, as they need extra keys in the sample batch.
- …get infos as lists.
- …essing for episodes instead of SampleBatches.
- … Training works now.
- …o 'MultiAgentRLModule'. Created cases for 'num_remote_workers() <= 0', kept logic for 'ModelV2'. Removed global vars for 'Learner API' in 'PPO' training step. Test 'ppo_with_rl_module()' runs. Had to remove 'check_compute_single_action_from_input_dict()' for this, as 'EnvRunner' has no policy.
- …o PPO from SingleAgentEnvRunner. Implemented logic for torch and tf2. Test runs. Tuned example not yet.
- …ey error in postprocessing. Furthermore, modified weight syncing to sync not too often.
- Merge …om/simonsays1980/ray into solve-kl-issues-with-ppo-rl-module
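One of the commits above mentions modifying weight syncing so that it does not happen too often. A minimal sketch of such throttling, purely for illustration (the class name, counter, and `sync_freq` default are assumptions, not RLlib's actual implementation):

```python
class ThrottledWeightSync:
    """Hypothetical helper: push learner weights to env runners only
    every `sync_freq`-th training iteration, instead of every time."""

    def __init__(self, sync_freq: int = 5):
        self.sync_freq = sync_freq
        self._iterations = 0
        self.num_syncs = 0

    def maybe_sync(self, push_weights) -> bool:
        """Call once per training iteration; `push_weights` is the
        (assumed) callable that actually ships weights to the workers."""
        self._iterations += 1
        if self._iterations % self.sync_freq == 0:
            push_weights()
            self.num_syncs += 1
            return True
        return False


# Usage: over 10 iterations with sync_freq=5, weights are pushed only twice.
syncer = ThrottledWeightSync(sync_freq=5)
for _ in range(10):
    syncer.maybe_sync(lambda: None)
print(syncer.num_syncs)  # -> 2
```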
Awesome work @simonsays1980!
Thanks for the great PR. Should be all easily rolling downhill from here on :)
Happy for the contribution. I am excited about the new sampling API and how it will improve learning performance and user experience. Thanks for the great input @sven1977, @ArturNiederfahrenhorst and @kouroshHakha
Why are these changes needed?

Over time, sampling should be performed by the `env.EnvRunner` class, individually for different algorithms. In the same breath, the policy should become obsolete. Following the example of `DreamerV3`, this draft PR develops a way to implement these changes into `PPO`.

Related issue number

Closes #39174, #39813
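For context on the KL issues mentioned in the title: PPO with a KL penalty adapts the penalty coefficient toward a target KL divergence between the old and new policy. A sketch of the commonly used adaptive-KL rule (the 2.0/0.5 thresholds and 1.5/0.5 factors are the usual defaults; this is an illustration, not the exact code this PR touches):

```python
def update_kl_coeff(kl_coeff: float, sampled_kl: float, kl_target: float) -> float:
    """Standard adaptive-KL update used by PPO-style algorithms (sketch):
    raise the penalty when the measured KL overshoots the target, lower
    it when the KL undershoots, and leave it unchanged in between."""
    if sampled_kl > 2.0 * kl_target:
        kl_coeff *= 1.5
    elif sampled_kl < 0.5 * kl_target:
        kl_coeff *= 0.5
    return kl_coeff


# Overshooting the target increases the coefficient ...
print(update_kl_coeff(0.2, sampled_kl=0.1, kl_target=0.01))    # 0.2 * 1.5
# ... undershooting decreases it; values near the target leave it alone.
print(update_kl_coeff(0.2, sampled_kl=0.001, kl_target=0.01))  # 0.2 * 0.5
```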
Checks

- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.