[RLlib] PPO runs with EnvRunner w/o old Policy API (also solves KL issues with PPORLModules). #39732

Merged

simonsays1980
Collaborator

simonsays1980 commented Sep 18, 2023

Why are these changes needed?

Over time, sampling should be performed by the env.EnvRunner class, individually for each algorithm. At the same time, the Policy should become obsolete.

Following the example of DreamerV3, this draft PR aims to develop a way to implement these changes in PPO.

Related issue number

Closes #39174 #39813

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…_weights, get_weights from EnvRunner so that EnvRunner can be used equivalently to RolloutWorker in Algorithm.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…w. _Episode needs some fixing for some algorithms as they need extra keys in the sample batch.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
sven1977 changed the title from "Solve kl issues with ppo rl module" to "[RLlib] Solve KL issues with PPORLModules." on Sep 20, 2023
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…get infos as lists.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…essing for episodes instead of SampleBatches.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
… Training works now.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…o 'MultiAgentRLModule'. Created cases for 'num_remote_workers() <= 0' and kept the logic for 'ModelV2'. Removed global vars for 'Learner API' in the 'PPO' training step. Test 'ppo_with_rl_module()' runs. For this, had to remove 'check_compute_single_action_from_input_dict()' because 'EnvRunner' has no policy.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
simonsays1980 and others added 9 commits October 6, 2023 17:12
…o PPO from SingleAgentEnvRunner. Implemented logic for torch and tf2. Test runs. Tuned example not yet.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ey error in postprocessing. Furthermore, modified weight syncing so that weights are not synced too often.

Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
simonsays1980 and others added 6 commits October 23, 2023 10:07
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Contributor

sven1977 left a comment

Awesome work @simonsays1980!
Thanks for this great PR. It should all be rolling downhill from here on :)

Signed-off-by: sven1977 <svenmika1977@gmail.com>
simonsays1980
Collaborator Author

Happy to contribute. I am excited about the new sampling API and how it will improve learning performance and user experience. Thanks for the great input, @sven1977, @ArturNiederfahrenhorst and @kouroshHakha.

Signed-off-by: sven1977 <svenmika1977@gmail.com>
sven1977 merged commit 41714dd into ray-project:master on Oct 25, 2023
22 of 25 checks passed
Development

Successfully merging this pull request may close these issues.

[RLlib] Issues with RLModules/Learner + evaluation workers and not using KL loss
2 participants