[RLlib] PPO runs with EnvRunner w/o old Policy API (also solves KL issues with PPORLModules). #39732
Conversation
Commits (all signed off by Simon Zehnder <simon.zehnder@gmail.com>; messages truncated by the page):

- …_weights, get_weights from EnvRunner such that EnvRunner can be used equivalently to RolloutWorker in Algorithm.
- …w. _Episode needs some fixing for some algorithms, as they need extra keys in the sample batch.
- …get infos as lists.
- …essing for episodes instead of SampleBatches.
- … Training works now.
- …o 'MultiAgentRLModule'. Created cases for 'num_remote_workers() <= 0', kept logic for 'ModelV2'. Removed global vars for 'Learner API' in 'PPO' training step. Test 'ppo_with_rl_module()' runs. Had to remove 'check_compute_single_action_from_input_dict()' for this, as 'EnvRunner' has no policy.
- …o PPO from SingleAgentEnvRunner. Implemented logic for torch and tf2. Test runs. Tuned example not yet.
- …ey error in postprocessing. Furthermore, modified weight syncing to sync not too often.
- Merge …om/simonsays1980/ray into solve-kl-issues-with-ppo-rl-module
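One of the commits above mentions modifying weight syncing so that it does not happen too often. A minimal sketch of such throttling, purely for illustration (the class name, counter, and `sync_freq` default are assumptions, not RLlib's actual implementation):

```python
class ThrottledWeightSync:
    """Hypothetical helper: push learner weights to env runners only
    every `sync_freq`-th training iteration, instead of every time."""

    def __init__(self, sync_freq: int = 5):
        self.sync_freq = sync_freq
        self._iterations = 0
        self.num_syncs = 0

    def maybe_sync(self, push_weights) -> bool:
        """Call once per training iteration; `push_weights` is the
        (assumed) callable that actually ships weights to the workers."""
        self._iterations += 1
        if self._iterations % self.sync_freq == 0:
            push_weights()
            self.num_syncs += 1
            return True
        return False


# Usage: over 10 iterations with sync_freq=5, weights are pushed only twice.
syncer = ThrottledWeightSync(sync_freq=5)
for _ in range(10):
    syncer.maybe_sync(lambda: None)
print(syncer.num_syncs)  # -> 2
```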
Awesome work @simonsays1980!
Thanks for the great PR. Should be all easily rolling downhill from here on :)
Happy for the contribution. I am excited about the new sampling API and how it will improve learning performance and user experience. Thanks for the great input @sven1977, @ArturNiederfahrenhorst and @kouroshHakha
Why are these changes needed?

Over time, sampling should be performed by the `env.EnvRunner` class, individually for different algorithms. In the same breath, the policy should become obsolete. Following the example of `DreamerV3`, this draft PR develops a way to implement these changes into `PPO`.

Related issue number

Closes #39174, #39813
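For context on the KL issues mentioned in the title: PPO with a KL penalty adapts the penalty coefficient toward a target KL divergence between the old and new policy. A sketch of the commonly used adaptive-KL rule (the 2.0/0.5 thresholds and 1.5/0.5 factors are the usual defaults; this is an illustration, not the exact code this PR touches):

```python
def update_kl_coeff(kl_coeff: float, sampled_kl: float, kl_target: float) -> float:
    """Standard adaptive-KL update used by PPO-style algorithms (sketch):
    raise the penalty when the measured KL overshoots the target, lower
    it when the KL undershoots, and leave it unchanged in between."""
    if sampled_kl > 2.0 * kl_target:
        kl_coeff *= 1.5
    elif sampled_kl < 0.5 * kl_target:
        kl_coeff *= 0.5
    return kl_coeff


# Overshooting the target increases the coefficient ...
print(update_kl_coeff(0.2, sampled_kl=0.1, kl_target=0.01))    # 0.2 * 1.5
# ... undershooting decreases it; values near the target leave it alone.
print(update_kl_coeff(0.2, sampled_kl=0.001, kl_target=0.01))  # 0.2 * 0.5
```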
Checks

- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.