-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] Fix error in PPO
with use_kl_loss=False
#45031
[RLlib] Fix error in PPO
with use_kl_loss=False
#45031
Conversation
…kl_loss=True' and otherwise throws an error. Furthermore, removed the 'kl_sampled_values' from 'PPOLeanrer.additional_update_for_module' b/c it is unused. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
) | ||
|
||
# Update KL coefficient. | ||
if config.use_kl_loss: | ||
assert sampled_kl_values, "Sampled KL values are empty." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
@@ -22,6 +22,7 @@ | |||
lr=0.0003, | |||
num_sgd_iter=6, | |||
vf_loss_coeff=0.01, | |||
use_kl_loss=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True by default, but yes, setting this explicitly here is always better :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch @simonsays1980 !
LGTM.
Why are these changes needed?
Using no KL loss in PPO throws an
AssertionError
b/c thesampled_kl_values
are empty. This PR fixes this error by moving the assertion only into the case whereuse_kl_loss=True
. Furthermore, the sampled KL values were passed to thePPOLearner
that does not use them and instead passed them over to theTorchLearner
that has no use of them either. This argument was removed fromPPOLearner.additional_update_for_module
.Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.