[RLlib] Fix issues with action masking examples #38095
Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>
LGTM
@sven1977 Need your TensorFlow help here.
Let me debug the new example script on tf2. ...
Root cause: In tf2, for some reason, the gradients of the last action layer (the one producing the 100 action logits) are already computed as NaNs in the very first training update step. This breaks the NN and causes the final crash, because one of the action labels is 100 (out of bounds; valid labels are 0 to 99).
Confirmed that torch is working fine ...
This is fixed now (the new example script runs fine on my end). Waiting for tests to pass, then we can merge.
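For readers following the root-cause note above: the thread does not show the actual fix, so the following is only a generic sketch of one common way to keep masked logits finite, so that neither the softmax nor its gradients turn into NaNs. The helper names `mask_logits` and `softmax` and the constant `FLOAT_MIN` are illustrative assumptions, not code from this PR, and NumPy stands in for tf2/torch.

```python
import numpy as np

# Most negative finite float32. Used instead of -inf so that later
# arithmetic on masked logits stays finite (assumption for this sketch).
FLOAT_MIN = np.finfo(np.float32).min


def mask_logits(logits, action_mask):
    """Replace logits of invalid actions (mask == 0) with FLOAT_MIN."""
    logits = np.asarray(logits, dtype=np.float32)
    mask = np.asarray(action_mask)
    return np.where(mask > 0, logits, FLOAT_MIN).astype(np.float32)


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


if __name__ == "__main__":
    logits = [1.0, 2.0, 3.0, 4.0]
    mask = [1, 0, 1, 0]  # actions 1 and 3 are not allowed
    probs = softmax(mask_logits(logits, mask))
    print(probs)  # masked actions end up with probability 0
```

A cheap guard such as `assert np.all(np.isfinite(x))` on the masked logits (or the framework's equivalent check) would surface a NaN at the first update step rather than later as an out-of-bounds action index.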
LGTM. Thanks for the fix @ArturNiederfahrenhorst !
Signed-off-by: NripeshN <nn2012@hw.ac.uk>
@sven1977, @ArturNiederfahrenhorst : Appreciate your work on that topic! Installing latest nightly build to run the new example file, I stumbled across two topics:
Verification outside of the ray cascade shows interesting behavior of torch leading to the issue
Signed-off-by: harborn <gangsheng.wu@intel.com>
Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
Signed-off-by: Victor <vctr.y.m@example.com>
The issue reported earlier does not occur anymore in Ray 2.9.1. Tested with Torch 2.0.1 and the examples/action_masking file in the following version: https://github.com/ray-project/ray/blob/84d17cad835665631c2f68f6fb332e2973028fab/rllib/examples/action_masking.py
Why are these changes needed?
Fixes #37707 and a set of other issues.
The action masking example was broken but this was not picked up by CI because the error did not propagate.
The tf1 code path of the example was broken.
The example needed to be lifted to the RL Modules API.