[rllib] Fix LSTM regression on truncated sequences and add regression test #2898
Test FAILed.
Test FAILed.
Test PASSed.
@@ -0,0 +1,179 @@
"""Stateless variant of the CartPole gym environment.
Stateless..? Might be the wrong word to use here..
Partially observed?
Test FAILed.
What do these changes do?
#2700 regressed LSTM performance by erroneously zeroing out the initial state. This breaks training with truncated sequence lengths (e.g., the tuned pong example).
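To illustrate why the initial state matters here, a minimal sketch (not the actual RLlib code): when an episode is split into truncated sequences, each chunk after the first should start from the recurrent state left by the previous chunk, not from zeros. The names `chunk_with_initial_states` and `toy_step` are hypothetical, and a running sum stands in for the LSTM cell.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]


def chunk_with_initial_states(
    observations: Sequence[float],
    max_seq_len: int,
    step_fn: Callable[[State, float], State],
    zero_state: State,
) -> Tuple[List[List[float]], List[State]]:
    """Split one episode into chunks of at most max_seq_len steps.

    initial_states[i] is the recurrent state just before the first
    step of chunks[i]. Only the first chunk starts from zero_state.
    """
    chunks: List[List[float]] = []
    initial_states: List[State] = []
    state = zero_state
    for start in range(0, len(observations), max_seq_len):
        chunk = list(observations[start:start + max_seq_len])
        chunks.append(chunk)
        initial_states.append(state)  # carried over, not reset to zeros
        for obs in chunk:
            state = step_fn(state, obs)
    return chunks, initial_states


def toy_step(state: State, obs: float) -> State:
    # Toy stand-in for an LSTM cell: the state is a running sum.
    return (state[0] + obs,)


chunks, states = chunk_with_initial_states(
    [1.0, 2.0, 3.0, 4.0, 5.0], max_seq_len=2,
    step_fn=toy_step, zero_state=(0.0,))
```

Zeroing every entry of `states` would reproduce the kind of failure described above: each truncated chunk would be trained as if it were the start of an episode.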
Incidentally, I also fixed an issue where Pong-ram doesn't pick up the right preprocessor by default.
Testing
Before this fix, the Pong A3C example was completely broken. I checked that it learns again with the fix applied.
Also added a stateless CartPole env test, thanks to @richard4912, and verified that the test fails before this fix.
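For readers unfamiliar with the idea, a rough sketch of what a "stateless" (partially observed) CartPole could look like, assuming the standard CartPole observation layout `[x, x_dot, theta, theta_dot]` and the 4-tuple `step` return of older gym versions. `hide_velocities` and `PartiallyObservedEnv` are hypothetical names for illustration, not the env added in this PR.

```python
from typing import Any, Sequence, Tuple


def hide_velocities(full_obs: Sequence[float]) -> Tuple[float, float]:
    """Keep cart position and pole angle; drop both velocity terms."""
    x, _x_dot, theta, _theta_dot = full_obs
    return (x, theta)


class PartiallyObservedEnv:
    """Hypothetical wrapper around any env exposing reset() and step(action).

    Assumes step() returns (obs, reward, done, info).
    """

    def __init__(self, env: Any) -> None:
        self.env = env

    def reset(self) -> Tuple[float, float]:
        return hide_velocities(self.env.reset())

    def step(self, action: Any):
        obs, reward, done, info = self.env.step(action)
        return hide_velocities(obs), reward, done, info
```

With the velocities hidden, a feedforward policy cannot tell which way the pole is moving from a single observation, so solving the task depends on the recurrent state being carried correctly, which is what makes it a useful regression test for this bug.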