-
Notifications
You must be signed in to change notification settings - Fork 727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[question] HER does not sample very last state of episode as achieved goal #666
Comments
Hello,
the index EDIT: this may be related to #400 |
Hello and thank you for your answer! The episode transitions are tuples containing the last observation, action, reward, new observation and the done boolean.
The goal selection strategy final indeed chooses the last transition but then uses the old observation (index 0) which corresponds to the second to last observation of the episode.
In my opinion it would be closer to what the paper suggests to use the third element of the transition (corresponding to the new and thus very last observation) in case of the goal selection strategy final. This could be relevant if the last observation of the episode is important. For example if the episode ends with a contact between objects that only appears in the very last observation. The other goal selection strategies do not use the very last observation either but this is probably not a big problem as long as the episodes are long enough. In issue #578, however, it is reported that the implementation of HER does not work properly if the episode length is 1. I think this is precisely because of the behavior described above. Thank you for your help! |
Thanks for the clarification. For the rest, I need to think more about it. |
It seems to me that when HER samples an achieved goal from the replay buffer it never samples the very last state of the episode. Is this intended?
As a consequence, the sampling strategy "final" seems to always sample the second to last state. My understanding is that the paper suggests using the very last state and in some environments this could actually make a difference.
https://github.com/nicoguertler/stable-baselines/blob/4fada47f1b71b7548c935b1f01c6fb04199b3d54/stable_baselines/her/replay_buffer.py#L99-L125
The text was updated successfully, but these errors were encountered: