Use eval_max_episode lengths in evaluation #1853
Conversation
This commit fixes a bug where, when obtaining eval trajectories with SAC, max_eval_episode_length isn't actually used by the algorithm or anywhere else.
@@ -45,7 +45,7 @@ class SAC(RLAlgorithm):
        environment that the agent is being trained in. Usually accessable
        by calling env.spec.
    max_episode_length (int): Max path length of the algorithm.
this description doesn't make much sense -- can you make it longer?
@@ -94,7 +94,7 @@ def __init__(
        replay_buffer,
        *,  # Everything after this is numbers.
        max_episode_length,
-       max_eval_path_length=None,
+       max_eval_episode_length=None,
Suggested change:
- max_eval_episode_length=None,
+ max_episode_length_eval=None,
For both of these, please explain what happens if the environment has a max_episode_length which is less than these numbers.
@@ -94,7 +94,7 @@ def __init__(
        replay_buffer,
        *,  # Everything after this is numbers.
        max_episode_length,
Suggested change:
- max_episode_length,
+ max_episode_length_explore,
if i don't set it, does it use the max_episode_length
from the environment?
no, it defaults to 1000. I had done this when we were only doing apples-to-apples comparisons with the paper. A less confusing behavior would be to use max_episode_length_explore
i agree with that as long as max_episode_length_explore
defaults to env.spec.max_episode_length
In general, I think people shouldn't interact with these unless they have very specific needs (e.g. your baseline comparisons); for instance, they wouldn't appear in an example script.
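The defaulting behavior agreed on above can be sketched as follows. This is a hypothetical illustration of the discussed semantics, not garage's actual implementation; the class and the `env_spec` attribute names here are assumptions for the sake of the example:

```python
class SACSketch:
    """Sketch of episode-length defaulting for an off-policy algorithm.

    If the caller doesn't pass max_episode_length, it falls back to the
    environment's own limit (env_spec.max_episode_length), and the eval
    length falls back to the exploration length unless explicitly set.
    """

    def __init__(self, env_spec, *, max_episode_length=None,
                 max_episode_length_eval=None):
        # Default the exploration length to the environment's limit.
        if max_episode_length is None:
            max_episode_length = env_spec.max_episode_length
        # Default the eval length to the exploration length.
        if max_episode_length_eval is None:
            max_episode_length_eval = max_episode_length
        self.max_episode_length = max_episode_length
        self.max_episode_length_eval = max_episode_length_eval
```

With these defaults, a user who sets neither argument gets the environment's own episode limit for both exploration and evaluation, and only baseline-comparison use cases need to override them.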