New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backport #1975, #1908, and #1905; Fix #2028 #2029
Conversation
It looks like you have some serious problems with your branch. Does this need to be backported to 2020.06? |
yeah my mistake I had accidentally targeted master instead of |
d506339
to
c438663
Compare
Codecov Report
@@ Coverage Diff @@
## release-2020.06 #2029 +/- ##
================================================
Coverage 90.67% 90.68%
================================================
Files 213 213
Lines 11210 11213 +3
Branches 1346 1348 +2
================================================
+ Hits 10165 10168 +3
- Misses 791 792 +1
+ Partials 254 253 -1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you set the max path lengths to 200 (or do that in a different PR + backport?)
@@ -84,10 +84,11 @@ def mtsac_metaworld_mt50(ctxt=None, seed=1, use_gpu=False, _gpu=0): | |||
qf1=qf1, | |||
qf2=qf2, | |||
gradient_steps_per_itr=150, | |||
max_path_length=250, | |||
max_path_length=150, | |||
max_eval_path_length=150, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
200 here?
I think we should instead focus on getting out our garage examples using the new metaworld api because this would also require making a backport for metaworld, and we don't want people using an older metaworld version to begin with. I think one thing that is related to this that I will do is change the parameters of our mtppo and Mttrpo. I think they both use max path lengths of 128, mainly because we didn't understand the effect that this would have. |
@@ -162,7 +162,8 @@ def step(self, action): | |||
# 'GarageEnv.TimeLimitTerminated' | |||
if 'TimeLimit.truncated' in info: | |||
info['GarageEnv.TimeLimitTerminated'] = done # done = True always | |||
done = not info['TimeLimit.truncated'] | |||
if info['TimeLimit.truncated']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't understand why this change is needed.
if info['TimeLimit.truncated'] = False
and done = True
then the environment terminated before the horizon limit and we should pass along done = True
, i.e. done = done and not info['TimeLimit.truncated']
i don't think this configuration can actually appear in the current implementation of gym (which only ever populates info['TimeLimit.truncated'] = True
at the horizon limit and omits it otherwise, but it is logically valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ryanjulian afaict this isn't the case.
The gym time limit truncated wrapper will only ever populate a info['TimeLimit.truncated'] = False iff the environment has been stepped past its max episode length AND done = True.
Another way of phrasing this is that timelimit truncated envs only ever return info['TimeLimit.truncated'] = True when the env has been stepped past its max episode length AND done = False.
See this line inside the timelimit gym wrapper for where this logic is implemented.
https://github.com/openai/gym/blob/abb815c871447a14532558772739ccb88deee109/gym/wrappers/time_limit.py#L19
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simply unwinding the logic from the code you linked, recovering the original value of done
, looks like this:
if 'TimeLimit.truncated' in info:
info['GarageEnv.TimeLimitTerminated'] = True
done = not info['TimeLimit.truncated']
else:
info['GarageEnv.TimeLimitTerminated'] = False
info['TimeLimited.truncated'] = False
Why would you change it to be unconditional of the value of info['TimeLimit.truncated']
? Nothing in the code you linked implies that done
MUST be False
when self._elapsed_steps >= self._max_episode_steps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ryanjulian a problem comes if we wrap an environment this way: GarageEnv(MultiEnvwrapper([GarageEnv(env)]))
, which is what happens in all of our MT examples.
Assuming an environment hasn't hit a timelimit here's what ends up happening is the following:
The environment gets stepped and inside the MultiEnvWrapper
the inner GarageEnv
wrapper adds the info entry info['TimeLimit.truncated'] = False
. Then the outer GarageEnv
wrapper evaluates the info that is returned by the inner GarageEnv
wrapper, which has info with the entry info['TimeLimit.truncated'] = False
. The outer GarageEnv
Wrapper now enters the if statement that you linked above:
if 'TimeLimit.truncated' in info:
info['GarageEnv.TimeLimitTerminated'] = True
done = not info['TimeLimit.truncated']
it now sets info['GarageEnv.TimeLimitTerminated'] = True
and done = not False
which is the incorrect behavior. Done
should only be modified when info['TimeLimit.truncated'] = True
never when it is False
.
That is the logic behind why I made this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't the bug then that it was wrapped twice with GarageEnv
? what should make me believe this won't break the non-pathological case where there's only one GarageEnv
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just detect double-wrapping and then vomit an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't the bug then that it was wrapped twice with
GarageEnv
?
I thought this was a valid use case, which is why I wrote MultiEnvWrapper
the way that I did with nested GarageEnvs
what should make me believe this won't break the non-pathological case where there's only one
GarageEnv
There are 3 input cases here afaict if we assume that we are handling only the output of the TimeLimitWrapper
:
-
'TimeLimit.truncated' = True and done = True
we enter the proposed if statement
if info['TimeLimit.truncated']:
and setdone = False
-
'TimeLimit.truncated' = False and done = True
we don't enter the newly proposed if statement, and done is not modified.
-
'TimeLimit.truncated'
isn't inenv_infos
.
we don't enter the newly proposed if statement, and done is not modified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's a valid use case, it's really strange. What about the definition of GarageEnv
makes it seem like it would be intended to be used twice? That would be really surprising to any user.
In any case, it's apparently not a valid use case, because GarageEnv
is apparently not idempotent. So you either need to make GarageEnv
idempotent, or you can leave it non-idempotent, but detect double-wrapping and throw an error.
Here's the truth table as I see it:
'TimeLimit.truncated' in info |
info['TimeLimit.truncated'] |
done |
done = |
---|---|---|---|
False |
N/A | False |
False |
False |
N/A | True |
True |
True |
False |
False |
True |
True |
False |
True |
True |
True |
True |
False |
False |
True |
True |
True |
False |
As long as your fix implements this truth table for arbitrary nestings of GarageEnv (or only one wrapping, but throws errors on double-wrapping), I'm happy.
Please prove it with a test, since obviously this is getting pretty subtle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha. The change is meant to implement this table. Let me go ahead and implement a test that checks this on master and on this release branch.
@@ -93,8 +97,9 @@ def __init__( | |||
reward_scale=1.0, | |||
optimizer=torch.optim.Adam, | |||
steps_per_epoch=1, | |||
num_evaluation_trajectories=5, | |||
): | |||
# yapf: disable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CI was complaining that the placement of the ):
characters. It called it invalid syntax whenever YAPF would place it automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just place # yapf: disable
on line 101 to prevent yapf from messing with the entire args block.
If you are going to fix a ton of issues in one PR, please document all of them extensively in the commit message. Right now, the PR title is misleading. |
Sorry I'll take care of that. |
c438663
to
1884720
Compare
Please address Ryan's comments, but otherwise this change looks good to me. |
1884720
to
98c0f63
Compare
Backport #1905, #1975, #1908 to fix problems with max_eval_path_length being not used by mtsac and sac, and add checking for incorrect num_tasks being set in mtsac. Timelimit.truncated modified only when necessary This issue occurs when there are multiple garage envs that are nested or timelimit truncated = False is included in the environment keys. Previously, our timelimit truncated logic was written with the idea in mind that the key was only added when a time limit truncation occured. If an environment already has timelimit truncated = False in its keys then the previous behavior was to set Done = True which is the incorrect behavior. That was causing performance degradation in MTSAC and MTPPO/TRPO. Now Done is only true in the normal/trivial case, never if timelimit truncated is False.
98c0f63
to
e3638d1
Compare
@ryanjulian I investigated whether the GarageEnv Idempotent fix would need to be ported to master, and it doesn't. GymEnv can only wrap envs that implement gym.env, and Eric rewrote Multienv wrapper to no longer use GymEnv internally. |
There was an issue in 2020.06 where if an environment was created using garage env, then that environment was wrapped with multienv wrapper, and then garage env wrapped that multienv wrapper, that there would be errors with how the done signal is modified. Now, this issue has been fixed.
Here is a TBdev link to an mtsac link run with the fix.
This issue is the cause for #2028