Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport #1975, #1908, and #1905; Fix #2028 #2029

Merged
merged 1 commit into from Sep 11, 2020

Conversation

avnishn
Copy link
Member

@avnishn avnishn commented Sep 8, 2020

There was an issue in 2020.06 where if an environment was created using garage env, then that environment was wrapped with multienv wrapper, and then garage env wrapped that multienv wrapper, that there would be errors with how the done signal is modified. Now, this issue has been fixed.

Here is a TBdev link to an mtsac link run with the fix.

This issue is the cause for #2028

@avnishn avnishn requested a review from a team as a code owner September 8, 2020 21:06
@avnishn avnishn requested review from ryanjulian and removed request for a team September 8, 2020 21:06
@avnishn avnishn linked an issue Sep 8, 2020 that may be closed by this pull request
@mergify mergify bot requested review from a team, zequnyu and yeukfu and removed request for a team September 8, 2020 21:06
@ryanjulian
Copy link
Member

It looks like you have some serious problems with your branch.

Does this need to be backported to 2020.06?

@avnishn avnishn changed the base branch from master to release-2020.06 September 8, 2020 21:54
@avnishn
Copy link
Member Author

avnishn commented Sep 8, 2020

It looks like you have some serious problems with your branch.

Does this need to be backported to 2020.06?

yeah my mistake I had accidentally targeted master instead of release-2020.06

@mergify mergify bot requested a review from a team September 8, 2020 22:44
@avnishn avnishn closed this Sep 8, 2020
@avnishn avnishn reopened this Sep 8, 2020
@codecov
Copy link

codecov bot commented Sep 9, 2020

Codecov Report

Merging #2029 into release-2020.06 will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@               Coverage Diff                @@
##           release-2020.06    #2029   +/-   ##
================================================
  Coverage            90.67%   90.68%           
================================================
  Files                  213      213           
  Lines                11210    11213    +3     
  Branches              1346     1348    +2     
================================================
+ Hits                 10165    10168    +3     
- Misses                 791      792    +1     
+ Partials               254      253    -1     
Impacted Files Coverage Δ
src/garage/envs/garage_env.py 95.00% <100.00%> (+0.08%) ⬆️
src/garage/torch/algos/mtsac.py 93.33% <100.00%> (+0.31%) ⬆️
src/garage/torch/algos/sac.py 98.23% <100.00%> (ø)
src/garage/plotter/plotter.py 61.36% <0.00%> (-2.28%) ⬇️
src/garage/envs/grid_world_env.py 93.65% <0.00%> (+3.17%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d8f0235...e3638d1. Read the comment docs.

Copy link
Member

@ryanjulian ryanjulian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you set the max path lengths to 200 (or do that in a different PR + backport?)

@@ -84,10 +84,11 @@ def mtsac_metaworld_mt50(ctxt=None, seed=1, use_gpu=False, _gpu=0):
qf1=qf1,
qf2=qf2,
gradient_steps_per_itr=150,
max_path_length=250,
max_path_length=150,
max_eval_path_length=150,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

200 here?

@avnishn
Copy link
Member Author

avnishn commented Sep 9, 2020

Can you set the max path lengths to 200 (or do that in a different PR + backport?)

I think we should instead focus on getting out our garage examples using the new metaworld api because this would also require making a backport for metaworld, and we don't want people using an older metaworld version to begin with.

I think one thing that is related to this that I will do is change the parameters of our mtppo and Mttrpo. I think they both use max path lengths of 128, mainly because we didn't understand the effect that this would have.

ryanjulian
ryanjulian previously approved these changes Sep 9, 2020
@@ -162,7 +162,8 @@ def step(self, action):
# 'GarageEnv.TimeLimitTerminated'
if 'TimeLimit.truncated' in info:
info['GarageEnv.TimeLimitTerminated'] = done # done = True always
done = not info['TimeLimit.truncated']
if info['TimeLimit.truncated']:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't understand why this change is needed.

if info['TimeLimit.truncated'] = False and done = True then the environment terminated before the horizon limit and we should pass along done = True, i.e. done = done and not info['TimeLimit.truncated']

i don't think this configuration can actually appear in the current implementation of gym (which only ever populates info['TimeLimit.truncated'] = True at the horizon limit and omits it otherwise, but it is logically valid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryanjulian afaict this isn't the case.

The gym time limit truncated wrapper will only ever populate a info['TimeLimit.truncated'] = False iff the environment has been stepped past its max episode length AND done = True.

Another way of phrasing this is that timelimit truncated envs only ever return info['TimeLimit.truncated'] = True when the env has been stepped past its max episode length AND done = False.

See this line inside the timelimit gym wrapper for where this logic is implemented.
https://github.com/openai/gym/blob/abb815c871447a14532558772739ccb88deee109/gym/wrappers/time_limit.py#L19

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simply unwinding the logic from the code you linked, recovering the original value of done, looks like this:

if 'TimeLimit.truncated' in info:
    info['GarageEnv.TimeLimitTerminated'] = True
    done = not info['TimeLimit.truncated']
else:
    info['GarageEnv.TimeLimitTerminated'] = False
    info['TimeLimited.truncated'] = False

Why would you change it to be unconditional of the value of info['TimeLimit.truncated']? Nothing in the code you linked implies that done MUST be False when self._elapsed_steps >= self._max_episode_steps

Copy link
Member Author

@avnishn avnishn Sep 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryanjulian a problem comes if we wrap an environment this way: GarageEnv(MultiEnvwrapper([GarageEnv(env)])), which is what happens in all of our MT examples.

Assuming an environment hasn't hit a timelimit here's what ends up happening is the following:

The environment gets stepped and inside the MultiEnvWrapper the inner GarageEnv wrapper adds the info entry info['TimeLimit.truncated'] = False. Then the outer GarageEnv wrapper evaluates the info that is returned by the inner GarageEnv wrapper, which has info with the entry info['TimeLimit.truncated'] = False. The outer GarageEnv Wrapper now enters the if statement that you linked above:

if 'TimeLimit.truncated' in info:
    info['GarageEnv.TimeLimitTerminated'] = True
    done = not info['TimeLimit.truncated']

it now sets info['GarageEnv.TimeLimitTerminated'] = True and done = not False which is the incorrect behavior. Done should only be modified when info['TimeLimit.truncated'] = True never when it is False.

That is the logic behind why I made this change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the bug then that it was wrapped twice with GarageEnv? what should make me believe this won't break the non-pathological case where there's only one GarageEnv?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just detect double-wrapping and then vomit an error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the bug then that it was wrapped twice with GarageEnv?

I thought this was a valid use case, which is why I wrote MultiEnvWrapper the way that I did with nested GarageEnvs

what should make me believe this won't break the non-pathological case where there's only one GarageEnv

There are 3 input cases here afaict if we assume that we are handling only the output of the TimeLimitWrapper:

  1. 'TimeLimit.truncated' = True and done = True

    we enter the proposed if statement if info['TimeLimit.truncated']: and set done = False

  2. 'TimeLimit.truncated' = False and done = True

    we don't enter the newly proposed if statement, and done is not modified.

  3. 'TimeLimit.truncated' isn't in env_infos.
    we don't enter the newly proposed if statement, and done is not modified.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's a valid use case, it's really strange. What about the definition of GarageEnv makes it seem like it would be intended to be used twice? That would be really surprising to any user.

In any case, it's apparently not a valid use case, because GarageEnv is apparently not idempotent. So you either need to make GarageEnv idempotent, or you can leave it non-idempotent, but detect double-wrapping and throw an error.

Here's the truth table as I see it:

'TimeLimit.truncated' in info info['TimeLimit.truncated'] done done =
False N/A False False
False N/A True True
True False False True
True False True True
True True False False
True True True False

As long as your fix implements this truth table for arbitrary nestings of GarageEnv (or only one wrapping, but throws errors on double-wrapping), I'm happy.

Please prove it with a test, since obviously this is getting pretty subtle.

Copy link
Member Author

@avnishn avnishn Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha. The change is meant to implement this table. Let me go ahead and implement a test that checks this on master and on this release branch.

@@ -93,8 +97,9 @@ def __init__(
reward_scale=1.0,
optimizer=torch.optim.Adam,
steps_per_epoch=1,
num_evaluation_trajectories=5,
):
# yapf: disable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI was complaining that the placement of the ): characters. It called it invalid syntax whenever YAPF would place it automatically.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just place # yapf: disable on line 101 to prevent yapf from messing with the entire args block.

@ryanjulian
Copy link
Member

If you are going to fix a ton of issues in one PR, please document all of them extensively in the commit message. Right now, the PR title is misleading.

@avnishn
Copy link
Member Author

avnishn commented Sep 9, 2020

If you are going to fix a ton of issues in one PR, please document all of them extensively in the commit message. Right now, the PR title is misleading.

Sorry I'll take care of that.

@ryanjulian ryanjulian linked an issue Sep 9, 2020 that may be closed by this pull request
@ryanjulian ryanjulian changed the title Timelimit.truncated modified only when necessary Backport #1975, #1908, and #1905; Address #2028 Sep 9, 2020
@ryanjulian ryanjulian changed the title Backport #1975, #1908, and #1905; Address #2028 Backport #1975, #1908, and #1905; Fix #2028 Sep 9, 2020
@krzentner
Copy link
Contributor

Please address Ryan's comments, but otherwise this change looks good to me.

@mergify mergify bot requested a review from a team September 10, 2020 16:50
Backport #1905, #1975, #1908 to fix problems
with max_eval_path_length being not used by
mtsac and sac, and add checking for incorrect
num_tasks being set in mtsac.

Timelimit.truncated modified only when necessary

This issue occurs when there are multiple garage
envs that are nested or timelimit truncated = False
is included in the environment keys.
Previously, our timelimit
truncated logic was written with the idea in mind
that the key was only added when a time limit
truncation occured. If an environment already
has timelimit truncated = False in its keys
then the previous behavior was to set Done = True
which is the incorrect behavior.

That was causing performance degradation
in MTSAC and MTPPO/TRPO.

Now Done is only true in the normal/trivial case,
never if timelimit truncated is False.
@avnishn
Copy link
Member Author

avnishn commented Sep 11, 2020

@ryanjulian I investigated whether the GarageEnv Idempotent fix would need to be ported to master, and it doesn't. GymEnv can only wrap envs that implement gym.env, and Eric rewrote Multienv wrapper to no longer use GymEnv internally.

@mergify mergify bot merged commit 5ce8850 into release-2020.06 Sep 11, 2020
@mergify mergify bot deleted the avnish-fix-nested-garage-env branch September 11, 2020 22:55
@avnishn avnishn mentioned this pull request Sep 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants