Backport #1975, #1908, and #1905; Fix #2028 #2029

avnishn · 2020-09-08T21:06:18Z

In 2020.06, if an environment was created with GarageEnv, wrapped with MultiEnvWrapper, and then wrapped again with GarageEnv, the done signal was modified incorrectly. This PR fixes that.

Here is a TBdev link to an MTSAC run with the fix.

This issue is the cause of #2028.

ryanjulian · 2020-09-08T21:09:06Z

It looks like you have some serious problems with your branch.

Does this need to be backported to 2020.06?

avnishn · 2020-09-08T21:55:21Z

> It looks like you have some serious problems with your branch.
>
> Does this need to be backported to 2020.06?

Yeah, my mistake. I had accidentally targeted master instead of release-2020.06.

codecov · 2020-09-09T01:37:25Z

Codecov Report

Merging #2029 into release-2020.06 will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@               Coverage Diff                @@
##           release-2020.06    #2029   +/-   ##
================================================
  Coverage            90.67%   90.68%           
================================================
  Files                  213      213           
  Lines                11210    11213    +3     
  Branches              1346     1348    +2     
================================================
+ Hits                 10165    10168    +3     
- Misses                 791      792    +1     
+ Partials               254      253    -1

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/garage/envs/garage_env.py | `95.00% <100.00%> (+0.08%)` | ⬆️ |
| src/garage/torch/algos/mtsac.py | `93.33% <100.00%> (+0.31%)` | ⬆️ |
| src/garage/torch/algos/sac.py | `98.23% <100.00%> (ø)` | |
| src/garage/plotter/plotter.py | `61.36% <0.00%> (-2.28%)` | ⬇️ |
| src/garage/envs/grid_world_env.py | `93.65% <0.00%> (+3.17%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d8f0235...e3638d1.

ryanjulian

Can you set the max path lengths to 200 (or do that in a different PR + backport)?

ryanjulian · 2020-09-09T08:15:15Z

examples/torch/mtsac_metaworld_mt50.py

@@ -84,10 +84,11 @@ def mtsac_metaworld_mt50(ctxt=None, seed=1, use_gpu=False, _gpu=0):
                  qf1=qf1,
                  qf2=qf2,
                  gradient_steps_per_itr=150,
-                  max_path_length=250,
+                  max_path_length=150,
+                  max_eval_path_length=150,


avnishn · 2020-09-09T14:44:57Z

> Can you set the max path lengths to 200 (or do that in a different PR + backport)?

I think we should instead focus on updating our garage examples to use the new metaworld API. Changing the path lengths here would also require a backport for metaworld, and we don't want people using an older metaworld version to begin with.

One related thing I will do is change the parameters of our MTPPO and MTTRPO examples. I think they both use max path lengths of 128, mainly because we didn't understand the effect that this would have.

ryanjulian · 2020-09-09T18:36:36Z

src/garage/envs/garage_env.py

@@ -162,7 +162,8 @@ def step(self, action):
        # 'GarageEnv.TimeLimitTerminated'
        if 'TimeLimit.truncated' in info:
            info['GarageEnv.TimeLimitTerminated'] = done  # done = True always
-            done = not info['TimeLimit.truncated']
+            if info['TimeLimit.truncated']:


i don't understand why this change is needed.

if info['TimeLimit.truncated'] = False and done = True then the environment terminated before the horizon limit and we should pass along done = True, i.e. done = done and not info['TimeLimit.truncated']

i don't think this configuration can actually appear in the current implementation of gym (which only ever populates info['TimeLimit.truncated'] = True at the horizon limit and omits it otherwise), but it is logically valid.

@ryanjulian afaict this isn't the case.

The gym TimeLimit wrapper will only ever populate info['TimeLimit.truncated'] = False iff the environment has been stepped past its max episode length AND done = True.

Another way of phrasing this is that TimeLimit-wrapped envs only ever return info['TimeLimit.truncated'] = True when the env has been stepped past its max episode length AND done = False.

See this line inside the timelimit gym wrapper for where this logic is implemented.
https://github.com/openai/gym/blob/abb815c871447a14532558772739ccb88deee109/gym/wrappers/time_limit.py#L19
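
For reference, the horizon check in that wrapper can be paraphrased as the sketch below. `time_limit_step` is a hypothetical standalone function written for illustration, not gym's API; it takes the wrapped env's `done` and `info` as arguments instead of stepping a real environment.

```python
def time_limit_step(done, info, elapsed_steps, max_episode_steps):
    """Paraphrase of the horizon check in gym's TimeLimit.step()."""
    info = dict(info)  # copy so the caller's dict is not mutated
    if elapsed_steps >= max_episode_steps:
        # The key is only written once the horizon has been reached.
        info['TimeLimit.truncated'] = not done
        done = True
    return done, info


# Before the horizon: nothing is added to info.
print(time_limit_step(False, {}, elapsed_steps=3, max_episode_steps=10))
# At the horizon, env still running: truncated=True, done forced to True.
print(time_limit_step(False, {}, elapsed_steps=10, max_episode_steps=10))
# At the horizon, env finished on its own: truncated=False, done stays True.
print(time_limit_step(True, {}, elapsed_steps=10, max_episode_steps=10))
```

Under this reading, the key is absent on every step before the horizon, and `done` is always True whenever the key is present.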

simply unwinding the logic from the code you linked, recovering the original value of done, looks like this:

    if 'TimeLimit.truncated' in info:
        info['GarageEnv.TimeLimitTerminated'] = True
        done = not info['TimeLimit.truncated']
    else:
        info['GarageEnv.TimeLimitTerminated'] = False
        info['TimeLimit.truncated'] = False

Why would you change it to be unconditional of the value of info['TimeLimit.truncated']? Nothing in the code you linked implies that done MUST be False when self._elapsed_steps >= self._max_episode_steps

@ryanjulian a problem arises if we wrap an environment this way: GarageEnv(MultiEnvWrapper([GarageEnv(env)])), which is what happens in all of our MT examples.

Assuming an environment hasn't hit a time limit, here is what ends up happening:

The environment gets stepped, and inside the MultiEnvWrapper the inner GarageEnv wrapper adds the info entry info['TimeLimit.truncated'] = False. The outer GarageEnv wrapper then evaluates the info returned by the inner GarageEnv wrapper, which contains the entry info['TimeLimit.truncated'] = False. The outer GarageEnv wrapper now enters the if statement that you quoted above:

    if 'TimeLimit.truncated' in info:
        info['GarageEnv.TimeLimitTerminated'] = True
        done = not info['TimeLimit.truncated']

It now sets info['GarageEnv.TimeLimitTerminated'] = True and done = not False, which is the incorrect behavior. done should only be modified when info['TimeLimit.truncated'] = True, never when it is False.

That is the logic behind why I made this change.
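
The double-wrapping failure described above can be reproduced without garage or gym. The sketch below is a simplified model, not the library code: `old_garage_step` and `fixed_garage_step` are hypothetical names, they only model how `done` and the `TimeLimit.truncated` key are handled, and they leave out the `GarageEnv.TimeLimitTerminated` bookkeeping.

```python
def old_garage_step(done, info):
    """Pre-fix handling of `done` (simplified model of GarageEnv.step)."""
    info = dict(info)
    if 'TimeLimit.truncated' in info:
        done = not info['TimeLimit.truncated']
    else:
        info['TimeLimit.truncated'] = False
    return done, info


def fixed_garage_step(done, info):
    """Proposed handling: only touch `done` when truncation really happened."""
    info = dict(info)
    if 'TimeLimit.truncated' in info:
        if info['TimeLimit.truncated']:
            done = False
    else:
        info['TimeLimit.truncated'] = False
    return done, info


# An ordinary mid-episode step: no time limit hit, env not done.
raw_done, raw_info = False, {}

inner_done, inner_info = old_garage_step(raw_done, raw_info)
outer_done, _ = old_garage_step(inner_done, inner_info)
print(inner_done, outer_done)  # False True

inner_done, inner_info = fixed_garage_step(raw_done, raw_info)
outer_done, _ = fixed_garage_step(inner_done, inner_info)
print(inner_done, outer_done)  # False False
```

With the old logic the outer wrapper reports `done = True` on a step where nothing terminated, which ends every episode after one step.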

isn't the bug then that it was wrapped twice with GarageEnv? what should make me believe this won't break the non-pathological case where there's only one GarageEnv?

why not just detect double-wrapping and then vomit an error?

> isn't the bug then that it was wrapped twice with GarageEnv?

I thought this was a valid use case, which is why I wrote MultiEnvWrapper the way that I did, with nested GarageEnvs.

> what should make me believe this won't break the non-pathological case where there's only one GarageEnv

There are 3 input cases here afaict, if we assume that we are handling only the output of the TimeLimit wrapper:

1. 'TimeLimit.truncated' = True and done = True: we enter the proposed if statement, `if info['TimeLimit.truncated']:`, and set done = False.
2. 'TimeLimit.truncated' = False and done = True: we don't enter the newly proposed if statement, and done is not modified.
3. 'TimeLimit.truncated' isn't in env_infos: we don't enter the newly proposed if statement, and done is not modified.
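
The three cases can be walked through with a small table-driven check. This is a sketch under the assumption that the proposed branch reduces to "set done = False only when the key is present and True"; `fixed_done` is a hypothetical helper, not a garage function.

```python
def fixed_done(done, info):
    """Return the `done` a single GarageEnv would pass along after the fix."""
    if info.get('TimeLimit.truncated', False):
        return False
    return done


cases = [
    # (info from the TimeLimit wrapper, done in, expected done out)
    ({'TimeLimit.truncated': True}, True, False),   # case 1: truncated
    ({'TimeLimit.truncated': False}, True, True),   # case 2: real termination
    ({}, False, False),                             # case 3: key absent
]

for info, done_in, expected in cases:
    result = fixed_done(done_in, info)
    print(info, done_in, '->', result)
    assert result == expected
```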

If it's a valid use case, it's really strange. What about the definition of GarageEnv makes it seem like it would be intended to be used twice? That would be really surprising to any user.

In any case, it's apparently not a valid use case, because GarageEnv is apparently not idempotent. So you either need to make GarageEnv idempotent, or you can leave it non-idempotent, but detect double-wrapping and throw an error.

Here's the truth table as I see it:

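| `'TimeLimit.truncated' in info` | `info['TimeLimit.truncated']` | `done` | `done =` |
|---|---|---|---|
| `False` | N/A | `False` | `False` |
| `False` | N/A | `True` | `True` |
| `True` | `False` | `False` | `True` |
| `True` | `False` | `True` | `True` |
| `True` | `True` | `False` | `False` |
| `True` | `True` | `True` | `False` |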

As long as your fix implements this truth table for arbitrary nestings of GarageEnv (or only one wrapping, but throws errors on double-wrapping), I'm happy.

Please prove it with a test, since obviously this is getting pretty subtle.

gotcha. The change is meant to implement this table. Let me go ahead and implement a test that checks this on master and on this release branch.
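
A test of the nesting property might look like the sketch below. It is not taken from the repository: `fixed_garage_step` is a simplified, hypothetical stand-in for the patched branch of `GarageEnv.step`, and the check covers only idempotency of `done` and of the `TimeLimit.truncated` key, that is, wrapping two or more times must give the same result as wrapping once.

```python
import itertools


def fixed_garage_step(done, info):
    """Simplified model of the patched `done` handling (an assumption)."""
    info = dict(info)
    if 'TimeLimit.truncated' in info:
        if info['TimeLimit.truncated']:
            done = False
    else:
        info['TimeLimit.truncated'] = False
    return done, info


def nest(done, info, depth):
    """Apply the wrapper logic `depth` times, innermost first."""
    for _ in range(depth):
        done, info = fixed_garage_step(done, info)
    return done, info


def test_nested_wrapping_is_idempotent():
    infos = [{}, {'TimeLimit.truncated': False}, {'TimeLimit.truncated': True}]
    for info, done in itertools.product(infos, [False, True]):
        once = nest(done, info, depth=1)
        for depth in (2, 3, 4):
            assert nest(done, info, depth) == once


test_nested_wrapping_is_idempotent()
```

Running the same loop against the pre-fix logic (`done = not info['TimeLimit.truncated']` whenever the key is present) fails at depth 2 for an empty `info` with `done = False`, which is the MT example configuration discussed in this thread.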

ryanjulian · 2020-09-09T18:37:03Z

src/garage/torch/algos/mtsac.py

@@ -93,8 +97,9 @@ def __init__(
            reward_scale=1.0,
            optimizer=torch.optim.Adam,
            steps_per_epoch=1,
-            num_evaluation_trajectories=5,
-    ):
+            # yapf: disable


The CI was complaining about the placement of the ): characters. It reported invalid syntax whenever YAPF placed them automatically.

just place # yapf: disable on line 101 to prevent yapf from messing with the entire args block.
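
yapf's README documents a block form, `# yapf: disable` followed later by `# yapf: enable`, that leaves everything between the two comments unformatted. A minimal sketch, assuming that block form; `build_algo` is a hypothetical function standing in for the real `__init__`, and where exactly the comment ended up in mtsac.py is not shown here.

```python
# yapf: disable
def build_algo(reward_scale=1.0,
               steps_per_epoch=1,
               num_evaluation_trajectories=5):
    """Hypothetical stand-in for a long constructor signature."""
    return dict(reward_scale=reward_scale,
                steps_per_epoch=steps_per_epoch,
                num_evaluation_trajectories=num_evaluation_trajectories)
# yapf: enable


print(build_algo(steps_per_epoch=2))
```

The comments have no effect at runtime; they only tell the formatter to skip the region.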

new question

ryanjulian · 2020-09-09T19:18:04Z

If you are going to fix a ton of issues in one PR, please document all of them extensively in the commit message. Right now, the PR title is misleading.

avnishn · 2020-09-09T19:23:37Z

> If you are going to fix a ton of issues in one PR, please document all of them extensively in the commit message. Right now, the PR title is misleading.

Sorry, I'll take care of that.

krzentner · 2020-09-09T22:52:21Z

Please address Ryan's comments, but otherwise this change looks good to me.

Backport #1905, #1975, #1908 to fix problems with max_eval_path_length not being used by mtsac and sac, and add checking for an incorrect num_tasks being set in mtsac.

Timelimit.truncated modified only when necessary

This issue occurs when multiple garage envs are nested, or when timelimit truncated = False is included in the environment's info keys. Previously, our timelimit truncated logic was written on the assumption that the key was only added when a time limit truncation occurred. If an environment already had timelimit truncated = False in its keys, the previous behavior was to set Done = True, which is incorrect. That was causing performance degradation in MTSAC and MTPPO/TRPO. Now Done is only modified when timelimit truncated is True, never when it is False.

avnishn · 2020-09-11T22:54:57Z

@ryanjulian I investigated whether the GarageEnv idempotency fix would need to be ported to master, and it doesn't. GymEnv can only wrap envs that implement gym.Env, and Eric rewrote MultiEnvWrapper to no longer use GymEnv internally.

avnishn requested a review from a team as a code owner September 8, 2020 21:06

avnishn requested review from ryanjulian and removed request for a team September 8, 2020 21:06

avnishn linked an issue Sep 8, 2020 that may be closed by this pull request

MTSAC not learning #2028

Closed

mergify bot requested review from a team, zequnyu and yeukfu and removed request for a team September 8, 2020 21:06

avnishn changed the base branch from master to release-2020.06 September 8, 2020 21:54

yeukfu approved these changes Sep 8, 2020

mergify bot requested a review from a team September 8, 2020 22:44

avnishn closed this Sep 8, 2020

avnishn reopened this Sep 8, 2020

avnishn force-pushed the avnish-fix-nested-garage-env branch from d506339 to c438663 Compare September 9, 2020 00:30

ryanjulian approved these changes Sep 9, 2020

ryanjulian previously approved these changes Sep 9, 2020

ryanjulian reviewed Sep 9, 2020

avnishn mentioned this pull request Sep 9, 2020

Backport #1905, #1975, #1908 #2002

Closed

avnishn force-pushed the avnish-fix-nested-garage-env branch from c438663 to 1884720 Compare September 9, 2020 20:05

This was linked to issues Sep 9, 2020

Backport #1975 to release-2020.06 #1988

Closed

Backport #1908 to release-2020.06 #1950

Closed

ryanjulian linked an issue Sep 9, 2020 that may be closed by this pull request

Backport #1905 to release-2020.06 #1949

Closed

ryanjulian changed the title ~~Timelimit.truncated modified only when necessary~~ Backport #1975, #1908, and #1905; Address #2028 Sep 9, 2020

ryanjulian changed the title ~~Backport #1975, #1908, and #1905; Address #2028~~ Backport #1975, #1908, and #1905; Fix #2028 Sep 9, 2020

zequnyu approved these changes Sep 10, 2020

mergify bot requested a review from a team September 10, 2020 16:50

avnishn force-pushed the avnish-fix-nested-garage-env branch from 1884720 to 98c0f63 Compare September 11, 2020 19:53

avnishn force-pushed the avnish-fix-nested-garage-env branch from 98c0f63 to e3638d1 Compare September 11, 2020 20:34

avnishn added the ready-to-merge label Sep 11, 2020

mergify bot merged commit 5ce8850 into release-2020.06 Sep 11, 2020

mergify bot deleted the avnish-fix-nested-garage-env branch September 11, 2020 22:55

avnishn mentioned this pull request Sep 11, 2020

MTSAC not learning #2028

Closed


