Clarification on Ant/Humanoid environments #14
Ah, and also this config file points to the Humanoid-v2, so perhaps the …
Sorry, one last question -- the configs for the hopper/walker also point to the gym v2 versions of these environments, which have early termination and alive bonuses. Do you actually use the parameterized v3 environments that don't have early termination or alive bonuses?
Thanks for the questions! We're using the v2 environments. The changes in Ant and Humanoid should be limited to truncating the observations, and the termination conditions should be the same as in the originals (Ant, Humanoid for reference). I just pushed a commit that hopefully makes this clearer from reading the code. It changes the environment registration so that the modified environments have unique names instead of overwriting the defaults, and removes environment parameters that are not actually used because we test on different versions of the environments. (It is a bit unfortunate that those unused parameters were there in the first place; thanks for catching that.)
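For readers unfamiliar with what "truncating the observations" means here, the following is a minimal sketch. The dimensions are assumptions based on gym's standard Ant-v2 layout (a 111-dimensional observation whose last 84 entries are the external contact forces, `cfrc_ext`), not code taken from this repo:

```python
import numpy as np

# Hypothetical sketch: drop the trailing contact-force block from the
# Ant observation. gym's Ant-v2 observation is 111-dimensional and its
# last 84 entries are cfrc_ext, so truncation leaves 27 dimensions.
def truncate_ant_obs(obs):
    return obs[:-84]  # keep positions/velocities, drop contact forces

full_obs = np.zeros(111)
print(truncate_ant_obs(full_obs).shape)  # (27,)
```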
Great, thanks for the quick response and clarification! It may be worth making this difference more visible somewhere, as the paper says the 1000-step versions of these environments are used, but most of these v2 environments use early termination.
We included the sentence about using the standard 1000-step benchmarks because it's common to modify them to have a shorter horizon (e.g., here and here), and this caused some of the baselines to look like they had different performance than originally reported. I can see how this is a bit underspecified now that there are newer versions of the environments that always run for 1000 steps, so I'll make a note of this in the paper. Thanks for the catch!
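For context, the standard 1000-step horizon in these benchmarks is enforced by gym's `TimeLimit` wrapper. The following is a simplified sketch of how that wrapper works (the real implementation lives in `gym.wrappers`; this illustration is not the repo's code):

```python
# Simplified sketch of gym's TimeLimit wrapper, which ends an episode
# after max_episode_steps regardless of the environment's own
# termination conditions.
class TimeLimit:
    def __init__(self, env, max_episode_steps=1000):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self):
        self._elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed_steps += 1
        if self._elapsed_steps >= self.max_episode_steps:
            # Flag that the episode ended due to the time limit rather
            # than an environment-internal termination condition.
            info['TimeLimit.truncated'] = not done
            done = True
        return obs, reward, done, info
```

The `TimeLimit.truncated` flag is what lets downstream code distinguish horizon cutoffs from genuine early termination, which matters when bootstrapping value targets.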
Ah yeah, makes sense! Also, I copied the new envs into my code and had to add …
Hi -- this dir has modified gym ~v2 environments for the ant/humanoid that have a modified observation space and early termination, and this file calls into the parameterized gym v3 environments with no early termination or time-based rewards and, if I understand correctly, also inherits the original observation space of the v3 environments. These two files seem to be in conflict with each other, as the `mbpo/env` environments don't have the same parameters as `examples/development/base.py`. Can you clarify what envs/observation spaces/rewards you use for training and report in the paper?

My current assumption is that the `base.py` code is the latest, that the `mbpo/env` code is outdated, and that you are using the `v3` envs with the default observation space there, no early termination/alive bonus, and are training on and directly reporting the reward from these environments (rather than re-running the evaluation in the default v3 environments with time bonus and ET). Is this correct?