Clarification on Ant/Humanoid environments #14

Closed
bamos opened this issue Jan 21, 2020 · 6 comments

Comments

@bamos

bamos commented Jan 21, 2020

Hi -- this dir has modified gym ~v2 environments for the Ant/Humanoid with a modified observation space and early termination. This file, on the other hand, calls into the parameterized gym v3 environments with no early termination or time-based rewards and, if I understand correctly, also inherits the original observation space of the v3 environments. The two seem to conflict, as the mbpo/env environments don't have the same parameters as examples/development/base.py. Can you clarify which envs/observation spaces/rewards you use for training and for the results reported in the paper?

My current assumption is that the base.py code is the latest and the mbpo/env code is outdated: you are using the v3 envs with their default observation space and no early termination/alive bonus, and you train on and directly report the reward from these environments (rather than re-running the evaluation in the default v3 environments with time bonus and early termination). Is this correct?

@bamos
Author

bamos commented Jan 21, 2020

Ah, and also this config file points to Humanoid-v2, so perhaps the mbpo/env version is being used somewhere?

@bamos
Author

bamos commented Jan 21, 2020

Sorry, one last question -- the configs for the Hopper/Walker also point to the gym v2 versions of these environments, which have early termination and alive bonuses. Do you actually use the parameterized v3 environments that don't have early termination or alive bonuses?

@jannerm
Owner

jannerm commented Jan 22, 2020

Thanks for the questions!

We're using the v2 environments. The changes in Ant and Humanoid should be limited to truncating the observations, and the termination conditions should be the same as in the originals (Ant, Humanoid for reference).
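As an illustration of what "truncating the observations" can look like while leaving termination untouched, here is a minimal sketch. It assumes the truncation simply keeps a leading slice of the observation vector; the names `TruncatedObsWrapper`, `keep`, and `ToyEnv` are hypothetical and not taken from the repository.

```python
class ToyEnv:
    """Hypothetical stand-in environment with a 6-dimensional observation."""

    def reset(self):
        self.t = 0
        return [0.0] * 6

    def step(self, action):
        self.t += 1
        obs = [float(self.t)] * 6
        reward = 1.0
        done = self.t >= 3  # the toy env's own termination condition
        return obs, reward, done, {}


class TruncatedObsWrapper:
    """Keeps only the first `keep` observation entries (an assumed scheme).

    Rewards, termination flags, and info dicts pass through unchanged.
    """

    def __init__(self, env, keep):
        self.env = env
        self.keep = keep

    def _truncate(self, obs):
        return list(obs[:self.keep])

    def reset(self):
        return self._truncate(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._truncate(obs), reward, done, info


env = TruncatedObsWrapper(ToyEnv(), keep=4)
first_obs = env.reset()
obs, reward, done, info = env.step(action=None)
```

Only the observation shape changes here; the wrapped environment still decides when an episode ends.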

I just pushed a commit that hopefully makes this clearer from reading the code. It changes the environment registration so that the modified environments have unique names instead of overwriting the defaults, and removes environment parameters that are not actually used because we test on different versions of the environments. (It is a bit unfortunate that those unused parameters were there in the first place; thanks for catching that.)
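To make the registration change concrete, here is a rough sketch of registering modified environments under their own ids rather than reusing the default ones. The ids, module paths, and helper names are hypothetical placeholders, not the ones in the commit. The register function is passed in as an argument so that the sketch runs without gym installed.

```python
# Hypothetical specs: ids and entry points are placeholders for illustration.
CUSTOM_ENV_SPECS = (
    {'id': 'AntTruncatedObs-v2',
     'entry_point': 'my_pkg.env.ant:AntTruncatedObsEnv'},
    {'id': 'HumanoidTruncatedObs-v2',
     'entry_point': 'my_pkg.env.humanoid:HumanoidTruncatedObsEnv'},
)

# Default ids that the modified environments should not reuse.
DEFAULT_IDS = frozenset({'Ant-v2', 'Humanoid-v2'})


def register_custom_envs(specs, register_fn, reserved_ids=DEFAULT_IDS):
    """Register each spec via `register_fn`, refusing to shadow a default id."""
    registered = []
    for spec in specs:
        if spec['id'] in reserved_ids:
            raise ValueError(
                f"{spec['id']} would overwrite a default environment")
        register_fn(**spec)
        registered.append(spec['id'])
    return tuple(registered)


# Record registrations in a plain dict instead of touching a real registry.
fake_registry = {}


def fake_register(id, **kwargs):
    fake_registry[id] = kwargs


registered_ids = register_custom_envs(CUSTOM_ENV_SPECS, fake_register)
```

With gym installed, passing `gym.envs.registration.register` as `register_fn` would be the analogous call, since it accepts `id` and `entry_point` as keyword arguments.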

@bamos
Author

bamos commented Jan 22, 2020

Great, thanks for the quick response and clarification! It may be worth making this difference more visible somewhere, as the paper says the 1000-step versions of these environments are used, but most of these v2 environments use early termination.

@bamos bamos closed this as completed Jan 22, 2020
@jannerm
Owner

jannerm commented Jan 22, 2020

We included the sentence about using the standard 1000-step benchmarks because it's common to modify them to have a shorter horizon (e.g., here and here), and this caused some of the baselines to look like they had different performance than originally reported. I can see how this is a bit underspecified now that there are newer versions of the environments that always run for 1000 steps, so I'll make a note of this in the paper. Thanks for the catch!

@bamos
Author

bamos commented Jan 22, 2020

Ah yeah, makes sense! Also, I copied the new envs into my code and had to add 'max_episode_steps': 1000 to the specs to get the time-limited versions of the environments by default.
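For readers unfamiliar with the effect of that key, here is a small sketch. In gym, a `max_episode_steps` entry in a registered spec makes `gym.make` apply a time limit; the wrapper below is a simplified stand-in for that behavior, not gym's implementation. The spec id, entry point, and class names are hypothetical; only the `'max_episode_steps': 1000` entry comes from the comment above.

```python
# Hypothetical spec; only the 'max_episode_steps' entry is from the thread.
spec = {
    'id': 'AntTruncatedObs-v2',
    'entry_point': 'my_pkg.env.ant:AntTruncatedObsEnv',
    'max_episode_steps': 1000,
}


class NeverDoneEnv:
    """Toy environment that never terminates on its own."""

    def reset(self):
        return 0.0

    def step(self, action):
        return 0.0, 0.0, False, {}


class TimeLimitSketch:
    """Simplified time limit: ends the episode once the step budget is used."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_episode_steps:
            done = True  # force the episode to end at the horizon
        return obs, reward, done, info


env = TimeLimitSketch(NeverDoneEnv(), spec['max_episode_steps'])
env.reset()
steps = 0
done = False
while not done:
    _, _, done, _ = env.step(action=None)
    steps += 1
```

Without the limit, the toy environment above would run forever, which mirrors why the key is needed to get 1000-step episodes by default.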
