Clarification on Ant/Humanoid environments #14
Ah, and also this config file points to the Humanoid-v2, so perhaps the …
Sorry, one last question -- the configs for the hopper/walker also point to the gym v2 versions of these environments, which have early termination and alive bonuses. Do you actually use the parameterized v3 environments that don't have early termination or alive bonuses?
Thanks for the questions! We're using the v2 environments. The changes in Ant and Humanoid should be limited to truncating the observations, and the termination conditions should be the same as in the originals (Ant, Humanoid for reference). I just pushed a commit that hopefully makes this clearer from reading the code. It changes the environment registration so that the modified environments have unique names instead of overwriting the defaults, and removes environment parameters that are not actually used because we test on different versions of the environments. (It is a bit unfortunate that those unused parameters were there in the first place; thanks for catching that.)
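For readers unfamiliar with what "truncating the observations" means here, the following is a minimal sketch. The dimensions are assumptions based on gym's standard Ant-v2 layout (a 111-dimensional observation whose last 84 entries are the external contact forces, `cfrc_ext`), not code taken from this repo:

```python
import numpy as np

# Hypothetical sketch: drop the trailing contact-force block from the
# Ant observation. gym's Ant-v2 observation is 111-dimensional and its
# last 84 entries are cfrc_ext, so truncation leaves 27 dimensions.
def truncate_ant_obs(obs):
    return obs[:-84]  # keep positions/velocities, drop contact forces

full_obs = np.zeros(111)
print(truncate_ant_obs(full_obs).shape)  # (27,)
```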
Great, thanks for the quick response and clarification! It may be worth making this difference more visible somewhere, as the paper says the 1000-step versions of these environments are used, but most of these v2 environments use early termination.
We included the sentence about using the standard 1000-step benchmarks because it's common to modify them to have a shorter horizon (e.g., here and here), and this caused some of the baselines to look like they had different performance than originally reported. I can see how this is a bit underspecified now that there are newer versions of the environments that always run for 1000 steps, so I'll make a note of this in the paper. Thanks for the catch!
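For context, the standard 1000-step horizon in these benchmarks is enforced by gym's `TimeLimit` wrapper. The following is a simplified sketch of how that wrapper works (the real implementation lives in `gym.wrappers`; this illustration is not the repo's code):

```python
# Simplified sketch of gym's TimeLimit wrapper, which ends an episode
# after max_episode_steps regardless of the environment's own
# termination conditions.
class TimeLimit:
    def __init__(self, env, max_episode_steps=1000):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self):
        self._elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._elapsed_steps += 1
        if self._elapsed_steps >= self.max_episode_steps:
            # Flag that the episode ended due to the time limit rather
            # than an environment-internal termination condition.
            info['TimeLimit.truncated'] = not done
            done = True
        return obs, reward, done, info
```

The `TimeLimit.truncated` flag is what lets downstream code distinguish horizon cutoffs from genuine early termination, which matters when bootstrapping value targets.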
Ah yeah, makes sense! Also, I copied the new envs into my code and had to add …
Hi -- this dir has modified gym ~v2 environments for the ant/humanoid that have a modified observation space and early termination, and this file calls into the parameterized gym v3 environments with no early termination or time-based rewards and, if I understand correctly, also inherits the original observation space of the v3 environments. These two files seem to be in conflict with each other, as the `mbpo/env` environments don't have the same parameters as `examples/development/base.py`. Can you clarify what envs/observation spaces/rewards you use for training and report in the paper?

My current assumption is that the `base.py` code is the latest, that the `mbpo/env` code is outdated, and that you are using the `v3` envs with the default observation space there, no early termination/alive bonus, and are training on and directly reporting the reward from these environments (rather than re-running the evaluation in the default v3 environments with time bonus and ET). Is this correct?