Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated Observation Space for SawyerReachPushPickPlaceEnv #89

Conversation

adibellathur
Copy link
Contributor

Fix for issue #39 for a single environment.
A detailed explanation of the logic behind these changes is found here: #39 (comment)

Fix for issue Farama-Foundation#39
A detailed explanation of the logic behind these changes is found here: Farama-Foundation#39 (comment)
@avnishn avnishn requested a review from ryanjulian May 29, 2020 01:00
@ryanjulian
Copy link
Contributor

Out of an abundance of caution, you should first run garage SAC on this environment (random_init=True) and ensure that it can still train Reach and Push (PickPlace is flaky, i think).

@avnishn
Copy link
Contributor

avnishn commented May 29, 2020

Is it possible to write a test that verifies that your fix works, and that would break the previous committed code? It’s hard to merge fixes like this without something like that.

@haydenshively
Copy link
Contributor

haydenshively commented May 29, 2020

Our MujocoEnv class has the following interesting comment:

This is a simplified version of the gym MujocoEnv class.
Some differences are:
- Do not automatically set the observation/action space.

I looked at the gym implementation here, and it looks like they set their action space based on an actuator's control range:

bounds = self.model.actuator_ctrlrange.copy().astype(np.float32)
low, high = bounds.T
self.action_space = spaces.Box(low=low, high=high, dtype=np.float32)
return self.action_space

If I understand correctly, @ryanjulian and others were discussing the use of arm + workspace constraints (#39) to analytically determine the observation space (as opposed to empirically). As such, could we make use of something* like model.actuator_ctrlrange? It looks like the parameters in this PR are correct, but if we need to fix this for every environment, manual experimentation doesn't seem efficient.

*If you search this page for ctrlrange, you'll find that many objects include it as a property. I suppose we may not use that one in particular, but there's got to be something in the XML that can help us parameterize this

@haydenshively
Copy link
Contributor

If you look at sawyer_reach_push_pick_and_place.xml, it links to sawyer_xyz_base.xml. At the top of that file, the tabletop is defined as follows:

<geom name="tableTop" type="plane" pos="0 0.6 0" size="0.4 0.4 0.5" rgba=".6 .6 .5 1" contype="1" conaffinity="1" friction="2 0.1 0.002" material="light_wood_v3"/>

Combining the pos and size, we get the X & Y obj values that @adibellathur found here. The same goes for Z obj values, which can be found in sawyer_reach_push_pick_and_place.xml:

<geom name="objGeom" type="cylinder" pos="0 0 0" solimp="0.99 0.99 0.01" size="0.03 0.015" rgba="1 0 0 1" solref="0.01 1" contype="1" conaffinity="1" friction="1 0.1 0.002" condim="4" material="wood" />

This seems to confirm his numbers for obj_low and obj_high. Solving for hand_low and hand_high may be a bit more involved since the sawyer arm has multiple sections. In any case, we should be able to access these parameters via something like self.model.tableTop.foo, though I haven't confirmed that yet.

@adibellathur
Copy link
Contributor Author

Hello All,

@avnishn the test for this change was in the previous PR, but due to confusion on getting the observation space for the active env in an ML1 env, that push was not approved. I will make a change and update that PR, hopefully allowing for a test that does not change MultitTask Env's handling of the observation space. Once that is done I can write a test script here.

@haydenshively Currently I'm using the .xml files to get a lot of the observation space bounds. I'm just using experimentation to simply verify they are correct (both visually and through error checking). It was also easier to explain issues with experimentation. Also, as far as I am aware the action_space and observation_space are 2 separate entities. the action space tells the range for possible actions the model can take (for example sawyer_reach_push_pick_and_place uses it to set the mocap values) while observation space is the positions the objects can be in (e.g. robot arm and puck xyz values). Maybe there is something we can find in the gym implementation though so I will take a look there. Thanks!

@haydenshively
Copy link
Contributor

Yeah action_space and observation_space are definitely separate; I was just trying to find examples of using XML to parameterize things. But if you’ve already been looking at it that’s great!

@adibellathur
Copy link
Contributor Author

adibellathur commented May 30, 2020

Out of an abundance of caution, you should first run garage SAC on this environment (random_init=True) and ensure that it can still train Reach and Push (PickPlace is flaky, i think).

@ryanjulian Sounds good. @avnishn is helping me run it as my personal computer runs out of memory after 30 epochs. How long should we run SAC for?

@ryanjulian
Copy link
Contributor

You should run it until the environment converges. I think that can take up to 5M steps.

@avnishn
Copy link
Contributor

avnishn commented Jun 8, 2020

https://tensorboard.dev/experiment/Zi4NkYjNQWqUPLljyXGoIQ/

@ryanjulian, @adibellathur's results look good to me, I'm inclined to merge this PR.

@ryanjulian
Copy link
Contributor

Can you send an updated tensorboard.dev with a before/after comparison?

@avnishn
Copy link
Contributor

avnishn commented Jun 9, 2020

@ryanjulian @adibellathur here are is a tensorboard with old and new runs:

https://tensorboard.dev/experiment/dD6gjzVYTniVw4GbTwrsxA/#scalars

@avnishn
Copy link
Contributor

avnishn commented Jun 9, 2020

@ryanjulian the results are in on the tensorboard in the above comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants