Updated Observation Space for SawyerReachPushPickPlaceEnv #89
Conversation
Fix for issue Farama-Foundation#39. A detailed explanation of the logic behind these changes is found here: Farama-Foundation#39 (comment)
Out of an abundance of caution, you should first run garage SAC on this environment (random_init=True) and ensure that it can still train Reach and Push (PickPlace is flaky, I think).
Is it possible to write a test that verifies that your fix works, and that would break the previous committed code? It's hard to merge fixes like this without something like that.
I looked at the gym implementation here, and it looks like they set their action space based on an actuator's control range:
If I understand correctly, @ryanjulian and others were discussing the use of arm + workspace constraints (#39) to analytically determine the observation space (as opposed to empirically). As such, could we make use of something* like that?

*If you search this page for `ctrlrange`, you'll find that many objects include it as a property. I suppose we may not use that one in particular, but there's got to be something in the XML that can help us parameterize this.
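As a sketch of the idea, parameterizing bounds from the model XML could look something like this. The XML fragment and actuator names below are hypothetical, not taken from the actual Sawyer model files:

```python
import xml.etree.ElementTree as ET

# Hypothetical MuJoCo-style XML fragment; the real sawyer .xml files differ.
MODEL_XML = """
<mujoco>
  <actuator>
    <motor name="arm_joint_1" ctrlrange="-1.0 1.0"/>
    <motor name="arm_joint_2" ctrlrange="-0.5 0.5"/>
  </actuator>
</mujoco>
"""

def actuator_ctrl_ranges(xml_string):
    """Map each actuator name to its (low, high) control range."""
    root = ET.fromstring(xml_string)
    ranges = {}
    for motor in root.iter("motor"):
        low, high = (float(v) for v in motor.get("ctrlrange").split())
        ranges[motor.get("name")] = (low, high)
    return ranges

ranges = actuator_ctrl_ranges(MODEL_XML)
```

The same pattern would work for any other XML attribute (geom `pos`/`size`, joint limits, etc.) if `ctrlrange` turns out not to be the right one.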
If you look at
Combining the pos and size, we get the X & Y obj values that @adibellathur found here. The same goes for Z obj values, which can be found in
This seems to confirm his numbers for
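The arithmetic being described, sketched with made-up `pos`/`size` values (the real numbers live in the env's .xml files): MuJoCo's `size` attribute gives half-extents, so a box geom centered at `pos` spans `pos - size` to `pos + size` along each axis.

```python
# Hypothetical geom parameters: a box centered at `pos` with half-extents `size`.
# These values are illustrative, not the actual table geometry.
pos = [0.0, 0.6, 0.02]    # x, y, z center of the workspace box
size = [0.4, 0.2, 0.02]   # half-widths along x, y, z

# Object position bounds follow directly: center +/- half-extent per axis.
low = [p - s for p, s in zip(pos, size)]
high = [p + s for p, s in zip(pos, size)]
```

With these toy numbers, the object's X range is -0.4..0.4, Y is 0.4..0.8, and Z is 0.0..0.04; the same combination of `pos` and `size` from the real XML would yield the X/Y/Z obj bounds discussed above.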
Hello all, @avnishn the test for this change was in the previous PR, but due to confusion about getting the observation space for the active env in an ML1 env, that push was not approved. I will make a change and update that PR, hopefully allowing for a test that does not change MultiTask Env's handling of the observation space. Once that is done I can write a test script here. @haydenshively Currently I'm using the .xml files to get a lot of the observation space bounds. I'm just using experimentation to verify they are correct (both visually and through error checking). It was also easier to explain issues with experimentation. Also, as far as I am aware, the action_space and observation_space are two separate entities. The action space gives the range of possible actions the model can take (for example, sawyer_reach_push_pick_and_place uses it to set the mocap values), while the observation space gives the range of positions the objects can be in (e.g. robot arm and puck xyz values). Maybe there is something we can find in the gym implementation though, so I will take a look there. Thanks!
Yeah, action_space and observation_space are definitely separate; I was just trying to find examples of using XML to parameterize things. But if you've already been looking at it, that's great!
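A minimal sketch of the distinction (all bounds and shapes below are made-up, not metaworld's actual values):

```python
# Action space: range of commands the policy may output each step
# (e.g. mocap xyz deltas plus a gripper effort). Bounds are hypothetical.
action_low  = [-1.0, -1.0, -1.0, -1.0]
action_high = [ 1.0,  1.0,  1.0,  1.0]

# Observation space: range of positions the env may report
# (e.g. hand xyz followed by object xyz). Bounds are hypothetical.
obs_low  = [-0.5, 0.40, 0.05, -0.4, 0.4, 0.00]
obs_high = [ 0.5, 1.00, 0.30,  0.4, 0.8, 0.04]

# The two are separate entities: different lengths, different meanings.
assert len(action_low) != len(obs_low)
```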
@ryanjulian Sounds good. @avnishn is helping me run it as my personal computer runs out of memory after 30 epochs. How long should we run SAC for? |
You should run it until the environment converges. I think that can take up to 5M steps. |
https://tensorboard.dev/experiment/Zi4NkYjNQWqUPLljyXGoIQ/ @ryanjulian, @adibellathur's results look good to me, I'm inclined to merge this PR. |
Can you send an updated tensorboard.dev with a before/after comparison? |
@ryanjulian @adibellathur here is a tensorboard with old and new runs: https://tensorboard.dev/experiment/dD6gjzVYTniVw4GbTwrsxA/#scalars
@ryanjulian the results are in on the tensorboard in the above comment. |
Fix for issue #39 for a single environment.
A detailed explanation of the logic behind these changes is found here: #39 (comment)