All environments produce observations outside of observation space. #39
I would try to submit a PR for this myself, but it seems like it may require rewriting the observation boundaries for each environment, and I don't know where those values came from. Any help?
They are mostly from the MuJoCo models themselves. For the robot, they come from the kinematics of each joint. For elements of the environment, they come from the state variables of those objects. Technically, they could appear anywhere, so those values might be …
Thanks for the reply! When I said that I don't know where those values come from, I was referring to the hard-coded values of the bounds of the observation space in each environment. For example:

```python
hand_low = (-0.5, 0.40, 0.05)
hand_high = (0.5, 1, 0.5)
obj_low = (0, 0.6, 0.02)
obj_high = (0, 0.6, 0.02)
```

These values seem more or less arbitrarily chosen, and they aren't actual bounds on the observations, because MuJoCo doesn't consider these bounds and often violates them. It seems that the easiest fix would be to set each low and high value to …
I tried overwriting those values and there are still issues. I set:

```python
BIG = 10000
hand_low = (-BIG, -BIG, -BIG)
hand_high = (BIG, BIG, BIG)
obj_low = (-BIG, -BIG, -BIG)
obj_high = (BIG, BIG, BIG)
goal_low = (-BIG, -BIG, -BIG)
goal_high = (BIG, BIG, BIG)
```

Note that I think this may be emblematic of the larger problem: MuJoCo doesn't seem to interact with those observation bounds in any way, so there is no way to guarantee that the objects will actually stay within them. This isn't really something I'm personally interested in working on; I'm just going to ditch RLlib and look for/make an implementation that isn't conditioned on the observation space not being violated. I suppose I'll leave this issue open, since it does seem like this might be an issue in the future for code that expects this condition to be met.
@ryanjulian @avnishn I was working on issue #31 and realized that understanding this issue would help with that one as well. I have been recreating the initial issue with a slightly modified version of the launcher mtcrawshaw wrote above. The first change is that MuJoCo environments have a …, yet in the script it thinks high and low are arrays of …. I modified the script one more time to just hardcode the values for testing purposes; the final launcher script is below.

And I was able to recreate the error. I now have a few questions regarding how to fix this. What do these observation space upper and lower bounds represent? Will I need to add a bounds check to every step function for every environment? And how does the reward get modified? Do I limit the observation to be within the bounds, or do I raise an error of some sort? Any advice would be greatly appreciated.
@ryanjulian @avnishn Update: I have a quick fix which, at every step, checks whether the observation is within the bounds. This is essentially adding the following code to every environment's `_get_obs`:

```python
def _get_obs(self):
    hand = self.get_endeff_pos()
    objPos = self.data.get_geom_xpos('objGeom')
    flat_obs = np.concatenate((hand, objPos))
    # My fix - check if the observation space is violated
    if not self.observation_space.contains(flat_obs):
        raise ObservationSpaceBoundsError  # custom exception defined elsewhere
    return flat_obs
```

However this is not very intelligent, as it does not make the model less likely to go out of bounds, and in the case of many models (for example …). Who can I talk to to either get the real bounds of each of these environments, and/or get advice on updating the reward function to prevent the bounds from ever being broken?
@ryanjulian @avnishn I spent some more time looking at where the values used to define an environment's observation space are being updated, and they are being changed when we call …. For example, …
Hi @adibellathur -- sorry for taking so long to respond. thank you so much for making an effort to fix this. i really appreciate it. your fix in #39 (comment) is a good start, especially to help you find the roots of the problem. it would also be a great feature for an automated test which makes sure this doesn't happen again (i.e. it could be included in …). as you intuited, it's probably not a great permanent fix for a couple reasons:
(2) is why it's important to ensure that this issue is fixed by making the code correct from the start.

to be trivially-correct, you can just set all the observation spaces to +/- inf in all axes. this may seem a little daft, but is actually really common in these simulations. the observation space bounds (as opposed to the action space bounds) are not even used by common algorithms, but they can be helpful for debugging.

if you want to be a little more helpful, you can deduce some real observation space bounds which are loose but correct. these don't have to be particularly tight to be correct; for instance, you could make them all the maximum/minimum limits of the workspace or robot arm gripper reach (whichever is largest) for all dimensions, assuming that they are all cartesian positions. how tight you can make them depends on how much time you invest in determining the model's actual limits.

to determine tight bounds, you will have to delve into the MuJoCo model XMLs, probably run the simulation interactively, and use your judgement to figure out each of these. unfortunately, the original designer just didn't do this, which is how we got here. here's a quick blueprint of how this would work:
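The trivially-correct option described above can be sketched with plain NumPy (a stand-in for `gym.spaces.Box`, which the real code would use; the dimension split is illustrative):

```python
import numpy as np

OBS_DIM = 9  # illustrative: 3 hand + 3 object + 3 goal coordinates

# Trivially-correct observation bounds: +/- inf on every axis.
low = np.full(OBS_DIM, -np.inf)
high = np.full(OBS_DIM, np.inf)

def contains(x):
    """Minimal stand-in for gym.spaces.Box.contains."""
    x = np.asarray(x, dtype=float)
    return bool(x.shape == low.shape and np.all(x >= low) and np.all(x <= high))

# Any finite observation passes, so the space can never be "violated".
print(contains(np.random.uniform(-1e9, 1e9, OBS_DIM)))  # True
```

Because every finite value lies inside +/- inf bounds, no reachable simulator state can violate such a space, which is exactly why it is a safe (if uninformative) default.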
Thanks for the input Ryan. I haven't looked at this issue in a while, but I am the OP. I don't think that setting the observation bounds to +/- inf will solve the issue without some more modifications. Somewhere up in the first couple comments I wrote about how in …
Hello @ryanjulian Thank you so much for all the insight! This all makes a lot of sense. From what I can understand, the issue is really that the bounds hardcoded into the environments are not representative of the actual simulation, rather than the simulation failing to follow the hardcoded bounds. So the proper fix is to update the bounds so they more accurately reflect the actual simulation environment, as opposed to modifying the simulation so it adheres to the arbitrary bounds. Is this the correct way to look at the problem? Overall, my plan forward is as follows:
@mtcrawshaw thanks for following up so fast! and thanks for pointing out my error. you are correct -- we need to update the initial arm position sampling code to avoid this issue. @adibellathur that sounds right to me! |
hello @mtcrawshaw Thanks for the insight! I'll try seeing if there is a finite set of bounds that can be used for the observation_space, so we don't have to worry about passing …
@ryanjulian @avnishn Update on the observation space: I've been working on getting the … env right. The observation_space for this env is made up of: …

and I updated the bounds to the following values: …
For the object, the bounds needed to be on the table plane in the environment, since the initial position is randomly instantiated from these bounds (@mtcrawshaw can you confirm?), so I found the x and y bounds for the table plane and used those values. Anything outside of the table plane caused the disk to behave weirdly. The z values were found by finding the lowest and highest points of the middle of the disk, as explained in the image below (the highest z value is when the object gets tipped over and is balancing on an edge). I added a small amount of buffer, as oftentimes the bounds would be met, but due to rounding the number wasn't exact and broke the observation space bounds (for example, the resting state of the puck had a z of 0.01499 most of the time, and occasionally clipped through the bottom of the floor with a minimum z value of 0.013).

For the arm, I stuck the arm out as far as it could go, and it never seemed to top 1.0 units in any direction. I made the arm's z lower bound 0 just so it couldn't clip through the table plane. These bounds don't seem particularly strict, but since they are not used for any random initialization, and the arm is much more strongly constrained by MuJoCo, I didn't worry too much about finding the exact bounds.

So far I've been running this environment for 15000 steps on repeat for the past hour and haven't come across any times where these bounds are broken, with one exception: occasionally the arm will hit the puck, the puck will start to roll, and it then rolls off the table and breaks the bounds. I figure this should be flagged anyway, so I didn't worry about that. Let me know if this approach is correct. I can start doing this for more environments if it seems like I'm solving the problem correctly.
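The empirical procedure above (run the env for thousands of steps and watch the extremes) can be sketched as a generic helper. The Gym-style `reset`/`step`/`action_space` interface is assumed; nothing here is Metaworld-specific:

```python
import numpy as np

def track_observed_bounds(env, n_steps=15000):
    """Roll out random actions and record the per-dimension min/max actually
    observed -- a cheap sanity check for hand-tuned observation bounds.
    Assumes the classic Gym API: reset() -> obs, step(a) -> (obs, r, done, info)."""
    obs = np.asarray(env.reset(), dtype=float)
    lo, hi = obs.copy(), obs.copy()
    for _ in range(n_steps):
        obs, _, done, _ = env.step(env.action_space.sample())
        obs = np.asarray(obs, dtype=float)
        np.minimum(lo, obs, out=lo)  # running per-dimension minimum
        np.maximum(hi, obs, out=hi)  # running per-dimension maximum
        if done:
            env.reset()
    return lo, hi
```

Comparing the returned `lo`/`hi` against the declared `observation_space.low`/`.high` (plus a small rounding buffer, as described above) immediately shows which dimensions are mis-bounded.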
Fix for issue Farama-Foundation#39
A detailed explanation of the logic behind these changes is found here: Farama-Foundation#39 (comment)
Excellent work. This is definitely the correct approach. In my experience, the radius of the Sawyer is around 1.2 m at most. I think the origin of the environment is at the Sawyer base, so the gripper tip could never be much greater than 1.2. Of course, because of geometry, many arm motions are not possible at that radius (basically just pushing and pointing). If you haven't yet, you should go look at the XML files associated with each environment in the assets directory. These will contain a lot of the coordinates and sizes which you might currently be determining experimentally.
@adibellathur @haydenshively is this still an issue? |
Yes I believe it is. @adibellathur please correct me if I'm wrong, but I believe what we report as an "observation space" is really an "initialization space." Though we've changed its dimensions in the past weeks, we haven't changed its meaning. |
Yeah, the root of this issue is that self.observation_space is really a misnomer. It's the space the env is initialized within when random_init=True, not the true observation space of the environment. @haydenshively did this get renamed in the new API? If not, we should rename the variable to something else, and either leave the observation space implicit like gym does, or add a new object variable to let people access the true observation space of the env.
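The distinction being described can be sketched as follows. The names, values, and padding scheme are purely illustrative (this is not the library's actual API): a narrow box used only for sampling initial states, and a separate, looser box that is the true observation space.

```python
import numpy as np

# Hypothetical initialization box: where initial states are sampled from
# when random_init=True. Values are illustrative only.
init_low = np.array([-0.5, 0.40, 0.05])
init_high = np.array([0.5, 1.0, 0.5])

PAD = 1.0  # loose-but-correct margin around the initialization box
obs_low = init_low - PAD    # true observation space must contain every
obs_high = init_high + PAD  # state the simulation can actually reach

def contains(low, high, x):
    """Minimal stand-in for gym.spaces.Box.contains."""
    x = np.asarray(x, dtype=float)
    return bool(np.all(x >= low) and np.all(x <= high))

# Every valid initial state is also a valid observation, but not vice versa.
print(contains(obs_low, obs_high, init_low))    # True
print(contains(init_low, init_high, obs_high))  # False
```

The asymmetry in the last two checks is exactly the misnomer: reporting the initialization box as `observation_space` makes perfectly legal observations look out of bounds.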
Hi all! I was wondering if there was any more progress here? I have been trying, as a familiarizing exercise, to run RLlib's PPO on …

which does appear to be outside the bounds of the …
Hi @Shushman thanks for checking out Metaworld! To answer your question better, can you please tell us the commit of Metaworld that you are using? The reason I ask is that in the last 3 weeks, we've updated Metaworld to a brand new API and changed some things internally with the environments. To accompany this change, we've been trying to solve this issue via #181. I'm sure that there are modifications that are good places to start, and the set of modifications that you're looking for may in fact exist. @haydenshively may be able to comment more. Did this help? Lmk 😄
@Shushman and @mtcrawshaw, I believe that we have closed this issue via #181. Please do let us know if you have any more questions or problems; we're happy to help!
Thanks @avnishn I had temporarily reverted to the v1 API for now, but I'll check out v2 soon. I also wanted to confirm: the plots in the paper on arXiv are on the v1 problems, right?
That's a good question @Shushman. We don't have access to the version of Metaworld that was used to generate the plots. The version with the new API (what is most recently on master) is most true to the original paper. We strongly urge you to use that when using Metaworld for your benchmarking.
@Shushman note that the v2 envs (referring to any environment with modifications from the published work) exist in this repository but are not yet part of the benchmark. All environments in the benchmark are still faithful to the original paper. If you are referring to the "new" revised API, that's a different way of accessing exactly the same challenges from the paper. The "new" API is actually more faithful to the one used in the paper, and I urge you to use it. |
I updated to the latest master of Aug 17.
But then if I check the low and high of the observation space: …
Is this not out of bounds for the last-but-one element? (0.0 < 0.8)
The observation spaces were written and tested for the case where the environments are fully observable. When they are partially observable, the last 3 elements of the observation get zeroed out, which is what's happening here... and then the observation space is incorrect. I can push a fix tonight.
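One way to see the needed fix (this is an assumption about the shape of the patch, not the actual commit): the goal slice of the declared bounds must be widened to include the zero vector that partial observability substitutes in. Example values are illustrative.

```python
import numpy as np

# Hypothetical goal-dimension bounds whose lower limits exclude 0, so a
# zeroed-out goal (partial observability) falls outside the declared space.
goal_low = np.array([0.85, 0.4, 0.02])
goal_high = np.array([0.95, 0.9, 0.3])

# Widen each goal dimension so the interval always contains 0.
po_goal_low = np.minimum(goal_low, 0.0)
po_goal_high = np.maximum(goal_high, 0.0)

print(np.all((po_goal_low <= 0.0) & (0.0 <= po_goal_high)))  # True
```

This is the same failure mode reported just above: 0.0 < 0.8 for a goal coordinate whose declared lower bound never anticipated being zeroed.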
The following is a minimal working example which shows that all of the environments produce observations outside of their observation space. All it does is iterate over each environment from ML1, sample and set a task for the given environment, then take random actions in the environment and test whether or not the observations are inside the observation space, and at which indices (if any) an observation lies outside of the bounds of the observation space. You will get different results depending on the value of `TIMESTEPS_PER_ENV`, but setting this value to 1000 should yield violating observations for most environments. This is an issue, say, for RL implementations like RLlib which expect observations to be inside the observation space, and makes the environment incompatible with such libraries. This might be related to issue #31, though that issue only points out incorrect observation space boundaries regarding the goal coordinates, and the script below should point out that there are violations in other dimensions as well.