
Descriptions of action spaces & observation spaces #585

Open

econti opened this issue May 9, 2017 · 7 comments

@econti econti commented May 9, 2017

Do descriptions of the different environments' action spaces & observation spaces exist anywhere? For example, with Humanoid-v1 the action space is a 17-D vector that presumably maps to different body parts, but are these numbers torques, angles, etc.? The same goes for the observation space: a brief description of what the 376 dimensions correspond to would be incredibly useful.
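
For reference, the raw shapes and bounds can at least be queried from the environment itself; a minimal sketch, assuming a standard gym install with the MuJoCo environments:

import gym

env = gym.make("Humanoid-v1")
print(env.action_space)                              # e.g. Box(17,) -> one control input per actuator
print(env.action_space.low, env.action_space.high)   # per-dimension bounds
print(env.observation_space)                         # e.g. Box(376,)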

@olegklimov olegklimov commented May 9, 2017

You don't want to know this, unless you're engineering (as opposed to learning) a solution.

If you still need to know which is which, just try those actions one by one, watch the robot.
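
A minimal sketch of that "try them one by one" approach, assuming the MuJoCo envs render on your setup (the env id and magnitudes are just examples):

import numpy as np
import gym

env = gym.make("Humanoid-v1")

# Oscillate one action dimension at a time and watch which joint moves.
for i in range(env.action_space.shape[0]):
    env.reset()
    for t in range(200):
        action = np.zeros(env.action_space.shape)
        action[i] = 0.4 * np.sin(0.1 * t)   # excite only dimension i
        env.step(action)
        env.render()
    print("that was action dimension", i)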

@stevenschmatz stevenschmatz commented May 11, 2017

A description could still be useful to understand what the RL algorithm learned.

@econti econti commented May 12, 2017

I see your point @olegklimov, but to @stevenschmatz's point, just for the sake of understanding what your network has learned, it helps to know some detail about the action space and observation space. I do agree with you though that you technically don't need to know these details if you're learning a solution.

@filmo filmo commented Jul 2, 2017

I'm with @econti. I think a description would be good. There may be cases where we want to neutralize certain actions or parts of the observed state space, and it would be far easier to zero them out by consulting a description table than by iterating through them all and figuring them out manually.

@olegklimov is of course right in that it's not needed for RL learning, but I disagree that there isn't a valid use-case.
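
A minimal sketch of that kind of zeroing-out, assuming a Gym version where ObservationWrapper.observation is the override point (the masked indices below are placeholders, not a real description table):

import numpy as np
import gym

class MaskObservation(gym.ObservationWrapper):
    """Zero out selected observation dimensions (indices are hypothetical)."""
    def __init__(self, env, masked_indices):
        super().__init__(env)
        self.masked_indices = list(masked_indices)

    def observation(self, observation):
        observation = np.array(observation, copy=True)
        observation[self.masked_indices] = 0.0
        return observation

# Example: hide some arbitrary slice of the state until a description table exists.
env = MaskObservation(gym.make("Ant-v1"), masked_indices=range(27, 40))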

@erwincoumans erwincoumans commented Mar 2, 2018

I needed similar info, so I'll share what I've found out so far:

For the Ant, the observation is:

def _get_obs(self):
    return np.concatenate([
        self.sim.data.qpos.flat[2:],
        self.sim.data.qvel.flat,
        np.clip(self.sim.data.cfrc_ext, -1, 1).flat,
    ])

self.sim.data.qpos are the positions, with the first 7 elements being the 3D position (x, y, z) and orientation (quaternion x, y, z, w) of the torso, and the remaining 8 being the joint angles.

The [2:] operation removes the first 2 elements, which are the X and Y position of the agent's torso.

self.sim.data.qvel are the velocities, with the first 6 elements being the 3D linear velocity (x, y, z) and 3D angular velocity (x, y, z), and the remaining 8 being the joint velocities.

The cfrc_ext are the external forces (force x,y,z and torque x,y,z) applied to each of the links at the center of mass. For the Ant, this is 14*6: the ground link, the torso link, and 12 links for all legs (3 links for each leg).
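
A minimal sketch to sanity-check those sizes from the model itself, assuming a mujoco_py-based env (Ant-v2 here) where sim.model exposes nq, nv and nbody:

import gym

env = gym.make("Ant-v2").unwrapped
model = env.sim.model

n_qpos = model.nq - 2       # qpos minus the skipped torso x, y
n_qvel = model.nv           # all generalized velocities
n_cfrc = model.nbody * 6    # one 6-D external force/torque per body (incl. the world body)

print(n_qpos, n_qvel, n_cfrc, n_qpos + n_qvel + n_cfrc)
# expected to sum to env.observation_space.shape[0] (111 for the Ant)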

For the Humanoid, the observation adds some more fields:

def _get_obs(self):
    data = self.sim.data
    return np.concatenate([data.qpos.flat[2:],
                           data.qvel.flat,
                           data.cinert.flat,
                           data.cvel.flat,
                           data.qfrc_actuator.flat,
                           data.cfrc_ext.flat])

qfrc_actuator is likely the actuator forces; cinert seems to be the center-of-mass-based inertia and cvel the center-of-mass-based velocity.
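
The same size bookkeeping for the Humanoid, as a minimal sketch (assuming cinert is 10 numbers per body and cvel/cfrc_ext are 6 per body, as in MuJoCo's data layout):

import gym

m = gym.make("Humanoid-v2").unwrapped.sim.model

sizes = {
    "qpos[2:]":      m.nq - 2,
    "qvel":          m.nv,
    "cinert":        m.nbody * 10,   # COM-based inertia, 10 values per body
    "cvel":          m.nbody * 6,    # COM-based velocity, 6 values per body
    "qfrc_actuator": m.nv,
    "cfrc_ext":      m.nbody * 6,    # external force/torque, 6 values per body
}
print(sizes, sum(sizes.values()))    # should sum to 376 for the standard humanoid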

You can track the meaning of the actual joints from the xml file, but it requires some effort. For the humanoid, my PyBullet script that reads the MuJoCo XML file gives:

b'abdomen_z'
b'abdomen_y'
b'abdomen_x'
b'right_hip_x'
b'right_hip_z'
b'right_hip_y'
b'right_knee'
b'right_ankle_y'
b'right_ankle_x'
b'left_hip_x'
b'left_hip_z'
b'left_hip_y'
b'left_knee'
b'left_ankle_y'
b'left_ankle_x'
b'right_shoulder1'
b'right_shoulder2'
b'right_elbow'
b'left_shoulder1'
b'left_shoulder2'
b'left_elbow'
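
A minimal sketch of one way to reproduce such a listing without PyBullet, just by parsing the bundled model with the standard library (the asset path is an assumption about where gym keeps humanoid.xml):

import os
import xml.etree.ElementTree as ET

import gym.envs.mujoco

# Assumed location of the bundled model file.
xml_path = os.path.join(os.path.dirname(gym.envs.mujoco.__file__),
                        "assets", "humanoid.xml")

for joint in ET.parse(xml_path).iter("joint"):
    print(joint.get("name"))   # includes the free "root" joint of the torso

The joints should appear in the same order as they occur in qpos/qvel after the free root joint, since MuJoCo lays them out in document order.
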
@benelot benelot commented May 23, 2018

Maybe this helps some of you:
https://github.com/openai/mujoco-py/blob/master/mujoco_py/pxd/mjdata.pxd

There is a short description next to each field telling you how it is computed. Unfortunately, it might only help people working intensively with physics engines (like @erwincoumans).

@QuXinghuaNTU QuXinghuaNTU commented Jul 17, 2019

Is there any document that describes the physical meaning of the dimensions for the other environments (e.g., Walker2D, Hopper, and HalfCheetah)? The humanoid in PyBullet has 44 dimensions in its state space, but only 21 of them have an explained physical meaning. Does that mean the physical meaning of the remaining dimensions is unknown? Additionally, the XML/URDF file is a little hard to read.
