Shift in actions sequence between agent and simulator #238

elasriz · 2024-03-29T16:46:32Z


I'm working on a model-based reinforcement learning project on Upkie, and I have observed a shift between the action sequence provided by the agent and the action sequence executed by the simulator.

I think this is caused by the Spine::simulate(nb_substeps) function, where 3 actuation cycles are executed during the reset.

To illustrate, I ran the following code and printed the action (torque) and the observation (left_wheel_velocity) inside env.step().

import upkie.envs


def run(env: upkie.envs.UpkieGroundVelocity):
    action = env.get_neutral_action()

    # Position commands to keep the legs extended
    action["left_hip"]["position"] = 0.0
    action["left_knee"]["position"] = 0.0
    action["right_hip"]["position"] = 0.0
    action["right_knee"]["position"] = 0.0

    # Disable velocity feedback in the wheels
    # (we don't set kp_scale as the neutral action has no position command)
    action["left_wheel"]["kd_scale"] = 0.0
    action["right_wheel"]["kd_scale"] = 0.0
    action["left_wheel"]["maximum_torque"] = 1.0
    action["right_wheel"]["maximum_torque"] = 1.0

    state, info = env.reset()  # connects to the spine

    for step in range(16):
        force = env.action_space.sample()

        action["left_wheel"]["feedforward_torque"] = +force
        action["right_wheel"]["feedforward_torque"] = -force
        _, _, terminated, truncated, info = env.step(action)
        if step == 6:
            # Reset mid-run to observe the actuation cycles executed during reset
            state, info = env.reset()

I compared the results with the observation and action inputs of Spine::cycle_actuation() (in Spine.cpp) and BulletInterface::cycle() (in BulletInterface.cpp).

I have summarized the results of the comparison in the following table:

Could you please help me with this point?

Additional information:

nb_substeps = 1
frequency = 50 (I observed the same behaviour with frequency = 200.0 as well)
Env = UpkieGroundVelocity-v3
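
For readers who want to quantify such a shift from their own logs, here is a minimal sketch. It assumes you have recorded two equal-rate sequences of scalar commands: the ones sent by the agent and the ones seen by the simulator. The function name `estimate_shift` and the sample data are hypothetical and not part of Upkie.

```python
from typing import Sequence


def estimate_shift(
    sent: Sequence[float],
    executed: Sequence[float],
    max_shift: int = 5,
) -> int:
    """Return the lag k >= 0 that best aligns sent[i] with executed[i + k].

    The lag is chosen by minimizing the mean squared error over the
    overlapping part of the two sequences.
    """
    best_shift, best_error = 0, float("inf")
    for shift in range(max_shift + 1):
        pairs = list(zip(sent, executed[shift:]))
        if not pairs:
            break  # no overlap left for larger shifts
        error = sum((s - e) ** 2 for s, e in pairs) / len(pairs)
        if error < best_error:
            best_shift, best_error = shift, error
    return best_shift


# Hypothetical log: the simulator applies each command 3 cycles late
sent = [0.1, -0.4, 0.7, 0.2, -0.9, 0.5, 0.3, -0.6]
executed = [0.0, 0.0, 0.0] + sent[:-3]
print(estimate_shift(sent, executed))  # 3
```

Random commands, as in the script above, make the best lag unambiguous. A constant command would align equally well at every lag.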

Answered by stephane-caron


stephane-caron (Maintainer) · 2024-04-02T13:42:07Z

Thank you for taking a look at the details here. That is indeed correct. With the Bullet spine, there is a delay of 3 substep durations between the observation dictionary and the internal simulation state.

Details on the three-cycle reset behavior

Upon reset, Spine::simulate cycles the actuation three times, but that is a consequence rather than a cause of how Spine::cycle_actuation is implemented. The internal state of the spine (looking at actuation_output_ and latest_replies_) during these three steps looks like this:

(Because actuation_output_ is a promise, I wrote in brackets the values that only become available at the next call.)

So far we haven't considered reducing this delay. For instance, in the PPO balancer we instead add delays to the training environments (although that's an action delay rather than an observation delay). The spine was mainly designed for 1 ms substeps (to match the real robot), but in the setting you describe substeps are much longer. So I guess your question is: can we reduce the delay between the simulation state and the observation dictionary?
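
To make the idea of an action delay concrete, here is a minimal, generic sketch of a delay buffer. It is not the code used in the PPO balancer; the class name `DelayLine` and its interface are hypothetical, chosen only for illustration.

```python
from collections import deque
from typing import Deque, Generic, TypeVar

T = TypeVar("T")


class DelayLine(Generic[T]):
    """Return each pushed item `delay` calls later (hypothetical helper).

    Until enough items have been pushed, `initial` is returned instead.
    """

    def __init__(self, delay: int, initial: T) -> None:
        if delay < 0:
            raise ValueError("delay must be non-negative")
        # Pre-fill the buffer so that the first `delay` outputs are `initial`
        self._buffer: Deque[T] = deque([initial] * delay)

    def push(self, item: T) -> T:
        """Store the newest item and return the one pushed `delay` calls ago."""
        self._buffer.append(item)
        return self._buffer.popleft()


line = DelayLine(delay=2, initial=0.0)
applied = [line.push(action) for action in [1.0, 2.0, 3.0, 4.0]]
print(applied)  # [0.0, 0.0, 1.0, 2.0]
```

In a training environment, one could pass the agent's action through such a buffer before forwarding it to the simulator, so that the policy learns under a latency similar to the one it will face at run time.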

Reducing substeps delay within the Bullet interface

Looking at BulletInterface::cycle again, one straightforward way in which we could reduce this number by one would be to step the simulation before reading sensors, rather than last as in the current order:

  read_joint_sensors();  // currently
  read_imu_data(imu_data_, bullet_, robot_, imu_link_index_, params_.dt);
  send_commands(data);
  bullet_.stepSimulation();

We would need to double-check that this doesn't break other things. Stepping last looks like a waste, but I may have forgotten other decision factors behind this 😅
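
The effect of the ordering can be illustrated on a toy model. The sketch below is not BulletInterface; `ToySim` is a hypothetical one-dimensional integrator used only to compare "read, send, step" with "step, read, send". It measures how far the returned observation lags behind the internal state at the end of each cycle.

```python
from typing import Callable, List


class ToySim:
    """Hypothetical 1-D simulator: each step adds the command to the velocity."""

    def __init__(self) -> None:
        self.velocity = 0.0
        self.command = 0.0

    def read(self) -> float:
        return self.velocity

    def send(self, command: float) -> None:
        self.command = command

    def step(self) -> None:
        self.velocity += self.command


def cycle_step_last(sim: ToySim, command: float) -> float:
    """Read sensors, send commands, then step (the current order)."""
    observation = sim.read()
    sim.send(command)
    sim.step()
    return observation


def cycle_step_first(sim: ToySim, command: float) -> float:
    """Step first, then read sensors and send commands."""
    sim.step()
    observation = sim.read()
    sim.send(command)
    return observation


def staleness(cycle: Callable[[ToySim, float], float], commands: List[float]) -> List[float]:
    """Gap between internal state and returned observation after each cycle."""
    sim = ToySim()
    gaps = []
    for command in commands:
        observation = cycle(sim, command)
        gaps.append(sim.velocity - observation)
    return gaps


print(staleness(cycle_step_last, [1.0, 1.0, 1.0]))   # [1.0, 1.0, 1.0]
print(staleness(cycle_step_first, [1.0, 1.0, 1.0]))  # [0.0, 0.0, 0.0]
```

In this toy model, stepping last leaves the returned observation one step behind the internal state, while stepping first returns an up-to-date reading. Whether the same holds in BulletInterface::cycle depends on details that are not modeled here.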
