Shift in actions sequence between agent and simulator #238
I'm working on a model-based reinforcement learning project on Upkie, and I have observed a shift between the action sequence provided by the agent and the action sequence executed by the simulator. I think this is caused by the `Spine::simulate(nb_substeps)` function, where three actuation cycles are executed during the reset. To illustrate, I ran the following code and displayed the action (torque) and the observation (`left_wheel_velocity`) inside `env.step()`.

I compared the results with the observation and action inputs of `Spine::cycle_actuation()` (`Spine.cpp`) and `BulletInterface::cycle()` (`BulletInterface.cpp`), and summarized the comparison in the following table. Could you please help me with this? Additional information:
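To put a number on such a shift from logs, one option is to align the two recorded sequences and search for the integer offset that matches them best. The sketch below assumes both sequences were logged at the same substep rate into plain vectors (for instance the torques sent from `env.step()` and the torques received on the spine side). `estimate_lag` is a hypothetical helper written for illustration, not part of Upkie.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not an Upkie API): returns the shift k, 0 <= k <= max_lag,
// that best aligns the executed sequence with the sent one, i.e.
// executed[i + k] ~ sent[i], by minimizing the mean squared difference
// over the samples where both values exist.
int estimate_lag(const std::vector<double>& sent,
                 const std::vector<double>& executed, int max_lag) {
  assert(max_lag >= 0);
  int best_lag = 0;
  double best_error = std::numeric_limits<double>::infinity();
  for (int k = 0; k <= max_lag; ++k) {
    const std::size_t shift = static_cast<std::size_t>(k);
    if (shift >= executed.size()) break;
    // Number of indices i for which both sent[i] and executed[i + k] exist.
    const std::size_t overlap = std::min(sent.size(), executed.size() - shift);
    if (overlap == 0) break;
    double error = 0.0;
    for (std::size_t i = 0; i < overlap; ++i) {
      const double diff = executed[i + shift] - sent[i];
      error += diff * diff;
    }
    error /= static_cast<double>(overlap);
    if (error < best_error) {  // strict: ties keep the smaller lag
      best_error = error;
      best_lag = k;
    }
  }
  return best_lag;
}
```

For example, with `sent = {1, 2, 3, 4, 5, 6, 7, 8}` and `executed = {0, 0, 0, 1, 2, 3, 4, 5}`, the function returns 3. Because the overlap shrinks as `k` grows, `max_lag` should stay small compared to the sequence length, otherwise a large lag can win on just a handful of samples.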
Thank you for taking a look at the details here. That is indeed correct: with the Bullet spine, there is a delay of 3 substep durations between the observation dictionary and the internal simulation state.

Details on the three-cycle reset behavior

Upon reset, `Spine::simulate` cycles the actuation three times, but that is a consequence rather than a cause of how `Spine::cycle_actuation` is implemented. The internal state of the spine (looking at `actuation_output_` and `latest_replies_`) during these three steps looks like this:

(Because `actuation_output_` is a promise, I wrote between brackets the values that will actually become available at the next call.)

So far we haven't considered reducing this delay. For instance, in the PPO balancer we instead add delays to training environments (although that's an action delay rather than an observation one). The spine was mainly designed for 1 ms substeps (to match the real robot), but in the setting you describe, substeps are much longer. So I guess your question is: can we reduce the delay between the simulation state and the observation dictionary?

Reducing substeps delay within the Bullet interface

Looking at `BulletInterface::cycle()`:

```cpp
read_joint_sensors();  // currently
read_imu_data(imu_data_, bullet_, robot_, imu_link_index_, params_.dt);
send_commands(data);
bullet_.stepSimulation();
```

We would need to double check that this doesn't break other things. Stepping last looks like a waste, but I may have forgotten other decision factors behind this 😅
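Why the position of the simulation step matters can be seen on a toy model. The sketch below is not Upkie code: `ToySim`, `cycle_step_last` and `cycle_read_last` are hypothetical stand-ins, with a first-order system in place of Bullet. `cycle_step_last` mimics the current order (read sensors, send commands, step), while `cycle_read_last` assumes a reordering to send commands, step, then read sensors.

```cpp
#include <cassert>

// Hypothetical first-order system standing in for the physics engine.
struct ToySim {
  double velocity = 0.0;
  double command = 0.0;
  double dt = 0.01;
  void send_command(double u) { command = u; }
  void step() { velocity += command * dt; }
  double read() const { return velocity; }
};

// Current order: the returned observation predates this cycle's command.
double cycle_step_last(ToySim& sim, double u) {
  const double observation = sim.read();
  sim.send_command(u);
  sim.step();
  return observation;
}

// Assumed reordering: the returned observation already includes this cycle's step.
double cycle_read_last(ToySim& sim, double u) {
  sim.send_command(u);
  sim.step();
  return sim.read();
}

// Applies a unit command from cycle 0 onwards and returns the index of the
// first cycle whose observation is nonzero, or -1 if none within num_cycles.
template <typename Cycle>
int first_response_cycle(Cycle cycle, int num_cycles) {
  assert(num_cycles > 0);
  ToySim sim;
  for (int k = 0; k < num_cycles; ++k) {
    if (cycle(sim, 1.0) != 0.0) return k;
  }
  return -1;
}
```

In this toy model, `first_response_cycle(cycle_step_last, 5)` returns 1 while `first_response_cycle(cycle_read_last, 5)` returns 0, so stepping before reading removes one cycle of delay within the interface. It only illustrates the effect of the step position inside one cycle; whether the same reordering would shorten the three-substep delay of the actual spine, and what else it might break, is not established by this sketch.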