Hello,
right now the viewer.launch function expects its policy parameter to be a function mapping an observation to an action.
Pressing the Return key resets the environment, but the policy may also hold internal state that should be reset at the same time.
How can this functionality be added?
The only workaround I could think of is to explicitly add time to the observation, which seems a bit weird since it requires modifying the environment itself.
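To make the request concrete, here is a minimal sketch of the behaviour I have in mind. It assumes the object handed to the policy exposes a first() method and an observation attribute, as a dm_env-style TimeStep does; I have not confirmed that this is what the viewer passes. ResettablePolicy, FakeTimeStep and count_steps are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Tuple


class ResettablePolicy:
    """Callable policy that re-creates its internal state at each episode start."""

    def __init__(self,
                 act_fn: Callable[[Any, Any], Tuple[Any, Any]],
                 init_state_fn: Callable[[], Any]):
        self._act_fn = act_fn                # (observation, state) -> (action, new_state)
        self._init_state_fn = init_state_fn  # () -> fresh state
        self._state = init_state_fn()

    def __call__(self, time_step):
        # Assumption: time_step.first() is True on the first step after a reset.
        if time_step.first():
            self._state = self._init_state_fn()
        action, self._state = self._act_fn(time_step.observation, self._state)
        return action


class FakeTimeStep:
    """Stand-in for a real time step, only so the sketch runs on its own."""

    def __init__(self, is_first: bool, observation: Any = None):
        self._is_first = is_first
        self.observation = observation

    def first(self) -> bool:
        return self._is_first


def count_steps(observation, state):
    """Toy policy: the action is the number of steps since the last reset."""
    return state, state + 1
```

If that assumption holds, an instance such as ResettablePolicy(count_steps, lambda: 0) could be passed as the policy argument of viewer.launch without touching the environment, since the reset is detected from the time step rather than from a time value added to the observation.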
Thank you!