Save and Restore Environment State #402
To be more clear: let's say I am training on a simulation of a game that has a failure state where the character dies and the game resets to the beginning of the level. The additional training needs to focus on the situation that caused the failure state. Because the machine is forced to waste time going back through the entire simulation, it is not learning anything during that replay. Even worse, if the simulation uses randomness, it may take a very long time for the machine to ever encounter a similar state and get a chance to try alternative actions and discover their results. Ideally, the environment could be saved and restored at any point. I want to build a training algorithm that explores backward from failures, focusing exploration around failure states, but that requires the ability to save and restore the environment at any point. All that is needed is the ability to serialize and deserialize the environment state:
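A minimal sketch of the kind of API being asked for. The helper names `serialize_env`/`deserialize_env` are hypothetical, and pickling like this only works for environments whose state is ordinary Python data, not C-level simulators:

```python
import pickle

def serialize_env(env):
    """Snapshot an environment to bytes (hypothetical helper; only valid
    for envs whose state lives entirely in picklable Python objects)."""
    return pickle.dumps(env)

def deserialize_env(blob):
    """Restore an environment from a snapshot produced by serialize_env."""
    return pickle.loads(blob)
```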
Not in general. Some kinds of environments allow this, but it's environment-specific. For MuJoCo environments, the state can be gotten at with `env.data`. For Atari environments something similar is possible, but you may have to dig into the Atari / ALE code to get it right.
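As a sketch of what digging into the ALE side might look like, assuming a standard gym Atari env where the raw emulator is reachable as `env.unwrapped.ale` and exposes `cloneState()`/`restoreState()`:

```python
def save_atari_state(env):
    # Grab the emulator-internal state. Note cloneState() does not capture
    # everything (e.g. the emulator's RNG); for full determinism ALE also
    # provides cloneSystemState()/restoreSystemState().
    return env.unwrapped.ale.cloneState()

def restore_atari_state(env, state):
    # Put the emulator back exactly where it was when the state was cloned.
    env.unwrapped.ale.restoreState(state)
```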
Note that for Atari/ALE the …

I'm trying to reconcile …
This MCTS implementation is an example of a solution that uses this kind of technique. They just use Python's …
@tlbtlbtlb, @shelhamer, @colinmorris Thanks for your input! It's too bad there isn't a general solution for this. A standard method to accomplish this for each major environment would be useful to the whole community.
Hello @shelhamer, any update on this? |
I got it to work (in CartPole at least; it should work elsewhere) using …
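Presumably the approach was along these lines (a sketch using Python's built-in `copy.deepcopy`, which works for pure-Python envs like CartPole but generally fails, or silently misbehaves, for envs wrapping C simulators such as MuJoCo or ALE):

```python
import copy

def snapshot(env):
    # Deep-copy the whole env object, state included, so stepping the
    # original leaves the copy untouched.
    return copy.deepcopy(env)

# Usage sketch (assumes gym is installed):
#   env = gym.make("CartPole-v1")
#   env.reset()
#   saved = snapshot(env)
#   env.step(env.action_space.sample())   # diverge...
#   env = saved                           # ...and roll back
```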
@adamwtow …
@tlbtlbtlb, I know this is an old issue. Let me know if you would like me to start a new one.
Is there a better way to do this than iterating through all the attributes of …?
I think the best way to do this with MuJoCo is the following (from the mujoco-py reference: https://openai.github.io/mujoco-py/build/html/reference.html):
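Based on that reference, the pattern is roughly the following (a sketch; assumes a mujoco-py based env where the simulator is reachable as `env.unwrapped.sim`):

```python
def save_mujoco_state(env):
    # MjSim.get_state() returns an MjSimState snapshot
    # (time, qpos, qvel, act, udd_state).
    return env.unwrapped.sim.get_state()

def restore_mujoco_state(env, state):
    env.unwrapped.sim.set_state(state)
    # Recompute derived quantities after overwriting the raw state.
    env.unwrapped.sim.forward()
```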
For CartPole: `import gym` …
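For CartPole specifically, the physics state is just a small array on the raw env, so it can be copied directly (a sketch; assumes the classic-control convention that `env.unwrapped.state` holds the state):

```python
def save_cartpole_state(env):
    # CartPole's state is (x, x_dot, theta, theta_dot).
    return tuple(env.unwrapped.state)

def restore_cartpole_state(env, state):
    # CartPole's step() just unpacks self.state, so a tuple works here.
    env.unwrapped.state = state
```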
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
@henrycharlesworth @christopherhesse Is it possible to retrieve the state if we only have the observation? This minimal working example:
…
will result in:
…
Actually for HalfCheetah it's clear how to use …
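For reference, gym's MuJoCo envs expose `set_state(qpos, qvel)`, and for HalfCheetah the position/velocity vectors are enough to reconstruct the dynamics. A sketch (assumes `env.unwrapped.sim.data` holds `qpos`/`qvel`, as in mujoco-py; note this drops simulator time and actuator activations, which some envs need):

```python
def save_qpos_qvel(env):
    # Copy the generalized positions and velocities.
    data = env.unwrapped.sim.data
    return data.qpos.copy(), data.qvel.copy()

def restore_qpos_qvel(env, saved):
    qpos, qvel = saved
    # set_state writes qpos/qvel and recomputes derived quantities.
    env.unwrapped.set_state(qpos, qvel)
```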
I have a problem with `env.ale.restoreState(s)`: I can save and load game states within the same emulator instance, but if I save game states to disk and then try to restore them, the program crashes with a …
Did you solve it? I have the same problem.
```
WARNING:py.warnings:/usr/local/lib/python3.10/dist-packages/ipykernel/ipkernel.py:283: DeprecationWarning: …
WARNING:py.warnings:/usr/local/lib/python3.10/dist-packages/gym/spaces/box.py:128: UserWarning: WARN: Box bound precision lowered by casting to float32
WARNING:py.warnings:/usr/local/lib/python3.10/dist-packages/gym/core.py:317: DeprecationWarning: WARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set …
WARNING:py.warnings:/usr/local/lib/python3.10/dist-packages/gym/wrappers/step_api_compatibility.py:39: DeprecationWarning: WARN: Initializing environment in old step API which returns one bool instead of two. It is recommended to set …

AttributeError                            Traceback (most recent call last)
in <cell line: 7>()
mjsim.pyx in mujoco_py.cymj.MjSim.set_state()
AttributeError: 'numpy.ndarray' object has no attribute 'time'
```
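The `AttributeError: 'numpy.ndarray' object has no attribute 'time'` suggests a flat numpy array was passed to `MjSim.set_state()`, which expects an `MjSimState` object with `time`, `qpos`, `qvel`, `act`, and `udd_state` fields. A sketch of rebuilding such an object from a flat vector (using a local stand-in namedtuple with the same fields; real code would use `from mujoco_py import MjSimState` instead):

```python
from collections import namedtuple

# Stand-in mirroring the fields of mujoco_py.MjSimState.
MjSimState = namedtuple("MjSimState", ["time", "qpos", "qvel", "act", "udd_state"])

def to_sim_state(flat, nq, nv):
    # Interpret a flat vector laid out as [time, qpos (nq entries), qvel (nv entries)].
    time = flat[0]
    qpos = flat[1:1 + nq]
    qvel = flat[1 + nq:1 + nq + nv]
    return MjSimState(time=time, qpos=qpos, qvel=qvel, act=None, udd_state={})
```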
Is it possible to save the environment state (for example, every 10 steps) so that the environment can later be restored to that state?
This could be useful for retrying alternative actions without having to replay the preceding environment states that led to that state.
It would be especially useful for learning from the state space that preceded a failure, to focus on learning actions that avoid failure in risky states.
Most reinforcement learning systems restart the environment from scratch upon failure, but it would be much better to restore to a state n steps before the failure and learn from that important region of the state space.
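The idea can be sketched as a rolling snapshot buffer: keep the last few snapshots, and on failure restore one from n snapshots back instead of resetting. The snapshot mechanism itself is abstracted as user-supplied `save_fn`/`restore_fn` callables, since the thread shows it is environment-specific:

```python
from collections import deque

class RollbackBuffer:
    """Keep a snapshot every `every` steps; restore n snapshots before failure."""

    def __init__(self, save_fn, restore_fn, every=10, capacity=20):
        self.save_fn, self.restore_fn = save_fn, restore_fn
        self.every = every
        self.snapshots = deque(maxlen=capacity)
        self.t = 0

    def step(self, env):
        # Call once per env step; records a snapshot on the schedule.
        if self.t % self.every == 0:
            self.snapshots.append(self.save_fn(env))
        self.t += 1

    def rollback(self, env, n=1):
        # Discard the n most recent snapshots (keeping at least one),
        # then restore the env from the newest remaining snapshot.
        for _ in range(n):
            if len(self.snapshots) > 1:
                self.snapshots.pop()
        self.restore_fn(env, self.snapshots[-1])
```

On failure, calling `rollback(env, n)` drops the training run back into the risky region just before the failure instead of starting the episode over.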