
Save and Restore Environment State #402

Closed
ricklove opened this issue Nov 13, 2016 · 16 comments

@ricklove

Is it possible to save the environment state (for example, every 10 steps) so that the environment can later be restored to a saved state?

This could be useful for retrying alternative actions without having to replay the whole sequence of steps that led to that state.

It would be especially useful for learning from the state space that preceded a failure, to focus training on actions that avoid failure in risky states.

Most reinforcement learning systems seem to restart the environment from scratch upon failure, but it would be much better to restore to a state n steps before the failure and learn from that important region of the state space.

@ricklove
Author

To be more clear:

Let's say I am training on a simulation of a game that has a failure state where the character dies and the game resets to the beginning of the level.

The additional training needs to focus on the situation that caused the failure. Because the machine is forced to waste time replaying the entire simulation from the start, it is not learning anything during that replay.

Even worse, if the simulation uses randomness, it may take a very long time for the machine to ever encounter a similar state again and have a chance to try alternative actions and discover their results.

Ideally, the environment could be saved and restored at any point.

I want to build a training algorithm that explores backward from failures, to focus exploration around failure states. However, it would require the ability to save and restore the environment at any point.

All that is needed is the ability to serialize and deserialize the environment state:

  • envData = SaveEnvironment(env)
  • env = RestoreEnvironment(envData)
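
A minimal sketch of what such an interface could look like, assuming the environment has no native snapshot support and happens to be deep-copyable (the function names follow the proposal above; the deepcopy fallback is an assumption, not a gym API):

    import copy

    def SaveEnvironment(env):
        # Hypothetical generic snapshot: a deep copy captures all Python-side
        # state, but fails for environments backed by external processes or C state.
        return copy.deepcopy(env)

    def RestoreEnvironment(env_data):
        # Return a fresh copy of the snapshot so the snapshot itself
        # can be restored from repeatedly.
        return copy.deepcopy(env_data)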

@tlbtlbtlb
Contributor

Not in general. Some kinds of environments allow this, but it's environment-specific. For Mujoco environments, the state can be gotten at with env.data. For Atari environments, something like

   snapshot = env.ale.cloneState()
   ...
   env.ale.restoreState(snapshot)

but you may have to dig into the Atari / ALE code to get it right.
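
A self-contained sketch of that pattern (assuming an ALE-backed env; going through env.unwrapped in case gym.make returns a wrapped env):

    import gym

    env = gym.make("Breakout-v0")
    env.reset()

    # Snapshot the emulator. cloneState() captures the game state only;
    # see the next comment for cloneSystemState(), which also captures the RNG.
    snapshot = env.unwrapped.ale.cloneState()

    for _ in range(10):
        env.step(env.action_space.sample())

    # Rewind the emulator and branch into different actions.
    env.unwrapped.ale.restoreState(snapshot)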

@shelhamer
Contributor

Note that for Atari/ALE the cloneSystemState/restoreSystemState methods are necessary to serialize the complete state (including pseudorandomness) if you want a fully deterministic return to the saved state.

I'm trying to reconcile atari-py and the gym with the version of ALE required to do this right, and will make a PR soon.
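
In code, the difference looks like this (a sketch; same caveats as above):

    # cloneState()/restoreState() snapshot only the game state, so stochastic
    # emulator behaviour can diverge after restoring. The system-state variants
    # also serialize the pseudorandom number generator:
    snapshot = env.unwrapped.ale.cloneSystemState()
    # ... take some steps ...
    env.unwrapped.ale.restoreSystemState(snapshot)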

@colinmorris
Contributor

This MCTS implementation is an example of a solution that uses this kind of technique. They just use Python's copy module to make copies of the env.

@ricklove
Author

@tlbtlbtlb, @shelhamer, @colinmorris Thanks for your input!

It's too bad there isn't a general solution for this. I would think it would be useful to the whole community to define a standard method for accomplishing this in each major environment.

@yenchenlin

Hello @shelhamer, any update on this?

@adamwtow

adamwtow commented Feb 8, 2017

I got it to work (in CartPole at least; it should work elsewhere) using save_point_env = copy.deepcopy(env). If you are doing any rendering, make sure you close the viewer after every render with env.render(close=True).
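
A sketch of that approach (a deep copy works for pure-Python environments like CartPole; environments wrapping external processes, C state, or open viewers generally cannot be deep-copied):

    import copy
    import gym

    env = gym.make("CartPole-v0")
    env.reset()

    save_point_env = copy.deepcopy(env)   # snapshot the whole environment

    env.step(0)                           # diverge from the snapshot

    # Roll back by copying the snapshot again, so the snapshot stays reusable.
    env = copy.deepcopy(save_point_env)
    env.step(1)                           # try an alternative action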

@justheuristic
Contributor

justheuristic commented Apr 25, 2017

@adamwtow
Confirmed, here's a small wrapper for this - url (built for the MCTS tutorial).

@ethanabrooks
Contributor

ethanabrooks commented Dec 26, 2018

@tlbtlbtlb, I know this is an old issue. Let me know if you would like me to start a new one.

> For Mujoco environments, the state can be gotten at with env.data.

Is there a better way to do this than iterating through all the attributes of sim.data and doing some kind of try/except to copy them?

@henrycharlesworth

I think the best way to do this with Mujoco is the following (from the mujoco-py reference: https://openai.github.io/mujoco-py/build/html/reference.html). You just do:

    saved_state = env.sim.get_state()

Then to restore:

    env.sim.set_state(saved_state)
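
Put together as a runnable sketch (assuming a mujoco-py-backed env such as HalfCheetah-v3):

    import gym

    env = gym.make("HalfCheetah-v3")
    env.reset()

    # MjSimState snapshot: time, qpos, qvel, act, udd_state.
    saved_state = env.sim.get_state()

    for _ in range(50):
        env.step(env.action_space.sample())

    # Restore, then recompute derived quantities (body positions, contacts, ...).
    env.sim.set_state(saved_state)
    env.sim.forward()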

@sebtac

sebtac commented May 7, 2019

For CartPole:

    import gym
    env = gym.make("CartPole-v0")
    env.reset()
    print(env.env.state)
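
Since CartPole's entire state is that small array, saving and restoring can be plain assignment (an assumption that holds for the classic-control envs, which keep all of their state in Python attributes):

    import copy
    import gym

    env = gym.make("CartPole-v0")
    env.reset()

    saved = copy.copy(env.env.state)   # (x, x_dot, theta, theta_dot)

    env.step(env.action_space.sample())

    env.env.state = saved              # restore by assignment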

@stale

stale bot commented Sep 28, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@DanielTakeshi

DanielTakeshi commented Feb 8, 2021

@henrycharlesworth @christopherhesse

Is it possible to retrieve the state if we only have the observation? This minimum working example:

    import gym
    env = gym.make('HalfCheetah-v3')
    o = env.reset()
    env.sim.set_state(o)

will result in:

Traceback (most recent call last):
  File "mwe.py", line 4, in <module>
    env.sim.set_state(o)
  File "mujoco_py/mjsim.pyx", line 271, in mujoco_py.cymj.MjSim.set_state
AttributeError: 'numpy.ndarray' object has no attribute 'time'

Actually, for HalfCheetah it's clear how to combine qpos and qvel to form the observation. But the MjSimState also has time, act, and udd_state properties.
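
The short answer is that set_state needs a full MjSimState, not an observation array, and for HalfCheetah-v3 the observation under-determines the state (by default it excludes the root x-position). A sketch of building a state explicitly, assuming mujoco_py's MjSimState constructor:

    import numpy as np
    import gym
    from mujoco_py import MjSimState

    env = gym.make('HalfCheetah-v3')
    env.reset()

    # A full state needs qpos (length model.nq) and qvel (length model.nv),
    # plus time, act, and udd_state; the observation does not carry all of these.
    qpos = np.zeros(env.sim.model.nq)
    qvel = np.zeros(env.sim.model.nv)
    state = MjSimState(time=0.0, qpos=qpos, qvel=qvel, act=None, udd_state={})

    env.sim.set_state(state)
    env.sim.forward()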

@samiede

samiede commented Jul 5, 2021

I have a problem with env.ale.restoreState(s):

I can save and load game states within the same emulator instance, but if I save game states to disk and then try to restore them, the program crashes with Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
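
One likely cause: the object returned by cloneState()/cloneSystemState() is a handle into the live emulator, so writing it to disk and loading it in a fresh process leaves it pointing at freed memory. Some atari-py builds expose encodeState/decodeState for converting a state to and from a plain byte buffer (this is an assumption about your ALE version; check that these methods exist before relying on them):

    import numpy as np

    ale = env.unwrapped.ale

    # Serialize the snapshot into plain bytes before writing to disk.
    snapshot = ale.cloneSystemState()
    buf = np.asarray(ale.encodeState(snapshot), dtype=np.uint8)
    with open('snapshot.bin', 'wb') as f:
        f.write(buf.tobytes())

    # Later, in a process that has loaded the same ROM:
    with open('snapshot.bin', 'rb') as f:
        buf = np.frombuffer(f.read(), dtype=np.uint8)
    ale.restoreSystemState(ale.decodeState(buf))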

@hahahaCode

> @henrycharlesworth @christopherhesse
>
> Is it possible to retrieve the state if we only have the observation? This minimum working example:
>
>     import gym
>     env = gym.make('HalfCheetah-v3')
>     o = env.reset()
>     env.sim.set_state(o)
>
> will result in:
>
>     Traceback (most recent call last):
>       File "mwe.py", line 4, in <module>
>         env.sim.set_state(o)
>       File "mujoco_py/mjsim.pyx", line 271, in mujoco_py.cymj.MjSim.set_state
>     AttributeError: 'numpy.ndarray' object has no attribute 'time'
>
> Actually, for HalfCheetah it's clear how to combine qpos and qvel to form the observation. But the MjSimState also has time, act, and udd_state properties.

Did you solve it? I have the same problem.

@rebecajohn

    WARNING:py.warnings:/usr/local/lib/python3.10/dist-packages/ipykernel/ipkernel.py:283: DeprecationWarning: should_run_async will not call transform_cell automatically in the future. Please pass the result to transformed_cell argument and any exception that happen during the transform in preprocessing_exc_tuple in IPython 7.17 and above.
      and should_run_async(code)

    WARNING:py.warnings:/usr/local/lib/python3.10/dist-packages/gym/spaces/box.py:128: UserWarning: WARN: Box bound precision lowered by casting to float32
      upp_bounded = ~self.bounded_below & self.bounded_above

    WARNING:py.warnings:/usr/local/lib/python3.10/dist-packages/gym/core.py:317: DeprecationWarning: WARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set new_step_api=True to use new step API. This will be the default behaviour in future.
      @AbstractMethod

    WARNING:py.warnings:/usr/local/lib/python3.10/dist-packages/gym/wrappers/step_api_compatibility.py:39: DeprecationWarning: WARN: Initializing environment in old step API which returns one bool instead of two. It is recommended to set new_step_api=True to use new step API. This will be the default behaviour in future.

    AttributeError                            Traceback (most recent call last)
    in <cell line: 7>()
          6
          7 for obs in observations:
    ----> 8     env.sim.set_state(obs)
          9     env.render(mode='rgb_array')
         10

    mjsim.pyx in mujoco_py.cymj.MjSim.set_state()

    AttributeError: 'numpy.ndarray' object has no attribute 'time'
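
This is the same misuse as above: env.sim.set_state() was given an observation array. Following the earlier answers, a sketch of the fix (the observations list and env are assumed from the traceback):

    # Save full MjSimState snapshots instead of observations:
    states = []
    for _ in range(100):
        env.step(env.action_space.sample())
        states.append(env.sim.get_state())

    # Replay later:
    for s in states:
        env.sim.set_state(s)
        env.sim.forward()
        env.render(mode='rgb_array')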
