Save and Restore Environment State #402
To be more clear: let's say I am training on a simulation of a game that has a failure state where the character dies and the game resets to the beginning of the level. The additional training needs to focus on the situation that caused the failure state. Because the machine is forced to waste time going back through the entire simulation, it is not learning anything during that replay. Even worse, if the simulation uses randomness, it may take a very long time for the machine to ever encounter a similar state and get a chance to try alternative actions and discover their results. Ideally, the environment could be saved and restored at any point. I want to build a training algorithm that explores backward from failures, focusing exploration around failure states, but that requires the ability to save and restore the environment at any point. All that is needed is the ability to serialize and deserialize the environment state:
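A minimal sketch of the kind of API being asked for. The helper names `serialize_env`/`deserialize_env` are hypothetical, and pickling like this only works for environments whose state is ordinary Python data, not C-level simulators:

```python
import pickle

def serialize_env(env):
    """Snapshot an environment to bytes (hypothetical helper; only valid
    for envs whose state lives entirely in picklable Python objects)."""
    return pickle.dumps(env)

def deserialize_env(blob):
    """Restore an environment from a snapshot produced by serialize_env."""
    return pickle.loads(blob)
```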
Not in general. Some kinds of environments allow this, but it's environment-specific. For MuJoCo environments, the state can be gotten at with `env.data`. For Atari environments something similar is possible, but you may have to dig into the Atari / ALE code to get it right.
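As a sketch of what digging into the ALE side might look like, assuming a standard gym Atari env where the raw emulator is reachable as `env.unwrapped.ale` and exposes `cloneState()`/`restoreState()`:

```python
def save_atari_state(env):
    # Grab the emulator-internal state. Note cloneState() does not capture
    # everything (e.g. the emulator's RNG); for full determinism ALE also
    # provides cloneSystemState()/restoreSystemState().
    return env.unwrapped.ale.cloneState()

def restore_atari_state(env, state):
    # Put the emulator back exactly where it was when the state was cloned.
    env.unwrapped.ale.restoreState(state)
```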
Note that for Atari/ALE the …

I'm trying to reconcile …
This MCTS implementation is an example of a solution that uses this kind of technique. They just use Python's …
@tlbtlbtlb, @shelhamer, @colinmorris Thanks for your input! It's too bad there isn't a general solution for this. A standard method to accomplish this for each major environment would be useful to the whole community.
Hello @shelhamer, any update on this? |
I got it to work (in CartPole at least; it should work elsewhere) using …
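Presumably the approach was along these lines (a sketch using Python's built-in `copy.deepcopy`, which works for pure-Python envs like CartPole but generally fails, or silently misbehaves, for envs wrapping C simulators such as MuJoCo or ALE):

```python
import copy

def snapshot(env):
    # Deep-copy the whole env object, state included, so stepping the
    # original leaves the copy untouched.
    return copy.deepcopy(env)

# Usage sketch (assumes gym is installed):
#   env = gym.make("CartPole-v1")
#   env.reset()
#   saved = snapshot(env)
#   env.step(env.action_space.sample())   # diverge...
#   env = saved                           # ...and roll back
```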
@adamwtow …
@tlbtlbtlb, I know this is an old issue. Let me know if you would like me to start a new one.
Is there a better way to do this than iterating through all the attributes of …?
I think the best way to do this with MuJoCo is the following (from the mujoco-py reference: https://openai.github.io/mujoco-py/build/html/reference.html):
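Based on that reference, the pattern is roughly the following (a sketch; assumes a mujoco-py based env where the simulator is reachable as `env.unwrapped.sim`):

```python
def save_mujoco_state(env):
    # MjSim.get_state() returns an MjSimState snapshot
    # (time, qpos, qvel, act, udd_state).
    return env.unwrapped.sim.get_state()

def restore_mujoco_state(env, state):
    env.unwrapped.sim.set_state(state)
    # Recompute derived quantities after overwriting the raw state.
    env.unwrapped.sim.forward()
```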
For CartPole: `import gym` …
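For CartPole specifically, the physics state is just a small array on the raw env, so it can be copied directly (a sketch; assumes the classic-control convention that `env.unwrapped.state` holds the state):

```python
def save_cartpole_state(env):
    # CartPole's state is (x, x_dot, theta, theta_dot).
    return tuple(env.unwrapped.state)

def restore_cartpole_state(env, state):
    # CartPole's step() just unpacks self.state, so a tuple works here.
    env.unwrapped.state = state
```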
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
@henrycharlesworth @christopherhesse Is it possible to retrieve the state if we only have the observation? This minimal working example:
…
will result in:
…
Actually for HalfCheetah it's clear how to use …
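For reference, gym's MuJoCo envs expose `set_state(qpos, qvel)`, and for HalfCheetah the position/velocity vectors are enough to reconstruct the dynamics. A sketch (assumes `env.unwrapped.sim.data` holds `qpos`/`qvel`, as in mujoco-py; note this drops simulator time and actuator activations, which some envs need):

```python
def save_qpos_qvel(env):
    # Copy the generalized positions and velocities.
    data = env.unwrapped.sim.data
    return data.qpos.copy(), data.qvel.copy()

def restore_qpos_qvel(env, saved):
    qpos, qvel = saved
    # set_state writes qpos/qvel and recomputes derived quantities.
    env.unwrapped.set_state(qpos, qvel)
```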
I have a problem with `env.ale.restoreState(s)`: I can save and load game states within the same emulator instance, but if I save game states to disk and then try to restore them, the program crashes with a …
Did you solve it? I have the same problem.
```
WARNING:py.warnings:/usr/local/lib/python3.10/dist-packages/ipykernel/ipkernel.py:283: DeprecationWarning: …
WARNING:py.warnings:/usr/local/lib/python3.10/dist-packages/gym/spaces/box.py:128: UserWarning: WARN: Box bound precision lowered by casting to float32
WARNING:py.warnings:/usr/local/lib/python3.10/dist-packages/gym/core.py:317: DeprecationWarning: WARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set …
WARNING:py.warnings:/usr/local/lib/python3.10/dist-packages/gym/wrappers/step_api_compatibility.py:39: DeprecationWarning: WARN: Initializing environment in old step API which returns one bool instead of two. It is recommended to set …

AttributeError                            Traceback (most recent call last)
in <cell line: 7>()
mjsim.pyx in mujoco_py.cymj.MjSim.set_state()
AttributeError: 'numpy.ndarray' object has no attribute 'time'
```
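The `AttributeError: 'numpy.ndarray' object has no attribute 'time'` suggests a flat numpy array was passed to `MjSim.set_state()`, which expects an `MjSimState` object with `time`, `qpos`, `qvel`, `act`, and `udd_state` fields. A sketch of rebuilding such an object from a flat vector (using a local stand-in namedtuple with the same fields; real code would use `from mujoco_py import MjSimState` instead):

```python
from collections import namedtuple

# Stand-in mirroring the fields of mujoco_py.MjSimState.
MjSimState = namedtuple("MjSimState", ["time", "qpos", "qvel", "act", "udd_state"])

def to_sim_state(flat, nq, nv):
    # Interpret a flat vector laid out as [time, qpos (nq entries), qvel (nv entries)].
    time = flat[0]
    qpos = flat[1:1 + nq]
    qvel = flat[1 + nq:1 + nq + nv]
    return MjSimState(time=time, qpos=qpos, qvel=qvel, act=None, udd_state={})
```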
Is it possible to save the environment state (for example, every 10 steps) so that the environment can later be restored to that state?
This could be useful for retrying alternative actions without having to replay the preceding environment states that led to that state.
It would be especially useful for learning from the state space that preceded a failure, to focus on learning actions that avoid failure in risky states.
Most reinforcement learning systems restart the environment from scratch upon failure, but it would be much better to restore to a state n steps before the failure and learn from that important region of the state space.
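The idea can be sketched as a rolling snapshot buffer: keep the last few snapshots, and on failure restore one from n snapshots back instead of resetting. The snapshot mechanism itself is abstracted as user-supplied `save_fn`/`restore_fn` callables, since the thread shows it is environment-specific:

```python
from collections import deque

class RollbackBuffer:
    """Keep a snapshot every `every` steps; restore n snapshots before failure."""

    def __init__(self, save_fn, restore_fn, every=10, capacity=20):
        self.save_fn, self.restore_fn = save_fn, restore_fn
        self.every = every
        self.snapshots = deque(maxlen=capacity)
        self.t = 0

    def step(self, env):
        # Call once per env step; records a snapshot on the schedule.
        if self.t % self.every == 0:
            self.snapshots.append(self.save_fn(env))
        self.t += 1

    def rollback(self, env, n=1):
        # Discard the n most recent snapshots (keeping at least one),
        # then restore the env from the newest remaining snapshot.
        for _ in range(n):
            if len(self.snapshots) > 1:
                self.snapshots.pop()
        self.restore_fn(env, self.snapshots[-1])
```

On failure, calling `rollback(env, n)` drops the training run back into the risky region just before the failure instead of starting the episode over.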