Deepcopy env not working as expected #3254
This is an interesting research question. The problem is that you have a bug and an incorrect assumption.

The bug: after you take the alternative action, you need to modify your env to use the copied environment.

The incorrect assumption: that if action X fails, then reverting and taking action Y won't fail. In this example, for certain states, it is possible that both action X and action Y will cause the environment to terminate.

My code:
Repeating this experiment 1000 times, I didn't find a single case where the alternative action didn't also cause the environment to terminate.
Thanks for looking! It does seem that all terminated states cannot be replayed with a different action; I wonder why not. Anyway, you can rewind 2 actions and it won't terminate :) I'm considering storing these states and rewarding them differently. The state is on a boundary where, if one action is taken, it leads to an unrecoverable termination.

The bug still stands: I think you are running into the same bug. In my original code I do use the copied environment. So instead, I just store all the actions in a replay memory, make a new environment with the same seed, and then replay those actions until one before I terminate. Try this and you'll find states that you can recover from:
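A minimal sketch of the replay-from-seed idea, using a tiny stand-in class since the original code isn't shown above (`TinyEnv`, `replay`, and the action list are hypothetical; with Gymnasium you would call `gymnasium.make(...)` and `env.reset(seed=...)` instead):

```python
import random

class TinyEnv:
    """Hypothetical stand-in for a seeded env: fully deterministic
    given the same seed and the same sequence of actions."""
    def reset(self, seed):
        self.rng = random.Random(seed)
        self.state = self.rng.random()
        return self.state

    def step(self, action):
        self.state += (1 if action else -1) * self.rng.random()
        return self.state

def replay(seed, actions):
    # Rebuild a fresh env from the same seed and replay all but the
    # last action, landing one step before the terminating state.
    env = TinyEnv()
    env.reset(seed=seed)
    for a in actions[:-1]:
        env.step(a)
    return env

env = TinyEnv()
env.reset(seed=42)
taken = [1, 0, 1, 1]
states = [env.step(a) for a in taken]
restored = replay(42, taken)
print(restored.state == states[-2])  # True: same seed + same actions => same state
```

The point is that a seeded environment plus a recorded action list is itself a complete snapshot, with no need to deepcopy anything.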
I can't replicate the problem that you are talking about using both Gym and Gymnasium:

```python
import copy

import gym
import gymnasium

print("Gym")
env = gym.make("CartPole-v1")
env.reset()
print(env.unwrapped.state)
copied_env = copy.deepcopy(env)
env.step(env.action_space.sample())
print(copied_env.unwrapped.state)

print("Gymnasium")
env = gymnasium.make("CartPole-v1")
env.reset()
print(env.unwrapped.state)
copied_env = copy.deepcopy(env)
env.step(env.action_space.sample())
print(copied_env.unwrapped.state)
```
Here's how to see the issue:
Output to look at:
Why are parts of next_state equal to next_state2, which came from a different action on a copied environment?
Looking at the observation space (https://gymnasium.farama.org/environments/classic_control/cart_pole/#observation-space), all this means is that a single action hasn't caused the cart position or pole angle to change. But going back to your original post, this is not a bug, and deepcopy of CartPole works as expected.
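One way to check which components can match: below is a sketch of CartPole's dynamics transcribed from the Gymnasium source (the constants and equations mirror `CartPoleEnv`; the `step` helper and the starting state are my own). With the default explicit Euler integrator, the position and angle updates use the pre-step velocities, so after a single step they are identical no matter which action was taken; only the velocities depend on the applied force:

```python
import math

# Constants as in Gymnasium's CartPoleEnv source
gravity = 9.8
masscart = 1.0
masspole = 0.1
total_mass = masspole + masscart
length = 0.5                        # half the pole's length
polemass_length = masspole * length
force_mag = 10.0
tau = 0.02                          # seconds between state updates

def step(state, action):
    x, x_dot, theta, theta_dot = state
    force = force_mag if action == 1 else -force_mag
    costheta, sintheta = math.cos(theta), math.sin(theta)
    temp = (force + polemass_length * theta_dot**2 * sintheta) / total_mass
    thetaacc = (gravity * sintheta - costheta * temp) / (
        length * (4.0 / 3.0 - masspole * costheta**2 / total_mass)
    )
    xacc = temp - polemass_length * thetaacc * costheta / total_mass
    # Explicit Euler: positions advance with the *old* velocities,
    # so x and theta after one step do not depend on the action.
    x = x + tau * x_dot
    x_dot = x_dot + tau * xacc
    theta = theta + tau * theta_dot
    theta_dot = theta_dot + tau * thetaacc
    return (x, x_dot, theta, theta_dot)

s0 = (0.01, -0.02, 0.03, 0.04)      # arbitrary small starting state
left = step(s0, 0)
right = step(s0, 1)
print(left[0] == right[0], left[2] == right[2])   # position and angle match
print(left[1] == right[1], left[3] == right[3])   # velocities differ
```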
This has been bugging me. How can it be that if you go: right, right, right, right, right, right?

This isn't a new issue:
I'm not sure why the team decided back then not to change the behavior? I'd recommend doing so. If changing the behavior to semi-implicit Euler by default isn't possible, then at least document that it's the preferred method and how game engines and reality work.
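To illustrate the difference, here's a sketch on a simple harmonic oscillator (a toy example of my own, not CartPole's actual dynamics): explicit Euler steps the position with the old velocity and steadily injects energy, while semi-implicit (symplectic) Euler updates the velocity first and keeps the energy bounded, which is why game engines prefer it:

```python
def explicit_euler(x, v, acc, tau, steps):
    for _ in range(steps):
        # position uses the OLD velocity, then velocity is updated
        x, v = x + tau * v, v + tau * acc(x)
    return x, v

def semi_implicit_euler(x, v, acc, tau, steps):
    for _ in range(steps):
        v = v + tau * acc(x)
        x = x + tau * v          # position uses the UPDATED velocity
    return x, v

# Harmonic oscillator a = -x; the exact solution conserves energy.
energy = lambda x, v: 0.5 * (x * x + v * v)
e0 = energy(1.0, 0.0)
xe, ve = explicit_euler(1.0, 0.0, lambda x: -x, 0.02, 5000)
xs, vs = semi_implicit_euler(1.0, 0.0, lambda x: -x, 0.02, 5000)
print(energy(xe, ve) - e0)   # grows steadily: explicit Euler injects energy
print(energy(xs, vs) - e0)   # stays small: semi-implicit is symplectic
```

The only difference between the two integrators is the order of the two update lines, yet the long-run behavior is qualitatively different.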
Here's an attempt at being able to replay with a different action: if the step terminates, use a copy of the environment and take a different action.

Two problems:

1. `prev_env.step()` causes a WARN:
2. `print("state = ", prev_env.unwrapped.state)` will print out the NEXT STATE, meaning its internal state changed even when `.step()` was called on `env`, not `prev_env`.

Any help is appreciated!
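The fix the first reply points at, i.e. after reverting you must keep using the copy, can be sketched with a hypothetical stand-in environment (`TinyEnv` and its dynamics are made up for illustration; with Gymnasium you would `copy.deepcopy` the real env the same way):

```python
import copy
import random

class TinyEnv:
    """Made-up env: terminates when the position drifts past +/-3."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.pos = 0

    def step(self, action):
        self.pos += 1 if action else -1
        terminated = abs(self.pos) >= 3
        return self.pos, terminated

env = TinyEnv()
trajectory = []
for _ in range(10):
    snapshot = copy.deepcopy(env)      # snapshot BEFORE stepping
    action = env.rng.randint(0, 1)
    state, terminated = env.step(action)
    if terminated:
        env = snapshot                 # key step: from here on, use the copy
        state, terminated = env.step(1 - action)  # retry with the other action
    trajectory.append(state)
print(trajectory)                      # never reaches +/-3
```

Because the snapshot is taken before every step, the rollback always lands exactly one action back; the bug described above comes from continuing to step the original `env` after reverting, instead of the copy.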