
Rewards Not Zero after Done #43

Closed
kayuksel opened this issue Aug 27, 2021 · 6 comments


@kayuksel

Hello,

I have extended the PyTorch example with an Augmented Random Search implementation:
https://github.com/kayuksel/braxars/blob/main/braxars_multi.py

However, I have noticed that the reward of a batch member is not zero after it is done.
What values are returned for "done" members? Should I treat their rewards as zero?
For now I reset the whole environment when that happens, since I couldn't find how to reset only the done members.

Another question I have is about rendering. Is rendering only possible from a notebook?
Is it possible to render a selected (e.g. best-performing) batch member, or all of them in the same render?

@lebrice
Contributor

lebrice commented Aug 27, 2021

When using the gym.vector.VectorEnv API (which is what you're using in this case since batch_size=2048), you don't need to reset the individual envs because they already auto-reset when the episode in any given env is done.

Also just FYI:

  • when you have done=True for the environment at a given index env_idx, then obs[env_idx] is the first observation of the next episode, not the final observation of the previous episode.
  • Likewise, the reward for env env_idx when done[env_idx]==True is the reward for the last action you sent. Rewards at that index can stay non-zero after that step, since those will be the rewards of the next episode!
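To make the semantics above concrete, here is a minimal toy sketch of gym.vector-style auto-reset behavior (this is an illustration, not Brax's actual implementation; `ToyAutoResetEnv` and its 3-step episodes are invented for the example):

```python
class ToyAutoResetEnv:
    """Toy single env with auto-reset: episodes last 3 steps, reward 1 per step."""
    EPISODE_LEN = 3

    def __init__(self):
        self.t = 0  # step counter within the current episode

    def step(self):
        self.t += 1
        reward, done = 1.0, self.t >= self.EPISODE_LEN
        if done:
            # Auto-reset: the returned observation already belongs to the
            # *next* episode, while reward/done still describe the last action.
            self.t = 0
            obs = 0  # first observation of the new episode
        else:
            obs = self.t
        return obs, reward, done

env = ToyAutoResetEnv()
results = [env.step() for _ in range(4)]
# On the step where done=True, the reward is still non-zero (it rewards the
# final action), and the observation is already the reset observation.
```

In this sketch, `results[2]` is `(0, 1.0, True)`: a non-zero reward paired with `done=True` and the next episode's first observation, which matches the behavior described above.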

Hope this helps :)

@kayuksel
Author

@lebrice Thank you very much for the help!
But if they're in auto-reset mode, done[env_idx] should be False at the start of the next episode.
I checked the value at the same env_idx on the next step, but it was still True. Is that a bug?

@lebrice
Contributor

lebrice commented Aug 28, 2021

done[env_idx] of next episode should be False.

No! (Edit: maybe I'm misunderstanding your problem though)

Are you saying that you get done[env_idx]==True multiple steps in a row for the same env_idx?

@kayuksel
Author

Are you saying that you get done[env_idx]==True multiple steps in a row for the same env_idx?

@lebrice Yes.

@cdfreeman-google
Collaborator

We have some new reset logic plumbing that should resolve this issue in the next couple of days. Originally, we found that we didn't really need to reset during rollouts: we'd just run a rollout for a fixed episode length and then mask out frames after the point at which a done was triggered. Admittedly, this is not what folks are used to, so we'll introduce a wrapper that does the actual resetting, as you'd usually expect.
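The masking approach described above can be sketched in a few lines (a hedged illustration; `post_done_mask` is a hypothetical helper, not part of Brax's API):

```python
def post_done_mask(dones):
    """Build a validity mask for one env's rollout: frames are valid (1.0) up
    to and including the step where `done` first becomes True, invalid (0.0)
    after that. `dones` is a list of per-step done flags."""
    mask, alive = [], True
    for d in dones:
        mask.append(1.0 if alive else 0.0)
        if d:
            alive = False
    return mask

# Rewards collected after the first done belong to the next episode,
# so the mask zeroes them out of the episode return.
rewards = [0.5, 0.5, 1.0, 9.9, 9.9]   # last two belong to the next episode
mask = post_done_mask([False, False, True, False, True])
episode_return = sum(r * m for r, m in zip(rewards, mask))  # 2.0
```

This is why repeated done=True flags after the first one are harmless under the fixed-length-rollout scheme: everything past the first done is masked to zero anyway.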

@erikfrey
Collaborator

OK! This should be addressed: envs now reset by default after done=True. If you wish to control auto-resetting yourself, you can still get the old behavior by calling envs.create(..., auto_reset=False).
