
Rewards Not Zero after Done #43

Closed
kayuksel opened this issue Aug 27, 2021 · 6 comments


@kayuksel

Hello,

I have extended the PyTorch example with an Augmented Random Search implementation:
https://github.com/kayuksel/braxars/blob/main/braxars_multi.py

However, I have noticed that the reward of a batch member is not zero after it is done.
What values are returned for "done" members? Should I treat their rewards as zero?
For now I reset the whole environment when that happens, since I couldn't find how to reset only the done members.

Another question I have is about rendering. Is rendering only possible from a notebook?
Is it possible to render a selected (e.g. best-performing) batch member, or all of them in the same render?

@lebrice
Contributor

lebrice commented Aug 27, 2021

When using the gym.vector.VectorEnv API (which is what you're using in this case since batch_size=2048), you don't need to reset the individual envs because they already auto-reset when the episode in any given env is done.

Also just FYI:

  • when you have done=True for the environment at a given index env_idx, then obs[env_idx] is the first observation of the next episode, not the final observation of the previous episode.
  • Likewise, the reward for env env_idx when done[env_idx]==True is the reward for the last action you sent. Rewards at that index can stay non-zero after that step, since those will be the rewards of the next episode!
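To make the semantics above concrete, here is a minimal toy sketch of gym.vector-style auto-reset behavior (this is an illustration, not Brax's actual implementation; `ToyAutoResetEnv` and its 3-step episodes are invented for the example):

```python
class ToyAutoResetEnv:
    """Toy single env with auto-reset: episodes last 3 steps, reward 1 per step."""
    EPISODE_LEN = 3

    def __init__(self):
        self.t = 0  # step counter within the current episode

    def step(self):
        self.t += 1
        reward, done = 1.0, self.t >= self.EPISODE_LEN
        if done:
            # Auto-reset: the returned observation already belongs to the
            # *next* episode, while reward/done still describe the last action.
            self.t = 0
            obs = 0  # first observation of the new episode
        else:
            obs = self.t
        return obs, reward, done

env = ToyAutoResetEnv()
results = [env.step() for _ in range(4)]
# On the step where done=True, the reward is still non-zero (it rewards the
# final action), and the observation is already the reset observation.
```

In this sketch, `results[2]` is `(0, 1.0, True)`: a non-zero reward paired with `done=True` and the next episode's first observation, which matches the behavior described above.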

Hope this helps :)

@kayuksel
Author

@lebrice Thank you very much for the help!
But if they're in auto-reset mode, done[env_idx] should be False at the start of the next episode.
I checked the value at the same env_idx on the next step, but it was still True. Is that a bug?

@lebrice
Contributor

lebrice commented Aug 28, 2021

done[env_idx] of next episode should be False.

No! (Edit: maybe I'm misunderstanding your problem though)

Are you saying that you get done[env_idx]==True multiple steps in a row for the same env_idx?

@kayuksel
Author

Are you saying that you get done[env_idx]==True multiple steps in a row for the same env_idx?

@lebrice Yes.

@cdfreeman-google
Collaborator

We have some new reset logic plumbing that should resolve this issue in the next couple of days. Originally, we found that we didn't really need to reset during rollouts: we'd just run a rollout for a fixed episode length and then mask out frames after the point at which a done was triggered. Admittedly, this is not what folks are used to, so we'll introduce a wrapper that does the actual resetting, as you'd usually expect.
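The masking approach described above can be sketched in a few lines (a hedged illustration; `post_done_mask` is a hypothetical helper, not part of Brax's API):

```python
def post_done_mask(dones):
    """Build a validity mask for one env's rollout: frames are valid (1.0) up
    to and including the step where `done` first becomes True, invalid (0.0)
    after that. `dones` is a list of per-step done flags."""
    mask, alive = [], True
    for d in dones:
        mask.append(1.0 if alive else 0.0)
        if d:
            alive = False
    return mask

# Rewards collected after the first done belong to the next episode,
# so the mask zeroes them out of the episode return.
rewards = [0.5, 0.5, 1.0, 9.9, 9.9]   # last two belong to the next episode
mask = post_done_mask([False, False, True, False, True])
episode_return = sum(r * m for r, m in zip(rewards, mask))  # 2.0
```

This is why repeated done=True flags after the first one are harmless under the fixed-length-rollout scheme: everything past the first done is masked to zero anyway.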

@erikfrey
Collaborator

OK! This should be addressed: envs now reset by default after done=True. If you wish to control auto-resetting yourself, you can still get the old behavior by calling envs.create(..., auto_reset=False).
