Fix replay buffer dtype #554
Codecov Report
@@ Coverage Diff @@
## master #554 +/- ##
==========================================
- Coverage 58.37% 58.35% -0.03%
==========================================
Files 135 135
Lines 9159 9159
Branches 1361 1361
==========================================
- Hits 5347 5345 -2
+ Misses 3430 3423 -7
- Partials 382 391 +9
Continue to review full report at Codecov.
The replay buffer should not have a default dtype, since each element in the replay buffer should have the same dtype as its source; e.g., observations should have the same dtype as `env.observation_space`.
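A minimal sketch of the idea, assuming a hypothetical ring buffer (`DtypePreservingBuffer`, not garage's `SimpleReplayBuffer`) that allocates each field's storage lazily from the first sample it receives, so no default dtype is ever imposed. An equivalent design would preallocate from the space itself, e.g. `dtype=env.observation_space.dtype` for spaces that expose a `dtype` attribute (as gym's `Box` does).

```python
import numpy as np


class DtypePreservingBuffer:
    """Hypothetical fixed-size ring buffer whose per-field dtype follows the data."""

    def __init__(self, size):
        self._size = size
        self._storage = {}  # field name -> preallocated array, created on first add
        self._next = 0      # slot the next sample is written to
        self.n_stored = 0

    def add(self, **sample):
        for key, value in sample.items():
            value = np.asarray(value)
            if key not in self._storage:
                # No default dtype: allocate with the incoming sample's dtype and shape.
                self._storage[key] = np.zeros(
                    (self._size,) + value.shape, dtype=value.dtype)
            self._storage[key][self._next] = value
        self._next = (self._next + 1) % self._size
        self.n_stored = min(self.n_stored + 1, self._size)

    def dtype(self, key):
        return self._storage[key].dtype


buf = DtypePreservingBuffer(size=100)
buf.add(observation=np.zeros((84, 84), dtype=np.uint8), action=np.int64(3))
print(buf.dtype("observation"), buf.dtype("action"))  # uint8 int64
```

Because storage is created from the data, a `uint8` observation stays `uint8` instead of being silently upcast to a buffer-wide default such as `float32`.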
Force-pushed from 0643db5 to 6dd4c69.
obs = env.reset()
replay_buffer = SimpleReplayBuffer(
    env_spec=env, size_in_transitions=100, time_horizon=1)
replay_buffer.add_transition(
If this API operates on collections instead of single values, shouldn't it be `add_transitions`?
I think the API works specifically for `VecEnvExecutor`, where it adds one transition for each of the n `vec_env`s, resulting in lists of length n. If I have two `vec_env`s, I would call `replay_buffer.add_transition(observation=[obs1, obs2], action=[act1, act2])`. But this assumption is not clear. We should add a docstring stating that this replay buffer only works this way, rename it to something like `VecEnvReplayBuffer`, and create another replay buffer that works with a single environment.
@CatherineSue Please correct me if I am wrong.
No, there is no such assumption.
It adds a collection of transitions instead of a single transition. I agree the API should be `add_transitions`.
I see: we can add more than one transition even when there is only one `vec_env`. I will rename the API and submit.
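One way to read the agreed semantics, as a sketch with a hypothetical class (`BatchReplayBuffer`, not garage's implementation): `add_transitions` takes, for every field, a sequence whose length is the number of transitions being added. The count is independent of how many environments produced them, so one environment can add several transitions and n vectorized environments can add n at once.

```python
import numpy as np


class BatchReplayBuffer:
    """Hypothetical ring buffer whose add_transitions stores k transitions per call."""

    def __init__(self, size):
        self._size = size
        self._storage = {}  # field name -> preallocated array
        self._next = 0
        self.n_transitions_stored = 0

    def add_transitions(self, **fields):
        arrays = {key: np.asarray(value) for key, value in fields.items()}
        lengths = {len(batch) for batch in arrays.values()}
        if len(lengths) != 1:
            raise ValueError("every field must contain the same number of transitions")
        k = lengths.pop()
        for key, batch in arrays.items():
            if key not in self._storage:
                # dtype and per-transition shape come from the incoming data.
                self._storage[key] = np.zeros(
                    (self._size,) + batch.shape[1:], dtype=batch.dtype)
        # Circular slot indices for the k new transitions.
        idx = (self._next + np.arange(k)) % self._size
        for key, batch in arrays.items():
            self._storage[key][idx] = batch
        self._next = (self._next + k) % self._size
        self.n_transitions_stored = min(self.n_transitions_stored + k, self._size)


buf = BatchReplayBuffer(size=100)
# Two vectorized environments -> two transitions in one call.
buf.add_transitions(observation=[[0.0, 1.0], [2.0, 3.0]], action=[0, 1])
# A single environment can still add several transitions at once.
buf.add_transitions(observation=[[4.0, 5.0]] * 3, action=[1, 0, 1])
print(buf.n_transitions_stored)  # 5
```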
If you would also like to have the `add_transition` API, it's trivial to define given `add_transitions` (just wrap the args in a list and call `add_transitions`).
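A sketch of that wrapper, assuming a hypothetical list-backed buffer (`ListReplayBuffer`) whose `add_transitions` takes one sequence per field. The single-transition method wraps each value in a one-element list and delegates:

```python
class ListReplayBuffer:
    """Hypothetical buffer, used only to illustrate the single-transition wrapper."""

    def __init__(self):
        self.transitions = []

    def add_transitions(self, **fields):
        # Each field holds a sequence with one entry per transition.
        lengths = {len(values) for values in fields.values()}
        if len(lengths) > 1:
            raise ValueError("all fields must have the same number of transitions")
        keys = list(fields)
        for values in zip(*fields.values()):
            self.transitions.append(dict(zip(keys, values)))

    def add_transition(self, **fields):
        # Wrap each value in a one-element list and delegate to add_transitions.
        self.add_transitions(**{key: [value] for key, value in fields.items()})


buf = ListReplayBuffer()
buf.add_transition(observation=[0.0, 1.0], action=2)
buf.add_transitions(observation=[[1.0, 1.0], [2.0, 2.0]], action=[0, 1])
print(len(buf.transitions))  # 3
```

Wrapping matters for array-valued fields: without it, an observation like `[0.0, 1.0]` would be misread as two transitions instead of one.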
The replay buffer should not have a default dtype, since each element in the replay buffer should have the same dtype as its source; e.g., observations should have the same dtype as `env.observation_space`. One example is DQN with a pixel environment: we want observations in the replay buffer to have type `np.uint8`, the same as the observation space.
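To make the pixel-environment case concrete, a back-of-the-envelope sketch assuming stacked frames of shape (84, 84, 4), a common DQN preprocessing that this PR does not itself specify, and a hypothetical capacity of 100,000 observations:

```python
import numpy as np

obs_shape = (84, 84, 4)  # assumed stacked-frame observation shape
capacity = 100_000       # assumed number of stored observations


def buffer_bytes(dtype):
    # Bytes needed to store `capacity` observations of `obs_shape` with `dtype`.
    return capacity * int(np.prod(obs_shape)) * np.dtype(dtype).itemsize


for dtype in (np.uint8, np.float32, np.float64):
    print(f"{np.dtype(dtype).name:>8}: {buffer_bytes(dtype) / 2**30:.2f} GiB")
```

Under these assumptions the observations take about 2.63 GiB as `uint8`, versus about 10.51 GiB as `float32` and 21.03 GiB as `float64`, which is why a default float dtype is costly here; any conversion to float can happen per sampled minibatch instead.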