Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix replay buffer dtype #554

Merged
merged 3 commits into from
Mar 5, 2019
Merged

Fix replay buffer dtype #554

merged 3 commits into from
Mar 5, 2019

Conversation

ahtsan
Copy link
Contributor

@ahtsan ahtsan commented Mar 2, 2019

Replay buffer should not have a default dtype, since each of the element in the replay buffer should have dtype same as the source, e.g. observation should have dtype same as env.observation_space. One example is DQN with pixel environment, we want the observation in replay buffer to be type np.uint8, same as the observation space.

@ahtsan ahtsan requested a review from a team as a code owner March 2, 2019 05:26
@codecov
Copy link

codecov bot commented Mar 2, 2019

Codecov Report

Merging #554 into master will decrease coverage by 0.02%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #554      +/-   ##
==========================================
- Coverage   58.37%   58.35%   -0.03%     
==========================================
  Files         135      135              
  Lines        9159     9159              
  Branches     1361     1361              
==========================================
- Hits         5347     5345       -2     
+ Misses       3430     3423       -7     
- Partials      382      391       +9
Impacted Files Coverage Δ
garage/replay_buffer/simple_replay_buffer.py 100% <100%> (ø) ⬆️
garage/replay_buffer/base.py 83.05% <100%> (ø) ⬆️
...arage/tf/samplers/off_policy_vectorized_sampler.py 71.79% <50%> (ø) ⬆️
garage/tf/optimizers/first_order_optimizer.py 65.27% <0%> (-2.78%) ⬇️
garage/tf/policies/categorical_gru_policy.py 80% <0%> (ø) ⬆️
garage/misc/krylov.py 20% <0%> (ø) ⬆️
garage/sampler/stateful_pool.py 53.78% <0%> (ø) ⬆️
garage/tf/policies/gaussian_lstm_policy.py 78.83% <0%> (ø) ⬆️
garage/tf/policies/gaussian_gru_policy.py 78.67% <0%> (ø) ⬆️
garage/tf/policies/categorical_lstm_policy.py 79.83% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b8ba37e...be27825. Read the comment docs.

Replay buffer should not have a default dtype, since
each of the element in the replay buffer should have
dtype same as the source, e.g. observation should
have dtype same as env.observation_space.
obs = env.reset()
replay_buffer = SimpleReplayBuffer(
env_spec=env, size_in_transitions=100, time_horizon=1)
replay_buffer.add_transition(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this API operates on collections instead of single values, shouldn't it be add_transitions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the API works specifically for VecEnvExecutor, where it adds a single transition for all of the n vec_env, resulting in a list of length n. If I have two vec_env, I will be calling replay_buffer.add_transition(observation=[obs1, obs2], action=[act1, act2]). But I think this assumption is not clear. We should add docstring that this replay buffer only works this way, rename it to something like VecEnvReplayBuffer and create another replay buffer which works with a single environment.
@CatherineSue Please correct me if I am wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No there is no such assumption.

Copy link
Member

@CatherineSue CatherineSue Mar 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It adds a collection of transitions instead of a single transition. I agree the api should be add_transitions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, we can also add more than one transition even when there is only one vec_env. I will rename the api and submit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you would like to also have the add_transition API, it's trivial to define given add_transitions (just wrap the args in a list and call add_transitions)

@ahtsan ahtsan merged commit 16d2d04 into master Mar 5, 2019
@ahtsan ahtsan deleted the replay_buffer_fix branch March 5, 2019 01:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants