prioritized experience replay bug #527

xffxff · 2018-08-20T00:09:40Z

The sampling step in the prioritized experience replay implementation does not follow the paper. To sample a minibatch of size k, the range [0, p_total] is divided equally into k ranges. Next, a value is uniformly sampled from each range. Finally, the transitions that correspond to each of these sampled values are retrieved from the tree.
See Appendix B.2.1 of the paper Prioritized Experience Replay for more details.
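
For illustration, here is a minimal sketch of the stratified sampling described above. It is not the code from this PR: the function name `sample_stratified` and its parameters are hypothetical, and a plain prefix-sum list with `bisect` stands in for the sum tree to keep the example self-contained.

```python
import bisect
import itertools
import random


def sample_stratified(priorities, k, rng=random):
    """Return k indices, one drawn from each of k equal-mass ranges.

    Hypothetical helper for illustration only; a real replay buffer
    would query a sum tree instead of rebuilding prefix sums.
    """
    # cumulative[i] = priorities[0] + ... + priorities[i]
    cumulative = list(itertools.accumulate(priorities))
    p_total = cumulative[-1]
    segment = p_total / k  # width of each of the k ranges

    indices = []
    for i in range(k):
        # Sample one value uniformly from the i-th range
        # [i * segment, (i + 1) * segment).
        mass = (i + rng.random()) * segment
        # First index whose cumulative priority exceeds the sampled value.
        idx = bisect.bisect_right(cumulative, mass)
        # Clamp to guard against floating-point rounding at the upper end.
        indices.append(min(idx, len(priorities) - 1))
    return indices
```

Because each range contributes exactly one sample, the returned indices are non-decreasing and cover the whole priority mass, whereas k independent draws from [0, p_total] can cluster in one region.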

pzhokhov · 2018-08-23T23:04:04Z

Looks legit to me... @siemanko, what do you think?

prioritized experience replay bug

31726f9

pzhokhov requested a review from siemanko August 28, 2018 21:29

pzhokhov merged commit 7859f60 into openai:master Sep 20, 2018

huiwenn pushed a commit to huiwenn/baselines that referenced this pull request Mar 20, 2019

prioritized experience replay bug (openai#527)

59ecb66

kkonen pushed a commit to kkonen/baselines-1 that referenced this pull request Sep 26, 2019

prioritized experience replay bug (openai#527)

8387a79
