
Unit test Prioritised Experience Replay Memory #16

Closed
Kaixhin opened this issue Apr 18, 2018 · 5 comments
Kaixhin (Owner) commented Apr 18, 2018

PER was reported to cause issues (decreasing the performance of a DQN) when ported to another codebase. Although PER can legitimately cause performance to decrease, it is still likely that there is a bug in this implementation, so the memory should be unit tested.

Kaixhin added the bug label on Apr 18, 2018
Kaixhin changed the title from "Unit Test Prioritised Experience Replay Memory" to "Unit test Prioritised Experience Replay Memory" on Apr 18, 2018
Ashutosh-Adhikari commented Apr 23, 2018

I am not sure whether this is the correct logic behind PER.

What the current code does: in the training loop, when we call mem.append(), the new transition is stored with a default priority, transitions.max().

Shouldn't we instead calculate the priority before appending, and append with that priority? That would keep the complexity the same and attach the right priority to the sample straight away (see the sketch below).

To the best of my knowledge, this level of detail is not specified in the paper.
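
For clarity, a minimal sketch of the two append strategies being contrasted. This is a toy list-based memory with hypothetical names, not the repository's actual memory.py; a real implementation would use a sum tree for O(log n) sampling.

```python
import random


class ToyPrioritisedMemory:
    """Toy proportional-prioritisation buffer; names are illustrative only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data, self.priorities = [], []
        self.max_priority = 1.0  # initial priority

    # Current behaviour: a new transition gets the maximum priority seen so far,
    # so it is guaranteed to be sampled (and then re-prioritised) at least once.
    def append(self, transition):
        if len(self.data) == self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(self.max_priority)

    # Proposed alternative: attach a TD-error-based priority at append time.
    # This already requires the target value, i.e. future states and a network pass.
    def append_with_priority(self, transition, td_error, omega=0.5):
        priority = abs(td_error) ** omega
        self.max_priority = max(self.max_priority, priority)
        if len(self.data) == self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    # Proportional sampling: O(n) here, O(log n) with a sum tree.
    def sample_index(self):
        return random.choices(range(len(self.data)), weights=self.priorities, k=1)[0]
```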

Kaixhin (Owner) commented Apr 23, 2018

Adding new transitions with the max priority is in line 6 of the algorithm in the PER paper; the initial value, 1, is given in line 2. Also, calculating the priority means having access to the future states (even more states when calculating multi-step returns) and doing the whole target calculation on a single sample, so it's not that cheap.
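
To illustrate the cost, a rough sketch of what would have to run for every single transition if priorities were assigned at append time. This is an assumption on my part, written as a plain multi-step double-DQN TD error rather than the distributional loss Rainbow actually uses; the function and argument names are hypothetical, and terminal-state handling is omitted.

```python
import torch


def initial_priority(online_net, target_net, trajectory, n=3, gamma=0.99, omega=0.5):
    # trajectory holds (state, action, reward) for steps t .. t+n, so the n
    # future transitions must already be available before step t can be appended.
    state, action, _ = trajectory[0]
    n_step_return = sum(gamma ** k * trajectory[k][2] for k in range(n))
    bootstrap_state = trajectory[n][0]

    with torch.no_grad():
        # Double-DQN target: online net selects the action, target net evaluates it.
        best_action = online_net(bootstrap_state.unsqueeze(0)).argmax(dim=1)
        bootstrap_value = target_net(bootstrap_state.unsqueeze(0))[0, best_action]
        target = n_step_return + (gamma ** n) * bootstrap_value
        current = online_net(state.unsqueeze(0))[0, action]

    td_error = (target - current).abs().item()
    return td_error ** omega  # proportional prioritisation: p = |delta|^omega
```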

marintoro commented:
I just read this in the paper "Distributed Prioritized Experience Replay" by D. Horgan et al.:
"In Prioritized DQN (Schaul et al., 2016) priorities for new transitions were initialized to the maximum priority seen so far, and only updated once they were sampled."

But it's interesting to note that they changed this because it did not scale well (that paper is all about learning from a large number of distributed actors, which compute the initial priorities themselves).
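
For context, a rough sketch of the scheme described in that quote, where each actor computes initial priorities for its local batch before sending it to the shared replay. Everything here, including compute_priority and remote_memory.extend, is a hypothetical stand-in rather than code from either repository.

```python
def actor_loop(env, local_net, remote_memory, compute_priority, batch_size=50):
    """Hypothetical Ape-X-style actor: priorities are computed locally, in
    batches, with the actor's (possibly stale) network copy, instead of
    initialising every new transition to the maximum priority seen so far."""
    local_buffer = []
    state = env.reset()
    while True:
        action = local_net.act(state)
        next_state, reward, done, _ = env.step(action)
        local_buffer.append((state, action, reward, next_state, done))
        if len(local_buffer) >= batch_size:
            priorities = [compute_priority(local_net, t) for t in local_buffer]
            remote_memory.extend(local_buffer, priorities)  # ship batch + priorities
            local_buffer.clear()
        state = env.reset() if done else next_state
```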

Ashutosh-Adhikari commented:
@Kaixhin Yep, I understand it now, given your point about n-step TD targets.

Kaixhin (Owner) commented May 6, 2018

Results on 3 games so far look promising, so closing unless a specific problem is identified.
