Replicate Prioritized Experience Replay's reported performance improvements #278

Closed
muupan opened this issue Jun 14, 2018 · 9 comments

muupan (Member) commented Jun 14, 2018

Missing details

  • "all weights w_i were scaled so that max_i w_i = 1". Is max_i w_i computed over a minibatch or the whole buffer?
  • What is the value of epsilon that is added to absolute TD errors?
muupan (Member, Author) commented Jun 15, 2018

I asked the author via email and confirmed that:

  • max_i w_i is computed over a minibatch
  • epsilon=0.01
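To make the two answers concrete, here is a minimal NumPy sketch of proportional prioritization. It assumes the PER paper's formulas: p_i = |δ_i| + ε, P(i) = p_i^α / Σ_k p_k^α, and w_i = (N · P(i))^(-β). The function name `sample_with_is_weights` is hypothetical, and this is not ChainerRL's implementation. For readability it samples in O(N) with `numpy.random.Generator.choice` instead of using a sum tree. The α and β values in the usage lines are illustrative only.

```python
import numpy as np

EPSILON = 0.01  # added to |TD error|; value confirmed by the paper's author


def sample_with_is_weights(td_errors, batch_size, alpha, beta, rng):
    """Sample indices proportionally to priority and return normalized IS weights.

    td_errors: latest TD error of every transition in the buffer (length N).
    Hypothetical helper for illustration, not a library API.
    """
    td_errors = np.asarray(td_errors, dtype=np.float64)
    n = len(td_errors)
    priorities = np.abs(td_errors) + EPSILON           # p_i = |delta_i| + eps
    probs = priorities ** alpha
    probs /= probs.sum()                               # P(i) = p_i^alpha / sum_k p_k^alpha
    idx = rng.choice(n, size=batch_size, p=probs)      # sampled with replacement
    weights = (n * probs[idx]) ** (-beta)              # w_i = (N * P(i))^(-beta)
    weights /= weights.max()                           # max_i w_i taken over the MINIBATCH
    return idx, weights


rng = np.random.default_rng(0)
td = rng.normal(size=1000)
idx, w = sample_with_is_weights(td, 32, alpha=0.6, beta=0.4, rng=rng)
print(w.max())  # 1.0, by construction of the minibatch normalization
```

Normalizing over the whole buffer instead would divide by the weight of the lowest-priority transition in memory, which generally makes every minibatch weight smaller than 1.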

muupan (Member, Author) commented Oct 3, 2018

Breakout: 🆗

muupan (Member, Author) commented Oct 3, 2018

Space Invaders: 🆗

muupan (Member, Author) commented Oct 3, 2018

Seaquest: a bit worse

muupan (Member, Author) commented Oct 3, 2018

Beam Rider: a bit worse

muupan (Member, Author) commented Oct 3, 2018

Asterix: 🆗

muupan (Member, Author) commented Oct 3, 2018

Qbert: 🆗

muupan (Member, Author) commented Oct 3, 2018

I compared our results with "Double DQN tuned prioritized lr/4" vs. "proportional" in the paper.

  • Similar to the paper: Breakout, Space Invaders, Asterix, Qbert
  • A bit worse than the paper: Seaquest, Beam Rider

The paper seems to allow up to 500,000 frames per evaluation episode instead of 108,000 frames (Appendix B.2.3 of http://arxiv.org/abs/1511.05952), so evaluating with a 500,000-frame limit may close the gap.
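The difference between the two limits is large. The conversion below assumes the usual ALE setup of 60 frames per second and a frame skip of 4; neither is stated in this thread. The helpers `frames_to_agent_steps`, `frames_to_minutes`, and `capped_return`, along with the reward trace, are hypothetical. They only show how a lower frame limit cuts off the return of a long episode.

```python
FRAME_SKIP = 4          # assumption: DQN-style agents act every 4 emulator frames
FRAMES_PER_SECOND = 60  # assumption: Atari 2600 emulation at 60 frames per second


def frames_to_agent_steps(frames, frame_skip=FRAME_SKIP):
    """Number of agent decisions that fit in a frame budget."""
    return frames // frame_skip


def frames_to_minutes(frames, fps=FRAMES_PER_SECOND):
    """Game time corresponding to a frame budget."""
    return frames / fps / 60


def capped_return(step_rewards, max_frames, frame_skip=FRAME_SKIP):
    """Return of one evaluation episode truncated at max_frames (hypothetical helper)."""
    max_steps = frames_to_agent_steps(max_frames, frame_skip)
    return sum(step_rewards[:max_steps])


for cap in (108_000, 500_000):
    print(f"{cap:>7} frames -> {frames_to_agent_steps(cap):>6} agent steps, "
          f"{frames_to_minutes(cap):.1f} minutes of play")

# Hypothetical long episode: 1 point per agent step for 50,000 steps.
trace = [1.0] * 50_000
print(capped_return(trace, 108_000))  # 27000.0 (cut off after 27,000 steps)
print(capped_return(trace, 500_000))  # 50000.0 (episode ends on its own)
```

Under these assumptions, 108,000 frames is 30 minutes of play and 500,000 frames is roughly 2.3 hours. Only games whose good episodes run longer than 30 minutes would be affected.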

prabhatnagarajan (Contributor)

Tested 6 games: Breakout, Space Invaders, Seaquest, Asterix, Beam Rider, Qbert.

Results:

Breakout: 🆗
Space Invaders: 🆗
Seaquest: worse
Beam Rider: slightly worse
Asterix: 🆗
Qbert: 🆗

@muupan pointed out that the paper's evaluation appears to permit longer episodes, which may explain our slightly worse performance in two domains.
