
Get the current max priority in a table. #91

Open
ethanluoyc opened this issue Feb 17, 2022 · 7 comments

Comments

@ethanluoyc
Contributor

Hi Reverb team,

I am interested in using a prioritized experience replay on top of an Acme agent that inserts a new experience by setting the priority to the current maximum priority in the buffer. I have looked around but haven't found a good way to do this. Is there a recommended approach to do this in reverb?

Many thanks in advance!
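Since Reverb does not expose a table's current max priority directly, one client-side option is to track it yourself. Below is a minimal sketch of that bookkeeping; `MaxPriorityTracker` and its method names are hypothetical, not part of Reverb's API. The tracked value would be passed as the priority when inserting new items and refreshed whenever priorities are recomputed from TD errors.

```python
class MaxPriorityTracker:
    """Hypothetical helper that tracks the running maximum priority.

    New transitions are inserted with the current max priority
    (1.0 until any priority update has been observed), following the
    scheme in the original PER paper.
    """

    def __init__(self, initial_max: float = 1.0):
        self._max_priority = initial_max

    @property
    def insert_priority(self) -> float:
        # Priority to assign to a freshly inserted transition.
        return self._max_priority

    def observe_updates(self, priorities):
        # Call with the new priorities computed from td_errors after
        # each learner step, so the running max stays current.
        for p in priorities:
            self._max_priority = max(self._max_priority, p)


tracker = MaxPriorityTracker()
# New transitions are inserted with tracker.insert_priority (1.0 so far).
tracker.observe_updates([0.5, 5.0, 2.0])
# After observing a td_error-derived priority of 5.0, new inserts use 5.0.
```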

@sabelaraga
Collaborator

Hi Yicheng,

Could you give us more details on how you are planning to sample? From what you describe, if the latest inserted item is the one with the highest priority, you may want to take a look at the Lifo selectors.

Sabela.

@ethanluoyc
Contributor Author

Hi Sabela,

Thanks for the reply! I am basically trying to implement the exact PER scheme used in https://arxiv.org/abs/1511.05952.

The dqn_zoo prioritized DQN agent does what I want.

https://github.com/deepmind/dqn_zoo/blob/master/dqn_zoo/prioritized/agent.py

However, the prioritized DQN agent in Acme uses a default priority of 1.0 when adding a new transition. See e.g.
https://github.com/deepmind/acme/blob/076e8e1c1b8e13e8aae9708e94d3e2dca4a7cd03/acme/agents/jax/dqn/builder.py#L137

I believe that this is different from what is in the DQN zoo implementation, not sure if there is a practical difference in performance but I would hope to implement PER in a way that's as close to the original paper as possible.
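For reference, the sampling distribution both implementations target is the one from the PER paper, P(i) = p_i^α / Σ_k p_k^α. A minimal sketch (the function name and the α value are illustrative, not taken from either codebase):

```python
def per_sampling_probs(priorities, alpha=0.6):
    # P(i) = p_i**alpha / sum_k p_k**alpha, as in Schaul et al. (2015).
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


# An item inserted at priority 1.0 competes against whatever priorities
# are already in the table, e.g. an old item at priority 5.0:
probs = per_sampling_probs([1.0, 5.0])
```

Under this distribution the new item at priority 1.0 is sampled less often than the old item at 5.0, which is the crux of the difference discussed below.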

@sabelaraga
Collaborator

sabelaraga commented Feb 17, 2022

It is inserted with priority 1.0, but then it uses the PER implementation of Reverb (the prioritized sampler here); the interesting part of the code is this.

@ethanluoyc
Contributor Author

ethanluoyc commented Feb 17, 2022

I see, so in some sense the priority values are normalized such that the maximum priority would be 1.0, is that correct?

@ethanluoyc
Contributor Author

Actually, not quite. Say we use the td_error to update the priorities; then these values would not be normalized between 0.0 and 1.0. For example, if the td_error is 5.0, updating the priorities would set the new priority to 5.0. Using a priority of 1.0 would not ensure that the newly inserted item has a higher priority than older samples with large td errors.
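To make this concrete, here is a small worked comparison (α = 1 for simplicity, so P(i) = p_i / Σ p; the numbers are illustrative):

```python
# Priorities already in the table from earlier td_error updates.
buffer_priorities = [5.0, 2.0]

# New item inserted with a fixed priority of 1.0 (Acme's default):
fixed = buffer_priorities + [1.0]
probs_fixed = [p / sum(fixed) for p in fixed]

# New item inserted with the running max priority, 5.0
# (the scheme in the PER paper and dqn_zoo):
with_max = buffer_priorities + [5.0]
probs_max = [p / sum(with_max) for p in with_max]
```

With the fixed 1.0 insert, the new item is sampled with probability 1/8, five times less often than the stale item at priority 5.0; with max-priority insertion it ties for the highest probability, 5/12.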

@sabelaraga
Collaborator

That would not be a problem if the td_error is capped between [-1, 1]. Do you have an example where it is not? I'm trying to verify in the code, but it may make sense to ask in the Acme repo as well (to make sure there are no issues with the DQN implementation).

@ethanluoyc
Contributor Author

ethanluoyc commented Feb 22, 2022

I don't think the DQN in Acme clips the td_error. I know some Atari agents clip the max absolute reward to be between -1 and 1, but that doesn't mean the td_error is in any way bounded.

https://github.com/deepmind/acme/blob/master/acme/agents/jax/dqn/losses.py#L74

I have cross-posted to dm-acme.
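A quick arithmetic sketch of why reward clipping alone does not bound the td_error (the Q-values here are made-up numbers, not from any agent):

```python
# Reward clipped to [-1, 1], but Q-value estimates are unbounded,
# so the one-step TD error r + gamma * max_a Q(s', a) - Q(s, a)
# can still land far outside [-1, 1].
r_clipped = 1.0
gamma = 0.99
q_next_max = 10.0  # hypothetical bootstrap value
q_current = 2.0

td_error = r_clipped + gamma * q_next_max - q_current
# td_error = 1.0 + 9.9 - 2.0 = 8.9, well outside [-1, 1]
```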
