Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excessive clamping #3

Closed
Kaixhin opened this issue Apr 6, 2018 · 4 comments
Closed

Excessive clamping #3

Kaixhin opened this issue Apr 6, 2018 · 4 comments

Comments

@Kaixhin
Copy link
Contributor

Kaixhin commented Apr 6, 2018

It seems like the clamping of the network's output in the update is a bit excessive? Given values of x in [0, 1] (valid probabilities), as x -> 0, log(x) -> -infinity, so clamping the minimum value makes sense, but log(1) = 0, so there's no issues with the max value. Pinging @tudor-berariu as well.

Empirically, this might be an issue. I'm running my Rainbow agent with a minimum clamp of 0.001 (arbitrarily chosen), and get the following rewards and Q-values on Space Invaders (the Q-values are in line with what is reported in the Double DQN paper; unfortunately I do not have reported Q-values for Rainbow):
newplot
newplot 1

Whereas when I use a minimum clamp of 0.01 and maximum clamp of 0.99 as in this repo, I get the following, which indicates that this prevents the network from accurately estimating Q (note that this is the first time I've ever seen Q-values so far from what I got above, so the issue clearly lies with the clamping):
newplot 2
newplot 3

@floringogianu
Copy link
Owner

Really nice issue report, thanks for opening it. I would never have assumed this to be a problem. Let me know if you want to open a pull request for this one too. Incidentally I am back working on RL these days so I will check it out on Space Invaders too.

@Kaixhin
Copy link
Contributor Author

Kaixhin commented Apr 7, 2018

I can send in a PR but first I'd like to see if Tudor has any more insights on this.

@tudor-berariu
Copy link
Collaborator

I don't. Your observation is correct. There seems to be no reason for the maximum clamp. Although I am surprised by the gravity of that decision in your experiments.

@Kaixhin
Copy link
Contributor Author

Kaixhin commented Apr 30, 2018

Update: I now suspect that clamping at all, despite the potential problems, is detrimental. Comparing two ongoing runs, the top having a lower minimum clamp of 1e-5 and the bottom having no clamping at all, it seems to indicate that clamping should be removed altogether.

Minimum clamp = 1e-5:
newplot 3
newplot 2

No minimum clamp:
newplot 1
newplot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants