New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Excessive clamping #3
Comments
Really nice issue report, thanks for opening it. I would never have assumed this to be a problem. Let me know if you want to open a pull request for this one too. Incidentally I am back working on RL these days so I will check it out on Space Invaders too. |
I can send in a PR but first I'd like to see if Tudor has any more insights on this. |
I don't. Your observation is correct. There seems to be no reason for the maximum clamp. Although I am surprised by the gravity of that decision in your experiments. |
Update: I now suspect that clamping at all, despite the potential problems, is detrimental. Comparing two ongoing runs, the top having a lower minimum clamp of 1e-5 and the bottom having no clamping at all, it seems to indicate that clamping should be removed altogether. |
It seems like the clamping of the network's output in the update is a bit excessive? Given values of x in [0, 1] (valid probabilities), as x -> 0, log(x) -> -infinity, so clamping the minimum value makes sense, but log(1) = 0, so there's no issues with the max value. Pinging @tudor-berariu as well.
Empirically, this might be an issue. I'm running my Rainbow agent with a minimum clamp of 0.001 (arbitrarily chosen), and get the following rewards and Q-values on Space Invaders (the Q-values are in line with what is reported in the Double DQN paper; unfortunately I do not have reported Q-values for Rainbow):
Whereas when I use a minimum clamp of 0.01 and maximum clamp of 0.99 as in this repo, I get the following, which indicates that this prevents the network from accurately estimating Q (note that this is the first time I've ever seen Q-values so far from what I got above, so the issue clearly lies with the clamping):
The text was updated successfully, but these errors were encountered: