Excessive clamping #3

Kaixhin · 2018-04-06T02:14:10Z

It seems like the clamping of the network's output in the update is a bit excessive? Given values of x in [0, 1] (valid probabilities), log(x) -> -infinity as x -> 0, so clamping the minimum value makes sense. But log(1) = 0, so there is no issue at the max value and no need to clamp it. Pinging @tudor-berariu as well.
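To make the argument concrete, here is a minimal pure-Python sketch of a cross-entropy term with optional clamping. The helper names (`clamp`, `cross_entropy`) and the toy distributions are made up for illustration and are not taken from this repo. It only shows that a lower clamp protects against log(0), while an upper clamp of 0.99 puts a floor of -log(0.99) under the loss even for a perfect prediction.

```python
import math

def clamp(x, lo=None, hi=None):
    """Clip x to [lo, hi]; either bound may be None (hypothetical helper)."""
    if lo is not None and x < lo:
        return lo
    if hi is not None and x > hi:
        return hi
    return x

def cross_entropy(target, pred, lo=None, hi=None):
    """-sum_i target_i * log(clamp(pred_i)) over a discrete distribution."""
    return -sum(t * math.log(clamp(p, lo, hi)) for t, p in zip(target, pred))

# Toy case: the prediction matches the target exactly.
target = [0.0, 1.0, 0.0]
pred = [0.0, 1.0, 0.0]

# Minimum clamp only: log(0) is avoided and log(1) = 0 is left untouched.
loss_min_only = cross_entropy(target, pred, lo=0.001)

# Minimum and maximum clamp: the correct atom is evaluated at 0.99, not 1.0.
loss_min_max = cross_entropy(target, pred, lo=0.01, hi=0.99)

print(loss_min_only)  # zero loss for a perfect prediction
print(loss_min_max)   # equals -log(0.99), a small positive floor
```

In an autograd framework, a clamped value also typically passes no gradient once it sits outside the clamp range, so the upper clamp would stop the loss from pushing a probability above 0.99. Whether that alone explains the Q-value gap reported below is not established here.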

Empirically, this might be an issue. I'm running my Rainbow agent with a minimum clamp of 0.001 (arbitrarily chosen), and get the following rewards and Q-values on Space Invaders (the Q-values are in line with what is reported in the Double DQN paper; unfortunately I do not have reported Q-values for Rainbow):

Whereas when I use a minimum clamp of 0.01 and maximum clamp of 0.99 as in this repo, I get the following, which indicates that this prevents the network from accurately estimating Q (note that this is the first time I've ever seen Q-values so far from what I got above, so the issue clearly lies with the clamping):

floringogianu · 2018-04-07T19:33:46Z

Really nice issue report, thanks for opening it. I would never have assumed this to be a problem. Let me know if you want to open a pull request for this one too. Incidentally I am back working on RL these days so I will check it out on Space Invaders too.

Kaixhin · 2018-04-07T19:39:06Z

I can send in a PR but first I'd like to see if Tudor has any more insights on this.

tudor-berariu · 2018-04-07T21:39:14Z

I don't. Your observation is correct. There seems to be no reason for the maximum clamp, although I am surprised by how large an effect it has in your experiments.

Kaixhin · 2018-04-30T00:03:35Z

Update: I now suspect that clamping at all is detrimental, despite the potential numerical problems of not clamping. I am comparing two ongoing runs, the top with a lower minimum clamp of 1e-5 and the bottom with no clamping at all, and they seem to indicate that clamping should be removed altogether.

Minimum clamp = 1e-5:

No minimum clamp:
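One way to drop the clamp without risking log(0) is to compute log-probabilities directly from the logits with a log-softmax, rather than taking the log of softmax outputs. The pure-Python sketch below illustrates that general technique. It assumes the network exposes raw logits, and it is not a description of what either implementation in this thread does. The function names and toy values are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax using the log-sum-exp trick."""
    m = max(logits)
    # Subtracting the max keeps every exponent <= 0, so exp() cannot overflow.
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def cross_entropy_from_logits(target, logits):
    """-sum_i target_i * log p_i, with log p_i computed without any clamp."""
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))

# Extreme logits: softmax would underflow to exactly 0 for the first two
# atoms, and log(0) would then fail, but the log-probabilities stay finite.
logits = [-1000.0, 0.0, 1000.0]
target = [0.0, 0.0, 1.0]

log_probs = log_softmax(logits)
loss = cross_entropy_from_logits(target, logits)

print(log_probs)
print(loss)
```

Here the smallest log-probability is a large negative number instead of -infinity, so no minimum clamp is needed to keep the loss finite.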

Kaixhin mentioned this issue Apr 7, 2018

Reduce clamping #4

Merged

floringogianu closed this as completed in #4 Apr 7, 2018
