I just read through your DDPG implementation, and it looks awesome :) I have a question for you: what does the curve of the Q loss look like over training time when you train Inverted Pendulum with DDPG? I also implemented DDPG myself, and I noticed that Inverted Pendulum did learn something, but the Q loss diverged. I wonder if you have the same issue with your implementation.
Thank you so much!
I have never checked the plot of the Q loss over time. My interpretation is this: we usually expect the Q loss to decrease with time. That would hold if we had a perfect supervisor giving us the target value (the true expected return). Since we instead approximate the target using only the Q value at the next step (and also approximate the value function with a neural network), we cannot expect a steady downward pattern in the Q loss. It may fluctuate (sometimes diverging a bit and then recovering), but eventually the loss decreases.
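To make that concrete, here is a minimal NumPy sketch of the bootstrapped critic target in DDPG. The function names (`td_target`, `critic_loss`) are my own for illustration, not from this repo. The key point: the "label" y is itself built from the target networks' estimate, so it moves as training proceeds, which is why the loss need not decrease monotonically.

```python
import numpy as np

def td_target(r, s2, done, target_q, gamma=0.99):
    # y = r + gamma * (1 - done) * Q'(s', mu'(s'))
    # target_q is assumed to wrap the target critic + target actor;
    # it is an estimate, so y is a moving target, not a fixed label.
    return r + gamma * (1.0 - done) * target_q(s2)

def critic_loss(q_values, y):
    # Mean squared Bellman error over the minibatch.
    return np.mean((q_values - y) ** 2)
```

Because y chases the current Q estimate, a temporarily rising loss does not by itself mean the agent is failing to learn.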
This implementation does not diverge. In particular, I saw a good improvement in convergence speed after adding batch normalization.
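For reference, batch normalization rescales each feature to zero mean and unit variance across the minibatch. Here is a bare NumPy sketch of just that transform (a real BatchNorm layer additionally keeps learned scale/shift parameters and running statistics for inference, which are omitted here):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # x: (batch, features). Normalize each feature across the batch.
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)
```

Keeping the inputs to each layer in a consistent range is one plausible reason it helps the critic's targets stay better behaved.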
In my experience, the following checks help debug a divergence issue:
Set the learning rates to zero (for both the actor and the critic networks) and check whether the loss still diverges. If it does, look for a divide-by-zero somewhere. (Also check that you have not initialized the weights to zero.)
If it does not diverge with zero learning rates, the divergence is likely caused by exploding gradients; in that case, try clipping the gradients.
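One common way to clip is by the global norm of all gradients together; here is a small NumPy sketch (the `max_norm` threshold is a tunable assumption, and deep-learning frameworks provide built-in equivalents):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    # Compute the L2 norm over the whole gradient list, and rescale
    # every gradient by the same factor if the norm exceeds max_norm.
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]
```

Rescaling by the global norm preserves the gradient's direction while bounding the step size, which is usually preferable to clipping each element independently.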