Question on Loss function of Critic Network training #7

Closed
RuofanKong opened this issue Sep 6, 2016 · 1 comment

@RuofanKong

Hello,

I just read through your DDPG implementation, and it looks awesome :) I have a question: what does the Q loss curve look like over training time when you train Inverted Pendulum with DDPG? I also implemented DDPG myself, and I noticed that Inverted Pendulum did learn something, but the Q loss diverged. I wonder if you see the same issue with your implementation.

Thank you so much!

@stevenpjg
Owner

I have never plotted the Q loss function against time. My interpretation is that we usually expect the Q loss to decrease over time. That would hold if we had a perfect supervisor giving us the true target value (the expected return). Since we approximate that target with only the Q value at the next time step (and also approximate the value function with a neural network), we cannot expect a steady, monotonic pattern in the Q loss over time. It may fluctuate (sometimes diverging a bit and then recovering), but eventually the loss decreases.
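To make that concrete, here is a minimal sketch of the critic's target and loss in DDPG (the helper names are illustrative, not the ones used in this repository). Because the target y is itself bootstrapped from the slowly updated target networks, the loss is chasing a moving target and need not decrease monotonically:

```python
import numpy as np

def critic_targets(rewards, next_states, dones, gamma, target_actor, target_critic):
    """TD targets for a sampled minibatch: y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))."""
    next_actions = target_actor(next_states)           # mu'(s_{i+1}) from the target actor
    next_q = target_critic(next_states, next_actions)  # Q'(s_{i+1}, mu'(s_{i+1})) from the target critic
    return rewards + gamma * (1.0 - dones) * next_q    # no bootstrap term on terminal transitions

def critic_loss(q_values, targets):
    """Mean squared TD error minimized by the critic."""
    return np.mean((q_values - targets) ** 2)
```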

This implementation does not diverge. In particular, I saw a good improvement in convergence speed after adding batch normalization.
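For what it's worth, here is a hedged sketch of adding batch normalization to the networks, written in TensorFlow 2 / Keras style (this repository uses an older TensorFlow API, so the code below is illustrative rather than the repository's actual layers; the 400/300 layer sizes follow the DDPG paper):

```python
import tensorflow as tf

def make_actor(state_dim, action_dim, action_bound):
    """Actor network with batch normalization applied to the state pathway."""
    states = tf.keras.Input(shape=(state_dim,))
    x = tf.keras.layers.BatchNormalization()(states)
    x = tf.keras.layers.Dense(400, activation="relu")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Dense(300, activation="relu")(x)
    raw = tf.keras.layers.Dense(action_dim, activation="tanh")(x)
    actions = tf.keras.layers.Lambda(lambda a: a * action_bound)(raw)  # scale tanh output to action range
    return tf.keras.Model(states, actions)
```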

From my experience, the following checks help debug divergence issues.

  1. Set the learning rates to zero (for both the actor and critic networks) and check whether the loss still diverges. If it does, there is probably a divide-by-zero somewhere. (Also check that you have not initialized the weights to zero.)
  2. If it stops diverging with zero learning rates, the divergence is likely caused by exploding gradients; in that case, try clipping the gradients (see the sketch after this list).
  3. Alternatively, use the gradient inverter to bound the parameters. Check the implementation here: https://github.com/stevenpjg/ddpg-aigym/blob/master/tensorflow_grad_inverter.py
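As a rough illustration of points 2 and 3, here is a NumPy sketch of gradient clipping and of the gradient-inverting idea; the function names are illustrative, and the actual code in tensorflow_grad_inverter.py may differ:

```python
import numpy as np

def clip_gradient(grad, max_norm=1.0):
    """Global-norm gradient clipping: rescale the gradient if its norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    return grad if norm <= max_norm else grad * (max_norm / norm)

def invert_gradient(grad, action, a_min, a_max):
    """Gradient inverting: scale the (ascent-direction) gradient down as the action
    approaches its bound, so the actor is kept within a sensible range.
    Convention: a positive gradient component would increase that action component."""
    width = a_max - a_min
    scale = np.where(grad >= 0,
                     (a_max - action) / width,   # room left before the upper bound
                     (action - a_min) / width)   # room left before the lower bound
    return grad * scale
```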
