I just read through your DDPG implementation, and it looks awesome :) I have a question for you: what does the curve of the Q loss look like over training time when you train Inverted Pendulum with DDPG? I also implemented DDPG myself, and I noticed that Inverted Pendulum did learn something, but the Q loss diverged. I wonder if you have the same issue with your implementation.
Thank you so much!
I have never checked the plot of the Q loss over time. My interpretation is this: we usually expect the Q loss to decrease with time. That would hold if we had a perfect supervisor giving us the target value (the true expected return). Since we instead approximate the target using only the Q value at the next step (and also approximate the value function with a neural network), we cannot expect a steady downward pattern in the Q loss. It may fluctuate (sometimes diverging a bit and then recovering), but eventually the loss decreases.
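To make that concrete, here is a minimal NumPy sketch of the bootstrapped critic target in DDPG. The function names (`td_target`, `critic_loss`) are my own for illustration, not from this repo. The key point: the "label" y is itself built from the target networks' estimate, so it moves as training proceeds, which is why the loss need not decrease monotonically.

```python
import numpy as np

def td_target(r, s2, done, target_q, gamma=0.99):
    # y = r + gamma * (1 - done) * Q'(s', mu'(s'))
    # target_q is assumed to wrap the target critic + target actor;
    # it is an estimate, so y is a moving target, not a fixed label.
    return r + gamma * (1.0 - done) * target_q(s2)

def critic_loss(q_values, y):
    # Mean squared Bellman error over the minibatch.
    return np.mean((q_values - y) ** 2)
```

Because y chases the current Q estimate, a temporarily rising loss does not by itself mean the agent is failing to learn.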
This implementation does not diverge. In particular, I saw a good improvement in convergence speed after adding batch normalization.
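For reference, batch normalization rescales each feature to zero mean and unit variance across the minibatch. Here is a bare NumPy sketch of just that transform (a real BatchNorm layer additionally keeps learned scale/shift parameters and running statistics for inference, which are omitted here):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # x: (batch, features). Normalize each feature across the batch.
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)
```

Keeping the inputs to each layer in a consistent range is one plausible reason it helps the critic's targets stay better behaved.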
In my experience, the following checks help debug a divergence issue:
Set the learning rates to zero (for both the actor and the critic networks) and check whether the loss still diverges. If it does, look for a divide-by-zero somewhere. (Also check that you have not initialized the weights to zero.)
If it does not diverge with zero learning rates, the divergence is likely caused by exploding gradients; in that case, try clipping the gradients.
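One common way to clip is by the global norm of all gradients together; here is a small NumPy sketch (the `max_norm` threshold is a tunable assumption, and deep-learning frameworks provide built-in equivalents):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    # Compute the L2 norm over the whole gradient list, and rescale
    # every gradient by the same factor if the norm exceeds max_norm.
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]
```

Rescaling by the global norm preserves the gradient's direction while bounding the step size, which is usually preferable to clipping each element independently.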