Potential errors in the loss funktion #12
Comments
And I win for misspelling function in the issue title ;).
@stokasto, probably not for 1.: look closely at:
I assume the function is equivalent to multiplying by a one-hot encoded target.
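A minimal sketch of that equivalence, in plain Python rather than Chainer: picking the log-probability of the taken action by index gives the same number as multiplying the log-probability vector by a one-hot encoding of the action and summing. The helper names (`log_softmax`, `one_hot`) and the example logits are made up for illustration and are not taken from a3c.py.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def one_hot(index, size):
    # 1.0 at the chosen index, 0.0 everywhere else.
    return [1.0 if i == index else 0.0 for i in range(size)]

# Hypothetical policy logits and the action that was actually taken.
logits = [1.0, 2.0, 0.5]
action = 1
log_probs = log_softmax(logits)

# Variant A: select the log-probability of the taken action directly.
picked = log_probs[action]

# Variant B: multiply by a one-hot mask and sum over actions.
mask = one_hot(action, len(logits))
masked = sum(lp * m for lp, m in zip(log_probs, mask))
```

Both variants yield the same scalar, so a loss built on a "select by index" operation would not need an explicit one-hot multiplication.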
@etienne87 Ah yes, that is fine then. As I wrote, I did not study the code overly carefully during my re-implementation, but I thought I would bring these two issues up since (in case they are true) they are hard to detect and could lurk in the code base for a while.
Thank you for your comments.
Hey, great, that explains it. I was wondering how the implementation could have converged to a reasonable value function if that error were indeed there.
Hey,
first off: great work. I just re-implemented the paper myself using TensorFlow, and your code provided great "side-information" for doing so :).
In the process I also realized that there may be two subtle bugs in your implementation (although I have never used Chainer before, so I might be misunderstanding things):
1. #103 a3c.py: `pi_loss -= log_prob * float(advantage.data)`
I believe this is incorrect, as you should multiply `log_prob` with a one-hot encoding of the action (since only one of the actions was selected).
2. `advantage = R - v`, where `v` comes from `self.past_values[i]`, which, in turn, is the output of the value network. As I wrote, I am no expert regarding Chainer, but you need to make sure that no gradient flows through `v` here (as the value function should only be updated according to the `v_loss` in your code). In Theano/TensorFlow this would be handled with a `disconnected_grad()` or `stop_gradient()` operation, respectively.

I will push my implementation to GitHub sometime this week, as soon as I have tested it more thoroughly, and can then reference it here for you to compare.
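To illustrate the gradient concern in the second point, here is a small plain-Python sketch (not Chainer, Theano, or TensorFlow code). It uses a hypothetical scalar value function `v = w * state` and a two-action softmax policy, and estimates the derivative of the policy loss with respect to the value weight `w` by finite differences. Passing a frozen copy of `w` emulates what a stop-gradient operation would do. All names and numbers are assumptions made for the example.

```python
import math

def log_prob(theta):
    # Log-probability of the taken action under a 2-action softmax
    # with logits (theta, 0). Hypothetical toy policy.
    return theta - math.log(math.exp(theta) + 1.0)

def value(w, state):
    # Hypothetical linear value function.
    return w * state

def pi_loss(theta, w, state, R, w_frozen=None):
    # When w_frozen is given, the advantage is computed from a fixed copy
    # of the value weight, so it acts as a constant (stop-gradient emulation).
    w_adv = w if w_frozen is None else w_frozen
    advantage = R - value(w_adv, state)
    return -log_prob(theta) * advantage

def grad_wrt_w(f, w, eps=1e-6):
    # Central finite-difference estimate of df/dw.
    return (f(w + eps) - f(w - eps)) / (2 * eps)

theta, w, state, R = 0.3, 0.5, 2.0, 1.0

# Advantage depends on w: the policy loss leaks gradient into the value weight.
leaky = grad_wrt_w(lambda w_: pi_loss(theta, w_, state, R), w)

# Advantage treated as a constant: no gradient reaches the value weight.
blocked = grad_wrt_w(lambda w_: pi_loss(theta, w_, state, R, w_frozen=w), w)
```

In the leaky variant the derivative equals `log_prob(theta) * state`, which is non-zero, so the policy loss would also push the value weights around. In the blocked variant it is zero, leaving the value function to be trained by the value loss alone.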