Wrong gradient flow in bias correction term of ACER? #15

Closed
wwiiiii opened this issue Jul 31, 2019 · 1 comment

wwiiiii commented Jul 31, 2019

loss2 = -correction_coeff * pi * torch.log(pi) * (q.detach()-v) # bias correction term

According to the original paper, the gradient of the bias correction term is defined as below,
[image: equation for the gradient of the bias correction term, from the ACER paper]
and since pi serves only as the probability weight in the expectation, it does not seem to be a target of optimization.

Shouldn't we detach pi from the computational graph in the line above?
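
To make the question concrete, here is a small numerical check using only the Python standard library. It is a sketch under assumptions, not the repository's code: a 3-action softmax policy over logits, made-up values for the correction coefficient and the advantage (q - v), both treated as constants, and hypothetical helper names. It compares the gradient of the loss with pi frozen as a weight (the analogue of detaching it) and with pi left differentiable against the expectation form E_{a~pi}[coeff(a) * grad log pi(a) * (q(a) - v)].

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expectation_gradient(logits, coeff, adv):
    """E_{a~pi}[coeff(a) * grad log pi(a) * adv(a)] with respect to the logits."""
    pi = softmax(logits)
    n = len(logits)
    # For a softmax policy: d log pi(a) / d logit_k = 1[a == k] - pi_k
    return [
        sum(pi[a] * coeff[a] * adv[a] * ((1.0 if a == k else 0.0) - pi[k])
            for a in range(n))
        for k in range(n)
    ]

def loss(logits, coeff, adv, weight=None):
    """Bias correction loss; a fixed `weight` plays the role of a detached pi."""
    pi = softmax(logits)
    w = pi if weight is None else weight
    return -sum(w[a] * coeff[a] * math.log(pi[a]) * adv[a] for a in range(len(pi)))

def numeric_grad(f, x, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for k in range(len(x)):
        hi = x[:k] + [x[k] + eps] + x[k + 1:]
        lo = x[:k] + [x[k] - eps] + x[k + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

# Illustrative values only (assumptions, not taken from the repository).
logits = [0.2, -0.5, 1.0]
coeff = [0.0, 0.3, 0.6]   # stands in for correction_coeff
adv = [0.5, -1.0, 2.0]    # stands in for q.detach() - v, held constant here

target = expectation_gradient(logits, coeff, adv)
frozen_pi = softmax(logits)  # pi evaluated once, then held constant

grad_detached = numeric_grad(lambda th: loss(th, coeff, adv, weight=frozen_pi), logits)
grad_attached = numeric_grad(lambda th: loss(th, coeff, adv), logits)

# Minimizing the loss follows -grad(loss), so compare -grad against the target.
err_detached = max(abs(-g - t) for g, t in zip(grad_detached, target))
err_attached = max(abs(-g - t) for g, t in zip(grad_attached, target))

print("max error with pi frozen:  ", err_detached)
print("max error with pi attached:", err_attached)
```

With pi frozen, the descent direction matches the expectation form up to finite-difference error; with pi left in the graph, an extra term from differentiating the weight itself appears. In PyTorch terms, one way to express the frozen weight would be to use `pi.detach()` as the multiplier while keeping `torch.log(pi)` attached.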

seungeunrho (Owner) commented

Wow, you're correct.
Thanks for such a sharp comment.
I updated the code.
