Wrong gradient flow in bias correction term of ACER? #15

wwiiiii · 2019-07-31T09:21:56Z

Line 104 in 46f9b32

    
    loss2 = -correction_coeff * pi * torch.log(pi) * (q.detach()-v) # bias correction term

According to the original paper, the gradient for the bias correction term is defined as below,

and since pi serves only as the probability weight in the expectation, it does not seem to be a target of optimization itself.
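For reference, here is the bias correction term of the ACER policy gradient as I understand it from the paper (Wang et al., "Sample Efficient Actor-Critic with Experience Replay"). The notation is an assumption on my side: $\rho_t(a)$ is the importance weight, $c$ the truncation constant, and the second line assumes a discrete action space so the expectation can be written as a sum.

```latex
\begin{aligned}
g_t^{\text{bias}}
&= \mathbb{E}_{a \sim \pi}\left[ \left[ \frac{\rho_t(a) - c}{\rho_t(a)} \right]_+
   \nabla_\theta \log \pi_\theta(a \mid x_t)
   \left( Q_{\theta_v}(x_t, a) - V_{\theta_v}(x_t) \right) \right] \\
&= \sum_a \pi_\theta(a \mid x_t) \left[ \frac{\rho_t(a) - c}{\rho_t(a)} \right]_+
   \nabla_\theta \log \pi_\theta(a \mid x_t)
   \left( Q_{\theta_v}(x_t, a) - V_{\theta_v}(x_t) \right)
\end{aligned}
```

In this form $\nabla_\theta$ acts only on $\log \pi_\theta$. The leading $\pi_\theta(a \mid x_t)$ is just the weight that comes from the expectation, so a surrogate loss should treat it as a constant.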

Shouldn't we detach pi from the computational graph in the line above?
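A minimal sketch of the difference, assuming a single state with three discrete actions. All numbers and the helper `bias_correction_loss` are made up for illustration and are not taken from the repository. It compares the autograd gradient with and without `pi.detach()` against the closed-form gradient of the weighted log-probability term.

```python
import torch

# Hypothetical toy setup: one state, three actions (values are illustrative).
logits = torch.tensor([0.2, -0.1, 0.4], requires_grad=True)
pi = torch.softmax(logits, dim=0)                  # current policy pi(a|x)
q = torch.tensor([1.0, 0.5, -0.3])                 # stand-in for Q(x, a), no grad
v = (pi * q).sum().detach()                        # stand-in for V(x), a constant
correction_coeff = torch.tensor([0.0, 0.3, 0.1])   # stand-in for [1 - c/rho(a)]_+


def bias_correction_loss(detach_pi: bool) -> torch.Tensor:
    """Surrogate loss for the bias correction term, summed over actions."""
    weight = pi.detach() if detach_pi else pi      # expectation weight
    return -(correction_coeff * weight * torch.log(pi) * (q - v)).sum()


# Gradients w.r.t. the logits for both variants (graph is shared, so retain it).
grad_detached, = torch.autograd.grad(
    bias_correction_loss(True), logits, retain_graph=True)
grad_original, = torch.autograd.grad(
    bias_correction_loss(False), logits, retain_graph=True)

# Closed-form target: -sum_a w_a * d log pi(a) / d logits, with
# w_a = coeff_a * pi(a) * (Q(a) - V) held constant and, for a softmax,
# d log pi(a) / d logits_j = delta_aj - pi_j.
p = pi.detach()
w = correction_coeff * p * (q - v)
expected = -(w - p * w.sum())

print("detached pi matches target:", torch.allclose(grad_detached, expected))
print("original line matches target:", torch.allclose(grad_original, expected))
```

Without the detach, autograd also differentiates through the `pi` weight, which adds an extra term that the formula does not contain.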

seungeunrho · 2019-07-31T14:04:12Z

Wow, you're correct.
Thanks for such a sharp comment.
I updated the code.

seungeunrho closed this as completed Jul 31, 2019
