The log_prob is not corrected #9

typoverflow · 2023-02-13T12:13:58Z

Hi,
Thanks for releasing the code. I noticed that the policy network simply squashes the mean with tanh without correcting the log-probability, as SAC does, for example, in its policy parameterization. Does this bias the estimate of the policy gradient?

implicit_q_learning/policy.py

Line 56 in 09d7002

base_dist = tfd.MultivariateNormalDiag(loc=means,

I'm debugging my implementations of IQL and XQL, and I'm not sure whether this causes the performance gap. Please correct me if I have misunderstood anything.
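For reference, here is a minimal sketch of the SAC-style correction I have in mind, assuming a diagonal Gaussian over a pre-squash sample `u` and an action `a = tanh(u)`. The function names `gaussian_log_prob` and `tanh_squashed_log_prob` are made up for illustration and do not come from this repository.

```python
import math


def gaussian_log_prob(u, mean, std):
    """Log-density of a diagonal Gaussian N(mean, std^2) evaluated at u."""
    return sum(
        -0.5 * ((ui - mi) / si) ** 2 - math.log(si) - 0.5 * math.log(2.0 * math.pi)
        for ui, mi, si in zip(u, mean, std)
    )


def tanh_squashed_log_prob(u, mean, std):
    """Log-density of a = tanh(u) with u ~ N(mean, std^2).

    Change of variables: d tanh(u)/du = 1 - tanh(u)^2, so the Gaussian
    log-density is reduced by sum_i log(1 - tanh(u_i)^2).
    """
    base = gaussian_log_prob(u, mean, std)
    correction = sum(math.log(1.0 - math.tanh(ui) ** 2) for ui in u)
    return base - correction


# Illustrative values only (not taken from any experiment).
u = [0.3, -1.2]
mean = [0.0, 0.5]
std = [1.0, 0.7]
print(gaussian_log_prob(u, mean, std))
print(tanh_squashed_log_prob(u, mean, std))
```

The correction term is never positive, so the corrected log-probability is at least as large as the uncorrected one. For large `|u|` the direct form can hit `log(0)`; the equivalent expression `2 * (log(2) - u - softplus(-2 * u))` is usually more stable.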


typoverflow · 2023-02-14T07:48:50Z

Sorry, it seems that IQL does not use tanh-squashed distributions, so no correction is needed.
Closing this issue =)
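To spell out the difference, assuming the policy is a Gaussian whose mean is passed through tanh while the sampled action itself is not (the symbols $m$, $\mu$, $\sigma$ and $u$ are my own notation, not taken from the repository):

```latex
% Mean squashed only: the action a is the Gaussian sample itself, so no Jacobian term appears.
\log \pi(a \mid s) = \log \mathcal{N}\!\big(a;\ \tanh(m(s)),\ \sigma^2 I\big)

% SAC-style squashed sample: a = \tanh(u), with u \sim \mathcal{N}(\mu(s), \sigma^2 I).
\log \pi(a \mid s) = \log \mathcal{N}\!\big(u;\ \mu(s),\ \sigma^2 I\big) - \sum_i \log\!\big(1 - \tanh^2(u_i)\big)
```

Only the second case transforms the random variable, which is why only it needs the change-of-variables correction.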

typoverflow closed this as completed Feb 14, 2023
