The log_prob is not corrected #9

Closed
typoverflow opened this issue Feb 13, 2023 · 1 comment

Comments

@typoverflow

Hi,
Thanks for releasing the code. I noticed that in the policy network you simply squash the mean with tanh, without correcting the log-probability as, for example, SAC does in its policy parameterization. Will this bias the estimate of the policy gradient?

base_dist = tfd.MultivariateNormalDiag(loc=means,

I'm debugging my implementation of IQL and XQL, and I'm not sure whether this causes the performance gap. Please correct me if I have misunderstood anything.
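
For reference, the correction being asked about can be sketched as follows. This is a minimal, standard-library illustration of the change-of-variables term for an action a = tanh(u) with u drawn from a diagonal Gaussian; it is not code from this repository, and the function names are hypothetical.

```python
import math


def gaussian_log_prob(u, mean, std):
    """Log-density of a diagonal Gaussian evaluated at u (lists of floats)."""
    return sum(
        -0.5 * ((ui - mi) / si) ** 2 - math.log(si) - 0.5 * math.log(2.0 * math.pi)
        for ui, mi, si in zip(u, mean, std)
    )


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def tanh_squashed_log_prob(u, mean, std):
    """Log-density of a = tanh(u), where u ~ N(mean, diag(std^2)).

    Change of variables: log p(a) = log N(u) - sum_i log(1 - tanh(u_i)^2).
    The Jacobian term uses the stable identity
    log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u)).
    """
    log_det = sum(2.0 * (math.log(2.0) - ui - softplus(-2.0 * ui)) for ui in u)
    return gaussian_log_prob(u, mean, std) - log_det
```

The correction only applies when the sampled action itself passes through tanh. If only the mean is squashed and the action is drawn directly from the Gaussian, `gaussian_log_prob` alone is already the exact log-density.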

@typoverflow

Sorry, it seems that IQL does not use tanh-squashed distributions, so no correction is needed.
Closing this issue =)
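
The distinction can be checked numerically in one dimension. The sketch below assumes a Gaussian policy whose mean alone is passed through tanh, as described in this thread; the parameter values and helper names are made up for illustration. It integrates three candidate densities over the action: the plain Gaussian with a squashed mean, a tanh-squashed sample without the Jacobian term, and the same with the term included.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def integrate(f, lo, hi, n=20000):
    """Midpoint-rule integral of f over [lo, hi]; never evaluates the endpoints."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


raw_mean, std = 0.5, 0.5  # hypothetical network outputs

# Case 1: only the mean is squashed, a ~ N(tanh(raw_mean), std^2).
squashed_mean = math.tanh(raw_mean)
total_mean_only = integrate(
    lambda a: normal_pdf(a, squashed_mean, std),
    squashed_mean - 10.0 * std,
    squashed_mean + 10.0 * std,
)

# Case 2: the sample is squashed, a = tanh(u) with u ~ N(raw_mean, std^2),
# but the density is evaluated without the Jacobian term.
total_uncorrected = integrate(
    lambda a: normal_pdf(math.atanh(a), raw_mean, std), -1.0, 1.0
)

# Case 3: same as case 2, with the 1 / (1 - a^2) Jacobian term.
total_corrected = integrate(
    lambda a: normal_pdf(math.atanh(a), raw_mean, std) / (1.0 - a * a), -1.0, 1.0
)

print(total_mean_only, total_uncorrected, total_corrected)
```

Cases 1 and 3 should integrate to approximately 1, while case 2 falls short. Squashing only the mean leaves a valid Gaussian density, so the correction matters only when the sample itself is squashed.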
