
Can you explain why the energy is divided by b0? #8

Open
swyoon opened this issue Nov 23, 2022 · 1 comment

Comments


swyoon commented Nov 23, 2022

In the following code for computing the (unnormalized) log probability, the network output is divided by b0.

return self.net(y, t, dropout=dropout) / tf.reshape(b0, [-1]) - tf.reduce_sum((y - tilde_x) ** 2 / 2 / sigma ** 2 * is_recovery, axis=[1, 2, 3])

I wonder if there is a legitimate explanation for this division.

b0 is supposed to be step_size_square, which usually has a very small value.

grad_y_new, log_p_y_new = self.grad_f(y_new, t, tilde_x, step_size_square, sigma, is_recovery, dropout=dropout)

I wonder if dividing by this b0 makes the gradient too large and harms the training in some settings.
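To make the concern concrete, here is a toy sketch (my own illustration, not the repository's code) of how dividing an energy by a small `b0` inflates its gradient. The quadratic energy and the variable names are hypothetical:

```python
import numpy as np

# Toy quadratic energy E(y) = 0.5 * ||y||^2; its gradient w.r.t. y is simply y.
def energy(y):
    return 0.5 * np.sum(y ** 2)

def grad_energy(y):
    return y

b0 = 1e-4  # a small step_size_square, as described in the question
y = np.array([0.5, -0.3])

g_plain = grad_energy(y)        # gradient of E(y)
g_scaled = grad_energy(y) / b0  # gradient of E(y) / b0

# Dividing the energy by b0 multiplies every gradient component by 1/b0,
# so the gradient norm grows by a factor of roughly 1e4 here.
ratio = np.linalg.norm(g_scaled) / np.linalg.norm(g_plain)  # ratio ≈ 1/b0
```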


h2o64 commented Aug 29, 2023

I think that's the scaling trick explained in "On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models"; see Appendix A of that paper.
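For context, here is my reading of why such a rescaling can be benign inside a Langevin sampler (a hypothetical sketch; the quadratic energy and variable names are my own, and I'm assuming `b0` equals the squared Langevin step size). The `b0` in the drift term and the `1/b0` in the rescaled energy cancel, leaving a drift that no longer shrinks with the step size:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_E(y):
    # Gradient of a toy quadratic energy E(y) = 0.5 * ||y||^2.
    return y

def langevin_step(y, grad_fn, b):
    # One Langevin update with step size b:
    # y' = y - (b^2 / 2) * grad_fn(y) + b * noise
    return y - (b ** 2 / 2) * grad_fn(y) + b * rng.standard_normal(y.shape)

b = 1e-2
b0 = b ** 2  # step_size_square

y = np.array([1.0, -2.0])

# With the raw energy, the drift term scales with b^2 and vanishes as b -> 0.
drift_raw = (b ** 2 / 2) * grad_E(y)

# With the rescaled energy E / b0, the b^2 factors cancel and the drift is
# (1/2) * grad_E(y), independent of the step size.
drift_rescaled = (b ** 2 / 2) * (grad_E(y) / b0)

y_next = langevin_step(y, lambda v: grad_E(v) / b0, b)
```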
