
Can you explain why the energy is divided by b0? #8

Open
swyoon opened this issue Nov 23, 2022 · 1 comment

Comments


swyoon commented Nov 23, 2022

In the following code for computing the (unnormalized) log probability, the network output is divided by b0.

return self.net(y, t, dropout=dropout) / tf.reshape(b0, [-1]) - tf.reduce_sum((y - tilde_x) ** 2 / 2 / sigma ** 2 * is_recovery, axis=[1, 2, 3])

I wonder if there is a legitimate explanation for this division.

b0 is supposed to be step_size_square, which usually has a very small value.

grad_y_new, log_p_y_new = self.grad_f(y_new, t, tilde_x, step_size_square, sigma, is_recovery, dropout=dropout)

I wonder if dividing by this b0 makes the gradient too large and harms the training in some settings.
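To make the concern concrete, here is a toy sketch (my own illustration, not the repository's code) of how dividing an energy by a small `b0` inflates its gradient. The quadratic energy and the variable names are hypothetical:

```python
import numpy as np

# Toy quadratic energy E(y) = 0.5 * ||y||^2; its gradient w.r.t. y is simply y.
def energy(y):
    return 0.5 * np.sum(y ** 2)

def grad_energy(y):
    return y

b0 = 1e-4  # a small step_size_square, as described in the question
y = np.array([0.5, -0.3])

g_plain = grad_energy(y)        # gradient of E(y)
g_scaled = grad_energy(y) / b0  # gradient of E(y) / b0

# Dividing the energy by b0 multiplies every gradient component by 1/b0,
# so the gradient norm grows by a factor of roughly 1e4 here.
ratio = np.linalg.norm(g_scaled) / np.linalg.norm(g_plain)  # ratio ≈ 1/b0
```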


h2o64 commented Aug 29, 2023

I think that's the scaling trick explained in "On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models"; see Appendix A of that paper.
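For context, here is my reading of why such a rescaling can be benign inside a Langevin sampler (a hypothetical sketch; the quadratic energy and variable names are my own, and I'm assuming `b0` equals the squared Langevin step size). The `b0` in the drift term and the `1/b0` in the rescaled energy cancel, leaving a drift that no longer shrinks with the step size:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_E(y):
    # Gradient of a toy quadratic energy E(y) = 0.5 * ||y||^2.
    return y

def langevin_step(y, grad_fn, b):
    # One Langevin update with step size b:
    # y' = y - (b^2 / 2) * grad_fn(y) + b * noise
    return y - (b ** 2 / 2) * grad_fn(y) + b * rng.standard_normal(y.shape)

b = 1e-2
b0 = b ** 2  # step_size_square

y = np.array([1.0, -2.0])

# With the raw energy, the drift term scales with b^2 and vanishes as b -> 0.
drift_raw = (b ** 2 / 2) * grad_E(y)

# With the rescaled energy E / b0, the b^2 factors cancel and the drift is
# (1/2) * grad_E(y), independent of the step size.
drift_rescaled = (b ** 2 / 2) * (grad_E(y) / b0)

y_next = langevin_step(y, lambda v: grad_E(v) / b0, b)
```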
