Normalizing Flows Module - Order of Operations #10

Closed
areiner222 opened this issue Oct 29, 2021 · 5 comments

@areiner222

Hi @nkolot ,

I really enjoyed this project - a unique spin on probabilistic modeling for 3D human pose reconstruction, and the resulting strong, context-based prior seems to work great for your new version of SMPLify!

I'm also a big fan of your conditional normalizing flows approach, and I am trying to clearly understand the order of operations from noise -> sample.

In the supp material, the forward-mode normalizing flow bijector is depicted as acting (for one block) in the order z -> [ act norm -> linear layer -> conditional shift coupling ] -> pose theta.
(screenshot of the block diagram from the supplementary material)

However, when looking through your nflows fork, I am coming out somewhere different.
Allow me to walk through what I'm seeing:

  1. Each glow block is arranged as [norm, linear, coupling]
  2. When sampling / computing log probs, you call sample_and_log_prob, which calls the inverse of the glow bijector on noise drawn from a standard normal
  3. The inverse mode of the CompositeTransform inverts the component bijectors and reverses their order (sketched below)
  4. In forward mode, the SMPL flow uses the sample_and_log_prob method on noise
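
For reference, here is a minimal sketch (plain Python, not the actual nflows source, which also threads contexts and log-determinants) of the composition behaviour I mean in step 3:

```python
class Composite:
    """Toy composite bijector illustrating the two directions."""

    def __init__(self, transforms):
        # e.g. transforms = [act_norm, linear, coupling]
        self.transforms = transforms

    def forward(self, x):
        # "evaluation" direction (used by log_prob): theta -> z,
        # components applied left to right
        for t in self.transforms:
            x = t.forward(x)
        return x

    def inverse(self, z):
        # "sampling" direction (used by sample_and_log_prob): z -> theta,
        # each component inverted AND the order reversed,
        # i.e. coupling^-1 -> linear^-1 -> act_norm^-1
        for t in reversed(self.transforms):
            z = t.inverse(z)
        return z
```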

My understanding from the code is that, going from "noise" to "theta sample", it acts in the reverse order from what is depicted in the supp materials (i.e., the order of operations for the "sampling phase" vs. the "evaluation phase" is flipped).

Do you know if I am missing something? I'd really appreciate your help!

Alex

@nkolot
Owner

nkolot commented Oct 29, 2021

Yes, you are correct. nflows actually uses the inverse of the function for sampling. This, however, should not really change the results. I am not sure why they went with this design choice. Maybe for a particular class of transformations computing the inverse could be slower, so they wanted a fast way of going from the output to the latent in order to maximize the log-probability during training.
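
To make the two directions concrete, here is a rough sketch of how the flow gets exercised (module paths and signatures follow the public nflows API as far as I recall; the transform list is simplified and omits the conditional coupling layer, so it is not the actual model in this repo):

```python
import torch
from nflows.flows.base import Flow
from nflows.distributions.normal import StandardNormal
from nflows.transforms.base import CompositeTransform
from nflows.transforms.normalization import ActNorm
from nflows.transforms.lu import LULinear

features = 6  # placeholder dimensionality, not the actual SMPL pose size

# One simplified glow-style block in [actnorm, linear] order.
transform = CompositeTransform([ActNorm(features), LULinear(features)])
flow = Flow(transform, StandardNormal([features]))

x = torch.randn(8, features)  # stand-in for a batch of pose parameters

# Training direction: data -> noise, i.e. transform.forward, used by log_prob.
log_prob = flow.log_prob(x)

# Sampling direction: noise -> data, i.e. transform.inverse with the
# component order reversed, used by sample / sample_and_log_prob.
samples, sample_log_prob = flow.sample_and_log_prob(8)
```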

@areiner222
Author

Understood, thanks for your help!

I've been working with a comparable TensorFlow implementation of the conditional Glow normalizing flow and have had some trouble with NaN losses (I have not done a deep dive yet to identify the issue) when I do not use the order of operations they use in nflows. Curious if you have tried inverting the order and attempted training?

@nkolot
Owner

nkolot commented Oct 29, 2021

The key thing for making training stable is to run a dummy forward pass so that the ActNorm layers are initialized properly, as I do here. In nflows, the first time you go through them they are initialized based on the activations of the first batch.
So you might want to do something similar in your implementation.
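
Roughly, something like this before the training loop (just a sketch; flow, real_batch and context are placeholders for your own objects, and as far as I remember the data-dependent initialization only triggers in training mode):

```python
import torch

def warm_up_actnorm(flow, real_batch, context=None):
    # Run one dummy log_prob pass on a real batch so that the data-dependent
    # ActNorm layers set their shift/log_scale from actual activations
    # before training starts.
    flow.train()
    with torch.no_grad():
        flow.log_prob(real_batch, context=context)
```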

@areiner222
Author

Ah, that's really helpful.

So that I'm understanding correctly: at init you compute the log_prob of a ground-truth batch, which pushes the normalizing flow bijector through the theta -> z direction. Because of the aforementioned inversion that nflows applies when composing the component bijectors, the first operation is the forward mode of ActNorm. So the input to ActNorm's forward mode is the rot6d representation of the SMPL pose parameters, and on the first run it sets the log_scale/shift parameters so that the post-activation has zero mean / unit variance (as your code comment says).

Am I getting that correctly? You still allow for the scale / shift parameters to be trainable after init, correct?
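
Concretely, this is roughly how I picture the data-dependent init (a minimal PyTorch sketch of the idea, not the nflows code):

```python
import torch
import torch.nn as nn

class ActNormSketch(nn.Module):
    """Toy ActNorm-style layer with data-dependent initialization."""

    def __init__(self, features):
        super().__init__()
        # Trainable after init; the first forward call just sets their values.
        self.shift = nn.Parameter(torch.zeros(features))
        self.log_scale = nn.Parameter(torch.zeros(features))
        self.register_buffer("initialized", torch.tensor(False))

    def forward(self, x):
        # "evaluation" direction: data (e.g. rot6d pose) -> latent.
        if not self.initialized:
            with torch.no_grad():
                # Set shift/scale so the output of this first batch has
                # zero mean and unit variance per feature.
                self.shift.data = -x.mean(dim=0)
                self.log_scale.data = -torch.log(x.std(dim=0) + 1e-6)
                self.initialized.fill_(True)
        z = (x + self.shift) * torch.exp(self.log_scale)
        logabsdet = self.log_scale.sum().expand(x.shape[0])
        return z, logabsdet

    def inverse(self, z):
        # "sampling" direction: latent -> data.
        x = z * torch.exp(-self.log_scale) - self.shift
        logabsdet = -self.log_scale.sum().expand(z.shape[0])
        return x, logabsdet
```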

@nkolot
Owner

nkolot commented Oct 29, 2021

Yes, what you said is correct.
The parameters are trainable after the initialization. The initialization trick that ActNorm uses is probably needed to improve the convergence properties by “whitening” the activations in the intermediate layers.

@nkolot nkolot closed this as completed Oct 31, 2021