codebook initialization #19

Closed
ramyamounir opened this issue Jul 10, 2022 · 3 comments

@ramyamounir

Hi, thank you for this great work. It's quite useful!

I have been having problems with index collapse, and I'm not sure where it's coming from. Digging into the code, it seems that when k-means is not used to initialize the codebook vectors, they are initialized with randn (a normal distribution). The VQ-VAE paper specifically uses a uniform distribution for initialization, which allows the authors to ignore the KL divergence during training.

This is from the VQ-VAE paper: "Since we assume a uniform prior for z, the KL term that usually appears in the ELBO is constant w.r.t. the encoder parameters and can thus be ignored for training."

Is there any reason why you changed to a normal distribution here?

Thanks!

@lucidrains
Owner

lucidrains commented Jul 11, 2022

@ramyamounir Hi Ramy! I checked the paper as well as DeepMind's original Sonnet code (on which the implementation is based), and you are correct: I should have been using uniform init.

I've corrected it with this commit. Thank you for raising this issue!
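
For readers comparing the two schemes discussed in this thread, here is a minimal PyTorch sketch. It is not the code from the commit: `init_codebook`, its `mode` argument, and the uniform bound of `1 / num_codes` are assumptions made for illustration, and the repository's own bound may differ.

```python
import torch


def init_codebook(num_codes: int, dim: int, mode: str = "uniform", seed: int = 0) -> torch.Tensor:
    """Return a (num_codes, dim) codebook. Hypothetical helper, not the library's API."""
    gen = torch.Generator().manual_seed(seed)
    if mode == "normal":
        # Standard normal entries, as torch.randn produces.
        return torch.randn(num_codes, dim, generator=gen)
    if mode == "uniform":
        # Assumed bound of 1 / num_codes; chosen here only for illustration.
        bound = 1.0 / num_codes
        # torch.rand samples from [0, 1); rescale to [-bound, bound).
        return (torch.rand(num_codes, dim, generator=gen) * 2 - 1) * bound
    raise ValueError(f"unknown mode: {mode!r}")


normal_codes = init_codebook(512, 64, mode="normal")
uniform_codes = init_codebook(512, 64, mode="uniform")

# Compare the spread of the two initializations.
print("normal std:", normal_codes.std().item())
print("uniform max |value|:", uniform_codes.abs().max().item())
```

With this assumed bound, the uniform codes start much closer to the origin than the normal ones, so the two schemes differ in scale as well as in shape.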

@ramyamounir
Author

Awesome! Thanks for the quick reply and fix.

@gorold

gorold commented Jul 22, 2022

From the VQGAN paper: "The codebook variables are initialized from a normal distribution."

Should CosineSimCodebook continue to use a normal distribution for initialisation?
