getting KL divergence to work #92

Open

CDitzel opened this issue Mar 16, 2021 · 4 comments

@CDitzel commented Mar 16, 2021

In the train_vae script, the kl_loss is effectively disabled by setting its weight parameter to zero, and in my own extensive runs of experiments I found that including the KL term does more harm than good. @karpathy also mentioned trouble getting it to work properly.

Did anyone make any progress on this matter?

Also, shouldn't this

log_qy = F.log_softmax(logits, dim = -1)

use the soft_one_hot values rather than the raw logits?
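
For reference, a minimal sketch of the two variants, assuming `logits` of shape (batch, seq_len, num_tokens) and `soft_one_hot` being the relaxed gumbel-softmax samples of the same shape (names and shapes are my assumptions, not necessarily the repo's exact code):

```python
import math
import torch
import torch.nn.functional as F

def kl_from_logits(logits, num_tokens):
    # current variant: q(y|x) comes from the raw logits
    log_qy = F.log_softmax(logits, dim=-1)
    log_uniform = -math.log(num_tokens)  # log p(y) under a uniform prior
    # KL(q || p) = sum_y q(y) * (log q(y) - log p(y))
    return (log_qy.exp() * (log_qy - log_uniform)).sum(dim=-1).mean()

def kl_from_soft_one_hot(soft_one_hot, num_tokens, eps=1e-10):
    # suggested variant: q comes from the relaxed gumbel-softmax samples
    log_qy = torch.log(soft_one_hot + eps)
    log_uniform = -math.log(num_tokens)
    return (soft_one_hot * (log_qy - log_uniform)).sum(dim=-1).mean()
```

Either way the term works out to a negative entropy plus log(num_tokens), so it rewards spreading probability mass over the codebook; the two variants mainly differ in whether the deterministic softmax or the stochastic relaxed samples carry the gradient.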

Also, I find it a little confusing that we anneal the temperature of the gumbel-softmax, thereby steering it towards one-hot sampling, while at the same time trying to encourage the distribution to stay close to a uniform prior. Isn't this a contradiction?
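
To make the tension concrete, here is a hypothetical temperature schedule in the spirit of Jang et al. 2017 (the constants are illustrative, not this repo's defaults): the temperature decays towards hard one-hot samples, while the KL term simultaneously pulls q(y|x) towards uniform.

```python
import math

def gumbel_temperature(step, temp_init=1.0, temp_min=0.0625, anneal_rate=1e-4):
    # exponential decay towards a floor: lower temperature pushes the
    # relaxed gumbel-softmax samples towards hard one-hot vectors
    return max(temp_min, temp_init * math.exp(-anneal_rate * step))
```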

@LinLanbo
I agree with you, so I temporarily set kl_weight to zero; otherwise the recon_loss cannot be reduced. In this version, kl_loss works against recon_loss.

@lucidrains (Owner)

Maybe someone can email the paper authors to see if this loss was used at all?

@CDitzel (Author) commented Mar 17, 2021

It must have been used, as they mention an increasing weight parameter in the paper.
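
For reference, a hypothetical sketch of how such an increasing weight could enter the total loss; the schedule shape, ramp length, and maximum weight are my assumptions, not the paper's numbers:

```python
import torch

def kl_weight(step, max_weight=1.0, ramp_steps=5000):
    # linear warm-up from 0 to max_weight over the first ramp_steps updates
    return max_weight * min(1.0, step / ramp_steps)

# dummy scalars standing in for the real recon_loss and kl_loss
recon_loss, kl_loss = torch.tensor(0.5), torch.tensor(2.0)

for step in (0, 1000, 5000):
    loss = recon_loss + kl_weight(step) * kl_loss
    print(f"step {step}: kl_weight={kl_weight(step):.2f}, loss={loss.item():.2f}")
```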

Still, I am trying to reach out, but I can't seem to figure out his email address. The paper says

Aditya Ramesh <_@adityaramesh.com>

so I tried Aditya_Ramesh@adityaramesh.com and Aditya.Ramesh@adityaramesh.com,

but they don't exist...

@samringer

Has anyone had any more insights/updates on this? I'm running into the exact same issue (on an independent DALL-E repro) and bashing my head against the wall trying to understand the behaviour!
