getting KL divergence to work #92
Comments
I agree with you. So I temporarily set kl_weight to zero; otherwise the recon_loss cannot be reduced. In this version, the kl_loss conflicts with the recon_loss.
Maybe someone can email the paper authors to see if this loss was used at all?
It must have been used, as they mention an increasing weight parameter in the paper. Still, I am trying, but I can't seem to figure out his email address. On the paper it says Aditya Ramesh <_@adityaramesh.com, so I tried Aditya_Ramesh@adityaramesh.com and Aditya.Ramesh@adityaramesh.com, but they don't exist...
Has anyone had any more insights/updates on this? I'm running into the exact same issue (on an independent DALL-E repro) and bashing my head against the wall trying to understand the behaviour!
In the train_vae script the kl_loss is set to zero via the weight parameter, and in my own extensive experiments I also found that including the KL term does more harm than good. @karpathy also mentioned having trouble getting it to work properly.
Did anyone make any progress on this?
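Regarding the weight parameter: a gradual warm-up of kl_weight (which the "increasing weight parameter" in the paper presumably refers to) might be worth trying instead of a fixed value. This is only a sketch; the names (kl_weight_at, recon_loss, kl_div, global_step) and constants are placeholders, not what train_vae.py actually does.

```python
def kl_weight_at(step, max_kl_weight=1.0, warmup_steps=5000):
    # linear warm-up from 0 to max_kl_weight; setting max_kl_weight = 0
    # recovers the current behaviour of dropping the KL term entirely.
    # The constants here are guesses, not taken from the paper or the repo.
    return max_kl_weight * min(1.0, step / warmup_steps)

# in the training loop (sketch):
#   loss = recon_loss + kl_weight_at(global_step) * kl_div
```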
Also, shouldn't this:
DALLE-pytorch/dalle_pytorch/dalle_pytorch.py
Line 196 in 7658e60
rather use the soft_one_hot values than the raw logits?
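To make that concrete, here is a minimal sketch of computing the KL term against the uniform prior from a properly normalized distribution (the softmax posterior, or the soft_one_hot samples) rather than from the raw, unnormalized logits. The shapes and names below are my assumptions, not the repo's actual code.

```python
import torch

def kl_to_uniform(qy):
    """KL(q || uniform prior) for categorical codebook posteriors.

    qy: probabilities over codebook entries, shape (batch, positions, num_tokens),
        e.g. softmax of the encoder logits or the soft_one_hot samples --
        NOT the raw logits.
    """
    num_tokens = qy.shape[-1]
    log_qy = torch.log(qy + 1e-10)  # avoid log(0)
    log_uniform = -torch.log(torch.tensor(float(num_tokens), device=qy.device))
    # KL(q || U) = sum_i q_i * (log q_i - log(1/K)), averaged over batch and positions
    return (qy * (log_qy - log_uniform)).sum(dim=-1).mean()
```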
Also, I find it a little confusing that we are annealing the temperature of the Gumbel-Softmax, thus steering it towards one-hot sampling, while at the same time trying to encourage the distribution to stay close to a uniform prior. Isn't this a contradiction?
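For reference, this is roughly what I mean by the annealing: a low tau pushes the soft_one_hot samples towards hard one-hot vectors, which does seem to pull against a KL term that rewards staying close to uniform. The schedule and its constants below are illustrative, not the paper's or the repo's.

```python
import math
import torch.nn.functional as F

def annealed_temperature(step, temp_start=1.0, temp_min=0.5, anneal_rate=1e-4):
    # exponential decay, a common Gumbel-Softmax schedule; constants are placeholders
    return max(temp_min, temp_start * math.exp(-anneal_rate * step))

# in the training loop (sketch), with logits of shape (batch, num_tokens, h, w):
#   temp = annealed_temperature(global_step)
#   soft_one_hot = F.gumbel_softmax(logits, tau=temp, dim=1, hard=False)
```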