Thanks for this awesome work. Could I ask a question about the training part?
I was wondering: after the first step, do you train the decoder again in the second step? It seems the upper bound on performance is determined by how good the decoder is, right? Does this also mean that if we build a stronger autoencoder, the performance can be improved further? Many thanks.
Hi, sorry for the delay, I’ve been taking a break over Christmas/New Year. The decoder is only trained in the first step, as is typical with Vector-Quantized models, so yes, it is very true that performance is bounded by the decoder's reconstruction quality. Since, in terms of FID, we are relatively close to the autoencoder's reconstruction quality, it is definitely important to improve the decoder. Some recent papers have focused on this part, using Vision Transformers and DDPMs. Hope this helps.
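For concreteness, here is a minimal, hypothetical sketch of the two-stage scheme described above. All names (`ToyAE`, `ToyPrior`, the toy losses and shapes) are illustrative placeholders, not the repository's actual code; the point is only that the autoencoder is optimized in stage one and then frozen, so only the latent generative model receives gradients in stage two.

```python
# Hypothetical two-stage setup: stage 1 trains the autoencoder,
# stage 2 trains a generative model over its frozen latents.
import torch
import torch.nn as nn

class ToyAE(nn.Module):
    """Stage-1 autoencoder: maps images to a latent vector and back."""
    def __init__(self, dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, dim))
        self.decoder = nn.Sequential(nn.Linear(dim, 28 * 28), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

class ToyPrior(nn.Module):
    """Stage-2 model over the latent space (stand-in for a real prior)."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, z):
        return self.net(z)

ae, prior = ToyAE(), ToyPrior()
x = torch.rand(8, 1, 28, 28)  # dummy batch standing in for real images

# --- Stage 1: train encoder + decoder on reconstruction ---
opt1 = torch.optim.Adam(ae.parameters(), lr=1e-3)
recon, _ = ae(x)
loss1 = nn.functional.mse_loss(recon, x.flatten(1))
opt1.zero_grad()
loss1.backward()
opt1.step()

# --- Stage 2: freeze the autoencoder; only the prior is updated ---
for p in ae.parameters():
    p.requires_grad_(False)
ae.eval()

opt2 = torch.optim.Adam(prior.parameters(), lr=1e-3)
with torch.no_grad():  # latents come from the frozen encoder
    z = ae.encoder(x)
z_pred = prior(z + 0.1 * torch.randn_like(z))  # e.g. denoise perturbed latents
loss2 = nn.functional.mse_loss(z_pred, z)
opt2.zero_grad()
loss2.backward()
opt2.step()
```

Since the decoder is never touched after stage one, sample quality can never exceed what the frozen decoder can reconstruct, which is exactly why a stronger autoencoder raises the ceiling.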
Congratulations! I just realized that your paper was accepted by ECCV. I noticed your paper early on, before CVPR 2022, and found that a later paper, latent DPM, shares the same idea as yours but received much more attention. Sad, but research does need some luck sometimes. Anyway, glad it was accepted!