Regarding the Stochastic Nature of the Stable Diffusion v2 VAE's Encoder #35

Closed
GonzaloMartinGarcia opened this issue Jan 18, 2024 · 1 comment


@GonzaloMartinGarcia

Hello everybody,

The Stable Diffusion v2 VAE encoder outputs the mean and log variance of a Gaussian distribution, from which the latent encoding is drawn. In generative use, sampling the latent this way adds another source of randomness to the sampling process, which increases the variety of the generated images.
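For reference, drawing the latent from a diagonal Gaussian given a mean and log variance is usually done with the reparameterization trick, z = mean + exp(0.5 * logvar) * eps with eps ~ N(0, 1). The sketch below shows this elementwise on plain Python lists; the function name `reparameterize` and the toy values are hypothetical, and a real VAE would do the same on tensors.

```python
import math
import random

def reparameterize(mean, logvar, rng):
    """Draw z = mean + std * eps with eps ~ N(0, 1), elementwise.

    logvar is the log of the variance, so std = exp(0.5 * logvar).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mean, logvar)]

# Hypothetical encoder output for a 4-element latent.
mean = [0.5, -1.0, 2.0, 0.0]
logvar = [-4.0, -4.0, -4.0, -4.0]  # variance = e^-4, so std = e^-2, about 0.135

z1 = reparameterize(mean, logvar, random.Random(0))
z2 = reparameterize(mean, logvar, random.Random(1))
print(z1 != z2)  # True: different noise draws give different latents
```

As the log variance goes to minus infinity, the sample collapses onto the mean, which is exactly the deterministic case discussed next.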

In Marigold, the Stable Diffusion encoder is applied deterministically: instead of sampling through the reparameterization trick, the latent is taken directly as the mean. I presume this is meant to remove randomness from the sampling process, since the task is depth estimation and we want to minimize the variance of the generated maps as much as possible. On the other hand, Marigold already performs an optimization-based ensemble step, which might benefit from a variety of plausible estimates.

Was this a deliberate choice, or should it instead be something like `rgb_latent = (mean + torch.exp(0.5 * logvar)*torch.randn(mean.shape).to(self.device)) * self.rgb_latent_scale_factor`?
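The two options in the question differ only in whether noise is added before scaling. A minimal stdlib sketch of both, with hypothetical names `encode_mode` and `encode_sample`; the value 0.18215 is an assumption standing in for `self.rgb_latent_scale_factor` (it is the scaling factor used by the Stable Diffusion VAE configs).

```python
import math
import random

# Assumed value standing in for self.rgb_latent_scale_factor.
LATENT_SCALE_FACTOR = 0.18215

def encode_mode(mean, logvar):
    """Deterministic encoding: use the Gaussian's mean (its mode) and ignore logvar."""
    return [m * LATENT_SCALE_FACTOR for m in mean]

def encode_sample(mean, logvar, rng):
    """Stochastic encoding: reparameterization trick, then the same scaling."""
    return [(m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)) * LATENT_SCALE_FACTOR
            for m, lv in zip(mean, logvar)]

mean = [1.0, -2.0, 0.5]
logvar = [-6.0, -6.0, -6.0]

# The mode path returns the same latent on every call...
assert encode_mode(mean, logvar) == encode_mode(mean, logvar)
# ...while the sampled path changes with the noise draw.
print(encode_sample(mean, logvar, random.Random(0)) != encode_sample(mean, logvar, random.Random(1)))  # True
```

In PyTorch, `torch.randn_like(mean)` draws noise with the same shape, dtype and device as `mean`, which avoids the separate `.to(self.device)` call in the expression above.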

Thanks

@markkua
Member

markkua commented Feb 1, 2024

Hello,
You are right that it's a design choice to remove the randomness from the sampling process. However, this alone does not make the predictions as consistent as we expected, which is the main reason we introduced the ensemble.
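To illustrate what an ensemble does with inconsistent predictions, here is a minimal sketch that merges several depth maps by per-pixel median. It assumes the predictions are already on a common scale and shift; an affine-invariant ensemble would first have to estimate that alignment, which is omitted here. `ensemble_median` and the sample values are hypothetical.

```python
import statistics

def ensemble_median(predictions):
    """Merge several depth maps (flat lists of equal length) by per-pixel median."""
    if not predictions:
        raise ValueError("need at least one prediction")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all predictions must have the same size")
    return [statistics.median(p[i] for p in predictions) for i in range(n)]

# Three hypothetical 4-pixel depth predictions from different noise seeds.
preds = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 1.9, 3.2, 4.1],
    [0.9, 2.5, 2.8, 3.9],
]
print(ensemble_median(preds))  # [1.0, 2.0, 3.0, 4.0]
```

The median is used here because a single outlying prediction (such as the 2.5 at the second pixel) does not pull the merged value the way a mean would.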

@markkua markkua closed this as completed Feb 1, 2024