Hello everybody,
The Stable Diffusion v2 VAE encoder outputs the mean and log-variance of a Gaussian distribution, from which the latent encoding is drawn. In generative image modeling, this sampling adds another stochastic element to the generation process, resulting in a greater variety of generated images.
In Marigold, when the Stable Diffusion encoder is applied, the reparameterization trick is made deterministic by taking the mean directly. I presume this is done to remove randomness from the sampling process: since the task is to estimate depth maps, we are interested in minimizing the variance of the generated maps as much as possible. On the other hand, Marigold already performs an optimization-based ensembling step, which might benefit from a variety of feasible estimates.
Was this a deliberate design choice, or should it have been something like `rgb_latent = (mean + torch.exp(0.5 * logvar) * torch.randn(mean.shape).to(self.device)) * self.rgb_latent_scale_factor`?
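For reference, here is a minimal sketch of the two variants side by side, using the public `diffusers` `AutoencoderKL` API rather than Marigold's internal encoder call. The model id, the placeholder input tensor, and the use of `config.scaling_factor` (available in recent `diffusers` versions) are assumptions for illustration, not taken from the Marigold code base:

```python
import torch
from diffusers import AutoencoderKL

# Assumption: Marigold builds on Stable Diffusion v2, so we load that VAE.
vae = AutoencoderKL.from_pretrained("stabilityai/stable-diffusion-2", subfolder="vae")
vae.eval()

rgb_in = torch.randn(1, 3, 512, 512)  # placeholder image batch scaled to [-1, 1]

with torch.no_grad():
    dist = vae.encode(rgb_in).latent_dist  # DiagonalGaussianDistribution

    # Deterministic variant (what Marigold does): take the distribution mean.
    latent_det = dist.mode() * vae.config.scaling_factor  # mode == mean for a Gaussian

    # Stochastic variant (full reparameterization trick, as in the question):
    # z = mean + std * eps with eps ~ N(0, I), where std = exp(0.5 * logvar).
    latent_sto = dist.sample() * vae.config.scaling_factor
```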
Thanks
Hello,
You are right: it is a design choice to remove the randomness from the sampling process. However, even with this measure, the predictions are not as consistent as we expected, which is the main reason for introducing the ensemble.
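For intuition, here is a loose sketch of the ensembling idea under stated assumptions: `predict_depth` is a hypothetical stand-in for one full diffusion inference pass (each call starts from a different seed, so the denoising process itself injects variety even with a deterministic encoder), and a simple per-pixel median replaces Marigold's actual alignment-and-merging procedure:

```python
import torch

def predict_depth(rgb: torch.Tensor, generator: torch.Generator) -> torch.Tensor:
    """Hypothetical stand-in for one inference pass. In the real pipeline this
    would run the full denoising loop starting from noise drawn with `generator`;
    here it returns a random map purely for illustration."""
    h, w = rgb.shape[-2:]
    return torch.rand(h, w, generator=generator)  # placeholder prediction

def ensemble_depth(rgb: torch.Tensor, n: int = 10) -> torch.Tensor:
    # Each ensemble member starts from a different seed.
    preds = []
    for seed in range(n):
        g = torch.Generator().manual_seed(seed)
        preds.append(predict_depth(rgb, g))
    # Simplification: Marigold aligns the affine-invariant predictions with an
    # optimization step before merging; a plain per-pixel median is used here.
    return torch.stack(preds).median(dim=0).values

depth = ensemble_depth(torch.randn(3, 480, 640))
```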