Hello everybody,
The Stable Diffusion v2 VAE encoder outputs the mean and log-variance of a Gaussian distribution, from which the latent encoding is drawn. In generative image modeling, this sampling adds another stochastic element to the generation process, resulting in a greater variety of generated images.
In Marigold, when the Stable Diffusion encoder is applied, the reparameterization trick is made deterministic by taking the mean directly. I presume this is done to remove randomness from the sampling process: since the task is to estimate depth maps, we are interested in minimizing the variance of the generated maps as much as possible. On the other hand, Marigold already performs an optimization-based ensembling step, which might benefit from a variety of feasible estimates.
Was this a deliberate design choice, or should it have been something like `rgb_latent = (mean + torch.exp(0.5 * logvar) * torch.randn(mean.shape).to(self.device)) * self.rgb_latent_scale_factor`?
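For reference, here is a minimal sketch of the two variants side by side, using the public `diffusers` `AutoencoderKL` API rather than Marigold's internal encoder call. The model id, the placeholder input tensor, and the use of `config.scaling_factor` (available in recent `diffusers` versions) are assumptions for illustration, not taken from the Marigold code base:

```python
import torch
from diffusers import AutoencoderKL

# Assumption: Marigold builds on Stable Diffusion v2, so we load that VAE.
vae = AutoencoderKL.from_pretrained("stabilityai/stable-diffusion-2", subfolder="vae")
vae.eval()

rgb_in = torch.randn(1, 3, 512, 512)  # placeholder image batch scaled to [-1, 1]

with torch.no_grad():
    dist = vae.encode(rgb_in).latent_dist  # DiagonalGaussianDistribution

    # Deterministic variant (what Marigold does): take the distribution mean.
    latent_det = dist.mode() * vae.config.scaling_factor  # mode == mean for a Gaussian

    # Stochastic variant (full reparameterization trick, as in the question):
    # z = mean + std * eps with eps ~ N(0, I), where std = exp(0.5 * logvar).
    latent_sto = dist.sample() * vae.config.scaling_factor
```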
Thanks
Hello,
You are right: it is a design choice to remove the randomness from the sampling process. However, even with this measure, the predictions are not as consistent as we expected, which is the main reason for introducing the ensemble.
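For intuition, here is a loose sketch of the ensembling idea under stated assumptions: `predict_depth` is a hypothetical stand-in for one full diffusion inference pass (each call starts from a different seed, so the denoising process itself injects variety even with a deterministic encoder), and a simple per-pixel median replaces Marigold's actual alignment-and-merging procedure:

```python
import torch

def predict_depth(rgb: torch.Tensor, generator: torch.Generator) -> torch.Tensor:
    """Hypothetical stand-in for one inference pass. In the real pipeline this
    would run the full denoising loop starting from noise drawn with `generator`;
    here it returns a random map purely for illustration."""
    h, w = rgb.shape[-2:]
    return torch.rand(h, w, generator=generator)  # placeholder prediction

def ensemble_depth(rgb: torch.Tensor, n: int = 10) -> torch.Tensor:
    # Each ensemble member starts from a different seed.
    preds = []
    for seed in range(n):
        g = torch.Generator().manual_seed(seed)
        preds.append(predict_depth(rgb, g))
    # Simplification: Marigold aligns the affine-invariant predictions with an
    # optimization step before merging; a plain per-pixel median is used here.
    return torch.stack(preds).median(dim=0).values

depth = ensemble_depth(torch.randn(3, 480, 640))
```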