# What is the `scaling_factor`? (#3)

We have `latent_shift` and `latent_magnitude` values here: https://github.com/madebyollin/taesd/blob/main/taesd.py#L44C1-L45C23

But is there a `scaling_factor` as well, or is it just these two? I mean the `scaling_factor` as observed in https://github.com/huggingface/diffusers/blob/ea5b0575f8f91b76f32fb6f6930c0bc30e42865e/src/diffusers/models/autoencoder_kl.py#L61.
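For reference, those two values map raw SD latents to and from a [0, 1] range. A minimal sketch of that mapping, paraphrased from the linked taesd.py (the upstream source may have changed since):

```python
import torch

# Values from the linked taesd.py lines (may change upstream).
latent_magnitude = 3
latent_shift = 0.5

def scale_latents(x: torch.Tensor) -> torch.Tensor:
    """Raw SD latents -> [0, 1] (e.g. for storing latents as images)."""
    return x.div(2 * latent_magnitude).add(latent_shift).clamp(0, 1)

def unscale_latents(x: torch.Tensor) -> torch.Tensor:
    """[0, 1] -> raw SD latents."""
    return x.sub(latent_shift).mul(2 * latent_magnitude)
```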
---

There is no `scaling_factor`. (The `latent_shift` / `latent_magnitude` values linked above are the only scaling constants involved.)
---

Thanks for your reply! I am trying to integrate your work into diffusers. With the following code:

```python
import torch
from diffusers import DiffusionPipeline, TinyAutoencoder

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
)
pipe.vae = TinyAutoencoder.from_pretrained("sayakpaul/taesd-diffusers", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "slice of delicious New York-style berry cheesecake"
image = pipe(prompt, num_inference_steps=25, height=512, width=512, guidance_scale=3.0).images[0]
image
```

I am getting: [image omitted] Is the quality somewhat expected?

To give you some more context, here's what we do in the standard pipeline settings: after we get the latents from the UNet, we divide them by `vae.config.scaling_factor`, decode them with the VAE, and then denormalize the decoded output to [0, 1].

From your example notebook, comparing this line:

```python
res_taesd = taesd_dec(latents).cpu().permute(0, 2, 3, 1).float().clamp(0, 1).numpy()
```

to the corresponding line in diffusers, it feels like the additional denormalization step might not be needed here. Would be amazing to get your thoughts here.
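For reference, the standard decode path described above looks roughly like this (a sketch, not the exact diffusers source):

```python
import torch
from diffusers import AutoencoderKL

def decode_latents(vae: AutoencoderKL, latents: torch.Tensor) -> torch.Tensor:
    """Roughly the standard diffusers decode path for AutoencoderKL."""
    latents = latents / vae.config.scaling_factor      # undo the latent scaling (0.18215 for SD)
    image = vae.decode(latents, return_dict=False)[0]  # pixel space, roughly in [-1, 1]
    return (image / 2 + 0.5).clamp(0, 1)               # the denormalization step in question
```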
---

Seems like it's indeed the case. When I do:

```python
import PIL
import torch

pipe.vae = TinyAutoencoder.from_pretrained(
    "sayakpaul/taesd-diffusers", torch_dtype=torch.float16
).to("cuda")

latents = pipe(
    prompt, num_inference_steps=25, height=512, width=512, guidance_scale=3.0,
    generator=torch.manual_seed(0), output_type="latent"
).images

decoded_image = pipe.vae.decode(
    latents / pipe.vae.config.scaling_factor, return_dict=False
)[0]
decoded_image = decoded_image.permute(0, 2, 3, 1).float().clamp(0, 1).cpu().detach().numpy().squeeze(0)
PIL.Image.fromarray((decoded_image * 255).round().astype("uint8"))
```

With this, I am getting: [image omitted]
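A quick way to sanity-check whether that `scaling_factor` division actually changes anything here (assuming the ported config exposes the attribute, as the snippet above implies):

```python
# If the ported TAESD config reports 1.0, the division above is a no-op.
print(pipe.vae.config.scaling_factor)
```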
---

When I use the original VAE, I get the following:

```python
from diffusers import AutoencoderKL

original_vae = AutoencoderKL.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", subfolder="vae", torch_dtype=torch.float16
).to("cuda")
pipe.vae = original_vae

prompt = "slice of delicious New York-style berry cheesecake"
image = pipe(
    prompt, num_inference_steps=25, height=512, width=512, guidance_scale=3.0,
    generator=torch.manual_seed(0)
).images[0]
image
```

[image omitted]
---

Closing the issue.
---

Yup, TAESD directly predicts values in [0, 1] so you don't need the additional denormalization step (though clamping is still recommended). The image here looks correct to me 👍
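In other words, decoding with TAESD reduces to something like this (a sketch, assuming `taesd_dec` is the TAESD decoder module from the repo's example notebook):

```python
import torch

@torch.no_grad()
def decode_latents_taesd(taesd_dec, latents: torch.Tensor) -> torch.Tensor:
    """The TAESD decoder already outputs [0, 1] images, so no
    (image / 2 + 0.5) denormalization is needed -- just clamp."""
    return taesd_dec(latents).clamp(0, 1)
```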