[Sana] Add Sana, including SanaPipeline, SanaPAGPipeline, LinearAttentionProcessor, Flow-based DPM-solver and so on.
#9982
# Conflicts: # src/diffusers/models/normalization.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
…uggingface#9932) * fix progress bar updates in SD 1.5 PAG Img2Img pipeline --------- Co-authored-by: Vinh H. Pham <phamvinh257@gmail.com> Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>
2. fix the bug of new GLUMBConv; 3. run success;
2. Downloading ckpt from hub automatically;
@a-r-r-o-w This branch has been rebased onto the merged DCAE branch, and I have checked that everything still works normally.
2. code update;
Any update on the bfloat16-compatible model? 🙏
@lawrence-cj This is looking very close to merge now! We need to address the following:
As a note to self, conversion:
python3 scripts/convert_sana_to_diffusers.py --orig_ckpt_path /raid/aryan/sana-1600m-1024px-original/checkpoints/Sana_1600M_1024px.pth --image_size 1024 --model_type SanaMS_1600M_P1_D20 --scheduler_type flow-dpm_solver --dump_path /raid/aryan/sana-1600m-1024px-diffusers --dtype fp32 --save_full_pipeline
inference:
import torch
from diffusers import SanaPipeline
pipe = SanaPipeline.from_pretrained("/raid/aryan/sana-1600m-1024px-diffusers", torch_dtype=torch.float32)
pipe.to("cuda")
pipe.text_encoder.to(torch.bfloat16)
pipe.transformer = pipe.transformer.to(torch.float16)
pipe.vae.to(torch.float32)
image = pipe(
    prompt="a cyberpunk cat with a neon sign that says \"Sana\"",
    guidance_scale=5.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]
image[0].save('output.png')
As far as I've seen, the bf16 weights released so far don't quite work yet with the current PR, so I think more adjusting is needed.
What does this PR do?
This PR adds the official Sana (SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer) to the diffusers library. Sana is the first to make text-to-image generation work in a 32x compressed latent space, powered by DC-AE (https://arxiv.org/abs/2410.10733v1), without performance degradation. Sana also incorporates several popular efficiency techniques, such as a DiT with a linear attention processor, and uses a decoder-only LLM (Gemma-2B-IT) for low GPU requirements and fast inference.
Paper: https://arxiv.org/abs/2410.10629
Original code repo: https://github.com/NVlabs/Sana
Project: https://nvlabs.github.io/Sana
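To illustrate why the linear attention mentioned in the description is cheaper than standard attention, here is a minimal sketch of ReLU-kernel linear attention in plain Python. It is not the PR's LinearAttentionProcessor: the function names, the ReLU feature map, and the `eps` stabilizer are assumptions made for illustration, and heads, batching, and mixed precision are omitted. The point is only that computing `phi(K)^T V` first avoids building the n x n score matrix while giving the same result.

```python
# Illustrative sketch only; all names here are hypothetical, not diffusers APIs.

def _relu_rows(m):
    """Apply the ReLU feature map elementwise to a matrix (list of rows)."""
    return [[max(x, 0.0) for x in row] for row in m]


def _matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def _transpose(m):
    return [list(col) for col in zip(*m)]


def relu_linear_attention(q, k, v, eps=1e-6):
    """Linear-order attention: never materializes the (n x n) score matrix."""
    phi_q, phi_k = _relu_rows(q), _relu_rows(k)
    kv = _matmul(_transpose(phi_k), v)          # (d x d_v), shared by all queries
    k_sum = [sum(col) for col in zip(*phi_k)]   # (d,), used for normalization
    numer = _matmul(phi_q, kv)                  # (n x d_v)
    out = []
    for q_row, n_row in zip(phi_q, numer):
        denom = sum(a * b for a, b in zip(q_row, k_sum)) + eps
        out.append([x / denom for x in n_row])
    return out


def relu_quadratic_attention(q, k, v, eps=1e-6):
    """Reference version: same math, but builds the full (n x n) score matrix."""
    phi_q, phi_k = _relu_rows(q), _relu_rows(k)
    scores = _matmul(phi_q, _transpose(phi_k))  # (n x n)
    out = []
    for s_row in scores:
        denom = sum(s_row) + eps
        out.append([
            sum(s * v_row[c] for s, v_row in zip(s_row, v)) / denom
            for c in range(len(v[0]))
        ])
    return out
```

Both functions compute the same normalized output; the first reorders the matrix products so that cost grows linearly with the number of tokens n rather than quadratically, which matters at high resolutions where n is large.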
Core contributor of DC-AE:
work with @johnny_ez@163.com
We want to collaborate on this PR together with friends from HF. Feel free to contact me here. Cc: @sayakpaul, @yiyixuxu
Images are generated by SanaPAGPipeline with FlowDPMSolverMultistepScheduler.