
[Sana] Add Sana, including SanaPipeline, SanaPAGPipeline, LinearAttentionProcessor, Flow-based DPM-solver and so on. #9982

Open · wants to merge 144 commits into main

Conversation

lawrence-cj
Contributor

What does this PR do?

This PR adds the official Sana (SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer) to the diffusers library. Sana is the first model to make text-to-image generation work on a 32x-compressed latent space, powered by DC-AE (https://arxiv.org/abs/2410.10733v1), without performance degradation. Sana also incorporates several popular efficiency techniques, such as a DiT with a linear attention processor, and uses a decoder-only LLM (Gemma-2B-IT) as the text encoder for low GPU-memory requirements and fast inference.

Paper: https://arxiv.org/abs/2410.10629
Original code repo: https://github.com/NVlabs/Sana
Project: https://nvlabs.github.io/Sana
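
For readers new to linear attention, here is a minimal illustrative sketch of the kernelized linear-attention idea (with a ReLU feature map, computed in O(N)), written as a standalone PyTorch function. It is only a sketch for intuition and is not the LinearAttentionProcessor implementation added in this PR:

import torch

def relu_linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, heads, seq_len, head_dim)
    q, k = torch.relu(q), torch.relu(k)
    # aggregate key/value statistics once: (batch, heads, head_dim, head_dim)
    kv = torch.einsum("bhnd,bhne->bhde", k, v)
    # per-query normalizer over all keys: (batch, heads, seq_len)
    z = torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps
    # output in linear time, without forming the seq_len x seq_len attention matrix
    return torch.einsum("bhnd,bhde->bhne", q, kv) / z.unsqueeze(-1)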

Core contributor of DC-AE: this PR was developed together with @johnny_ez@163.com.


We want to collaborate on this PR together with friends from HF. Feel free to contact me here. Cc: @sayakpaul, @yiyixuxu


Images generated by SanaPAGPipeline with FlowDPMSolverMultistepScheduler:

[attached image: sample generations]
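
For reference, a rough sketch of how an image like this might be generated once the diffusers-format weights are hosted. The repo id is a placeholder, and the PAG-specific arguments (pag_applied_layers, pag_scale) are assumed to follow the conventions of the existing diffusers PAG pipelines:

import torch
from diffusers import SanaPAGPipeline, FlowDPMSolverMultistepScheduler

# placeholder path; the diffusers-format checkpoints are not hosted yet
pipe = SanaPAGPipeline.from_pretrained(
    "path/to/sana-1600m-1024px-diffusers",
    torch_dtype=torch.float32,
    pag_applied_layers=["transformer_blocks.8"],  # example layer, specified on the fly
)
pipe.scheduler = FlowDPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

image = pipe(
    prompt="a cyberpunk cat with a neon sign that says \"Sana\"",
    guidance_scale=5.0,
    pag_scale=2.0,
    num_inference_steps=20,
).images[0]
image.save("sana_pag_output.png")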

lawrence-cj and others added 14 commits December 9, 2024 18:33
Commit notes from this batch include: fix progress bar updates in the SD 1.5 PAG Img2Img pipeline (huggingface#9932); fix the bug in the new GLUMBConv and verify it runs successfully; download checkpoints from the Hub automatically.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Vinh H. Pham <phamvinh257@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
@lawrence-cj
Contributor Author

@a-r-r-o-w This branch has already been rebased onto the merged DC-AE branch, and I've made sure the functions still work as expected.

@bghira
Contributor

bghira commented Dec 11, 2024

Any update on the bfloat16-compatible model? 🙏

@a-r-r-o-w
Member

a-r-r-o-w commented Dec 11, 2024

@lawrence-cj This is looking very close to merge now! We need to address the following:

  • Hosting the diffusers-format checkpoints (will need your help). I still have some changes that I want to test; I will let you know tomorrow so that the transformer implementation is finalized.
  • Once hosted, we can update the usage examples in the docs.
  • Since the original weights are in FP32, I'm assuming that's the precision the model was trained in, so IMO we should host fp32 weights as the default. fp16/bf16 weights should be hosted as variants (the relevant code has been added to the conversion script; see the loading sketch after this list). Probably only the transformer and text encoder need variants, with the VAE kept in fp32.
  • Do we need the convert-to-PAG script? Layers can be specified on the fly when loading models, and that is the intended usage, so I think it can be removed.
  • Do we need the clean-caption code? For newer pipelines, I am pushing toward not including it, because prompt pre-processing should be done outside the pipelines. Either way is okay with me, but let me know your thoughts.
  • For the scheduler changes, I will hand off to @yiyixuxu and @hlky for review.
  • Integration tests (I will take it up after diffusers weights are hosted)
  • LoRA loading support (future separate PR, I will take it up)
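
Regarding the variants point above, a minimal loading sketch under the proposed layout; the repo id is hypothetical, and variant/torch_dtype are the standard from_pretrained options:

import torch
from diffusers import SanaPipeline

# hypothetical repo id illustrating the fp32-default / bf16-variant layout
pipe = SanaPipeline.from_pretrained(
    "org/sana-1600m-1024px-diffusers",
    variant="bf16",
    torch_dtype=torch.bfloat16,
)
# under this proposal the VAE would stay in fp32, so cast it back explicitly
pipe.vae.to(torch.float32)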

As a note to self, conversion:

python3 scripts/convert_sana_to_diffusers.py --orig_ckpt_path /raid/aryan/sana-1600m-1024px-original/checkpoints/Sana_1600M_1024px.pth --image_size 1024 --model_type SanaMS_1600M_P1_D20 --scheduler_type flow-dpm_solver --dump_path /raid/aryan/sana-1600m-1024px-diffusers --dtype fp32 --save_full_pipeline

inference:

import torch
from diffusers import SanaPipeline

# load the converted diffusers-format checkpoint in fp32
pipe = SanaPipeline.from_pretrained("/raid/aryan/sana-1600m-1024px-diffusers", torch_dtype=torch.float32)
pipe.to("cuda")

# cast components individually: bf16 text encoder, fp16 transformer, fp32 VAE
pipe.text_encoder.to(torch.bfloat16)
pipe.transformer = pipe.transformer.to(torch.float16)
pipe.vae.to(torch.float32)

image = pipe(
    prompt="a cyberpunk cat with a neon sign that says \"Sana\"",
    guidance_scale=5.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]
image[0].save('output.png')

@bghira
Contributor

bghira commented Dec 11, 2024

As far as I've seen, the bf16 weights released so far don't quite work with the current PR yet, so I think there is more adjusting to be done.
