
[Sana] Add Sana, including SanaPipeline, SanaPAGPipeline, LinearAttentionProcessor, Flow-based DPM-solver and so on. #9982

Open · wants to merge 144 commits into main

Conversation

lawrence-cj
Contributor

What does this PR do?

This PR adds the official Sana (SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer) to the diffusers library. Sana is the first model to make text-to-image generation work on a 32x-compressed latent space, powered by DC-AE (https://arxiv.org/abs/2410.10733v1), without performance degradation. Sana also incorporates several popular efficiency techniques, such as a DiT with a linear attention processor, and uses a decoder-only LLM (Gemma-2B-IT) as the text encoder for low GPU-memory requirements and fast inference.

Paper: https://arxiv.org/abs/2410.10629
Original code repo: https://github.com/NVlabs/Sana
Project: https://nvlabs.github.io/Sana
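
For readers new to linear attention, here is a minimal illustrative sketch of the kernelized linear-attention idea (with a ReLU feature map, computed in O(N)), written as a standalone PyTorch function. It is only a sketch for intuition and is not the LinearAttentionProcessor implementation added in this PR:

import torch

def relu_linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, heads, seq_len, head_dim)
    q, k = torch.relu(q), torch.relu(k)
    # aggregate key/value statistics once: (batch, heads, head_dim, head_dim)
    kv = torch.einsum("bhnd,bhne->bhde", k, v)
    # per-query normalizer over all keys: (batch, heads, seq_len)
    z = torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps
    # output in linear time, without forming the seq_len x seq_len attention matrix
    return torch.einsum("bhnd,bhde->bhne", q, kv) / z.unsqueeze(-1)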

Core contributor of DC-AE: this PR was developed together with @johnny_ez@163.com.


We want to collaborate on this PR together with friends from HF. Feel free to contact me here. Cc: @sayakpaul, @yiyixuxu


Images generated by SanaPAGPipeline with FlowDPMSolverMultistepScheduler:

[attached image: sample generations]
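
For reference, a rough sketch of how an image like this might be generated once the diffusers-format weights are hosted. The repo id is a placeholder, and the PAG-specific arguments (pag_applied_layers, pag_scale) are assumed to follow the conventions of the existing diffusers PAG pipelines:

import torch
from diffusers import SanaPAGPipeline, FlowDPMSolverMultistepScheduler

# placeholder path; the diffusers-format checkpoints are not hosted yet
pipe = SanaPAGPipeline.from_pretrained(
    "path/to/sana-1600m-1024px-diffusers",
    torch_dtype=torch.float32,
    pag_applied_layers=["transformer_blocks.8"],  # example layer, specified on the fly
)
pipe.scheduler = FlowDPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

image = pipe(
    prompt="a cyberpunk cat with a neon sign that says \"Sana\"",
    guidance_scale=5.0,
    pag_scale=2.0,
    num_inference_steps=20,
).images[0]
image.save("sana_pag_output.png")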

lawrence-cj and others added 14 commits December 9, 2024 18:33
Commit notes from this batch include: fix progress bar updates in the SD 1.5 PAG Img2Img pipeline (huggingface#9932); fix the bug in the new GLUMBConv and verify it runs successfully; download checkpoints from the Hub automatically.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Vinh H. Pham <phamvinh257@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
@lawrence-cj
Contributor Author

@a-r-r-o-w This branch has already been rebased onto the merged DC-AE branch, and I've made sure the functions still work as expected.

@bghira
Contributor

bghira commented Dec 11, 2024

Any update on the bfloat16-compatible model? 🙏

@a-r-r-o-w
Member

a-r-r-o-w commented Dec 11, 2024

@lawrence-cj This is looking very close to merge now! We need to address the following:

  • Hosting the diffusers-format checkpoints (will need your help). I still have some changes that I want to test; I will let you know tomorrow so that the transformer implementation is finalized.
  • Once hosted, we can update the usage examples in the docs.
  • Since the original weights are in FP32, I'm assuming that's the precision the model was trained in, so IMO we should host fp32 weights as the default. fp16/bf16 weights should be hosted as variants (the relevant code has been added to the conversion script; see the loading sketch after this list). Probably only the transformer and text encoder need variants, with the VAE kept in fp32.
  • Do we need the convert-to-PAG script? Layers can be specified on the fly when loading models, and that is the intended usage, so I think it can be removed.
  • Do we need the clean-caption code? For newer pipelines, I am pushing toward not including it, because prompt pre-processing should be done outside the pipelines. Either way is okay with me, but let me know your thoughts.
  • For the scheduler changes, I will hand off to @yiyixuxu and @hlky for review.
  • Integration tests (I will take it up after diffusers weights are hosted)
  • LoRA loading support (future separate PR, I will take it up)
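
Regarding the variants point above, a minimal loading sketch under the proposed layout; the repo id is hypothetical, and variant/torch_dtype are the standard from_pretrained options:

import torch
from diffusers import SanaPipeline

# hypothetical repo id illustrating the fp32-default / bf16-variant layout
pipe = SanaPipeline.from_pretrained(
    "org/sana-1600m-1024px-diffusers",
    variant="bf16",
    torch_dtype=torch.bfloat16,
)
# under this proposal the VAE would stay in fp32, so cast it back explicitly
pipe.vae.to(torch.float32)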

As a note to self, conversion:

python3 scripts/convert_sana_to_diffusers.py --orig_ckpt_path /raid/aryan/sana-1600m-1024px-original/checkpoints/Sana_1600M_1024px.pth --image_size 1024 --model_type SanaMS_1600M_P1_D20 --scheduler_type flow-dpm_solver --dump_path /raid/aryan/sana-1600m-1024px-diffusers --dtype fp32 --save_full_pipeline

inference:

import torch
from diffusers import SanaPipeline

# load the converted diffusers-format checkpoint in fp32
pipe = SanaPipeline.from_pretrained("/raid/aryan/sana-1600m-1024px-diffusers", torch_dtype=torch.float32)
pipe.to("cuda")

# cast components individually: bf16 text encoder, fp16 transformer, fp32 VAE
pipe.text_encoder.to(torch.bfloat16)
pipe.transformer = pipe.transformer.to(torch.float16)
pipe.vae.to(torch.float32)

image = pipe(
    prompt="a cyberpunk cat with a neon sign that says \"Sana\"",
    guidance_scale=5.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]
image[0].save('output.png')

@bghira
Contributor

bghira commented Dec 11, 2024

As far as I've seen, the bf16 weights released so far don't quite work with the current PR yet, so I think there is more adjusting to be done.
