
Token Merging (ToMe) for Stable Diffusion #2940

Closed
takuma104 opened this issue Mar 31, 2023 · 8 comments
@takuma104
Contributor

takuma104 commented Mar 31, 2023

ToMe for SD speeds up diffusion by merging redundant tokens. by @dbolya


Token Merging (ToMe) speeds up transformers by merging redundant tokens, which means the transformer has to do less work. We apply this to the underlying transformer blocks in Stable Diffusion in a clever way that minimizes quality loss while keeping most of the speed-up and memory benefits. ToMe for SD doesn't require training and should work out of the box for any Stable Diffusion model.

Code: https://github.com/dbolya/tomesd
Paper: https://arxiv.org/abs/2303.17604

I ran a simple generation-speed benchmark on my end. I applied the patch to stable-diffusion-webui and took the best value from 4 runs via the API. For the baseline I used xFormers, since it is the common choice when speed matters; xFormers stays enabled when ToMe is applied. I used the recommended quality-preserving ratio of 0.5 for ToMe. I was initially curious why the paper focuses on high-resolution images, but it turns out the method is more effective at higher resolutions, so this could be a particularly useful feature for high-resolution use cases. (A minimal diffusers-based sketch of this kind of benchmark follows the table.)

| Resolution [px²] | Baseline [it/s] ↑ | ToMe ratio=0.5 [it/s] ↑ | Speedup [×] ↑ |
|---|---|---|---|
| 512 | 10.47 | 10.59 | 1.01 |
| 768 | 4.56 | 5.03 | 1.10 |
| 1024 | 2.34 | 2.85 | 1.22 |
| 1280 | 1.26 | 1.67 | 1.33 |
| 1536 | 0.74 | 1.06 | 1.44 |
| 1792 | 0.45 | 0.69 | 1.55 |
| 2048 | 0.28 | 0.47 | 1.65 |
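
As a reference, here is a minimal sketch of a comparable timing benchmark using diffusers and tomesd directly (not the exact webui/API setup described above; the model ID, prompt, step count, and resolution are illustrative):

```python
import time
import torch
import tomesd
from diffusers import StableDiffusionPipeline

def best_its(pipe, steps=50, runs=4, **kwargs):
    # Best it/s over several runs, mirroring the "best of 4 runs" methodology above.
    best = 0.0
    for _ in range(runs):
        torch.cuda.synchronize()
        start = time.perf_counter()
        pipe("a photo of an astronaut riding a horse on mars",
             num_inference_steps=steps, **kwargs)
        torch.cuda.synchronize()
        best = max(best, steps / (time.perf_counter() - start))
    return best

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()  # xFormers enabled for both runs

baseline = best_its(pipe, height=1024, width=1024)
tomesd.apply_patch(pipe, ratio=0.5)  # recommended quality/speed trade-off
tome = best_its(pipe, height=1024, width=1024)
print(f"baseline {baseline:.2f} it/s, ToMe {tome:.2f} it/s, speedup {tome / baseline:.2f}x")
```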
@sayakpaul
Member

sayakpaul commented Apr 4, 2023

Hi. This is actually on our radar. The Meta team might soon start working on this.

@patrickvonplaten
Contributor

This looks really cool!

@sayakpaul
Member

sayakpaul commented Apr 5, 2023

@dbolya has this: https://github.com/dbolya/tomesd#diffusers. This allows users to just do:

```python
import torch, tomesd
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

# Apply ToMe with a 50% merging ratio
tomesd.apply_patch(pipe, ratio=0.5) # Can also use pipe.unet in place of pipe here

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("astronaut.png")
```

However, it needs tomesd to be installed. I don't think we can have tomesd as a hard dependency. I think we have three choices:

  1. Copy-paste the code from tomesd with all the necessary credits and let users use ToMe natively from diffusers.
  2. Add tomesd as a soft dependency to diffusers for users who want to use ToMe in their pipelines.
  3. Add a section in https://huggingface.co/docs/diffusers/main/en/optimization/opt_overview about ToMe using tomesd, including a few benchmarks on the speedup. I think this is the best option.

@patrickvonplaten WDYT?

Cc: @dbolya

@dbolya

dbolya commented Apr 5, 2023

Option 3 seems the best to me. I would caution against just copy-pasting the code, since I do plan to update tomesd in the future, and having duplicate code would be a hassle. Other optimization packages also require separate installations already (e.g., xformers), so having the user optionally install tomesd seems reasonable to me.
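
For reference, a minimal sketch of how documentation (or user code) could treat tomesd as an optional install, similar to how xformers is handled; the `is_tomesd_available` helper here is hypothetical, not an actual diffusers utility:

```python
import importlib.util

def is_tomesd_available() -> bool:
    # Hypothetical helper: check whether the optional tomesd package is installed.
    return importlib.util.find_spec("tomesd") is not None

if is_tomesd_available():
    import tomesd
    tomesd.apply_patch(pipe, ratio=0.5)  # pipe: an already-constructed StableDiffusionPipeline
else:
    print("tomesd not installed; run `pip install tomesd` to enable Token Merging.")
```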

@patrickvonplaten
Contributor

Option 3.) is ok/good for me!

@sayakpaul sayakpaul self-assigned this Apr 6, 2023
@sayakpaul
Member

Alright.

I will start working on the doc soon.

@sayakpaul
Member

Closing this with #3208.

@bigmover


Is ToMe + ControlNet available for use? I have heard that ToMe may be less effective because ControlNet modifies SD's forward pass.
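
For what it's worth, a sketch of how one might try it, assuming tomesd's patch can be applied to the pipeline's UNet the same way as for the plain text-to-image pipeline (untested here; the model IDs and control image are illustrative):

```python
import torch
import tomesd
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Assumption: control.png is a pre-computed Canny edge map matching the prompt.
control_image = load_image("control.png")

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Patch only the UNet; the ControlNet model itself is left untouched.
tomesd.apply_patch(pipe.unet, ratio=0.5)

image = pipe("a photo of an astronaut riding a horse on mars", image=control_image).images[0]
image.save("astronaut_tome_controlnet.png")
```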
