
Token Merging (ToMe) for Stable Diffusion #2940

Closed
@takuma104

Description

ToMe for SD speeds up diffusion by merging redundant tokens, by @dbolya.

Token Merging (ToMe) speeds up transformers by merging redundant tokens, which means the transformer has to do less work. We apply this to the underlying transformer blocks in Stable Diffusion in a clever way that minimizes quality loss while keeping most of the speed-up and memory benefits. ToMe for SD doesn't require training and should work out of the box for any Stable Diffusion model.

Code: https://github.com/dbolya/tomesd
Paper: https://arxiv.org/abs/2303.17604
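
For reference, applying ToMe to a diffusers pipeline via the `tomesd` package looks roughly like the sketch below, based on the tomesd README; the model id, ratio, and prompt are only illustrative.

```python
# Minimal sketch: patch a diffusers Stable Diffusion pipeline with ToMe.
# Assumes `pip install tomesd diffusers`; model id and parameters are illustrative.
import torch
import tomesd
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Merge ~50% of the tokens in the UNet's transformer blocks
# (the recommended quality/speed trade-off).
tomesd.apply_patch(pipe, ratio=0.5)

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut_tome.png")
```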

I ran a simple generation-speed benchmark on my end. I patched stable-diffusion-webui and took the best of 4 runs via the API. The baseline uses xFormers, since it is the common choice when speed matters, and xFormers stays enabled when ToMe is applied. I used the recommended quality ratio of 0.5 for ToMe. I had wondered why the paper focuses on high-resolution images; as the numbers show, the method becomes more effective as resolution increases, so this could be a useful feature for high-resolution use cases. A rough timing sketch follows the table.

| Resolution [px²] | Baseline [it/s] ↑ | ToMe ratio=0.5 [it/s] ↑ | Speedup [×] ↑ |
|---:|---:|---:|---:|
| 512 | 10.47 | 10.59 | 1.01 |
| 768 | 4.56 | 5.03 | 1.10 |
| 1024 | 2.34 | 2.85 | 1.22 |
| 1280 | 1.26 | 1.67 | 1.33 |
| 1536 | 0.74 | 1.06 | 1.44 |
| 1792 | 0.45 | 0.69 | 1.55 |
| 2048 | 0.28 | 0.47 | 1.65 |
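
The table above was measured through the stable-diffusion-webui API, but a comparable measurement with diffusers might look like the sketch below. Everything in it (model id, prompt, step count, resolutions) is illustrative, and it derives it/s by timing the full denoising call rather than reading the sampler's own counter.

```python
# Rough timing sketch: baseline vs. ToMe (ratio=0.5) across resolutions.
# Illustrative only; not the script used for the table above.
import time
import torch
import tomesd
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# xFormers is enabled in both cases, matching the benchmark setup above.
pipe.enable_xformers_memory_efficient_attention()

def iters_per_second(pipe, size, steps=20):
    """Time one generation at size x size and return denoising steps per second."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    pipe("a scenic landscape", height=size, width=size, num_inference_steps=steps)
    torch.cuda.synchronize()
    return steps / (time.perf_counter() - start)

for size in (512, 768, 1024):
    baseline = iters_per_second(pipe, size)
    tomesd.apply_patch(pipe, ratio=0.5)   # enable token merging
    patched = iters_per_second(pipe, size)
    tomesd.remove_patch(pipe)             # restore the unpatched pipeline
    print(f"{size}px: baseline {baseline:.2f} it/s, ToMe {patched:.2f} it/s")
```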
