Add deepspeed, xformers, kernl, transformerengine, ColossalAI, tritonserver, VoltaML, etc #21

0xdevalias · 2022-11-24T07:44:36Z

I've been bouncing around various StableDiffusion optimisations the last couple of weeks, and figured I would link out to some of the ones I remember in hopes that they can be explored/added into the benchmarks/comparisons here:

https://github.com/VoltaML/voltaML-fast-stable-diffusion
- Lightweight library to accelerate Stable-Diffusion, Dreambooth into fastest inference models with single line of code
https://github.com/kamalkraj/stable-diffusion-tritonserver
- Deploy stable diffusion model with onnx + tritonserver
https://github.com/luohao123/gaintmodels
- Int8 StableFusion model
Stable Diffusion Meta AITemplate with >= 200% performance increase AUTOMATIC1111/stable-diffusion-webui#1625
- AITemplate
[Feature Request]: Explore NVIDIA/TransformerEngine for speed/efficiency AUTOMATIC1111/stable-diffusion-webui#4721
- https://github.com/NVIDIA/TransformerEngine
[Feature Request]: Explore potential StableDiffusion speed benefits from implementing kernl (Up to 12X faster GPU inference) AUTOMATIC1111/stable-diffusion-webui#4096
- https://github.com/ELS-RD/kernl
  - Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
- https://www.reddit.com/r/MachineLearning/comments/ydqmjp/p_up_to_12x_faster_gpu_inference_on_bert_t5_and/
  - Up to 12X faster GPU inference on Bert, T5 and other transformers with OpenAI Triton kernels
[Feature Request]: colossalai integration AUTOMATIC1111/stable-diffusion-webui#4606
- https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion?rgh-link-date=2022-11-11T18%3A20%3A17Z
- Other related issues: [Feature Request]: colossalai integration AUTOMATIC1111/stable-diffusion-webui#4606 (comment)
etc

glennko · 2022-11-27T23:43:08Z

Thanks for sharing this!

Upon reviewing the first repo on the list, voltaML-fast-stable-diffusion, we found out that they copied our code (they do reference us in their README).
Second repo should have similar #s of ONNX (CUDA) from our results. Triton inference server doesn't add much acceleration without TensorRT.
We are planning to benchmark Int8 on Intel CPUs with VNNI. Int8 on GPUs may have smaller footprint on GPUs but not faster than others listed on this repo.
AI Template is already on the repo.
To the best of my knowledge, Kernl doesn't have the Triton kernels for entirety of stable diffusion models.
Colossalai's example only accelerates training, this repo focuses on inference.

Our conclusion still holds for now and AITemplate is still the fastest. Please let us know if you have any other suggestions! We are looking for ways to improve this.

0xdevalias · 2022-11-30T20:35:02Z

Thanks for your detailed response :)

Colossalai's example only accelerates training, this repo focuses on inference.

Is that true? They definitely talk about inference here (though I didn't explore too deeply to see what optimisations are applied):

https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion#inference

A bit further down on the page they reference some of the optimisations they make use of:

https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion#comments
- The implementation of the transformer encoder is from x-transformers by lucidrains.
- The implementation of flash attention is from HazyResearch.

0xdevalias · 2022-11-30T21:03:50Z

Another one I stumbled upon:

Add INT8 Stable Diffusion through Optimum huggingface/diffusers#1324
- https://github.com/huggingface/optimum-intel
  - Accelerate inference of 🤗 Transformers with Intel optimization tools
  - https://github.com/huggingface/optimum-intel/tree/main/examples/neural_compressor/text-to-image

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add deepspeed, xformers, kernl, transformerengine, ColossalAI, tritonserver, VoltaML, etc #21

Add deepspeed, xformers, kernl, transformerengine, ColossalAI, tritonserver, VoltaML, etc #21

0xdevalias commented Nov 24, 2022 •

edited

glennko commented Nov 27, 2022

0xdevalias commented Nov 30, 2022 •

edited

0xdevalias commented Nov 30, 2022

Add deepspeed, xformers, kernl, transformerengine, ColossalAI, tritonserver, VoltaML, etc #21

Add deepspeed, xformers, kernl, transformerengine, ColossalAI, tritonserver, VoltaML, etc #21

Comments

0xdevalias commented Nov 24, 2022 • edited

glennko commented Nov 27, 2022

0xdevalias commented Nov 30, 2022 • edited

0xdevalias commented Nov 30, 2022

0xdevalias commented Nov 24, 2022 •

edited

0xdevalias commented Nov 30, 2022 •

edited