
Add deepspeed, xformers, kernl, transformerengine, ColossalAI, tritonserver, VoltaML, etc #21

0xdevalias opened this issue Nov 24, 2022 · 3 comments


0xdevalias commented Nov 24, 2022

I've been bouncing around various StableDiffusion optimisations over the last couple of weeks, and figured I would link out to some of the ones I remember in the hope that they can be explored/added to the benchmarks/comparisons here:

glennko (Member) commented Nov 27, 2022

Thanks for sharing this!

  • Upon reviewing the first repo on the list, voltaML-fast-stable-diffusion, we found that it copies our code (it does reference us in its README).
  • The second repo should show numbers similar to the ONNX (CUDA) results we already have. Triton Inference Server doesn't add much acceleration on its own without TensorRT (a rough ONNX Runtime sketch follows this list).
  • We are planning to benchmark Int8 on Intel CPUs with VNNI (see the quantization sketch at the end of this comment). Int8 on GPUs may have a smaller memory footprint, but it is not faster than the other options listed in this repo.
  • AITemplate is already in the repo.
  • To the best of my knowledge, Kernl doesn't have Triton kernels for the entirety of the Stable Diffusion model.
  • ColossalAI's example only accelerates training, while this repo focuses on inference.
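
For reference on the ONNX (CUDA) point, here's a rough sketch of running an exported Stable Diffusion UNet through ONNX Runtime's CUDA execution provider. The file name and input names are assumptions, not anything from a particular repo; they match a typical diffusers SD v1 export:

```python
import numpy as np
import onnxruntime as ort

# "unet.onnx" and the input names below are assumptions -- they depend on
# how the UNet was exported.
sess = ort.InferenceSession(
    "unet.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Dummy inputs: batch 1, 4-channel 64x64 latents, 77-token CLIP embeddings.
sample = np.random.randn(1, 4, 64, 64).astype(np.float32)
timestep = np.array([10], dtype=np.int64)  # dtype may be float depending on the export
encoder_hidden_states = np.random.randn(1, 77, 768).astype(np.float32)

noise_pred = sess.run(
    None,
    {
        "sample": sample,
        "timestep": timestep,
        "encoder_hidden_states": encoder_hidden_states,
    },
)[0]
print(noise_pred.shape)  # predicted noise residual, same shape as `sample`
```

Serving this session behind Triton Inference Server mostly changes the serving layer, not the kernels; most additional acceleration would come from swapping in ORT's TensorrtExecutionProvider (or a standalone TensorRT engine), which is the point above.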

Our conclusion still holds for now, and AITemplate is still the fastest. Please let us know if you have any other suggestions! We are looking for ways to improve this.
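
For the Int8/VNNI benchmark mentioned above, a minimal starting point would be ONNX Runtime's dynamic quantization. The file names here are placeholders, and on Intel CPUs with VNNI the quantized kernels should pick up the AVX512-VNNI instructions automatically at runtime:

```python
# Minimal sketch: Int8 dynamic quantization with ONNX Runtime.
# "model_fp32.onnx" / "model_int8.onnx" are placeholder file names.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="model_fp32.onnx",   # exported FP32 model
    model_output="model_int8.onnx",  # quantized output
    weight_type=QuantType.QInt8,     # signed 8-bit weights
)
```

For a conv-heavy model like the SD UNet, static quantization with a calibration set would likely be the fairer benchmark, since dynamic quantization mainly targets MatMul/Gemm-style ops.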


0xdevalias commented Nov 30, 2022

Thanks for your detailed response :)

ColossalAI's example only accelerates training, while this repo focuses on inference.

Is that true? They definitely talk about inference here (though I didn't explore too deeply to see what optimisations are applied):

A bit further down on the page they reference some of the optimisations they make use of:
