
Memory-efficient attention does not gain speedups on A10 and V100 #762

Open
lucasjinreal opened this issue Jun 7, 2023 · 5 comments

Comments

@lucasjinreal

I'm using diffusers with enable_xformers_memory_efficient_attention enabled, but the speed didn't change at all. Why?

@yjhong89

yjhong89 commented Jun 8, 2023

Using xformers on V100 doesn't gain any speedup for me either.
Time per iteration actually increases when using xformers on V100.

  • Above: using xformers; below: vanilla cross attention, when training Stable Diffusion
    [screenshot: per-iteration timing comparison]

@danthe3rd
Contributor

Hi,
What version of xFormers are you using? Is xFormers using less GPU memory?
It might be because "vanilla" diffusers now uses the xFormers kernels that were integrated into PyTorch, but I'm not sure about this. It might be better to open an issue in diffusers (please tag me if you do).
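A quick way to check whether PyTorch's built-in fused attention is available is to call `scaled_dot_product_attention` directly (added in PyTorch 2.0, which is what diffusers dispatches to when xFormers is not enabled). A minimal sketch, runnable even on CPU:

```python
import torch
import torch.nn.functional as F

# Since PyTorch 2.0, F.scaled_dot_product_attention dispatches to fused
# backends (flash / memory-efficient / math) depending on device and dtype.
q = torch.randn(2, 8, 128, 64)  # (batch, heads, seq_len, head_dim)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```

If diffusers is already routing attention through this function, enabling xFormers on top of it may make little difference.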

@yjhong89

yjhong89 commented Jun 8, 2023

  • My xFormers version is 0.0.20.
  • I checked that xFormers uses less GPU memory than the vanilla version.
  • But the elapsed time per iteration increased slightly compared to vanilla attention.
  • Do I have to use fp16? (My current setting uses fp32.)

@danthe3rd
Contributor

  • Do I have to use fp16? (My current setting uses fp32.)

Oh yes - good catch! We have kernels for f32 but they are not really efficient. You should use f16 or bf16 if possible to get the best speed. In fact, it's very likely that xFormers induces a slow-down when training in f32.
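To illustrate the dtype point: fp16 halves the memory per element compared to fp32, and the fast xFormers/flash attention kernels are written for f16/bf16. A minimal sketch (the tensor shapes are arbitrary examples):

```python
import torch

# fp16 uses 2 bytes per element vs 4 for fp32; the fused attention kernels
# are optimized for f16/bf16, which is why f32 sees little or no speedup.
x32 = torch.randn(1024, 1024, dtype=torch.float32)
x16 = x32.half()

print(x32.element_size())  # 4 bytes per element
print(x16.element_size())  # 2 bytes per element
```

For training, mixed precision via `torch.autocast("cuda", dtype=torch.float16)` (or `torch.bfloat16`) is the usual way to get f16 attention kernels without converting the whole model.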

@yjhong89

yjhong89 commented Jun 8, 2023

Okay, thank you! I'll try using fp16.
