Torch JIT breaks when using memory_efficient_attention #406

Open
Dango233 opened this issue Sep 20, 2022 · 13 comments
Labels
bug Something isn't working

Comments

@Dango233 commented Sep 20, 2022

🐛 Bug

torch.jit.trace breaks with the following error:
RuntimeError: unsupported output type: int, from operator: xformers::efficient_attention_forward_generic
The output of the ops contains an int that can't be traced by JIT.


To Reproduce

torch.jit.trace the module mentioned in
huggingface/diffusers#532
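
For reference, here is a minimal standalone sketch that hits the same failure. The module, tensor shapes, and dtypes are illustrative assumptions, not the exact diffusers UNet:

import torch
import xformers.ops as xops

class Attn(torch.nn.Module):
    # Toy module that only calls the xformers memory-efficient attention op.
    def forward(self, q, k, v):
        return xops.memory_efficient_attention(q, k, v)

# Shapes follow the xformers convention (batch, seq_len, num_heads, head_dim).
q = k = v = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.half)
# Fails with: RuntimeError: unsupported output type: int,
# from operator: xformers::efficient_attention_forward_generic
traced = torch.jit.trace(Attn().cuda(), (q, k, v))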

Expected behavior

No int outputs, so the module can be JIT traced.

@danthe3rd (Contributor)

Thanks for reporting :) Should be fixed in #438

@geekinglcq

> Thanks for reporting :) Should be fixed in #438

Hello, has it been fixed now?

@danthe3rd (Contributor)

Hi, the PR was merged, so it should be, yes. Please let us know if you have other issues.

@geekinglcq commented Oct 11, 2022

Thank you. I have tried the newest commit of xformers, and the RuntimeError: unsupported output type: int, from operator: xformers::efficient_attention_forward_generic is resolved.
However, another problem appears when I run the following code:

inputs = (
    torch.randn(2, 4, 64, 64, dtype=torch.half, device='cuda:6'),  # latent sample
    torch.randn(1, dtype=torch.half, device='cuda:6'),              # timestep
    torch.randn(2, 77, 768, dtype=torch.half, device='cuda:6'),     # text encoder hidden states
)
# Here pipeline is a `diffusers.LDMTextToImagePipeline`
with torch.no_grad():
    with torch.autocast("cuda"):
        jit_unet = torch.jit.trace(pipeline.unet, inputs, strict=False)

[screenshot of the error raised by the trace call]
But if I run the code above twice, the error disappears by itself 😂 and the pipeline works fine in the later steps.

@danthe3rd added the bug label on Nov 15, 2022
@gigadeplex

I'm getting this error too.
return self._op(*args, **kwargs or {})
RuntimeError: unsupported output type: int, from operator: xformers::efficient_attention_forward_cutlass

@roninjiang

got this error too.

@xinlin-xiao

Got the same errors when I use torch.jit.trace, any update?

@ShijunK commented Mar 29, 2024

I think the original fix (#438) did work, but the issue was re-introduced later in #587

Question to @danthe3rd: what's the purpose of the two int output values rng_seed and rng_offset? Is it possible to re-apply the fix from #438?

@danthe3rd (Contributor)

Oh this is a regression - right.
The purpose of rng_seed, rng_offset is to keep the RNG state for the backward pass. This is useful when there is a dropout in the FW pass, and we need to mask the exact same values in the BW pass (and we don't want to save a "dropout" mask that would be too expensive).
There are also complications due to replaying CUDA Graphs (in which case we want the RNG to be different).
I believe we should be able to store these values in a torch.Tensor, or maybe there is a best practice for this sort of issue? @drisspg or @fmassa maybe?
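
A minimal sketch of that idea (purely illustrative; the helper names and packing scheme are assumptions, not the actual xformers/ATen code): carry the seed/offset in an int64 tensor so the traced graph only ever sees Tensor outputs, and unpack them as plain ints in the backward pass, where tracing no longer matters.

import torch

def pack_rng_state(seed: int, offset: int) -> torch.Tensor:
    # Store both RNG values in a single CPU int64 tensor instead of returning raw ints.
    return torch.tensor([seed, offset], dtype=torch.int64)

def unpack_rng_state(state: torch.Tensor):
    # Recover plain Python ints in the backward pass to re-seed the Philox RNG.
    seed, offset = state.tolist()
    return int(seed), int(offset)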

@danthe3rd (Contributor)

Does JIT support SymInt? Because the version in PT outputs SymInt, not exactly sure why.
Anyway, we want to rely on the PyTorch version moving forward (with the C++ code moving to the PyTorch repo), so hopefully this can be fixed at the same time.
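
One way to check (a sketch assuming a recent PyTorch build with CUDA; not verified on every version) is to trace a module that goes through the PyTorch-native SDPA entry point:

import torch
import torch.nn.functional as F

class SDPA(torch.nn.Module):
    # Thin wrapper around PyTorch's scaled dot product attention.
    def forward(self, q, k, v):
        return F.scaled_dot_product_attention(q, k, v)

# (batch, num_heads, seq_len, head_dim)
q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.half)
traced = torch.jit.trace(SDPA(), (q, k, v))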

@ShijunK commented Apr 16, 2024

@danthe3rd, which version of torch are you referring to? For torch 2.2.0, I see the type is Tensor for both seed and offset:

func: _scaled_dot_product_efficient_attention(Tensor query, Tensor key, Tensor value, Tensor? attn_bias, bool compute_log_sumexp, float dropout_p=0.0, bool is_causal=False, *, float? scale=None) -> (Tensor output, Tensor log_sumexp, Tensor philox_seed, Tensor philox_offset)
https://github.com/pytorch/pytorch/blob/4cb7dd0fc99981062cebf8d5a94e62b48bf78446/aten/src/ATen/native/native_functions.yaml#L14484-L14488

and here is where they are initialized: https://github.com/pytorch/pytorch/blob/d47f715d29d05e28b94c280f15dce097ef3dc7cb/aten/src/ATen/native/transformers/cuda/attention.cu#L978-L982
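
A quick way to confirm that locally (a sketch assuming torch >= 2.2 with CUDA; the op name and argument order are taken from the native_functions.yaml entry above):

import torch

# (batch, num_heads, seq_len, head_dim)
q = k = v = torch.randn(2, 8, 77, 64, device="cuda", dtype=torch.half)
# attn_bias=None, compute_log_sumexp=False; dropout_p, is_causal and scale keep their defaults.
out, lse, philox_seed, philox_offset = torch.ops.aten._scaled_dot_product_efficient_attention(
    q, k, v, None, False
)
print(type(philox_seed), type(philox_offset))  # both torch.Tensor, not Python int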

@ShijunK commented Apr 16, 2024

> Anyway we want to rely on the PyTorch version moving forward (with the C++ code moving to PyTorch repo)

@danthe3rd are you referring to at::_scaled_dot_product_efficient_attention ?
