test_dummy_mha_with_nt_cuda fails on sm70, sm75 #129523
Labels
- module: cuda (Related to torch.cuda, and CUDA support in general)
- module: multi-headed-attention
- module: nestedtensor (NestedTensor tag, see issue #25032)
- triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
🐛 Describe the bug
Looks like it's dispatching to the efficient attention backward kernel and failing one of its shape checks (the `max_seqlen_k <= k.size(1)` check mentioned below).

Failing call:
Printing `k.sizes()` here shows `[1, 6, 2, 3]` when `max_seqlen_k` is `10`. This doesn't seem to happen on sm80+, presumably because those archs can dispatch to FlashAttention (FA) instead?
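For context, a quick way to confirm which arch a given runner is on and which SDPA backends are globally enabled is something like the snippet below (illustrative only; these flags are process-wide toggles and don't account for per-arch support like FA requiring sm80+):

```python
import torch

# Report the compute capability (e.g. sm75 vs sm80) and the globally
# enabled SDPA backends for this process.
major, minor = torch.cuda.get_device_capability()
print(f"running on sm{major}{minor}")
print("flash SDP enabled:", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient SDP enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math SDP enabled:", torch.backends.cuda.math_sdp_enabled())
```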
Interestingly, forcing the backend on sm80+ with a decorator so the test runs efficient attention only gives:
Simply removing the `max_seqlen_k <= k.size(1)` shape check allows the test to pass, but I'm not sure that's correct. Is there some special inductor/symbolic-tracing accounting for shapes that needs to be done here? CC @drisspg
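For reference, a minimal sketch of the kind of repro I'd expect to hit this path (not the actual test body; the sequence lengths, head count, and head dim below are made up): force the efficient-attention backend and run SDPA backward on a jagged NestedTensor, so the same kernel is exercised regardless of arch.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

device = "cuda"
# Made-up shapes: two sequences of different lengths, 6 heads, head_dim 8.
seqs = [torch.randn(5, 6, 8, device=device), torch.randn(10, 6, 8, device=device)]
nt = torch.nested.nested_tensor(seqs, layout=torch.jagged, requires_grad=True)
# SDPA expects (batch, heads, seq, head_dim), so move the ragged seq dim behind heads.
q = k = v = nt.transpose(1, 2)

# Force the memory-efficient backend so sm80+ takes the same path as sm70/sm75.
with sdpa_kernel(SDPBackend.EFFICIENT_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
    # The backward pass is where the max_seqlen_k shape check fires on sm70/sm75.
    out.values().sum().backward()
```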
Versions
Current 2024/06/25 source build
cc @ptrblck @msaroufim @cpuhrsch @jbschlosser @bhosmer @drisspg @soulitzer