Add MPS SDPA workarounds for value head dim and bidirectional attention #44591

Closed
moktamd wants to merge 2 commits into huggingface:main from moktamd:fix/mps-sdpa-workarounds

Conversation

moktamd commented Mar 11, 2026

Adds _apply_mps_fixes in sdpa_attention.py to handle two upstream PyTorch MPS bugs:

  1. pytorch/pytorch#176767, "MPS: scaled_dot_product_attention returns wrong output shape when value dim != query/key dim" (fixed in PyTorch 2.12): pads the value tensor when v_head_dim != q_head_dim to avoid corrupted output. Affects DeepSeek models with MLA.

  2. pytorch/pytorch#174861, "[MPS] Out of bounds memory access/corruption and correctness issue in SDPA" (fixed in PyTorch 2.11): forces a non-bool attention mask for non-causal, non-float32 attention so the call routes through sdpa_general_mps instead of the broken sdpa_vector_2pass_mps.
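
As a rough illustration of the first workaround, the sketch below zero-pads the value tensor up to the query head dim, runs SDPA, and slices the result back down. It is not the code from this PR: `sdpa_with_padded_value` is a hypothetical helper, the 192/128 head dims are illustrative, and it only handles the case where the value head dim is smaller than the query head dim. It runs on CPU, so it demonstrates only that padding leaves the result unchanged, not the MPS bug itself.

```python
import math

import torch
import torch.nn.functional as F


def sdpa_with_padded_value(query, key, value):
    """Hypothetical helper: pad `value` to the query head dim, run SDPA, trim the output."""
    q_head_dim = query.shape[-1]
    v_head_dim = value.shape[-1]
    if v_head_dim < q_head_dim:
        # Pad only the last (head) dimension, on the right, with zeros.
        value = F.pad(value, (0, q_head_dim - v_head_dim))
    out = F.scaled_dot_product_attention(query, key, value)
    # The output columns produced by the zero padding are all zeros; drop them.
    return out[..., :v_head_dim]


# Shapes are (batch, heads, seq_len, head_dim); the dims are illustrative.
torch.manual_seed(0)
q = torch.randn(1, 2, 5, 192)
k = torch.randn(1, 2, 5, 192)
v = torch.randn(1, 2, 5, 128)

out = sdpa_with_padded_value(q, k, v)

# Reference result computed with explicit matmuls instead of the fused kernel.
weights = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1]), dim=-1)
reference = weights @ v
```

Padding is exact here because the attention weights depend only on the query and key, so extra zero columns in the value tensor produce extra zero columns in the output and nothing else.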

Both fixes are version-gated and will no-op once the upstream PyTorch fixes are available.
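
A minimal sketch of what such version gating could look like, using only the standard library. The fix versions (2.12 and 2.11) and the trigger conditions come from the description above. The function names, the string-based dtype argument, and the workaround labels are assumptions made for illustration, not the PR's actual implementation.

```python
import re

# Releases that ship the upstream fixes, as stated in the PR description.
VALUE_DIM_FIX = (2, 12)      # pytorch/pytorch#176767
BIDIRECTIONAL_FIX = (2, 11)  # pytorch/pytorch#174861


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '2.10.0' or '2.11.0+cpu'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def needed_workarounds(torch_version, device_type, q_head_dim, v_head_dim,
                       is_causal, dtype_name):
    """Hypothetical helper: list the workarounds that should be active for one SDPA call."""
    if device_type != "mps":
        return []
    version = parse_major_minor(torch_version)
    needed = []
    # Workaround 1: value head dim differs from the query head dim.
    if version < VALUE_DIM_FIX and v_head_dim != q_head_dim:
        needed.append("pad_value")
    # Workaround 2: non-causal attention with a non-float32 dtype.
    if version < BIDIRECTIONAL_FIX and not is_causal and dtype_name != "float32":
        needed.append("non_bool_mask")
    return needed
```

For example, `needed_workarounds("2.11.0", "mps", 192, 128, False, "float16")` returns `["pad_value"]`, and the same call with `"2.12.0"` returns an empty list. This sketch treats a pre-release such as `2.12.0.dev...` as already fixed, which may not hold for every nightly build.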

Fixes #44554
Fixes #44247

moktamd added 2 commits March 11, 2026 10:32
…on bugs

Add _apply_mps_fixes to handle two upstream PyTorch MPS bugs:

1. pytorch/pytorch#176767: pad value tensor when v_head_dim != q_head_dim
   to avoid corrupted SDPA output (affects DeepSeek models, fixed in 2.12)

2. pytorch/pytorch#174861: force a non-bool attention mask for non-causal
   attention with non-float32 dtypes to route through sdpa_general_mps
   instead of the broken sdpa_vector_2pass_mps path (fixed in 2.11)

Fixes huggingface#44554
Fixes huggingface#44247
hvaara (Contributor) commented Mar 11, 2026

@moktamd Great that you want to contribute! Unfortunately, these issues are already being actively worked on. That is clear from my discussion with the maintainers in #44247 and #44554, and I am also the author of both issues. Please find another issue to work on that is not already taken. Thank you.

moktamd (Author) commented Mar 11, 2026

Apologies, I missed that you were already working on this. Closing in your favor. Good luck with the fix!

Development

Successfully merging this pull request may close these issues.

[MPS] Upstream correctness issue in attention when value head dim differs from query
[MPS] Silent correctness issue in bidirectional attention

3 participants