Add MPS SDPA workarounds for value head dim and bidirectional attention by moktamd · Pull Request #44591 · huggingface/transformers

moktamd · 2026-03-11T10:32:26Z

Adds _apply_mps_fixes in sdpa_attention.py to handle two upstream PyTorch MPS bugs:

pytorch/pytorch#176767, "MPS: scaled_dot_product_attention returns wrong output shape when value dim != query/key dim" (fixed in PyTorch 2.12): pads the value tensor when v_head_dim != q_head_dim to avoid corrupted output. Affects DeepSeek models with MQA.
pytorch/pytorch#174861, "[MPS] Out of bounds memory access/corruption and correctness issue in SDPA" (fixed in PyTorch 2.11): forces a non-bool attention mask for non-causal, non-float32 attention so that it routes through sdpa_general_mps instead of the broken sdpa_vector_2pass_mps.
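
Why zero-padding the value tensor can be exact is easiest to see in a short derivation. The sketch below assumes v_head_dim < q_head_dim, zeros appended on the right of the last dimension, and the output sliced back to v_head_dim afterwards. The description above does not state the padding side or the trimming step, so both are assumptions, as are the symbols L, S, d, d_v and M (query length, key length, query/key head dim, value head dim, additive mask).

```latex
% Assumptions: d_v < d, zero padding on the right, output sliced back to d_v columns.
\begin{aligned}
Q &\in \mathbb{R}^{L \times d}, \quad K \in \mathbb{R}^{S \times d}, \quad V \in \mathbb{R}^{S \times d_v}, \quad d_v < d \\
A &= \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}} + M\right) \in \mathbb{R}^{L \times S} \\
V' &= \begin{bmatrix} V & 0_{S \times (d - d_v)} \end{bmatrix} \in \mathbb{R}^{S \times d} \\
A V' &= \begin{bmatrix} A V & 0_{L \times (d - d_v)} \end{bmatrix}
\end{aligned}
```

The attention weights A depend only on Q, K and the mask, so padding V leaves them unchanged. The first d_v columns of the padded output therefore equal A V, and slicing them off recovers the unpadded result.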

Both fixes are version-gated and will no-op once the upstream PyTorch fixes are available.
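
As an illustration of what version gating could look like, here is a minimal pure-Python sketch of the two conditions described above. It is not the code from this PR: the function names, the constants, and the use of plain strings for device and dtype are all hypothetical, chosen so the example runs without PyTorch installed.

```python
import re

# Versions in which, per the description above, the upstream fixes land.
VALUE_DIM_FIX_VERSION = (2, 12)      # pytorch/pytorch#176767
BIDIRECTIONAL_FIX_VERSION = (2, 11)  # pytorch/pytorch#174861


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '2.11.0' or '2.12.0.dev20260301+cpu'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def needs_value_pad(version: str, device_type: str, q_head_dim: int, v_head_dim: int) -> bool:
    """Hypothetical gate: pad the value tensor only on MPS, before 2.12, when head dims differ."""
    return (
        device_type == "mps"
        and parse_major_minor(version) < VALUE_DIM_FIX_VERSION
        and v_head_dim != q_head_dim
    )


def needs_non_bool_mask(version: str, device_type: str, is_causal: bool, dtype_name: str) -> bool:
    """Hypothetical gate: force a non-bool mask only on MPS, before 2.11, non-causal, non-float32."""
    return (
        device_type == "mps"
        and parse_major_minor(version) < BIDIRECTIONAL_FIX_VERSION
        and not is_causal
        and dtype_name != "float32"
    )


if __name__ == "__main__":
    # Differing head dims on an older MPS build would trigger the padding workaround.
    print(needs_value_pad("2.10.0", "mps", q_head_dim=192, v_head_dim=128))
    # Once the installed version reaches the fix version, the gate becomes a no-op.
    print(needs_non_bool_mask("2.11.0", "mps", is_causal=False, dtype_name="float16"))
```

One caveat of comparing only (major, minor): a nightly or dev build that carries the target version number but predates the upstream fix would be treated as already fixed.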

…on bugs

Add _apply_mps_fixes to handle two upstream PyTorch MPS bugs:

1. pytorch/pytorch#176767: pad value tensor when v_head_dim != q_head_dim to avoid corrupted SDPA output (affects DeepSeek models, fixed in 2.12)
2. pytorch/pytorch#174861: force a non-bool attention mask for non-causal attention with non-float32 dtypes to route through sdpa_general_mps instead of the broken sdpa_vector_2pass_mps path (fixed in 2.11)

Fixes huggingface#44554
Fixes huggingface#44247

hvaara · 2026-03-11T12:31:42Z

@moktamd Great that you want to contribute! Unfortunately, these issues are already being actively worked on. That is clear from my discussion with the maintainers in #44247 and #44554, and I am also the author of both issues. Please find another issue to work on that is not already taken. Thank you.

moktamd · 2026-03-11T12:51:25Z

Apologies, I missed that you were already working on this. Closing in your favor. Good luck with the fix!

moktamd added 2 commits March 11, 2026 10:32

Merge remote-tracking branch 'origin/main' into fix/mps-sdpa-workarounds

cc9f2da

This was referenced Mar 11, 2026

[MPS] Upstream correctness issue in attention when value head dim differs from query #44554

Open

[MPS] Silent correctness issue in bidirectional attention #44247

Open

Rocketknight1 closed this Mar 11, 2026

moktamd wants to merge 2 commits into huggingface:main from moktamd:fix/mps-sdpa-workarounds
