Fix 2pass sdpa on < M2 by awni · Pull Request #3099 · ml-explore/mlx

awni · 2026-02-05T15:02:47Z

Honestly, I think this is a lower-level issue. For some reason, using blocks = 128 when blocks is a function constant makes the kernel support fewer than 1024 threads per threadgroup. Other values (64, 256, etc.) do not have that issue. This only happens for bfloat16 on M1 and M2 🤷‍♂️

angeloskath

Thanks for the fix.

I assume there isn't a perf regression...

awni · 2026-02-05T16:51:05Z

No, but at first I tried getting rid of the blocks function constant entirely from the first pass as well, and there was a very noticeable regression, which was unexpected.
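
For readers unfamiliar with what "first pass" and "blocks" refer to here, the following is a minimal pure-Python sketch of the general two-pass softmax-attention technique for a single query vector. It is not the MLX Metal kernel: the function names, the strided assignment of keys to blocks, and the list-based data layout are all assumptions made for illustration. Pass 1 computes a local max, local sum of exponentials, and unnormalized output per block; pass 2 rescales and merges those partial results.

```python
import math


def sdpa_reference(q, keys, values, scale):
    """Single-pass softmax attention for one query vector (reference)."""
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def sdpa_two_pass(q, keys, values, scale, blocks):
    """Two-pass attention sketch; `blocks` is the number of key partitions.

    Assumption: block b handles keys b, b + blocks, b + 2 * blocks, ...
    """
    dim = len(values[0])
    partials = []

    # Pass 1: each block produces (local max, local exp-sum, unnormalized output).
    for b in range(blocks):
        idx = range(b, len(keys), blocks)
        if len(idx) == 0:
            continue  # more blocks than keys: this block has no work
        scores = [scale * sum(x * y for x, y in zip(q, keys[i])) for i in idx]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        acc = [sum(w * values[i][d] for w, i in zip(weights, idx))
               for d in range(dim)]
        partials.append((m, sum(weights), acc))

    # Pass 2: rescale every partial result to the global max and merge.
    global_max = max(m for m, _, _ in partials)
    denom = 0.0
    out = [0.0] * dim
    for m, s, acc in partials:
        factor = math.exp(m - global_max)
        denom += factor * s
        for d in range(dim):
            out[d] += factor * acc[d]
    return [o / denom for o in out]
```

In this sketch the result is mathematically independent of `blocks`, so changing the block count only changes how the work is partitioned, not the output (up to floating-point rounding).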

fix 2pass sdpa on < M2

9ac787d

awni force-pushed the fix_2pass_sdpa branch from 549f517 to 9ac787d Compare February 5, 2026 15:19

angeloskath approved these changes Feb 5, 2026

awni merged commit 99ca62c into main Feb 5, 2026
16 checks passed

awni deleted the fix_2pass_sdpa branch February 5, 2026 16:51

N1k1tung mentioned this pull request Feb 7, 2026

Qwen3-Coder-Next-4bit produces corrupted / garbage tokens with long prompts ml-explore/mlx-lm#856

Closed

awni merged 1 commit into main from fix_2pass_sdpa
