Speed up NAX split-K by better tuning and routing and fix NAX addmm #3422

Merged
angeloskath merged 3 commits into main from routing-nax-mm
Apr 20, 2026

@angeloskath
Member

Well, the title says it all:

  • Split-K matmuls were being routed to the non-NAX version; this fixes the routing.
  • Adds some tuning on the M5 Max to improve performance further.
  • AddMM was completely broken on NAX; this fixes it.
    • It is a bit unclear whether the const_for_loop helped with codegen, because there is some measurement variance, but it is quite clean and comes with compile-time bounds checking that already found a bug while I was fixing this.
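
For readers unfamiliar with the two operations named above, here is a minimal reference sketch in pure Python. It is not the Metal/NAX kernel code from this PR: the function names and the `splits` parameter are hypothetical, chosen for illustration. It assumes split-K means partitioning the reduction dimension K into chunks, computing one partial product per chunk, and summing the partials, and that addmm computes `alpha * (a @ b) + beta * c`.

```python
# Reference sketch only: pure Python, not the Metal/NAX kernels from this PR.
# All names here (matmul_split_k, splits, ...) are hypothetical.

def matmul(a, b):
    """Naive (M x K) @ (K x N) product on lists of lists."""
    k_dim, n = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(k_dim)) for j in range(n)]
            for row in a]


def matmul_split_k(a, b, splits):
    """Split-K: cut the reduction dimension K into chunks, compute one
    partial product per chunk, then sum the partial results."""
    m, k_dim, n = len(a), len(b), len(b[0])
    chunk = -(-k_dim // splits)  # ceiling division
    out = [[0] * n for _ in range(m)]
    for start in range(0, k_dim, chunk):
        stop = min(start + chunk, k_dim)
        # On a GPU each partial product would be an independent unit of work.
        a_part = [row[start:stop] for row in a]
        b_part = b[start:stop]
        partial = matmul(a_part, b_part)
        for i in range(m):
            for j in range(n):
                out[i][j] += partial[i][j]
    return out


def addmm(c, a, b, alpha=1, beta=1):
    """Reference semantics assumed for addmm: alpha * (a @ b) + beta * c."""
    ab = matmul_split_k(a, b, splits=2)
    return [[alpha * ab[i][j] + beta * c[i][j] for j in range(len(c[0]))]
            for i in range(len(c))]


a = [[1, 2, 3, 4], [5, 6, 7, 8]]        # 2 x 4
b = [[1, 0], [0, 1], [2, 0], [0, 2]]    # 4 x 2
c = [[10, 20], [30, 40]]                # 2 x 2

print(matmul_split_k(a, b, splits=2))   # matches matmul(a, b)
print(addmm(c, a, b, alpha=2, beta=1))
```

Splitting K pays off mainly when M and N are small relative to K, since the output alone then offers too little parallel work.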

Here are the first measurements on the M5 Max:

Shape (MxK)  Before  After   Speedup
-------------------------------------
128x128      0.009   0.01    0.9
128x256      0.012   0.01    1.2
128x512      0.014   0.012   1.17
128x1024     0.015   0.016   0.94
256x128      0.013   0.011   1.18
256x256      0.016   0.011   1.45
256x512      0.017   0.015   1.13
256x1024     0.025   0.016   1.56
256x2048     0.042   0.025   1.68
512x128      0.018   0.015   1.2
512x256      0.021   0.015   1.4
512x512      0.049   0.018   2.72
512x1024     0.057   0.029   1.97
512x2048     0.105   0.052   2.02
512x4096     0.195   0.086   2.27
1024x256     0.057   0.03    1.9
1024x512     0.105   0.048   2.19
1024x1024    0.09    0.09    1.0
1024x2048    0.164   0.164   1.0
1024x4096    0.313   0.301   1.04
1280x1280    0.164   0.164   1.0
1280x2048    0.238   0.238   1.0
1280x4096    0.47    0.459   1.02
2048x512     0.159   0.159   1.0
2048x1024    0.291   0.292   1.0
2048x2048    0.55    0.55    1.0
2048x4096    1.099   1.101   1.0
4096x1024    1.092   1.093   1.0
4096x2048    2.172   2.173   1.0
4096x4096    4.344   4.346   1.0
8192x2048    8.695   8.711   1.0
8192x4096    17.431  17.471  1.0
8192x8192    35.13   35.100  1.0
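
The Speedup column appears to be Before divided by After, rounded to two decimals. A quick check on three rows copied from the table, assuming that definition:

```python
# (shape, before, after) copied from three rows of the table above.
rows = [
    ("128x128", 0.009, 0.01),
    ("512x512", 0.049, 0.018),
    ("512x4096", 0.195, 0.086),
]


def speedup(before, after):
    """Ratio of timings; values above 1 mean the new code is faster."""
    return round(before / after, 2)


speedups = {shape: speedup(before, after) for shape, before, after in rows}
print(speedups)
```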

The benchmark runs (MxK @ KxM) @ MxK, chained.
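
A shape-level sketch of that chain, in pure Python so it runs anywhere. This is not the actual benchmark script, which presumably runs on MLX arrays on the GPU. It assumes the KxM operand is the transpose of the MxK input; `chained_step` and `bench` are hypothetical names. The point is that each step returns an MxK result, so the output can be fed straight back in.

```python
import time


def matmul(a, b):
    """Naive (M x K) @ (K x N) product on lists of lists."""
    k_dim, n = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(k_dim)) for j in range(n)]
            for row in a]


def chained_step(x):
    """One step: (MxK @ KxM) @ MxK, which is MxK again."""
    x_t = [list(col) for col in zip(*x)]  # K x M (assumed: transpose of x)
    gram = matmul(x, x_t)                 # M x M, reduces over K
    return matmul(gram, x)                # M x K, reduces over M


def bench(m, k, steps=3):
    """Run `steps` chained steps and return the result and elapsed seconds."""
    x = [[1.0 / (m * k)] * k for _ in range(m)]
    tic = time.perf_counter()
    for _ in range(steps):
        x = chained_step(x)  # output shape equals input shape
    elapsed = time.perf_counter() - tic
    return x, elapsed


result, seconds = bench(4, 8)
print(len(result), len(result[0]))
```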

I will probably add some numbers from other M5s before merging.

@angeloskath angeloskath requested a review from jagrit06 April 18, 2026 03:47
ciaranbor pushed a commit to exo-explore/exo that referenced this pull request Apr 19, 2026
## Motivation

Vision models don't understand images on M5 series MacBooks. The
upstream NAX addmm fix (ml-explore/mlx#3422)
fixes this.

## Why It Works
Same conclusion I came to when I was debugging the issue on an M5 Max.
It works after this fix.

## Test Plan

### Manual Testing
Works for Qwen3.5 27B
Collaborator

@zcbenz zcbenz left a comment

👍

@angeloskath angeloskath merged commit a6222f5 into main Apr 20, 2026
31 of 32 checks passed
@angeloskath angeloskath deleted the routing-nax-mm branch April 20, 2026 01:05