CUDA gather mv by angeloskath · Pull Request #3039 · ml-explore/mlx

angeloskath · 2026-01-22T08:50:02Z

Adds a CUDA implementation for gather_mv. gather_vm is still missing, but with this and @zcbenz's grouped mm for the sorted version we can run unquantized MoEs on CUDA.

$ mlx_lm.benchmark --model Qwen/Qwen3-30B-A3B -p 2048 -g 128
Running warmup..
Timing with prompt_tokens=2048, generation_tokens=128, batch_size=1.
Trial 1:  prompt_tps=2101.020, generation_tps=32.027, peak_memory=61.821
Trial 2:  prompt_tps=2088.784, generation_tps=32.078, peak_memory=61.821
Trial 3:  prompt_tps=2101.889, generation_tps=32.072, peak_memory=61.822
Trial 4:  prompt_tps=2095.918, generation_tps=32.046, peak_memory=61.822
Trial 5:  prompt_tps=2112.402, generation_tps=30.730, peak_memory=61.822
Averages: prompt_tps=2100.002, generation_tps=31.791, peak_memory=61.822
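
For readers unfamiliar with the operation, here is a minimal pure-Python sketch of what a gathered matrix-vector product computes, assuming the usual MoE reading: each output picks one matrix from a stack of expert weights by index and multiplies it with one selected vector. The function name `gather_mv_reference`, its argument layout, and the toy data are hypothetical and only illustrate the semantics; this is not the CUDA kernel added in this PR, nor the MLX API.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gather_mv_reference(matrices, vectors, matrix_indices, vector_indices):
    """Hypothetical reference semantics (an assumption, not the MLX API):
    out[i] = matrices[matrix_indices[i]] @ vectors[vector_indices[i]]
    """
    if len(matrix_indices) != len(vector_indices):
        raise ValueError("index lists must have the same length")
    return [
        matvec(matrices[mi], vectors[vi])
        for mi, vi in zip(matrix_indices, vector_indices)
    ]


# Three toy 2x2 "expert" weight matrices (made-up values).
experts = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity
    [[2.0, 0.0], [0.0, 2.0]],  # expert 1: scale by 2
    [[0.0, 1.0], [1.0, 0.0]],  # expert 2: swap the two components
]
tokens = [[3.0, 4.0]]  # a single token vector, as in batch-size-1 generation

# Route the one token to experts 2 and 0 (a top-2 selection, chosen by hand).
outputs = gather_mv_reference(experts, tokens, [2, 0], [0, 0])
print(outputs)  # [[4.0, 3.0], [3.0, 4.0]]
```

A real backend would avoid materializing the gathered matrices and read the selected rows directly from the stacked weights, which is presumably why a dedicated kernel helps the generation-time numbers above.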

awni

Awesome!!

mlx/backend/cuda/matmul.cpp

CUDA gather mv

037fae7

angeloskath requested review from awni and zcbenz January 22, 2026 08:50

awni approved these changes Jan 22, 2026


awni reviewed Jan 22, 2026


mlx/backend/cuda/matmul.cpp (outdated, resolved)

Fix

c38d9d5

angeloskath merged commit becc769 into main Jan 23, 2026
29 of 32 checks passed

angeloskath deleted the gather-mv branch January 23, 2026 01:20

BrewTestBot mentioned this pull request Jan 27, 2026

mlx 0.30.4 Homebrew/homebrew-core#264789

Closed

angeloskath merged 2 commits into main from gather-mv
