CUDA gather mv #3039

Merged
angeloskath merged 2 commits into main from gather-mv
Jan 23, 2026

@angeloskath
Member

Adds a CUDA implementation of gather_mv. gather_vm is still missing, but with this kernel and @zcbenz's grouped mm for the sorted version, we can run unquantized MoEs on CUDA.

$ mlx_lm.benchmark --model Qwen/Qwen3-30B-A3B -p 2048 -g 128
Running warmup..
Timing with prompt_tokens=2048, generation_tokens=128, batch_size=1.
Trial 1:  prompt_tps=2101.020, generation_tps=32.027, peak_memory=61.821
Trial 2:  prompt_tps=2088.784, generation_tps=32.078, peak_memory=61.821
Trial 3:  prompt_tps=2101.889, generation_tps=32.072, peak_memory=61.822
Trial 4:  prompt_tps=2095.918, generation_tps=32.046, peak_memory=61.822
Trial 5:  prompt_tps=2112.402, generation_tps=30.730, peak_memory=61.822
Averages: prompt_tps=2100.002, generation_tps=31.791, peak_memory=61.822
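
For readers unfamiliar with the operation, here is a rough plain-Python sketch of what a gathered matrix-vector product computes in the MoE setting: each output slot picks one expert matrix and one token vector by index and multiplies them. The function name `gather_mv_reference`, the argument names, and the index layout are assumptions made for illustration; this is not the CUDA kernel from this PR and not the MLX API.

```python
# Reference sketch only: plain-Python semantics of a gathered matrix-vector
# product. Names and index layout are hypothetical, not MLX's CUDA kernel.

def matvec(matrix, vector):
    """Multiply one (rows x cols) matrix by one length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def gather_mv_reference(matrices, vectors, mat_indices, vec_indices):
    """For each slot i, compute matrices[mat_indices[i]] @ vectors[vec_indices[i]]."""
    if len(mat_indices) != len(vec_indices):
        raise ValueError("index lists must have the same length")
    return [
        matvec(matrices[m], vectors[v])
        for m, v in zip(mat_indices, vec_indices)
    ]


# Two toy "experts" (2x2 weight matrices) and two token vectors.
experts = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity
    [[2.0, 0.0], [0.0, 3.0]],  # expert 1: per-channel scaling
]
tokens = [[1.0, 2.0], [3.0, 4.0]]

# Token 0 -> expert 1, token 1 -> expert 0, token 0 -> expert 0.
out = gather_mv_reference(
    experts, tokens, mat_indices=[1, 0, 0], vec_indices=[0, 1, 0]
)
print(out)  # [[2.0, 6.0], [3.0, 4.0], [1.0, 2.0]]
```

A real kernel would presumably avoid materializing the gathered operands and run the per-slot products in parallel; the sketch only pins down the indexing.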

@angeloskath requested review from awni and zcbenz on January 22, 2026 08:50
Member

@awni left a comment

Awesome!!

@angeloskath merged commit becc769 into main on Jan 23, 2026
29 of 32 checks passed
@angeloskath deleted the gather-mv branch on January 23, 2026 01:20