
Conversation

@am17an (Collaborator) commented Nov 19, 2025

Fixes #17322. Apparently we weren't checking the order of the ops here, and gemma3 has a peculiar set of ops that happens to match ggml_cuda_should_fuse_rope_set_rows.
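
Roughly, the missing piece was an explicit in-order check of the candidate nodes' ops. A minimal sketch of that kind of check (a hypothetical helper for illustration only, not the actual ggml-cuda code; the exact sequence checked in the real function may differ):

```cpp
// Hypothetical helper, for illustration only -- not the actual ggml-cuda code.
// Assumes ggml.h is included (ggml_cgraph, ggml_op).
// Verify that the candidate nodes' ops match the expected sequence *in order*,
// rather than only checking that the right number of ops is present.
static bool ops_match_in_order(const struct ggml_cgraph * cgraph, int start_idx,
                               std::initializer_list<enum ggml_op> expected) {
    int i = start_idx;
    for (enum ggml_op op : expected) {
        if (i >= cgraph->n_nodes || cgraph->nodes[i]->op != op) {
            return false;
        }
        i++;
    }
    return true;
}

// e.g. (the exact sequence used by ggml_cuda_should_fuse_rope_set_rows may differ):
// if (!ops_match_in_order(cgraph, node_idx, {GGML_OP_ROPE, GGML_OP_VIEW, GGML_OP_SET_ROWS})) {
//     return false;
// }
```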

@am17an am17an requested a review from slaren as a code owner November 19, 2025 06:32
@am17an am17an requested review from JohannesGaessler and removed request for slaren November 19, 2025 06:32
@github-actions github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels Nov 19, 2025
@CISC (Collaborator) commented Nov 19, 2025

BTW, I think there are several issues in ggml_cuda_can_fuse.

There is a pattern of checking the size of ops, but not its content, before calling ggml_can_fuse_subgraph with ops, like here (which makes the second call redundant):

if (ops.size() == 5 && (ggml_can_fuse_subgraph(cgraph, node_idx, ops, {node_idx + 4}) ||
ggml_can_fuse_subgraph(cgraph, node_idx, ops, {node_idx + 4}))) {

It should have first checked that mul_mat_bias_glu_ops or mul_mat_id_bias_glu_ops were the ops we were trying to fuse, and then made only one ggml_can_fuse_subgraph call.
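
Something along these lines, roughly (just a sketch, reusing the identifiers above and assuming ops is the std::initializer_list<ggml_op> passed into ggml_cuda_can_fuse):

```cpp
// Sketch only -- needs <algorithm> for std::equal.
// First confirm which pattern `ops` actually is, then make a single
// ggml_can_fuse_subgraph call.
auto same_ops = [&](std::initializer_list<enum ggml_op> expected) {
    return ops.size() == expected.size() &&
           std::equal(ops.begin(), ops.end(), expected.begin());
};

if ((same_ops(mul_mat_bias_glu_ops) || same_ops(mul_mat_id_bias_glu_ops)) &&
    ggml_can_fuse_subgraph(cgraph, node_idx, ops, {node_idx + 4})) {
    // ... proceed with the fused mul_mat(+id) + bias + GLU path
}
```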

@am17an (Collaborator, Author) commented Nov 19, 2025

@CISC yeah, I'm refactoring that function. Will open a separate PR.

@am17an am17an merged commit fd7353d into ggml-org:master Nov 19, 2025
74 checks passed
@am17an am17an deleted the cuda-fix-gemma3n branch November 19, 2025 10:25
ronaldmannak pushed a commit to PicoMLX/llama.cpp that referenced this pull request Nov 19, 2025
@bssrdf (Contributor) commented Nov 19, 2025

@am17an, I am interested in implementing some fusion ops for sd.cpp, which should really benefit from them. Is there any guide on how to add these fusion ops in the CUDA backend, or should I just follow the existing examples? I think a general/brief introduction to this topic would be helpful for other developers.

@am17an (Collaborator, Author) commented Nov 19, 2025

@bssrdf Sure, let me write up something

