
Conversation

@am17an (Collaborator) commented Nov 19, 2025

Fixes #17322. Apparently we weren't checking the order of the ops here, and gemma3 has a peculiar set of ops that happens to match ggml_cuda_should_fuse_rope_set_rows.
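
Roughly, the missing piece was an explicit in-order check of the candidate nodes' ops. A minimal sketch of that kind of check (a hypothetical helper for illustration only, not the actual ggml-cuda code; the exact sequence checked in the real function may differ):

```cpp
// Hypothetical helper, for illustration only -- not the actual ggml-cuda code.
// Assumes ggml.h is included (ggml_cgraph, ggml_op).
// Verify that the candidate nodes' ops match the expected sequence *in order*,
// rather than only checking that the right number of ops is present.
static bool ops_match_in_order(const struct ggml_cgraph * cgraph, int start_idx,
                               std::initializer_list<enum ggml_op> expected) {
    int i = start_idx;
    for (enum ggml_op op : expected) {
        if (i >= cgraph->n_nodes || cgraph->nodes[i]->op != op) {
            return false;
        }
        i++;
    }
    return true;
}

// e.g. (the exact sequence used by ggml_cuda_should_fuse_rope_set_rows may differ):
// if (!ops_match_in_order(cgraph, node_idx, {GGML_OP_ROPE, GGML_OP_VIEW, GGML_OP_SET_ROWS})) {
//     return false;
// }
```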

@am17an am17an requested a review from slaren as a code owner November 19, 2025 06:32
@am17an am17an requested review from JohannesGaessler and removed request for slaren November 19, 2025 06:32
@github-actions github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels Nov 19, 2025
@CISC (Collaborator) commented Nov 19, 2025

BTW, I think there are several issues in ggml_cuda_can_fuse.

There is a pattern of checking the size of ops, but not its content, before calling ggml_can_fuse_subgraph with ops, like here (which makes the second call redundant):

if (ops.size() == 5 && (ggml_can_fuse_subgraph(cgraph, node_idx, ops, {node_idx + 4}) ||
ggml_can_fuse_subgraph(cgraph, node_idx, ops, {node_idx + 4}))) {

It should have first checked that mul_mat_bias_glu_ops or mul_mat_id_bias_glu_ops were the ops we were trying to fuse, and then made only one ggml_can_fuse_subgraph call.
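
Something along these lines, roughly (just a sketch, reusing the identifiers above and assuming ops is the std::initializer_list<ggml_op> passed into ggml_cuda_can_fuse):

```cpp
// Sketch only -- needs <algorithm> for std::equal.
// First confirm which pattern `ops` actually is, then make a single
// ggml_can_fuse_subgraph call.
auto same_ops = [&](std::initializer_list<enum ggml_op> expected) {
    return ops.size() == expected.size() &&
           std::equal(ops.begin(), ops.end(), expected.begin());
};

if ((same_ops(mul_mat_bias_glu_ops) || same_ops(mul_mat_id_bias_glu_ops)) &&
    ggml_can_fuse_subgraph(cgraph, node_idx, ops, {node_idx + 4})) {
    // ... proceed with the fused mul_mat(+id) + bias + GLU path
}
```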

@am17an (Collaborator, Author) commented Nov 19, 2025

@CISC yeah, I'm refactoring that function. Will open a separate PR.

@am17an am17an merged commit fd7353d into ggml-org:master Nov 19, 2025
74 checks passed
@am17an am17an deleted the cuda-fix-gemma3n branch November 19, 2025 10:25
ronaldmannak pushed a commit to PicoMLX/llama.cpp that referenced this pull request Nov 19, 2025
@bssrdf (Contributor) commented Nov 19, 2025

@am17an, I am interested in implementing some fusion ops for sd.cpp, which should really benefit from them. Is there any guide on how to add these fusion ops in the CUDA backend, or should I just follow the existing examples? I think a general/brief introduction to this topic would be helpful for other developers.

@am17an (Collaborator, Author) commented Nov 19, 2025

@bssrdf Sure, let me write up something

