MemoryIt

This PR adds a new fused operator in the Metal backend to support Qwen3NextRMSNormGated from Qwen3-Next models:

  • Extends kernel_rms_norm_fuse_impl with F == 4: fuses rms_norm + mul + swiglu_split into a single kernel.
  • Matches the behavior of Qwen3NextRMSNormGated in modeling_qwen3_next.py.
  • Extends fusion detection logic to recognize SWIGLU_SPLIT as a fusible tail op.
  • Expands test_rms_norm_mul_add test class to cover the new fusion pattern.
  • ✅ Passes test-backend-ops with full parameter coverage (broadcast, eps, multi_add).
  • ✅ Passes local CI on macOS (CPU + Metal backends, numerical consistency verified).
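For readers unfamiliar with the pattern, here is a minimal pure-Python sketch of the math the fused path is meant to reproduce. It is not the Metal kernel: the function names `unfused_reference` and `fused_rms_norm_gated` are hypothetical, the default `eps` is an assumption, and the operand order (`silu(gate)` multiplied by the normalized, weighted row) is assumed from the gated-norm behavior described above.

```python
import math


def silu(v):
    # SiLU activation: v * sigmoid(v)
    return v / (1.0 + math.exp(-v))


def unfused_reference(x, weight, gate, eps=1e-6):
    # Three separate passes, mirroring rms_norm -> mul -> gated SiLU.
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    normed = [v * scale for v in x]                       # rms_norm
    scaled = [n * w for n, w in zip(normed, weight)]      # mul by weight
    return [silu(g) * s for g, s in zip(gate, scaled)]    # gate with SiLU


def fused_rms_norm_gated(x, weight, gate, eps=1e-6):
    # One reduction for the scale, then a single elementwise pass
    # that applies the norm, the weight, and the gate together.
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [xi * scale * wi * silu(gi) for xi, wi, gi in zip(x, weight, gate)]


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0, 3.5]
    weight = [1.0, 0.5, 2.0, 1.5]
    gate = [0.1, -0.2, 0.3, 1.0]
    a = unfused_reference(x, weight, gate)
    b = fused_rms_norm_gated(x, weight, gate)
    print(all(math.isclose(u, v, rel_tol=1e-9, abs_tol=1e-12) for u, v in zip(a, b)))
```

The point of fusing is that the intermediate `normed` and `scaled` rows never need to be written out and read back; the result should agree with the unfused sequence up to floating-point rounding.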

Tested on Apple M4 Pro. All tests pass.

Related to #15940

github-actions bot added the labels testing (Everything test related), ggml (changes relating to the ggml tensor library for machine learning), and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) on Sep 21, 2025.