
Conversation

@YaelLogic
Contributor

Summary

Add SYCL backend support for RMS_NORM_BACK using a single FP32 compensated parallel reduction path.
No changes to the public API. Default numerical accuracy is preserved; a fast opt-in macro is also available.


Implementation

Algorithm (consistent with existing backend behavior; a scalar reference sketch follows the formulas)

  • inv_r = 1 / sqrt( (Σ x²) / D + eps )
  • coeff = − (Σ x·dz) / (Σ x² + D·eps)
  • dx[i] = (dz[i] + coeff * x[i]) * inv_r
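For reference, a minimal scalar sketch of these formulas for one row of length D. This is illustrative only — the function and variable names are not from the PR, and the real backend computes the two sums with the parallel reduction described below:

// Scalar reference for one row (illustrative; not the SYCL kernel).
#include <cmath>
#include <cstdint>

static void rms_norm_back_row_ref(const float * x, const float * dz, float * dx,
                                  int64_t D, float eps) {
    // Σ x² and Σ x·dz over the row
    double sum_xx = 0.0, sum_xdz = 0.0;
    for (int64_t i = 0; i < D; ++i) {
        sum_xx  += (double) x[i] * x[i];
        sum_xdz += (double) x[i] * dz[i];
    }
    const float inv_r = 1.0f / std::sqrt((float)(sum_xx / D) + eps);
    const float coeff = -(float)(sum_xdz / (sum_xx + (double) D * eps));
    for (int64_t i = 0; i < D; ++i) {
        dx[i] = (dz[i] + coeff * x[i]) * inv_r;
    }
}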

What was implemented

  • Per-thread accumulation of Σ x² and Σ x·dz with Kahan-style compensation (see the condensed sketch after this list).
  • Warp (sub_group) reduction via warp_reduce_sum.
  • Cross-warp reduction using local memory (one value per warp) with a single barrier.
  • group_broadcast used to distribute inv_r and coeff across the work-group.
  • Work-group size: multiple of WARP_SIZE, capped by device limit (≤256), not larger than D.
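
A condensed, illustrative sketch of that reduction structure, using standard SYCL 2020 group algorithms in place of ggml's internal warp_reduce_sum helper. The function signature, kernel layout, and names below are assumptions for illustration, not the code in norm.cpp; x, dz, and dx are assumed to be USM device pointers:

// One work-group per row; sketch only.
#include <sycl/sycl.hpp>
#include <cstdint>

static void rms_norm_back_rows(sycl::queue & q, const float * x, const float * dz,
                               float * dx, int64_t nrows, int64_t D, float eps,
                               size_t wg_size /* multiple of sub-group size, <= 256, <= D */) {
    constexpr size_t MAX_WARPS = 32;  // enough for wg_size <= 256 with sub-groups >= 8

    q.submit([&](sycl::handler & cgh) {
        // one partial value per sub-group ("warp") for each running sum
        sycl::local_accessor<float, 1> warp_xx (sycl::range<1>(MAX_WARPS), cgh);
        sycl::local_accessor<float, 1> warp_xdz(sycl::range<1>(MAX_WARPS), cgh);

        cgh.parallel_for(
            sycl::nd_range<1>(sycl::range<1>((size_t) nrows * wg_size), sycl::range<1>(wg_size)),
            [=](sycl::nd_item<1> it) {
                const int64_t row = it.get_group(0);
                const int64_t tid = it.get_local_id(0);
                const float * xr  = x  + row * D;
                const float * dzr = dz + row * D;
                float       * dxr = dx + row * D;

                // per-thread Kahan-compensated accumulation of Σ x² and Σ x·dz
                float sum_xx = 0.0f, c_xx = 0.0f, sum_xdz = 0.0f, c_xdz = 0.0f;
                for (int64_t i = tid; i < D; i += (int64_t) wg_size) {
                    float y = xr[i] * xr[i] - c_xx;
                    float t = sum_xx + y;
                    c_xx    = (t - sum_xx) - y;
                    sum_xx  = t;

                    y       = xr[i] * dzr[i] - c_xdz;
                    t       = sum_xdz + y;
                    c_xdz   = (t - sum_xdz) - y;
                    sum_xdz = t;
                }

                // sub-group ("warp") reduction
                auto sg = it.get_sub_group();
                sum_xx  = sycl::reduce_over_group(sg, sum_xx,  sycl::plus<float>());
                sum_xdz = sycl::reduce_over_group(sg, sum_xdz, sycl::plus<float>());

                // cross-warp reduction: one value per warp in local memory, single barrier
                if (sg.get_local_linear_id() == 0) {
                    warp_xx [sg.get_group_linear_id()] = sum_xx;
                    warp_xdz[sg.get_group_linear_id()] = sum_xdz;
                }
                sycl::group_barrier(it.get_group());

                float inv_r = 0.0f, coeff = 0.0f;
                if (tid == 0) {
                    float total_xx = 0.0f, total_xdz = 0.0f;
                    for (uint32_t w = 0; w < sg.get_group_linear_range(); ++w) {
                        total_xx  += warp_xx[w];
                        total_xdz += warp_xdz[w];
                    }
                    inv_r = 1.0f / sycl::sqrt(total_xx / (float) D + eps);
                    coeff = -total_xdz / (total_xx + (float) D * eps);
                }
                // distribute the two scalars to every work-item in the group
                inv_r = sycl::group_broadcast(it.get_group(), inv_r, 0);
                coeff = sycl::group_broadcast(it.get_group(), coeff, 0);

                for (int64_t i = tid; i < D; i += (int64_t) wg_size) {
                    dxr[i] = (dzr[i] + coeff * xr[i]) * inv_r;
                }
            });
    }).wait();
}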

Optional fast path

  • Define GGML_SYCL_RMS_BACK_FAST to disable compensated summation and use plain FP32 accumulation (sketched below).
  • Default remains high-accuracy compensated mode.
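
A hypothetical sketch of how such a macro switch typically looks, shown on a plain host-side sum for brevity; the real kernel in norm.cpp may structure this differently:

#include <cstddef>

// Compensated (Kahan) summation by default; plain FP32 accumulation when
// GGML_SYCL_RMS_BACK_FAST is defined.
static float sum_squares(const float * v, size_t n) {
#ifdef GGML_SYCL_RMS_BACK_FAST
    float s = 0.0f;                 // plain accumulation: fastest, least accurate
    for (size_t i = 0; i < n; ++i) {
        s += v[i] * v[i];
    }
    return s;
#else
    float s = 0.0f, c = 0.0f;       // c carries the lost low-order bits
    for (size_t i = 0; i < n; ++i) {
        const float y = v[i] * v[i] - c;
        const float t = s + y;
        c = (t - s) - y;
        s = t;
    }
    return s;
#endif
}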

Validation

Focused tests executed locally:

Test Suite                       Result
RMS_NORM_BACK (CPU)              4 / 4 passed
RMS_NORM_BACK (SYCL host/GPU)    4 / 4 passed
Sanity check (NMSE)              ≈ 1e-11

Build is warning-free for this code path.
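
For context, the NMSE figure above is the normalized mean squared error reported by test-backend-ops. A minimal sketch, assuming the standard definition (error energy divided by reference-output energy); this is not code from the PR:

#include <cstddef>

static double nmse(const float * out, const float * ref, size_t n) {
    double err = 0.0, energy = 0.0;
    for (size_t i = 0; i < n; ++i) {
        const double d = (double) out[i] - ref[i];
        err    += d * d;
        energy += (double) ref[i] * ref[i];
    }
    return err / energy;
}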


Reproduce (build + test)

# Configure & build with SYCL
cmake -B build -DGGML_SYCL=ON && cmake --build build -j"$(nproc)"

# Focused RMS_NORM_BACK tests
./build/bin/test-backend-ops test -o RMS_NORM_BACK -b CPU
SYCL_DEVICE_FILTER=host ./build/bin/test-backend-ops test -o RMS_NORM_BACK -b SYCL0
./build/bin/test-backend-ops test -o RMS_NORM_BACK -b SYCL0

# Optional fast path (less numerically stable):
# add -DGGML_SYCL_RMS_BACK_FAST to the compiler definitions, e.g.
#   cmake -B build -DGGML_SYCL=ON -DCMAKE_CXX_FLAGS="-DGGML_SYCL_RMS_BACK_FAST"

Files Changed (minimal scope only)

File                              Purpose
ggml/src/ggml-sycl/norm.cpp       Implementation of ggml_sycl_op_rms_norm_back
ggml/src/ggml-sycl/ggml-sycl.cpp  Operation dispatch registration
ggml/src/ggml-sycl/norm.hpp       Function declaration
docs/ops.md                       Mark RMS_NORM_BACK as ✅ for SYCL
docs/ops/SYCL.csv                 Mark RMS_NORM_BACK entries as supported

No unrelated files or personal data included.


Notes & Risks

  • Default path gives high numerical accuracy using compensated FP32 sums.
  • Fast path is fully optional (disabled by default).
  • Reduction order on GPUs is not bitwise-identical to CPU, but produces NMSE ≈ 1e-11.

Reviewers

cc @CISC @NeoZhangJianyu
Looking forward to your feedback. Thanks in advance!

@github-actions bot added the documentation, ggml, and SYCL labels on Oct 27, 2025
@YaelLogic
Contributor Author

This PR is ready for review.
Tagging @CISC and @NeoZhangJianyu — your feedback would be greatly appreciated whenever you have the chance.
Thanks for your work on maintaining and improving the SYCL backend!

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
Collaborator

@NeoZhangJianyu NeoZhangJianyu left a comment


Good job!

Thank you!

@NeoZhangJianyu
Collaborator

@YaelLogic
Please fix the EditorConfig issue.

@YaelLogic
Contributor Author

Hi @NeoZhangJianyu,
the EditorConfig issue has been fixed.
Please let me know if anything else is needed. Thank you.

@NeoZhangJianyu NeoZhangJianyu merged commit 338074c into ggml-org:master Oct 29, 2025
76 of 79 checks passed