Conversation

@justinchuby justinchuby commented Jul 29, 2025

  • Implement rms norm using onnx RMSNormalization-23

  • Use the correct eps for float32

    auto acc_type = at::toAccumulateType(input.scalar_type(), /*is_cuda=*/true);
    double eps_val;
    if (acc_type == at::ScalarType::Float) {
      eps_val = eps.value_or(std::numeric_limits<float>::epsilon());
    } else {
      eps_val = eps.value_or(std::numeric_limits<double>::epsilon());
    }
    Tensor Y = at::native::empty_like(
        *X,
        std::nullopt /* dtype */,
        std::nullopt /* layout */,
        std::nullopt /* device */,
        std::nullopt /* pin_memory */,
        LEGACY_CONTIGUOUS_MEMORY_FORMAT);
    Tensor rstd = at::empty({M}, X->options().dtype(acc_type));
    if (M > 0) {
      RmsNormKernelImpl(*X, *gamma, M, N, eps_val, &Y, &rstd);
    }
    const auto input_shape = input.sizes();
    const size_t axis = input.dim() - normalized_shape.size();


  • Created facility to run tests with the reference runtime by extending ONNXProgram and assert_onnx_program.

Fix #159257
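The eps fallback in the ATen snippet above (use the accumulate type's machine epsilon when no explicit eps is given) can be sketched in plain Python. This is an illustrative mirror of the logic, not exporter code; the function name and the boolean flag are hypothetical:

```python
import sys

# Machine epsilon of IEEE-754 single precision, i.e. what
# std::numeric_limits<float>::epsilon() returns: 2**-23.
FLOAT32_EPS = 2.0 ** -23  # 1.1920928955078125e-07
# Machine epsilon of double precision (std::numeric_limits<double>::epsilon()).
FLOAT64_EPS = sys.float_info.epsilon  # 2**-52

def default_rms_norm_eps(acc_is_float32, eps=None):
    """Mirror the ATen fallback: pick the accumulate type's machine
    epsilon when the caller does not pass an explicit eps."""
    if eps is not None:
        return eps
    return FLOAT32_EPS if acc_is_float32 else FLOAT64_EPS
```

Note that 2**-23 is exactly the 1.1920928955078125e-07 constant that appears in the decomposed program later in this thread.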

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
@pytorch-bot pytorch-bot bot added the release notes: onnx torch.onnx related changes that should show up in the release notes label Jul 29, 2025
pytorch-bot bot commented Jul 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159377

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 340dc41 with merge base f4bfac1:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@titaiwangms
Collaborator

Where do we add the tests for op23 symbolic functions?

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
@justinchuby justinchuby added module: onnx Related to torch.onnx topic: improvements topic category labels Jul 29, 2025
@justinchuby
Collaborator Author

Where do we add the tests for op23 symbolic functions?

I added tests in small models e2e. I think we can extend the opinfo tests later.

@justinchuby
Collaborator Author

@pytorchbot merge -i

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 30, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge, unstable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@justinchuby
Collaborator Author

In test: export with opset 23

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

Signed-off-by: Justin Chu <justinchu@microsoft.com>
@justinchuby
Collaborator Author

test_rms_norm needs a tolerance of 5e-5
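A sketch of what the loosened tolerance means: an element-wise absolute comparison at 5e-5, with hypothetical numbers standing in for the eager and exported-ONNX outputs (the real test compares tensors, not Python lists):

```python
import math

def allclose(a, b, atol=5e-5):
    # Element-wise absolute-tolerance comparison.
    return all(math.isclose(x, y, rel_tol=0.0, abs_tol=atol) for x, y in zip(a, b))

# Hypothetical float32-level discrepancy between two backends:
eager = [0.848528, 1.131371]
onnx_out = [0.848533, 1.131352]
assert allclose(eager, onnx_out, atol=5e-5)
```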

Signed-off-by: Justin Chu <justinchu@microsoft.com>
@justinchuby
Collaborator Author

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 0 checks:

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

Signed-off-by: Justin Chu <justinchu@microsoft.com>
@justinchuby
Collaborator Author

The default eps of 1e-5 that we use is too large. Reading the decomposed program, we can see the eps PyTorch actually uses:

ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, x: "f32[2, 5, 7, 3]"):
             # File: /Users/justinc/Documents/GitHub/pytorch/test/onnx/exporter/test_small_models_e2e.py:772 in forward, code: return torch.nn.functional.rms_norm(x, [7, 3])
            pow_1: "f32[2, 5, 7, 3]" = torch.ops.aten.pow.Tensor_Scalar(x, 2)
            mean: "f32[2, 5, 1, 1]" = torch.ops.aten.mean.dim(pow_1, [3, 2], True);  pow_1 = None
            add: "f32[2, 5, 1, 1]" = torch.ops.aten.add.Scalar(mean, 1.1920928955078125e-07);  mean = None
            rsqrt: "f32[2, 5, 1, 1]" = torch.ops.aten.rsqrt.default(add);  add = None
            mul: "f32[2, 5, 7, 3]" = torch.ops.aten.mul.Tensor(x, rsqrt);  rsqrt = None
            type_as: "f32[2, 5, 7, 3]" = torch.ops.aten.type_as.default(mul, x);  mul = x = None
            return (type_as,)
            
Graph signature: 
    # inputs
    x: USER_INPUT
    
    # outputs
    type_as: USER_OUTPUT
    
Range constraints: {}
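The decomposition above (square, mean over the normalized dims, add eps, rsqrt, scale) can be reproduced for a single normalized dimension in plain Python. A minimal sketch for intuition only; it flattens the normalized shape to one list and omits the optional weight:

```python
import math

def rms_norm_1d(x, eps=2.0 ** -23):
    # Mean of squares over the normalized dimension (aten::pow + aten::mean).
    mean_sq = sum(v * v for v in x) / len(x)
    # 1 / sqrt(mean + eps)  (aten::add + aten::rsqrt).
    rstd = 1.0 / math.sqrt(mean_sq + eps)
    # Scale the input (aten::mul).
    return [v * rstd for v in x]
```

For example, `rms_norm_1d([3.0, 4.0])` divides each element by sqrt(12.5 + eps), so the output has a root-mean-square of almost exactly 1.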

@justinchuby
Collaborator Author

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge, unstable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@justinchuby justinchuby deleted the justinchu/rms branch July 30, 2025 20:50
@justinchuby justinchuby restored the justinchu/rms branch July 30, 2025 20:50
@justinchuby justinchuby deleted the justinchu/rms branch July 30, 2025 20:51
yangw-dev pushed a commit that referenced this pull request Aug 1, 2025
- Implement rms norm using onnx RMSNormalization-23
- Use the correct eps for float32
  https://github.com/pytorch/pytorch/blob/eaadd1282c8e66f37acf54f95668529831c95df7/aten/src/ATen/native/cuda/layer_norm_kernel.cu#L1844-L1866

- Created facility to run tests with the reference runtime by extending ONNXProgram and assert_onnx_program.

Fix #159257
Pull Request resolved: #159377
Approved by: https://github.com/titaiwangms