
Conversation

@ahmadsharif1 (Contributor) commented Jun 23, 2025

Don't call sum() on a tensor that is default constructed.

Previously we could call sum() on a tensor that was default-constructed. That would lead to an error like this:

Traceback (most recent call last):
  File "/home/ahmads/.conda/envs/pt3/lib/python3.12/unittest/case.py", line 58, in testPartExecutor
    yield
  File "/home/ahmads/.conda/envs/pt3/lib/python3.12/unittest/case.py", line 634, in run
    self._callTestMethod(testMethod)
  File "/home/ahmads/.conda/envs/pt3/lib/python3.12/unittest/case.py", line 589, in _callTestMethod
    if method() is not None:
       ^^^^^^^^
  File "/home/ahmads/personal/pytorch/torch/testing/_internal/common_utils.py", line 3191, in wrapper
    method(*args, **kwargs)
  File "/home/ahmads/personal/pytorch/test/test_nn.py", line 7235, in test_layer_norm_backwards_eps
    ln_out_cuda.backward(grad_output_cuda)
  File "/home/ahmads/personal/pytorch/torch/_tensor.py", line 647, in backward
    torch.autograd.backward(
  File "/home/ahmads/personal/pytorch/torch/autograd/__init__.py", line 354, in backward
    _engine_run_backward(
  File "/home/ahmads/personal/pytorch/torch/autograd/graph.py", line 829, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: tensor does not have a device
Exception raised from device_default at /home/ahmads/personal/pytorch/c10/core/TensorImpl.h:1265 (most recent call first):
C++ CapturedTraceback:
#4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
#7 at::TensorBase::options() const from :0
#8 at::meta::resize_reduction(at::impl::MetaBase&, at::Tensor const&, c10::OptionalArrayRef<long>, bool, c10::ScalarType, bool) from :0
#9 at::meta::structured_sum_dim_IntList::meta(at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>) from ??:0
#10 at::(anonymous namespace)::wrapper_CompositeExplicitAutogradNonFunctional_sum_dim_IntList(at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>) from RegisterCompositeExplicitAutogradNonFunctional_0.cpp:0
#11 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>), &at::(anonymous namespace)::wrapper_CompositeExplicitAutogradNonFunctional_sum_dim_IntList>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType> > >, at::Tensor (at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>) from RegisterCompositeExplicitAutogradNonFunctional_0.cpp:0
#12 at::_ops::sum_dim_IntList::call(at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>) from ??:0
#13 void at::native::(anonymous namespace)::LaunchGammaBetaBackwardCUDAKernel<float, float>(float const*, float const*, float const*, float const*, long, long, at::Tensor*, at::Tensor*, CUstream_st*) from ??:0
#14 void at::native::(anonymous namespace)::LayerNormBackwardKernelImplInternal<float>(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long, long, at::Tensor*, at::Tensor*, at::Tensor*) from ??:0
#15 at::native::(anonymous namespace)::LayerNormBackwardKernelImpl(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long, long, at::Tensor*, at::Tensor*, at::Tensor*) from ??:0
#16 at::native::layer_norm_backward_cuda(at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::array<bool, 3ul>) from ??:0
#17 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA__native_layer_norm_backward(at::Tensor const&, at::Tensor const&, c10::ArrayRef<c10::SymInt>, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::array<bool, 3ul>) from RegisterCUDA_0.cpp:0

Now we only call sum(0) on tensors that are defined, and we properly guard both the sum(0) and the assignment.
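For illustration, here is a rough sketch of the guarding pattern described above. This is not the exact diff from this PR; the dgamma_blocks/dbeta_blocks names mirror the kernel code referenced in the stack trace, and the helper name and signature are assumptions:

```cpp
#include <ATen/ATen.h>

// Sketch only (assumed context): dgamma/dbeta are output Tensor pointers that
// may point at default-constructed tensors, and the *_blocks tensors hold
// per-block partial reductions computed by the backward kernel.
void finalize_gamma_beta_grads(
    const at::Tensor& dgamma_blocks,
    const at::Tensor& dbeta_blocks,
    at::Tensor* dgamma,
    at::Tensor* dbeta) {
  // Guard both the sum(0) and the assignment: calling sum() on an undefined
  // (default-constructed) tensor raises "tensor does not have a device".
  if (dgamma->defined()) {
    *dgamma = dgamma_blocks.sum(0);
  }
  if (dbeta->defined()) {
    *dbeta = dbeta_blocks.sum(0);
  }
}
```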

@pytorch-bot commented Jun 23, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156600

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 1 Pending, 1 Unrelated Failure

As of commit 65c15aa with merge base d061a02:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the release notes: cuda label Jun 23, 2025
@facebook-github-bot (Contributor)

@ahmadsharif1 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorch-bot added the ciflow/trunk label Jun 23, 2025
@ahmadsharif1 marked this pull request as ready for review June 23, 2025 13:36
@eqy (Collaborator) left a comment:

Is this reachable with a test case?

@ahmadsharif1 (Contributor, Author)

> Is this reachable with a test case?

I don't know the exact conditions under which these tensors are null, but this is failing inside meta() for some reason, and my hypothesis is that it is due to these tensors being null.

I am still testing the hypothesis.

@ahmadsharif1 marked this pull request as draft June 23, 2025 14:37
@ahmadsharif1 marked this pull request as ready for review June 23, 2025 15:20
@ahmadsharif1 (Contributor, Author)

@eqy I added a test and verified that it fails on the baseline. It needs bias=False, a large M, and a small N to trigger.

PTAL.
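For reference, a minimal sketch of the kind of reproduction described in the comment above; the shapes and eps are illustrative assumptions, not the exact values used in the test added by this PR:

```python
import torch

# Illustrative repro sketch: bias=False with a large M (rows) and a small N
# (normalized dimension) is the shape regime reported to trigger the bug.
M, N = 2 ** 16, 8  # assumed sizes, not the exact ones from the PR's test
ln = torch.nn.LayerNorm(N, bias=False, eps=1e-5).cuda()
x = torch.randn(M, N, device="cuda", requires_grad=True)

out = ln(x)
grad_output = torch.randn_like(out)
out.backward(grad_output)  # previously: RuntimeError: tensor does not have a device
```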

@facebook-github-bot (Contributor)

@ahmadsharif1 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ahmadsharif1 assigned ngimel and unassigned ngimel Jun 23, 2025
@ahmadsharif1 requested a review from ngimel June 23, 2025 15:29
@ngimel (Collaborator) left a comment:

In the only place where this function is called, dgamma and dbeta are defined.

@ahmadsharif1 changed the title from "Add null pointer checks to layernorm" to "Don't call sum() on a tensor that is not summable in layer_norm" Jun 23, 2025
@ahmadsharif1 (Contributor, Author)

> In the only place where this function is called, dgamma and dbeta are defined.

The if-condition guard was correct, because we don't assign dgamma_blocks unless dgamma->defined() is true.

But it was not very readable. Moreover, the PR description was misleading (gamma was actually not nullptr -- it was merely not defined()), so I updated that as well.

PTAL.

@ngimel (Collaborator) left a comment:

modulo 2 small comments

@facebook-github-bot (Contributor)

@ahmadsharif1 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ahmadsharif1 (Contributor, Author)

Can someone with push privileges merge this PR?

@eqy @ngimel

I could not find any real failures in the failing CI runs.

@ngimel (Collaborator) commented Jun 24, 2025

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Jul 2, 2025
After I landed this PR (#156600), this test was failing internally on large tensors because the differences were greater than the tolerances on some CUDA devices.

We now raise the tolerances for larger tensors.
Pull Request resolved: #156699
Approved by: https://github.com/eqy, https://github.com/ngimel
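As a rough illustration of the kind of size-dependent tolerance bump described in that follow-up (the threshold and tolerance values here are assumptions, not the ones landed in #156699):

```python
import torch
from torch.testing import assert_close

def assert_grads_close(actual: torch.Tensor, expected: torch.Tensor) -> None:
    # Illustrative only: larger tensors accumulate more floating-point error
    # in the reductions, so loosen rtol/atol past an assumed size threshold.
    if actual.numel() >= 2 ** 24:
        rtol, atol = 1e-3, 1e-3
    else:
        rtol, atol = 1.3e-6, 1e-5
    assert_close(actual, expected, rtol=rtol, atol=atol)
```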
dnikolaev-amd pushed a commit to ROCm/pytorch that referenced this pull request Aug 26, 2025
…#156699)

After I landed this PR (pytorch#156600), this test was failing internally on large tensors because the differences were greater than the tolerances on some CUDA devices.

We now raise the tolerances for larger tensors.
Pull Request resolved: pytorch#156699
Approved by: https://github.com/eqy, https://github.com/ngimel

(cherry picked from commit 36dd598)
pruthvistony pushed a commit to ROCm/pytorch that referenced this pull request Aug 27, 2025
…nsors (#2583)

After PR pytorch#156600, this test was failing internally on large tensors because the differences were greater than the tolerances on some CUDA devices.

We now raise the tolerances for larger tensors.
Pull Request resolved: pytorch#156699
Approved by: https://github.com/eqy, https://github.com/ngimel

(cherry picked from commit 36dd598)

Fixes SWDEV-547998

Co-authored-by: Ahmad Sharif <ahmads@fb.com>