
Conversation

@ahmadsharif1 (Contributor) commented Jun 23, 2025

Don't call sum() on a tensor that is default constructed.

Previously we could call sum() on a tensor that was default-constructed. That would lead to an error like this:

Traceback (most recent call last):
  File "/home/ahmads/.conda/envs/pt3/lib/python3.12/unittest/case.py", line 58, in testPartExecutor
    yield
  File "/home/ahmads/.conda/envs/pt3/lib/python3.12/unittest/case.py", line 634, in run
    self._callTestMethod(testMethod)
  File "/home/ahmads/.conda/envs/pt3/lib/python3.12/unittest/case.py", line 589, in _callTestMethod
    if method() is not None:
       ^^^^^^^^
  File "/home/ahmads/personal/pytorch/torch/testing/_internal/common_utils.py", line 3191, in wrapper
    method(*args, **kwargs)
  File "/home/ahmads/personal/pytorch/test/test_nn.py", line 7235, in test_layer_norm_backwards_eps
    ln_out_cuda.backward(grad_output_cuda)
  File "/home/ahmads/personal/pytorch/torch/_tensor.py", line 647, in backward
    torch.autograd.backward(
  File "/home/ahmads/personal/pytorch/torch/autograd/__init__.py", line 354, in backward
    _engine_run_backward(
  File "/home/ahmads/personal/pytorch/torch/autograd/graph.py", line 829, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: tensor does not have a device
Exception raised from device_default at /home/ahmads/personal/pytorch/c10/core/TensorImpl.h:1265 (most recent call first):
C++ CapturedTraceback:
#4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
#7 at::TensorBase::options() const from :0
#8 at::meta::resize_reduction(at::impl::MetaBase&, at::Tensor const&, c10::OptionalArrayRef<long>, bool, c10::ScalarType, bool) from :0
#9 at::meta::structured_sum_dim_IntList::meta(at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>) from ??:0
#10 at::(anonymous namespace)::wrapper_CompositeExplicitAutogradNonFunctional_sum_dim_IntList(at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>) from RegisterCompositeExplicitAutogradNonFunctional_0.cpp:0
#11 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>), &at::(anonymous namespace)::wrapper_CompositeExplicitAutogradNonFunctional_sum_dim_IntList>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType> > >, at::Tensor (at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>) from RegisterCompositeExplicitAutogradNonFunctional_0.cpp:0
#12 at::_ops::sum_dim_IntList::call(at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>) from ??:0
#13 void at::native::(anonymous namespace)::LaunchGammaBetaBackwardCUDAKernel<float, float>(float const*, float const*, float const*, float const*, long, long, at::Tensor*, at::Tensor*, CUstream_st*) from ??:0
#14 void at::native::(anonymous namespace)::LayerNormBackwardKernelImplInternal<float>(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long, long, at::Tensor*, at::Tensor*, at::Tensor*) from ??:0
#15 at::native::(anonymous namespace)::LayerNormBackwardKernelImpl(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long, long, at::Tensor*, at::Tensor*, at::Tensor*) from ??:0
#16 at::native::layer_norm_backward_cuda(at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::array<bool, 3ul>) from ??:0
#17 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA__native_layer_norm_backward(at::Tensor const&, at::Tensor const&, c10::ArrayRef<c10::SymInt>, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::array<bool, 3ul>) from RegisterCUDA_0.cpp:0

Now we only call sum(0) on tensors that are defined, and we properly guard both the sum(0) and the assignment.
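For illustration, here is a rough sketch of the guarding pattern described above. This is not the exact diff from this PR; the dgamma_blocks/dbeta_blocks names mirror the kernel code referenced in the stack trace, and the helper name and signature are assumptions:

```cpp
#include <ATen/ATen.h>

// Sketch only (assumed context): dgamma/dbeta are output Tensor pointers that
// may point at default-constructed tensors, and the *_blocks tensors hold
// per-block partial reductions computed by the backward kernel.
void finalize_gamma_beta_grads(
    const at::Tensor& dgamma_blocks,
    const at::Tensor& dbeta_blocks,
    at::Tensor* dgamma,
    at::Tensor* dbeta) {
  // Guard both the sum(0) and the assignment: calling sum() on an undefined
  // (default-constructed) tensor raises "tensor does not have a device".
  if (dgamma->defined()) {
    *dgamma = dgamma_blocks.sum(0);
  }
  if (dbeta->defined()) {
    *dbeta = dbeta_blocks.sum(0);
  }
}
```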

@pytorch-bot commented Jun 23, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156600

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 1 Pending, 1 Unrelated Failure

As of commit 65c15aa with merge base d061a02:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the release notes: cuda label Jun 23, 2025
@facebook-github-bot (Contributor)

@ahmadsharif1 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorch-bot added the ciflow/trunk label Jun 23, 2025
@ahmadsharif1 marked this pull request as ready for review June 23, 2025 13:36
@eqy (Collaborator) left a comment:

Is this reachable with a test case?

@ahmadsharif1 (Contributor, Author)

> Is this reachable with a test case?

I don't know the exact conditions under which these tensors are null, but this is failing inside meta() for some reason, and my hypothesis is that it is due to these tensors being null.

I am still testing the hypothesis.

@ahmadsharif1 marked this pull request as draft June 23, 2025 14:37
@ahmadsharif1 marked this pull request as ready for review June 23, 2025 15:20
@ahmadsharif1 (Contributor, Author)

@eqy I added a test and verified that it fails on the baseline. It needs bias=False, a large M, and a small N to trigger.

PTAL.
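For reference, a minimal sketch of the kind of reproduction described in the comment above; the shapes and eps are illustrative assumptions, not the exact values used in the test added by this PR:

```python
import torch

# Illustrative repro sketch: bias=False with a large M (rows) and a small N
# (normalized dimension) is the shape regime reported to trigger the bug.
M, N = 2 ** 16, 8  # assumed sizes, not the exact ones from the PR's test
ln = torch.nn.LayerNorm(N, bias=False, eps=1e-5).cuda()
x = torch.randn(M, N, device="cuda", requires_grad=True)

out = ln(x)
grad_output = torch.randn_like(out)
out.backward(grad_output)  # previously: RuntimeError: tensor does not have a device
```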

@facebook-github-bot (Contributor)

@ahmadsharif1 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ahmadsharif1 assigned ngimel and unassigned ngimel Jun 23, 2025
@ahmadsharif1 requested a review from ngimel June 23, 2025 15:29
@ngimel (Collaborator) left a comment:

In the only place where this function is called, dgamma and dbeta are defined.

@ahmadsharif1 changed the title from "Add null pointer checks to layernorm" to "Don't call sum() on a tensor that is not summable in layer_norm" Jun 23, 2025
@ahmadsharif1 (Contributor, Author)

> In the only place where this function is called, dgamma and dbeta are defined.

The if-condition guard was correct, because we don't assign dgamma_blocks unless dgamma->defined() is true.

But it was not very readable. Moreover, the PR description was misleading (gamma was actually not nullptr -- it was merely not defined()), so I updated that as well.

PTAL.

@ngimel (Collaborator) left a comment:

modulo 2 small comments

@facebook-github-bot (Contributor)

@ahmadsharif1 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ahmadsharif1 (Contributor, Author)

Can someone with push privileges merge this PR?

@eqy @ngimel

I could not find any real failures in the failing CI runs.

@ngimel (Collaborator) commented Jun 24, 2025

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Jul 2, 2025
After I landed this PR (#156600), this test was failing internally on large tensors because the differences were greater than the tolerances on some CUDA devices.

We now raise the tolerances for larger tensors.
Pull Request resolved: #156699
Approved by: https://github.com/eqy, https://github.com/ngimel
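As a rough illustration of the kind of size-dependent tolerance bump described in that follow-up (the threshold and tolerance values here are assumptions, not the ones landed in #156699):

```python
import torch
from torch.testing import assert_close

def assert_grads_close(actual: torch.Tensor, expected: torch.Tensor) -> None:
    # Illustrative only: larger tensors accumulate more floating-point error
    # in the reductions, so loosen rtol/atol past an assumed size threshold.
    if actual.numel() >= 2 ** 24:
        rtol, atol = 1e-3, 1e-3
    else:
        rtol, atol = 1.3e-6, 1e-5
    assert_close(actual, expected, rtol=rtol, atol=atol)
```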
dnikolaev-amd pushed a commit to ROCm/pytorch that referenced this pull request Aug 26, 2025
…#156699)

After I landed this PR (pytorch#156600), this test was failing internally on large tensors because the differences were greater than the tolerances on some CUDA devices.

We now raise the tolerances for larger tensors.
Pull Request resolved: pytorch#156699
Approved by: https://github.com/eqy, https://github.com/ngimel

(cherry picked from commit 36dd598)
pruthvistony pushed a commit to ROCm/pytorch that referenced this pull request Aug 27, 2025
…nsors (#2583)

After PR pytorch#156600, this test was failing internally on large tensors because the differences were greater than the tolerances on some CUDA devices.

We now raise the tolerances for larger tensors.
Pull Request resolved: pytorch#156699
Approved by: https://github.com/eqy, https://github.com/ngimel

(cherry picked from commit 36dd598)

Fixes SWDEV-547998

Co-authored-by: Ahmad Sharif <ahmads@fb.com>