[NestedTensor] chunk fails under DEBUG=1 builds #125503

Open
davidberard98 opened this issue May 3, 2024 · 4 comments
Labels
actionable module: nestedtensor NestedTensor tag see issue #25032 triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@davidberard98
Contributor

davidberard98 commented May 3, 2024

🐛 Describe the bug

test_chunk is failing when run with DEBUG=1 (including in internal tests).

This isn't blocking anything for me, but I wanted to document it.

$ python test/test_nestedtensor.py -k test_chunk_cuda
/home/dberard/local/pytorch/torch/backends/cudnn/__init__.py:106: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system.
  warnings.warn(
/home/dberard/local/pytorch/torch/backends/cudnn/__init__.py:106: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system.
  warnings.warn(
E
======================================================================
ERROR: test_chunk_cuda (__main__.TestNestedTensorSubclassCUDA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/dberard/local/pytorch/torch/testing/_internal/common_utils.py", line 2759, in wrapper
    method(*args, **kwargs)
  File "/home/dberard/local/pytorch/torch/testing/_internal/common_utils.py", line 2759, in wrapper
    method(*args, **kwargs)
  File "/home/dberard/local/pytorch/torch/testing/_internal/common_device_type.py", line 432, in instantiated_test
    raise rte
  File "/home/dberard/local/pytorch/torch/testing/_internal/common_device_type.py", line 419, in instantiated_test
    result = test(self, **param_kwargs)
  File "/home/dberard/local/pytorch/test/test_nestedtensor.py", line 3339, in test_chunk
    chunks = nt.chunk(NUM_CHUNKS, dim=-1)
  File "/home/dberard/local/pytorch/torch/nested/_internal/nested_tensor.py", line 233, in __torch_function__
    return func(*args, **kwargs)
RuntimeError: aliased_input.storage().is_alias_of(aliased_output.storage()) INTERNAL ASSERT FAILED at "/home/dberard/local/pytorch/torch/csrc/autograd/autograd_not_implemented_fallback.cpp":422, please report a bug to PyTorch. aten::chunk

To execute this test, run the following from the base repo dir:
     python test/test_nestedtensor.py -k test_chunk_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------------------------------------------------
Ran 1 test in 0.048s

FAILED (errors=1)

CPP stacktrace:

Exception raised from autogradNotImplementedFallbackImpl at /home/dberard/local/pytorch/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:422 (most recent call first):
C++ CapturedTraceback:
#4 std::enable_if<is_invocable_r_v<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, THPModule_initExtension(_object*, _object*)::{lambda()#1}&>, std
::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::type std::__invoke_r<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
, THPModule_initExtension(_object*, _object*)::{lambda()#1}&>(THPModule_initExtension(_object*, _object*)::{lambda()#1}&) from /usr/include/c++/11/bits/invoke.h:116
#5 std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), THPModule_initExtension(_object*, _object*)::{lambda()#1}>::_M_invoke(
std::_Any_data const&) from /usr/include/c++/11/bits/std_function.h:291
#6 std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>::operator()() const from /usr/include/c++/11/bits/std_function.h:590
#7 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from /home/dberard/local/pytorch/c10/util/Logging.cpp:87
#8 c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) from /home/dberard/lo
cal/pytorch/c10/util/Exception.cpp:84
#9 c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
 from /home/dberard/local/pytorch/c10/util/Exception.cpp:112
#10 torch::autograd::autogradNotImplementedFallbackImpl(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from /home/dberard
/local/pytorch/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:422
#11 void c10::BoxedKernel::make_boxed_function<&torch::autograd::autogradNotImplementedFallbackImpl>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vec
tor<c10::IValue, std::allocator<c10::IValue> >*) from /home/dberard/local/pytorch/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:450
#12 c10::BoxedKernel::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from /home/dberard/local/pytorch/ate
n/src/ATen/core/boxing/BoxedKernel_impl.h:41
#13 c10::KernelFunction::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from /home/dberard/local/pytorch/
aten/src/ATen/core/boxing/KernelFunction_impl.h:46
#14 c10::Dispatcher::redispatchBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from /home/dberard/local/pytorc
h/aten/src/ATen/core/dispatch/Dispatcher.h:785
#15 c10::OperatorHandle::redispatchBoxed(c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from /home/dberard/local/pytorch/aten/src/ATen/core/dis
patch/Dispatcher.h:473
#16 (anonymous namespace)::pythonTLSSnapshotFallback(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from /home/dberard/lo
cal/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:108
#17 void c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector
<c10::IValue, std::allocator<c10::IValue> >*) from /home/dberard/local/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:162
#18 c10::BoxedKernel::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from /home/dberard/local/pytorch/ate
n/src/ATen/core/boxing/BoxedKernel_impl.h:41
#19 c10::impl::BoxedKernelWrapper<std::vector<at::Tensor, std::allocator<at::Tensor> > (at::Tensor const&, long, long), void>::call(c10::BoxedKernel const&, c10::OperatorHandle con
st&, c10::DispatchKeySet, at::Tensor const&, long, long) from /home/dberard/local/pytorch/aten/src/ATen/core/boxing/impl/boxing.h:236
#20 std::vector<at::Tensor, std::allocator<at::Tensor> > c10::KernelFunction::call<std::vector<at::Tensor, std::allocator<at::Tensor> >, at::Tensor const&, long, long>(c10::Operato
rHandle const&, c10::DispatchKeySet, at::Tensor const&, long, long) const from /home/dberard/local/pytorch/aten/src/ATen/core/boxing/KernelFunction_impl.h:114
#21 at::Tensor::chunk(long, long) const from /home/dberard/local/pytorch/build/aten/src/ATen/core/TensorBody.h:2019
#22 torch::autograd::THPVariable_chunk(_object*, _object*, _object*)::{lambda(at::Tensor const&, long, long)#1}::operator()(at::Tensor const&, long, long) const from /home/dberard/
local/pytorch/torch/csrc/autograd/generated/python_variable_methods.cpp:5215
#23 torch::autograd::THPVariable_chunk(_object*, _object*, _object*) from /home/dberard/local/pytorch/torch/csrc/autograd/generated/python_variable_methods.cpp:5217
#24 method_vectorcall_VARARGS_KEYWORDS from /usr/local/src/conda/python-3.10.13/Objects/descrobject.c:344
#25 PyVectorcall_Call from /usr/local/src/conda/python-3.10.13/Objects/call.c:267
#26 do_call_core from /usr/local/src/conda/python-3.10.13/Python/ceval.c:5945
#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.13/Include/internal/pycore_ceval.h:46
#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.13/Include/cpython/abstract.h:114
#29 PyObject_CallFunctionObjArgs from /usr/local/src/conda/python-3.10.13/Objects/call.c:841
#30 torch::dispatch_on_subclass(_object*, _object*, c10::ArrayRef<_object*>, pybind11::tuple, _object*, bool, char const*, std::optional<c10::impl::TorchDispatchModeKey>) from /hom
e/dberard/local/pytorch/torch/csrc/utils/python_arg_parser.cpp:301
#31 torch::handle_torch_function_no_python_arg_parser(c10::ArrayRef<_object*>, _object*, _object*, char const*, _object*, char const*, torch::TorchFunctionName) from /home/dberard/
local/pytorch/torch/csrc/utils/python_arg_parser.cpp:495
...

Versions

Main branch as of May 3.

cc @cpuhrsch @jbschlosser @bhosmer @drisspg @soulitzer

@davidberard98 davidberard98 added the module: nestedtensor NestedTensor tag see issue #25032 label May 3, 2024
@davidberard98
Contributor Author

It seems like this assertion is checking that if an op claims to have some aliasing relationship between input and output, the aliasing relationship actually exists - which it doesn't here, because we just created new nested tensors.
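For a plain dense tensor the asserted relationship does hold: chunk is a view op, so every output shares the input's storage. A quick sketch of the invariant the DEBUG=1 check verifies:

```python
import torch

t = torch.randn(4, 9)
chunks = t.chunk(3, dim=-1)

# chunk returns views, so each output aliases the input's storage --
# the same invariant the assert checks via
# aliased_input.storage().is_alias_of(aliased_output.storage()).
for c in chunks:
    assert c.untyped_storage().data_ptr() == t.untyped_storage().data_ptr()
```

For NJT the outputs are freshly constructed nested tensors, so this storage relationship is absent and the assert trips.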

@jbschlosser
Contributor

Thanks for filing this! Another gap coming from the fact that we don't run CI tests with DEBUG=1.

@jbschlosser jbschlosser added triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module actionable labels May 6, 2024
@soulitzer
Contributor

This could be fixed by adding a derivative formula for nested tensor chunk, patching the autograd-not-implemented fallback kernel to skip this check for subclasses, or using return_and_correct_aliasing on nested tensor.

def return_and_correct_aliasing(func, args, kwargs, out):
    """
    This function should be used by wrapper tensor ``__torch_dispatch__`` subclasses
    that would like to work with torch.compile. It ensures that the subclass
    properly implements the aliasing behavior of every op,
    which is needed for correctness in AOTAutograd.
    This function will handle:
        * When we see a view op, we will alias the storages of any
          input and output tensor subclasses
        * When we see an inplace or out= op, we will directly
          return the corresponding input tensor, instead of returning
          a (potentially) fresh output tensor.
    """

@jbschlosser
Contributor

In theory, as NJT is a traceable wrapper subclass, we're supposed to be using return_and_correct_aliasing(). I've had some problems with it in the past that were causing bad behavior / failed tests (e.g. #117860), so I punted on this then.

jbschlosser added a commit that referenced this issue May 17, 2024
…rage setting"


Fixes #125503

Context: `return_and_correct_aliasing()` is required for traceable wrapper subclasses so that aliasing relationships are correct. NJT has not been using this, but needs to for correct aliasing relationships, and to avoid tripping asserts when DEBUG=1 (e.g. #125503).

This PR:
* Uses `return_and_correct_aliasing()` in NJT
* Changes how storage setting is done in `return_and_correct_aliasing()`
    * Old way: use `set_.source_Storage_storage_offset()`, which has extra logic for storage resizing that we don't need
    * New way: `torch._C._set_storage()` that shoves in a storage without this extra logic. Notably, this avoids `computeStorageNbytes()` choking on nested ints in NJT's sizes / strides

[ghstack-poisoned]
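The "old way" named above is the aten::set_ overload that makes one tensor view another's storage at an explicit offset/size/stride. A minimal dense-tensor sketch of what that call does (the NJT problem is that its size/stride bookkeeping, computeStorageNbytes, cannot handle nested ints):

```python
import torch

a = torch.randn(3, 4)
b = torch.empty(0)

# set_.source_Storage_storage_offset: make `b` view `a`'s storage with
# an explicit offset, size, and stride. This overload runs size/stride
# math (computeStorageNbytes) under the hood, which is what chokes on
# the nested ints in NJT's sizes/strides.
b.set_(a.untyped_storage(), 0, a.size(), a.stride())
```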
jbschlosser added a commit that referenced this issue May 21, 2024
…+ compatible storage setting"


Fixes #125503

Context: `return_and_correct_aliasing()` is required for traceable wrapper subclasses so that aliasing relationships are correct. NJT has not been using this, but needs to for correct aliasing relationships, and to avoid tripping asserts when DEBUG=1 (e.g. #125503).

This PR:
* Uses `return_and_correct_aliasing()` in NJT
* Changes how storage setting is done in `return_and_correct_aliasing()`
    * Old way: use `set_.source_Storage_storage_offset()`, which has extra logic for storage resizing that we don't need
    * New way: `torch.ops.aten._unsafe_set_storage_()` that shoves in a storage without this extra logic. Notably, this avoids `computeStorageNbytes()` choking on nested ints in NJT's sizes / strides

[ghstack-poisoned]
jbschlosser added a commit that referenced this issue May 21, 2024
…rage setting"


Fixes #125503

Context: `return_and_correct_aliasing()` is required for traceable wrapper subclasses so that aliasing relationships are correct. NJT has not been using this, but needs to for correct aliasing relationships, and to avoid tripping asserts when DEBUG=1 (e.g. #125503).

This PR:
* Uses `return_and_correct_aliasing()` in NJT
* Changes how storage setting is done in `return_and_correct_aliasing()`
    * Old way: use `set_.source_Storage_storage_offset()`, which has extra logic for storage resizing that we don't need
    * New way: `torch.ops.aten._unsafe_set_storage_()` that shoves in a storage without this extra logic. Notably, this avoids `computeStorageNbytes()` choking on nested ints in NJT's sizes / strides

[ghstack-poisoned]
jbschlosser added a commit that referenced this issue May 21, 2024
…+ compatible storage setting"


Fixes #125503

Context: `return_and_correct_aliasing()` is required for traceable wrapper subclasses so that aliasing relationships are correct. NJT has not been using this, but needs to for correct aliasing relationships, and to avoid tripping asserts when DEBUG=1 (e.g. #125503).

This PR:
* Uses `return_and_correct_aliasing()` in NJT
* Changes how storage setting is done in `return_and_correct_aliasing()`
    * Old way: use `set_.source_Storage_storage_offset()`, which has extra logic for storage resizing that we don't need
    * New way: `torch.ops.aten._unsafe_set_storage_()` that shoves in a storage without this extra logic. Notably, this avoids `computeStorageNbytes()` choking on nested ints in NJT's sizes / strides

[ghstack-poisoned]
jbschlosser added a commit that referenced this issue May 21, 2024
…rage setting"


Fixes #125503

Context: `return_and_correct_aliasing()` is required for traceable wrapper subclasses so that aliasing relationships are correct. NJT has not been using this, but needs to for correct aliasing relationships, and to avoid tripping asserts when DEBUG=1 (e.g. #125503).

This PR:
* Uses `return_and_correct_aliasing()` in NJT
* Changes how storage setting is done in `return_and_correct_aliasing()`
    * Old way: use `set_.source_Storage_storage_offset()`, which has extra logic for storage resizing that we don't need
    * New way: `torch.ops.aten._unsafe_set_storage_()` that shoves in a storage without this extra logic. Notably, this avoids `computeStorageNbytes()` choking on nested ints in NJT's sizes / strides

[ghstack-poisoned]
jbschlosser added a commit that referenced this issue May 21, 2024
…+ compatible storage setting"


Fixes #125503

Context: `return_and_correct_aliasing()` is required for traceable wrapper subclasses so that aliasing relationships are correct. NJT has not been using this, but needs to for correct aliasing relationships, and to avoid tripping asserts when DEBUG=1 (e.g. #125503).

This PR:
* Uses `return_and_correct_aliasing()` in NJT
* Changes how storage setting is done in `return_and_correct_aliasing()`
    * Old way: use `set_.source_Storage_storage_offset()`, which has extra logic for storage resizing that we don't need
    * New way: `torch.ops.aten._unsafe_set_storage_()` that shoves in a storage without this extra logic. Notably, this avoids `computeStorageNbytes()` choking on nested ints in NJT's sizes / strides

[ghstack-poisoned]
jbschlosser added a commit that referenced this issue May 21, 2024
…rage setting"


Fixes #125503

Context: `return_and_correct_aliasing()` is required for traceable wrapper subclasses so that aliasing relationships are correct. NJT has not been using this, but needs to for correct aliasing relationships, and to avoid tripping asserts when DEBUG=1 (e.g. #125503).

This PR:
* Uses `return_and_correct_aliasing()` in NJT
* Changes how storage setting is done in `return_and_correct_aliasing()`
    * Old way: use `set_.source_Storage_storage_offset()`, which has extra logic for storage resizing that we don't need
    * New way: `torch.ops.aten._unsafe_set_storage_()` that shoves in a storage without this extra logic. Notably, this avoids `computeStorageNbytes()` choking on nested ints in NJT's sizes / strides

[ghstack-poisoned]
jbschlosser added a commit that referenced this issue May 23, 2024
…+ compatible storage setting"


Fixes #125503

Context: `return_and_correct_aliasing()` is required for traceable wrapper subclasses so that aliasing relationships are correct. NJT has not been using this, but needs to for correct aliasing relationships, and to avoid tripping asserts when DEBUG=1 (e.g. #125503).

This PR:
* Uses `return_and_correct_aliasing()` in NJT
* Changes how storage setting is done in `return_and_correct_aliasing()`
    * Old way: use `set_.source_Storage_storage_offset()`, which has extra logic for storage resizing that we don't need
    * New way: `torch.ops.aten._unsafe_set_storage_()` that shoves in a storage without this extra logic. Notably, this avoids `computeStorageNbytes()` choking on nested ints in NJT's sizes / strides

[ghstack-poisoned]
jbschlosser added a commit that referenced this issue May 23, 2024
…rage setting"


Fixes #125503

Context: `return_and_correct_aliasing()` is required for traceable wrapper subclasses so that aliasing relationships are correct. NJT has not been using this, but needs to for correct aliasing relationships, and to avoid tripping asserts when DEBUG=1 (e.g. #125503).

This PR:
* Uses `return_and_correct_aliasing()` in NJT
* Changes how storage setting is done in `return_and_correct_aliasing()`
    * Old way: use `set_.source_Storage_storage_offset()`, which has extra logic for storage resizing that we don't need
    * New way: `torch.ops.aten._unsafe_set_storage_()` that shoves in a storage without this extra logic. Notably, this avoids `computeStorageNbytes()` choking on nested ints in NJT's sizes / strides

[ghstack-poisoned]
jbschlosser added a commit that referenced this issue May 23, 2024
…+ compatible storage setting"


Fixes #125503

Context: `return_and_correct_aliasing()` is required for traceable wrapper subclasses so that aliasing relationships are correct. NJT has not been using this, but needs to for correct aliasing relationships, and to avoid tripping asserts when DEBUG=1 (e.g. #125503).

This PR:
* Uses `return_and_correct_aliasing()` in NJT
* Changes how storage setting is done in `return_and_correct_aliasing()`
    * Old way: use `set_.source_Storage_storage_offset()`, which has extra logic for storage resizing that we don't need
    * New way: `torch.ops.aten._unsafe_set_storage_()` that shoves in a storage without this extra logic. Notably, this avoids `computeStorageNbytes()` choking on nested ints in NJT's sizes / strides

[ghstack-poisoned]
jbschlosser added a commit that referenced this issue May 23, 2024
…rage setting"


Fixes #125503

Context: `return_and_correct_aliasing()` is required for traceable wrapper subclasses so that aliasing relationships are correct. NJT has not been using this, but needs to for correct aliasing relationships, and to avoid tripping asserts when DEBUG=1 (e.g. #125503).

This PR:
* Uses `return_and_correct_aliasing()` in NJT
* Changes how storage setting is done in `return_and_correct_aliasing()`
    * Old way: use `set_.source_Storage_storage_offset()`, which has extra logic for storage resizing that we don't need
    * New way: `torch.ops.aten._unsafe_set_storage_()` that shoves in a storage without this extra logic. Notably, this avoids `computeStorageNbytes()` choking on nested ints in NJT's sizes / strides

[ghstack-poisoned]
jbschlosser added a commit that referenced this issue May 24, 2024
…+ compatible storage setting"


Fixes #125503

Context: `return_and_correct_aliasing()` is required for traceable wrapper subclasses so that aliasing relationships are correct. NJT has not been using this, but needs to for correct aliasing relationships, and to avoid tripping asserts when DEBUG=1 (e.g. #125503).

This PR:
* Uses `return_and_correct_aliasing()` in NJT
* Changes how storage setting is done in `return_and_correct_aliasing()`
    * Old way: use `set_.source_Storage_storage_offset()`, which has extra logic for storage resizing that we don't need
    * New way: `torch.ops.aten._unsafe_set_storage_()` that shoves in a storage without this extra logic. Notably, this avoids `computeStorageNbytes()` choking on nested ints in NJT's sizes / strides

[ghstack-poisoned]
jbschlosser added a commit that referenced this issue May 24, 2024
…rage setting"


Fixes #125503

Context: `return_and_correct_aliasing()` is required for traceable wrapper subclasses so that aliasing relationships are correct. NJT has not been using this, but needs to for correct aliasing relationships, and to avoid tripping asserts when DEBUG=1 (e.g. #125503).

This PR:
* Uses `return_and_correct_aliasing()` in NJT
* Changes how storage setting is done in `return_and_correct_aliasing()`
    * Old way: use `set_.source_Storage_storage_offset()`, which has extra logic for storage resizing that we don't need
    * New way: `torch.ops.aten._unsafe_set_storage_()` that shoves in a storage without this extra logic. Notably, this avoids `computeStorageNbytes()` choking on nested ints in NJT's sizes / strides

[ghstack-poisoned]
jbschlosser added a commit that referenced this issue May 24, 2024
…+ compatible storage setting"


Fixes #125503

Context: `return_and_correct_aliasing()` is required for traceable wrapper subclasses so that aliasing relationships are correct. NJT has not been using this, but needs to for correct aliasing relationships, and to avoid tripping asserts when DEBUG=1 (e.g. #125503).

This PR:
* Uses `return_and_correct_aliasing()` in NJT
* Changes how storage setting is done in `return_and_correct_aliasing()`
    * Old way: use `set_.source_Storage_storage_offset()`, which has extra logic for storage resizing that we don't need
    * New way: `torch.ops.aten._unsafe_set_storage_()` that shoves in a storage without this extra logic. Notably, this avoids `computeStorageNbytes()` choking on nested ints in NJT's sizes / strides

[ghstack-poisoned]
jbschlosser added a commit that referenced this issue May 24, 2024
…rage setting"


Fixes #125503

Context: `return_and_correct_aliasing()` is required for traceable wrapper subclasses so that aliasing relationships are correct. NJT has not been using this, but needs to for correct aliasing relationships, and to avoid tripping asserts when DEBUG=1 (e.g. #125503).

This PR:
* Uses `return_and_correct_aliasing()` in NJT
* Changes how storage setting is done in `return_and_correct_aliasing()`
    * Old way: use `set_.source_Storage_storage_offset()`, which has extra logic for storage resizing that we don't need
    * New way: `torch.ops.aten._unsafe_set_storage_()` that shoves in a storage without this extra logic. Notably, this avoids `computeStorageNbytes()` choking on nested ints in NJT's sizes / strides

[ghstack-poisoned]
jbschlosser added a commit that referenced this issue Jun 7, 2024
…grad_not_implemented_fallback"


Fixes #125503

[ghstack-poisoned]