
Conversation


pytorch-bot bot commented Sep 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163973

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9fc7b16 with merge base 84d673e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

    return #name;

  switch (t) {
    AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_AND_QINTS(DEFINE_CASE)
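For context, a minimal self-contained sketch of how such a FORALL-style X-macro drives the switch above; the type list and `toString` here are illustrative, not the exact PR code:

```cpp
#include <cstdint>

// Illustrative subset of an X-macro type list; the real
// AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_AND_QINTS covers many more types.
#define FORALL_EXAMPLE_SCALAR_TYPES(_) \
  _(float, Float)                      \
  _(double, Double)                    \
  _(int64_t, Long)

enum class ScalarType { Float, Double, Long };

// Each invocation expands to one case label; #name stringizes the enum name.
#define DEFINE_CASE(ctype, name) \
  case ScalarType::name:         \
    return #name;

const char* toString(ScalarType t) {
  switch (t) {
    FORALL_EXAMPLE_SCALAR_TYPES(DEFINE_CASE)
    default:
      return "Unknown";
  }
}
#undef DEFINE_CASE
```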
Contributor

Where is AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_AND_QINTS defined? Isn't it outside the stable API?

Collaborator Author

This macro is defined in torch/headeronly/core/ScalarType.h, which is included by a number of torch/csrc/stable/ files. Does this inclusion count as being in stable?

Collaborator Author

I think we cannot use AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_AND_QINTS, as it contains dtypes that are unsupported in stable. For example, calling libtorch_agnostic.ops.my_empty_like on a qint8 tensor fails with:

[E927 21:27:46.276315525 shim_common.cpp:1666] Exception in aoti_torch: false INTERNAL ASSERT FAILED at "/home/pearu/git/pytorch/pytorch-linear/aten/src/ATen/quantized/Quantizer.cpp":441, please report a bug to PyTorch. cannot call qscheme on UnknownQuantizer
Exception raised from qscheme at /home/pearu/git/pytorch/pytorch-linear/aten/src/ATen/quantized/Quantizer.cpp:441 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x91 (0x7f9da216bfe1 in /home/pearu/git/pytorch/pytorch-linear/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x7f (0x7f9da20ed082 in /home/pearu/git/pytorch/pytorch-linear/torch/lib/libc10.so)
frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*) + 0x63 (0x7f9da2168353 in /home/pearu/git/pytorch/pytorch-linear/torch/lib/libc10.so)
frame #3: <unknown function> + 0x318ceeb (0x7f9db0821eeb in /home/pearu/git/pytorch/pytorch-linear/torch/lib/libtorch_cpu.so)
frame #4: at::native::qscheme_quant(at::Tensor const&) + 0x37 (0x7f9daf5238e7 in /home/pearu/git/pytorch/pytorch-linear/torch/lib/libtorch_cpu.so)
frame #5: at::_ops::qscheme::call(at::Tensor const&) + 0xbb (0x7f9daf7d944b in /home/pearu/git/pytorch/pytorch-linear/torch/lib/libtorch_cpu.so)
frame #6: at::native::empty_like_quantized(at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, std::optional<c10::MemoryFormat>) + 0x29a (0x7f9daf2d652a in /home/pearu/git/pytorch/pytorch-linear/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x30f0140 (0x7f9db0785140 in /home/pearu/git/pytorch/pytorch-linear/torch/lib/libtorch_cpu.so)
frame #8: at::_ops::empty_like::redispatch(c10::DispatchKeySet, at::Tensor const&, std::optional<c10::ScalarType>, std::optional<c10::Layout>, std::optional<c10::Device>, std::optional<bool>, std::optional<c10::MemoryFormat>) + 0xf7 (0x7f9dafbfedd7 in /home/pearu/git/pytorch/pytorch-linear/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x4ce144d (0x7f9db237644d in /home/pearu/git/pytorch/pytorch-linear/torch/lib/libtorch_cpu.so)
frame #10: aoti_torch_call_dispatcher + 0x34c (0x7f9db2d210ac in /home/pearu/git/pytorch/pytorch-linear/torch/lib/libtorch_cpu.so)
frame #11: my_empty_like(torch::stable::Tensor) + 0x82 (0x7f9cec3a6582 in /home/pearu/git/pytorch/pytorch-linear/test/cpp_extensions/libtorch_agnostic_extension/install/home/pearu/miniconda3/envs/pytorch-cuda-dev/lib/python3.13/site-packages/libtorch_agnostic/_C.so)
<snip>

Collaborator Author

#161891 now defines STABLE_FORALL_SUPPORTED_SCALAR_TYPES, which is used here instead of AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_AND_QINTS.

Contributor

@janeyx99 janeyx99 left a comment

More high-level question: I believe we had previously decided that torchaudio doesn't actually need these headers. Has that changed? Are these needed now?

@pearu
Collaborator Author

pearu commented Sep 26, 2025

More high-level question: I believe we had previously decided that torchaudio doesn't actually need these headers. Has that changed? Are these needed now?

torchaudio code currently has 8 places where AT_DISPATCH_... macros are used. With these macros defined, porting the corresponding usages to stable adds zero lines of code: we just replace AT_DISPATCH_XYZ with STABLE_DISPATCH_XYZ.

The alternative is to explicitly expand each AT_DISPATCH_... macro into a switch statement, which roughly doubles the lines each usage occupies (for AT_DISPATCH_FLOATING_TYPES) or triples them (for AT_DISPATCH_FLOATING_TYPES_AND_HALF); see the sketch below. Given the increased line count and the unnecessarily verbose resulting code, I think defining the STABLE_DISPATCH_... macros makes sense.

That said, we could define a subset of these macros within torchaudio itself, but that has its own disadvantages; defining them here means other similar porting efforts can reuse these macros as well.
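To make the LOC comparison concrete, here is a minimal sketch, assuming STABLE_DISPATCH_FLOATING_TYPES mirrors the familiar AT_DISPATCH_FLOATING_TYPES(TYPE, NAME, lambda) pattern; lowpass and lowpass_impl are hypothetical names, not torchaudio code:

```cpp
// With the macro: one call site; scalar_t is bound to float or double
// in the lambda body, per dispatched dtype. (Hypothetical kernel.)
void lowpass(const torch::stable::Tensor& in, torch::stable::Tensor& out) {
  STABLE_DISPATCH_FLOATING_TYPES(in.scalar_type(), "lowpass", [&] {
    lowpass_impl<scalar_t>(in, out);
  });
}

// Without the macro: the same dispatch hand-expanded into a switch,
// roughly doubling the lines occupied at every call site.
void lowpass_expanded(const torch::stable::Tensor& in,
                      torch::stable::Tensor& out) {
  switch (in.scalar_type()) {
    case torch::headeronly::ScalarType::Float:
      lowpass_impl<float>(in, out);
      break;
    case torch::headeronly::ScalarType::Double:
      lowpass_impl<double>(in, out);
      break;
    default:
      break;  // error handling elided
  }
}
```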

@janeyx99
Contributor

@pearu do you know why torchaudio even needs to use the macro though?

@pearu
Collaborator Author

pearu commented Sep 26, 2025

@pearu do you know why torchaudio even needs to use the macro though?

In general, these macros are used in compute kernels to dispatch on an input tensor's dtype, where that dtype may be any member of a set of supported dtypes.

The torchaudio test suite typically uses float32 and float64 tensors, plus float16 in CUDA-related routines. Hence, for torchaudio, the set of supported dtypes is {float16, float32, float64}; see the sketch below.
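For illustration (a hypothetical torchaudio-flavored kernel, not actual torchaudio code), today's non-stable AT_DISPATCH_FLOATING_TYPES_AND_HALF covers exactly that {float16, float32, float64} set:

```cpp
#include <ATen/ATen.h>
#include <ATen/Dispatch.h>

// Multiply a tensor by a scalar, dispatching over Half/Float/Double.
at::Tensor scale(const at::Tensor& input, double factor) {
  auto in = input.contiguous();
  auto out = at::empty_like(in);
  AT_DISPATCH_FLOATING_TYPES_AND_HALF(in.scalar_type(), "scale", [&] {
    const scalar_t* src = in.data_ptr<scalar_t>();
    scalar_t* dst = out.data_ptr<scalar_t>();
    const auto f = static_cast<scalar_t>(factor);
    for (int64_t i = 0; i < in.numel(); ++i) {
      dst[i] = src[i] * f;  // scalar_t is at::Half, float, or double here
    }
  });
  return out;
}
```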

@pearu pearu requested a review from janeyx99 September 29, 2025 11:13