When doing typed typecheck, also check signature with symint removed #109727
See the test case for what we didn't catch (a SymInt vs. const SymInt& mismatch). It's necessary to test for both, because we will fall back to the non-SymInt signature if there is no SymInt unboxed kernel available.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
"_test::symint_op", "");

expectThrows<c10::Error>([&] {
  opHandle.typed<Tensor(const Tensor&, Tensor, const c10::SymInt&)>();
Are you saying that const c10::SymInt& previously fell back to the int64_t variant?
you know what, there are more bugs. working on it some more...
OK so there was a second bug that helps explain what was going on.
core/boxing/KernelFunction.h b/aten/src/ATen/core/boxing/KernelFunction.h
index 4880a396bb2..d8d0a3d1514 100644
--- a/aten/src/ATen/core/boxing/KernelFunction.h
+++ b/aten/src/ATen/core/boxing/KernelFunction.h
@@ -18,10 +18,10 @@ class KernelFunction;
template <typename T>
using has_symint =
guts::disjunction<
- std::is_same<c10::SymInt, std::decay_t<T>>,
- std::is_same<c10::SymIntArrayRef, std::decay_t<T>>,
- std::is_same<at::OptionalSymIntArrayRef, std::decay_t<T>>,
- std::is_same<c10::optional<c10::SymInt>, std::decay_t<T>>
+ std::is_same<c10::SymInt, T>,
+ std::is_same<c10::SymIntArrayRef, T>,
+ std::is_same<at::OptionalSymIntArrayRef, T>,
+ std::is_same<c10::optional<c10::SymInt>, T>
>;
template <typename T>
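To make the effect of dropping the decay call concrete, here is a minimal sketch. It assumes a stand-in SymInt struct (not the real c10::SymInt) and checks only one of the four types in the disjunction; the names has_symint_decayed and has_symint_exact are hypothetical and exist only for this illustration.

```cpp
#include <type_traits>

// Stand-in for c10::SymInt (assumption: the real class is not needed here).
struct SymInt { long long value; };

// Old check: std::decay_t strips const and the reference before comparing,
// so by-value and by-const-reference parameters look identical.
template <typename T>
using has_symint_decayed = std::is_same<SymInt, std::decay_t<T>>;

// New check: only the exact by-value type counts as "has SymInt".
template <typename T>
using has_symint_exact = std::is_same<SymInt, T>;

static_assert(has_symint_decayed<SymInt>::value, "by value matches");
static_assert(has_symint_decayed<const SymInt&>::value, "decay hides the reference");
static_assert(has_symint_exact<SymInt>::value, "by value still matches");
static_assert(!has_symint_exact<const SymInt&>::value, "reference no longer matches");
```

With the exact comparison, a const SymInt& argument is no longer classified as a SymInt argument at all, which is what changes the codepath taken in KernelFunction::call.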
Previously, we treated both SymInt and const SymInt& as having SymInt, due to the decay call. This meant that we would hit this codepath:
template<class Return, class... Args>
C10_ALWAYS_INLINE Return KernelFunction::call(const OperatorHandle& opHandle, DispatchKeySet dispatchKeySet, Args... args) const {
// note: Args above is intentionally not Args&&. We don't want perfect
// forwarding, which would require Args to be deduced, but instead we
// want callers to explicitly specify the Args.
// This should get inlined by compiler
if (guts::disjunction<has_symint<Args>...>::value) {
if (sym_unboxed_kernel_func_ != nullptr) {
auto *functor = boxed_kernel_func_.getFunctor();
return callUnboxedKernelFunction<Return, Args...>(
sym_unboxed_kernel_func_, functor, dispatchKeySet, std::forward<Args>(args)...);
}
if (unboxed_kernel_func_ != nullptr) {
auto *functor = boxed_kernel_func_.getFunctor();
return callUnboxedKernelFunction<Return, typename remove_symint<Args>::type...>(
unboxed_kernel_func_, functor, dispatchKeySet, unpackSymInt<Args>(args)...);
}
However, remove_symint doesn't actually do anything if you have a const SymInt& arg, because it's doing an exact match:
template <>
struct remove_symint<c10::SymInt> {
using type = int64_t;
};
So the net effect is, if you had only an int kernel registered, but you try to call it at const SymInt&, we would end up doing a callUnboxedKernelFunction with const SymInt& (sic!) still in the signature. And at this point there is no more type checking, so you end up unsafely coercing const SymInt& into int64_t.
For some reason, this still ends up giving you the right result in my OSS build. But on fbcode you properly get corrupted memory.
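The exact-match behavior of remove_symint described above can be sketched in isolation. This assumes a stand-in SymInt struct and an identity primary template (the primary template is not quoted in this thread, so treat it as an assumption); only the specialization mirrors the snippet above.

```cpp
#include <cstdint>
#include <type_traits>

// Stand-in for c10::SymInt (assumption for illustration only).
struct SymInt { std::int64_t value; };

// Assumed primary template: any type without a specialization is left unchanged.
template <typename T>
struct remove_symint { using type = T; };

// Exact-match specialization, as quoted above: only by-value SymInt is rewritten.
template <>
struct remove_symint<SymInt> { using type = std::int64_t; };

// By value: rewritten to int64_t, as intended.
static_assert(std::is_same<remove_symint<SymInt>::type, std::int64_t>::value,
              "SymInt maps to int64_t");

// By const reference: no specialization matches, so the type passes through
// unchanged and const SymInt& stays in the "non-SymInt" signature.
static_assert(std::is_same<remove_symint<const SymInt&>::type, const SymInt&>::value,
              "const SymInt& is not rewritten");
```

This is why the fallback call ends up instantiated with const SymInt& still in its argument list even though the registered kernel takes int64_t.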
OK, so I rebuilt this PR with ASAN and did some more experiments and things get even more interesting.
So first, ASAN is good: UBSAN is able to detect when you call a function at the wrong type. So it is able to catch the specific situation that Flavio ran into: int64_t kernel, const SymInt& caller. My fix for the extra test for the remove_symint signature properly gives us a runtime "Tried to access or call an operator with a wrong signature" error now.
However, I was flummoxed to discover that (on this PR) if you have c10::SymInt kernel and const SymInt& caller, this passes, and not only does it pass, it doesn't trigger ASAN. What's up with that? Here's what's up. First, we attempt to test if const SymInt& has SymInt. After this PR, it does not, because we only accept something as SymInt if it has exactly SymInt in its signature. So we check if there is a non-SymInt kernel. But there is no non-SymInt kernel, because we only registered a real SymInt kernel. When this occurs, we fall back to the boxed calling convention. And the boxed calling convention can deal with const SymInt& fine, as during boxing it will just create a SymInt to push onto the argument stack and everything is fine.
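The fallback order walked through above can be modeled as a small decision function. This is only a sketch of the reasoning in this comment, not the actual KernelFunction::call implementation: Path, Kernels, and choose_path are hypothetical names, and the two booleans stand in for the null checks on sym_unboxed_kernel_func_ and unboxed_kernel_func_.

```cpp
// Which calling convention ends up being used (hypothetical model).
enum class Path { SymUnboxed, Unboxed, Boxed };

// Which unboxed kernels were registered (stand-ins for the pointer null checks).
struct Kernels {
  bool has_sym_unboxed;  // models sym_unboxed_kernel_func_ != nullptr
  bool has_unboxed;      // models unboxed_kernel_func_ != nullptr
};

constexpr Path choose_path(bool args_have_symint, Kernels k) {
  if (args_have_symint && k.has_sym_unboxed) {
    return Path::SymUnboxed;  // exact SymInt signature
  }
  if (k.has_unboxed) {
    return Path::Unboxed;     // non-SymInt signature (after remove_symint)
  }
  return Path::Boxed;         // no usable unboxed kernel: box the arguments
}

// Case from this comment: only a SymInt kernel is registered and the caller
// uses const SymInt&, which after this PR does not count as "has SymInt".
static_assert(choose_path(false, Kernels{true, false}) == Path::Boxed,
              "falls through to the boxed convention");

// Case from the earlier comment: only an int kernel is registered and the
// caller's arguments were (before this PR) classified as SymInt.
static_assert(choose_path(true, Kernels{false, true}) == Path::Unboxed,
              "falls back to the non-SymInt unboxed kernel");
```

The first assertion is the surprising case: nothing is miscalled, because boxing constructs a fresh SymInt from the reference before it reaches the kernel.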
The easiest fix, I think, is to just reject const SymInt& input when you call call. But I think we probably should do this in a separate PR, as it is technically BC breaking.
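One way such a rejection could look is a compile-time trait over the explicitly specified argument types. This is a hypothetical sketch, not what this PR or a follow-up actually does: SymInt is a stand-in struct, and any_of, takes_symint_by_ref, and call_signature_ok are names invented for the example.

```cpp
#include <type_traits>

// Stand-in for c10::SymInt (assumption for illustration only).
struct SymInt { long long value; };

// C++14-compatible "any of" over a pack of bools.
template <bool... Bs>
struct any_of : std::false_type {};

template <bool B, bool... Rest>
struct any_of<B, Rest...>
    : std::integral_constant<bool, B || any_of<Rest...>::value> {};

// True if any argument type is exactly const SymInt&.
template <typename... Args>
using takes_symint_by_ref = any_of<std::is_same<const SymInt&, Args>::value...>;

// A call entry point could static_assert on this before dispatching.
template <typename... Args>
constexpr bool call_signature_ok() {
  return !takes_symint_by_ref<Args...>::value;
}

static_assert(call_signature_ok<int, SymInt>(), "by-value SymInt is accepted");
static_assert(!call_signature_ok<int, const SymInt&>(), "const SymInt& is rejected");
```

Because the check would turn previously compiling call sites into compile errors, it is the kind of change that is BC breaking in the sense mentioned above.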
Thanks for digging in and figuring out what is going on. Added some suggestions for test cases, but feel free to ship without them; these assertions are helpful.
Tensor symint_op(Tensor self, c10::SymInt length) {
  return self;
}
nit: return self.clone(); it's not good for non-CompositeImplicitAutograd operators to return self directly.
template<class T, bool AllowDeprecatedTypes>
struct assert_is_valid_input_type<T, AllowDeprecatedTypes, std::enable_if_t<std::is_same<const c10::SymInt&, T>::value>> {
  static_assert(guts::false_t<T>::value,
    "You tried to register a kernel taking c10::SymInt by reference. Please accept it by value instead.");
Is this a compile-time test?
Yes, compile time. But oddly, this doesn't seem to work for some reason... see the test I just posted.
@pytorchbot merge
The merge job was canceled. If you believe this is a mistake, then you can retrigger it through pytorch-bot.
@pytorchbot merge -f "everything is clear"
yolov3 problem is real but doesn't repro locally 🤔
checking if it's a compiler specific stack overflow
Summary: Caught this when I was tightening up the error checking at pytorch/pytorch#109727 Need to fix the problem before I land the improved error checking. Differential Revision: D49572882
it turns out we need to fix some regs in fbgemm
Summary: Pull Request resolved: #2039 Caught this when I was tightening up the error checking at pytorch/pytorch#109727 Need to fix the problem before I land the improved error checking. Reviewed By: zou3519 Differential Revision: D49572882 fbshipit-source-id: 59345bf2bd7b969a6739f3d1cf8bf47c9cdb0e58
@pytorchbot merge