Skip to content

Conversation

ezyang
Copy link
Contributor

@ezyang ezyang commented Sep 20, 2023

Stack from ghstack (oldest at bottom):

See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang ezyang@meta.com

See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Sep 20, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/109727

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9b7ff4a with merge base 869226b (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang added a commit that referenced this pull request Sep 20, 2023
See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

ghstack-source-id: 35f8492
Pull Request resolved: #109727
"_test::symint_op", "");

expectThrows<c10::Error>([&] {
opHandle.typed<Tensor(const Tensor&, Tensor, const c10::SymInt&)>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying that const c10::SymInt& previously fell back to the int64_t variant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you know what, there are more bugs. working on it some more...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so there was a second bug that helps explain what was going on.

core/boxing/KernelFunction.h b/aten/src/ATen/core/boxing/KernelFunction.h
index 4880a396bb2..d8d0a3d1514 100644
--- a/aten/src/ATen/core/boxing/KernelFunction.h
+++ b/aten/src/ATen/core/boxing/KernelFunction.h
@@ -18,10 +18,10 @@ class KernelFunction;
 template <typename T>
 using has_symint =
   guts::disjunction<
-    std::is_same<c10::SymInt, std::decay_t<T>>,
-    std::is_same<c10::SymIntArrayRef, std::decay_t<T>>,
-    std::is_same<at::OptionalSymIntArrayRef, std::decay_t<T>>,
-    std::is_same<c10::optional<c10::SymInt>, std::decay_t<T>>
+    std::is_same<c10::SymInt, T>,
+    std::is_same<c10::SymIntArrayRef, T>,
+    std::is_same<at::OptionalSymIntArrayRef, T>,
+    std::is_same<c10::optional<c10::SymInt>, T>
   >;
 
 template <typename T>

Previously, we treated both SymInt and const SymInt& as having SymInt, due to the decay call. This meant that we would hit this codepath:

template<class Return, class... Args>
C10_ALWAYS_INLINE Return KernelFunction::call(const OperatorHandle& opHandle, DispatchKeySet dispatchKeySet, Args... args) const {
    // note: Args above is intentionally not Args&&. We don't want perfect
    // forwarding, which would require Args to be deduced, but instead we
    // want callers to explicitly specify the Args.

    // This should get inlined by compiler
    if (guts::disjunction<has_symint<Args>...>::value) {
      if (sym_unboxed_kernel_func_ != nullptr) {
        auto *functor = boxed_kernel_func_.getFunctor();
        return callUnboxedKernelFunction<Return, Args...>(
            sym_unboxed_kernel_func_, functor, dispatchKeySet, std::forward<Args>(args)...);
      }

      if (unboxed_kernel_func_ != nullptr) {
        auto *functor = boxed_kernel_func_.getFunctor();
        return callUnboxedKernelFunction<Return, typename remove_symint<Args>::type...>(
            unboxed_kernel_func_, functor, dispatchKeySet, unpackSymInt<Args>(args)...);
      }

However, remove_symint doesn't actually do anything if you have const SymInt& arg because it's doing an exact match

template <>
struct remove_symint<c10::SymInt> {
  using type = int64_t;
};

So the net effect is, if you had only an int kernel registered, but you try to call it at const SymInt&, we would end up doing a callUnboxedKernelFunction with const SymInt& (sic!) still in the signature. And at this point there is no more type checking, so you end up unsafely coercing const SymInt& into int64_t.

For some reason, this still ends up giving you the right result in my OSS build. But on fbcode you properly get corrupted memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so I rebuilt this PR with ASAN and did some more experiments and things get even more interesting.

So first, ASAN is good: UBSAN is able to detect when you call a function at the wrong type. So it is able to catch the specific situation that Flavio ran into: int64_t kernel, const SymInt& caller. My fix for the extra test for the remove_symint signature properly gives us a runtime "Tried to access or call an operator with a wrong signature" error now.

However, I was flummoxed to discover that (on this PR) if you have c10::SymInt kernel and const SymInt& caller, this passes, and not only does it pass, it doesn't trigger ASAN. What's up with that? Here's what's up. First, we attempt to test if const SymInt& has SymInt. After this PR, it does not, because we only accept something as SymInt if it has exactly SymInt in its signature. So we check if there is a non-symint kernel. But there is no non-SymInt kernel, because we only registered a real SymInt kernel. When this occurs, we fall back to the boxed calling convention. And the boxed calling convention can deal with const SymInt& fine, as during boxing it will just create a SymInt to push onto the argument stack and everything is fine.

The easiest fix, I think, is to just reject const SymInt& input when you call call. But I think we probably should do this in a separate PR, as it is technically BC breaking.

…nt removed"

See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

[ghstack-poisoned]
@ezyang ezyang requested a review from zou3519 September 21, 2023 18:57
Copy link
Contributor

@zou3519 zou3519 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for digging in and figuring out what is going on. Added some suggestions for test cases, but feel free to ship without them, these assertions are helpful

Comment on lines 2160 to 2162
Tensor symint_op(Tensor self, c10::SymInt length) {
return self;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: return self.clone(), it's not good for non-compositeimplicitautograd operators to return self directly

template<class T, bool AllowDeprecatedTypes>
struct assert_is_valid_input_type<T, AllowDeprecatedTypes, std::enable_if_t<std::is_same<const c10::SymInt&, T>::value>> {
static_assert(guts::false_t<T>::value,
"You tried to register a kernel taking c10::SymInt by reference. Please accept it by value instead.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a compile-time test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, compile time. But oddly, this doesn't seem to work for some reason... see the test I just posted.

…nt removed"

See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

[ghstack-poisoned]
…nt removed"

See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Sep 21, 2023
See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

ghstack-source-id: b17c75e
Pull Request resolved: #109727
@ezyang ezyang added release notes: composability release notes category topic: bug fixes topic category labels Sep 22, 2023
@ezyang
Copy link
Contributor Author

ezyang commented Sep 22, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 22, 2023
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

The merge job was canceled. If you believe this is a mistake, then you can re trigger it through pytorch-bot.

@ezyang
Copy link
Contributor Author

ezyang commented Sep 22, 2023

@pytorchbot merge -f "everything is clear"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

ezyang added a commit that referenced this pull request Sep 22, 2023
See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

ghstack-source-id: bc57807
Pull Request resolved: #109727
…nt removed"

See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Sep 23, 2023
See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

ghstack-source-id: 34a13c8
Pull Request resolved: #109727
@ezyang
Copy link
Contributor Author

ezyang commented Sep 23, 2023

yolov3 problem is real but doesn't repro locally 🤔

@ezyang
Copy link
Contributor Author

ezyang commented Sep 23, 2023

checking if it's a compiler specific stack overflow

…nt removed"

See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Sep 23, 2023
See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

ghstack-source-id: f3e8b35
Pull Request resolved: #109727
ezyang added a commit to ezyang/FBGEMM that referenced this pull request Sep 23, 2023
Summary:
Caught this when I was tightening up the error checking at
pytorch/pytorch#109727  Need to fix
the problem before I land the improved error checking.

Differential Revision: D49572882
ezyang added a commit to ezyang/FBGEMM that referenced this pull request Sep 24, 2023
Summary:

Caught this when I was tightening up the error checking at
pytorch/pytorch#109727  Need to fix
the problem before I land the improved error checking.

Differential Revision: D49572882
…nt removed"

See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

[ghstack-poisoned]
@ezyang
Copy link
Contributor Author

ezyang commented Sep 24, 2023

it turns out we need to fix some regs in fbgemm

ezyang added a commit that referenced this pull request Sep 24, 2023
See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

ghstack-source-id: c12b430
Pull Request resolved: #109727
@ezyang ezyang requested a review from a team as a code owner September 24, 2023 19:26
facebook-github-bot pushed a commit to pytorch/FBGEMM that referenced this pull request Sep 25, 2023
Summary:
Pull Request resolved: #2039

Caught this when I was tightening up the error checking at
pytorch/pytorch#109727  Need to fix
the problem before I land the improved error checking.

Reviewed By: zou3519

Differential Revision: D49572882

fbshipit-source-id: 59345bf2bd7b969a6739f3d1cf8bf47c9cdb0e58
…nt removed"

See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

[ghstack-poisoned]
…nt removed"

See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Sep 26, 2023
See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

ghstack-source-id: 755006f
Pull Request resolved: #109727
…nt removed"

See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Sep 26, 2023
See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

ghstack-source-id: 0cae825
Pull Request resolved: #109727
…nt removed"

See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Sep 27, 2023
See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch.)

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyangmeta.com>

ghstack-source-id: 582c6ca
Pull Request resolved: #109727
@ezyang
Copy link
Contributor Author

ezyang commented Sep 27, 2023

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@facebook-github-bot facebook-github-bot deleted the gh/ezyang/2361/head branch September 30, 2023 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: composability release notes category Reverted topic: bug fixes topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants