Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix an ASAN error in MKLDNN #104331

Closed
wants to merge 1 commit into from
Closed

Conversation

cyyever
Copy link
Collaborator

@cyyever cyyever commented Jun 28, 2023

Fixes ASAN stack-use-after-scope in MKLDNN.
The stack trace is

2023-06-27T16:37:20.9099950Z ==1424==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7f0c5dc20980 at pc 0x7f0c61286a73 bp 0x7ffef8e76990 sp 0x7ffef8e76118
2023-06-27T16:37:20.9100054Z READ of size 24 at 0x7f0c5dc20980 thread T0
2023-06-27T16:37:20.9100327Z     #0 0x7f0c61286a72 in memcmp (/usr/lib/llvm-7/lib/clang/7.0.1/lib/linux/libclang_rt.asan-x86_64.so+0x5da72)
2023-06-27T16:37:20.9100701Z     #1 0x7f0c2f395d0b in c10::ArrayRef<long>::equals(c10::ArrayRef<long>) const (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xcb8bd0b)
2023-06-27T16:37:20.9101196Z     #2 0x7f0c314a1bb1 in at::native::mkldnn_matmul(at::Tensor const&, at::Tensor const&, at::Tensor const&, float, float) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xec97bb1)
2023-06-27T16:37:20.9101714Z     #3 0x7f0c301f49c5 in at::native::bmm_out_or_baddbmm_(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar const&, c10::Scalar const&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xd9ea9c5)
2023-06-27T16:37:20.9102153Z     #4 0x7f0c301f85ab in at::native::structured_bmm_out_cpu::impl(at::Tensor const&, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xd9ee5ab)
2023-06-27T16:37:20.9102601Z     #5 0x7f0c32cb3cb6 in at::(anonymous namespace)::wrapper_CPU_bmm(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x104a9cb6)
2023-06-27T16:37:20.9103662Z     #6 0x7f0c32ea1f43 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&), &(at::(anonymous namespace)::wrapper_CPU_bmm(at::Tensor const&, at::Tensor const&))>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >, at::Tensor (at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x10697f43)
2023-06-27T16:37:20.9104330Z     #7 0x7f0c3187252a in at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) const (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xf06852a)
2023-06-27T16:37:20.9104756Z     #8 0x7f0c3257e097 in at::_ops::bmm::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xfd74097)
2023-06-27T16:37:20.9105237Z     #9 0x7f0c383c31c3 in torch::autograd::VariableType::(anonymous namespace)::bmm(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x15bb91c3)
2023-06-27T16:37:20.9106496Z     #10 0x7f0c383c25b9 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&), &(torch::autograd::VariableType::(anonymous namespace)::bmm(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&))>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x15bb85b9)
2023-06-27T16:37:20.9106874Z     #11 0x7f0c3257da60 in at::_ops::bmm::call(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xfd73a60)
2023-06-27T16:37:20.9107275Z     #12 0x7f0c301fc0e2 in at::native::_matmul_impl(at::Tensor&, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xd9f20e2)
2023-06-27T16:37:20.9107647Z     #13 0x7f0c301f9c21 in at::native::matmul(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xd9efc21)
2023-06-27T16:37:20.9108853Z     #14 0x7f0c33dca7e3 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&), &(at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__matmul(at::Tensor const&, at::Tensor const&))>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >, at::Tensor (at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x115c07e3)
2023-06-27T16:37:20.9109255Z     #15 0x7f0c32958ef0 in at::_ops::matmul::call(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1014eef0)
2023-06-27T16:37:20.9110023Z     #16 0x7f0c2f596b62 in at::autocast::WrapFunction_<(at::autocast::CastPolicy)0, (c10::DeviceType)0, at::Tensor (at::Tensor const&, at::Tensor const&), &(at::_ops::matmul::call(at::Tensor const&, at::Tensor const&)), at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >::call(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xcd8cb62)
2023-06-27T16:37:20.9110723Z     #17 0x7f0c2f348403 in c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<at::Tensor (*)(at::Tensor const&, at::Tensor const&), at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >::operator()(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xcb3e403)
2023-06-27T16:37:20.9111596Z     #18 0x7f0c2f348063 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<at::Tensor (*)(at::Tensor const&, at::Tensor const&), at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&> >, at::Tensor (at::Tensor const&, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xcb3e063)
2023-06-27T16:37:20.9111976Z     #19 0x7f0c32958ef0 in at::_ops::matmul::call(at::Tensor const&, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1014eef0)
2023-06-27T16:37:20.9112383Z     #20 0x7f0c5803dc3e in torch::autograd::THPVariable_matmul(_object*, _object*, _object*) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_python.so+0x2b2cc3e)
2023-06-27T16:37:20.9112561Z warning: parsing line table prologue at 0x00000000 should have ended at 0x0000050b but it ended at 0x0000050a
2023-06-27T16:37:20.9112713Z     #21 0x5074a6 in cfunction_call (/opt/conda/envs/py_3.9/bin/python3.9+0x5074a6)
2023-06-27T16:37:20.9112857Z     #22 0x505997 in _PyObject_Call (/opt/conda/envs/py_3.9/bin/python3.9+0x505997)
2023-06-27T16:37:20.9113114Z     #23 0x505997 in PyObject_Call /croot/python-split_1684193875530/work/build-static/<invalid>:293:12
2023-06-27T16:37:20.9113258Z     #24 0x4ed302 in do_call_core (/opt/conda/envs/py_3.9/bin/python3.9+0x4ed302)
2023-06-27T16:37:20.9113633Z     #25 0x4ed302 in _PyEval_EvalFrameDefault /croot/python-split_1684193875530/work/build-static/<invalid>:3582:22
2023-06-27T16:37:20.9113780Z     #26 0x4e6729 in _PyEval_EvalFrame (/opt/conda/envs/py_3.9/bin/python3.9+0x4e6729)
2023-06-27T16:37:20.9114041Z     #27 0x4e6729 in _PyEval_EvalCode /croot/python-split_1684193875530/work/build-static/<invalid>:4329:14
2023-06-27T16:37:20.9114202Z     #28 0x4efd7d in _PyFunction_Vectorcall (/opt/conda/envs/py_3.9/bin/python3.9+0x4efd7d)

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@pytorch-bot
Copy link

pytorch-bot bot commented Jun 28, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/104331

Note: Links to docs will display an error until the docs builds have been completed.

✅ 1 Unrelated Failure

As of commit c3439d8:

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@cyyever cyyever marked this pull request as draft June 28, 2023 07:10
@github-actions github-actions bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Jun 28, 2023
@cyyever
Copy link
Collaborator Author

cyyever commented Jun 28, 2023

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Jun 28, 2023
@cyyever cyyever marked this pull request as ready for review June 28, 2023 12:23
@cyyever cyyever changed the title fix ASAN errors in MKLDNN fix an ASAN error in MKLDNN Jun 28, 2023
Copy link
Contributor

@soulitzer soulitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on why this fixes the ASAN error?

@soulitzer soulitzer added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 28, 2023
@cyyever
Copy link
Collaborator Author

cyyever commented Jun 29, 2023

@soulitzer contiguous_strides returns a DimVector, then the old function call passed it to a IntArrayRef result, which became a dangling reference. And in fact, this commit was picked from PR #104224. The error disappeared after the change.

Copy link
Contributor

@soulitzer soulitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@cyyever
Copy link
Collaborator Author

cyyever commented Jun 29, 2023

@pytorchmergebot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 29, 2023
@cyyever
Copy link
Collaborator Author

cyyever commented Jun 29, 2023

@pytorchmergebot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@cyyever cyyever deleted the asan_mkldnn_fix branch July 26, 2023 02:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged module: cpu CPU specific problem (e.g., perf, algorithm) open source topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants