Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out what mojo FB common/process/StackTrace.h has that we don't #56399

Open
ezyang opened this issue Apr 19, 2021 · 0 comments
Open

Figure out what mojo FB common/process/StackTrace.h has that we don't #56399

ezyang opened this issue Apr 19, 2021 · 0 comments
Labels
better-engineering Relatively self-contained tasks for better engineering contributors module: internals Related to internal abstractions in c10 and ATen triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@ezyang
Copy link
Contributor

ezyang commented Apr 19, 2021

I have observed that in FB-internal builds, the implementation of stack trace printing in common/process/StackTrace.h is superior to the native implementation in c10/util/Backtrace.cpp in that it is able to symbolize more symbols than we are able to.

For now, I'm going to stem the bleeding by making sure fbcode builds use this better implementation, but ultimately we should figure out what's going on (it's probably related to DWARF) and apply it to our implementation.

cc @ezyang @bhosmer @smessmer @ljk53 @bdhirsh @ailzhang @driazati

ezyang added a commit that referenced this issue Apr 19, 2021
See #56399

I don't have time to fix this properly, so this is just to stem the
bleeding.  Someone should go and figure out what it is that common/process
is doing better.

Differential Revision: [D27861908](https://our.internmc.facebook.com/intern/diff/D27861908/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27861908/)!

[ghstack-poisoned]
ezyang added a commit that referenced this issue Apr 19, 2021
See #56399

I don't have time to fix this properly, so this is just to stem the
bleeding.  Someone should go and figure out what it is that common/process
is doing better.

Differential Revision: [D27861908](https://our.internmc.facebook.com/intern/diff/D27861908/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D27861908/)!

ghstack-source-id: 126868405
Pull Request resolved: #56400
@bdhirsh bdhirsh added better-engineering Relatively self-contained tasks for better engineering contributors module: internals Related to internal abstractions in c10 and ATen triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Apr 19, 2021
facebook-github-bot pushed a commit that referenced this issue Apr 20, 2021
Summary:
Pull Request resolved: #56400

See #56399

I don't have time to fix this properly, so this is just to stem the
bleeding.  Someone should go and figure out what it is that common/process
is doing better.
ghstack-source-id: 126868405

Test Plan:
I manually patched this into D27765125 and triggered a
exception and observed that everything symbolized good:

```
[9]   what():  new_refcount != 1INTERNAL ASSERT FAILED at "caffe2/c10/util/intrusive_ptr.h":234, please report a bug to PyTorch. intrusive_ptr: Cannot increase refcount after it reached zero.
Exception raised from retain_ at caffe2/c10/util/intrusive_ptr.h:234 (most recent call first):
# 0  c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool)
# 1  c10::(anonymous namespace)::GetFetchStackTrace[abi:cxx11]()::$_0::operator()[abi:cxx11]() const
# 2  std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::Ge
tFetchStackTrace()::$_0>::_M_invoke(std::_Any_data const&)
# 3  std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>::operator()() const
# 4  c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
# 5  c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocat
or<char> > const&)
# 6  c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*)
# 7  c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> >::retain_()
# 8  c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> >::intrusive_ptr(c10::intrusiv
e_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&)
# 9  c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> >& c10::intrusive_ptr<c10d::Pr
ocessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> >::operator=<c10d::ProcessGroup, c10::detail::intrusive_target
_default_null_type<c10d::ProcessGroup> >(c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessG
roup> > const&) &
```

Reviewed By: driazati

Differential Revision: D27861908

fbshipit-source-id: 84c1dfb1ef28c460b020646f836c153562ad5c44
krshrimali pushed a commit to krshrimali/pytorch that referenced this issue May 19, 2021
…6400)

Summary:
Pull Request resolved: pytorch#56400

See pytorch#56399

I don't have time to fix this properly, so this is just to stem the
bleeding.  Someone should go and figure out what it is that common/process
is doing better.
ghstack-source-id: 126868405

Test Plan:
I manually patched this into D27765125 and triggered a
exception and observed that everything symbolized good:

```
[9]   what():  new_refcount != 1INTERNAL ASSERT FAILED at "caffe2/c10/util/intrusive_ptr.h":234, please report a bug to PyTorch. intrusive_ptr: Cannot increase refcount after it reached zero.
Exception raised from retain_ at caffe2/c10/util/intrusive_ptr.h:234 (most recent call first):
# 0  c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool)
# 1  c10::(anonymous namespace)::GetFetchStackTrace[abi:cxx11]()::$_0::operator()[abi:cxx11]() const
# 2  std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::Ge
tFetchStackTrace()::$_0>::_M_invoke(std::_Any_data const&)
# 3  std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>::operator()() const
# 4  c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
# 5  c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocat
or<char> > const&)
# 6  c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*)
# 7  c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> >::retain_()
# 8  c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> >::intrusive_ptr(c10::intrusiv
e_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> > const&)
# 9  c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> >& c10::intrusive_ptr<c10d::Pr
ocessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup> >::operator=<c10d::ProcessGroup, c10::detail::intrusive_target
_default_null_type<c10d::ProcessGroup> >(c10::intrusive_ptr<c10d::ProcessGroup, c10::detail::intrusive_target_default_null_type<c10d::ProcessG
roup> > const&) &
```

Reviewed By: driazati

Differential Revision: D27861908

fbshipit-source-id: 84c1dfb1ef28c460b020646f836c153562ad5c44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
better-engineering Relatively self-contained tasks for better engineering contributors module: internals Related to internal abstractions in c10 and ATen triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

No branches or pull requests

2 participants