Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spdlog generates a SEGFAULT when a context is being destructed #933

Closed
ivanpauno opened this issue Dec 5, 2019 · 13 comments · Fixed by ros2/rcl_logging#23
Closed

spdlog generates a SEGFAULT when a context is being destructed #933

ivanpauno opened this issue Dec 5, 2019 · 13 comments · Fixed by ros2/rcl_logging#23
Assignees
Labels
bug Something isn't working

Comments

@ivanpauno
Copy link
Member

Bug report

Required Info:

  • Operating System:
    • Ubuntu 18.04
  • Installation type:
    • from source
  • Version or commit hash:
    • https://gist.githubusercontent.com/ivanpauno/2201f6fb5ad0fc5449d1f2e98f73a73d/raw/77582cbdc6bac5022039cc17fb0c03d0ed9f8289/ros2.repos

(used in https://github.com/ros2/rmw_fastrtps/pull/312#issuecomment-561769677).

  • DDS implementation:
    • rmw_fastrtps_cpp
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

Unfortunately, I have been able to reproduce this only with the repos file specified above.
That repos file are using rclcpp master, and it only has changes in others repositories.
The error doesn't seem to be a consequence of any of the changes in my branches, but just an existing error that was triggered randomly (see explanation below).

After building, run test_subscription_publisher_count_api, or test_publisher_subscription_count_api.

Expected behavior

No crashes.

Actual behavior

It segfaults.

Traceback of the segfault
    #0  0x00007ffff4b2e3b8 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator
    <char> > const, std::shared_ptr<spdlog::logger> >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<spdlog::logger> > >, std:
    :__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<ch
    ar> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_find_before_node (
        this=0x7ffff4d7db98 <spdlog::details::registry::instance()::s_instance+120>, __n=2, __k="root", __code=18029476052510536322) at /usr/include/c++/7/bits/hashtable.h:1548
    #1  0x00007ffff4b2a636 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator
    <char> > const, std::shared_ptr<spdlog::logger> >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<spdlog::logger> > >, std:
    :__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<ch
    ar> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_erase (
        this=0x7ffff4d7db98 <spdlog::details::registry::instance()::s_instance+120>, __k="root") at /usr/include/c++/7/bits/hashtable.h:1910
    #2  0x00007ffff4b26ac5 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator
    <char> > const, std::shared_ptr<spdlog::logger> >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<spdlog::logger> > >, std:
    :__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<ch
    ar> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::erase (
        this=0x7ffff4d7db98 <spdlog::details::registry::instance()::s_instance+120>, __k="root") at /usr/include/c++/7/bits/hashtable.h:759
    #3  0x00007ffff4b218ff in std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<spdlog::logger>, std::hash<std::__cxx11::basic_string<char, st
    d::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, s
    td::char_traits<char>, std::allocator<char> > const, std::shared_ptr<spdlog::logger> > > >::erase (this=0x7ffff4d7db98 <spdlog::details::registry::instance()::s_instance+120>, __x="root")
        at /usr/include/c++/7/bits/unordered_map.h:814
    #4  0x00007ffff4b1b3d2 in spdlog::details::registry::drop (this=0x7ffff4d7db20 <spdlog::details::registry::instance()::s_instance>, logger_name="root")
        at /home/ivanpauno/ros2_ws/other_ws/ros2_node_participant_ws/install/spdlog_vendor/include/spdlog/details/registry.h:187
    #5  0x00007ffff4b1bceb in spdlog::drop (name="root") at /home/ivanpauno/ros2_ws/other_ws/ros2_node_participant_ws/install/spdlog_vendor/include/spdlog/spdlog.h:113
    #6  0x00007ffff4b12f4b in rcl_logging_external_shutdown () at /home/ivanpauno/ros2_ws/other_ws/ros2_node_participant_ws/src/ros2/rcl_logging/rcl_logging_spdlog/src/rcl_logging_spdlog.cpp:155
    #7  0x00007ffff6faa420 in rcl_logging_fini () at /home/ivanpauno/ros2_ws/other_ws/ros2_node_participant_ws/src/ros2/rcl/rcl/src/rcl/logging.c:116
    #8  0x00007ffff6fa721f in rcl_shutdown (context=0x5555559f7b50) at /home/ivanpauno/ros2_ws/other_ws/ros2_node_participant_ws/src/ros2/rcl/rcl/src/rcl/init.c:188
    #9  0x00007ffff75ad229 in rclcpp::Context::shutdown (this=0x5555559f7940, reason="context destructor was called while still not shutdown")
        at /home/ivanpauno/ros2_ws/other_ws/ros2_node_participant_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/context.cpp:145
    #10 0x00007ffff75ac0f1 in rclcpp::Context::~Context (this=0x5555559f7940, __in_chrg=<optimized out>)
        at /home/ivanpauno/ros2_ws/other_ws/ros2_node_participant_ws/src/ros2/rclcpp/rclcpp/src/rclcpp/context.cpp:43

Additional information

The problem happens only with Context objects of static storage duration (it may happen for example with the default context).

When the context is destructed, it will try to shutdown, if its rcl_context_t is still valid (shutdown not manually called) it will shutdown the rcl context , which triggers the shutdown of the loggers, which shutdowns the external loggers, which calls spdlog::drop, which calls a method in a singleton object that was previously destructed, boom 💥 .

I checked carefully with the debugger that:

  • The spdlog registry singleton gets destructed before the Context.
  • The context that was causing the problem was one of the contexts created here.

IMO, this error was randomly triggered by unrelated changes, because the order of destruction of static/global objects in different shared libraries is not ensured by the linker (and the order of destruction of different static/global objects in the same library is also not ensured, by the compiler).
Also, I think that we haven't hit this before just because we usually call rclcpp::shutdown or Context::shutdown manually, but I don't believe that that should be mandatory.
I'm (almost) sure that rclpy should never run into this problem.

Of course, the test can be fixed by avoiding to use a global variable, but I think that's not a "real" fix, as the problem may be hit again in the future (for example, it may happen with the default context).

Some questions and ideas:

  • The logger is being shut down by the context, but all the "state" of the logger is global, and not part of the context. Why the context should shut down it? What if you have two contexts, shut down one, and the other context is still alive and trying to log?
  • Do we need to call spdlog::drop? If not, I wouldn't call it.
@ivanpauno ivanpauno added the bug Something isn't working label Dec 5, 2019
@ivanpauno
Copy link
Member Author

@wjwwood @clalancette @hidmic any ideas?

@hidmic
Copy link
Contributor

hidmic commented Dec 5, 2019

Whether we usually hit this or not, we should never rely on static storage objects to be destroyed in any particular order.

Why the context should shut down it? What if you have two contexts, shut down one, and the other context is still alive and trying to log?

I don't see why shutting down one context should result in all loggers being shut down either. But changing that won't free us from this race.

Do we need to call spdlog::drop? If not, I wouldn't call it.

That's probably the simplest fix. It's not great, you can't clean up unused loggers, but I don't see any other alternative if we don't drop the concept of a singleton default context ourselves.

@clalancette
Copy link
Contributor

@ivanpauno Good job tracing this down and giving a thorough explanation of the problem.

That's probably the simplest fix. It's not great, you can't clean up unused loggers, but I don't see any other alternative if we don't drop the concept of a singleton default context ourselves.

Yeah, I agree. Since it is the root logger, and we can't guarantee the shutdown order, I'd say let's get rid of that drop. Please put a comment in explaining why we can't reasonably call it.

@wjwwood
Copy link
Member

wjwwood commented Dec 5, 2019

we should never rely on static storage objects to be destroyed in any particular order.

You can if they are in the same translation unit. I don't know if it's possible, but we may be able to store it in the context and route all access to it from there. That would fix the issue, but it may not be possible due to needing to access it from C libraries well beneath rclcpp.

@wjwwood
Copy link
Member

wjwwood commented Dec 5, 2019

Also, that only works if spdlog allows us to use it completely without singletons, which I'm not sure about.

@wjwwood
Copy link
Member

wjwwood commented Dec 5, 2019

Also, that only works if spdlog allows us to use it completely without singletons, which I'm not sure about.

Just glancing at it, the answer looks like no...

@hidmic
Copy link
Contributor

hidmic commented Dec 5, 2019

You can if they are in the same translation unit.

Interesting, I had to look that up. Accidentally, I came across an idiom to deal with this particular problem. We just need to send a patch upstream 😅.

@clalancette
Copy link
Contributor

Just glancing at it, the answer looks like no...

It's been a while since I looked at it in detail, but that was my recollection as well.

Interesting, I had to look that up. Accidentally, I came across an idiom to deal with this particular problem. We just need to send a patch upstream sweat_smile.

That seems like a reasonable way to go about fixing it. I'd suggest @ivanpauno test it out locally first just to make sure it actually solves his problem. If it does, yeah, an upstream patch is the way to go for Foxy. It's less clear if we want to backport that to Eloquent, or just remove the drop there, but we can discuss that later.

@ivanpauno
Copy link
Member Author

That's probably the simplest fix. It's not great, you can't clean up unused loggers, but I don't see any other alternative if we don't drop the concept of a singleton default context ourselves.

Yeah, I agree. Since it is the root logger, and we can't guarantee the shutdown order, I'd say let's get rid of that drop. Please put a comment in explaining why we can't reasonably call it.

Ok, I will start by avoiding the use of spdlog::drop.

You can if they are in the same translation unit.

Interesting, I had to look that up. Accidentally, I came across an idiom to deal with this particular problem. We just need to send a patch upstream sweat_smile.

That's true. I don't really like the nifty counter idiom, but it's fair using it here.
I think that the we can use that idiom in rcl_logging_configure and rcl_logging_fini.
So the first context creates the loggers, and the last one destroys them.

Interesting, I had to look that up. Accidentally, I came across an idiom to deal with this particular problem. We just need to send a patch upstream sweat_smile.

That will be needed to avoid the problem. I will work on a patch, test it locally, and see if they accept it.


Another reason why we didn't run into this problem before, is this other issue #893.
#906 fixed that problem. I haven't backport it as the patch is not ABI compatible.
So, the fixes of this issue shouldn't be backported either (except we decide to backport #906).

@gabime
Copy link

gabime commented Dec 5, 2019

Also, that only works if spdlog allows us to use it completely without singletons, which I'm not sure about.

spdlog allows this. spdlog logger objects can be created and used like any other object. Using the registry is not mandatory and can be bypassed easily, by creating the objects directly.
Also, consider calling spdlog::shutdown() at the end of main to free any spdlog registry items in a predictable place.

@wjwwood
Copy link
Member

wjwwood commented Dec 5, 2019

Thanks for letting us know @gabime, is there an example of this in use or is it best to just look at the implementation of the registry?

@gabime
Copy link

gabime commented Dec 5, 2019

Create the logger object and pass it his name and some sinks.
typical example: https://github.com/gabime/spdlog#logger-with-multi-sinks---each-with-different-format-and-log-level

@ivanpauno
Copy link
Member Author

Thanks @gabime !!
Proposed fixed in ros2/rcl_logging#23.

nnmm pushed a commit to ApexAI/rclcpp that referenced this issue Jul 9, 2022
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
pcolberg added a commit to OFS/opae-sdk that referenced this issue Apr 8, 2024
For static objects, the order of destruction is in reverse of the order
of construction, but the latter is hard to control except among static
objects defined within the same translation unit. This means that spdlog
may be destructed before spdlog::drop() is invoked in the destructor,
which leads to a segmentation fault due to memory use-after-free.

The spdlog example points out that dropping all loggers via
spdlog::shutdown(), which invokes spdlog::drop_all(), is optional.

Omit destructor calls of spdlog::drop(), but retain non-destructor calls
of spdlog::drop_all() for now since they do not cause any harm.

Link: ros2/rclcpp#933
Link: gabime/spdlog#1738
Link: https://github.com/gabime/spdlog/blob/a2b4262090fd3f005c2315dcb5be2f0f1774a005/example/example.cpp#L96
Closes: https://hsdes.intel.com/appstore/article/#/22019839238
Signed-off-by: Peter Colberg <peter.colberg@intel.com>
pcolberg added a commit to OFS/opae-sdk that referenced this issue Apr 8, 2024
For static objects, the order of destruction is in reverse of the order
of construction, but the latter is hard to control except among static
objects defined within the same translation unit. This means that spdlog
may be destructed before spdlog::drop() is invoked in the destructor,
which leads to a segmentation fault due to memory use-after-free.

The spdlog example points out that dropping all loggers via
spdlog::shutdown(), which invokes spdlog::drop_all(), is optional.

Omit destructor calls of spdlog::drop(), but retain non-destructor calls
of spdlog::drop_all() for now since they do not cause any harm.

Link: ros2/rclcpp#933
Link: gabime/spdlog#1738
Link: https://github.com/gabime/spdlog/blob/a2b4262090fd3f005c2315dcb5be2f0f1774a005/example/example.cpp#L96
Closes: https://hsdes.intel.com/appstore/article/#/22019839238
Signed-off-by: Peter Colberg <peter.colberg@intel.com>
pcolberg added a commit to OFS/opae-sdk that referenced this issue Apr 8, 2024
For static objects, the order of destruction is in reverse of the order
of construction, but the latter is hard to control except among static
objects defined within the same translation unit. This means that spdlog
may be destructed before spdlog::drop() is invoked in the destructor,
which leads to a segmentation fault due to memory use-after-free.

Move call of spdlog::drop() from destructor to main() function, to
ensure the logger may be re-registered with the same name in tests.

Link: ros2/rclcpp#933
Link: gabime/spdlog#1738
Closes: https://hsdes.intel.com/appstore/article/#/22019839238
Signed-off-by: Peter Colberg <peter.colberg@intel.com>
pcolberg added a commit to OFS/opae-sdk that referenced this issue Apr 8, 2024
For static objects, the order of destruction is in reverse of the order
of construction, but the latter is hard to control except among static
objects defined within the same translation unit. This means that spdlog
may be destructed before spdlog::drop() is invoked in the destructor,
which leads to a segmentation fault due to memory use-after-free.

Move call of spdlog::drop() from destructor to main() function, to
ensure the logger may be re-registered with the same name in tests.
Use Scope Guard pattern to implement RAII for logger registration.

Link: ros2/rclcpp#933
Link: gabime/spdlog#1738
Link: gabime/spdlog#2113
Link: https://en.wikibooks.org/wiki/More_C++_Idioms/Scope_Guard
Closes: https://hsdes.intel.com/appstore/article/#/22019839238
Signed-off-by: Peter Colberg <peter.colberg@intel.com>
pcolberg added a commit to OFS/opae-sdk that referenced this issue Apr 8, 2024
For static objects, the order of destruction is in reverse of the order
of construction, but the latter is hard to control except among static
objects defined within the same translation unit. This means that spdlog
may be destructed before spdlog::drop() is invoked in the destructor,
which leads to a segmentation fault due to memory use-after-free.

Move call of spdlog::drop() from destructor to main() function, to
ensure the logger may be re-registered with the same name in tests.
Use Scope Guard pattern to implement RAII for logger registration.

Link: ros2/rclcpp#933
Link: gabime/spdlog#1738
Link: gabime/spdlog#2113
Link: https://en.wikibooks.org/wiki/More_C++_Idioms/Scope_Guard
Closes: https://hsdes.intel.com/appstore/article/#/22019839238
Signed-off-by: Peter Colberg <peter.colberg@intel.com>
pcolberg added a commit to OFS/opae-sdk that referenced this issue Apr 11, 2024
For static objects, the order of destruction is in reverse of the order
of construction, but the latter is hard to control except among static
objects defined within the same translation unit. This means that spdlog
may be destructed before spdlog::drop() is invoked in the destructor,
which leads to a segmentation fault due to memory use-after-free.

Move call of spdlog::drop() from destructor to main() function, to
ensure the logger may be re-registered with the same name in tests.
Use Scope Guard pattern to implement RAII for logger registration.

Link: ros2/rclcpp#933
Link: gabime/spdlog#1738
Link: gabime/spdlog#2113
Link: https://en.wikibooks.org/wiki/More_C++_Idioms/Scope_Guard
Closes: https://hsdes.intel.com/appstore/article/#/22019839238
Signed-off-by: Peter Colberg <peter.colberg@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants