Core dumped in rcl_logging_external_log after Ctrl+C in some cases #1418

Closed
iuhilnehc-ynos opened this issue Oct 22, 2020 · 11 comments
Labels: bug (Something isn't working)

Comments

@iuhilnehc-ynos
Collaborator

Bug report

Required Info:

  • Operating System:
    • Ubuntu 20.04
  • Installation type:
    • source
  • Version or commit hash:
  • DDS implementation:
    • Fast-RTPS
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

Directly invoking the executable can easily reproduce the issue (frequency: roughly 1 in 3):

$ ./install/lib/demo_nodes_cpp/listener --ros-args --log-level debug      (press ctrl+c immediately)

Expected behavior

No core dump.

Actual behavior

...
[DEBUG] [1603352071.289077304] [rcl]: Timeout calculated based on next scheduled timer: false
^C[INFO] [1603352071.813215385] [rclcpp]: signal_handler(signal_value=2)
[DEBUG] [1603352071.813371611] [rclcpp]: signal_handler(): SIGINT received, notifying deferred signal handler
[DEBUG] [1603352071.813459003] [rclcpp]: deferred_signal_handler(): woken up due to SIGINT or uninstall
[DEBUG] [1603352071.813565138] [rclcpp]: deferred_signal_handler(): SIGINT received, shutting down
[DEBUG] [1603352071.813858947] [rclcpp]: deferred_signal_handler(): shutting down rclcpp::Context @ 0x561b1ab00b50, because it had shutdown_on_sigint == true
[DEBUG] [1603352071.813943162] [rcl]: Shutting down ROS client library, for context at address: 0x561b1ab00e30
[DEBUG] [1603352071.814205573] [rcl]: Finalizing publisher
[DEBUG] [1603352071.817029627] [rcl]: Publisher finalized
[DEBUG] [1603352071.817415178] [rclcpp]: deferred_signal_handler(): waiting for SIGINT or uninstall
[DEBUG] [1603352071.814182399] [rcl]: Guard condition in wait set is ready
Segmentation fault (core dumped)

Additional information

backtrace

(gdb) bt
#0  std::__atomic_base<int>::load (__m=std::memory_order_relaxed, this=0x40) at /usr/include/c++/9/bits/atomic_base.h:419
#1  spdlog::logger::should_log (this=0x0, msg_level=spdlog::level::debug) at /home/sharedata/Linux/ROS2/docker_ros2_master/local_install/include/spdlog/logger.h:322
#2  0x00007f87fdb3305e in spdlog::logger::log (this=0x0, loc=..., lvl=spdlog::level::debug, msg=...)
    at /home/sharedata/Linux/ROS2/docker_ros2_master/local_install/include/spdlog/logger.h:165
#3  0x00007f87fdb35253 in spdlog::logger::log<char const*, (char const**)0> (this=0x0, loc=..., lvl=spdlog::level::debug, 
    msg=@0x7ffe9ec01258: 0x7ffe9ec03780 "[DEBUG] [1603352071.814182399] [rcl]: Guard condition in wait set is ready")
    at /home/sharedata/Linux/ROS2/docker_ros2_master/local_install/include/spdlog/logger.h:147
#4  0x00007f87fdb343e7 in spdlog::logger::log<char const*> (this=0x0, lvl=spdlog::level::debug, 
    msg=@0x7ffe9ec01258: 0x7ffe9ec03780 "[DEBUG] [1603352071.814182399] [rcl]: Guard condition in wait set is ready")
    at /home/sharedata/Linux/ROS2/docker_ros2_master/local_install/include/spdlog/logger.h:140
#5  0x00007f87fdb31e66 in rcl_logging_external_log (severity=10, name=0x7f87fdd290f3 "rcl", 
    msg=0x7ffe9ec03780 "[DEBUG] [1603352071.814182399] [rcl]: Guard condition in wait set is ready")
    at /home/sharedata/Linux/ROS2/docker_ros2_master/src/ros2/rcl_logging/rcl_logging_spdlog/src/rcl_logging_spdlog.cpp:148
#6  0x00007f87fdd0a941 in rcl_logging_ext_lib_output_handler (location=0x7f87fdd2f600 <__rcutils_logging_location.4922>, severity=10, name=0x7f87fdd290f3 "rcl", 
    timestamp=1603352071814182399, format=0x7f87fdd297b0 "Guard condition in wait set is ready", args=0x7ffe9ec03cd0)
    at /home/sharedata/Linux/ROS2/docker_ros2_master/src/ros2/rcl/rcl/src/rcl/logging.c:204
#7  0x00007f87fdd0a570 in rcl_logging_multiple_output_handler (location=0x7f87fdd2f600 <__rcutils_logging_location.4922>, severity=10, name=0x7f87fdd290f3 "rcl", 
    timestamp=1603352071814182399, format=0x7f87fdd297b0 "Guard condition in wait set is ready", args=0x7ffe9ec03cd0)
    at /home/sharedata/Linux/ROS2/docker_ros2_master/src/ros2/rcl/rcl/src/rcl/logging.c:154
#8  0x00007f87fe75b2e1 in rclcpp_logging_output_handler (location=0x7f87fdd2f600 <__rcutils_logging_location.4922>, severity=10, name=0x7f87fdd290f3 "rcl", 
    timestamp=1603352071814182399, format=0x7f87fdd297b0 "Guard condition in wait set is ready", args=0x7ffe9ec03cd0)
    at /home/sharedata/Linux/ROS2/docker_ros2_master/src/ros2/rclcpp/rclcpp/src/rclcpp/context.cpp:135
#9  0x00007f87fe1a2532 in rcutils_log (location=0x7f87fdd2f600 <__rcutils_logging_location.4922>, severity=10, name=0x7f87fdd290f3 "rcl", 
    format=0x7f87fdd297b0 "Guard condition in wait set is ready") at /home/sharedata/Linux/ROS2/docker_ros2_master/src/ros2/rcutils/src/logging.c:544
#10 0x00007f87fdd1c854 in rcl_wait (wait_set=0x7ffe9ec09660, timeout=-1) at /home/sharedata/Linux/ROS2/docker_ros2_master/src/ros2/rcl/rcl/src/rcl/wait.c:653
#11 0x00007f87fe77a1e1 in rclcpp::Executor::wait_for_work (this=0x7ffe9ec09640, timeout=...)
    at /home/sharedata/Linux/ROS2/docker_ros2_master/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:724
#12 0x00007f87fe77ae52 in rclcpp::Executor::get_next_executable (this=0x7ffe9ec09640, any_executable=..., timeout=...)
    at /home/sharedata/Linux/ROS2/docker_ros2_master/src/ros2/rclcpp/rclcpp/src/rclcpp/executor.cpp:873
#13 0x00007f87fe78c86b in rclcpp::executors::SingleThreadedExecutor::spin (this=0x7ffe9ec09640)
    at /home/sharedata/Linux/ROS2/docker_ros2_master/src/ros2/rclcpp/rclcpp/src/rclcpp/executors/single_threaded_executor.cpp:35
#14 0x0000561b1a259220 in main (argc=4, argv=0x7ffe9ec0a338) at /home/sharedata/Linux/ROS2/docker_ros2_master/local_build/demo_nodes_cpp/rclcpp_components/node_main_listener.cpp:56
(gdb) 

NOTE:

  1. I think we don't want to add an extra lock to fix this issue, because it would slow down logging performance (see the sketch after this list for the kind of lock this refers to).
  2. Maybe we could move the rcl_logging_fini call into __delete_context:
     // shutdown logger
     if (logging_mutex_) {
       // logging was initialized by this context
       std::lock_guard<std::recursive_mutex> guard(*logging_mutex_);
       size_t & count = get_logging_reference_count();
       if (0u == --count) {
         rcl_ret_t rcl_ret = rcl_logging_fini();
         if (RCL_RET_OK != rcl_ret) {
           RCUTILS_SAFE_FWRITE_TO_STDERR(
             RCUTILS_STRINGIFY(__file__) ":"
             RCUTILS_STRINGIFY(__LINE__)
             " failed to fini logging");
           rcl_reset_error();
         }
       }
     }
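
For context on item 1: below is a minimal, hypothetical sketch of the kind of extra lock being argued against. The mutex and function names are made up for illustration; this is not the actual rcl_logging_spdlog code. Every log call would have to acquire the same mutex as the shutdown path, which is the per-message cost item 1 wants to avoid.

    // Hypothetical sketch only -- not the actual rcl_logging_spdlog code.
    #include <memory>
    #include <mutex>
    #include <spdlog/spdlog.h>

    static std::shared_ptr<spdlog::logger> g_root_logger;
    static std::mutex g_root_logger_mutex;  // hypothetical lock shared by log and shutdown

    void locked_external_log(int severity, const char * name, const char * msg)
    {
      (void)severity;  // the real code maps severity to an spdlog level
      (void)name;
      std::lock_guard<std::mutex> guard(g_root_logger_mutex);
      if (g_root_logger) {  // safe only because shutdown holds the same mutex
        g_root_logger->log(spdlog::level::debug, msg);
      }
    }

    void locked_external_shutdown()
    {
      std::lock_guard<std::mutex> guard(g_root_logger_mutex);
      g_root_logger = nullptr;
    }
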
@iuhilnehc-ynos
Collaborator Author

@ivanpauno

Could you confirm this issue?

@ivanpauno
Member

@iuhilnehc-ynos Can you extend the explanation of the issue?
I don't understand what's causing the segfault.

ivanpauno added the bug label Oct 22, 2020
@iuhilnehc-ynos
Collaborator Author

iuhilnehc-ynos commented Oct 23, 2020

@ivanpauno

Let me try to explain this.

Please refer to the following steps; it seems that g_root_logger might be nullptr at step 6.

  • important steps in two threads (a minimal stand-alone sketch of this race follows the code excerpts below)

main thread:

rclcpp::executors::SingleThreadedExecutor::spin
    ...
        rcl_wait              @ rcl/rcl/src/rcl/wait.c:653
            RCUTILS_LOG_DEBUG_EXPRESSION_NAMED                             
                rcutils_log
                    rclcpp_logging_output_handler                  step 3    about to log via rcl_logging_external_log
                        rcl_logging_multiple_output_handler
                            rcl_logging_ext_lib_output_handler
                                g_root_logger->log                 step 6    crashes here if the logger was already shut down
                                
signal_handler
    SignalHandler::signal_handler_common
        // set flag and notify
        notify_signal_handler                                      step 1    ctrl + c

deferred_signal_handler thread:

SignalHandler::deferred_signal_handler
    // wait event and check the flag to shutdown
    Context::shutdown

        // just notify/trigger the event without waiting
        this->interrupt_all_sleep_for();
        this->interrupt_all_wait_sets();                           step 2    trigger the waitset

        // shutdown logger
        rcl_logging_fini()
            rcutils_logging_set_output_handler                     step 4    resetting the output handler is too late for the in-flight call
            rcl_logging_external_shutdown                          step 5    shuts down the external logger
  • external log

rcl_logging/rcl_logging_spdlog/src/rcl_logging_spdlog.cpp

    rcl_logging_ret_t rcl_logging_external_shutdown()
    {
      g_root_logger = nullptr;
      return RCL_LOGGING_RET_OK;
    }

    void rcl_logging_external_log(int severity, const char * name, const char * msg)
    {
      (void)name;
      g_root_logger->log(map_external_log_level_to_library_level(severity), msg);
    }
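
For reference, the race can be reproduced outside ROS with a minimal stand-alone sketch like the one below (assuming only spdlog; the null sink stands in for the real file logger). One thread logs through a global shared_ptr the way rcl_logging_external_log does, while a second thread resets it the way rcl_logging_external_shutdown does; run a few times, it typically crashes dereferencing the null/destroyed logger, matching the backtrace above.

    // Minimal stand-alone sketch of the race; not ROS code.
    #include <chrono>
    #include <memory>
    #include <thread>
    #include <spdlog/spdlog.h>
    #include <spdlog/sinks/null_sink.h>

    static std::shared_ptr<spdlog::logger> g_root_logger;

    int main()
    {
      // Stand-in for the file logger created by rcl_logging_external_initialize().
      g_root_logger = std::make_shared<spdlog::logger>(
        "root", std::make_shared<spdlog::sinks::null_sink_mt>());

      // Executor thread: logs unconditionally, like rcl_logging_external_log.
      std::thread logging_thread([] {
          for (int i = 0; i < 10000000; ++i) {
            g_root_logger->log(spdlog::level::debug, "Guard condition in wait set is ready");
          }
        });

      // Deferred signal-handler thread: shuts the logger down mid-run,
      // like rcl_logging_external_shutdown() after Ctrl+C.
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
      g_root_logger = nullptr;

      logging_thread.join();
      return 0;
    }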

BTW: I don't think that just checking the g_root_logger pointer in rcl_logging_external_log is a good way to fix this issue.
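
To illustrate why (my reading, using the global g_root_logger declared in the sketch above): a bare null check still leaves a window between the check and the call, so the shutdown thread can reset the pointer in between, roughly as sketched here.

    // Illustrative only, not proposed code.
    void checked_external_log(int severity, const char * name, const char * msg)
    {
      (void)severity;  // the real code maps severity to an spdlog level
      (void)name;
      if (g_root_logger) {
        // The shutdown thread can run `g_root_logger = nullptr;` right here,
        // so the call below may still hit a null/destroyed logger; the
        // unsynchronized read/write of the shared_ptr is a data race either way.
        g_root_logger->log(spdlog::level::debug, msg);
      }
    }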

@ivanpauno
Member

Ok, based on your comment this sounds like another instance of #1139, right?
This one is a bit more severe, because it segfaults instead of only logging an error.

Do you have any fix in mind for this case?
I will try to see if I have time to continue the discussion in #1139.

@iuhilnehc-ynos
Collaborator Author

@ivanpauno

Thanks for the information.

Ok, based on your comment this sounds like another instance of #1139, right?

Yes. It seems you had already discussed this kind of issue, where shutting down asynchronously causes problems.

Do you have any fix in mind for this case?

Yes, but it might just be a workaround: as I noted earlier, we could delay calling rcl_logging_fini until __delete_context.
After looking at #1139, I think it's better to wait for the outcome of that discussion.

@ivanpauno
Member

Yes, but it might just be a workaround: as I noted earlier, we could delay calling rcl_logging_fini until __delete_context.

Maybe that isn't a bad idea, but I'm not sure it completely avoids the segfault.

@ivanpauno
Member

I'm not sure how this is happening, given that logging finalization and the output handler are synchronized:

@iuhilnehc-ynos
Collaborator Author

iuhilnehc-ynos commented Oct 27, 2020

@ivanpauno

Thank you for your response.

  1. Both sides use the same global mutex: logging_mutex = get_global_logging_mutex();

  2. The shutdown path holds it while finalizing logging:

     std::lock_guard<std::recursive_mutex> guard(*logging_mutex_);
     size_t & count = get_logging_reference_count();
     if (0u == --count) {
       rcl_ret_t rcl_ret = rcl_logging_fini();
       if (RCL_RET_OK != rcl_ret) {
         RCUTILS_SAFE_FWRITE_TO_STDERR(
           RCUTILS_STRINGIFY(__file__) ":"
           RCUTILS_STRINGIFY(__LINE__)
           " failed to fini logging");
         rcl_reset_error();
       }
     }

  3. rclcpp_logging_output_handler also holds it while forwarding:

     std::lock_guard<std::recursive_mutex> guard(*logging_mutex);
     return rcl_logging_multiple_output_handler(
       location, severity, name, timestamp, format, args);

     However, this ends up calling g_rcl_logging_out_handlers[0 .. g_rcl_logging_num_out_handlers - 1], which are not updated by rcl_logging_fini(); fini only updates g_rcutils_logging_output_handler (see the sketch below).
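
A simplified sketch of that dispatch path is below (names follow rcl/src/rcl/logging.c, but the bodies and array size are illustrative, not the real code): rcl_logging_fini() only swaps g_rcutils_logging_output_handler back to the console handler, so a call that already resolved the old rclcpp handler inside rcutils_log() still walks the unmodified g_rcl_logging_out_handlers array and reaches rcl_logging_ext_lib_output_handler after the external logger has been shut down.

    // Simplified sketch of rcl's dispatch; the real definitions are in
    // rcl/src/rcl/logging.c, and the array size here is illustrative.
    #include <cstdarg>
    #include <cstdint>
    #include <rcutils/logging.h>  // rcutils_logging_output_handler_t, rcutils_log_location_t

    // Filled by rcl_logging_configure() with e.g. the rosout handler and
    // rcl_logging_ext_lib_output_handler; not cleared by rcl_logging_fini().
    static rcutils_logging_output_handler_t g_rcl_logging_out_handlers[4] = {nullptr};
    static uint8_t g_rcl_logging_num_out_handlers = 0;

    static void multiple_output_handler_sketch(
      const rcutils_log_location_t * location, int severity, const char * name,
      rcutils_time_point_value_t timestamp, const char * format, va_list * args)
    {
      // A log call that resolved the rclcpp output handler before
      // rcl_logging_fini() ran still ends up here and dispatches to the
      // external-library handler, whose g_root_logger is already nullptr.
      for (uint8_t i = 0; i < g_rcl_logging_num_out_handlers; ++i) {
        g_rcl_logging_out_handlers[i](location, severity, name, timestamp, format, args);
      }
    }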

@iuhilnehc-ynos
Collaborator Author

iuhilnehc-ynos commented Oct 27, 2020

Maybe there is another way to fix it:

rcl_ret_t rcl_logging_fini()
{
  rcl_ret_t status = RCL_RET_OK;
  rcutils_logging_set_output_handler(rcutils_logging_console_output_handler);
  // TODO.
+  g_rcl_logging_num_out_handlers = 1;
+  g_rcl_logging_out_handlers[0] = rcutils_logging_console_output_handler;

  if (g_rcl_logging_rosout_enabled) {
    status = rcl_logging_rosout_fini();
  }
  if (RCL_RET_OK == status && g_rcl_logging_ext_lib_enabled) {
    status = rcl_logging_external_shutdown();
  }

  return status;
}

Updated: I'll check it tomorrow.

@ivanpauno
Member

Maybe there is another way to fix it:

That patch makes sense to me.

@ivanpauno
Member

Fixed in ros2/rcl#844.
