Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logger::set_level does not lock the logging mutex #2120

Open
jrutgeer opened this issue Mar 7, 2023 · 9 comments
Open

Logger::set_level does not lock the logging mutex #2120

jrutgeer opened this issue Mar 7, 2023 · 9 comments
Labels
help wanted Extra attention is needed

Comments

@jrutgeer
Copy link
Contributor

jrutgeer commented Mar 7, 2023

Shouldn't set_level() also lock the global logging mutex?

Logger::set_level(Level level)
{
rcutils_ret_t rcutils_ret = rcutils_logging_set_logger_level(
get_name(),
static_cast<RCUTILS_LOG_SEVERITY>(level));

Since it writes to the rcutils hashmap, which has no locks (see also ros2/rcutils#397).

See also get_child() which does lock:

Logger::get_child(const std::string & suffix)
{
if (!name_) {
return Logger();
}
rcl_ret_t rcl_ret = RCL_RET_OK;
std::shared_ptr<std::recursive_mutex> logging_mutex;
logging_mutex = get_global_logging_mutex();
{
std::lock_guard<std::recursive_mutex> guard(*logging_mutex);
rcl_ret = rcl_logging_rosout_add_sublogger((*name_).c_str(), suffix.c_str());

@clalancette
Copy link
Contributor

In short: probably, yes.

The longer answer is that I think there are two different problems in this area of code:

  1. Logger::set_level doesn't seem to operate like the rest of the Logger class. In particular, the rest of the Logger class is manipulating things via rcl_logging_rosout, while this call is directly using rcutils_logging. Maybe that is just a case where we need to add another function to rcl_logging_rosout, but I'm not even sure of that. Looking at this design again, I'm not quite sure why everything here is manipulating rcl_logging_rosout; the logging is not rosout-specific. So there may be a deeper problem here.
  2. The global logging mutex seems to be a bit confused on what it wants to protect. It is currently used in these places:
    • NodeBase::NodeBase to lock around rcl_node_init (which ends up calling rcl_logging_rosout_init_publisher_for_node)
    • rclcpp_logging_output_handler to lock around rcl_logging_multiple_output_handler
    • Context::init to lock around rcl_logging_configure_with_output_handler
    • Logger::get_child to lock around rcl_logging_rosout_add_sublogger

These uses are all for logging stuff, true, but it seems like they span both regular logging and the rosout logging. It's not totally clear to me what the design here should be, but I do think it needs a bit of work. Just my 2 cents.

@clalancette clalancette added the help wanted Extra attention is needed label Mar 7, 2023
@jrutgeer
Copy link
Contributor Author

jrutgeer commented Mar 7, 2023

Looking at this design again, I'm not quite sure why everything here is manipulating rcl_logging_rosout; the logging is not rosout-specific. So there may be a deeper problem here.

Well, from an rclcpp point of view, you cannot think of console logging, rosout logging and external library logging as separate things, as they are called consecutively on each log call, and multiple log calls can be made concurrently from different threads.
So in that aspect it makes sense that there is one global rclcpp logging mutex. Though it could be an optimization to have three.

Any writing call to logging is locked:

NodeBase::NodeBase to lock around rcl_node_init

  • Upon creating a node, the rosout hashmap needs to be updated, so it contains the link for that node name to the publisher,
    • Hence the lock around rcl_node_init,

rclcpp_logging_output_handler to lock around rcl_logging_multiple_output_handler

  • Log calls from rclcpp are mutually exclusive
    • Hence this lock

context::init to lock around rcl_logging_configure_with_output_handler

  • You can have multiple contexts, but logging is initialized only once (per this check).
    • Concurrent calls to init should not be possible as there is an init_lock,
    • And logging should not be done before init is finished, except in case of bugs maybe. ;-)
    • So debatable whether this lock is mandatory, but it does not harm I suppose,
  • Note that this call initializes all three logging systems: here (which actually is a bug), here and here.

Logger::get_child to lock around rcl_logging_rosout_add_sublogger

  • On request of a child logger, the rosout hashmap needs to be updated so the child logger name is linked to the same publisher as the parent,
    • Hence this lock

So currently any write access to any logging is mutually excluded by a mutex lock, except for the Logger::set_level. Hence this issue.

Though, now that I am writing this, I just realized the following:

Read access of the loglevel, through rcutils_logging_logger_is_enabled_for(name, severity), currently does not have any locks around it. What happens in case of a concurrent call with rcutils_logging_set_logger_level? Currently this is not likely, as loglevels are only set on initialization through the command line arguments, but this might change that...

If this also needs a lock, then I think:

  • Either the lock should be moved from the rclcpp multiple output handler to the rclcpp log macros (i.e. around the whole if logger_is_enabled_for() { rcutils_log_internal();} call, or
  • Locking should be done at the level of rcutils, rosout and the external library.

First option is the easiest, and (without thinking it through) I think this would also fix this.

EDIT: this would not work as it locks both the read and the write part of the macro call. It is possible to split up at the rclcpp macro level, but the best solution is indeed as mentioned in ros2/rcutils#397.

Currently there only is a mutex in the external log library, not in rcutils and rosout.

@jrutgeer
Copy link
Contributor Author

jrutgeer commented Mar 7, 2023

What happens in case of a concurrent call with rcutils_logging_set_logger_level?

The same issue as rcutils #393 I suppose.


Either the lock should be moved from the rclcpp multiple output handler to the rclcpp log macros

Though that would mean that a log macro call from a low priority thread could block a call from a higher priority thread, even when disabled through the loglevel.

@jrutgeer
Copy link
Contributor Author

jrutgeer commented Mar 8, 2023

I realized that I concluded basically what is already written in ros2/rcutils#397.

@clalancette
Copy link
Contributor

clalancette commented Mar 8, 2023

Currently there only is a mutex in the external log library, not in rcutils and rosout.

Correct. Up until now, the design philosophy has always been to have the locking in the user-facing client libraries (rclcpp, rclpy, etc). This was both a decision to keep the lower layers simpler, as well as to make it easier to implement (as both C++ and Python have cross-platform locking solutions).

However, because of the dependency-injection style of design here at the logging layer, it doesn't really work for logging. We really do need to solve ros2/rcutils#397 .

@jrutgeer
Copy link
Contributor Author

jrutgeer commented Mar 8, 2023

Actually, could this be an alternative solution at the rclcpp level?

If I understand it correctly the two main needs for ros2/rcutils#397 are:

1 ) An optimization:

  • If a loglevel for a child logger is unset, then it looks up that of the ancestors,
  • So the hashmap should be updated so lookup doesn't always have to be repeated on each log call,

2 ) Ensure that rcutils_logging_set_logger_level is not concurrent with rcutils_logging_get_logger_level

  • Currently this is not likely to happen, as loglevels are typically only set during initialization (from the command line arguments), and at that time there are no logging calls yet,
  • But External logging level configuration ros2#1355 will make it more likely and hence might lead to sporadic crashes.

So what if:

  • Logger::get_child() is changed so it also:
    • Looks up its loglevel (i.e. possibly that of an ancestor), with rclcpp read lock around the call, and
    • Calls rcutils_logging_set_logger_level (with rclcpp write lock) so it is stored into the hashmap,
  • The optimization code in rcutils_logging_get_logger_effective_level stays removed,
    • It's no longer needed for children of node loggers, due to above change to get_child(),
    • Children of named loggers (e.g. rclcpp::get_logger("parent.child")) don't have the optimization, since you can't call get_child() on a named logger. But that is not a common use case (and could be discouraged in the documentation),
  • The rclcpp logging macros are changed (more or less a duplicate of the rcutils macros) so that an rclcpp readlock can be put arround the rcutils_logging_logger_is_enabled_for(name, severity) call and an rclcpp writelock around the rcutils_log_internal call.
  • All other 'writing' calls (init, node_init, set_level etc) also get the write lock.

@clalancette
Copy link
Contributor

  • Children of named loggers (e.g. rclcpp::get_logger("parent.child")) don't have the optimization, since you can't call get_child() on a named logger. But that is not a common use case (and could be discouraged in the documentation),

This situation is more common than you might think. In particular, when a user creates an action server, for instance, we create children named loggers for some of the internal things.

Separately, if we are going to do this work for rclcpp, we'll also need to do the same for rclpy for the same reasons.

All of that said, I do think that this is an easier and more viable way forward than trying to change rcutils. It improves the situation (and performance) for many common cases, and makes us thread-safe where we currently aren't.

So with that said, @jrutgeer are you up for trying an implementation? I'm happy to do a review.

@jrutgeer
Copy link
Contributor Author

jrutgeer commented Mar 16, 2023

@clalancette

This situation is more common than you might think. In particular, when a user creates an action server, for instance, we create children named loggers for some of the internal things.

I am not sure. I had checked that, and unless I am missing something, these are all node loggers, not named loggers.

are you up for trying an implementation?

Not sure yet, it is hard for me to assess how long it would take me. I haven't written any code since 2006. :-)

Also I'm not sure yet about the best solution:

A readers-writer lock isn't ideal, because this still blocks on the reader side if it concurs with a write. Ideally it should be possible to use logging macros in realtime code during testing, and disable them through the loglevel for normal use. But that implies a non-blocking read of the loglevel.

So what about following alternative:

  • Upon construction of an rclcpp::Logger, the loglevel is read from the hashtable (with read lock) and cached locally,
    • If there is no loglevel in the hashtable, the cached value is set to the default loglevel,
    • Copy constructor copies the cached value (no lookup in the hashtable)
  • Upon log macro call, rcutils_logging_get_logger_effective_level(name) is called only when a try_lock() succeeds,
    • If the try_lock() fails, the cached loglevel value is used instead,
    • Upon success, the cached loglevel is updated if needed.

A further optimization for node loggers could be not to call rcutils_logging_get_logger_effective_level(name) for node loggers, but always use the cached value. Since loglevel can only be changed upon initialization (i.e. before the node constructor) or through the upcoming service call, which should then update not only the hash table but also the cached value.
The annoying thing is that you can call that service with the name of another node. In that case only the hashtable would be updated and not the other node's local variable. So the service call would work when called with:

  • Its own node name (as this would change the hash table and the local variable),
  • Any named logger's name (as this would change the hash table, and named loggers always use above try_lock() procedure),
  • But not with another node's name (as the hashtable would be updated but not their local variable).
    Possible solution could be to also store a (weak?) pointer to the node in the hash table for each node logger entry, so upon service call the local variable can be set.

Also in the case of multiple nodes with the same name it would only work for the node that executes the service call.

@jrutgeer
Copy link
Contributor Author

According to this, std::mutex::try_lock() can fail spuriously, what in above case isn't really an issue (it'll just mean the cached value is used for that macro call, but on the next one it should succeed), but also that std::mutex::unlock() can block, which is an issue for realtime.

They propose the use of a spinlock intead of a mutex.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants