Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Lost Wake Bug in ROSOutAppender #2033

Merged

Conversation

afakihcpr
Copy link
Contributor

Description

This fixes a bug in ROSOutAppender that consists of waiting on the condition variable queue_condition_ without checking if the log_queue_ is empty.

In this situation if the log_queue_ had some messages that were inserted while ROSOutAppender::logThread was publishing other messages, the new messages in the queue won't be published until another message eventually is added.

Note that the notify_all sent in the destructor would not cause the unpublished messages to get published upon node termination. This is because when the queue_condition_ is awaken by this notification, shutting_down_ would be true and would cause ROSOutAppender::logThread to return immediately.

How to Reproduce

One way to reproduce is illustrated in the snippet below. Send a bunch a bunch of log messages right after each other. Running multiple times, in many instances the last few messages won't be received on rosout. With the fix, all the messages are received everytime.

#include<ros/ros.h> 
#include <ros/topic_manager.h> 
#include <ros/publication.h> 
#include<chrono> 
#include<thread> 

int main(int argc, char** argv)  
{ 
  ros::init(argc, argv, "test"); 
  
  ros::NodeHandle n; 

  // Wait for rosout to subscribe 
  auto pub_ptr = ros::TopicManager::instance()->lookupPublication("/rosout"); 
  while (!pub_ptr->getNumSubscribers ()) 
  { 
    std::this_thread::yield(); 
  } 

  // Send a bunch of messages right after each other 
  const auto num_msgs = 10; 
  for (auto i = 0; i < num_msgs;)  
  { 
    ROS_INFO_STREAM("Message " << ++i << " of " << num_msgs); 
  } 

  // If another message is sent after a while, the missing messages will show up 
  /*using namespace std::chrono_literals; 
  std::this_thread::sleep_for(30s); 
  //Send another message to wake up the condition 
  ROS_INFO("Message after wait");*/ 

  ros::spin(); 
} 

This fixes a bug in ROSOutAppender that consists of waiting on the condition_variable `queue_condition_` without checking if the `log_queue_` is empty.

In this situation if the `log_queue_` had some messages that were inserted while `ROSOutAppender::logThread` was publishing other messages, the new messages in the queue won't be published, until another message eventually is added.

Note that the `notify_all` sent in the destructor would not cause the unpublished messages to get published, as when the `queue_condition_`  is awaken by this notification, `shutting_down_` would be true and would cause `ROSOutAppender::logThread`  to return immediately.
Copy link
Contributor

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, i think that this makes sense.

@dirk-thomas
Copy link
Member

@ros-pull-request-builder retest this please

@dirk-thomas
Copy link
Member

Thanks for the patch.

@dirk-thomas dirk-thomas merged commit 50897ce into ros:noetic-devel Sep 10, 2020
jacobperron pushed a commit that referenced this pull request Oct 16, 2020
This fixes a bug in ROSOutAppender that consists of waiting on the condition_variable `queue_condition_` without checking if the `log_queue_` is empty.

In this situation if the `log_queue_` had some messages that were inserted while `ROSOutAppender::logThread` was publishing other messages, the new messages in the queue won't be published, until another message eventually is added.

Note that the `notify_all` sent in the destructor would not cause the unpublished messages to get published, as when the `queue_condition_`  is awaken by this notification, `shutting_down_` would be true and would cause `ROSOutAppender::logThread`  to return immediately.

Co-authored-by: Adel Fakih <adel.fakih@avidbots.com>
jacobperron pushed a commit that referenced this pull request Oct 22, 2020
This fixes a bug in ROSOutAppender that consists of waiting on the condition_variable `queue_condition_` without checking if the `log_queue_` is empty.

In this situation if the `log_queue_` had some messages that were inserted while `ROSOutAppender::logThread` was publishing other messages, the new messages in the queue won't be published, until another message eventually is added.

Note that the `notify_all` sent in the destructor would not cause the unpublished messages to get published, as when the `queue_condition_`  is awaken by this notification, `shutting_down_` would be true and would cause `ROSOutAppender::logThread`  to return immediately.

Co-authored-by: Adel Fakih <adel.fakih@avidbots.com>
nim65s added a commit to nim65s/robotpkg that referenced this pull request Mar 12, 2021
Because DEPEND_ABI.ros-comm.noetic?= ros-comm>=1.15

1.15.9 (2020-10-16)
-------------------
* Fix deadlock when service connection is dropped (ros/ros_comm#2074)
* Update maintainers (ros/ros_comm#2075)
* Fix case where accessing cached parameters shuts down another node (ros/ros_comm#2068)
* Fix spelling (ros/ros_comm#2066)
* Fix Lost Wake Bug in ROSOutAppender (ros/ros_comm#2033)
* Fix compatibility issue with boost 1.73 and above (ros/ros_comm#2023)
* Gracefully stop recording upon SIGTERM and SIGINT (ros/ros_comm#2038)
* Use heapq.merge instead of custom merge sort code (ros/ros_comm#2017)
* Fix handling of single quotes in command arguments on Windows (ros/ros_comm#2051)
* Clearer error message (ros/ros_comm#2035)
* Ignore underscores when parsing literal numeric values for Python 3 compatibility (ros/ros_comm#2022)
* Clear cached URI for a node that has gone offline (ros/ros_comm#2010)
* Add skip_cache parameter to rosnode_ping() (ros/ros_comm#2009)
* Install advertisetest (ros/ros_comm#2046)
* Use range instead of xrange for Python 3 compatibility (ros/ros_comm#2013)
* Fix to address CVE-2020-16124 (ros/ros_comm#2065)
* Fix XmlRpcValue::_doubleFormat being unused (ros/ros_comm#2003)

1.15.8 (2020-07-23)
-------------------
* change is_async_connected to use epoll when available (ros/ros_comm#1983)
* allow mixing latched and unlatched publishers (ros/ros_comm#1991)
* remove not existing NodeProxy from rospy __all_\_ (ros/ros_comm#2007)
* fix typo in topics.py (ros/ros_comm#1977)
* fix bad relative import (still Python 2 style) (ros/ros_comm#1973)
* improve shutdown message with duplicate node name (ros/ros_comm#1992)
* remove dependency on rostopic from rostest package (ros/ros_comm#2002)
* fix missing reload() function in Python 3 (ros/ros_comm#1968)
* add latch param to throttle (ros/ros_comm#1944)
* add const versions of XmlRpcValue converting operators (ros/ros_comm#1978)

1.15.7 (2020-05-28)
-------------------
* fix Windows build break (ros/ros_comm#1961)
* fix NameError in launch error handling (ros/ros_comm#1965)

1.15.6 (2020-05-21)
-------------------
* fix a bug that using a destroyed connection object (ros/ros_comm#1950)

1.15.5 (2020-05-15)
-------------------
* check if async socket connect is success or failure before TransportTCP::read() and TransportTCP::write() (ros/ros_comm#1954)
* fix bug that connection drop signal related funtion throw a bad_weak exception (ros/ros_comm#1940)
* multiple latched publishers per process on the same topic (ros/ros_comm#1544)
* fix negative numbers in ros statistics (ros/ros_comm#1531)
* remove extra \n in ROS_DEBUG (ros/ros_comm#1925)
* add option to repeat latched messages at the start of bag splits (ros/ros_comm#1850)
* fix bag migration failures caused by typo in connection_header assignment (ros/ros_comm#1952)
* fix brief description comments after members (ros/ros_comm#1920)
* add --sigint-timeout and --sigterm-timeout parameters (ros/ros_comm#1937)
* roslaunch-check: search dir recursively (ros/ros_comm#1914)
* sort printed nodes by namespace alphabetically (ros/ros_comm#1934)
* remove pycrypto import (not used) (ros/ros_comm#1922)
* avoid infinite recursion in rosrun tab completion when rosbash is not installed (ros/ros_comm#1948)
* fix bare pointer in topic_tools::ShapeShifter (ros/ros_comm#1722)
* clear message queue on simtime jumping back (ros/ros_comm#1518)
* use undefined dynamic_lookup on macOS (ros/ros_comm#1923)
* check if enough FDs are free, instead counting the total free FDs (ros/ros_comm#1929)

1.15.4 (2020-03-19)
-------------------
* restrict boost dependencies to components used (ros/ros_comm#1871)
* add exception for ConnectionAbortedError (ros/ros_comm#1908)
* fix mac trying to use epoll instead of kqueue (ros/ros_comm#1907)
* fix AttributeError: __exit__ (ros/ros_comm#1915, regression from 1.14.4)

1.15.3 (2020-02-28)
-------------------
* remove Boost version check since Noetic only targets platforms with 1.67+ (ros/ros_comm#1903)

1.15.2 (2020-02-25)
-------------------
* export missing Boost dependency (ros/ros_comm#1898)
* add timestamp formatting for rosconsole (ros/ros_comm#1892)

1.15.1 (2020-02-24)
-------------------
* fix missing boost dependencies (ros/ros_comm#1895)
* use setuptools instead of distutils (ros/ros_comm#1870)
* increase time limit of advertisetest/publishtest.test to reduce flakyness (ros/ros_comm#1897)

1.15.0 (2020-02-21)
-------------------
* fix dictionary changed size during iteration (ros/ros_comm#1894)
* update test to pass with old and new yaml (ros/ros_comm#1893)

Packaging changes:
- removed patch-an, as there are no more boost version checks
- updated patch-ao
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants