Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roscpp: Multiple latched publishers per process on the same topic (fix #146) #1544

Merged

Conversation

meyerj
Copy link
Contributor

@meyerj meyerj commented Nov 22, 2018

At the moment roscpp (and probably most other client libraries) does not support multiple latched publishers on the same topic within a single process, because the latched message is stored in the Publication singleton instead of in individual publishers (#146) and each publication of another publisher invalidates the previous latched message. This situation is unfortunate because in many cases it is hard to control which components in a single process do publish latched topics and how they are deployed, e.g. for Gazebo or move_base plugins.

My colleague @achim-k and I have been working on a patch for roscpp that enables multiple latched subscribers in a single process, following my proposal in #146 (comment). The solution is similar to what @mgrrx has done for Python.

While this patch does solve the issue for online publishing and our current needs, I do not see a good general solution for rosbag yet. It would be possible to instantiate one patched Publisher per connection during the replay, such that latched topics from different original publishers would mimic the recorded situation correctly during the replay. rosbag already creates one internal publisher per topic and callerid. So with this patch in roscpp latched publications of different nodes will be replayed correctly (see also #1537). But without modifications to the ROS protocol there is no way to tell which internal publisher instance of a node published which message if there is more than one, which would be required to replay messages as seen during the recording and to know which message invalidates which message seen earlier on the same topic.

For the special case of replaying tf_static it might be an option to collect static transforms while iterating over the bag file and each time publish a new TFMessage message that contains all found transformations so far.

Unfortunately the patch is not ABI compatible and also changes the behavior significantly, so it can not be merged in melodic-devel or even backported to kinetic.

@ablakey
Copy link

ablakey commented Nov 26, 2018

This is interesting. Are there common cases for wanting multiple latched publishers?

From my perspective, this frustrates the idea that topics are streams of state and a single message is a moment of state. I'm assuming your desire is for new subscribers to receive (in or out of order?) a series of messages from n latched publishers on a single topic.

You mention multiple publishers in the same process. Have you considered using a wrapper message that allows you to publish an array of that message type to a topic instead?

Consider that treating a collection of messages as a single moment of state frustrates rosbag (as you've alluded to) and other common tools/processes.

@flixr
Copy link
Contributor

flixr commented Nov 26, 2018

We also use multiple "latched" publishers on the same topic (in our case mostly /diagnostics) for things that we don't want to republish at fixed intervals as longs as they don't change.
The workaround we use is to use a normal publisher, but monitor the subscribers and just republish every time a new subscriber is added.

@meyerj
Copy link
Contributor Author

meyerj commented Nov 27, 2018

From #1544 (comment):

This is interesting. Are there common cases for wanting multiple latched publishers?

The most prominent use-case is tf_static. It is conceptually broken in my opinion if it is not possible to latch more than one message per process. Using a shared tf2_ros::StaticTransformBroadcaster is not an option either because in many cases publishers do not know about each other (nodelets, plugins, ...).

From my perspective, this frustrates the idea that topics are streams of state and a single message is a moment of state. I'm assuming your desire is for new subscribers to receive (in or out of order?) a series of messages from n latched publishers on a single topic.

Exactly. It should not make a difference whether there are multiple latched publishers in separate processes or in a single process. Each publisher delivers its last message.

You mention multiple publishers in the same process. Have you considered using a wrapper message that allows you to publish an array of that message type to a topic instead?

Not an option for the tf_static use-case where the message type cannot be changed. Or did I misunderstand you?

Consider that treating a collection of messages as a single moment of state frustrates rosbag (as you've alluded to) and other common tools/processes.

... depending on the queue size of subscribers. If the queue size is greater or equal the number of publishers, there is no problem with receiving more than one latched message from the same node. tf2_ros::TransformListener creates subscribers on /tf and /tf_static with a queue size of 100. Same for rosbag record.

From #1544 (comment):

The workaround we use is to use a normal publisher, but monitor the subscribers and just republish every time a new subscriber is added.

Yes, that would be a viable workaround to "emulate" latched publishing, in cases where you can control the creation of the publisher and inject custom subscriber callbacks. Nevertheless, the approach proposed here is more general and I cannot think of a use-case where it is actually desirable that one publisher "overwrites" the message published by another publisher instance.

It might be an option to apply this method to tf2_ros::StaticTransformPublisher, which could probably be done without breaking ABI. In combination with a patch to rosbag for special handling of tf_static as proposed above, at least this use case could be solved properly (for roscpp and rospy).

@meyerj
Copy link
Contributor Author

meyerj commented Nov 27, 2018

Jenkins build errors seem to be unrelated?

@cwecht
Copy link
Contributor

cwecht commented Apr 12, 2019

@ros-pull-request-builder retest this please

@peci1
Copy link
Contributor

peci1 commented Sep 8, 2019

Bump?

@meyerj meyerj force-pushed the fix/multiple_latched_publishers branch from 5676494 to dba6c6e Compare March 14, 2020 09:17
@meyerj
Copy link
Contributor Author

meyerj commented Mar 14, 2020

Any chance that this patch gets included for the upcoming ROS Noetic or any future ROS 1 release, alongside a similar patch by @mgrrx for rospy? Or is it considered too much of behavioral change without a proper design review process?

In combination with #1537 the resulting behavior of recording and replaying latched topics (e.g. /tf_static) is almost correct, in the sense that the situation seen by other subscribers during replay is close to the situation at the time the bag file was recorded. Only in cases where the original publisher node had multiple latched publishers, the data recorded in bag files cannot distinguish them anymore because the callerid is the same for all of them (see PR description).

@dirk-thomas dirk-thomas changed the base branch from melodic-devel to noetic-devel April 20, 2020 22:03
@dirk-thomas
Copy link
Member

Any chance that this patch gets included for ... any future ROS 1 release

Atm Noetic is the last ROS 1 distribution Open Robotics plans to release. So its either Noetic or potentially never since the patch is ABI / behavior breaking (which makes it a no-go for an already released distro).

Any chance that this patch gets included for the upcoming ROS Noetic.

I am slightly concerned that the explicit check for mismatching publisher option could lead to existing code to break.

That being said I understand why this enhancement would be very valuable to have. If necessary we can hopefully address issues in an ABI compatible way in a patch version after the initial release.

So 👍 from me to merge for Noetic. Thanks for working on this.

alongside a similar patch by @mgrrx for rospy?

It doesn't seem there is a PR for the Python side but I don't consider that a blocker for this,

@dirk-thomas dirk-thomas merged commit 3fc96bb into ros:noetic-devel May 14, 2020
nim65s added a commit to nim65s/robotpkg that referenced this pull request Mar 12, 2021
Because DEPEND_ABI.ros-comm.noetic?= ros-comm>=1.15

1.15.9 (2020-10-16)
-------------------
* Fix deadlock when service connection is dropped (ros/ros_comm#2074)
* Update maintainers (ros/ros_comm#2075)
* Fix case where accessing cached parameters shuts down another node (ros/ros_comm#2068)
* Fix spelling (ros/ros_comm#2066)
* Fix Lost Wake Bug in ROSOutAppender (ros/ros_comm#2033)
* Fix compatibility issue with boost 1.73 and above (ros/ros_comm#2023)
* Gracefully stop recording upon SIGTERM and SIGINT (ros/ros_comm#2038)
* Use heapq.merge instead of custom merge sort code (ros/ros_comm#2017)
* Fix handling of single quotes in command arguments on Windows (ros/ros_comm#2051)
* Clearer error message (ros/ros_comm#2035)
* Ignore underscores when parsing literal numeric values for Python 3 compatibility (ros/ros_comm#2022)
* Clear cached URI for a node that has gone offline (ros/ros_comm#2010)
* Add skip_cache parameter to rosnode_ping() (ros/ros_comm#2009)
* Install advertisetest (ros/ros_comm#2046)
* Use range instead of xrange for Python 3 compatibility (ros/ros_comm#2013)
* Fix to address CVE-2020-16124 (ros/ros_comm#2065)
* Fix XmlRpcValue::_doubleFormat being unused (ros/ros_comm#2003)

1.15.8 (2020-07-23)
-------------------
* change is_async_connected to use epoll when available (ros/ros_comm#1983)
* allow mixing latched and unlatched publishers (ros/ros_comm#1991)
* remove not existing NodeProxy from rospy __all_\_ (ros/ros_comm#2007)
* fix typo in topics.py (ros/ros_comm#1977)
* fix bad relative import (still Python 2 style) (ros/ros_comm#1973)
* improve shutdown message with duplicate node name (ros/ros_comm#1992)
* remove dependency on rostopic from rostest package (ros/ros_comm#2002)
* fix missing reload() function in Python 3 (ros/ros_comm#1968)
* add latch param to throttle (ros/ros_comm#1944)
* add const versions of XmlRpcValue converting operators (ros/ros_comm#1978)

1.15.7 (2020-05-28)
-------------------
* fix Windows build break (ros/ros_comm#1961)
* fix NameError in launch error handling (ros/ros_comm#1965)

1.15.6 (2020-05-21)
-------------------
* fix a bug that using a destroyed connection object (ros/ros_comm#1950)

1.15.5 (2020-05-15)
-------------------
* check if async socket connect is success or failure before TransportTCP::read() and TransportTCP::write() (ros/ros_comm#1954)
* fix bug that connection drop signal related funtion throw a bad_weak exception (ros/ros_comm#1940)
* multiple latched publishers per process on the same topic (ros/ros_comm#1544)
* fix negative numbers in ros statistics (ros/ros_comm#1531)
* remove extra \n in ROS_DEBUG (ros/ros_comm#1925)
* add option to repeat latched messages at the start of bag splits (ros/ros_comm#1850)
* fix bag migration failures caused by typo in connection_header assignment (ros/ros_comm#1952)
* fix brief description comments after members (ros/ros_comm#1920)
* add --sigint-timeout and --sigterm-timeout parameters (ros/ros_comm#1937)
* roslaunch-check: search dir recursively (ros/ros_comm#1914)
* sort printed nodes by namespace alphabetically (ros/ros_comm#1934)
* remove pycrypto import (not used) (ros/ros_comm#1922)
* avoid infinite recursion in rosrun tab completion when rosbash is not installed (ros/ros_comm#1948)
* fix bare pointer in topic_tools::ShapeShifter (ros/ros_comm#1722)
* clear message queue on simtime jumping back (ros/ros_comm#1518)
* use undefined dynamic_lookup on macOS (ros/ros_comm#1923)
* check if enough FDs are free, instead counting the total free FDs (ros/ros_comm#1929)

1.15.4 (2020-03-19)
-------------------
* restrict boost dependencies to components used (ros/ros_comm#1871)
* add exception for ConnectionAbortedError (ros/ros_comm#1908)
* fix mac trying to use epoll instead of kqueue (ros/ros_comm#1907)
* fix AttributeError: __exit__ (ros/ros_comm#1915, regression from 1.14.4)

1.15.3 (2020-02-28)
-------------------
* remove Boost version check since Noetic only targets platforms with 1.67+ (ros/ros_comm#1903)

1.15.2 (2020-02-25)
-------------------
* export missing Boost dependency (ros/ros_comm#1898)
* add timestamp formatting for rosconsole (ros/ros_comm#1892)

1.15.1 (2020-02-24)
-------------------
* fix missing boost dependencies (ros/ros_comm#1895)
* use setuptools instead of distutils (ros/ros_comm#1870)
* increase time limit of advertisetest/publishtest.test to reduce flakyness (ros/ros_comm#1897)

1.15.0 (2020-02-21)
-------------------
* fix dictionary changed size during iteration (ros/ros_comm#1894)
* update test to pass with old and new yaml (ros/ros_comm#1893)

Packaging changes:
- removed patch-an, as there are no more boost version checks
- updated patch-ao
@meyerj meyerj deleted the fix/multiple_latched_publishers branch May 4, 2021 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants