No callback when AUTOMATIC liveliness is lost #384
Comments
@MiguelCompany @richiprosima for awareness.
@richiware friendly ping :) I've moved this to the backlog, meaning we're not likely to look at it again anytime soon. Would this be something eProsima could work on?
CC: @MiguelCompany
Uh. That didn't use to be the case. @eboasson, mind taking a look?
Could it be related to the fact that in that test both nodes exist in the same process space and use the same participant?
@hidmic I too find this quite surprising; I'll have a look ASAP.
@hidmic this I believe to be the case, based on Cyclone DDS trace output and a bit of tracing using the debugger. I don't know why exactly; the python code does make it to the invocation of the destroy call. With somewhat stripped debugger output:
@hidmic Looking at nightly CI results for …
Uh, indeed. That package's …
@eboasson see ros2/system_tests#441.
@hidmic the only test that fails with … Maybe I'm running the wrong test? I did merge in ros2/system_tests#441 first.
@hidmic Now that eProsima/Fast-DDS#1300 is merged, this should have been fixed.
I don't see how it's going to work without changes to … With

```python
def destroy(self):
    print(f'publisher {self.topic}: destroy handle')
    self.handle.destroy()
```

it is quite clear that the liveliness demo code indeed tries to delete the node and the publisher, but it doesn't end up in the corresponding C destructor code. Therefore, there is no way the RMW layer or the underlying middleware can be expected to signal liveliness lost:
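For illustration, here is a minimal rclpy sketch of the publisher-side flow being described (node, topic, and QoS values are assumptions, not the demo's actual code): a publisher is created with AUTOMATIC liveliness and then explicitly destroyed, and it is this teardown that has to reach the C layer before the middleware can report the corresponding liveliness change.

```python
# Illustrative sketch only (assumed names/values, not the demo's actual code):
# create a publisher with AUTOMATIC liveliness, then destroy it explicitly.
import rclpy
from rclpy.duration import Duration
from rclpy.qos import QoSLivelinessPolicy, QoSProfile
from std_msgs.msg import String

rclpy.init()
node = rclpy.create_node('talker')

qos = QoSProfile(
    depth=10,
    liveliness=QoSLivelinessPolicy.AUTOMATIC,
    liveliness_lease_duration=Duration(seconds=1),
)
publisher = node.create_publisher(String, 'topic', qos)

# Explicit teardown, as the demo does; this is what must reach the C layer
# for the middleware to stop asserting liveliness for this writer.
node.destroy_publisher(publisher)
node.destroy_node()
rclpy.shutdown()
```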
@eboasson there is a publisher being instantiated. Nonetheless, I'd intuitively expect a subscriber with a given deadline QoS to complain if that deadline is missed regardless of ongoing publications or even publisher existence.
True that. Fixing.
Interestingly enough, my expectation does not match the DDS definition of deadline, and transitively, neither does the ROS definition.
@eboasson you're right, but the issue doesn't show itself when using FastDDS. Investigating.
If I'm not mistaken, there is no publisher in the "test_deadline_no_publisher" test: while it does instantiate a …

Not generating deadline missed notifications if there is no instance is the way it is defined in the DDS spec, and I do think it is the right choice: in the DDS data model, there are keys identifying instances and deadline misses are tied to instances (where it actually informs you for which instance a deadline was missed). Naturally you don't want these notifications for all instances that might come into existence in the future. The case where there are no keys (which ROS 2 uses) is treated by the spec as the limit case where there is at most 1 instance, and so also requires the presence of data.

I happen to agree with that choice, but one could perfectly well argue that in that case there should be no concept of an instance. Doing that breaks various other things through nasty interactions, for example, the case of a transient/persistent topic: without an instance, you can't dispose data; without disposing data, you can't get rid of transient/persistent data. Of course one can work around such details by adding special cases, but there are too many special cases in the specs already ...

The other issue with generating deadline missed notifications without data being present is that it is not unusual for a system to take a fair amount of time to initialise and then run at a sustained high rate. If you'd use a deadline that makes sense for the update rate, you'd be getting spurious deadline missed notifications during initialisation. Of course you could ignore those, or (in the DDS interface anyway) start with the notifications disabled, enabling them when the system becomes operational, but that has its own downsides. The usual OMG solution to such a problem is to make the behaviour configurable, but if you ask me, there are too many QoS settings as it is ...
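To make the spec behaviour described above concrete, here is a minimal rclpy sketch (topic name, message type, and QoS values are illustrative assumptions, not taken from the test) of a subscription that requests a deadline and registers a requested-deadline-missed callback; under that behaviour, the callback should not start firing until at least one sample has been received.

```python
# Illustrative sketch only (assumed names/values): a subscription with a
# deadline QoS and a requested-deadline-missed callback. As discussed above,
# DDS reports deadline misses per instance, so nothing fires before the
# first sample arrives.
import rclpy
from rclpy.duration import Duration
from rclpy.qos import QoSProfile
from rclpy.qos_event import SubscriptionEventCallbacks
from std_msgs.msg import String

rclpy.init()
node = rclpy.create_node('deadline_listener')

qos = QoSProfile(depth=10, deadline=Duration(seconds=1))

def on_deadline_missed(event):
    # event.total_count / event.total_count_change describe the misses
    node.get_logger().warn(f'deadline missed (total: {event.total_count})')

node.create_subscription(
    String, 'topic', lambda msg: None, qos,
    event_callbacks=SubscriptionEventCallbacks(deadline=on_deadline_missed),
)

rclpy.spin(node)
```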
Found the issue with …
Bug report
Required Info:
Steps to reproduce issue
Try out the liveliness QoS demo using rmw_fastrtps_cpp. The default liveliness policy is already AUTOMATIC, but you may enforce it using the --policy option:

Expected behavior
Liveliness change events are received and printed on screen.
Actual behavior
No liveliness change event is reported.
Additional information
The talker node in this demo is explicitly destroyed after some time. This issue does not show itself when using other RMW implementations like CycloneDDS, so I suspect that either the underlying DDS data writer is not getting destroyed or Fast-RTPS keeps asserting liveliness even after the DDS data writer is gone.
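For reference, here is a minimal rclpy sketch of the listener side implied by the expected behavior (node, topic, and QoS values are assumptions, not the demo's actual code): a subscription registers a liveliness-changed callback, which should fire with a drop in alive_count once the talker's writer is destroyed or its lease lapses.

```python
# Illustrative sketch only (assumed names/values): a subscription that
# registers a liveliness-changed callback. With rmw_fastrtps_cpp this is the
# kind of callback that is never invoked after the talker is destroyed.
import rclpy
from rclpy.duration import Duration
from rclpy.qos import QoSLivelinessPolicy, QoSProfile
from rclpy.qos_event import SubscriptionEventCallbacks
from std_msgs.msg import String

rclpy.init()
node = rclpy.create_node('liveliness_listener')

qos = QoSProfile(
    depth=10,
    liveliness=QoSLivelinessPolicy.AUTOMATIC,
    liveliness_lease_duration=Duration(seconds=1),
)

def on_liveliness_changed(event):
    # alive_count should drop when the talker's writer goes away
    print(f'liveliness changed: alive={event.alive_count} '
          f'change={event.alive_count_change}')

node.create_subscription(
    String, 'topic', lambda msg: None, qos,
    event_callbacks=SubscriptionEventCallbacks(liveliness=on_liveliness_changed),
)

rclpy.spin(node)
```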