Subscription and intra-process subscription counts are not updated in sync #668

cvasfi · 2019-03-26T17:56:09Z

Bug report

Intra-process subscription count of Publisher is not updated in sync with the subscription count when intra-process communication is used.
When the get_subscription_count() method returns the updated count, get_intra_process_subscription_count() may not return the updated count yet. This behaviour causes the unit tests in test_publisher_subscription_count_api to fail occasionally.

Required Info:

Operating System:
- Ubuntu 16.04
Installation type:
- Source
Version or commit hash:
- 8743bcb
DDS implementation:
- Fast-RTPS
Client library (if applicable):
- rclcpp

Steps to reproduce issue

Repeat the unit test test_publisher_subscription_count_api until failure. Though this may be hard to reproduce locally, here's an instance of such a failure: https://ci.ros2.org/view/nightly/job/nightly_linux_repeated/1323/testReport/(root)/projectroot/test_publisher_subscription_count_api/

Above test logs shows that the equality conditions on get_subscription_count() are satisfied yet the conditions on get_intra_process_subscription_count() are not met.

Expected behavior

get_subscription_count() and get_intra_process_subscription_count() methods should be in sync when intra-process communication is used.

Actual behavior

The methods return different results.

Additional information

The text was updated successfully, but these errors were encountered:

wjwwood · 2019-03-26T19:40:42Z

get_subscription_count() and get_intra_process_subscription_count() methods should be in sync when intra-process communication is used.

That's not possible, they are two separate subscriptions (two separate DataWriters under the hood if using DDS).

As currently stated, the expected behavior you put forward is not possible and would be a won't fix.

There might other ways to address the symptom ("This behaviour causes the unit tests in test_publisher_subscription_count_api to fail occasionally."), but making them update in sync is, in my opinion, not the correct way to fix the problem, nor is it possible to implement right now.

ivanpauno · 2019-03-27T12:16:04Z

I agree with @wjwwood, that syncing them is not a good idea.

But it shouldn't fail with get_intra_process_subscription_count, as this is not being asked to the underlying DDS "datawriter". It's obtained from the size of an internal map in the intraprocess manager, which is updated when creating the publisher and when deleting it.

rclcpp/rclcpp/include/rclcpp/intra_process_manager_impl.hpp

Lines 254 to 272 in b352d45

    
           size_t 
        
           get_subscription_count(uint64_t intra_process_publisher_id) const 
        
           { 
        
             auto publisher_it = publishers_.find(intra_process_publisher_id); 
        
             if (publisher_it == publishers_.end()) { 
        
               // Publisher is either invalid or no longer exists. 
        
               return 0; 
        
             } 
        
             auto publisher = publisher_it->second.publisher.lock(); 
        
             if (!publisher) { 
        
               throw std::runtime_error("publisher has unexpectedly gone out of scope"); 
        
             } 
        
             auto sub_map_it = subscription_ids_by_topic_.find(publisher->get_topic_name()); 
        
             if (sub_map_it == subscription_ids_by_topic_.end()) { 
        
               // No intraprocess subscribers 
        
               return 0; 
        
             } 
        
             return sub_map_it->second.size(); 
        
           }

I will check later if I can repro this, and see if I can understand why sometimes the count is not being updated.

wjwwood · 2019-03-27T21:28:49Z

@ivanpauno I assumed we were asking the underlying datawriter on the intra process topic. Based on what you said, my theory no longer makes sense to me...

I would appreciate it if you could look into @ivanpauno (considering you also wrote the test, you might have unique insight).

ivanpauno · 2019-03-28T12:25:28Z

@ivanpauno I assumed we were asking the underlying datawriter on the intra process topic.

We aren't doing that, because it would return a wrong count in the case of two process with publishers and subscriber in the same topic using intraprocess comm.

I would appreciate it if you could look into @ivanpauno (considering you also wrote the test, you might have unique insight).

I've already reproduced the issue. I'm trying to figure out what the problem is.

* Add tests for node_options usage * Add tests for copying options arguments * Add bad argument tests for wait sets * Add tests for wait_set_get_allocator function * Add test for rcl_take_response function * Change take_request to match take_response without info * Change specific test for sequence_number * Add tests for client take failed * Remove tests already done in nominal setup * Add bad arguments tests * Add test for init returning BAD_ALLOC * Add test for client get_options * Add test for already init client * Add bad argument tests * Add basic test get_default_domain_id * Add test for rmw to rcl return code function * Add test for get_localhost_only * Add tests for localhost expected usage * Address peer review comments * Add test for env variable leading to ULONG_MAX * Change return values to enum in test * Fix rcl_get_localhost_only return value * Address peer review comments * Add unexpected value to get_localhost * Add reset_rcl_error after expected fails Signed-off-by: Jorge Perez <jjperez@ekumenlabs.com>

cvasfi changed the title ~~Subscription count and intra-process subscription counts are not updated in sync~~ Subscription and intra-process subscription counts are not updated in sync Mar 26, 2019

ivanpauno self-assigned this Mar 28, 2019

ivanpauno added the in progress Actively being worked on (Kanban column) label Mar 28, 2019

ivanpauno mentioned this issue Mar 28, 2019

Replace const char * with a std::array<char, TOPIC_NAME_LENGTH> as the key of IPM IDTopicMap #671

Merged

ivanpauno added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Mar 28, 2019

ivanpauno closed this as completed in #671 Apr 4, 2019

ivanpauno removed the in review Waiting for review (Kanban column) label Apr 4, 2019

pjreed mentioned this issue Dec 16, 2019

Port ObstacleTransformer node to ROS2 swri-robotics/marti_common#559

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Subscription and intra-process subscription counts are not updated in sync #668

Subscription and intra-process subscription counts are not updated in sync #668

cvasfi commented Mar 26, 2019 •

edited

Loading

wjwwood commented Mar 26, 2019

ivanpauno commented Mar 27, 2019

wjwwood commented Mar 27, 2019

ivanpauno commented Mar 28, 2019

Subscription and intra-process subscription counts are not updated in sync #668

Subscription and intra-process subscription counts are not updated in sync #668

Comments

cvasfi commented Mar 26, 2019 • edited Loading

Bug report

Steps to reproduce issue

Expected behavior

Actual behavior

Additional information

wjwwood commented Mar 26, 2019

ivanpauno commented Mar 27, 2019

wjwwood commented Mar 27, 2019

ivanpauno commented Mar 28, 2019

cvasfi commented Mar 26, 2019 •

edited

Loading