Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subscription and intra-process subscription counts are not updated in sync #668

Closed
cvasfi opened this issue Mar 26, 2019 · 4 comments
Closed
Assignees

Comments

@cvasfi
Copy link

cvasfi commented Mar 26, 2019

Bug report

Intra-process subscription count of Publisher is not updated in sync with the subscription count when intra-process communication is used.
When the get_subscription_count() method returns the updated count, get_intra_process_subscription_count() may not return the updated count yet. This behaviour causes the unit tests in test_publisher_subscription_count_api to fail occasionally.

Required Info:

  • Operating System:
    • Ubuntu 16.04
  • Installation type:
    • Source
  • Version or commit hash:
  • DDS implementation:
    • Fast-RTPS
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

Repeat the unit test test_publisher_subscription_count_api until failure. Though this may be hard to reproduce locally, here's an instance of such a failure: https://ci.ros2.org/view/nightly/job/nightly_linux_repeated/1323/testReport/(root)/projectroot/test_publisher_subscription_count_api/

Above test logs shows that the equality conditions on get_subscription_count() are satisfied yet the conditions on get_intra_process_subscription_count() are not met.

Expected behavior

get_subscription_count() and get_intra_process_subscription_count() methods should be in sync when intra-process communication is used.

Actual behavior

The methods return different results.

Additional information

@cvasfi cvasfi changed the title Subscription count and intra-process subscription counts are not updated in sync Subscription and intra-process subscription counts are not updated in sync Mar 26, 2019
@wjwwood
Copy link
Member

wjwwood commented Mar 26, 2019

get_subscription_count() and get_intra_process_subscription_count() methods should be in sync when intra-process communication is used.

That's not possible, they are two separate subscriptions (two separate DataWriters under the hood if using DDS).

As currently stated, the expected behavior you put forward is not possible and would be a won't fix.

There might other ways to address the symptom ("This behaviour causes the unit tests in test_publisher_subscription_count_api to fail occasionally."), but making them update in sync is, in my opinion, not the correct way to fix the problem, nor is it possible to implement right now.

@ivanpauno
Copy link
Member

I agree with @wjwwood, that syncing them is not a good idea.

But it shouldn't fail with get_intra_process_subscription_count, as this is not being asked to the underlying DDS "datawriter". It's obtained from the size of an internal map in the intraprocess manager, which is updated when creating the publisher and when deleting it.

size_t
get_subscription_count(uint64_t intra_process_publisher_id) const
{
auto publisher_it = publishers_.find(intra_process_publisher_id);
if (publisher_it == publishers_.end()) {
// Publisher is either invalid or no longer exists.
return 0;
}
auto publisher = publisher_it->second.publisher.lock();
if (!publisher) {
throw std::runtime_error("publisher has unexpectedly gone out of scope");
}
auto sub_map_it = subscription_ids_by_topic_.find(publisher->get_topic_name());
if (sub_map_it == subscription_ids_by_topic_.end()) {
// No intraprocess subscribers
return 0;
}
return sub_map_it->second.size();
}

I will check later if I can repro this, and see if I can understand why sometimes the count is not being updated.

@wjwwood
Copy link
Member

wjwwood commented Mar 27, 2019

@ivanpauno I assumed we were asking the underlying datawriter on the intra process topic. Based on what you said, my theory no longer makes sense to me...

I would appreciate it if you could look into @ivanpauno (considering you also wrote the test, you might have unique insight).

@ivanpauno
Copy link
Member

@ivanpauno I assumed we were asking the underlying datawriter on the intra process topic.

We aren't doing that, because it would return a wrong count in the case of two process with publishers and subscriber in the same topic using intraprocess comm.

I would appreciate it if you could look into @ivanpauno (considering you also wrote the test, you might have unique insight).

I've already reproduced the issue. I'm trying to figure out what the problem is.

@ivanpauno ivanpauno self-assigned this Mar 28, 2019
@ivanpauno ivanpauno added the in progress Actively being worked on (Kanban column) label Mar 28, 2019
@ivanpauno ivanpauno added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Mar 28, 2019
@ivanpauno ivanpauno removed the in review Waiting for review (Kanban column) label Apr 4, 2019
nnmm pushed a commit to ApexAI/rclcpp that referenced this issue Jul 9, 2022
* Add tests for node_options usage
* Add tests for copying options arguments
* Add bad argument tests for wait sets
* Add tests for wait_set_get_allocator function
* Add test for rcl_take_response function
* Change take_request to match take_response without info
* Change specific test for sequence_number
* Add tests for client take failed
* Remove tests already done in nominal setup
* Add bad arguments tests
* Add test for init returning BAD_ALLOC
* Add test for client get_options
* Add test for already init client
* Add bad argument tests
* Add basic test get_default_domain_id
* Add test for rmw to rcl return code function
* Add test for get_localhost_only
* Add tests for localhost expected usage
* Address peer review comments
* Add test for env variable leading to ULONG_MAX
* Change return values to enum in test
* Fix rcl_get_localhost_only return value
* Address peer review comments
* Add unexpected value to get_localhost
* Add reset_rcl_error after expected fails

Signed-off-by: Jorge Perez <jjperez@ekumenlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants