Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove topic partitions #476

Closed
rohitsalem opened this issue Mar 28, 2018 · 12 comments
Closed

Remove topic partitions #476

rohitsalem opened this issue Mar 28, 2018 · 12 comments
Assignees

Comments

@rohitsalem
Copy link
Member

rohitsalem commented Mar 28, 2018

The issue is to track the removal of partitions usage in topics:

PR for rmw_fastrtps : ros2/rmw_fastrtps#192

PR for rmw_opensplice : ros2/rmw_opensplice#225

PR for rmw_connext : ros2/rmw_connext#285

PR for design doc : ros2/design#170

@dhood dhood added the in progress Actively being worked on (Kanban column) label Mar 29, 2018
@wjwwood wjwwood added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Apr 3, 2018
@wjwwood
Copy link
Member

wjwwood commented Apr 3, 2018

@rohitsalem is looking for reviews on ros2/rmw_fastrtps#192 until the OpenSplice and Connext pr's are opened.

@mikaelarguedas
Copy link
Member

mikaelarguedas commented Apr 3, 2018

Not sure if there is some work by @wjwwood or @Karsten1987 to that effect, but the design document about ROS to DDS name mapping article will also need to be updated accordingly

@mikaelarguedas mikaelarguedas added in progress Actively being worked on (Kanban column) and removed in review Waiting for review (Kanban column) labels Apr 12, 2018
@rohitsalem rohitsalem added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Apr 18, 2018
@Karsten1987 Karsten1987 changed the title Remove topic partions Remove topic partitions Apr 18, 2018
@Karsten1987
Copy link
Contributor

@rohitsalem Can you please embed the status of your latest CI here?

@rohitsalem
Copy link
Member Author

rohitsalem commented Apr 18, 2018

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@mikaelarguedas
Copy link
Member

Looks like this has been the victim of a new failure we have on windows recently: ros2/build_farmer#111

@rohitsalem
Copy link
Member Author

rohitsalem commented Apr 26, 2018

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@rohitsalem
Copy link
Member Author

@ros2/team this PRs are ready for review, Thanks!

@Karsten1987
Copy link
Contributor

CI (with all three rmw):

win: Build Status
osx: Build Status
linux: Build Status

@Karsten1987
Copy link
Contributor

new CI (rebased):

win: Build Status
osx: Build Status
linux: Build Status

@Karsten1987
Copy link
Contributor

So there are some (seemingly flaky) test failures between fastrtps and opensplice on windows.
I ran this job multiple times, where the test_communication fails between the two of them in one or two cases (clock and test_msgs/Empty).

as we are not running Opensplice continuously on our farm, we could technically merge it. I'll have a look at the error messages as soon as I can.

@Karsten1987
Copy link
Contributor

I can reproduce the behavior locally on my OSX machine as well when compiled in debug (I guess this is why the buildfarm didn't notice it).
From a little debugging, I see that submessages are getting received and processed (in fastrtps), but no subscriber is notified. My guess is that the submessages for the payload (serialized data) is interpreted differently by fastrtps, because the message is empty. I'll have to dig around a bit deeper to find out what's happening.

Only one combination fails: FastRTPS subscribes, Opensplice sends. Only for message type test_msgs/Empty. All other combinations work just fine.

@Karsten1987
Copy link
Contributor

update:

When increasing the publishing frequency from 10 Hz to 1 Hz the messages are getting transmitted correctly. Ways for me to reproduce the behavior:

1.) Start subscriber with FastRTPS on topic /ns/test with message type test_msgs/Empty.
2.a) ros2 topic pub test_msgs/Empty /ns/test (-r 1) --> will eventually communicate
2.b) ros2 topic pub test_msgs/Empty /ns/test -r 100 --> will not go through

When applying 2.b it seems like OpenSplice so-to-say gives up and the message eventually goes through.

========================================================================================
Report      : WARNING
Date        : Tue May 29 17:37:30 PDT 2018
Description : writer 473ff4c3:20c:1:303 topic rt/ns/test/message/Empty considering reader 10f01ea:63000100:0:e04 non-responsive

Node        : karsten-macbook-pro
Process     : Python <65636>
Thread      : xmit.user 700007580000
Internals   : 6.7.180404OSS/de8c6f4f/de8c6f4f/ddsi2/q_transmit.c/822/0/1527640650.846078000/71

After debugging a fair amount of variations with @dhood, the tests will constantly pass when putting a sleep on the publisher side (OpenSplice) between advertising the publisher and starting the actual publishing process. That is, we can make sure that the publisher got correctly matched on fastrtps side.

When publishing with a higher frequency, it seems fastrtps gets confused and can't handle the amount of incoming package while not having setup the subscriber correctly.
Equivalently, the test passes when putting the QoS settings to best effort as OpenSplice simply fires-and-forgets the messages and FastRTPS presumably is able to handle the incoming messages.

So it looks to me as if they are two issues coming together here. The first one not being able to send messages if the publisher and subscriber aren't correctly set up and thus somewhat synchronized.
The second problem I see is that FastRTPS can't handle to many incoming messages when sub/pub are not yet matched.

I don't know how to fix that neither am I sure that this is really the case, but it simply looks like OpenSplice sets things up way too fast for FastRTPS to handle. It is correct to say though that having the API to wait for publisher and subscribers to have matched is a way to get these tests to pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants