Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multimachine communication #432

Merged
merged 16 commits into from
Feb 7, 2020
Merged

Multimachine communication #432

merged 16 commits into from
Feb 7, 2020

Conversation

claireyywang
Copy link
Contributor

@claireyywang claireyywang commented Jan 9, 2020

How about a verb called “yell” (or something more clever), where the program just spins and sends “hello-world” and it’s hostname once per second, over both a hard-coded ROS2/DDS topic and a “custom” multicast UDP packet as well. At the same time, it is listening for other messages on that same ROS2 topic and/or UDP multicast port and printing a summary table periodically of which other hosts it has heard from, message counts, and so on. This would help debug the current #1 problem we are seeing, where people have messaging working on localhost but can’t receive those messages on another host. This spins into a debug cycle of trying to figure out if the problem lies with their software, their network configuration, DDS discovery or transport bugs, the network infrastructure, and so on. Currently we debug this using known-good ros2 nodes such as examples_rclpy_minimal_* , wireshark, ping, ros2multicast, ros2node, and so on, but having a nicer way to do it would be great.

Run ros2 doctor call command to test pub/sub, multicast send/receive quality across multiple machines. The command outputs a summary table /1s.

MULTIMACHINE COMMUNICATION SUMMARY
Topic: knockknock, Published Msg Count: 20
Subscribed from:
                Hostname             Msg Count /2s
Multicast Group/Port: 225.0.0.1/49150, Sent Msg Count: 20
Received from:
                Hostname             Msg Count /2s
                Claires-MBP          3         
------------------------------------------------------------

Test cases will be submitted in a separate PR.

@claireyywang claireyywang self-assigned this Jan 9, 2020
@claireyywang claireyywang added the in progress Actively being worked on (Kanban column) label Jan 9, 2020
@claireyywang claireyywang added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Jan 16, 2020
Copy link
Member

@jacobperron jacobperron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing the point, I assume we would want two different verbs for testing multi-machine communication: one for sending data (publish ROS messages, send multicast data, etc) and other for receiving. The implementation as it stands is sending/receiving data to/from the same process, which may bypass the network interface (e.g. intra-process pub/sub). (I missed the point)

Beside this, I've left some other minor comments and nitpicks.

ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
@claireyywang
Copy link
Contributor Author

@jacobperron The bypassing network part explained why I wasn't able to get pub/sub working on two machines when testing. Thanks for pointing it out! I copied the feature request in the PR description for completeness. Based on my understanding, it seems like they want one verb to do it all. What do you think? If that's not the best way to do it, I'd be happy to refactor the code and split it into send/pub and receive/sub.

@jacobperron
Copy link
Member

Thanks for updating the description, it's helpful to have the context! I think I missed the point originally; you can disregard my previous comment.

There shouldn't be any network bypassing happening if there are different hosts on the same ROS network. Double check that they are using the same ROS_DOMAIN_ID. I think we can make some improvements to the executor/send-receive loop too.

ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
Copy link
Member

@jacobperron jacobperron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to my minor comments below, it would also be nice to have a unit test that exercises the new verb.

ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/call.py Outdated Show resolved Hide resolved
claireyywang added 9 commits February 4, 2020 15:26
Signed-off-by: claireyywang <clairewang@openrobotics.org>
Signed-off-by: claireyywang <clairewang@openrobotics.org>
Signed-off-by: claireyywang <clairewang@openrobotics.org>
Signed-off-by: claireyywang <clairewang@openrobotics.org>
Signed-off-by: claireyywang <clairewang@openrobotics.org>
Signed-off-by: claireyywang <clairewang@openrobotics.org>
Signed-off-by: claireyywang <clairewang@openrobotics.org>
Signed-off-by: claireyywang <clairewang@openrobotics.org>
Signed-off-by: claireyywang <clairewang@openrobotics.org>
claireyywang added 4 commits February 5, 2020 12:28
…t lock. prefix variables, add infinite loop

Signed-off-by: claireyywang <clairewang@openrobotics.org>
Signed-off-by: claireyywang <clairewang@openrobotics.org>
Signed-off-by: claireyywang <clairewang@openrobotics.org>
Signed-off-by: claireyywang <clairewang@openrobotics.org>
Copy link
Member

@jacobperron jacobperron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Just a couple minor comments.

ros2doctor/ros2doctor/verb/hello.py Outdated Show resolved Hide resolved
ros2doctor/ros2doctor/verb/hello.py Outdated Show resolved Hide resolved
ros2doctor/test/test_hello.py Outdated Show resolved Hide resolved
Signed-off-by: claireyywang <clairewang@openrobotics.org>
@claireyywang
Copy link
Contributor Author

claireyywang commented Feb 7, 2020

CI
Linux Build Status
Linux-aarch64 Build Status
osx Build Status
windows Build Status
windows-container Build Status

Signed-off-by: claireyywang <clairewang@openrobotics.org>
@claireyywang
Copy link
Contributor Author

Linux-aarch64: Build Status
windows-container: Build Status

cmd=['ros2', 'daemon', 'start'],
name='daemon-start',
on_exit=[
OpaqueFunction(function=lambda context: ready_fn())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, this is the old (soon-to-be deprecated) way of doing this. Use launch_testing.actions.ReadyToTest() instead.

Signed-off-by: claireyywang <clairewang@openrobotics.org>
@claireyywang
Copy link
Contributor Author

CI again
linux Build Status
linux-aarch64 Build Status
osx Build Status
windows Build Status
windows-container Build Status

@claireyywang claireyywang merged commit 9f43483 into master Feb 7, 2020
@delete-merged-branch delete-merged-branch bot deleted the claire/multimachine-comm branch February 7, 2020 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in review Waiting for review (Kanban column)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants