add option to disable multicast in fastrtps #80
Conversation
This is required to fix unicast pub/sub on the same computer.
If the env variable RMW_FASTRTPS_DISABLE_MULTICAST is defined, then multicast will be disabled.
@wjwwood I will improve the usage of unicast without multicast. Currently, two processes using the same domainId try to open the same unicast ports. One of them is able to use the ports, but the other, after the failure, tries to open other unicast ports. While discovery uses multicast there is no problem, because the information about which unicast ports are in use is sent to remote participants. But without multicast, a participant only tries to send discovery data to the default unicast ports. I will try to improve that.
@richiprosima cool, that would be great. I'm not sure whether we'll have time to integrate it into our upcoming Beta 1, but I will try. I would have let you know in separate issues, and sooner, but we decided to try to address this issue very late in our cycle. I was planning on creating separate issues for each of these things on the fast-rtps repository; do you still want me to do that?
Also, if I can give some input on how to resolve the unique participant id issue, I'd recommend doing something like @codebot suggested, where you try each participant id in turn, i.e. 0, then 1, then 2, calculating the unicast port for each and seeing whether you can acquire it. If you can acquire it, then you have the port and the participant id (you need to hold the port to prevent others from coming to the same conclusion); if you cannot acquire it, then you just increase the participant id and try again. My approach using boost interprocess is hacky and problematic, so I'd recommend avoiding a similar solution within Fast-RTPS.
@wjwwood should do something about this.
@richiprosima any updates on this? I said previously:
Do you want me to open issues still, or just leave this one and let you handle it?
@wjwwood Multicast is no longer the default for user subscribers since eProsima/Fast-DDS@d4d6474. I still have to check whether builtin subscribers only use unicast. I will schedule it for next week.
@wjwwood eProsima/Fast-DDS@5b12476 adds support for disabling all multicast traffic. How to do it is explained in our documentation.
Thanks @richiware I'll have a look at it as soon as possible.
to be tested by @wjwwood
@richiware I tested this by:
diff --git a/rmw_fastrtps_cpp/src/functions.cpp b/rmw_fastrtps_cpp/src/functions.cpp
index 14943ce..f099e17 100644
--- a/rmw_fastrtps_cpp/src/functions.cpp
+++ b/rmw_fastrtps_cpp/src/functions.cpp
@@ -645,6 +645,22 @@ rmw_node_t * rmw_create_node(const char * name, const char * namespace_, size_t
participantParam.rtps.builtin.domainId = static_cast<uint32_t>(domain_id);
participantParam.rtps.setName(name);
+ /* Start: Disabling Multicast */
+ // This is based on: http://eprosima-fast-rtps.readthedocs.io/en/latest/pubsub.html#tips
+
+ // Metatraffic Multicast Locator List will be empty.
+ // Metatraffic Unicast Locator List will contain one locator, with null address and null port.
+ // Then eProsima Fast RTPS will use all network interfaces to receive network messages using a well-known port.
+ Locator_t default_unicast_locator;
+ participantParam.rtps.builtin.metatrafficUnicastLocatorList.push_back(default_unicast_locator);
+
+ // Initial peer will be UDPv4 address 192.168.0.1. The port will be a well-known port.
+ // Initial discovery network messages will be sent to this UDPv4 address.
+ Locator_t initial_peer;
+ initial_peer.set_IP4_address(192, 168, 0, 1);
+ participantParam.rtps.builtin.initialPeersList.push_back(initial_peer);
+ /* End: Disabling Multicast */
+
participant = Domain::createParticipant(participantParam);
if (!participant) {
RMW_SET_ERROR_MSG("create_node() could not create participant");
With that patch, our basic talker/listener programs no longer work. Looking at Wireshark, there is indeed no multicast traffic, but that also seems to extend to participant discovery. So I think this might be disabling multicast traffic too aggressively; I only wanted to disable multicast for data transmission.

The other issue, which I'm unable to test at the moment because discovery doesn't work, is related to the ports used by endpoints. The main part of this pull request was aimed at addressing this issue, which is that the unicast endpoint port is calculated based on the participant ID, but the participant ID is only unique within a single process (it is a static counter in one of the classes), see:
In the last link you can see the variable in question. Because there is no variability in port number calculation between processes, the same unicast endpoint port number is arrived at multiple times. That means two processes on the same machine would always try to use the same listening port, preventing one of them from listening and ultimately preventing communication. I don't think this has been addressed yet, and until it is, it doesn't matter whether we can turn off multicast, because unicast won't work anyway.

So unfortunately this is not resolved, in two ways. Please let me know if I can provide any more help or details that might move this along.
Ok, so of course reading it closer, I had my IP set wrong. If I set my IP address correctly in the code talker/listener now work, without any multicast. So that's great! However, I have two questions still:
Since eProsima/Fast-DDS@d4d6474, user subscribers and publishers don't use multicast by default to send data.
Does that answer your question?
Thanks for the quick response @richiware.
So, without any changes, multicast data is off by default? That sounds good; I'll double-check that with Wireshark ASAP.
Ah, so this is what I misunderstood. I thought I needed to do this to disable multicast for data, but this is to disable multicast for "metatraffic" as well, right? I don't think we need that right now, but that's good to know we can do that if we need to in the future. It might also be useful to augment the multicast metatraffic with a unicast "initial peer" for talking with a daemon in our command line tools.
Right. I understand that now. But this is a moot point since I think we'd like to keep the multicast metatraffic right now.
Right, this is how @codebot worked out the unicast port in his RTPS implementation for microcontrollers. Basically he would try to open ports until one was available.
Right, but what you're describing here is for discovery only and doesn't happen (isn't necessary) when multicast metatraffic is used right? I was wondering about the port selection process for unicast data traffic (i.e. not metatraffic or discovery). Are you using the same port for both discovery and data in this case? Does the same limitation about the four participants on the same machine apply when using multicast for metatraffic and unicast for data (the default behavior right now I think)?
I believe so, but I think we might be talking about different things on the last point (metatraffic versus data traffic). Thanks for the quick response!
Yes, it's for removing metatraffic multicast.
The mechanism I've described only applies when the user sets the initial peers. Otherwise, multicast is used to send participant announcements instead of the initial peers. After reception of a remote participant announcement, the locators (IPs and ports) provided by the announcement are used to send metatraffic data and user data.
This link talks about that. Is that all the information you need? Maybe we have to extend the explanation.
No. For multicast, the same port can be used by several processes. In the case of unicast for data, Fast-RTPS tries to open a port, and then this port is sent in the participant announcement. The limitation only applies when you remove all multicast and the participant announcement mechanism has to rely on unicast.
That's actually really helpful documentation, and yes that's what I'm talking about. What's surprising to me is that you don't get issues with two processes on the same machine which each have a single participant. It seems that this algorithm:
Would lead both participants to have the same ID, and therefore to try to use the same port. I think this because
So in the case I described, I would expect both participants to have the same id, and therefore would try to use the same port. This exact issue is something that @codebot had to work around by:
I didn't see any code like that when reading through Fast-RTPS, however. It seems to work, so it's getting addressed somehow, but I just don't understand how at the moment. My understanding isn't strictly required, though. I'll try to make sure this gets applied to our use case, but I'll have to wait until we start using Fast-RTPS master again, which I think is underway. Thanks for all the insight!
I'm going to close this pr, since we're now using the latest Fast-RTPS. |
So, based on the feedback from @codebot with respect to #79, I set out to disable multicast in fastrtps (or provide an option to do so). This pr is trying to do that, but along the way I ran into some serious issues and I'm not sure we should spend the time to merge this at this point.
My goal was to disable multicast to avoid the issue of disrupting your WiFi when running the image demo on your laptop, due to lots of multicast data going nowhere. My goal was also to do this while avoiding changes to Fast-RTPS, so that we didn't have to maintain a patch for the beta release.
I was able to disable the multicast (with some duplication of code from Fast-RTPS), but the issue I ran into at that point was that the participant ids generated by Fast-RTPS are only unique within a single process. The problem with that is that the participant id is the most important factor used to calculate the unicast port for communication. Since participant ids were duplicated across different processes, more than one process would try to use the same UDP port, and one would silently fail to listen on that port. Because the failure to acquire the overlapping port was silent in all processes, the issue was masked: data could always be delivered by the multicast communication. When I removed the multicast locator and disabled multicast for topics, the talker/listener example stopped working. @codebot and I figured out that the port was the issue by looking at Wireshark, and then that duplicate participant ids were the root cause of the problem.
I was able to hack the talker and listener back to working without multicast by adding random numbers to the participant id. So I tried to figure out how to address this problem without modifying Fast-RTPS, and I thought I could do it by explicitly setting the participant id (which is possible) and ensuring those ids were unique for participants on the same computer. @codebot showed me how he did this in freertps: by trying ports until he found a participant/port combination that was free. I wanted to do the same, but because Fast-RTPS did not fail when the port was not available, I could not easily do it without a race condition.
So I ended up trying to solve this by using `boost::interprocess`, and that's what is in this pr. I use a `named_mutex` and a `shared_memory_object` containing the equivalent of a `std::set`. I used the set to keep track of which participant ids were in use, and the mutex to protect access to the shared memory. There are some drawbacks to this approach, and I only intended it as a short-term solution until we could get a proper fix into Fast-RTPS, but I thought the issues would be manageable. However, once I started testing on the build farm, I started running into bad_alloc's associated with the shared memory. So something is wrong with the implementation, and I don't think I have time to address it before the beta. I just opened this pr for anyone who was interested in what I had come up with.

I think it's important to point out that without some kind of patch, unicast pub/sub does not work at all between two processes on the same computer with Fast-RTPS. It works between different computers because they don't share a port number pool with each other, i.e. they are namespaced by IP address.
At this point I don't know what to do in order to address this issue of multicast interfering with WiFi even when running demos all on one computer. I'm open to suggestions as to what we could do between now and the beta which would address this problem. I think my other patch in #79 is still viable, but I guess it was deemed too intrusive at this late time.
This pr also uses ros2/rcl#97 and ros2/rmw#87 so that a "getenv" function is available at a lower level than `rcl`.

In this implementation, you could disable multicast by setting the env variable `RMW_FASTRTPS_DISABLE_MULTICAST` to any value. I could easily invert the logic if we wanted to.