add option to disable multicast in fastrtps #80

Closed
wants to merge 4 commits

Conversation

@wjwwood (Member) commented Dec 14, 2016

So, based on the feedback from @codebot with respect to #79, I set out to disable multicast in fastrtps (or provide an option to do so). This PR tries to do that, but along the way I ran into some serious issues, and I'm not sure we should spend the time to merge it at this point.


My goal was to disable multicast to avoid disrupting your WiFi when running the image demo on your laptop due to lots of multicast data going nowhere. I also wanted to do this without changing Fast-RTPS, so that we didn't have to maintain a patch for the beta release.

I was able to disable the multicast (with some duplication of code from Fast-RTPS), but the issue I then ran into was that the participant ids generated by Fast-RTPS are only unique within a single process. The problem with that is that the participant id is the most important factor used to calculate the unicast port for communication. Since participant ids were duplicated across different processes, more than one process would try to use the same UDP port, and one would silently fail to listen on that port. Because the failure to acquire the overlapping port was silent, the issue was masked: data could always be delivered by multicast instead. When I removed the multicast locator and disabled multicast for topics, the talker/listener example stopped working. @codebot and I figured out that the port was the issue by looking at Wireshark, and then that duplicate participant ids were the root cause of the problem.

I was able to hack the talker and listener back to working without multicast by adding random numbers to the participant id. I then tried to figure out how to address this problem without modifying Fast-RTPS, and I thought I could do so by explicitly setting the participant id (which is possible) and ensuring those ids were unique for participants on the same computer. @codebot showed me how he did this in freertps by trying ports until he found a participant/port combination that was free. I wanted to do the same, but because Fast-RTPS did not fail when the port was unavailable, I could not easily do this without a race condition.

So I ended up trying to solve this with boost::interprocess, and that's what is in this PR. I use a named_mutex and a shared_memory_object containing the equivalent of a std::set. The set keeps track of which participant ids are in use, and the mutex protects access to the shared memory. There are some drawbacks to this approach, and I only intended it as a short-term solution until we could get a proper fix into Fast-RTPS, but I thought the issues would be manageable. However, once I started testing on the build farm I ran into bad_alloc exceptions associated with the shared memory. So something is wrong with the implementation, and I don't think I have time to address it before the beta. I just opened this PR for anyone who is interested in what I had come up with.
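Roughly, the bookkeeping looks like this (a minimal sketch, not the exact code in this PR: the names are illustrative, a fixed-size table stands in for the std::set, and a real version would also have to release ids on shutdown):

```cpp
// Minimal sketch of the shared-memory bookkeeping (illustrative only):
// a named mutex shared across processes guards a small occupancy table
// in shared memory that records which participant ids are in use.
#include <cstddef>

#include <boost/interprocess/mapped_region.hpp>
#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/sync/named_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>

namespace bip = boost::interprocess;

constexpr std::size_t kMaxParticipants = 128;

// Claim the first free participant id on this machine, or return -1 if none.
int claim_participant_id()
{
  bip::named_mutex mutex(bip::open_or_create, "rmw_fastrtps_ids_mutex");
  bip::scoped_lock<bip::named_mutex> lock(mutex);

  bip::shared_memory_object shm(
    bip::open_or_create, "rmw_fastrtps_ids", bip::read_write);
  shm.truncate(kMaxParticipants);  // one byte per id: 0 = free, 1 = in use
  bip::mapped_region region(shm, bip::read_write);
  auto * used = static_cast<unsigned char *>(region.get_address());

  for (std::size_t id = 0; id < kMaxParticipants; ++id) {
    if (used[id] == 0) {
      used[id] = 1;
      return static_cast<int>(id);
    }
  }
  return -1;
}
```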

I think it's important to point out that, without some kind of patch, unicast pub/sub does not work at all between two processes on the same computer with Fast-RTPS. It works between different computers because they don't share a port number pool with each other, i.e. they are namespaced by IP address.


At this point I don't know what to do to address this issue of multicast interfering with WiFi, even when running demos all on one computer. I'm open to suggestions as to what we could do between now and the beta to address this problem. I think my other patch in #79 is still viable, but I guess it was deemed too intrusive this late in the cycle.


This PR also uses ros2/rcl#97 and ros2/rmw#87 so that a "getenv" function is available at a lower level than rcl.

In this implementation, you can disable multicast by setting the environment variable RMW_FASTRTPS_DISABLE_MULTICAST to any value. I could easily invert the logic if we wanted to.
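The check itself is trivial; a minimal sketch of what it amounts to (using plain std::getenv here for illustration, rather than the lower-level getenv function from the rcl/rmw changes above):

```cpp
// Sketch of the environment-variable check (illustrative, not the PR code):
// multicast is disabled whenever the variable is set, regardless of value.
#include <cstdlib>

static bool multicast_disabled()
{
  return std::getenv("RMW_FASTRTPS_DISABLE_MULTICAST") != nullptr;
}
```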

This is required to fix unicast pub/sub on the same computer.
If the env variable RMW_FASTRTPS_DISABLE_MULTICAST is defined, then multicast will be disabled.
@wjwwood added the in progress (Actively being worked on) label Dec 14, 2016
@richiprosima (Contributor)

@wjwwood I will improve the usage of unicast without multicast.

Currently two processes using the same domainId try to open the same unicast ports. One of them is able to use the ports, while the other, after the failure, tries to open other unicast ports. As long as discovery uses multicast there is no problem, because the information about which unicast ports are in use is sent to remote participants. But without multicast, a participant only tries to send discovery data to the default unicast ports. I will try to improve that.

@wjwwood (Member, Author) commented Dec 14, 2016

@richiprosima cool, that would be great. I'm not sure whether we'll have time to integrate it into our upcoming Beta 1, but I will try. I would have let you know sooner, in separate issues, but we decided to address this problem very late in our cycle. I was planning on creating separate issues for each of these things on the fast-rtps repository; do you still want me to do that?

@wjwwood (Member, Author) commented Dec 14, 2016

Also, if I can give some input on how to resolve the unique participant id issue, I'd recommend doing something like @codebot suggested: try each participant id in turn, i.e. 0, then 1, then 2, calculating the unicast port for each and checking whether you can acquire it. If you can acquire it, then you have both the port and the participant id (you need to hold the port to prevent others from coming to the same conclusion); if you cannot, then you just increment the participant id and try again.

My approach using boost interprocess is hacky and problematic, so I'd recommend avoiding a similar solution within Fast-RTPS.
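To illustrate the idea, here is a minimal sketch of that probe-and-increment loop (not Fast-RTPS code; the constants are assumed to be the standard RTPS defaults for the user unicast port, and holding the UDP socket is what actually reserves the claim):

```cpp
// Illustrative sketch (not Fast-RTPS code): walk participant ids upward
// until the corresponding unicast port can be bound; keeping the socket
// open reserves both the port and the participant id.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <utility>

// Standard RTPS defaults: PB = 7400, DG = 250, PG = 2, offsetd3 = 11.
constexpr uint16_t PB = 7400, DG = 250, PG = 2, offsetd3 = 11;

// Returns {participant_id, socket_fd}; the caller keeps the socket open so
// no other process can claim the same id/port. Returns {-1, -1} on failure.
std::pair<int, int> claim_participant_id(uint16_t domain_id)
{
  for (int participant_id = 0; participant_id < 120; ++participant_id) {
    uint16_t port = PB + DG * domain_id + offsetd3 + PG * participant_id;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(port);

    if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) == 0) {
      return {participant_id, fd};  // port acquired, so the id is ours
    }
    close(fd);  // port already taken by another participant, try the next id
  }
  return {-1, -1};
}
```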

@nkoenig commented Feb 7, 2017

@wjwwood should do something about this.

@wjwwood (Member, Author) commented Feb 7, 2017

@richiprosima any updates on this? I said previously:

I was planning on creating separate issues for each of these things on the fast-rtps repository, do you still want me to do this?

Do you want me to open issues still or just leave this one and let you handle it?

@richiprosima (Contributor)

@wjwwood Multicast is no longer the default for user subscribers since eProsima/Fast-DDS@d4d6474. I still have to check the case where builtin subscribers only use unicast. I will schedule it for next week.

@richiware (Contributor)

@wjwwood eProsima/Fast-DDS@5b12476 adds support for disabling all multicast traffic. How to do it is explained in our documentation.

@wjwwood (Member, Author) commented Mar 8, 2017

Thanks @richiware, I'll have a look at it as soon as possible.

@mikaelarguedas (Member)

to be tested by @wjwwood

@wjwwood (Member, Author) commented Apr 6, 2017

@richiware I tested this by:

diff --git a/rmw_fastrtps_cpp/src/functions.cpp b/rmw_fastrtps_cpp/src/functions.cpp
index 14943ce..f099e17 100644
--- a/rmw_fastrtps_cpp/src/functions.cpp
+++ b/rmw_fastrtps_cpp/src/functions.cpp
@@ -645,6 +645,22 @@ rmw_node_t * rmw_create_node(const char * name, const char * namespace_, size_t
   participantParam.rtps.builtin.domainId = static_cast<uint32_t>(domain_id);
   participantParam.rtps.setName(name);
 
+  /* Start: Disabling Multicast */
+  // This is based on: http://eprosima-fast-rtps.readthedocs.io/en/latest/pubsub.html#tips
+
+  // Metatraffic Multicast Locator List will be empty.
+  // Metatraffic Unicast Locator List will contain one locator, with null address and null port.
+  // Then eProsima Fast RTPS will use all network interfaces to receive network messages using a well-known port.
+  Locator_t default_unicast_locator;
+  participantParam.rtps.builtin.metatrafficUnicastLocatorList.push_back(default_unicast_locator);
+
+  // Initial peer will be UDPv4 address 192.168.0.1. The port will be a well-known port.
+  // Initial discovery network messages will be sent to this UDPv4 address.
+  Locator_t initial_peer;
+  initial_peer.set_IP4_address(192, 168, 0, 1);
+  participantParam.rtps.builtin.initialPeersList.push_back(initial_peer);
+  /* End:   Disabling Multicast */
+
   participant = Domain::createParticipant(participantParam);
   if (!participant) {
     RMW_SET_ERROR_MSG("create_node() could not create participant");

With that patch, our basic talker/listener programs no longer work. Looking at Wireshark, there is indeed no multicast traffic, but that also seems to extend to participant discovery. So I think this disables multicast traffic too aggressively; I just wanted to disable multicast for data transmission.


The other issue, which I'm unable to test at the moment because discovery doesn't work, is related to the ports used by endpoints. The main part of this pull request was aimed at addressing this issue: the unicast endpoint port is calculated based on the participant ID, but the participant ID is only unique within a single process (it is a static counter in one of the classes), see:

In the last link, the variable m_maxRTPSParticipantID is the only thing used to generate (and basically make unique) the participant IDs. Since m_maxRTPSParticipantID is only global within a single process, participant IDs are repeated across processes on the same machine.

Therefore, because there is no variability in port number calculation between processes, the same unicast endpoint port number is arrived at multiple times. That means that two processes on the same machine would always try to use the same listening port, preventing one of them from listening and ultimately preventing communication.

I don't think this has been addressed at all yet, and until it is, it doesn't matter whether we can turn off multicast, because unicast won't work anyway.


So, unfortunately, this is unresolved in two ways. Please let me know if I can provide any more help or details that might move this along.

@wjwwood (Member, Author) commented Apr 6, 2017

OK, so of course, reading it more closely, I had my IP set wrong. If I set my IP address correctly in the code, the talker/listener now work without any multicast. So that's great!

However, I have two questions still:

  • How do I avoid needing to set an IP explicitly, or is that required?
  • How is the unicast port calculated in a way that is free from collisions since the participant ID is not unique across processes?

@richiware (Contributor)

Since eProsima/Fast-DDS@d4d6474, user's subscribers and publishers don't use multicast by default to send data.
Since eProsima/Fast-DDS@5b12476, a user can disable multicast completely as you tested.

  • How do I avoid needing to set an IP explicitly, or is that required?
    When multicast is removed, the participant has to know where to send discovery messages. This is why you have to set the remote IP in the initial peers.

  • How is the unicast port calculated in a way that is free from collisions since the participant ID is not unique across processes?
    All participants across processes have to know the port where they will send discovery messages. This rule will calculate the same port for all of them. Since unicast is being used, only one of them will be able to open this port; the rest will open consecutive ports. For example, if the rule calculates port 7410 and there are three participants, they will open 7410, 7412 and 7414.
    At the moment, by default, a participant will send discovery messages to the first four consecutive ports. In the example, each participant will send discovery messages to ports 7410, 7412, 7414 and 7416. We want to make this configurable in the future. Currently a user can only have four participants (in different processes) on the same machine if all multicast is disabled.
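For concreteness, those numbers fall out of the default RTPS port mapping like this (assuming the standard defaults PB = 7400, DG = 250, d1 = 10, PG = 2, and domain 0):

```
metatraffic unicast port = PB + DG * domainId + d1 + PG * participantId
                         = 7400 + 250 * 0 + 10 + 2 * participantId

participantId 0 -> 7410
participantId 1 -> 7412
participantId 2 -> 7414
participantId 3 -> 7416
```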

Does that answer your questions?

@wjwwood (Member, Author) commented Apr 6, 2017

Thanks for the quick response @richiware.

Since eProsima/Fast-DDS@d4d6474, user's subscribers and publishers don't use multicast by default to send data.

So, without any changes, multicast data is off by default? That sounds good, I'll double-check that with Wireshark ASAP.

Since eProsima/Fast-DDS@5b12476, a user can disable multicast completely as you tested.

Ah, so this is what I misunderstood. I thought I needed to do this to disable multicast for data, but this is to disable multicast for "metatraffic" as well, right?

I don't think we need that right now, but that's good to know we can do that if we need to in the future. It might also be useful to augment the multicast metatraffic with a unicast "initial peer" for talking with a daemon in our command line tools.

When multicast is removed, the participant has to know where to send discovery messages. This is why you have to set the remote IP in the initial peers.

Right. I understand that now. But this is a moot point since I think we'd like to keep the multicast metatraffic right now.

All participants across processes have to know the port where they will send discovery messages. This rule will calculate the same port for all of them. Since unicast is being used, only one of them will be able to open this port; the rest will open consecutive ports. For example, if the rule calculates port 7410 and there are three participants, they will open 7410, 7412 and 7414.

Right, this is how @codebot worked out the unicast port in his RTPS implementation for microcontrollers. Basically he would try to open ports until one was available.

At the moment, by default, a participant will send discovery messages to the first four consecutive ports. In the example, each participant will send discovery messages to ports 7410, 7412, 7414 and 7416. We want to make this configurable in the future. Currently a user can only have four participants (in different processes) on the same machine if all multicast is disabled.

Right, but what you're describing here is for discovery only and doesn't happen (isn't necessary) when multicast metatraffic is used, right?

I was wondering about the port selection process for unicast data traffic (i.e. not metatraffic or discovery). Are you using the same port for both discovery and data in this case?

Does the same limitation about the four participants on the same machine apply when using multicast for metatraffic and unicast for data (the default behavior right now I think)?

Does that answer your questions?

I believe so, but I think we might be talking about different things on the last point (metatraffic versus data traffic).

Thanks for the quick response!

@richiware (Contributor)

Ah, so this is what I misunderstood. I thought I needed to do this to disable multicast for data, but this is to disable multicast for "metatraffic" as well, right?

Yes, it's for removing metatraffic multicast.

Right, but what you're describing here is for discovery only and doesn't happen (isn't necessary) when multicast metatraffic is used, right?

The mechanism I've described only applies when the user sets the initial peers. Otherwise, multicast is used to send participant announcements instead of the initial peers. After a remote participant announcement is received, the locators (IPs and ports) provided by the announcement are used to send metatraffic data and user data.

I was wondering about the port selection process for unicast data traffic (i.e. not metatraffic or discovery). Are you using the same port for both discovery and data in this case?

This link talks about that. Is that all the information you need? Maybe we have to extend the explanation.

Does the same limitation about the four participants on the same machine apply when using multicast for metatraffic and unicast for data (the default behavior right now I think)?

No. For multicast, the same port can be used by several processes. In the case of unicast for data, Fast-RTPS tries to open a port, and this port is then sent in the participant announcement. The limitation only applies when you remove all multicast and the participant announcement mechanism has to rely on unicast.

@wjwwood (Member, Author) commented Apr 7, 2017

I was wondering about the port selection process for unicast data traffic (i.e. not metatraffic or discovery). Are you using the same port for both discovery and data in this case?

This link talks about that. Is that all the information you need? Maybe we have to extend the explanation.

That's actually really helpful documentation, and yes that's what I'm talking about.

What's surprising to me is that you don't get issues with two processes on the same machine, each of which has a single participant. It seems that this algorithm:

User unicast port = PB + DG * domainId + offsetd3 + PG * participantId

would lead both participants to have the same ID and therefore try to use the same port.

I think this because PB, DG, offsetd3, and PG are all constants, and domainId is shared between the two participants. The only thing left that could give them different unicast ports is the participantId, but that seems to be unique only within a process; see my previous comment:

So in the case I described, I would expect both participants to have the same id and therefore try to use the same port.
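Spelling that out with what I believe are the standard default constants (PB = 7400, DG = 250, offsetd3 = 11, PG = 2) for domain 0:

```
process A, participantId 0: 7400 + 250 * 0 + 11 + 2 * 0 = 7411
process B, participantId 0: 7400 + 250 * 0 + 11 + 2 * 0 = 7411   <- same port
```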

This exact issue is something that @codebot had to work around by:

  • choosing a participant id
  • trying to open the port calculated with that id
  • if it opens, then you're done
  • else increment the participant id and retry

I didn't see any code like that when reading through Fast-RTPS, however.

It seems to work, so it's apparently being addressed somehow, but I just don't understand how at the moment. My understanding isn't strictly required, though. I'll try to make sure this gets applied to our use case, but I'll have to wait until we start using Fast-RTPS master again, which I think is underway.

Thanks for all the insight!

@wjwwood (Member, Author) commented Apr 13, 2017

I'm going to close this PR, since we're now using the latest Fast-RTPS.

@wjwwood closed this Apr 13, 2017
@wjwwood deleted the disable_multicast_fastrtps branch April 13, 2017 00:17
@wjwwood removed the in progress (Actively being worked on) label Apr 13, 2017