RS w/ UnicastOnly=off & ClientList triggers RA flood #63
If |
Hmm.. I can picture the code, and that makes sense, and it should be fixed. One of the problems is that the RA is randomly delayed, so once radvd gets an RS, it just schedules the RA to happen after a short delay, but doesn't remember the RS. So the info in the RS needs to be linked to the RA somehow so it knows to whom it should send the RA. The schedule/delay queue will just need to store the source of the RS, but this is where it can get complicated. Currently, only one RA is scheduled. If 0, 1, or N RS's come in, still just one RA is scheduled. This is fine because the RA is multicast to the all-nodes group. So the RA timer queue will need to have one slot per client in the client list (I guess). Kinda just thinking out loud here, I'll look into the code more later when I get time. If you want to take a stab at fixing it, please do. :) |
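The single-slot scheduling described above can be sketched roughly as follows. This is an illustration only, not radvd's code: the `RaTimer` class and its method names are hypothetical, and keeping the earlier deadline when an RA is already pending is an assumption. The 0.5 second bound is `MAX_RA_DELAY_TIME` from RFC 4861.

```python
import random

MAX_RA_DELAY_TIME = 0.5  # seconds, router constant from RFC 4861


class RaTimer:
    """Hypothetical single-slot timer: at most one multicast RA is pending."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.pending_at = None  # absolute time of the pending RA, or None

    def on_router_solicitation(self, now):
        """Schedule a randomly delayed RA; the RS source is not stored."""
        if self.pending_at is None:
            self.pending_at = now + self.rng.uniform(0.0, MAX_RA_DELAY_TIME)
        # Assumption: an already pending RA is left where it is.
        return self.pending_at

    def pop_due(self, now):
        """Return True exactly once, when the pending RA is due."""
        if self.pending_at is not None and now >= self.pending_at:
            self.pending_at = None
            return True
        return False
```

Because the timer keeps no record of who sent the RS, the only safe destination for the delayed RA is the all-nodes group, which is why a unicast reply needs extra state somewhere.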
See RFC 7772 section 2.1: https://tools.ietf.org/html/rfc7772#section-2.1 Suggested new behavior:
|
That does sound like a better idea. Just to clarify... Leave all-node multicast cycle alone? Unicast responses to RS's? |
Yes, leave the multicast cycle entirely alone. In the worst case, immediately after we send a complete set of unicast RA to a single client, we have our regularly scheduled multicast, which for this window aligns to schedule ALL packets. The overall objective here is that it should be feasible to INCREASE the multicast interval, but still get a near-immediate unicast response to RS, without flooding the network. |
Ok, this is going to be easiest to fix after RA-splitting is merged, because:
|
I'd like to add a +1 to this |
Yes, this is still an issue; I started a patchset for it, but got sidetracked by other work. The major issue was deciding how to schedule the responses and how to track the requesting nodes. The latter is a problem because it represents a potential DoS attack if we try to track all unicast clients (if they just spoof lots of different addresses, they can overwhelm radvd). |
RFC 7772 section 2.1 describes a network flooded with multicast RA at the maximum 3 second interval due to high client turnover. The mitigation described in RFC 7772 section 5.1.1 states that a unicast RA response can be sent, but SHOULD be a configurable option (AdvRASolicitedUnicast), and that networks containing tens or hundreds of battery-powered devices SHOULD enable the option. The new option defaults to on, as it has very few downsides, and represents significant battery life improvements for many clients. See TODO for further possible improvements to AdvRASolicitedUnicast. Fixes: radvd-project#63 Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
RFC 7772 section 2.1 describes a network flooded with multicast RA at the maximum 3 second interval due to high client turnover. The mitigation described in RFC 7772 section 5.1.1 states that a unicast RA response can be sent, but SHOULD be a configurable option (AdvRASolicitedUnicast), and that networks containing tens or hundreds of battery-powered devices SHOULD enable the option. The new option defaults to on, as it has very few downsides, and represents significant battery life improvements for many clients. See TODO for further possible improvements to AdvRASolicitedUnicast. Fixes: radvd-project#63 Fixes: radvd-project#69 Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
This PR represents the minimal fix. There are further improvements suggested in TODO for later, like deliberately deferring the unicast RA if we are getting close to the time of multicast RA; as well as forcing more multicast RAs when topology is changing. |
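The deferral idea mentioned for the TODO could look something like the sketch below. It is not part of the PR: the function name and the `defer_window` threshold are hypothetical, and the 0.5 second default is an arbitrary placeholder.

```python
def should_send_unicast(now, next_multicast_at, defer_window=0.5):
    """Decide whether a solicited unicast RA is worth sending.

    Returns False when the regularly scheduled multicast RA is due within
    `defer_window` seconds (or is already overdue), on the assumption that
    the multicast RA will then answer the RS anyway.
    """
    return (next_multicast_at - now) > defer_window
```

For example, with the next multicast RA due at t=100, an RS arriving at t=0 would get a unicast reply, while one arriving at t=99.8 would simply wait for the multicast.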
Hi, thanks for the quick reply. |
I can trigger it by having Linux restart an interface (libvirt instance of Ubuntu 14.04, patched quagga running on the bridge
In your prod environment, can you look for other RS and see how many of them do have SLLA set? If some OS doesn't properly set the SLLA option, it might be worthwhile to have a variant that doesn't require SLLA to be set, with a suitable warning that it would trigger multicast ND. |
Not able to test the full prod environment as it is a meeting/conference network, and the next conference is in just over a month :)
So it isn't universally supported, and at least the iPhone is definitely in the category of small battery-based device (actually, all the above devices are battery-powered!) Not sure why macOS sometimes sets SLLA and sometimes doesn't. Also, here are some observations from part of the IETF discussion of the draft that led to RFC7772: Note that you mention an option with a warning that it would trigger multicast ND from the router - on the router w/radvd that I tested, the NS was sent by the router to a solicited-node multicast address, targeting the link-local address it is trying to resolve. This is still better than sending to all-nodes, as the solicited-node multicast can be discarded by the device radio in non-targeted devices, without waking up the CPU. Whereas the all-nodes RA must wake up the CPU. |
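For reference, the solicited-node multicast address is formed by appending the low 24 bits of the target address to the prefix ff02::1:ff00:0/104, and the Ethernet destination is 33:33 followed by the low 32 bits of the IPv6 group. A small sketch using Python's standard `ipaddress` module (the helper names are made up for this illustration):

```python
import ipaddress

SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_multicast(addr):
    """Return the solicited-node multicast group for a unicast IPv6 address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_BASE | low24)


def multicast_mac(group):
    """Map an IPv6 multicast group to its Ethernet address (33:33 + low 32 bits)."""
    low32 = int(ipaddress.IPv6Address(group)) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)
```

Applied to the client in the capture in this thread, fe80::5054:ff:fecc:65a1 maps to the group ff02::1:ffcc:65a1 and the Ethernet destination 33:33:ff:cc:65:a1, which is what the router's NS is addressed to. Only hosts subscribed to that group need to process the frame.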
Ok, let's drop the SLLA requirement then, and trigger the NS/NA cycle. Apple iOS devices, Network Manager and Windows do represent a LOT of devices. @spakka sounds like we should file a bug for upstream networkmanager to get RS right :-) as well. I don't know if there's any good way to submit bugs for Windows/iOS these days. @reubenhwk are you ok with just dropping the SLLA requirement, or would you like respecting it to be a config option?
|
I think I missed the SLLA requirement dropping question. RADVD only sets it
optionally. Why is it in question right now?
13:23:14.314745 52:54:00:cc:65:a1 > 33:33:00:00:00:02, ethertype IPv6 (0x86dd), length 62: (hlim 255, next-header ICMPv6 (58) payload length: 8) fe80::5054:ff:fecc:65a1 > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 8
13:23:14.315044 52:54:00:1b:2f:1f > 33:33:ff:cc:65:a1, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::1 > ff02::1:ffcc:65a1: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fe80::5054:ff:fecc:65a1
source link-address option (1), length 8 (1): 52:54:00:1b:2f:1f
0x0000: 5254 001b 2f1f
13:23:14.315300 52:54:00:cc:65:a1 > 52:54:00:1b:2f:1f, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::5054:ff:fecc:65a1 > fe80::1: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is fe80::5054:ff:fecc:65a1, Flags [solicited, override]
destination link-address option (2), length 8 (1): 52:54:00:cc:65:a1
0x0000: 5254 00cc 65a1
13:23:14.315318 52:54:00:1b:2f:1f > 52:54:00:cc:65:a1, ethertype IPv6 (0x86dd), length 70: (flowlabel 0xcb1e2, hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::1 > fe80::5054:ff:fecc:65a1: [icmp6 sum ok] ICMP6, router advertisement, length 16
hop limit 64, Flags [managed, other stateful], pref medium, router lifetime 1800s, reachable time 10000s, retrans time 5000s
13:23:19.319537 52:54:00:cc:65:a1 > 52:54:00:1b:2f:1f, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::5054:ff:fecc:65a1 > fe80::1: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fe80::1
source link-address option (1), length 8 (1): 52:54:00:cc:65:a1
0x0000: 5254 00cc 65a1
13:23:19.319566 52:54:00:1b:2f:1f > 52:54:00:cc:65:a1, ethertype IPv6 (0x86dd), length 78: (hlim 255, next-header ICMPv6 (58) payload length: 24) fe80::1 > fe80::5054:ff:fecc:65a1: [icmp6 sum ok] ICMP6, neighbor advertisement, length 24, tgt is fe80::1, Flags [solicited]
|
RFC7772 5.1.1 says to qualify for a unicast RA response, the RS needs to
@spakka tested and found that lots of clients don't set an SLLA option in the RS (including Windows, NetworkManager & iOS mobile devices). The language in the RFC does not say MUST contain an SLLA, so I propose to only have the unspecified address test. |
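The proposed rule, with only the unspecified-address test, can be sketched as below. This is an illustration rather than the PR's code: `ra_destination` and its `solicited_unicast` parameter are hypothetical stand-ins for the AdvRASolicitedUnicast setting.

```python
import ipaddress

ALL_NODES = ipaddress.IPv6Address("ff02::1")


def ra_destination(rs_source, solicited_unicast=True):
    """Pick the destination for an RA sent in response to an RS.

    Unicast to the RS source when the option is on and the source is not
    the unspecified address (::); otherwise fall back to all-nodes multicast.
    The SLLA option is deliberately not checked here.
    """
    src = ipaddress.IPv6Address(rs_source)
    if solicited_unicast and not src.is_unspecified:
        return src
    return ALL_NODES
```

When the RS carries no SLLA option, the router has to resolve the client's link-layer address with an NS/NA exchange before it can send the unicast RA, as the capture in this thread shows.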
|
Ah. Got it. I'm ok with dropping the requirement.
…On Sat, Mar 18, 2017 at 2:51 PM, Robin H. Johnson ***@***.***> wrote:
- The PR is implemented with both of the requirements at the moment,
but could be changed to just unspecified easily.
|
RFC 7772 section 2.1 describes a network flooded with multicast RA at the maximum 3 second interval due to high client turnover. The mitigation described in RFC 7772 section 5.1.1 states that a unicast RA response can be sent, but SHOULD be a configurable option (AdvRASolicitedUnicast), and that networks containing tens or hundreds of battery-powered devices SHOULD enable the option. The new option defaults to on, as it has very few downsides, and represents significant battery life improvements for many clients. We do differ from the RFC in that we do not require the SLLA option to be set in the RS, as testing shows many clients are not setting it. See TODO for further possible improvements to AdvRASolicitedUnicast. Fixes: radvd-project#63 Fixes: radvd-project#69 Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Updated to drop the requirement. |
Many clients don't include the SLLA option in router solicitations: radvd-project/radvd#63 (comment) Answer these router solicitations by triggering a neighbour solicitation when SLLA is not included, and then waiting for the neighbour advertisement.
Expected result:
If a UnicastOnly=no instance with a fixed client list gets an RS, it should only respond to that client.
Actual result:
The instance sends RAs to every single client.
Effects:
Hosts that did not solicit the RA are flooded by RAs when not needed.
Host setup:
radvd.conf:
Test trigger:
debug=5 logging
Network capture: