RS w/ UnicastOnly=off & ClientList triggers RA flood #63

robbat2 · 2016-11-19T01:46:50Z

Expected result:
If an instance with UnicastOnly=off and a fixed client list receives an RS, it should respond only to that client.

Actual result:
The instance sends RAs to every client in the client list.

Effects:
Hosts that did not solicit an RA are flooded with unneeded RAs.

Host setup:

modprobe dummy
ifconfig dummy0 up multicast
ip -6 addr add fe80::1/64 dev dummy0
ip -6 addr add fe80::2/64 dev dummy0

radvd.conf:

interface dummy0 {
  AdvSendAdvert on;
  AdvRASrcAddress { fe80::1; };
  route 2001:db8:f001::/64 { };
  UnicastOnly off;
  clients { fe80::2; fe80::3; fe80::4; fe80::5; fe80::6; fe80::7; fe80::8; };
};

Test trigger:

# rdisc6 dummy0
Soliciting ff02::2 (ff02::2) on dummy0...

Hop limit                 :           64 (      0x40)
Stateful address conf.    :           No
Stateful other conf.      :           No
Mobile home agent         :           No
Router preference         :       medium
Neighbor discovery proxy  :           No
Router lifetime           :         1800 (0x00000708) seconds
Reachable time            :  unspecified (0x00000000)
Retransmit time           :  unspecified (0x00000000)
 Route                    : 2001:db8:f001::/64
  Route preference        :       medium
  Route lifetime          :         1800 (0x00000708) seconds
 Source link-layer address: 96:29:9A:3F:1A:A6
 from fe80::1

Debug logging (debug level 5):

[Nov 18 17:43:12] radvd (16547): dummy0 recvmsg len=8
[Nov 18 17:43:12] radvd (16547): dummy0 received a packet
[Nov 18 17:43:12] radvd (16547): dummy0 received RS from: fe80::2
[Nov 18 17:43:12] radvd (16547): sending RA to fe80::8 on dummy0 (fe80::1)
[Nov 18 17:43:12] radvd (16547): sending RA to fe80::7 on dummy0 (fe80::1)
[Nov 18 17:43:12] radvd (16547): sending RA to fe80::6 on dummy0 (fe80::1)
[Nov 18 17:43:12] radvd (16547): sending RA to fe80::5 on dummy0 (fe80::1)
[Nov 18 17:43:12] radvd (16547): sending RA to fe80::4 on dummy0 (fe80::1)
[Nov 18 17:43:12] radvd (16547): sending RA to fe80::3 on dummy0 (fe80::1)
[Nov 18 17:43:12] radvd (16547): sending RA to fe80::2 on dummy0 (fe80::1)
[Nov 18 17:43:12] radvd (16547): dummy0 next scheduled RA in 496.716 second(s)
[Nov 18 17:43:12] radvd (16547): dummy0 processed an RS
[Nov 18 17:43:12] radvd (16547): polling for 496.716 second(s), next iface is dummy0
[Nov 18 17:43:12] radvd (16547): dummy0 recvmsg len=48
[Nov 18 17:43:12] radvd (16547): dummy0 received a packet
[Nov 18 17:43:12] radvd (16547): dummy0 received RA from: fe80::1
[Nov 18 17:43:12] radvd (16547): processed RA on dummy0

Network capture:

17:43:12.500909 IP6 fe80::2 > ff02::2: ICMP6, router solicitation, length 8
17:43:12.501121 IP6 fe80::1 > fe80::8: ICMP6, router advertisement, length 48
17:43:12.501170 IP6 fe80::1 > fe80::7: ICMP6, router advertisement, length 48
17:43:12.501201 IP6 fe80::1 > fe80::6: ICMP6, router advertisement, length 48
17:43:12.501233 IP6 fe80::1 > fe80::5: ICMP6, router advertisement, length 48
17:43:12.501265 IP6 fe80::1 > fe80::4: ICMP6, router advertisement, length 48
17:43:12.501313 IP6 fe80::1 > fe80::3: ICMP6, router advertisement, length 48
17:43:12.501343 IP6 fe80::1 > fe80::2: ICMP6, router advertisement, length 48


robbat2 · 2016-11-19T01:48:01Z

If UnicastOnly=on is set, then the flood does not happen.

reubenhwk · 2016-11-19T06:51:30Z

Hmm... I can picture the code, and that makes sense; it should be fixed. One of the problems is that the RA is randomly delayed: once radvd gets an RS, it just schedules the RA to happen after a short delay, but doesn't remember the RS. So the info in the RS needs to be linked to the RA somehow, so that radvd knows to whom it should send the RA. The schedule/delay queue will just need to store the source of the RS, but this is where it can get complicated. Currently, only one RA is scheduled: whether 0, 1, or N RSs come in, still just one RA is scheduled. This is fine because the RA is multicast to the all-nodes group. So the RA timer queue will need one slot per client in the client list (I guess).
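A minimal C sketch of the "one slot per client" idea, assuming a fixed client list and times expressed as seconds in a double. `struct pending_ra`, `struct ra_schedule`, `schedule_ra_for()` and `collect_due()` are hypothetical names for illustration, not radvd's actual data structures.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-client schedule: one slot per entry in the configured
 * client list, instead of a single shared RA slot. */
#define MAX_CLIENTS 8

struct pending_ra {
    struct in6_addr client; /* address taken from the client list */
    bool scheduled;         /* an RS arrived and an RA is pending */
    double due;             /* time (seconds) at which to send the RA */
};

struct ra_schedule {
    struct pending_ra slot[MAX_CLIENTS];
    int count;
};

/* Record an RS from src; returns false if src is not in the client list.
 * Repeated RSs from the same client before its RA goes out coalesce into
 * the existing slot, keeping the earlier deadline. */
static bool schedule_ra_for(struct ra_schedule *s, const struct in6_addr *src,
                            double now, double delay)
{
    for (int i = 0; i < s->count; i++) {
        struct pending_ra *p = &s->slot[i];
        if (memcmp(&p->client, src, sizeof(*src)) != 0)
            continue;
        if (!p->scheduled || now + delay < p->due) {
            p->scheduled = true;
            p->due = now + delay;
        }
        return true;
    }
    return false;
}

/* Copy out the clients whose RA is due and clear their slots.
 * Returns how many addresses were written to out. */
static int collect_due(struct ra_schedule *s, double now,
                       struct in6_addr *out, int max_out)
{
    int n = 0;
    for (int i = 0; i < s->count && n < max_out; i++) {
        struct pending_ra *p = &s->slot[i];
        if (p->scheduled && p->due <= now) {
            out[n++] = p->client;
            p->scheduled = false;
        }
    }
    return n;
}
```

When the timer fires, the caller would send one unicast RA per address returned by `collect_due()`, so only clients that actually solicited get a reply.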

Kinda just thinking out loud here, I'll look into the code more later when I get time. If you want to take a stab at fixing it, please do. :)

robbat2 · 2016-11-19T09:36:26Z

See RFC 7772 section 2.1: https://tools.ietf.org/html/rfc7772#section-2.1
In the case of a large network, this leads to an all-nodes RA every 3 seconds, which wakes up all of the devices on the network.

Suggested new behavior:

when the process gets an RS, always send an immediate unicast response to only that client, with the complete set of RA options.
Leave the all-nodes RA cycle alone (the new RA-splitting code has placeholders for scheduling RA options at different rates).
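The suggested behavior boils down to a destination choice per RA. A minimal C sketch, assuming a hypothetical `enum ra_trigger` that tells the sender why the RA is going out; the fallback to multicast for an RS sent from `::` follows RFC 4861, since such a solicitor has no address to unicast to.

```c
#include <netinet/in.h>
#include <string.h>

/* Hypothetical reasons for sending an RA. */
enum ra_trigger { RA_TRIGGER_TIMER, RA_TRIGGER_RS };

/* Pick the destination for an RA:
 *  - periodic timer: all-nodes multicast (ff02::1), cycle left unchanged;
 *  - RS from a specified source: unicast straight back to that source;
 *  - RS from :: (unspecified): fall back to all-nodes multicast. */
static struct in6_addr ra_destination(enum ra_trigger why,
                                      const struct in6_addr *rs_src)
{
    struct in6_addr all_nodes;
    memset(&all_nodes, 0, sizeof(all_nodes));
    all_nodes.s6_addr[0] = 0xff;
    all_nodes.s6_addr[1] = 0x02;
    all_nodes.s6_addr[15] = 0x01;

    if (why == RA_TRIGGER_RS && rs_src != NULL &&
        !IN6_IS_ADDR_UNSPECIFIED(rs_src))
        return *rs_src;
    return all_nodes;
}
```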

reubenhwk · 2016-11-19T10:17:33Z

That does sound like a better idea. Just to clarify...

Leave the all-nodes multicast cycle alone? Unicast responses to RSs?

robbat2 · 2016-11-19T10:26:46Z

Yes, leave the multicast cycle entirely alone.

In the worst case, our regularly scheduled multicast RA goes out immediately after we send a complete unicast RA to a single client, and for that window the schedule aligns so that ALL packets are sent.

The overall objective here is that it should be feasible to INCREASE the multicast interval, but still get a near-immediate unicast response to RS, without flooding the network.

robbat2 · 2016-11-19T20:56:19Z

Ok, this is going to be easiest to fix after RA-splitting is merged, because:

it's going to conflict horribly otherwise
in the case of no client list, we need an additional data structure of all requesting nodes, so we can rate-limit each node to one RS per $INTERVAL (thinking of just using the MIN_DELAY_BETWEEN_RAS/MIN_DELAY_BETWEEN_RAS_MIPv6 values).
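A minimal C sketch of the per-node rate limit, assuming MIN_DELAY_BETWEEN_RAS is the RFC 4861 value of 3 seconds and a fixed-size table (`RL_SLOTS`, `rl_allow()` and the FNV-1a hash are illustrative choices, not radvd code). A fixed table keeps memory bounded; the trade-off is that a hash collision evicts the older entry, which then gets a free pass.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MIN_DELAY_BETWEEN_RAS 3.0 /* seconds, RFC 4861 */
#define RL_SLOTS 256              /* hypothetical fixed table size */

struct rl_entry {
    struct in6_addr addr;
    double last_sent;
    bool used;
};

static struct rl_entry rl_table[RL_SLOTS];

/* FNV-1a over the 16 address bytes, reduced to a slot index. */
static unsigned rl_hash(const struct in6_addr *a)
{
    uint32_t h = 2166136261u;
    for (int i = 0; i < 16; i++) {
        h ^= a->s6_addr[i];
        h *= 16777619u;
    }
    return h % RL_SLOTS;
}

/* Returns true if a unicast RA may be sent to src now, and records it. */
static bool rl_allow(const struct in6_addr *src, double now)
{
    struct rl_entry *e = &rl_table[rl_hash(src)];
    if (e->used && memcmp(&e->addr, src, sizeof(*src)) == 0 &&
        now - e->last_sent < MIN_DELAY_BETWEEN_RAS)
        return false;
    /* New node, expired entry, or hash collision: (re)claim the slot. */
    e->addr = *src;
    e->last_sent = now;
    e->used = true;
    return true;
}
```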

spakka · 2017-03-15T16:01:47Z

I'd like to add a +1 to this.
Having a periodic multicast advertisement, but unicast responses to RSs, would also help us with RFC 7772 issues.
It would be good to be able to do this without also having to configure a list of unicast clients.
We have many battery-powered devices coming and going on a large wireless LAN, and every time one connects and sends an RS, all the others get their radios woken up by the multicast RA reply.

robbat2 · 2017-03-15T19:35:18Z

Yes, this is still an issue; I started a patchset for it, but got sidetracked by other work.

The major issues were deciding on scheduling and on tracking requesting nodes. Tracking is a problem because it represents a potential DoS attack if we try to track all unicast clients (an attacker who spoofs lots of different addresses can overwhelm radvd).
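Per-address tracking alone does not stop spoofing, because every fake address looks like a new node. One common mitigation, sketched here in C as an assumption rather than anything the PR does, is a global token bucket that caps total unicast RAs per second regardless of source; `struct token_bucket`, `tb_init()` and `tb_take()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical global limiter: caps the total number of unicast RAs,
 * no matter how many distinct (possibly spoofed) sources send RSs. */
struct token_bucket {
    double tokens;   /* sends currently available */
    double capacity; /* maximum burst size */
    double rate;     /* tokens added per second */
    double last;     /* time of the last refill, in seconds */
};

static void tb_init(struct token_bucket *tb, double capacity, double rate,
                    double now)
{
    tb->tokens = capacity;
    tb->capacity = capacity;
    tb->rate = rate;
    tb->last = now;
}

/* Refill according to elapsed time, then try to take one token. */
static bool tb_take(struct token_bucket *tb, double now)
{
    tb->tokens += (now - tb->last) * tb->rate;
    if (tb->tokens > tb->capacity)
        tb->tokens = tb->capacity;
    tb->last = now;
    if (tb->tokens < 1.0)
        return false; /* over budget: leave it to the multicast cycle */
    tb->tokens -= 1.0;
    return true;
}
```

Memory stays constant under attack, and the worst case degrades to the existing multicast behavior rather than to an unbounded table.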

RFC 7772 section 2.1 describes a network flooded with multicast RAs at the maximum rate (one every 3 seconds) due to high client turnover. The mitigation described in RFC 7772 section 5.1.1 states that a unicast RA response can be sent, but SHOULD be a configurable option (AdvRASolicitedUnicast), and that networks containing tens or hundreds of battery-powered devices SHOULD enable the option. The new option defaults to on, as it has very few downsides and brings significant battery-life improvements for many clients. See TODO for further possible improvements to AdvRASolicitedUnicast.
Fixes: radvd-project#63
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>

RFC 7772 section 2.1 describes a network flooded with multicast RAs at the maximum rate (one every 3 seconds) due to high client turnover. The mitigation described in RFC 7772 section 5.1.1 states that a unicast RA response can be sent, but SHOULD be a configurable option (AdvRASolicitedUnicast), and that networks containing tens or hundreds of battery-powered devices SHOULD enable the option. The new option defaults to on, as it has very few downsides and brings significant battery-life improvements for many clients. See TODO for further possible improvements to AdvRASolicitedUnicast.
Fixes: radvd-project#63
Fixes: radvd-project#69
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>

robbat2 · 2017-03-16T00:06:13Z

@spakka can you review/test PR #69?

robbat2 · 2017-03-16T00:07:47Z

This PR represents the minimal fix. There are further improvements suggested in TODO for later, such as deliberately deferring the unicast RA if we are getting close to the time of the multicast RA, and forcing more multicast RAs when the topology is changing.
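The "defer the unicast RA when the multicast RA is close" idea from TODO can be expressed as a single comparison. A C sketch, assuming a hypothetical `UNICAST_DEFER_WINDOW` of 0.5 seconds; the real threshold, if any, is not specified in this thread.

```c
#include <stdbool.h>

/* Hypothetical window: if the next multicast RA is due within this many
 * seconds, skip the unicast reply and let the multicast RA answer the RS. */
#define UNICAST_DEFER_WINDOW 0.5

static bool should_send_unicast(double now, double next_multicast_at)
{
    return next_multicast_at - now > UNICAST_DEFER_WINDOW;
}
```

This avoids the worst case described earlier in the thread, where a full unicast RA is immediately followed by the scheduled multicast RA carrying the same information.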

spakka · 2017-03-16T13:36:06Z

Hi, thanks for the quick reply.
I've reviewed the patch and it looks compliant with the RFC. I tried to test it but wasn't able to generate an RS with the SLLA option, so I moved the rfc7772_unicast_response test outside the option block to make it apply to all RSs, and it works great. The only thing is that if a unicast response is destined for a link-local address and the router has no existing neighbor cache entry for it (and the RS did not carry an SLLA option), then the router triggers an NS/NA exchange to resolve the link-local address before it can send the reply RA.
Anyway, basically, yes, it works well as long as the sender sets the SLLA option.
Thanks!

robbat2 · 2017-03-16T18:04:20Z

I can trigger it by having Linux restart an interface (libvirt instance of Ubuntu 14.04, patched quagga running on the bridge virbr0):

11:00:10.426779 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::5054:ff:feca:a591 > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 16
	  source link-address option (1), length 8 (1): 52:54:00:ca:a5:91
	    0x0000:  5254 00ca a591
11:00:10.427420 IP6 (flowlabel 0xea489, hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::1 > fe80::5054:ff:feca:a591: [icmp6 sum ok] ICMP6, router advertisement, length 16
	hop limit 64, Flags [managed, other stateful], pref medium, router lifetime 1800s, reachable time 10000s, retrans time 5000s

In your prod environment, can you look for other RS and see how many of them do have SLLA set? If some OS doesn't properly set the SLLA option, it might be worthwhile to have a variant that doesn't require SLLA to be set, with a suitable warning that it would trigger multicast ND.

spakka · 2017-03-16T19:53:04Z

Not able to test the full prod environment as it is a meeting/conference network, and the next conference is in just over a month :)
But I tested with all the clients I have nearby, here are the results:

macOS 10.12.3 sierra - sends SLLA 2 times out of 4
iOS 10.2.1 (iPhone 5s) - no SLLA
Android 7.0 (sony xperia phone) - sends SLLA
Android 5.1 (nexus 7 tablet) - sends SLLA
Ubuntu 16.04.2 LTS w/ Network Manager, running 4.9 kernel (dell laptop) - no SLLA
Windows 10 (dell laptop) - no SLLA

So it isn't universally supported, and at least the iPhone is definitely in the category of small battery-based device (actually, all the above devices are battery-powered!) Not sure why macOS sometimes sets SLLA and sometimes doesn't.

Also, here are some observations from part of the IETF discussion of the draft that led to RFC 7772:
https://www.ietf.org/mail-archive/web/v6ops/current/msg22464.html

Regarding the option you mention, with a warning that it would trigger multicast ND from the router: on the router with radvd that I tested, the NS was sent by the router to a solicited-node multicast address, targeting the link-local address it was trying to resolve. This is still better than sending to all-nodes, as a solicited-node multicast can be discarded by the radio of non-targeted devices without waking up the CPU, whereas an all-nodes RA must wake up the CPU.
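Why the solicited-node NS is cheap for other hosts: the group address keeps only the low 24 bits of the target (RFC 4291 section 2.7.1), and on Ethernet it maps to a MAC of 33:33 followed by the low 32 bits of the group (RFC 2464 section 7), which a NIC can filter in hardware. A small C sketch of both mappings; the function names are illustrative. For the target fe80::5054:ff:fecc:65a1 in the capture below, it yields ff02::1:ffcc:65a1 and 33:33:ff:cc:65:a1, matching the NS shown there.

```c
#include <netinet/in.h>
#include <string.h>

/* Solicited-node multicast address: ff02::1:ff00:0/104 followed by the
 * low-order 24 bits of the unicast address being resolved. */
static struct in6_addr solicited_node(const struct in6_addr *target)
{
    struct in6_addr m;
    memset(&m, 0, sizeof(m));
    m.s6_addr[0] = 0xff;
    m.s6_addr[1] = 0x02;
    m.s6_addr[11] = 0x01;
    m.s6_addr[12] = 0xff;
    memcpy(&m.s6_addr[13], &target->s6_addr[13], 3);
    return m;
}

/* Ethernet MAC for an IPv6 multicast group: 33:33 + low 32 bits. */
static void multicast_mac(const struct in6_addr *group, unsigned char mac[6])
{
    mac[0] = 0x33;
    mac[1] = 0x33;
    memcpy(&mac[2], &group->s6_addr[12], 4);
}
```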

robbat2 · 2017-03-16T20:27:22Z

Ok, let's drop the SLLA requirement then, and trigger the NS/NA cycle. Apple iOS devices, Network Manager and Windows do represent a LOT of devices.
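Dropping the SLLA requirement mostly means addressing the reply correctly and letting the kernel do neighbor resolution. A C sketch, assuming the RA is sent with `sendto()` on a raw ICMPv6 socket: a link-local destination needs `sin6_scope_id` set to the interface index, and on Linux, if the neighbor cache has no entry, the kernel queues the packet and runs the NS/NA exchange seen in the capture below. `unicast_ra_dest()` is a hypothetical helper name.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Build the destination for a unicast RA reply to the RS source.
 * A link-local address is only meaningful together with an interface,
 * so the interface index goes into sin6_scope_id. sin6_port stays 0. */
static struct sockaddr_in6 unicast_ra_dest(const struct in6_addr *rs_src,
                                           unsigned int ifindex)
{
    struct sockaddr_in6 dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin6_family = AF_INET6;
    dst.sin6_addr = *rs_src;
    if (IN6_IS_ADDR_LINKLOCAL(rs_src))
        dst.sin6_scope_id = ifindex;
    return dst;
}
```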

@spakka sounds like we should file a bug for upstream networkmanager to get RS right :-) as well. I don't know if there's any good way to submit bugs for Windows/iOS these days.

@reubenhwk are you ok with just dropping the SLLA requirement, or would you like respecting it to be a config option?

13:23:14.314745 52:54:00:cc:65:a1 > 33:33:00:00:00:02, ethertype IPv6 (0x86dd), length 62: (hlim 255, next-header ICMPv6 (58) payload length: 8) fe80::5054:ff:fecc:65a1 > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 8
13:23:14.315044 52:54:00:1b:2f:1f > 33:33:ff:cc:65:a1, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::1 > ff02::1:ffcc:65a1: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fe80::5054:ff:fecc:65a1
	  source link-address option (1), length 8 (1): 52:54:00:1b:2f:1f
	    0x0000:  5254 001b 2f1f
13:23:14.315300 52:54:00:cc:65:a1 > 52:54:00:1b:2f:1f, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::5054:ff:fecc:65a1 > fe80::1: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is fe80::5054:ff:fecc:65a1, Flags [solicited, override]
	  destination link-address option (2), length 8 (1): 52:54:00:cc:65:a1
	    0x0000:  5254 00cc 65a1
13:23:14.315318 52:54:00:1b:2f:1f > 52:54:00:cc:65:a1, ethertype IPv6 (0x86dd), length 70: (flowlabel 0xcb1e2, hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::1 > fe80::5054:ff:fecc:65a1: [icmp6 sum ok] ICMP6, router advertisement, length 16
	hop limit 64, Flags [managed, other stateful], pref medium, router lifetime 1800s, reachable time 10000s, retrans time 5000s
13:23:19.319537 52:54:00:cc:65:a1 > 52:54:00:1b:2f:1f, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::5054:ff:fecc:65a1 > fe80::1: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fe80::1
	  source link-address option (1), length 8 (1): 52:54:00:cc:65:a1
	    0x0000:  5254 00cc 65a1
13:23:19.319566 52:54:00:1b:2f:1f > 52:54:00:cc:65:a1, ethertype IPv6 (0x86dd), length 78: (hlim 255, next-header ICMPv6 (58) payload length: 24) fe80::1 > fe80::5054:ff:fecc:65a1: [icmp6 sum ok] ICMP6, neighbor advertisement, length 24, tgt is fe80::1, Flags [solicited]

reubenhwk · 2017-03-18T06:58:05Z

I think I missed the question about dropping the SLLA requirement. radvd only sets it optionally. Why is it in question right now?


robbat2 · 2017-03-18T21:50:15Z

RFC 7772 section 5.1.1 says that, to qualify for a unicast RA response, the RS needs to:

not be from the unspecified address
contain an SLLA option.
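Checking the second condition means walking the RS option list: after the 8-byte RS header, each option is a type byte, a length byte counted in 8-octet units, and data; a zero length is invalid. A C sketch using the glibc `<netinet/icmp6.h>` names `struct nd_router_solicit` and `ND_OPT_SOURCE_LINKADDR`; `rs_has_slla()` is a hypothetical helper. Fed the 16-byte RS from the earlier capture (SLLA 52:54:00:ca:a5:91) it reports an SLLA; fed the 8-byte RS with no options it does not.

```c
#include <netinet/icmp6.h>
#include <stddef.h>
#include <stdint.h>

/* Scan the options of a Router Solicitation (ICMPv6 type 133).
 * Returns 1 if a Source Link-Layer Address option is present, 0 if not,
 * and -1 if the option list is malformed. */
static int rs_has_slla(const uint8_t *pkt, size_t len)
{
    size_t off = sizeof(struct nd_router_solicit); /* 8-byte RS header */
    int found = 0;

    if (len < off)
        return -1;
    while (off < len) {
        if (len - off < 2)
            return -1;                               /* truncated option */
        size_t optlen = (size_t)pkt[off + 1] * 8;    /* 8-octet units */
        if (optlen == 0 || optlen > len - off)
            return -1;                               /* invalid or overrun */
        if (pkt[off] == ND_OPT_SOURCE_LINKADDR)
            found = 1;
        off += optlen;
    }
    return found;
}
```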

@spakka tested and found that lots of clients don't set an SLLA option in the RS (including Windows, NetworkManager & iOS mobile devices).

The language in the RFC does not say MUST contain an SLLA, so I propose to only have the unspecified address test.
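The proposal reduces the eligibility test to the unspecified-source check, with the SLLA check kept only as an optional extra. A C sketch; `rs_wants_unicast_reply()` and the `require_slla` flag are hypothetical and do not correspond to an existing radvd option.

```c
#include <netinet/in.h>
#include <stdbool.h>

/* Eligibility for a unicast RA reply, per RFC 7772 section 5.1.1, with the
 * SLLA condition made optional (require_slla is a hypothetical knob). */
static bool rs_wants_unicast_reply(const struct in6_addr *src, bool has_slla,
                                   bool require_slla)
{
    if (IN6_IS_ADDR_UNSPECIFIED(src))
        return false; /* no address to unicast the reply to */
    if (require_slla && !has_slla)
        return false; /* strict mode: RFC conditions taken literally */
    return true;
}
```

With `require_slla` false, RSs from Windows, NetworkManager and iOS clients that omit the SLLA still get a unicast reply, at the cost of an NS/NA exchange when the router has no neighbor cache entry.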

robbat2 · 2017-03-18T21:51:49Z

The PR is implemented with both of the requirements at the moment, but could be changed to just unspecified easily.

reubenhwk · 2017-03-18T22:03:24Z

Ah. Got it. I'm ok with dropping the requirement.


RFC 7772 section 2.1 describes a network flooded with multicast RAs at the maximum rate (one every 3 seconds) due to high client turnover. The mitigation described in RFC 7772 section 5.1.1 states that a unicast RA response can be sent, but SHOULD be a configurable option (AdvRASolicitedUnicast), and that networks containing tens or hundreds of battery-powered devices SHOULD enable the option. The new option defaults to on, as it has very few downsides and brings significant battery-life improvements for many clients. We differ from the RFC in that we do not require the SLLA option to be set in the RS, as testing shows many clients do not set it. See TODO for further possible improvements to AdvRASolicitedUnicast.
Fixes: radvd-project#63
Fixes: radvd-project#69
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>

robbat2 · 2017-03-19T17:15:49Z

Updated to drop the requirement.

Many clients don't include the SLLA option in router solicitations: radvd-project/radvd#63 (comment). Answer these router solicitations by triggering a neighbour solicitation when SLLA is not included, and then waiting for the neighbour advertisement.

robbat2 changed the title ~~RS w/ UnicastOnly=off triggers RA flood~~ RS w/ UnicastOnly=off & ClienList triggers RA flood Nov 19, 2016

robbat2 changed the title ~~RS w/ UnicastOnly=off & ClienList triggers RA flood~~ RS w/ UnicastOnly=off & ClientList triggers RA flood Nov 19, 2016

robbat2 mentioned this issue Mar 16, 2017

AdvRASolicitedUnicast: unicast RA response to RS. #69

Merged

reubenhwk closed this as completed in #69 Jul 2, 2017

jhg8 mentioned this issue Jan 19, 2024

Support router solicitation without SLLA option jech/sroamd#1

Merged
