RS w/ UnicastOnly=off & ClientList triggers RA flood #63

Closed
robbat2 opened this issue Nov 19, 2016 · 19 comments

Comments

@robbat2
Member

robbat2 commented Nov 19, 2016

Expected result:
If an instance with UnicastOnly off and a fixed client list gets an RS, it should respond only to the client that sent it.

Actual result:
The instance sends RAs to every single client.

Effects:
Hosts that did not solicit an RA are flooded with RAs they do not need.

Host setup:

modprobe dummy
ifconfig dummy0 up multicast
ip -6 addr add  fe80::1/64 dev dummy0
ip -6 addr add  fe80::2/64 dev dummy0

radvd.conf:

interface dummy0 {
  AdvSendAdvert on;
  AdvRASrcAddress { fe80::1; };
  route 2001:db8:f001::/64 { };
  UnicastOnly off;
  clients { fe80::2; fe80::3; fe80::4; fe80::5; fe80::6; fe80::7; fe80::8; };
};

Test trigger:

# rdisc6 dummy0
Soliciting ff02::2 (ff02::2) on dummy0...

Hop limit                 :           64 (      0x40)
Stateful address conf.    :           No
Stateful other conf.      :           No
Mobile home agent         :           No
Router preference         :       medium
Neighbor discovery proxy  :           No
Router lifetime           :         1800 (0x00000708) seconds
Reachable time            :  unspecified (0x00000000)
Retransmit time           :  unspecified (0x00000000)
 Route                    : 2001:db8:f001::/64
  Route preference        :       medium
  Route lifetime          :         1800 (0x00000708) seconds
 Source link-layer address: 96:29:9A:3F:1A:A6
 from fe80::1

debug=5 logging

[Nov 18 17:43:12] radvd (16547): dummy0 recvmsg len=8
[Nov 18 17:43:12] radvd (16547): dummy0 received a packet
[Nov 18 17:43:12] radvd (16547): dummy0 received RS from: fe80::2
[Nov 18 17:43:12] radvd (16547): sending RA to fe80::8 on dummy0 (fe80::1)
[Nov 18 17:43:12] radvd (16547): sending RA to fe80::7 on dummy0 (fe80::1)
[Nov 18 17:43:12] radvd (16547): sending RA to fe80::6 on dummy0 (fe80::1)
[Nov 18 17:43:12] radvd (16547): sending RA to fe80::5 on dummy0 (fe80::1)
[Nov 18 17:43:12] radvd (16547): sending RA to fe80::4 on dummy0 (fe80::1)
[Nov 18 17:43:12] radvd (16547): sending RA to fe80::3 on dummy0 (fe80::1)
[Nov 18 17:43:12] radvd (16547): sending RA to fe80::2 on dummy0 (fe80::1)
[Nov 18 17:43:12] radvd (16547): dummy0 next scheduled RA in 496.716 second(s)
[Nov 18 17:43:12] radvd (16547): dummy0 processed an RS
[Nov 18 17:43:12] radvd (16547): polling for 496.716 second(s), next iface is dummy0
[Nov 18 17:43:12] radvd (16547): dummy0 recvmsg len=48
[Nov 18 17:43:12] radvd (16547): dummy0 received a packet
[Nov 18 17:43:12] radvd (16547): dummy0 received RA from: fe80::1
[Nov 18 17:43:12] radvd (16547): processed RA on dummy0

Network capture:

17:43:12.500909 IP6 fe80::2 > ff02::2: ICMP6, router solicitation, length 8
17:43:12.501121 IP6 fe80::1 > fe80::8: ICMP6, router advertisement, length 48
17:43:12.501170 IP6 fe80::1 > fe80::7: ICMP6, router advertisement, length 48
17:43:12.501201 IP6 fe80::1 > fe80::6: ICMP6, router advertisement, length 48
17:43:12.501233 IP6 fe80::1 > fe80::5: ICMP6, router advertisement, length 48
17:43:12.501265 IP6 fe80::1 > fe80::4: ICMP6, router advertisement, length 48
17:43:12.501313 IP6 fe80::1 > fe80::3: ICMP6, router advertisement, length 48
17:43:12.501343 IP6 fe80::1 > fe80::2: ICMP6, router advertisement, length 48
@robbat2 robbat2 changed the title RS w/ UnicastOnly=off triggers RA flood RS w/ UnicastOnly=off & ClienList triggers RA flood Nov 19, 2016
@robbat2 robbat2 changed the title RS w/ UnicastOnly=off & ClienList triggers RA flood RS w/ UnicastOnly=off & ClientList triggers RA flood Nov 19, 2016
@robbat2
Member Author

robbat2 commented Nov 19, 2016

If UnicastOnly=on is set, then the flood does not happen.

@reubenhwk
Collaborator

Hmm.. I can picture the code, and that makes sense, and it should be fixed. One of the problems is that the RA is randomly delayed: once radvd gets an RS, it just schedules the RA to happen after a short delay, but doesn't remember the RS. So the info in the RS needs to be linked to the RA somehow, so that it knows to whom it should send the RA. The schedule/delay queue will need to store the source of the RS, and this is where it can get complicated. Currently, only one RA is scheduled: whether 0, 1, or N RSs come in, still just one RA is scheduled. This is fine as long as the RA is multicast to the all-nodes group. For unicast replies, the RA timer queue will need to have one slot per client in the client list (I guess).

Kinda just thinking out loud here, I'll look into the code more later when I get time. If you want to take a stab at fixing it, please do. :)
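
A minimal sketch of the "one slot per solicitor" delay queue described above, written in Python for brevity rather than in radvd's C. The class and method names are hypothetical and do not come from the radvd source; the 0.5 second bound is RFC 4861's MAX_RA_DELAY_TIME.

```python
import heapq
import random

# Upper bound on the random delay before a solicited RA, in seconds
# (RFC 4861 names this MAX_RA_DELAY_TIME).
MAX_RA_DELAY_TIME = 0.5


class PendingRAQueue:
    """Hypothetical delay queue holding at most one pending RA per solicitor."""

    def __init__(self, rng=random.random):
        self._heap = []        # (due_time, source) pairs, ordered by due time
        self._pending = set()  # sources that already have an RA queued
        self._rng = rng        # injectable so the delay is testable

    def on_rs(self, source, now):
        """Remember who sent the RS; ignore repeats while an RA is pending."""
        if source in self._pending:
            return False
        due = now + self._rng() * MAX_RA_DELAY_TIME
        heapq.heappush(self._heap, (due, source))
        self._pending.add(source)
        return True

    def pop_due(self, now):
        """Return the sources whose delayed RA is due at or before `now`."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            _, source = heapq.heappop(self._heap)
            self._pending.discard(source)
            ready.append(source)
        return ready
```

The point of the sketch is only that the queue entry carries the RS source, so the delayed RA can be unicast to the solicitor instead of being sent to every client.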

@robbat2
Member Author

robbat2 commented Nov 19, 2016

See RFC 7772 section 2.1: https://tools.ietf.org/html/rfc7772#section-2.1
In the case of a large network, this leads to an all-nodes RA every 3 seconds, which wakes up all of the devices on the network.

Suggested new behavior:

  • When the process gets an RS, always send an immediate unicast response to only that client, with the complete set of RA options.
  • Leave the all-nodes RA cycle alone (the new RA-splitting code has placeholders for scheduling RA options at different rates).
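
A minimal sketch of the destination choice this proposal implies, in Python for brevity. The function name and the handling of unlisted or unspecified sources are assumptions made for illustration; they are not taken from the radvd code.

```python
import ipaddress

ALL_NODES = ipaddress.IPv6Address("ff02::1")  # link-local all-nodes group


def solicited_ra_destinations(rs_source, clients=None):
    """Pick where the immediate reply to one RS goes (hypothetical helper).

    rs_source: source address of the RS, as a string.
    clients:   optional configured client list; None means no list is set.
    Returns the list of destination addresses for the solicited RA.
    """
    src = ipaddress.IPv6Address(rs_source)
    if clients is not None:
        allowed = {ipaddress.IPv6Address(c) for c in clients}
        # Reply to the solicitor only, never to the rest of the list.
        # Assumption: an RS from an unlisted source gets no reply.
        return [src] if src in allowed else []
    if src.is_unspecified:
        # Assumption: with no address to unicast to, fall back to all-nodes.
        return [ALL_NODES]
    return [src]
```

With the client list from the report, an RS from fe80::2 yields exactly one destination (fe80::2) rather than seven. The periodic multicast cycle is untouched by this function.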

@reubenhwk
Collaborator

That does sound like a better idea. Just to clarify...

Leave all-node multicast cycle alone? Unicast responses to RS's?

@robbat2
Member Author

robbat2 commented Nov 19, 2016

Yes, leave the multicast cycle entirely alone.

In the worst case, immediately after we send a complete unicast RA to a single client, the regularly scheduled multicast fires, and that cycle happens to be one where ALL packets are scheduled.

The overall objective here is that it should be feasible to INCREASE the multicast interval, but still get a near-immediate unicast response to RS, without flooding the network.

@robbat2
Member Author

robbat2 commented Nov 19, 2016

Ok, this is going to be easiest to fix after RA-splitting is merged, because:

  • it's going to conflict horribly otherwise
  • in the case of no client list, we need an additional data structure of all requesting nodes, so we can rate-limit each node to one RS per $INTERVAL (thinking of just using the MIN_DELAY_BETWEEN_RAS/MIN_DELAY_BETWEEN_RAS_MIPv6 values).
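
A minimal sketch of the per-node rate limit mentioned in the second point, in Python for brevity. The class name is hypothetical; the 3 second default is RFC 4861's MIN_DELAY_BETWEEN_RAS, and reusing it as the per-node interval follows the suggestion above rather than any merged code.

```python
MIN_DELAY_BETWEEN_RAS = 3.0  # seconds, from RFC 4861


class SolicitRateLimiter:
    """Hypothetical tracker: at most one solicited reply per node per interval."""

    def __init__(self, interval=MIN_DELAY_BETWEEN_RAS):
        self.interval = interval
        self._last_reply = {}  # source address -> time of the last reply

    def allow(self, source, now):
        """Return True if `source` may be answered at time `now`."""
        last = self._last_reply.get(source)
        if last is not None and now - last < self.interval:
            return False  # this node was answered too recently
        self._last_reply[source] = now
        return True
```

Note that the dictionary grows with every new source address, so a real implementation would also need to bound or expire it.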

@spakka

spakka commented Mar 15, 2017

I'd like to add a +1 to this.
Having a periodic multicast advertisement, but unicast responses to RS, would also help us with RFC7772 issues.
It would be good to be able to do this without having to configure a list of unicast clients as well.
We have many battery-powered devices coming and going from a large wireless LAN, and every time one connects and sends an RS, all the others get their radio woken up by the multicast RA reply.

@robbat2
Member Author

robbat2 commented Mar 15, 2017

Yes, this is still an issue; I started a patchset for it, but got sidetracked by other work.

The major issue was deciding how to handle scheduling and how to track requesting nodes. The latter is a problem because it represents a potential DoS attack if we try to track all unicast clients (an attacker who spoofs lots of different addresses can overwhelm radvd).
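
One common way to limit that DoS exposure is to cap the tracking table and evict the least recently seen entry. The sketch below shows that technique in Python; the class name and the 1024-entry default are made up for illustration, and this is not what the eventual patch does.

```python
from collections import OrderedDict


class BoundedSolicitorTable:
    """Hypothetical fixed-size table of recently seen RS sources."""

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self._seen = OrderedDict()  # source -> last seen time, oldest first

    def record(self, source, now):
        """Note an RS from `source`, evicting the oldest entries if full."""
        if source in self._seen:
            self._seen.move_to_end(source)
        self._seen[source] = now
        while len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)  # drop the least recently seen

    def last_seen(self, source):
        """Return the last recorded time for `source`, or None if unknown."""
        return self._seen.get(source)

    def __len__(self):
        return len(self._seen)
```

Memory use stays constant however many addresses are spoofed. The cost is that a flood can push legitimate entries out of the table, at which point any per-node limit built on it is forgotten for those nodes.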

robbat2 added a commit to robbat2/radvd that referenced this issue Mar 16, 2017
RFC 7772 section 2.1 describes a network flooded with multicast RAs at
the maximum rate of one every 3 seconds due to high client turnover.

The mitigation described in RFC 7772 section 5.1.1 states that a unicast
RA response can be sent, but SHOULD be a configurable option
(AdvRASolicitedUnicast), and that networks containing tens or hundreds
of battery-powered devices SHOULD enable the option.

The new option is defaulted to on, as it has very few downsides, and
represents significant battery life improvements for many clients.

See TODO for further possible improvements to AdvRASolicitedUnicast.

Fixes: radvd-project#63
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
robbat2 added a commit to robbat2/radvd that referenced this issue Mar 16, 2017
RFC 7772 section 2.1 describes a network flooded with multicast RAs at
the maximum rate of one every 3 seconds due to high client turnover.

The mitigation described in RFC 7772 section 5.1.1 states that a unicast
RA response can be sent, but SHOULD be a configurable option
(AdvRASolicitedUnicast), and that networks containing tens or hundreds
of battery-powered devices SHOULD enable the option.

The new option is defaulted to on, as it has very few downsides, and
represents significant battery life improvements for many clients.

See TODO for further possible improvements to AdvRASolicitedUnicast.

Fixes: radvd-project#63
Fixes: radvd-project#69
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
@robbat2
Member Author

robbat2 commented Mar 16, 2017

@spakka can you read/test out PR #69?

@robbat2
Member Author

robbat2 commented Mar 16, 2017

This PR represents the minimal fix. Further improvements are suggested in TODO for later, such as deliberately deferring the unicast RA if we are getting close to the time of the next multicast RA, and forcing more multicast RAs when the topology is changing.

@spakka

spakka commented Mar 16, 2017

Hi, thanks for the quick reply.
I've reviewed the patch and it looks compliant with the RFC. I tried to test it but wasn't able to generate an RS with the SLLA option, so I moved the rfc7772_unicast_response test outside the option block to make it apply to all RSs, and it works great. The only thing is that if a unicast response is destined for a link-local address, there is no existing ND entry on the router, and the SLLA option wasn't provided in the RS, then the router has to send an NS and wait for the NA to resolve the link-local address before it can send the reply RA.
Anyway, basically yes, it works well as long as the sender sets the SLLA option.
Thanks!

@robbat2
Member Author

robbat2 commented Mar 16, 2017

I can trigger it by having Linux restart an interface (libvirt instance of Ubuntu 14.04, patched quagga running on the bridge virbr0):

11:00:10.426779 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::5054:ff:feca:a591 > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 16
	  source link-address option (1), length 8 (1): 52:54:00:ca:a5:91
	    0x0000:  5254 00ca a591
11:00:10.427420 IP6 (flowlabel 0xea489, hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::1 > fe80::5054:ff:feca:a591: [icmp6 sum ok] ICMP6, router advertisement, length 16
	hop limit 64, Flags [managed, other stateful], pref medium, router lifetime 1800s, reachable time 10000s, retrans time 5000s

In your prod environment, can you look for other RS and see how many of them do have SLLA set? If some OS doesn't properly set the SLLA option, it might be worthwhile to have a variant that doesn't require SLLA to be set, with a suitable warning that it would trigger multicast ND.

@spakka

spakka commented Mar 16, 2017

Not able to test the full prod environment as it is a meeting/conference network, and the next conference is in just over a month :)
But I tested with all the clients I have nearby, here are the results:

macOS 10.12.3 sierra - sends SLLA 2 times out of 4
iOS 10.2.1 (iPhone 5s) - no SLLA
Android 7.0 (sony xperia phone) - sends SLLA
Android 5.1 (nexus 7 tablet) - sends SLLA
Ubuntu 16.04.2 LTS w/ Network Manager, running 4.9 kernel (dell laptop) - no SLLA
Windows 10 (dell laptop) - no SLLA

So it isn't universally supported, and at least the iPhone is definitely in the category of small battery-based device (actually, all the above devices are battery-powered!). Not sure why macOS sometimes sets SLLA and sometimes doesn't.

Also, here are some observations from part of the IETF discussion of the draft that led to RFC7772:
https://www.ietf.org/mail-archive/web/v6ops/current/msg22464.html

Note that you mention an option with a warning that it would trigger multicast ND from the router. On the router w/radvd that I tested, the NS was sent by the router to a solicited-node multicast address, targeting the link-local address it is trying to resolve. This is still better than sending to all-nodes: the solicited-node multicast can be discarded by the radio in non-targeted devices without waking up the CPU, whereas the all-nodes RA must wake up the CPU.
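
For reference, a small Python sketch of how the solicited-node group and its Ethernet multicast MAC are derived from a unicast address (the low 24 bits appended to ff02::1:ff00:0/104, and the low 32 bits of the group appended to 33:33). The helper names are made up for this example.

```python
import ipaddress


def solicited_node_multicast(addr):
    """Return the solicited-node multicast group for a unicast IPv6 address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


def ethernet_multicast_mac(group):
    """Return the Ethernet MAC that an IPv6 multicast group maps to."""
    low32 = int(ipaddress.IPv6Address(group)) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)
```

For the client address fe80::5054:ff:fecc:65a1 this gives ff02::1:ffcc:65a1 and 33:33:ff:cc:65:a1. Only hosts whose addresses share those low 24 bits join that group, which is why the NS disturbs far fewer devices than an all-nodes RA.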

@robbat2
Member Author

robbat2 commented Mar 16, 2017

Ok, let's drop the SLLA requirement then, and trigger the NS/NA cycle. Apple iOS devices, Network Manager and Windows do represent a LOT of devices.

@spakka sounds like we should also file a bug with upstream NetworkManager to get RS right :-). I don't know if there's any good way to submit bugs for Windows/iOS these days.

@reubenhwk are you ok with just dropping the SLLA requirement, or would you like respecting it to be a config option?

13:23:14.314745 52:54:00:cc:65:a1 > 33:33:00:00:00:02, ethertype IPv6 (0x86dd), length 62: (hlim 255, next-header ICMPv6 (58) payload length: 8) fe80::5054:ff:fecc:65a1 > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 8
13:23:14.315044 52:54:00:1b:2f:1f > 33:33:ff:cc:65:a1, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::1 > ff02::1:ffcc:65a1: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fe80::5054:ff:fecc:65a1
	  source link-address option (1), length 8 (1): 52:54:00:1b:2f:1f
	    0x0000:  5254 001b 2f1f
13:23:14.315300 52:54:00:cc:65:a1 > 52:54:00:1b:2f:1f, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::5054:ff:fecc:65a1 > fe80::1: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is fe80::5054:ff:fecc:65a1, Flags [solicited, override]
	  destination link-address option (2), length 8 (1): 52:54:00:cc:65:a1
	    0x0000:  5254 00cc 65a1
13:23:14.315318 52:54:00:1b:2f:1f > 52:54:00:cc:65:a1, ethertype IPv6 (0x86dd), length 70: (flowlabel 0xcb1e2, hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::1 > fe80::5054:ff:fecc:65a1: [icmp6 sum ok] ICMP6, router advertisement, length 16
	hop limit 64, Flags [managed, other stateful], pref medium, router lifetime 1800s, reachable time 10000s, retrans time 5000s
13:23:19.319537 52:54:00:cc:65:a1 > 52:54:00:1b:2f:1f, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::5054:ff:fecc:65a1 > fe80::1: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fe80::1
	  source link-address option (1), length 8 (1): 52:54:00:cc:65:a1
	    0x0000:  5254 00cc 65a1
13:23:19.319566 52:54:00:1b:2f:1f > 52:54:00:cc:65:a1, ethertype IPv6 (0x86dd), length 78: (hlim 255, next-header ICMPv6 (58) payload length: 24) fe80::1 > fe80::5054:ff:fecc:65a1: [icmp6 sum ok] ICMP6, neighbor advertisement, length 24, tgt is fe80::1, Flags [solicited]

@reubenhwk
Collaborator

reubenhwk commented Mar 18, 2017 via email

@robbat2
Member Author

robbat2 commented Mar 18, 2017

RFC7772 5.1.1 says that to qualify for a unicast RA response, the RS needs to

  1. not be from the unspecified address
  2. contain a SLLA option.

@spakka tested and found that lots of clients don't set an SLLA option in the RS (including Windows, NetworkManager & iOS mobile devices).

The language in the RFC does not say MUST contain an SLLA, so I propose to only have the unspecified address test.
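
A minimal sketch of the two variants of that check, in Python for brevity. The function and its `require_slla` parameter are hypothetical names for this example; the PR's actual test is the rfc7772_unicast_response check in radvd's C code, which is not reproduced here.

```python
import ipaddress


def qualifies_for_unicast_ra(rs_source, has_slla, require_slla=False):
    """Decide whether an RS may be answered with a unicast RA.

    rs_source:    source address of the RS, as a string.
    has_slla:     True if the RS carried a Source Link-Layer Address option.
    require_slla: True applies both conditions listed above;
                  False (the proposal) applies only the address test.
    """
    if ipaddress.IPv6Address(rs_source).is_unspecified:
        return False  # nothing to unicast to
    if require_slla and not has_slla:
        return False  # strict variant: fall back to multicast
    return True
```

With `require_slla=False`, an RS without SLLA from a client such as the iPhone or Windows laptop tested above still qualifies, at the cost of an NS/NA exchange to resolve its link-layer address.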

@robbat2
Member Author

robbat2 commented Mar 18, 2017

  • The PR is implemented with both of the requirements at the moment, but could be changed to just unspecified easily.

@reubenhwk
Collaborator

reubenhwk commented Mar 18, 2017 via email

robbat2 added a commit to robbat2/radvd that referenced this issue Mar 19, 2017
RFC 7772 section 2.1 describes a network flooded with multicast RAs at
the maximum rate of one every 3 seconds due to high client turnover.

The mitigation described in RFC 7772 section 5.1.1 states that a unicast
RA response can be sent, but SHOULD be a configurable option
(AdvRASolicitedUnicast), and that networks containing tens or hundreds
of battery-powered devices SHOULD enable the option.

The new option is defaulted to on, as it has very few downsides, and
represents significant battery life improvements for many clients.

We do differ from the RFC in that we do not require the SLLA option to
be set in the RS, as testing shows many clients are not setting it.

See TODO for further possible improvements to AdvRASolicitedUnicast.

Fixes: radvd-project#63
Fixes: radvd-project#69
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
@robbat2
Member Author

robbat2 commented Mar 19, 2017

Updated to drop the requirement.

jhg8 added a commit to jhg8/sroamd that referenced this issue Jan 19, 2024
Many clients don't include the SLLA option in router solicitations:
radvd-project/radvd#63 (comment)
Answer these router solicitations by triggering a neighbour
solicitation when SLLA is not included, and then waiting for the
neighbour advertisement.