Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

udhcpc: no MSG_DONTROUTE when sending packet #1017

Closed
wants to merge 1 commit into from
Closed

udhcpc: no MSG_DONTROUTE when sending packet #1017

wants to merge 1 commit into from

Conversation

adiov
Copy link
Contributor

@adiov adiov commented Jun 6, 2018

TL;DR Don't set MSG_DONTROUTE when sending packets, otherwise it forces the client into non-compliant behaviour that is rejected by properly configured servers, thus creating unnecessary interruptions and connection breakages.

This patch reverts a change made in Sep 2017 which introduced MSG_DONTROUTE flag to prevent udhcpc from reaching out to servers on a different subnet. That change violates RFC2131 by forcing fully configured clients, who got their configurations through an offer relayed by a DHCP relay, from renewing through a unicast request directly to the DHCP server, resulting in the client resorting to boradcasting lease extension requests instead of unicasting them, further breaking RFC2131.

The problem with MSG_DONTROUTE appears when talking to a properly configured DHCP server that rejects non-compliant requests. Such server will reject lease extension attempts sent via broadcast rather than unicast, as is the case with Finnish ISPs Telia and DNA as well as Estonian ISP Starman. Once the lease expires without renewal, udhcpc enters init mode, taking down the interfaces with it, and thus causing interruption on every lease expiry. On some ISPs (such as the ones mentioned above) that can be once every 10-20 minutes. The interruptions appear in the logs as such:

udhcpc: sending renew to x.x.x.x
udhcpc: send: Network unreachable
udhcpc: sending renew to 0.0.0.0
udhcpc: sending renew to 0.0.0.0
...
udhcpc: lease lost, entering init state
Interface 'wan' has lost the connection
Interface 'wan' is now down
Network alias 'eth0' link is down
udhcpc: sending select for y.y.y.y
udhcpc: lease of y.y.y.y obtained, lease time 1200
Network alias 'eth0' link is up
Interface 'wan' is now up

This keeps repeating every lease period, breaking VPN connections, SSH sessions, etc.

During lease extension, a fully configured client should be able to reach out to the server from which it received the lease for extension, regardless in which network it is; that's up to the gateway to find. This patch ensures that.

It's worth noting that this correct behaviour exists in every mainstream DHCP client I checked, including the Internet System Consortium's.

P.S. I've made the change locally, built it, and have been testing it on my Archer C7 v2 for quite a while. This part of the logs from half an hour ago shows it's working as expected without interruption:

Wed Jun  6 22:39:07 2018 daemon.notice netifd: wan (1299): udhcpc: sending renew to x.x.x.x
Wed Jun  6 22:39:07 2018 daemon.notice netifd: wan (1299): udhcpc: lease of y.y.y.y obtained, lease time 1200
Wed Jun  6 22:49:07 2018 daemon.notice netifd: wan (1299): udhcpc: sending renew to x.x.x.x
Wed Jun  6 22:49:07 2018 daemon.notice netifd: wan (1299): udhcpc: lease of y.y.y.y obtained, lease time 1200
Wed Jun  6 22:59:07 2018 daemon.notice netifd: wan (1299): udhcpc: sending renew to x.x.x.x
Wed Jun  6 22:59:07 2018 daemon.notice netifd: wan (1299): udhcpc: lease of y.y.y.y obtained, lease time 1200

This reverts a change made in Sep 2017 [1] which introduced
MSG_DONTROUTE flag to prevent udhcpc from reaching out to servers on a
different subnet. That change violates RFC2131 by forcing fully
configured clients, who got their configurations through an offer
relayed by a DHCP relay, from renewing through a unicast request
directly to the DHCP server, resulting in the client resorting to
boradcasting lease extension requests instead of unicasting them,
further breaking RFC2131.

The problem with MSG_DONTROUTE appears when talking to a properly
configured DHCP server that rejects non-compliant requests. Such server
will reject lease extension attempts sent via broadcast rather than
unicast, as is the case with Finnish ISPs Telia and DNA as well as
Estonian ISP Starman. Once the lease expires without renewal, udhcpc
enters init mode, taking down the interfaces with it, and thus causing
interruption on every lease expiry. On some ISPs (such as the ones
mentioned above) that can be once every 10-20 minutes. The interruptions
appear in the logs as such:
----
udhcpc: sending renew to x.x.x.x
udhcpc: send: Network unreachable
udhcpc: sending renew to 0.0.0.0
udhcpc: sending renew to 0.0.0.0
...
udhcpc: lease lost, entering init state
Interface 'wan' has lost the connection
Interface 'wan' is now down
Network alias 'eth0' link is down
udhcpc: sending select for y.y.y.y
udhcpc: lease of y.y.y.y obtained, lease time 1200
Network alias 'eth0' link is up
Interface 'wan' is now up
----

During lease extension, a fully configured client should be able to
reach out to the server from which it recieved the lease for extension,
regardless in which network it is; that's up to the gateway to find. [2]
This patch ensures that.

[1]
http://lists.busybox.net/pipermail/busybox-cvs/2017-September/037402.html
[2]
https://www.netmanias.com/en/post/techdocs/6000/dhcp-network-protocol/
understanding-dhcp-relay-agents

Signed-off-by: Adi Shammout <adi.shammout@outlook.com>
@jow-
Copy link
Contributor

jow- commented Jun 6, 2018

Impressive work on tracking this down! LGTM. Did you send this upstream by any chance? I think the busybox authors should be made aware of this problem.

cc @dedeckeh

@adiov
Copy link
Contributor Author

adiov commented Jun 6, 2018

Thanks! Indeed, it wasn't easy to track down. It took a full weekend of tcpdumping, digging through how OpenWRT networking works, and reading RFC2131 and uhcpd code :) In the end I'm glad that at least it works for me and for another friend facing the same issue.

Submitting patches to BusyBox isn't as straightforward as here. I attempted to join the BusyBox mailing list, but still no confirmation. Non-members can't send to it. If you're on the list, could you please link them this PR? I was gonna reach out to the maintainers directly, but figured it might be bad etiquette.

@dedeckeh
Copy link
Contributor

dedeckeh commented Jun 7, 2018

@adiov Indeed nice work tracking this down; we can accept the patch but it would be ideal upstream busybox fixes this as well.
If I understand it correctly the issue arises when a DHCP relay is present in the network; did you observe the giaddr field being set in the DHCP offer/ack coming from the DHCP server ? If yes this could be used by the client as an indication it has no direct connection with the DHCP server and as result don't set the MSG_DONROUTE flag.

@dedeckeh dedeckeh added the core packages pull request/issue for core (in-tree) packages label Jun 7, 2018
@adiov
Copy link
Contributor Author

adiov commented Jun 7, 2018

This morning, I filed a bug and attached the patch in BusyBox's Bugzilla. Let's see how it that progresses.

In the dumps I collected, GIADDR is being set by the relay in both DHCPOFFER and DHCPACK messages.

image

Having that said, I don't believe MSG_DONROUTE should be set even if GIADDR is not observed, for two reasons:

  1. A misbehaving actor capable of injecting a bogus Server Identifier can also inject or omit GIADDR, give a false SIADDR, and pretty much manipulate the whole DHCP exchange to achieve whatever purpose they have in mind. In my opinion, conditionally setting MSG_DONROUTE is as useful as not setting it at all. It's up to the fully-configured client and its networking stack to decide how to pass around that message.
  2. No other DHCP client is doing it.

I suggest that no flag is passed.

@ldir-EDB0
Copy link
Contributor

Had this patch in local tree for 24 hours - All is well for me, in fact DHCPv4 renewals with my ISP are cleaner, going straight to the DHCP server. Before this change the direct route would fail and would fall back to a broadcast renewal.

Tested-by: Kevin Darbyshire-Bryant ldir@darbyshire-bryant.me.uk

@dedeckeh
Copy link
Contributor

dedeckeh commented Jun 7, 2018

As the issue is critical I've pushed the patch to both master and openwrt-18.06. If upstream busybox fixes the issue in another way we can backport the patch; thx

@dedeckeh dedeckeh closed this Jun 7, 2018
@adiov
Copy link
Contributor Author

adiov commented Jun 8, 2018

Tested with default OpenWrt SNAPSHOT r7137-e4259bed3f on TP-Link Archer C7 v2. It works as expected.

Fri Jun  8 09:50:48 2018 daemon.notice netifd: wan (1322): udhcpc: sending renew to x.x.x.x
Fri Jun  8 09:50:48 2018 daemon.notice netifd: wan (1322): udhcpc: lease of y.y.y.y obtained, lease time 1200

x.x.x.x is on a different network.

@adiov
Copy link
Contributor Author

adiov commented Jun 21, 2018

@dedeckeh
Copy link
Contributor

@adiov Thanks for informing me; replaced the patch by the upstream "fix"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core packages pull request/issue for core (in-tree) packages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants