resolved: Mitigate DVE-2018-0001, by retrying NXDOMAIN without EDNS0. #8608

Closed
xnox wants to merge 1 commit

@xnox (Member) commented Mar 28, 2018

Some captive portals lie and do not respond with the captive portal IP address
if the query has EDNS0 enabled and the DO bit set to zero. Thus retry "secure"
domain name lookups with less secure methods upon NXDOMAIN.

Bug-Ubuntu: https://bugs.launchpad.net/ubuntu/bionic/+source/systemd/+bug/1727237
Bug-DNS: https://github.com/dns-violations/dns-violations/blob/master/2018/DVE-2018-0001.md
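
For readers unfamiliar with the wire format, here is a minimal sketch of what "EDNS0 enabled and DO bit set to zero" means at the packet level. It builds the same query twice, once as plain DNS and once with an EDNS0 OPT pseudo-record appended. The function name, the transaction ID, and the advertised payload size of 1232 bytes are arbitrary choices for illustration, not values taken from the patch.

```python
import struct


def build_query(name: str, qtype: int = 1, edns0: bool = False,
                do_bit: bool = False, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query, optionally with an EDNS0 OPT record."""
    arcount = 1 if edns0 else 0
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, arcount)
    # QNAME: length-prefixed labels terminated by the zero-length root label
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS=IN
    if not edns0:
        return header + question
    # The DO ("DNSSEC OK") bit is the top bit of the 16-bit EDNS flags field
    flags = 0x8000 if do_bit else 0x0000
    # OPT pseudo-RR: root name, TYPE=41, CLASS=UDP payload size (arbitrary
    # here), extended RCODE=0, version=0, flags, RDLENGTH=0
    opt = b"\x00" + struct.pack("!HHBBHH", 41, 1232, 0, 0, flags, 0)
    return header + question + opt


plain = build_query("start.ubuntu.com")
with_edns0 = build_query("start.ubuntu.com", edns0=True, do_bit=False)
# The EDNS0 variant differs only by ARCOUNT=1 and the 11 trailing OPT bytes
print(len(plain), len(with_edns0))
```

The fallback discussed in this PR amounts to resending the first form after the second form came back with NXDOMAIN.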

@yuwata added the resolve label Mar 29, 2018
@xnox force-pushed the DVE-2018-0001 branch 2 times, most recently from 7abcaa9 to 26a6c6b, on May 31, 2018

@xnox (Member Author) commented May 31, 2018

So without this, one cannot resolve and authenticate with the captive portal at Starbucks in the US. They appear to be deploying Aruba Networks captive portals at the moment.

Between asserting broken DNS behavior, and coffee.... coffee wins.

@poettering (Member) commented Jun 8, 2018

Urks, so this is of course frickin ugly, but I guess if this is what we need to do this is what we need to do...

@poettering (Member) left a comment

OK, sounds conceptually OK, but please make the two indicated changes: jump down to UDP mode in one jump please.

And add a check t->scope->dnssec_mode != DNSSEC_YES so that we don't second guess DNSSEC replies in strict DNSSEC mode.

It kinda sucks that this means in permissive DNSSEC mode we'll never be able to do DNSSEC for NXDOMAIN anymore though.
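
A minimal sketch of the decision the review asks for, assuming a simplified model: retry an NXDOMAIN reply only when DNSSEC is not in strict mode, and go to plain UDP in a single step rather than one feature level at a time. The enum members and the function name are hypothetical stand-ins for illustration; they are not the identifiers used in systemd-resolved.

```python
from enum import Enum, IntEnum
from typing import Optional


class DnssecMode(Enum):
    NO = "no"
    ALLOW_DOWNGRADE = "allow-downgrade"
    YES = "yes"  # strict mode


class FeatureLevel(IntEnum):
    # Simplified, hypothetical ladder; a real resolver defines more levels
    TCP = 0
    UDP = 1
    EDNS0 = 2
    DO = 3


RCODE_NXDOMAIN = 3  # DNS response code for "name does not exist"


def retry_level_after_reply(rcode: int, level: FeatureLevel,
                            dnssec_mode: DnssecMode) -> Optional[FeatureLevel]:
    """Return the level to retry at, or None to accept the reply as is."""
    if rcode != RCODE_NXDOMAIN:
        return None  # only NXDOMAIN is second-guessed
    if dnssec_mode is DnssecMode.YES:
        return None  # strict DNSSEC: never second-guess the answer
    if level <= FeatureLevel.UDP:
        return None  # already at plain DNS, nothing left to drop
    return FeatureLevel.UDP  # one jump, not level by level


print(retry_level_after_reply(RCODE_NXDOMAIN, FeatureLevel.DO,
                              DnssecMode.ALLOW_DOWNGRADE))
```

Dropping straight to UDP keeps the cost at one extra query per NXDOMAIN instead of one per intermediate level.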

@xnox (Member Author) commented Jun 13, 2018

I've tested this against a DNSSEC-signed domain in permissive mode, and it was asserted correctly. The captive portals in question simply do not know how to rewrite EDNS0/DNSSEC queries and pass them through unmodified.

Thus, to access the captive portal, one must resolve something in a non-DNSSEC-signed domain, e.g. start.ubuntu.com or whatever NetworkManager uses, and with this fallback alone one gets correct behaviour.

Thus I'm not sure the extra check t->scope->dnssec_mode != DNSSEC_YES is needed, unless I am missing something. Does that flip to NO for the given transaction if the domain in question is not DNSSEC-signed at all? Because if the domain in question is not DNSSEC-enabled, it should be fine to second-guess it, no?

IMHO we should still assert DNSSEC for NXDOMAIN in permissive mode for DNSSEC-signed domains, and users simply must access something non-DNSSEC-signed to get to the captive portal. Thoughts?

@poettering (Member) commented Jun 13, 2018

If people enable DNSSEC strict mode they basically say "fuck captive portals". It's a no-compromise mode, enabled by folks who do not want to compromise on security, but captive portals by their nature really are a compromise on security since they generally mean rewriting DNS and/or HTTP.

I don't think anyone is helped if we'd second guess NXDOMAIN in strict DNSSEC mode. It sounds like something the security nerds who care about strict DNSSEC mode would just be pissed about, and hence not worth doing. I mean, if you pick DNSSEC strict mode you are in for a hard time anyway... And regular people would never pick DNSSEC strict mode in the first place...

In an ideal world, NetworkManager's captive portal detection would tell resolved to turn off DNSSEC on that interface until connectivity is verified at which point it should be turned on (if the user said so on that interface).
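
As a rough sketch of that idea: resolvectl can set the DNSSEC mode per link, so a connectivity checker could in principle switch it off while a portal is suspected and restore it afterwards. The helper below only builds the command line. The function name, the interface name "wlan0", and the notion of a caller reacting to portal detection are assumptions for illustration; nothing here describes what NetworkManager actually does.

```python
from typing import List

# Modes accepted for the per-link DNSSEC setting
VALID_MODES = ("yes", "no", "allow-downgrade")


def dnssec_command(link: str, mode: str) -> List[str]:
    """Build the resolvectl invocation that sets DNSSEC mode on one link."""
    if not link:
        raise ValueError("link name must not be empty")
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported DNSSEC mode: {mode!r}")
    return ["resolvectl", "dnssec", link, mode]


# While a captive portal is suspected on the (placeholder) link wlan0:
print(dnssec_command("wlan0", "no"))
# Once connectivity is verified, restore what the user configured:
print(dnssec_command("wlan0", "allow-downgrade"))
```

The returned list can be handed to subprocess.run(..., check=True); changing link settings typically requires suitable privileges.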

@jrb0001 commented Oct 6, 2018

@xnox: That check is definitely required. Ubuntu bionic seems to use the same patch and it produces SERVFAIL responses if DNSSEC=yes and the upstream returns an NXDOMAIN. The log is spammed with those retry messages followed by "DNSSEC validation failed for question test.asdf IN SOA: incompatible-server" for the SOA and DS of every segment which resulted in an NXDOMAIN.

See also https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1796501.

@ofosos commented Dec 19, 2018

Apparently Ubuntu 18 pulled this patch into their build. As far as I can tell this patch will lead to a reported DNS violation for every request that gets answered with NXDOMAIN and it will then downgrade from EDNS to plain DNS and try it again. Is this correct?

I wish there was a solution that is less obnoxious in production, i.e. something that doesn't dump a lot of messages in a log for a failed DNS query.

@jrb0001 commented Dec 19, 2018

@ofosos yes, it will retry with a reduced "feature level" until it reaches plain DNS and then fail with SERVFAIL because it can't do DNSSEC at that level anymore.
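
The behaviour described here can be illustrated with a toy model. This is not the patch itself: the level names, the string return codes, and the two fake upstreams are all hypothetical, and real validation is far more involved. It only shows why every NXDOMAIN costs one retry per level and why strict DNSSEC ends in SERVFAIL once the last attempt ran at a level that cannot carry DNSSEC data.

```python
from typing import Callable, List, Tuple

# Hypothetical feature levels, highest first
LEVELS = ["DO", "EDNS0", "UDP"]


def resolve_with_stepwise_retry(upstream: Callable[[str, str], str],
                                name: str,
                                strict_dnssec: bool) -> Tuple[str, List[Tuple[str, str]]]:
    """Retry NXDOMAIN one level at a time and report every attempt made."""
    attempts = []
    for level in LEVELS:
        rcode = upstream(name, level)
        attempts.append((level, rcode))
        if rcode != "NXDOMAIN":
            return rcode, attempts
    # Every level said NXDOMAIN. The final answer came from plain UDP, which
    # cannot prove non-existence, so strict validation has to fail.
    if strict_dnssec:
        return "SERVFAIL", attempts
    return "NXDOMAIN", attempts


def always_nxdomain(name: str, level: str) -> str:
    """An honest upstream for a name that really does not exist."""
    return "NXDOMAIN"


def broken_portal(name: str, level: str) -> str:
    """A portal that only rewrites queries sent without EDNS0."""
    return "NOERROR" if level == "UDP" else "NXDOMAIN"


print(resolve_with_stepwise_retry(always_nxdomain, "test.asdf", True))
print(resolve_with_stepwise_retry(broken_portal, "start.ubuntu.com", False))
```

In this model the honest upstream is queried three times for every nonexistent name, which matches the log noise reported above.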

@xnox force-pushed the DVE-2018-0001 branch 2 times, most recently from a1a35a3 to fa0f317, on Jan 25, 2019
Some captive portals lie and do not respond with the captive portal IP address
if the query has EDNS0 enabled and the DO bit set to zero. Thus retry all
domain name lookups with less secure methods upon NXDOMAIN, unless strict
DNSSEC validation is enabled.

Bug-Ubuntu: https://bugs.launchpad.net/ubuntu/bionic/+source/systemd/+bug/1766969
Bug-Ubuntu: https://bugs.launchpad.net/ubuntu/bionic/+source/systemd/+bug/1727237
Bug-DNS: https://github.com/dns-violations/dns-violations/blob/master/2018/DVE-2018-0001.md
(cherry picked from commit cc0a0eb)

@xnox (Member Author) commented Jan 26, 2019

@poettering requested changes are now done. It drops straight to UDP, and doesn't do anything if strict DNSSEC mode is on.

@BryanQuigley commented Mar 2, 2019

+1 to updated patch - definite improvement.

I am curious whether we could further restrict how often it runs, say only on a new network, or special-case something else about arubanetworks.

To that end, was arubanetworks.com contacted? Possible fixes on their end? Happy to reach out if not.

@BryanQuigley commented Apr 23, 2019

Can anyone rerun the autopkgtests - I'm pretty sure the failure was unrelated to the PR.

@BryanQuigley commented Sep 19, 2019

I've been running this updated patch in Ubuntu 19.10 Dev release with my PPA: https://launchpad.net/~bryanquigley/+archive/ubuntu/1796501/+packages Everything seems to be working fine (DNSSEC=yes for me).

I did report the issue to one of Starbucks' vendors, but have not heard back yet.

@ddstreet (Contributor) commented Dec 11, 2019

This patch has been included in Ubuntu for a while, and the log messages have widely been reported as annoying:
https://www.google.com/search?q=mitigating+potential+dns+violation

And since its assumption (a DNS violation) is actually incorrect in the vast majority of cases (i.e. for everyone not using a broken captive portal), it's highly misleading.

Additionally, the DNS retry appears to be causing other problems with delays in DNS resolution:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1785383

I haven't yet had time to look at a proper way of working around broken captive portals, but I would not recommend applying this patch in its current form.

poettering added a commit to poettering/systemd that referenced this issue Nov 12, 2020
This is an updated version of systemd#8608 with more restrictive logic. To
quote the original bug:

    Some captive portals, lie and do not respond with the captive portal
    IP address, if the query is with EDNS0 enabled and DO bit set to
    zero. Thus retry "secure" domain name look ups with less secure
    methods, upon NXDOMAIN.

https://github.com/dns-violations/dns-violations/blob/master/2018/DVE-2018-0001.md

Yes, this fix sucks hard, but I guess this is what we need to do to make
sure resolved works IRL.

Replaces: systemd#8608

@poettering (Member) commented Nov 12, 2020

I included a forward-port of this (quite reworked) in #17535. Let's close this one.

@poettering closed this Nov 12, 2020
poettering added a commit to poettering/systemd that referenced this issue Nov 12, 2020
This is an updated version of systemd#8608 with more restrictive logic. To
quote the original bug:

    Some captive portals, lie and do not respond with the captive portal
    IP address, if the query is with EDNS0 enabled and DO bit set to
    zero. Thus retry "secure" domain name look ups with less secure
    methods, upon NXDOMAIN.

https://github.com/dns-violations/dns-violations/blob/master/2018/DVE-2018-0001.md

Yes, this fix sucks hard, but I guess this is what we need to do to make
sure resolved works IRL.

Heavily based on the original patch from Dimitri John Ledkov, and I
copied the commentary verbatim.

Replaces: systemd#8608

@xnox (Member Author) commented Nov 12, 2020

I am at the point of dropping this patch myself, because the internet is a vile place, and neither this, nor captive portals, nor DNSSEC work at all =( and I am sad.

Let me check your draft PR to see what you are doing there.

poettering added ten more commits to poettering/systemd that referenced this issue, all carrying the same message, on Nov 17, Nov 18 (twice), Nov 19, Nov 20, Dec 2, Dec 3, Dec 4 and Dec 7, 2020, and Feb 16, 2021.
poettering added a commit that referenced this issue Feb 17, 2021
This is an updated version of #8608 with more restrictive logic. To
quote the original bug:

    Some captive portals, lie and do not respond with the captive portal
    IP address, if the query is with EDNS0 enabled and DO bit set to
    zero. Thus retry "secure" domain name look ups with less secure
    methods, upon NXDOMAIN.

https://github.com/dns-violations/dns-violations/blob/master/2018/DVE-2018-0001.md

Yes, this fix sucks hard, but I guess this is what we need to do to make
sure resolved works IRL.

Heavily based on the original patch from Dimitri John Ledkov, and I
copied the commentary verbatim.

Replaces: #8608

@Rajpratik71 commented Mar 25, 2021

Anyone still using the old 18.04 release will face this. In the background it creates issues for kube-proxy, which internally uses iptables and depends on the underlying networking of the OS.

The fix that worked for me:

unlink /etc/resolv.conf && ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf && systemctl restart systemd-resolved

@xnox (Member Author) commented Mar 25, 2021

This sounds odd, as a new enough kube-proxy knows how to find and use /run/systemd/resolve/resolv.conf directly without changing the system's /etc/resolv.conf symlink. Which version of kube-proxy are you using?

@xnox (Member Author) commented Mar 25, 2021

@Rajpratik71 also, this issue by itself shouldn't actually be causing any problems for kube-proxy, so please explain what you are observing.

Also note that the excessive error messages were fixed in the 18.04 release recently (https://launchpad.net/ubuntu/+source/systemd/237-3ubuntu10.45). Have you upgraded to the latest systemd?

@Rajpratik71 commented Mar 26, 2021

We use Calico (v3.18.0) as the CNI, which depends on kube-proxy. It is unable to forward traffic: services exposed as NodePort and ClusterIP are unreachable even from inside the cluster. Internal communication between pods happens via DNS. The issue is consistent on AWS EC2, while the same configuration and combination of services works fine in a bare-metal environment.

@xnox (Member Author) commented Mar 26, 2021

This sounds strange. I will ping people more familiar with Kubernetes and kube-proxy. Kubernetes itself (see https://github.com/kubernetes/kubernetes/blob/76f2a4d5fd8364edbb31a3611178c918644f415c/cmd/kubeadm/app/componentconfigs/kubelet.go#L59) uses the systemd-resolved resolv.conf directly when it is active, to reach the underlying resolvers and pass them through to Kubernetes. And so should kube-proxy. The solution of changing /etc/resolv.conf to point at /run/systemd/resolve/resolv.conf instead of the stub is correct for k8s-only deployments. But I thought we fixed it all a year ago, see kubernetes/kubernetes@28b9a4e

I will check with the maintainers of microk8s and the Canonical distribution of k8s in AWS w.r.t. this. Do use the mitigation of changing the symlink for now.
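
To see which of the two files a node currently uses before applying the mitigation, a small check along these lines may help. It is only a sketch: the function names and the "stub"/"uplink"/"other" labels are made up for illustration, while the two paths are the ones systemd-resolved normally provides under /run/systemd/resolve.

```python
import os

# Stub file: points clients at the local systemd-resolved listener
STUB_PATH = "/run/systemd/resolve/stub-resolv.conf"
# Uplink file: lists the real upstream DNS servers directly
UPLINK_PATH = "/run/systemd/resolve/resolv.conf"


def classify_resolv_conf(resolved_target: str) -> str:
    """Classify the file that /etc/resolv.conf ultimately resolves to."""
    target = os.path.normpath(resolved_target)
    if target == STUB_PATH:
        return "stub"
    if target == UPLINK_PATH:
        return "uplink"
    return "other"


def current_mode(path: str = "/etc/resolv.conf") -> str:
    """Follow symlinks from the given path and classify the result."""
    return classify_resolv_conf(os.path.realpath(path))


print(current_mode())
```

A result of "stub" means lookups go through systemd-resolved and are subject to the retry behaviour discussed in this thread; "uplink" is the state the symlink change above produces.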
