resolvconf/resolvectl fails to make DNS servers exclusive #17529

Closed
zx2c4 opened this issue Nov 5, 2020 · 22 comments
Labels
downstream/fedora Tracking bugs for Fedora needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer resolve

Comments

@zx2c4
Contributor

zx2c4 commented Nov 5, 2020

WireGuard's wg-quick(8) utility uses the resolvconf interface to manage DNS servers in a cross-platform manner. It uses the -x option so that DNS queries are sent only to the specified servers on an interface and not to other servers. The invocation is something like:

printf 'nameserver %s\n' "${DNS[@]}" | resolvconf -a wg0 -m 0 -x

When this was added years ago, I was under the impression that it worked, but recently Fedora 33 users have reported that systemd-resolved strangely sends queries both out of the WireGuard interface/dns-servers AND out of the plaintext ethernet interface, leaking DNS queries into the clear. For many, that's a quasi-serious privacy and/or security issue.

I went and tested this to confirm the bug reports on a fresh Fedora 33 VM. The servers get added to systemd-resolved correctly (checked with resolvectl), but queries are still sent out of the plaintext interface:

[screenshot: resolvconf]

@zx2c4
Contributor Author

zx2c4 commented Nov 5, 2020

cc @yuwata @poettering

@keszybz keszybz added downstream/fedora Tracking bugs for Fedora resolve labels Nov 6, 2020
@poettering
Member

what's the "resolvectl" output in this case?

My guess is that the underlying iface also wants "exclusive" DNS traffic, and if two ifaces want that they both get it.

Basically there are only three levels of priority defined when routing in resolved: "preferable", "regular", "noway". If we look up a domain and there's an interface with a preferable domain name route, it will get it, and that's the end. If there are multiple "preferable" interfaces, they all get it. If there are no "preferable" interfaces, then the "regular" ones will get it. All of them.

(Well, it's slightly more complex, since we also take the "closeness" of the match into account, i.e. if we look up "foo.bar.com" and one iface has "com" listed as a routing domain and the other has "bar.com", then we will only route to the latter, since it's the "closer" match of the two. But in this case this doesn't matter.)

resolvconf's "-x" switch is mapped to the "preferable" routing priority.

Anyway, "resolvectl" should reveal what is configured precisely, and my educated guess is that the other, underlying interface also has a "preferable" routing set up for it, i.e. did the equivalent of "resolvconf -x", too.

@poettering poettering added the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label Nov 6, 2020
@zx2c4
Contributor Author

zx2c4 commented Nov 6, 2020

Looks like that's what's going on, judging by the ~.?

Global
       LLMNR setting: resolve             
MulticastDNS setting: no                  
  DNSOverTLS setting: no                  
      DNSSEC setting: no                  
    DNSSEC supported: no                  
Fallback DNS Servers: 1.1.1.1             
                      8.8.8.8             
                      1.0.0.1             
                      8.8.4.4             
                      2606:4700:4700::1111
                      2001:4860:4860::8888
                      2606:4700:4700::1001
                      2001:4860:4860::8844

Link 2 (enp0s2)
      Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
DefaultRoute setting: yes                      
       LLMNR setting: yes                      
MulticastDNS setting: no                       
  DNSOverTLS setting: no                       
      DNSSEC setting: no                       
    DNSSEC supported: no                       
  Current DNS Server: 10.0.2.3                 
         DNS Servers: 10.0.2.3                 
          DNS Domain: ~.                       

Link 3 (virbr0)
      Current Scopes: none
DefaultRoute setting: no  
       LLMNR setting: yes 
MulticastDNS setting: no  
  DNSOverTLS setting: no  
      DNSSEC setting: no  
    DNSSEC supported: no  

Link 4 (virbr0-nic)
      Current Scopes: none
DefaultRoute setting: no  
       LLMNR setting: yes 
MulticastDNS setting: no  
  DNSOverTLS setting: no  
      DNSSEC setting: no  
    DNSSEC supported: no  

Link 6 (wg0)
      Current Scopes: DNS        
DefaultRoute setting: yes        
       LLMNR setting: yes        
MulticastDNS setting: no         
  DNSOverTLS setting: no         
      DNSSEC setting: no         
    DNSSEC supported: no         
         DNS Servers: 192.168.4.1
          DNS Domain: ~.         

So this would be a bug in whatever is assigning the DNS servers? Fresh Fedora 33, so I assume that's networkmanager setting the "preferred" status instead of "regular"?

CC @thom311

@poettering
Member

yes, "~." means "please please preferably route DNS traffic for all domains" here. Both wg0 and enp0s2 have that set, hence both get the traffic.
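To spot this situation quickly, a small filter can list every link that carries the "~." routing domain. This is a minimal sketch, not part of systemd: the function name links_with_catchall is hypothetical, and it assumes the "Link N (name)" and "DNS Domain:" layout shown in the output above. It only reads domains printed on the same line as "DNS Domain:".

```shell
#!/usr/bin/env bash
# Sketch: list links whose "DNS Domain" line contains the catch-all "~.".
# Reads `resolvectl status`-style text from stdin, so it needs no systemd to test.
links_with_catchall() {
    awk '
        /^Link [0-9]+ \(/ {
            # Header looks like: Link 6 (wg0)
            name = $3
            gsub(/[()]/, "", name)
            next
        }
        /^Global/ { name = "" }
        /DNS Domain:/ {
            # Keep only what follows the colon, then check each domain.
            sub(/^.*DNS Domain:[ \t]*/, "")
            n = split($0, domains, /[ \t]+/)
            for (i = 1; i <= n; i++)
                if (domains[i] == "~." && name != "") print name
        }
    '
}
```

Usage: `resolvectl status | links_with_catchall`. More than one name in the result means several links are asking for all DNS traffic at once.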

I am not sure why NM would set this on a regular ethernet interface though. I mean, it makes sense to set something like this on vpn devices, but on regular devices?

maybe NM actively manages the ~. route, and keeps updating it depending on which ifaces it manages? and given that you run the vpn outside of NM it doesn't drop the route ~. from the regular ethernet interface once the vpn is up? If that's what NM does I figure the logic could be improved, to set ~. only on interfaces that are at least of "importance level vpn", i.e. never on regular ethernet stuff.

@zx2c4
Contributor Author

zx2c4 commented Nov 6, 2020

If that's what NM does I figure the logic could be improved, to set ~. only on interfaces that are at least of "importance level vpn", i.e. never on regular ethernet stuff.

That seems reasonable to me. Waiting for @thom311's input.

@thom311
Contributor

thom311 commented Nov 6, 2020

Basically there are only three levels of priority defined when routing in resolved: "preferable", "regular", "noway".

How is this priority configured? Is it:

  • "noway": link has no DNS nameservers configured
  • "regular": link has some DNS nameservers, but no search/routing-only domains set
  • "preferable": link has some DNS nameservers and also at least one search/routing-only domain set

And, I assume that if a link does not have "~." but some more specific domains ("example.com" or "~example.com"), then it's only used for requests with that domain and not in general. Is that correct?

@mcatanzaro
Contributor

Note there is a corresponding Red Hat bug where I have left a couple comments.

maybe NM actively manages the ~. route, and keeps updating it depending on which ifaces it manages? and given that you run the vpn outside of NM it doesn't drop the route ~. from the regular ethernet interface once the vpn is up?

Correct. If this VPN was configured by NetworkManager, it would do the right thing: either set ~. on the VPN interface (e.g. for a full-tunnel VPN), or leave it on the Ethernet interface and set some other specific DNS domain on the VPN interface (e.g. for a corporate VPN).

If that's what NM does I figure the logic could be improved, to set ~. only on interfaces that are at least of "importance level vpn", i.e. never on regular ethernet stuff.

No, that's no good because it will result in DNS leaks. Users expect that DNS goes only where they have configured it to go. Consider the case of a corporate VPN when there is no ~. search domain on the ethernet interface: then requests to pornhub.com or whatever could leak to the VPN interface by mistake.

IMO the problem here is inconsistent use of NetworkManager. You should either (a) use NetworkManager to configure the VPN, or (b) totally disable NetworkManager. Going behind NetworkManager's back like this is a problem. Tools like wg-quick are great for systems that don't use NetworkManager. If using NetworkManager, I don't see why you would want to use them when you could configure your VPN in NetworkManager instead.

@mcatanzaro
Contributor

Well, I'm not 100% certain whether NetworkManager developers would completely agree with that. Let's ask @thom311. Having NetworkManager try to account for VPN interfaces that it doesn't control might be possible. I don't think we should expect it to, but maybe it could be done.

@mcatanzaro
Contributor

If that's what NM does I figure the logic could be improved, to set ~. only on interfaces that are at least of "importance level vpn", i.e. never on regular ethernet stuff.

That seems reasonable to me. Waiting for @thom311's input.

I think NetworkManager should always set ~. on exactly one interface.

@zx2c4
Contributor Author

zx2c4 commented Nov 6, 2020

NetworkManager should not set ~. on non-VPN interfaces. That's preposterous and makes any outside systems' interactions with systemd-resolved impossible.

@snaggen

snaggen commented Nov 6, 2020

You must be able to set up a VPN outside NetworkManager. If not, then you are suddenly saying that all VPNs outside the ones supported by NM will never work with any distro using NM. The VPN support on Linux suddenly becomes quite bad.

@mcatanzaro
Contributor

NetworkManager should not set ~. on non-VPN interfaces. That's preposterous and makes any outside systems' interactions with systemd-resolved impossible.

As I've tried to explain: if you try to change this behavior, you're going to cause unexpected DNS leaks. There are really very major consequences to suggesting this change.

We passed the systemd-resolved Fedora change proposal based on promises that NetworkManager would always configure a ~. domain. If NetworkManager stops doing this, then many of the criticisms we received about systemd-resolved's default behavior of sending DNS to every configured server would become valid. E.g. my statements here would no longer be true, as would various other promises I made throughout that conversation.

@mcatanzaro
Contributor

Changing that would also invalidate everything written here, which is almost the entire justification for switching to systemd-resolved in the first place. (If we can't do split DNS correctly, the only significant remaining benefit would be the shared DNS cache.)

@keszybz
Member

keszybz commented Nov 6, 2020

NetworkManager should not set ~. on non-VPN interfaces. That's preposterous and makes any outside systems' interactions with systemd-resolved impossible.

I too very much disagree with this. For people who use VPN or VPNs for work, NM setting ~. on non-VPN interfaces is crucial.

If that's what NM does I figure the logic could be improved, to set ~. only on interfaces that are at least of "importance level vpn", i.e. never on regular ethernet stuff.

No, there is no rule that "ethernet" is somehow always inferior to "vpn". For different people, different things have priority. Both cases are very much valid. In fact, people may have multiple VPNs active, with higher and lower priority.

You must be able to set up a vpn outside NetworkManager. If not, then you are suddenly saying that all VPNs outside the ones supported by NM will never work with any distro using NM.

Yeah, but in that case you somehow need to tell NM to use a different config. Essentially, NM was told that the ethernet connection is to be preferably used for all traffic, and it's adhering to that. If you want different config, NM needs to be told somehow.

@zx2c4
Contributor Author

zx2c4 commented Nov 6, 2020

I too very much disagree with this. For people who use VPN or VPNs for work, NM setting ~. on non-VPN interfaces is crucial.

I'm not sure I follow what you're saying. Do I have it backwards? Do you have it backwards? If you use a VPN for work, then you want your VPN interface to have ~. (if it's configured to route all your internet traffic) and for your non-VPN interface to have not-~. Otherwise, you'll leak private domain queries and stuff over your clear text non-VPN interface.

(Maybe you're somehow worried about the reverse? Sending your personal traffic over your company's VPN? But in that case, shouldn't the more specific domain suffix for your company's VPN handle that anyway?)

More generally, the exclusivity of ~. only works if the system agrees not to use ~., except when that's the desired behavior from the administrator or some specialized utility. Therefore, NM should not be setting ~. by default for the various interfaces the user connects. Instead it should leave them in "regular mode" and then various users with particular VPN use cases can opt their VPN interfaces into ~., or opt other interfaces into that, depending on what they want. But having NM opt everybody into ~. destroys the purpose of that feature.

At least that's my understanding from Lennart's description above of preferable/regular/noway.

@mcatanzaro
Contributor

mcatanzaro commented Nov 6, 2020

Yeah, but in that case you somehow need to tell NM to use a different config. Essentially, NM was told that the ethernet connection is to be preferably used for all traffic, and it's adhering to that. If you want different config, NM needs to be told somehow.

This isn't hard if the wireguard interface corresponds to a NetworkManager connection profile: just give it ipv{4,6}.dns-priority = -1 if it is to be a full-tunnel VPN. That should work.
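As a sketch of that setting on the command line, assuming the tunnel exists as a NetworkManager connection profile: the function name make_full_tunnel_dns is hypothetical, and the profile name passed to it is whatever `nmcli connection show` lists on the system in question.

```shell
#!/usr/bin/env bash
# Sketch: give a NetworkManager profile a negative DNS priority for both
# address families, as suggested above for a full-tunnel VPN.
# Assumption: "$1" names an existing NM connection profile.
make_full_tunnel_dns() {
    local profile=$1
    nmcli connection modify "$profile" \
        ipv4.dns-priority -1 \
        ipv6.dns-priority -1
}
```

Usage: `make_full_tunnel_dns wg0`. The profile presumably has to be re-activated afterwards (for example with `nmcli connection up wg0`) before the new priority takes effect.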

But in this case, the wireguard connection is not configured via NetworkManager, which complicates things a lot. I'm not sure if there is actually any way to do this or not....

I'm not sure I follow what you're saying. Do I have it backwards? Do you have it backwards? If you use a VPN for work, then you want your VPN interface to have ~. (if it's configured to route all your internet traffic) and for your non-VPN interface to have not-~. Otherwise, you'll leak private domain queries and stuff over your clear text non-VPN interface.

Yes, definitely. In this case, you must not have ~. on the ethernet/wifi interface. That is one of the two main use-cases for VPNs, and the default case in NetworkManager. But as for the second main use-case:

(Maybe you're somehow worried about the reverse? Sending your personal traffic over your company's VPN? But in that case, shouldn't the more specific domain suffix for your company's VPN handle that anyway?)

Correct, that's indeed the problem. This is the second main use-case for VPNs. In this case, you must have ~. on the ethernet/wifi interface or you will leak DNS to the VPN.

In gnome-control-center, you achieve this use-case by selecting "Use this connection only for resources on its network." (Well, that doesn't work for Wireguard yet, since GNOME sadly doesn't support Wireguard yet, but it works for other types of VPNs that can be configured graphically.) With that checkbox checked, you're in this case two, corporate VPN. Otherwise, you get case one, full tunnel VPN.

So you see, the requirements for this second case are the opposite of the requirements for the first case. In case one, we must not have ~. on the non-VPN interface. In case two, we must have it there. NetworkManager has to satisfy both cases, and it will do the right thing depending on whether you checked that checkbox in the UI or not. (I'm honestly not certain what lower-level setting the checkbox corresponds to, but of course there's some non-GUI way to get that behavior as well.)

To make sure there is no ambiguity about where DNS goes, NetworkManager ensures there is always a ~. domain on exactly one interface. If your query matches some more specific-domain, then that controls where it goes; otherwise, it goes to the DNS server corresponding to the interface with the ~. domain. Any configured global domains are ignored and unused. This ensures there are no surprises, at least if you use NetworkManager to configure your VPN. But now we have a problem with wg-quick:

More generally, the exclusivity of ~. only works if the system agrees not to use ~., except when that's the desired behavior from the administrator or some specialized utility.

Again correct. This basically means that resolvconf -x is totally incompatible with NetworkManager if systemd-resolved is in use, because NetworkManager will always set ~. And that is the default behavior of Fedora today. So that's the crux of this bug report.

But note that limitation is documented behavior, see resolvconf(1):

       -x
           This switch for "exclusive" operation is supported only partially.
           It is mapped to an additional configured search domain of "~."  —
           i.e. ensures that DNS traffic is preferably routed to the DNS
           servers on this interface, unless there are other, more specific
           domains configured on other interfaces.

I don't think resolvconf -x is easily fixable. We could make it enumerate all other network interfaces and remove ~. domains from each one. That would be a pretty significant behavior change, of course, but perhaps that's more in line with what users expect from -x. I think that would be OK since nothing except custom scripts should be using resolvconf, and if you use -x then you really want to clobber all other configuration. (But I'm not sure how that would interact with NetworkManager -- it might fight you and add back ~. perhaps? -- so that's why it's nicer for NetworkManager to be the only thing configuring DNS.)

Probably a better solution would be for wg-quick to use resolvectl or the D-Bus API instead, and manually remove ~. from other interfaces. That would probably be more reliable. (But again, I wonder if NetworkManager will undo that if it gets restarted, for example.)
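A rough sketch of what removing ~. from another interface via resolvectl could look like. The helper name drop_catchall is hypothetical and this is not what wg-quick does. It assumes that `resolvectl domain LINK ""` (a single empty argument) clears the link's domain list, and, as noted above, NetworkManager may simply put ~. back later.

```shell
#!/usr/bin/env bash
# Sketch: re-set a link's domain list without the catch-all "~.".
# $1 is the link name; the remaining arguments are its current domains.
drop_catchall() {
    local link=$1; shift
    local d
    local -a keep=()
    for d in "$@"; do
        [[ $d == "~." ]] || keep+=("$d")
    done
    # Assumption: a single empty-string argument clears the domain list.
    (( ${#keep[@]} > 0 )) || keep=("")
    resolvectl domain "$link" "${keep[@]}"
}
```

Usage: `drop_catchall enp0s2 '~.'` for the Ethernet link in the output earlier in the thread, where "~." is the only domain configured.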

Therefore, NM should not be setting ~. by default for the various interfaces the user connects. Instead it should leave them in "regular mode" and then various users with particular VPN use cases can opt their VPN interfaces into ~., or opt other interfaces into that, depending on what they want. But having NM opt everybody into ~. destroys the purpose of that feature.

Well, there's where we have a couple problems:

  • NetworkManager would have to be smart enough to ensure it will never fail to add ~. to the non-VPN interface in case 2. And I'm not sure if it can actually do that, or if it's reasonable to expect it to do that, when it's not really in charge of everything. (When I say "non-VPN interface," I am abusing terminology a bit, because the non-VPN interface could itself be a full-tunnel VPN.) I really don't know here: we would have to see what NetworkManager developers think is possible.
  • Without ~. on any interface, you'll wind up sending multiple DNS requests per lookup: one for each interface. E.g. it's surprisingly common to have different DNS servers configured for ethernet than for wi-fi. This can even happen with two ethernet interfaces plugged into different routers. So I think lacking ~. would be a controversial default.

@mcatanzaro
Contributor

I don't think resolvconf -x is easily fixable. We could make it enumerate all other network interfaces and remove ~. domains from each one. That would be a pretty significant behavior change, of course, but perhaps that's more in line with what users expect from -x. I think that would be OK since nothing except custom scripts should be using resolvconf, and if you use -x then you really want to clobber all other configuration. (But I'm not sure how that would interact with NetworkManager -- it might fight you and add back ~. perhaps? -- so that's why it's nicer for NetworkManager to be the only thing configuring DNS.)

Thinking about this more... maybe we should just do this, and see how well it works in practice. I bet that will be fine.

@poettering
Member

poettering commented Nov 19, 2020

Hmm, so DNS routing in resolved has actually one more feature that I didn't mention above: the "default-route" boolean that an iface can have. If a name is looked up for which no route exists, it will be routed to all interfaces that have "default-route" set to true, but not to any that have it set to false. (The field defaults to true for all interfaces.)

Let's put together a set of typical interfaces:

  1. If you have a "corporate" VPN (i.e. the kind where you want to route only very specific lookups to) then NM should configure its search domain list/route domain list to the relevant domains, and set "default-route" to off for it, and not include "~." in the list. This then means: only DNS traffic to the specified domains will be routed to the VPN, and nothing else. (as an example, let's call an interface like this redhat0, and the routing domain on it shall be ~corp.redhat.com)

  2. If you have a "privacy" VPN (i.e. the kind where you want all DNS to go to, no matter what) then NM should configure its search domain list to include "~.". (The "default-route" boolean for the iface is then irrelevant). (let's call this network interface privacy0)

  3. If you have a regular ethernet/wifi device, then NM should configure its search domain list not to include "~.", but do set "default-route" for it to true. (let's call this network interface eth0)
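The three profiles above could be expressed with resolvectl roughly as follows. This is an illustrative sketch of the resulting settings, not what NetworkManager does internally: the function names are hypothetical, the link names are the examples from the list, and clearing the domain list with an empty string is an assumption about resolvectl's argument handling.

```shell
#!/usr/bin/env bash
# Sketch of the three per-link DNS profiles, using the thread's example names.

# Case 1, corporate VPN: only the listed domain is routed here, never a default.
configure_corporate_vpn() {
    resolvectl domain redhat0 '~corp.redhat.com'
    resolvectl default-route redhat0 false
}

# Case 2, privacy VPN: "~." asks for all lookups; default-route is irrelevant.
configure_privacy_vpn() {
    resolvectl domain privacy0 '~.'
}

# Case 3, regular ethernet/wifi: no "~.", but usable as the default route.
configure_regular_link() {
    resolvectl domain eth0 ''      # assumption: empty string clears the list
    resolvectl default-route eth0 true
}
```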

So, now let's see what happens if these ifaces are turned on.

  1. First eth0 is upped, and is the only iface around. Any lookups are routed there, since default-route is on for it, and no search/route domains are configured whatsoever.

  2. Then the user ups privacy0. Due to the ~. search domain associated with the network interface all DNS traffic will now go there, and eth0 won't get any anymore.

  3. Then the user ups redhat0. Due to the ~corp.redhat.com routing domain, everything from *.corp.redhat.com will now go to that interface. But nothing else ever, because it has no other routes, and default-route for it is also off.

  4. Then the user downs privacy0, leaving redhat0 and eth0 up. The ~. search domain on privacy0 now loses its magic power, and since there's no other iface with such a generic search domain DNS traffic goes to the interfaces that have default-route set to true, which at this point is only eth0.

So in all listed cases the right thing happens.
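The walkthrough above can be replayed with a toy model of the routing rules as described in this thread: the closest matching domain wins, ties are shared, and default-route links are used only when nothing matches. This is a simplified sketch, not resolved's actual code. The function names and the "name:default_route:domains" encoding are invented for illustration, and real resolved has further rules not modelled here.

```shell
#!/usr/bin/env bash
# Toy model of the DNS routing rules described in this thread (not resolved's code).
# Each link is given as "name:default_route:domain[,domain...]".

# Print the label count of the closest matching domain, or -1 if none matches.
# "~." (the root) matches every name with a score of 0.
match_score() {
    local query=${1%.} domains=$2 best=-1 d dots labels
    local IFS=,
    for d in $domains; do
        d=${d#"~"}     # the routing-only marker does not affect matching here
        d=${d%.}       # drop a trailing dot; the root becomes the empty string
        if [[ -z $d ]]; then
            labels=0
        elif [[ $query == "$d" || $query == *."$d" ]]; then
            dots=${d//[^.]/}
            labels=$(( ${#dots} + 1 ))
        else
            continue
        fi
        if (( labels > best )); then best=$labels; fi
    done
    echo "$best"
}

# Print the links that would receive a lookup for $1; remaining args are links.
route_query() {
    local query=$1; shift
    local link name defroute domains score best=-1
    local -a matched=() defaults=()
    for link in "$@"; do
        IFS=: read -r name defroute domains <<<"$link"
        score=$(match_score "$query" "$domains")
        if (( score > best )); then
            best=$score; matched=("$name")       # closer match replaces earlier ones
        elif (( score == best && score >= 0 )); then
            matched+=("$name")                   # equally close matches share the query
        fi
        if [[ $defroute == yes ]]; then defaults+=("$name"); fi
    done
    if (( best >= 0 )); then
        echo "${matched[*]}"
    else
        echo "${defaults[*]}"                    # no route matched: use default-route links
    fi
}
```

For example, `route_query www.example.com 'eth0:yes:' 'privacy0:yes:~.'` models step 2, and `route_query www.example.com 'enp0s2:yes:~.' 'wg0:yes:~.'` models the configuration reported at the top of this issue, where both links carry "~.".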

And now to come back to this issue itself. In this combination, "resolvconf -x" actually does the right thing, the way @zx2c4 expects it, as -x translates to the exact same settings as applied for privacy0.

Does this make sense?

@zx2c4
Contributor Author

zx2c4 commented Nov 19, 2020

So in all listed cases the right thing happens.

And now to come back to this issue itself. In this combination, "resolvconf -x" actually does the right thing, the way @zx2c4 expects it, as -x translates to the exact same settings as applied for privacy0.

Does this make sense?

Right, it seems like you've got all bases covered in resolved. I'm pretty sure the issue is that @thom311 misunderstood how ~. works in systemd, because networkmanager has a similar syntax with slightly different semantics. So the thing to do now would be to stop setting ~. on every interface from networkmanager, which totally destroys the resolved scheme.

@mcatanzaro
Contributor

Hm, yes, that sounds like a good proposal. If @thom311 agrees, then we would not need any changes in systemd or in wg-quick: we would only need changes in NetworkManager. (NetworkManager currently doesn't touch the DefaultRoute setting at all, since it uses ~. to force the DNS to go where expected.)

(wg-quick should still probably use something more sophisticated than resolvconf to configure systemd-resolved. But whatever. :)

@thom311
Contributor

thom311 commented Nov 19, 2020

I agree, this needs fixing in NetworkManager.

I opened https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/687 for that.

Thanks @poettering for elaborating. Makes sense.
Thanks @zx2c4 for the report, and thanks everybody for engaging!!

IMO this issue can be closed.

@poettering
Member

I put together some docs in #17678 that rehash what was discussed here, and extend a bit on it (i.e. there's also focus on reverse domain lookup, i.e. corporate VPNs should probably add some ~….in-addr.arpa routing domain to their interfaces). ptal.

poettering added a commit to poettering/systemd that referenced this issue Nov 24, 2020
keszybz pushed a commit that referenced this issue Nov 25, 2020
6 participants