
systemd-resolved not working correctly with libvirt virbr0 #18761

Open
resdigita opened this issue Feb 23, 2021 · 12 comments
Labels: needs-reporter-feedback ❓ (There's an unanswered question, the reporter needs to answer), resolve

Comments

@resdigita

systemd version: 246
Used distribution

Fedora 33…

Linux kernel version used (uname -a)

5.10.9-201.fc33.x86_64

CPU architecture issue was seen on

x86_64

Expected behaviour you didn't see

split DNS working with virbr0 after reboot

Unexpected behaviour you saw

After (re-)booting, I had to manually run systemctl restart systemd-resolved.

Steps to reproduce the problem

The situation: a Fedora server running several Fedora VMs. VMs and host use the libvirt default network via virbr0 for internal private data exchange; a second, public bridge provides external connectivity. On virbr0 the DNS service is activated so that host and VMs can find each other by name instead of by IP. If you query the libvirt dnsmasq directly with dig @, it works fine.

In VMs with two interfaces, eth0 (external) and eth1 (internal), everything works right out of the box: internal and external names are resolved.

On the host I had to add two entries to /etc/systemd/resolved.conf to make it work:
DNS=192.168.122.1%virbr0#example.lan ## (.lan = private domain)
Domains=example.lan ## search domain, appended to names w/o a dot
But I had to restart systemd-resolved after boot to make it work. Otherwise it doesn't (it obviously ignores / does not know about virbr0).

[...]# reboot 
[...]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
   ....  
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
   ...   //external interface
3: vbr3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
   ...   // external routing bridge (brouter)
4: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8000 qdisc noqueue state UP group default qlen 1000
   ...   // internal bridge
5: virbr0-nic: <BROADCAST,MULTICAST> mtu 8000 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
   ...   // virtual interface host to virbr0, no IP address of its own
6: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master vbr3s0 state UNKNOWN group default qlen 1000
   ...   // virtual public interface VM
7: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8000 qdisc fq_codel master virbr0 state UNKNOWN group default qlen 1000
   ...   // virtual private interface VM

[...]# resolvectl   domain
Global: example.lan
Link 2 (enp3s0): example.com ~.
Link 3 (vbr3s0):
Link 4 (virbr0):
Link 5 (virbr0-nic):
Link 6 (vnet0):
Link 7 (vnet1):

[...]# resolvectl   dns
Global:
Link 2 (enp3s0): 213.133.98.98 213.133.99.99 2a01:4f8:0:1::add:1010 2a01:4f8:0:1::add:9999   ## our external name server
Link 3 (vbr3s0):
Link 4 (virbr0):        ## no sign of libvirt dnsmasq server 
Link 5 (virbr0-nic):
Link 6 (vnet0):
Link 7 (vnet1):

[...]# resolvectl   query vm
vm: aaa.bbb.ccc.dd                        -- link: enp3s0   ## external name, instead of internal
   2a01:aaa:bbb:ccc::4                   -- link: enp3s0
   (vm.example.com)

[...]# resolvectl   query vm.example.lan
vm.example.lan: resolve call failed: 'vm.example.lan' not found

When I restart systemd-resolved:


[...]# systemctl  restart  systemd-resolved
[...]# resolvectl   domain
Global: example.lan
Link 2 (enp3s0): example.com ~.
Link 3 (vbr3s0):
Link 4 (virbr0):
Link 5 (virbr0-nic):
Link 6 (vnet0):
Link 7 (vnet1):       ## everything as before, nothing changed

[...]# resolvectl   dns
Global:
Link 2 (enp3s0): 213.133.98.98 213.133.99.99 2a01:4f8:0:1::add:1010 2a01:4f8:0:1::add:9999   ## our external name server
Link 3 (vbr3s0):
Link 4 (virbr0):
Link 5 (virbr0-nic):
Link 6 (vnet0):
Link 7 (vnet1):     ## everything as before, nothing changed here, too

[...]# resolvectl   query vm
vm: 192.168.122.87                        -- link: virbr0
   (vm.example.lan)             ## different, internal address as expected

[...]# resolvectl   query vm.example.lan
vm.example.lan: 192.168.122.87          -- link: virbr0   ## internal address, as expected.

@yuwata
Member

yuwata commented Feb 23, 2021

Are there any relevant logs about parsing the DNS= setting?

@yuwata yuwata added the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label Feb 23, 2021
@poettering
Member

Hmm, this doesn't really work this way. DNS= in resolved.conf configures the global DNS scope, not the per-link one. And the % syntax is typically not used unless IPv6 link-local addressing is in place. Moreover, the interface names are resolved the moment resolved parses the configuration file, i.e. likely before the network interface has actually shown up, so the parsing will fail in the typical case.

So far we have no mechanism to declare a DNS server in our configuration ahead of time. The usual workflow is that resolved either picks up the DNS configuration automatically from networkd, or "resolvectl dns" is invoked by some external tool (e.g. NetworkManager) that tells resolved which DNS servers to use the moment the interfaces appear.
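Concretely, the per-link calls such an external tool would make can be sketched as follows. This uses the bridge address and internal domain from this report as examples; where resolvectl is unavailable, the commands are only echoed rather than run:

```shell
#!/bin/sh
# Sketch of the workflow described above: an external tool pushes per-link
# DNS settings the moment the interface appears. The address and domain are
# this thread's examples, not defaults.
configure_link() {
    link="$1"; server="$2"; domain="$3"
    if command -v resolvectl >/dev/null 2>&1; then
        resolvectl dns "$link" "$server"
        resolvectl domain "$link" "$domain"
    else
        echo "would run: resolvectl dns $link $server"
        echo "would run: resolvectl domain $link $domain"
    fi
}

configure_link virbr0 192.168.122.1 '~example.lan'
```

The point is that these commands only succeed once the link exists, which is exactly why a static DNS= entry in resolved.conf cannot express the same thing.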

@resdigita
Author

As far as I know libvirt's network configuration, there is no way to have it call an external script at the end. This currently means that the changeover of servers to resolved considerably limits virtualisation with libvirt.

In any case, it is necessary to keep using dnsmasq as the local DNS server and to reconfigure resolved accordingly, or to disable it altogether. Too bad, it could have been so easy.

@keszybz keszybz removed the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label Feb 25, 2021
@resdigita
Author

@yuwata (sorry, I just missed your post until today)

I just found:

systemd-resolved[994]: Positive Trust Anchors:
systemd-resolved[994]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
systemd-resolved[994]: Negative trust anchors: 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 19.172.in-addr.arp>
systemd-resolved[994]: Failed to add DNS server address '192.168.122.1%virbr0#resdigita.lan', ignoring: No such device
systemd-resolved[994]: Using system hostname 'agora.resdigita.de'.
systemd[1]: Started Network Name Resolution.
...
...
NetworkManager[1104]: <info>  [1614245324.6357] dns-mgr[0x5587f1e50220]: init: dns=systemd-resolved rc-manager=symlink, plugin=systemd-resolv>

Which supports Lennart's analysis.

@resdigita
Author

resdigita commented May 1, 2021

As I understand the current situation with libvirt, systemd-resolved can't detect the dynamically created virtual virbr0 interface because it is not yet present at boot time. Instead, it has to be configured separately once the interface comes up later in the boot process.

I now detected that libvirt offers hooks to execute scripts when an interface is set up (https://www.libvirt.org/hooks.html).

Unfortunately I can't configure systemd-resolved to use virbr0 at all. I'm now on Fedora 34 / systemd 248. Everything that previously worked with systemd 246 now doesn't anymore.

I tried after boot completed:
# systemd-resolve --interface virbr0 --set-dns 192.168.122.1 --set-domain fritz.lan

and got the following status:

# systemd-resolve --status
Global
Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (enp1s0)
Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
DNS Servers: 192.168.158.1 fd00::3681:c4ff:fe14:21b4
DNS Domain: fritz.box

Link 4 (virbr0)
Current Scopes: none
Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
DNS Servers: 192.168.122.1
DNS Domain: fritz.lan

resolvectl query obviously ignores virbr0:

[root@zbox ~]# resolvectl query zbox.fritz.lan
zbox.fritz.lan: resolve call failed: 'zbox.fritz.lan' not found

But dig works fine:
[root@zbox ~]# dig @192.168.122.1 zbox.fritz.lan

; <<>> DiG 9.16.11-RedHat-9.16.11-5.fc34 <<>> @192.168.122.1 zbox.fritz.lan
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22583
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;zbox.fritz.lan. IN A

;; ANSWER SECTION:
zbox.fritz.lan. 0 IN A 192.168.122.1

;; Query time: 0 msec
;; SERVER: 192.168.122.1#53(192.168.122.1)
;; WHEN: Sa Mai 01 11:02:26 CEST 2021
;; MSG SIZE rcvd: 59
It looks like the scope is missing. How can I assign a scope?

@poettering
Member

resolved subscribes to network interface changes and should automatically pick up any network interface that is up and has an IP address. Maybe virbr0 is not actually up?

@poettering poettering added the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label May 12, 2021
@jgneff

jgneff commented Jun 10, 2021

I now detected that libvirt offers hooks to execute scripts when an interface is set up (https://www.libvirt.org/hooks.html).

I got this working on Ubuntu 20.04.2 LTS, thanks to that very helpful tip from @resdigita. I put the following script in the file /etc/libvirt/hooks/network:

#!/bin/bash
# Executed when a network is started or stopped or an interface is
# plugged/unplugged to/from the network. See:
#   Hooks for specific system management
#   https://www.libvirt.org/hooks.html

# After the network is started, up & running, the script is called as:
#   /etc/libvirt/hooks/network network_name started begin -
if [ "$1" == "default" ] && [ "$2" == "started" ]; then
    resolvectl dns virbr0 192.168.122.1
    resolvectl domain virbr0 '~kvm'
fi
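For completeness, a sketch of staging that hook. The DESTDIR staging prefix is my own testing convenience, not anything libvirt defines; it defaults to a scratch directory, so use DESTDIR=/ to install on the real host:

```shell
#!/bin/sh
# Sketch: install the network hook where libvirt looks for it.
# DESTDIR is a hypothetical staging prefix for safe testing; it defaults
# to a scratch directory. Use DESTDIR=/ on the real host.
DESTDIR="${DESTDIR:-$(mktemp -d)}"
HOOKDIR="$DESTDIR/etc/libvirt/hooks"
mkdir -p "$HOOKDIR"
cat > "$HOOKDIR/network" <<'EOF'
#!/bin/bash
# Point resolved at libvirt's dnsmasq once the default network is up.
if [ "$1" == "default" ] && [ "$2" == "started" ]; then
    resolvectl dns virbr0 192.168.122.1
    resolvectl domain virbr0 '~kvm'
fi
EOF
chmod +x "$HOOKDIR/network"   # libvirt only runs executable hooks
echo "installed $HOOKDIR/network"
```

Note that libvirtd may need a restart before it notices a newly created hook script.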

The Libvirt default network is defined as follows:

<network>
  <name>default</name>
  <uuid>947ecc5a-d4a3-4b3d-a6f5-f22507a33a37</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr0' stp='on' delay='0'/>
  <mac address='52:54:00:5e:0d:72'/>
  <domain name='kvm' localOnly='yes'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
      <host mac='52:54:00:57:a1:6d' name='windows' ip='192.168.122.2'/>
      <host mac='52:54:00:b5:a3:1d' name='armfocal' ip='192.168.122.3'/>
      <host mac='52:54:00:c4:5f:72' name='xenial' ip='192.168.122.4'/>
      <host mac='52:54:00:a8:61:0f' name='bionic' ip='192.168.122.5'/>
      <host mac='52:54:00:f2:21:0a' name='focal' ip='192.168.122.6'/>
      <host mac='52:54:00:62:19:54' name='groovy' ip='192.168.122.7'/>
      <host mac='52:54:00:c6:da:05' name='raspios' ip='192.168.122.8'/>
      <host mac='52:54:00:6b:e2:51' name='fedora' ip='192.168.122.9'/>
    </dhcp>
  </ip>
</network>

After rebooting, the DNS resolution on the host works as expected:

$ host raspios.kvm
raspios.kvm has address 192.168.122.8
$ resolvectl status virbr0
Link 4 (virbr0)
      Current Scopes: DNS          
DefaultRoute setting: no           
       LLMNR setting: yes          
MulticastDNS setting: no           
  DNSOverTLS setting: no           
      DNSSEC setting: no           
    DNSSEC supported: no           
  Current DNS Server: 192.168.122.1
         DNS Servers: 192.168.122.1
          DNS Domain: ~kvm         

Now, if only LXD could offer the same kind of external hook for setting up its lxdbr0 bridge. For now, I'm running the following script manually after rebooting:

#!/bin/bash
# Sets up DNS resolution for LXD containers

# Brings up the LXD bridge, if necessary
lxc network info lxdbr0

# LXD - Network configuration - Integration with systemd-resolved
# https://linuxcontainers.org/lxd/docs/master/networks.html
resolvectl dns lxdbr0 10.178.4.1
resolvectl domain lxdbr0 '~lxd'
resolvectl status lxdbr0

@jgneff

jgneff commented Jun 11, 2021

Based on the conversation in LXD issue lxc/lxd#3391, I now have a common solution for both the Libvirt bridge virbr0 and the LXD bridge lxdbr0. Instead of adding the hook to Libvirt, as shown in my previous comment, you can add the hook directly to Systemd by placing the following unit file in /etc/systemd/system/dns-virbr0.service:

[Unit]
Description=Per-link DNS configuration for virbr0
BindsTo=sys-subsystem-net-devices-virbr0.device
After=sys-subsystem-net-devices-virbr0.device systemd-resolved.service

[Service]
Type=oneshot
ExecStart=/usr/bin/resolvectl dns virbr0 192.168.122.1
ExecStart=/usr/bin/resolvectl domain virbr0 '~kvm'

[Install]
WantedBy=sys-subsystem-net-devices-virbr0.device

Enable the service with:

$ sudo systemctl enable dns-virbr0.service
Created symlink /etc/systemd/system/sys-subsystem-net-devices-virbr0.device.wants/dns-virbr0.service
  → /etc/systemd/system/dns-virbr0.service.

Reboot, and it works:

$ systemctl status dns-virbr0.service
● dns-virbr0.service - Per-link DNS configuration for virbr0
     Loaded: loaded (/etc/systemd/system/dns-virbr0.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Fri 2021-06-11 11:54:09 PDT; 3min 12s ago
    Process: 1593 ExecStart=/usr/bin/resolvectl dns virbr0 192.168.122.1 (code=exited, status=0/SUC>
    Process: 1596 ExecStart=/usr/bin/resolvectl domain virbr0 ~kvm (code=exited, status=0/SUCCESS)
   Main PID: 1596 (code=exited, status=0/SUCCESS)

Jun 11 11:54:09 tower systemd[1]: Starting Per-link DNS configuration for virbr0...
Jun 11 11:54:09 tower systemd[1]: dns-virbr0.service: Succeeded.
Jun 11 11:54:09 tower systemd[1]: Finished Per-link DNS configuration for virbr0.
$ resolvectl status virbr0
Link 4 (virbr0)
      Current Scopes: DNS
DefaultRoute setting: no
       LLMNR setting: yes
MulticastDNS setting: no
  DNSOverTLS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
  Current DNS Server: 192.168.122.1
         DNS Servers: 192.168.122.1
          DNS Domain: ~kvm
$ resolvectl query raspios.kvm
raspios.kvm: 192.168.122.8                     -- link: virbr0

-- Information acquired via protocol DNS in 1.5ms.
-- Data is authenticated: no
$ host raspios.kvm
raspios.kvm has address 192.168.122.8
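Presumably the same unit pattern covers the LXD bridge as well. A sketch for /etc/systemd/system/dns-lxdbr0.service, reusing the lxdbr0 address and ~lxd domain from my earlier comment (both may differ on your system):

```ini
[Unit]
Description=Per-link DNS configuration for lxdbr0
BindsTo=sys-subsystem-net-devices-lxdbr0.device
After=sys-subsystem-net-devices-lxdbr0.device systemd-resolved.service

[Service]
Type=oneshot
ExecStart=/usr/bin/resolvectl dns lxdbr0 10.178.4.1
ExecStart=/usr/bin/resolvectl domain lxdbr0 '~lxd'

[Install]
WantedBy=sys-subsystem-net-devices-lxdbr0.device
```

Enable it the same way as the virbr0 unit, with systemctl enable dns-lxdbr0.service.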

@sgoncalo

I think this is a more general problem related to bridges rather than libvirt. After encountering this issue on my NAS/KVM host running F34, I tried debugging in a VM.

I did a stock install of Fedora 34 Workstation, updated to the latest (5.12.12-300.fc34.x86_64 as of 2021-06-28) and installed Cockpit. libvirt, KVM, etc. were not installed.

resolved worked initially, but failed after I created a bridge off of enp1s0 using Cockpit. resolved had no DNS after each reboot, but came back to life after systemctl restart systemd-resolved.service. Deleting the bridge (again via Cockpit) resulted in a working system after reboot.

see also https://www.reddit.com/r/Fedora/comments/o9abx7/did_kernel_51212_cause_any_issues_with_networking/

@nkaminski

@sgoncalo Absolutely agree, exact same behavior here as well with Fedora 34, kernel 5.12.12-300: DNS fails when the route towards the DNS server is via a bridge interface. In my case, this bridge interface is configured by NetworkManager and contains one physical link as well as a handful of veth/vnet virtual devices connecting to VMs.

Specifically, one indicator of this issue is systemd-resolved reporting Current Scopes: none for the bridge owning the default route at bootup, when DNS fails.

Full example of working and broken states on the referenced Reddit thread as well.
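That scopes=none symptom can be checked mechanically. An offline sketch that scans resolvectl-status-style output and flags links with no DNS scope; the sample is hard-coded here to mirror the broken state above, and on a live host you would pipe in `resolvectl status` instead:

```shell
#!/bin/sh
# Offline sketch: flag links whose resolved DNS scope is "none".
# The sample mirrors the broken state reported in this thread.
sample='Link 2 (enp1s0)
  Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6

Link 4 (virbr0)
  Current Scopes: none'

printf '%s\n' "$sample" | awk '
    /^Link /               { link = $0 }
    /Current Scopes: none/ { print link " has no DNS scope" }'
```

For the sample above this prints `Link 4 (virbr0) has no DNS scope`.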

@sgoncalo

sgoncalo commented Jul 2, 2021

Just noticed that one of my Fedora 34 machines with a bridge did not have the boot problem. It had updated to 5.12.13-300.fc34.x86_64.
I tried reproducing the issue again in a VM and could not reproduce the problem seen on 5.12.12 with 5.12.13-300.fc34.x86_64. If a fix was promoted, thank you!

@pemensik
Copy link
Contributor

A domain redirection can be automated only when a common domain is defined for the connection. I have this as part of <network>: <domain name='vm' localOnly='yes'/>. Because the vm domain is assigned to the machines on this connection, libvirt could add the domain-specific configuration for this interface. I think it should use the resolvconf interface, which would make the redirection work with any other local DNS cache, not just systemd-resolved.
