bridge slave never gets DHCP response #4674

Open
martinpitt opened this Issue Nov 15, 2016 · 12 comments

Contributor

martinpitt commented Nov 15, 2016

Submission type

  • Bug report
  • Request for enhancement (RFE)

systemd version the issue has been seen with

232

Used distribution

Debian unstable

I'm trying to set up an ethernet iface with DHCP that is part of a bridge:

$ cat br0.netdev
[NetDev]
Name=br0
Kind=bridge

$ cat br0.network 
[Match]
Name=br0

$ cat ens3.network
[Match]
Name=ens3

[Network]
DHCP=ipv4
Bridge=br0

With that, br0 comes up fine, and ens3 gets put into the bridge:

$ bridge link
2: ens3 state UP : <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 100 

$ networkctl status br0
● 3: br0
       Link File: /lib/systemd/network/99-default.link
    Network File: /etc/systemd/network/br0.network
            Type: ether
           State: degraded (configured)
          Driver: bridge
      HW Address: 66:58:d2:97:03:25
         Address: fec0::6458:d2ff:fe97:325
                  fe80::6458:d2ff:fe97:325
         Gateway: fe80::2

However, ens3 never gets a DHCP response:

$ networkctl status ens3
● 2: ens3
       Link File: /lib/systemd/network/99-default.link
    Network File: /etc/systemd/network/ens3.network
            Type: ether
           State: carrier (configuring)
            Path: virtio-pci-0000:00:03.0
          Driver: virtio_net
          Vendor: Red Hat, Inc
           Model: Virtio network device
      HW Address: 52:54:00:12:34:56

ens3 works fine if I comment out Bridge=br0, i.e. detach it from the bridge. But bridged interfaces are still fully valid interfaces, so DHCP should continue to work; this looks like a bug to me.

Setting addresses manually does work, even though status still remains at "configuring":

[Network]
#DHCP=ipv4
Bridge=br0
Address=10.0.2.15/24
Gateway=10.0.2.2
IPv6AcceptRA=no

(not sure what it's waiting for).

A debug-enabled networkd log for a run with DHCP=ipv4 is in https://gist.github.com/martinpitt/e70ca94d5c03977fcb8a36f98bf41feb . That shows that networkd is attempting DHCP on it, but never gets a response.

@martinpitt martinpitt added the network label Nov 15, 2016

Contributor

martinpitt commented Nov 15, 2016

At first I thought this was an artifact of QEMU's builtin DHCP server, but I can also reproduce it with dnsmasq as the server inside the VM. Its log shows that ens3's DHCP requests come in and get answered, but somehow networkd seems to ignore the answers:

dnsmasq-dhcp[8902]: DHCPDISCOVER(veth43) 32:2c:13:3b:86:0d 
dnsmasq-dhcp[8902]: DHCPOFFER(veth43) 192.168.6.52 32:2c:13:3b:86:0d 
dnsmasq-dhcp[8902]: DHCPDISCOVER(veth43) 32:2c:13:3b:86:0d 
dnsmasq-dhcp[8902]: DHCPOFFER(veth43) 192.168.6.52 32:2c:13:3b:86:0d 
[... repeats ...]
Contributor

martinpitt commented Nov 15, 2016

Reproducer:

--- a/test/networkd-test.py
+++ b/test/networkd-test.py
@@ -227,6 +227,18 @@ DHCP=%s
     def test_hotplug_dhcp_ip6(self):
         self.do_test(coldplug=False, ipv6=True)

+    def test_bridge_no_address(self):
+        self.writeConfig('/run/systemd/network/br-test.netdev', '''\
+[NetDev]
+Name=br-test
+Kind=bridge''')
+        self.writeConfig('/run/systemd/network/br-test.network', '''\
+[Match]
+Name=br-test''')
+        self.addCleanup(subprocess.check_call, ['ip', 'link', 'del', 'dev', 'br-test'])
+
+        self.do_test(dhcp_mode='ipv4', extra_opts='Bridge=br-test')
+
     def test_route_only_dns(self):
         self.writeConfig('/run/systemd/network/myvpn.netdev', '''\
 [NetDev]

output:

---- interface status ----
50: test_eth42@router_eth42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-test state UP group default qlen 1000
    link/ether f2:e7:57:98:05:23 brd ff:ff:ff:ff:ff:ff
---- networkctl status test_eth42 ----
● 50: test_eth42
       Link File: /lib/systemd/network/99-default.link
    Network File: /run/systemd/network/test_eth42.network
            Type: ether
           State: carrier (configuring)
          Driver: veth
      HW Address: f2:e7:57:98:05:23
---- systemd-networkd.service ----
Nov 15 15:07:06 donald systemd[1]: Starting Network Service...
Nov 15 15:07:06 donald systemd-networkd[15492]: br-test: netdev ready
Nov 15 15:07:06 donald systemd-networkd[15492]: dummy0: Gained IPv6LL
Nov 15 15:07:06 donald systemd-networkd[15492]: lxdbr0: Gained IPv6LL
Nov 15 15:07:06 donald systemd-networkd[15492]: lxcbr0: Gained IPv6LL
Nov 15 15:07:06 donald systemd-networkd[15492]: wlp3s0: Gained IPv6LL
Nov 15 15:07:06 donald systemd-networkd[15492]: Enumeration completed
Nov 15 15:07:06 donald systemd[1]: Started Network Service.
Nov 15 15:07:06 donald systemd-networkd[15492]: test_eth42: IPv6 disabled for interface: Success
Nov 15 15:07:06 donald systemd-networkd[15492]: test_eth42: Gained carrier
Nov 15 15:07:06 donald systemd-networkd[15492]: router_eth42: Gained carrier
Nov 15 15:07:06 donald systemd-networkd[15492]: br-test: IPv6 enabled for interface: Success
Nov 15 15:07:07 donald systemd-networkd[15492]: br-test: Gained carrier
Nov 15 15:07:07 donald systemd-networkd[15492]: br-test: Gained IPv6LL
Nov 15 15:07:07 donald systemd-networkd[15492]: router_eth42: Gained IPv6LL


---- dnsmasq log ----
Nov 15 15:07:06 dnsmasq[15488]: started, version 2.76 cachesize 150
Nov 15 15:07:06 dnsmasq[15488]: compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua TFTP conntrack ipset auth DNSSEC loop-detect inotify
Nov 15 15:07:06 dnsmasq-dhcp[15488]: DHCP, IP range 192.168.5.10 -- 192.168.5.200, lease time 1h
Nov 15 15:07:06 dnsmasq-dhcp[15488]: DHCP, sockets bound exclusively to interface router_eth42
Nov 15 15:07:06 dnsmasq[15488]: reading /etc/resolv.conf
Nov 15 15:07:06 dnsmasq[15488]: using nameserver 192.168.2.1#53
Nov 15 15:07:06 dnsmasq[15488]: using nameserver fe80::1%wlp3s0#53
Nov 15 15:07:06 dnsmasq[15488]: using nameserver 127.0.0.53#53
Nov 15 15:07:06 dnsmasq[15488]: read /etc/hosts - 10 addresses
Nov 15 15:07:09 dnsmasq-dhcp[15488]: DHCPDISCOVER(router_eth42) f2:e7:57:98:05:23 
Nov 15 15:07:09 dnsmasq-dhcp[15488]: DHCPOFFER(router_eth42) 192.168.5.89 f2:e7:57:98:05:23 
Nov 15 15:07:09 dnsmasq-dhcp[15488]: DHCPDISCOVER(router_eth42) f2:e7:57:98:05:23 
Nov 15 15:07:09 dnsmasq-dhcp[15488]: DHCPOFFER(router_eth42) 192.168.5.89 f2:e7:57:98:05:23 
Nov 15 15:07:11 dnsmasq-dhcp[15488]: DHCPDISCOVER(router_eth42) f2:e7:57:98:05:23 
Nov 15 15:07:11 dnsmasq-dhcp[15488]: DHCPOFFER(router_eth42) 192.168.5.89 f2:e7:57:98:05:23 
Nov 15 15:07:15 dnsmasq-dhcp[15488]: DHCPDISCOVER(router_eth42) f2:e7:57:98:05:23 
Nov 15 15:07:15 dnsmasq-dhcp[15488]: DHCPOFFER(router_eth42) 192.168.5.89 f2:e7:57:98:05:23 
Contributor

martinpitt commented Nov 15, 2016

I'm trying to understand the code and what happens. In tcpdump -i test_eth42 I clearly see the request and response:

16:06:37.475637 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 3a:0b:53:b6:03:16 (oui Unknown), length 288
16:06:40.229452 IP donald53.bootps > 192.168.5.117.bootpc: BOOTP/DHCP, Reply, length 300

but sd-dhcp-client.c's client_handle_message() is never being called (while it is being called for the incoming lease without Bridge=). client_receive_message_udp() isn't involved here in either case, but client_receive_message_raw() should be -- but that isn't being called either (it comes before client_handle_message()).

Contributor

martinpitt commented Nov 15, 2016

According to https://wiki.linuxfoundation.org/networking/bridge#does-dhcp-work-overthrough-a-bridge DHCP should work through a bridge, and it does work fine with NetworkManager as well as calling dhclient -1 -v test_eth42.

Contributor

martinpitt commented Nov 15, 2016

I'm out of my depth here, I'm afraid. As soon as I comment out the IFLA_MASTER message part in netdev_enslave_ready(), I do get a DHCP OFFER (of course the interface fails later on, as joining the bridge failed). So it seems that merely adding test_eth42 to the bridge causes the DHCP responses to no longer be received. But at the same time they do get received with dhclient, so the reason cannot be in the routing, Linux's bridge handling, etc., and is clearly within networkd.

I went through all usages of the link->network->bridge field, and all others except for link_joined() are unrelated (disabling the handling of the field there doesn't change this behaviour). @teg, @ssahani, would you have any idea what could cause these replies to not get received? TIA!

@martinpitt martinpitt changed the title from bridge slave never gets DHCP address to bridge slave never gets DHCP response Nov 15, 2016

@poettering poettering added the dhcp label Nov 17, 2016

Contributor

ssahani commented Nov 28, 2016

I can reproduce the same scenario. The DHCP OFFER reaches the bridge and the ethernet interface as well, but it does not reach networkd. A wild guess would be that it has something to do with the BPF filter itself?

Contributor

martinpitt commented Nov 28, 2016

Oh, do we set up a BPF filter for events somewhere? That sounds plausible, as I don't even see a wakeup in strace when the DHCP response comes in. Do you have a pointer where that happens by any chance?

Contributor

ssahani commented Nov 28, 2016

dhcp-network.c: _bind_raw_socket. I am not sure, but I think we are dropping at the BPF filter, since the packets reach the ethernet port. The fd never gets woken up, and the data never reaches the socket.
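
To make the "dropping at the filter" idea concrete, here is a rough user-space sketch of the kind of checks a DHCP client filter on a raw socket typically performs before a frame is delivered. This is not the filter from dhcp-network.c; the function name `is_dhcp_reply` and the exact set of checks are assumptions for illustration, based only on the standard Ethernet/IPv4/UDP/BOOTP layouts.

```python
import struct

ETH_P_IP = 0x0800               # EtherType for IPv4
IPPROTO_UDP = 17                # IP protocol number for UDP
DHCP_CLIENT_PORT = 68           # "bootpc"
BOOTREPLY = 2                   # BOOTP op code for server-to-client messages
DHCP_MAGIC_COOKIE = 0x63825363  # marks the start of the DHCP options


def is_dhcp_reply(frame, client_mac):
    """Return True if an Ethernet frame carries a BOOTP/DHCP reply for client_mac.

    Hypothetical helper: it mimics the sort of tests a socket filter could
    apply, it is not a copy of networkd's BPF program.
    """
    if len(frame) < 14:
        return False
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return False

    ip = frame[14:]
    if len(ip) < 20 or ip[0] >> 4 != 4:
        return False
    ihl = (ip[0] & 0x0F) * 4        # IPv4 header length in bytes
    if ihl < 20:
        return False
    (frag,) = struct.unpack_from("!H", ip, 6)
    # Only unfragmented UDP: reject "more fragments" or a non-zero offset.
    if ip[9] != IPPROTO_UDP or frag & 0x3FFF:
        return False

    udp = ip[ihl:]
    if len(udp) < 8:
        return False
    (dst_port,) = struct.unpack_from("!H", udp, 2)
    if dst_port != DHCP_CLIENT_PORT:
        return False

    bootp = udp[8:]
    if len(bootp) < 240:            # fixed BOOTP header plus magic cookie
        return False
    op, hlen = bootp[0], bootp[2]
    (cookie,) = struct.unpack_from("!I", bootp, 236)
    return (op == BOOTREPLY
            and hlen == len(client_mac)
            and bootp[28:28 + hlen] == client_mac   # chaddr field
            and cookie == DHCP_MAGIC_COOKIE)
```

Feeding frames captured on the port into a check like this would show whether they are well-formed replies for our MAC. If they are, and the fd still never wakes up, the loss would be happening before any such filter runs.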

Contributor

martinpitt commented Nov 28, 2016

For testing I commented out all "ignore" filters and even commented out the setsockopt SO_ATTACH_FILTER call, but that didn't help.

Contributor

ssahani commented Jan 17, 2017

I tested with dhcpcd, and it behaves the same. Somehow dhclient works.

ip a

: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br0 state UP group default qlen 1000
    link/ether 08:00:27:40:aa:db brd ff:ff:ff:ff:ff:ff
4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 96:af:f2:ea:44:34 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::94af:f2ff:feea:4434/64 scope link 
       valid_lft forever preferred_lft forever

Running dhcpcd

 # dhcpcd enp0s8
dhcpcd-6.11.5 starting
enp0s8: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' PREINIT
enp0s8: executing `/usr/lib/dhcpcd/dhcpcd-run-hooks' CARRIER
DUID 00:01:00:01:1f:3f:a3:f4:08:00:27:f8:62:01
enp0s8: IAID 27:40:aa:db
enp0s8: adding address fe80::5432:4524:c876:5549
enp0s8: pltime infinity, vltime infinity
if_addaddress6: Permission denied
enp0s8: delaying IPv6 router solicitation for 0.5 seconds
enp0s8: delaying IPv4 for 0.6 seconds
enp0s8: soliciting an IPv6 router
enp0s8: delaying Router Solicitation for LL address
enp0s8: reading lease `/var/lib/dhcpcd/dhcpcd-enp0s8.lease'
enp0s8: discarding expired lease
enp0s8: soliciting a DHCP lease
enp0s8: sending DISCOVER (xid 0xf756a91), next in 3.2 seconds  <========= Look here
enp0s8: sending DISCOVER (xid 0xf756a91), next in 7.9 seconds
enp0s8: sending DISCOVER (xid 0xf756a91), next in 16.4 seconds
enp0s8: sending DISCOVER (xid 0xf756a91), next in 32.4 seconds
timed out
dhcpcd exited

Owner

keszybz commented Jun 29, 2017

See also 764febc.

Contributor

toanju commented Jul 18, 2017

Configuring an L3 address on the bridge slave seems odd to me. Wouldn't you only do this on the bridge itself? The whole bridge is an L2 device and should therefore only forward traffic based on MAC addresses. Does it really make sense to have an L3 address on a bridge slave?
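
A minimal sketch of that layout, with DHCP configured on the bridge and the port only enslaved. The helper `bridge_dhcp_units` is hypothetical and just renders the three unit files as strings; the interface names br0 and ens3 are taken from the report. This illustrates the alternative configuration only, it is not a fix for the behaviour reported above.

```python
def bridge_dhcp_units(bridge="br0", port="ens3"):
    """Render networkd units that run DHCP on the bridge instead of the port.

    Hypothetical helper for illustration; returns {file name: file content}.
    """
    return {
        # The bridge device itself.
        f"{bridge}.netdev": f"[NetDev]\nName={bridge}\nKind=bridge\n",
        # L3 configuration (DHCP) lives on the bridge.
        f"{bridge}.network": f"[Match]\nName={bridge}\n\n[Network]\nDHCP=ipv4\n",
        # The port only joins the bridge and carries no address settings.
        f"{port}.network": f"[Match]\nName={port}\n\n[Network]\nBridge={bridge}\n",
    }


for name, text in bridge_dhcp_units().items():
    print(f"# {name}\n{text}")
```

The printed files could be placed under /etc/systemd/network/ to try this variant.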
