I am trying to track down an issue I hit with https://github.com/kube-vip/kube-vip, where DHCP failed to respond in a timely manner. Specifically, invoking kube-vip with --controlplane --ddns and an --address that refers to an FQDN makes it request a VIP from DHCP using that name as the client ID. kube-vip runs in a container with host networking, and I have verified that isc-dhclient works fine inside that container. The relevant DHCP client usage is in https://github.com/kube-vip/kube-vip/blob/v0.8.0/pkg/vip/dhcp.go. The downstream ticket is kube-vip/kube-vip#844, but I am fairly certain this is a protocol issue.
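For context, the client identifier ends up on the wire as DHCP option 61. Below is a minimal sketch of the encoding, assuming the hostname label is sent as a non-hardware (type 0x00) identifier; I have not verified that kube-vip's library prefixes the type byte this way, so treat the exact framing as an assumption:

```go
package main

import "fmt"

// clientIDOption builds a DHCP client-identifier option (RFC 2132, option 61)
// carrying an arbitrary (non-hardware) identifier such as a hostname label.
// The leading 0x00 type byte marks the value as not being a hardware address;
// whether kube-vip's library emits that byte is an assumption here.
func clientIDOption(id string) []byte {
	value := append([]byte{0x00}, []byte(id)...) // type 0 = non-hardware identifier
	opt := []byte{61, byte(len(value))}          // option code, then length
	return append(opt, value...)
}

func main() {
	// "kube-api" is the first label of the FQDN kube-api.cluster.internal.
	fmt.Printf("% x\n", clientIDOption("kube-api"))
}
```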
The not-very-helpful error in kube-vip:
time="2024-05-07T22:58:09Z" level=info msg="waiting for ip from dhcp"
time="2024-05-07T22:58:45Z" level=error msg="request failed, error: got an error while processing the request: no matching response packet received (waiting 10s)"
time="2024-05-07T22:59:30Z" level=error msg="request failed, error: unable to receive an offer: got an error while the discovery request: no matching response packet received (waiting 12.288644945s)"
time="2024-05-07T23:00:18Z" level=error msg="failed to get an IP address after 3 attempts, error unable to receive an offer: got an error while the discovery request: no matching response packet received, giving up"
In this setup I am using eth0 as my interface (it also has a static IP address, 192.168.96.3; the MAC is 02:42:c0:a8:60:03). My FQDN is kube-api.cluster.internal, so kube-api is sent as the DHCP client ID. node02.cluster.internal (192.168.96.4, 02:42:c0:a8:60:04) runs a simple isc-dhcp-server on Debian bookworm with this conf:
I have been running both tcpdump and wireshark to see what the packet flows are, and using strace to see what is actually consumed by this library. In the strace -e network output for the client communications, I was at first concerned about a recvfrom, but it turned out to be just some mDNS discovery traffic that runs every now and then from 192.168.96.1 (the actual gateway). When I added timestamps, it is actually 4 seconds later.
Here is a tcpdump -i eth0:
13:27:53.902280 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 02:42:c0:a8:60:03 (oui Unknown), length 300
13:27:53.902652 IP node02.cluster.internal.bootps > 192.168.96.150.bootpc: BOOTP/DHCP, Reply, length 300
13:27:56.787072 IP 192.168.96.1.17500 > 192.168.111.255.17500: UDP, length 150
13:27:57.299822 IP 192.168.96.1.mdns > 224.0.0.251.mdns: 0 SRV (QM)? google-nest-hub-xxxxxx._googlecast._tcp.local. (89)
13:27:57.299954 IP 192.168.96.1.mdns > 224.0.0.251.mdns: 0 SRV (QM)? google-nest-hub-xxxxxx._googlecast._tcp.local. (89)
13:27:57.751805 IP 192.168.96.1.52612 > 192.168.111.255.32414: UDP, length 21
13:27:57.751872 IP 192.168.96.1.57455 > 192.168.111.255.32412: UDP, length 21
13:27:58.008321 IP 192.168.96.1.51461 > 239.255.255.250.1900: UDP, length 101
13:27:58.903234 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 02:42:c0:a8:60:03 (oui Unknown), length 300
13:27:58.903629 IP node02.cluster.internal.bootps > 192.168.96.150.bootpc: BOOTP/DHCP, Reply, length 300
13:28:02.753256 IP 192.168.96.1.52612 > 192.168.111.255.32414: UDP, length 21
13:28:02.753257 IP 192.168.96.1.57455 > 192.168.111.255.32412: UDP, length 21
13:28:07.753908 IP 192.168.96.1.57455 > 192.168.111.255.32412: UDP, length 21
13:28:07.753908 IP 192.168.96.1.52612 > 192.168.111.255.32414: UDP, length 21
13:28:08.009455 IP 192.168.96.1.51461 > 239.255.255.250.1900: UDP, length 101
13:28:08.903432 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 02:42:c0:a8:60:03 (oui Unknown), length 300
13:28:08.903758 IP node02.cluster.internal.bootps > 192.168.96.150.bootpc: BOOTP/DHCP, Reply, length 300
Here is a strace from isc-dhclient, which did not get the reply on the UDP socket (fd 6) but on the raw socket (fd 5). Note the different client ID (hence the slightly different IP returned, 192.168.96.155).
I am fairly certain this is a socket/protocol/binding/filtering issue, which is why I reported it here first. I have filed kube-vip/kube-vip#844 downstream as well.
Between my many runs, I made sure to clear the IPs on the machine and to stop any other dhclient processes.