Node local DNS creates dummy interface without IP address #282
Scratch that - I didn't read the code closely enough. Nevertheless, somehow the device is being created without an IP address, and without producing any error while configuring the device.
@negz hitting the same issue here (using CoreOS). Were you able to resolve this?
Something is very bizarre here... on one occasion, it was able to set the IP (just on one node; they are all identical and I didn't make any changes):
I deleted the above Pod to see if it would work again; after the pod was recreated on the same node (it's part of a DaemonSet), it failed again. 🤷‍♂️
This seems to be OS/kernel related. I can reproduce this on CoreOS build 1688.5.3 (kernel 4.14.32) or build 1967.3.0 (kernel 4.14.88) - the latter being the latest stable. However, this works just fine on Debian Jessie with kernel 4.9.0. /edit: going to try the latest CoreOS alpha. /edit 2: same issue with the latest CoreOS alpha (2023.0.0)
@dannyk81 We never solved this. We ended up writing a mutating admission webhook (https://github.com/planetlabs/legion) that we used to inject a small caching CoreDNS sidecar container into pods.
Thanks @negz! Could you confirm which OS/kernel combo you are/were using for this? I wonder if it's CoreOS as well, since I can't reproduce this on Debian.
It was CoreOS. I can't say for sure which kernel version, but it would most likely have been the stable CoreOS release at the time of writing.
Thanks, that confirms it. Considering I tried 4 or 5 different versions, it seems like a general issue with that OS.
@prameshj any chance you could take a look at this? seems like node local cache is broken on CoreOS. |
Is the 169.254.20.10 IP address used by some other interface on CoreOS? Can you list the interfaces on the host and share the output? Also, is the cluster being created using kubeadm?
Hi @prameshj, thanks for looking into this 😄 please see details below.
The ip address (169.254.25.10) is not used by any other interface, here's
I tried that (with 10.10.10.10 and other IPs), the result is the same:
Yes, it is. I'm using Kubespray to deploy the cluster and it uses kubeadm. It's deploying K8s v1.13.2. For debugging, I extracted the
Thanks Danny! I think this is the relevant strace section. I was able to map the netlink request and the parameters, but I wasn't thorough. Bind new IP address: unix.RTM_NEWADDR = 0x14

1593 sendto(5, "\x30\x00\x00\x00\x14\x00\x05\x06\x07\x00\x00\x00\x00\x00\x00\x00\x02\x20\x00\x00\x19\x00\x00\x00\x08\x00\x02\x00\x0a\x0a\x0a\x0a\x08\x00\x01\x00\x0a\x0a\x0a\x0a\x08\x00\x04\x00\x0a\x0a\x0a\x0a", 48, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 48

Would it be possible for you to create a dummy interface by hand, assign it an IP, and run strace on those two commands? It will be easier to debug this. Interestingly, kube-ipvs0 is also a dummy interface, created by the same code but a different version of the GitHub repo. I see that kube-ipvs0 has some service IP addresses bound to it. The ipvs code uses this much older commit. I wonder if something changed between these two commits that is causing the error.
@prameshj, attaching straces of two commands
The interface was created and address added:
@prameshj I was able to decode (using
Looking at the outputs, I see 3 differences:
/edit: in fact, for a
Perhaps it's related to this issue --> vishvananda/netlink#329?
@prameshj I suspect that this PR (vishvananda/netlink#248) could be the root cause.
@prameshj I wanted to test my theory, so I built node-cache with a modified version of

The payload now looks identical to what I see in strace when running

But unfortunately, the address is still not added to the interface:
Thanks for investigating this, Danny! The broadcast IP issue seemed like the root cause... hmm. Can you try building the image synced to this commit? That's the one ipvs uses, and it seems to work on your setup. I assume you are running kube-proxy in IPVS mode? Another thing to try: are you able to assign an IP to the nodelocaldns interface by hand? I wonder if the IP assignment from code succeeds momentarily and somehow gets removed later.
Sure, let me build a variant with the commit you mentioned. Indeed, I'm able to add an IP manually using
Interesting... There is code in the node-cache binary to periodically ensure that the interface exists and has the same address - Line 196 in f08c140
This will call AddrAdd to ensure the ip address exists on the interface. Maybe this call somehow errors out and the ip is removed? If you are building a custom image, it would be great if you can log the error here - Line 26 in f08c140
The error was ignored since it was expected to error out in case the IP already existed. Or you can comment out that line and see if the IP sticks when running that custom binary. Thanks for trying this out!
Tried your suggestions, with the following diff:
however it seems like there's no error being returned. I also added several info messages to see how
and here's the log output:
Synced up with Danny offline. We found that running exec.Command("ip", "addr", "add" ...) instead of netlink.AddrAdd works. Something in the netlink library is causing the issue. |
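The exec.Command workaround amounts to shelling out to iproute2 instead of issuing the netlink call directly. A small sketch (the address and interface name are the defaults discussed in this thread; actually running the command requires root in the pod's network namespace, so this only constructs it):

```go
package main

import (
	"fmt"
	"os/exec"
)

// addAddrViaIP builds the iproute2 equivalent of netlink.AddrAdd:
//   ip addr add 169.254.20.10/32 dev nodelocaldns
func addAddrViaIP(cidr, dev string) *exec.Cmd {
	return exec.Command("ip", "addr", "add", cidr, "dev", dev)
}

func main() {
	cmd := addAddrViaIP("169.254.20.10/32", "nodelocaldns")
	fmt.Println(cmd.Args)
	// To actually apply it (as root): err := cmd.Run()
}
```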
kubernetes#282 There is sometimes a race in link creation and ip assignment. If ip assignment is done too soon, the ip address does not persist.
Danny and I were able to verify that there is some race condition between LinkAdd and AddrAdd. If AddrAdd happens too soon, the behavior is undefined: sometimes the IP sticks, sometimes it is assigned and then deleted, and in some cases it did not get assigned at all. Danny was able to see this issue with just:

I am making a change to check for the IP address and add it just before invoking CoreDNS in the node-cache code.
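The change described there (check for the address and re-add it before starting CoreDNS) can be sketched as a small retry loop. The function names and the simulated race below are hypothetical, not the actual node-cache implementation; present and add stand in for the netlink address-list and address-add calls:

```go
package main

import (
	"fmt"
	"time"
)

// ensureAddrSticks re-checks the interface address a few times, re-adding
// it when the race between link creation and address assignment has
// silently dropped it.
func ensureAddrSticks(present func() bool, add func() error, attempts int, wait time.Duration) error {
	for i := 0; i < attempts; i++ {
		if present() {
			return nil
		}
		if err := add(); err != nil {
			return err
		}
		time.Sleep(wait)
	}
	if !present() {
		return fmt.Errorf("address did not persist after %d attempts", attempts)
	}
	return nil
}

func main() {
	// Simulate the observed race: the first assignment "succeeds" but does
	// not persist; the second one sticks.
	var assigned bool
	calls := 0
	add := func() error {
		calls++
		assigned = calls >= 2
		return nil
	}
	err := ensureAddrSticks(func() bool { return assigned }, add, 5, time.Millisecond)
	fmt.Println("adds:", calls, "err:", err)
}
```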
…ntainer (kubernetes-sigs#4074)
* Mount host /run/xtables.lock in nodelocaldns container
* fix typo in nodelocaldns daemonset manifest yml
* Add prometheus scrape annotation, updateStrategy and reduce termination grace period
* fix indentation
* actually fix it..
* Bump k8s-dns-node-cache tag to 1.15.1 (fixes kubernetes/dns#282)
v1.15.0 is affected by kubernetes/dns#282
I did some research on this error... and found this patch in the Linux kernel, for the dummy interface: torvalds/linux@554873e#diff-cb533d7ae320ae01c23e1381a803bc14, which seems to mention a race condition between creating/removing a dummy interface - which is what node-cache does when it starts. If the tags are to be believed, this started making its way into Linux with v4.17.xxx. And looking at the CoreOS releases (stable channel) https://coreos.com/releases/, CoreOS stable shipped a 4.14.xx kernel until March 11, 2019, when it bumped to 4.19.25. So if I am following this correctly, the race condition solved in this patch made it into CoreOS Stable in March.

Anyhow - I am unable to reproduce the bug using the following loop on the latest CoreOS Stable:

ip link del dummy0 ; ip link add dummy0 type dummy && ip addr add 10.10.10.10/32 dev dummy0 && ip addr show dev dummy0

I ran that 50k times, and the IP was always correctly assigned. So I might be completely wrong with this - but @dannyk81 or @prameshj, I would be really interested if someone manages to reproduce this error on a newer version of CoreOS? 😃
@yannh thanks for this! We in fact tested on a >4.18 kernel (using an Alpha CoreOS at the time) and hit the same issue; the extra validation that @prameshj put in place solved the problem. Later on we actually found the true culprit (I was meaning to post an update about this here), it seems that

The above happened because CoreOS on VMware adds a

The above would actually match any interface on a VMware VM; we did two things to solve this:
@yannh I don't think it's driver/vmware specific... it's just that this default |
For anyone walking by here and trying to import the network file above: the Match patterns should be whitespace separated, unless you want all interfaces to fail at the next boot with the following log:
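To illustrate both points, here is a hypothetical systemd-networkd configuration. The file paths and values are illustrative, not copied from the thread: glob patterns in a [Match] Name= line are separated by whitespace, not commas, and a dedicated unit can tell networkd to leave the nodelocaldns dummy interface unmanaged:

```
# /etc/systemd/network/00-uplinks.network (illustrative path)
[Match]
# Multiple globs are whitespace separated, NOT comma separated:
Name=ens* eth*

[Network]
DHCP=yes
```

```
# /etc/systemd/network/90-nodelocaldns.network (illustrative path)
[Match]
Name=nodelocaldns

[Link]
Unmanaged=yes
```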
Hi @prameshj, this step indeed seems redundant at this point, as long as systemd-networkd is set up to ignore the nodelocaldns interface 👍
Thanks for confirming @dannyk81 |
Hello,
I've just started experimenting with the new node local DNS cache. For reasons I haven't yet determined, the NetIfManager manages to create the nodelocaldns dummy interface in my setup, but fails to allocate it an IP address. NetIfManager does not check the error return value when assigning IPs, so this manifests as follows:

$ kubectl --kubeconfig=/Users/negz/tfk-negz.kubecfg -n kube-system logs nodelocaldns-ltlfp --previous
2018/12/21 03:22:52 2018-12-21T03:22:52.066Z [INFO] Tearing down
2018/12/21 03:22:53 2018-12-21T03:22:53.064Z [INFO] Setting up networking for node cache
listen tcp 169.254.20.10:8080: bind: cannot assign requested address
I see, when inspecting the dummy interface, that it's missing an IP: