Incorrect MAC Address observed for flannel interface across kubernetes cluster nodes #1788
Comments
Your restore file is wrong. You need to create |
The MAC address is being configured on the flannel interface by
Also, as for the output of iptables-save, it doesn't have FLANNEL-FWD
|
The MAC should be generated by Linux when the interface is created. Your issue is unrelated to ARP. I think that maybe the grep is suppressing some lines. You shouldn't configure anything on the |
How are you running flannel? |
To elaborate on our issue: over a period of time, the MAC addresses go out of sync. I just took the output from our running cluster. You can observe that the MAC address of flannel.2 is fa:4e:0e:a8:e6:30, but it is only correct on some of the cluster nodes, not all of them.
All nodes were expected to show fe80::f84e:eff:fea8:e630 for the IP 192.168.209.64 in their ARP table, but you can see how random it is across the cluster. We also verified that the value in etcd is correct:
Interim fix we figured is, to restart flannel across all the nodes.
After the restart everything seems fine; all nodes have the correct MAC address. In our environment, iptables is also used by kube-proxy and firewalld, but we strongly believe the firewall is unrelated to this issue. We were using version 0.19.0 before, and there we observed the iptables error as well, but we never faced incorrect ARP entries or network connectivity issues. We have been facing this since we upgraded to 0.22.0. This breaks our Kubernetes cluster networking (intermittently). |
Yes, but flannel shouldn't be working with the ARP table; this is Linux's doing. |
But why is Linux doing it only in 0.22.0? I haven't understood the code completely, but it does look like flannel has something to do with the ARP entries: flannel/pkg/backend/vxlan/vxlan.go Line 19 in c17e715
flannel/pkg/backend/vxlan/vxlan_network.go Line 186 in c17e715
|
Also, as I mentioned in my previous comment, a flannel restart fixes it all. This implies that flannel does take some action which fixes the ARP entries. Flannel probably needs to take that same action more frequently during its run-time (in the absence of which we are hitting this issue). |
Those values are added at the bootstrap of flannel and shouldn't change. That's why I mentioned Linux, because somehow they are being modified. |
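The disagreement above boils down to one check: does the MAC currently on the interface still match the MAC recorded for that node in etcd? A hedged sketch of such a drift check (the helper `macDrifted` and both addresses are illustrative, not flannel's API; in practice the current MAC would come from the interface via netlink):

```go
package main

import (
	"bytes"
	"fmt"
	"net"
)

// macDrifted reports whether the MAC currently on the interface differs from
// the MAC recorded in the datastore (etcd) for that node. A true result would
// explain stale ARP entries on peers: they still hold the recorded MAC while
// the interface answers with a different one.
func macDrifted(recorded, current string) (bool, error) {
	want, err := net.ParseMAC(recorded)
	if err != nil {
		return false, fmt.Errorf("recorded MAC %q: %w", recorded, err)
	}
	got, err := net.ParseMAC(current)
	if err != nil {
		return false, fmt.Errorf("current MAC %q: %w", current, err)
	}
	return !bytes.Equal(want, got), nil
}

func main() {
	// Second MAC is a made-up example of a drifted value.
	drifted, _ := macDrifted("fa:4e:0e:a8:e6:30", "fa:16:3e:58:6b:0f")
	fmt.Println(drifted) // true
}
```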
The issue is that it is not getting updated in flannel. |
So you are restarting the nodes, and when a node is restarted you can't access it? This seems strange, because then your issue should be more frequent for other users as well. |
I am trying to set up a new cluster with multiple nodes to reproduce your issue |
We aren't even restarting. |
Right now, when a new node is added to the cluster, flannel will call the part of the code that you mentioned, where it adds the MAC address that it gets from the etcd entry. The MAC address of an interface shouldn't change unless you somehow forced that change. I am doing some tests to check whether there are strange cases where it applies. |
I did some tests, also trying to restart some nodes in the cluster. In case a node is restarted and the MAC for that specific node changed, the ARP entry is updated on every node. What are the specific actions you are taking to hit this issue? Are you starting k8s with kubeadm and deploying flannel with the manifest modified with your CIDR? After all nodes are up, you encounter this issue with the MAC, right? |
We aren't aware of what exactly is causing the issue.
To some of the questions you asked:
|
You could check dmesg to see whether something is forcing the MAC change on the node. |
Your Environment
same issue after node restart
kubernetes annotations:
ARP table on other nodes; it should be
|
We eventually just added a cron job to restart flannel every 1 minute. |
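A restart-on-a-timer workaround like the one described can be expressed as a cron entry. This is only a sketch: it assumes flannel runs as a systemd unit named `flanneld`, which may not match your deployment (under Kubernetes you would instead delete the flannel pod so the DaemonSet recreates it):

```shell
# Hypothetical crontab entry: restart flannel every minute (workaround only,
# this masks the underlying ARP desync rather than fixing it)
* * * * * /usr/bin/systemctl restart flanneld
```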
As I said in my latest comment, could you please increase flannel's log verbosity? You should see a line with the MAC address added to etcd; if it changes, you should see that line multiple times. |
I changed the code (flannel/pkg/backend/vxlan/device.go, Lines 124 to 134 in c17e715) to:

```go
func (dev *vxlanDevice) Configure(ipa ip.IP4Net, flannelnet ip.IP4Net) error {
	if err := ip.EnsureV4AddressOnLink(ipa, flannelnet, dev.link); err != nil {
		return fmt.Errorf("failed to ensure address of interface %s: %s", dev.link.Attrs().Name, err)
	}
	log.Infof("before up info:%v", dev.link)
	if err := netlink.LinkSetUp(dev.link); err != nil {
		return fmt.Errorf("failed to set interface %s to UP state: %s", dev.link.Attrs().Name, err)
	}
	nLink, err := netlink.LinkByName(dev.link.LinkAttrs.Name)
	if err == nil {
		if vxlan, ok := nLink.(*netlink.Vxlan); ok {
			log.Infof("after up search vxlan name info:%v", vxlan)
		}
	}
	return nil
}
```

and I delete
It seems netlink doesn't use the MAC address.
|
You have a different issue from @amolmishra23 |
I created a new issue |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Incorrect MAC Address observed for flannel interface across cluster nodes.
Expected Behavior
When we perform ARP resolution for the IP of flannel from all nodes of the cluster, it is supposed to show the correct MAC address. That's not happening in this case; we observe different values for the MAC address of the same flannel interface across the nodes.
Current Behavior
The values in etcd are observed to be correct, FYI
Possible Solution
Periodically resync the ARP entries.
Steps to Reproduce (for bugs)
Nothing special was done to reproduce this; it was observed regularly in our environment.
Context
This is regularly causing communication issues between the pods in our environment.
As part of a preliminary investigation, when we checked the flannel logs in journalctl, we observed the following:
For the firewall rules on which it was failing, we tried to manually execute iptables-restore, and then also ran into the issue:
Your Environment