Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: InterfaceAddrs() "no such network interface" with rapidly changes interfaces #51934

Closed
uablrek opened this issue Mar 25, 2022 · 7 comments
Closed
Labels
help wanted NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@uablrek
Copy link

uablrek commented Mar 25, 2022

What version of Go are you using (go version)?

$ go version
go version go1.17.6 linux/amd64

The problem is in Kubernetes v1.23.3 which I think uses go1.17.6, but at least go1.17.x.

Does this issue reproduce with the latest release?

N/A. Have to wait for K8s.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
# (can't run this in K8s container)

System is openstack with;

Linux control-plane-n108-mast-n057 5.3.18-24.99-default #1 SMP Sun Jan 23 19:03:51 UTC 2022 (712a8e6) x86_64 x86_64 x86_64 GNU/Linux

What did you do?

In heavy stress tests on a very large K8s cluster (159 nodes) the update of addresses made by the internal k8s load-balancer (kube-proxy) sometimed (very infrequent) fails to read all addresses with net.InterfaceAddrs. The error is;

E0209 12:18:47.395530       1 proxier.go:1110] "Failed to list all node IPs from host" err="error listing LOCAL type addresses from host, error: Could not get addresses: route ip+net: no such network interface"

The stress test contantly reboot nodes and interfaces are created and removed all the time. At very infrequent times the error above i thrown.

Please see the K8s issue for more info, kubernetes/kubernetes#108065

What did you expect to see?

No interface related errors thrown by net.InterfaceAddrs, even if interfaces are created/removed all the time.

My interpretation is that the error is thrown if net.InterfaceAddrs firsts reads an interface name, then when to read addresses, the interface is gone.

A "no such network interface" must be handled (ignored) internally in net.InterfaceAddrs.

We as a users can of course add re-tries, but how many and for how long? Since interfaces are created/removed all the time we may never be able to read the addresses.

What did you see instead?

The error described above.

@uablrek
Copy link
Author

uablrek commented Mar 25, 2022

A work-around may be to read interfaces with net.Interfaces and then read addresses per interface with net.Interface.Addrs and create our own "safe InterfaceAddrs". But IMO we shouldn't have to do that.

@seankhliao seankhliao changed the title affected/package: net. net.InterfaceAddrs() throws "no such network interface" net: InterfaceAddrs() "no such network interface" with rapidly changes interfaces Mar 25, 2022
@seankhliao seankhliao added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 25, 2022
@seankhliao
Copy link
Member

cc @ianlancetaylor @neild

@aojea
Copy link
Contributor

aojea commented Dec 11, 2024

So the problem is kind of TOCTOU, writing my notes down

go/src/net/interface.go

Lines 129 to 130 in a9922d0

func InterfaceAddrs() ([]Addr, error) {
ifat, err := interfaceAddrTable(nil)

gets all Addresses from the RIB

func interfaceAddrTable(ifi *Interface) ([]Addr, error) {
tab, err := syscall.NetlinkRIB(syscall.RTM_GETADDR, syscall.AF_UNSPEC)

and later gets all the Links

ift, err = interfaceTable(0)

func interfaceTable(ifindex int) ([]Interface, error) {
tab, err := syscall.NetlinkRIB(syscall.RTM_GETLINK, syscall.AF_UNSPEC)

and finally it merges the information

ifat, err := addrTable(ift, ifi, msgs)

ifi, err = interfaceByIndex(ift, int(ifam.Index))
if err != nil {
return nil, err
}

go/src/net/interface.go

Lines 157 to 163 in a9922d0

func interfaceByIndex(ift []Interface, index int) (*Interface, error) {
for _, ifi := range ift {
if index == ifi.Index {
return &ifi, nil
}
}
return nil, errNoSuchInterface

So between syscall.NetlinkRIB(syscall.RTM_GETADDR, syscall.AF_UNSPEC) and syscall.NetlinkRIB(syscall.RTM_GETLINK, syscall.AF_UNSPEC) , if the information changes , an interface is deleted, there is an Address that no longer has an interface, the function fails.

We have 2 datasets, addresses that reference interfaces by index

struct ifaddrmsg {
unsigned char ifa family;
unsigned char ifa prefixlen;
unsigned char ifa flags;
unsigned char ifa scope;
int ifa index;
}

and interfaces

struct ifinfomsg {
unsigned char ifi family;
unsigned short ifi type;
int ifi index;
unsigned int ifi flags;
unsigned int ifi change;
};

If we list interfaces first and new interfaces are added with Addresses we have the same problem in the opposite direction, as the new interface will not be present in the list of new addresses.

Meanwhile the list of Addresses contains a subsets of the Interfaces list there is no problem

Some ideas:

  • I'm not knowledgeable enough on netlink to know if we can do some kind of atomic operations to obtain Addresses and Links at the same time.
  • If we list Interfaces first MAYBE adding Interface + Address is slower and the race window is negligible, but does not sounds a great solution
  • Recognize that this method is racy and absorb the error if this happens ONLY when we list ALL addresses

@uablrek
Copy link
Author

uablrek commented Dec 11, 2024

Recognize that this method is racy and absorb the error if this happens ONLY when we list ALL addresses

👍 Just ignore the error internally if the interface doesn't exist. For the user it's the same as if net.InterfaceAddrswas called a millisec later. I.e. the case must be handled anyway

@ianlancetaylor
Copy link
Member

It seems that in the len(ift) != 0 case, if interfaceByIndex returns errNoSuchInterface, we should ignore that error and keep processing messages.

@aojea
Copy link
Contributor

aojea commented Dec 11, 2024

Sent fix https://go-review.googlesource.com/c/go/+/635196

Tested with

package main

import (
        "fmt"
        "net"

        "github.com/vishvananda/netlink"
)

func main() {

        go func() {
                for {
                        err := linkAddDel("test-interface", &net.IPNet{IP: net.ParseIP("127.0.0.77"), Mask: net.CIDRMask(32, 32)})
                        if err != nil {
                                panic(err)
                        }
                }
        }()

        go func() {
                for {
                        addr, err := net.InterfaceAddrs()
                        if err != nil {
                                panic(err)
                        }
                        fmt.Println(addr)
                }
        }()

        select {}
}

func linkAddDel(name string, addr *net.IPNet) error {
        lnk := &netlink.Dummy{
                LinkAttrs: netlink.LinkAttrs{
                        Name: "name",
                },
        }

        err := netlink.LinkAdd(lnk)
        if err != nil {
                return err
        }

        ipConfig := &netlink.Addr{
                IPNet: addr,
        }

        if err = netlink.AddrAdd(lnk, ipConfig); err != nil {
                return err
        }
        err = netlink.LinkDel(lnk)
        if err != nil {
                return err
        }
        return nil
}

without the fix, after a few seconds it panics trying to list the interfaces, with the fix never panics

[127.0.0.1/8 172.28.113.171/32 192.168.8.1/24 192.168.9.1/24 ::1/128 fe80::1d90:8c30:9c15:211f/64 fc00:f853:ccd:e793::1/64 fe80::42:c7ff:fe8f:ab8e/64 fe80::1/64 fe80::42:b1ff:fed9:ef63/64 fe80::3c9f:e7ff:fe67:c8b8/64 fe80::8882:93ff:fe9c:b57/64 fe80::cc3f:b1ff:fe27:41c4/64 fe80::bc5b:f1ff:feab:c9e8/64]
panic: route ip+net: no such network interface

goroutine 21 [running]:
main.main.func2()
        /usr/local/google/home/aojea/src/tmp/interfaces/main.go:25 +0x67
created by main.main in goroutine 1
        /usr/local/google/home/aojea/src/tmp/interfaces/main.go:21 +0x26
exit status 2

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/635196 mentions this issue: net: avoid unnecessary interface lookup fetching all interface addresses

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Dec 11, 2024
@dmitshur dmitshur modified the milestones: Backlog, Go1.24 Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

6 participants