os, internal/poll, runtime: how to use /dev/net/tun on Linux #30426

Closed
zx2c4 opened this issue Feb 27, 2019 · 17 comments

5 participants
@zx2c4
Contributor

commented Feb 27, 2019

Go 1.12 brought SyscallConn() for os.File. In theory that should let us OpenFile on /dev/net/tun, and then use SyscallConn() to do all of the TUN-specific ioctls for setting up the device, giving it a name, and setting some properties. From then on, it's supposed to be a matter of Read, Write, and Close. Since we don't need to call Fd() on the os.File at any point, we gain the benefits of using netpoll (which is epoll behind the scenes).

In addition to allowing the scheduler to make better decisions and not allocating an OS thread for every IO operation, netpoll also lets us call Read in one goroutine and Close in another, and the currently running Read will return immediately with an error saying that it's been closed. This is terrific for shutting down gracefully. To illustrate, here's something that does not work as a consequence of using Fd:

        fd, err := os.OpenFile("/dev/net/tun", os.O_RDWR, 0)
        if err != nil {
                log.Fatal(err)
        }
        
        var ifr [unix.IFNAMSIZ + 64]byte
        copy(ifr[:], []byte("cheese"))
        *(*uint16)(unsafe.Pointer(&ifr[unix.IFNAMSIZ])) = unix.IFF_TUN
        
        _, _, errno := unix.Syscall(
                unix.SYS_IOCTL,
                uintptr(fd.Fd()),
                uintptr(unix.TUNSETIFF),
                uintptr(unsafe.Pointer(&ifr[0])),
        )
        if errno != 0 {
                log.Fatal(errno)
        }

        wait := sync.WaitGroup{}
        wait.Add(1)
        go func() {
                var err error
                b := [2000]byte{}
                for {
                        // Assign to the outer err so the log line below reports it.
                        _, err = fd.Read(b[:])
                        if err != nil {
                                break
                        }
                }
                log.Print("Read errored: ", err)
                wait.Done()
        }()
        time.Sleep(time.Second * 3)
        log.Print("Closing")
        err = fd.Close()
        if err != nil {
                log.Print("Close errored: " , err)
        }
        wait.Wait()
        log.Print("Exiting")

The problem with the above code is that fd.Read(b[:]) never returns after fd.Close() executes, and so the program hangs forever. Thanks to SyscallConn in Go 1.12, we can fix that problem like this:

        fd, err := os.OpenFile("/dev/net/tun", os.O_RDWR, 0)
        if err != nil {
                log.Fatal(err)
        }
        
        var ifr [unix.IFNAMSIZ + 64]byte
        copy(ifr[:], []byte("cheese"))
        *(*uint16)(unsafe.Pointer(&ifr[unix.IFNAMSIZ])) = unix.IFF_TUN
        
        var errno syscall.Errno
        s, _ := fd.SyscallConn()
        s.Control(func(fd uintptr) {
                _, _, errno = unix.Syscall(
                        unix.SYS_IOCTL,
                        fd,
                        uintptr(unix.TUNSETIFF),
                        uintptr(unsafe.Pointer(&ifr[0])),
                )
        })
        if errno != 0 {
                log.Fatal(errno)
        }

        wait := sync.WaitGroup{}
        wait.Add(1)
        go func() {
                var err error
                b := [2000]byte{}
                for {
                        // Assign to the outer err so the log line below reports it.
                        _, err = fd.Read(b[:])
                        if err != nil {
                                break
                        }
                }
                log.Print("Read errored: ", err)
                wait.Done()
        }()
        time.Sleep(time.Second * 3)
        log.Print("Closing")
        err = fd.Close()
        if err != nil {
                log.Print("Close errored: " , err)
        }
        wait.Wait()
        log.Print("Exiting")

This works as expected with regard to that fd.Read(b[:]) getting cancelled. (In Go 1.11, I previously worked around this by manually polling on a cancellation pipe and the tun fd with some pretty gnarly ugliness. I've been eagerly awaiting the Go 1.12 release to stop having to play those games.)

There's a big problem, however: netpoll's use of epoll doesn't seem to agree with the Linux tun driver's tun_chr_poll. Consider the following program:

package main

import "log"
import "os"
import "unsafe"
import "time"
import "syscall"
import "os/exec"
import "sync"
import "golang.org/x/sys/unix"

func main() {
	fd, err := os.OpenFile("/dev/net/tun", os.O_RDWR, 0)
	if err != nil {
		log.Fatal(err)
	}

	var ifr [unix.IFNAMSIZ + 64]byte
	copy(ifr[:], []byte("cheese"))
	*(*uint16)(unsafe.Pointer(&ifr[unix.IFNAMSIZ])) = unix.IFF_TUN

	var errno syscall.Errno
	s, _ := fd.SyscallConn()
	s.Control(func(fd uintptr) {
		_, _, errno = unix.Syscall(
			unix.SYS_IOCTL,
			fd,
			uintptr(unix.TUNSETIFF),
			uintptr(unsafe.Pointer(&ifr[0])),
		)
	})
	if errno != 0 {
		log.Fatal(errno)
	}

	wait := sync.WaitGroup{}
	wait.Add(1)
	go func() {
		var err error
		c := exec.Command("sh", "-c", "ip link set up cheese && ip a a 192.168.9.2/24 dev cheese")
		c.Start()
		c.Wait()
		exec.Command("sh", "-c", "ping -c 4 -f 192.168.9.1; ip link set down cheese; ip a f dev cheese").Start()
		b := [2000]byte{}
		for {
			var n int
			// Assign to the outer err so the log line below reports it.
			n, err = fd.Read(b[:])
			if err != nil {
				break
			}
			log.Printf("Read %d bytes", n)
		}
		log.Print("Read errored: ", err)
		wait.Done()
	}()
	time.Sleep(time.Second * 15)
	log.Print("Closing")
	err = fd.Close()
	if err != nil {
		log.Print("Close errored: ", err)
	}
	wait.Wait()
	log.Print("Exiting")
}

This is supposed to work, but the call to Read actually blocks without returning any data, and only returns upon the call to Close. The above program can be "fixed" by adding fd.Fd() just above the go func() { line, in order to remove fd from netpoll. This, however, brings back the pre-SyscallConn-era problem of Close not being able to cancel a pending Read, and loses the other benefits of netpoll.

Anybody familiar with netpoll's particular use of epoll interested in taking a look under the hood?

zx2c4-bot pushed a commit to WireGuard/wireguard-go that referenced this issue Feb 27, 2019

tun: linux: netpoll is broken for tun's epoll
So this mostly reverts the switch to Sysconn for Linux.

Issue: golang/go#30426

@ianlancetaylor ianlancetaylor changed the title netpoll doesn't like linux's /dev/net/tun runtime: netpoll doesn't like linux's /dev/net/tun Feb 27, 2019

@ianlancetaylor

Contributor

commented Feb 27, 2019

The way that netpoll uses epoll is straightforward. It adds descriptors to the epoll descriptor using EPOLL_CTL_ADD with EPOLLIN | EPOLLOUT | EPOLLRDHUP | EPOLLET. It shouldn't be hard to try writing a C program to see how epoll behaves with /dev/net/tun.

@mikioh mikioh changed the title runtime: netpoll doesn't like linux's /dev/net/tun os: netpoll doesn't like linux's /dev/net/tun Feb 27, 2019

@mikioh mikioh added the OS-Linux label Feb 27, 2019

@mikioh

Contributor

commented Feb 27, 2019

Just to be sure, can you please confirm that:

  • "ip tuntap add $devname tun user $username" works well on your node under the test,
  • "ip link show" displays the tun interface configured by your program during the test.
@zx2c4

Contributor Author

commented Feb 27, 2019

Yes.

thinkpad ~ # ip tuntap add mode tun name cheese user zx2c4
thinkpad ~ # ip link show dev cheese
150: cheese: <POINTOPOINT,MULTICAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
    link/none 
@zx2c4

Contributor Author

commented Feb 27, 2019

Using the same flags that Go passes to epoll, I'm able to reproduce this in C. Here's the working blocking case as a baseline:

#include <sys/types.h>
#include <sys/epoll.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <linux/if_tun.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
	char buf[2000];
	ssize_t len;
	int tunfd, ret;
	struct ifreq ifreq = {
		.ifr_name = "cheese",
		.ifr_flags = IFF_TUN
	};
	
	tunfd = open("/dev/net/tun", O_RDWR);
	if (tunfd < 0) {
		perror("open(/dev/net/tun)");
		return 1;
	}
	ret = ioctl(tunfd, TUNSETIFF, &ifreq);
	if (ret < 0) {
		perror("ioctl(IFF_TUN)");
		return 1;
	}
	system("ip link set up cheese && ip a a 192.168.9.2/24 dev cheese");
	popen("ping -c 4 -f 192.168.9.1; ip link set down cheese; ip a f dev cheese", "r");
	while ((len = read(tunfd, buf, sizeof(buf))) >= 0)
		printf("Read %zd bytes\n", len);
	return 0;
}

Here's the broken epoll case:

#include <sys/types.h>
#include <sys/epoll.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <linux/if_tun.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
	char buf[2000];
	ssize_t len;
	int tunfd, efd, ret;
	struct ifreq ifreq = {
		.ifr_name = "cheese",
		.ifr_flags = IFF_TUN
	};
	struct epoll_event event = {
		.events = EPOLLIN | EPOLLOUT | EPOLLRDHUP | EPOLLET
	};
	
	tunfd = open("/dev/net/tun", O_RDWR);
	if (tunfd < 0) {
		perror("open(/dev/net/tun)");
		return 1;
	}
	ret = fcntl(tunfd, F_GETFL);
	if (ret < 0) {
		perror("F_GETFL");
		return 1;
	}
	ret = fcntl(tunfd, F_SETFL, ret | O_NONBLOCK);
	if (ret < 0) {
		perror("F_SETFL");
		return 1;
	}
	efd = epoll_create1(0);
	if (efd < 0) {
		perror("epoll_create1");
		return 1;
	}
	ret = epoll_ctl(efd, EPOLL_CTL_ADD, tunfd, &event);
	if (ret < 0) {
		perror("epoll_ctl");
		return 1;
	}
	ret = ioctl(tunfd, TUNSETIFF, &ifreq);
	if (ret < 0) {
		perror("ioctl(IFF_TUN)");
		return 1;
	}
	system("ip link set up cheese && ip a a 192.168.9.2/24 dev cheese");
	popen("ping -c 4 -f 192.168.9.1; ip link set down cheese; ip a f dev cheese", "r");
	for (;;) {
		len = read(tunfd, buf, sizeof(buf));
		if (len < 0 && errno == EAGAIN) {
			ret = epoll_wait(efd, &event, 1, -1);
			if (ret < 0) {
				perror("epoll_wait");
				return 1;
			}
			continue;
		}
		if (len < 0)
			break;
		printf("Read %zd bytes\n", len);
	}
	return 0;
}
@zx2c4

Contributor Author

commented Feb 27, 2019

Interestingly, it appears that removing EPOLLET fixes things. That's not surprising as level triggering is basically the same as ordinary poll.

@mikioh mikioh changed the title os: netpoll doesn't like linux's /dev/net/tun os, internal/poll: netpoll doesn't like linux's /dev/net/tun Feb 27, 2019

@mikioh

Contributor

commented Feb 27, 2019

> it appears that removing EPOLLET fixes things

Sounds like you need to find some good way to accommodate either a) blocking I/O w/ level-triggered notification, or b) non-blocking I/O w/ edge-triggered notification; the current runtime-integrated network poller is designed for just the latter.

Marking a tun/tap device file as non-blocking might in principle let it work with the current runtime-integrated network poller, but that seems unlikely: tun_ring_recv in drivers/net/tun.c always returns EAGAIN when the argument noblock is true.

A naive fix might be to make the epoll registration adaptive by referring to the target file capability for non-blocking I/O.

@mikioh mikioh changed the title os, internal/poll: netpoll doesn't like linux's /dev/net/tun os, internal/poll, runtime: netpoll doesn't like linux's /dev/net/tun Feb 27, 2019

@zx2c4

Contributor Author

commented Feb 27, 2019

> tun_ring_recv in drivers/net/tun.c always returns EAGAIN when the argument noblock is true.

Are we reading the same source? If noblock is true and a buffer is available, it returns the buffer with the error set to 0. Otherwise it returns EAGAIN:

static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
{       
        DECLARE_WAITQUEUE(wait, current);
        void *ptr = NULL;
        int error = 0;
        
        ptr = ptr_ring_consume(&tfile->tx_ring);
        if (ptr)
                goto out;
        if (noblock) {
                error = -EAGAIN;
                goto out;
        }
        
        //[...]

out:    
        *err = error;
        return ptr;
}
@ianlancetaylor

Contributor

commented Feb 27, 2019

If /dev/net/tun doesn't support EPOLLET, then I don't see a reasonable way to make it work with Go's poller.

@mikioh

Contributor

commented Feb 28, 2019

@bcmills bcmills added this to the Unplanned milestone Feb 28, 2019

@zx2c4

Contributor Author

commented Mar 4, 2019

@mikioh Is that workaround remotely safe to do? That is, if an fd is in netpoll, and then you manually twiddle it to be nonblocking, won't netpoll tweak out? Or just become really inefficient? Generally the epoll ET pattern is something like:

for (;;) {
    while ((ret = read(fd, ...)) >= 0)
        ...
    if (ret < 0 && errno == EAGAIN)
        epoll_wait(efd, ...);
}

If you put the fd into blocking mode, the reads will just block forever, so read will never return EAGAIN and epoll basically won't be used. This sounds like in theory it would make cancellation very difficult, since that read(fd) call just hangs there until a packet comes in. And if Go thinks it can epoll, it might not spawn a thread for the blocking call, which could then starve other goroutines.

Is this analysis correct? Or does Go somehow use epoll internally in a way that makes ET+blocking acceptable?

@zx2c4

Contributor Author

commented Mar 4, 2019

You had some comments the other day about this working, then not working, on the BSDs, but I can't find them now for some reason. What was the verdict of that? In my quick trials with code similar to OP, I was able to Close() the file from one goroutine and have the read canceled in the other. I thought this was a decent enough indication that things were working fine on the BSDs. From further inspecting what's going on, though, it looks like all the BSDs examine the file descriptor and then might actually wind up disabling polling under certain conditions. Are we hitting these conditions? But if that's the case, why does the cancellation appear to work?

@zx2c4

Contributor Author

commented Mar 4, 2019

> Is this analysis correct? Or does Go somehow use epoll internally in a way that makes ET+blocking acceptable?

It looks like your workaround code actually doesn't work at all. Extend that timeout from 3 seconds to 10 seconds, so that there's time for the broadcast packet stuff to stop happening. That way the file is actually closed during a period when there isn't new data. Then, you'll see the same hang that we had in Go 1.11, which I'm forced to solve with this monstrosity.

@mikioh

Contributor

commented Mar 6, 2019

[I deleted my previous comments mentioning BSDs because I was confused a bit, sorry.]

@zx2c4,

I skimmed the Linux kernel code a bit and realized that the byte sequence (or character) interface on the tun device doesn't support epoll: as your example code shows, the first epoll_pwait always returns EPOLLERR regardless of EPOLLET or blocking/non-blocking I/O; see tun_chr_poll in drivers/net/tun.c. I expected vfs_poll in fs/eventpoll.c to handle poll-capable files well, but tun_chr_poll returns EPOLLERR for non-NETREG_REGISTERED devices such as /dev/net/tun device files. Right now, I have no good idea for accommodating device files that are poll-capable but not epoll-capable.

So, a workaround would be to use your own poll for such device files: https://play.golang.org/p/D3B8KBeW10y

PS: On BSD variants, the tun or similar software interfaces are well integrated with kqueue, so that's the reason I was confused initially, sorry for the confusion.

@crvv

Contributor

commented Mar 7, 2019

tun works well with the runtime poller on my machine.
The code doesn't work because the fd is added to the poller before the ioctl.
It should use x/sys/unix.Open to open the file, then ioctl, SetNonblock and os.NewFile.

Please see #22939

@mikioh

Contributor

commented Mar 7, 2019

@crvv,

Oh, nice; that means that calling ioctl w/ IFF_XXX makes the device file NETREG_REGISTERED?

@zx2c4

Contributor Author

commented Mar 7, 2019

Nice observation. This seems to work correctly:

package main

import "log"
import "os"
import "unsafe"
import "time"
import "os/exec"
import "sync"
import "golang.org/x/sys/unix"

func main() {
        tunfd, err := unix.Open("/dev/net/tun", os.O_RDWR, 0)
        if err != nil {
                log.Fatal(err)
        }

        var ifr [unix.IFNAMSIZ + 64]byte
        copy(ifr[:], []byte("cheese"))
        *(*uint16)(unsafe.Pointer(&ifr[unix.IFNAMSIZ])) = unix.IFF_TUN
        _, _, errno := unix.Syscall(
                unix.SYS_IOCTL,
                uintptr(tunfd),
                uintptr(unix.TUNSETIFF),
                uintptr(unsafe.Pointer(&ifr[0])),
        )

        if errno != 0 {
                log.Fatal(errno)
        }
        unix.SetNonblock(tunfd, true)

        fd := os.NewFile(uintptr(tunfd), "/dev/net/tun")

        wait := sync.WaitGroup{}
        wait.Add(1)
        go func() {
                var err error
                c := exec.Command("sh", "-c", "ip link set up cheese && ip a a 192.168.9.2/24 dev cheese")
                c.Start()
                c.Wait()
                exec.Command("sh", "-c", "ping -c 4 -f 192.168.9.1; ip link set down cheese; ip a f dev cheese").Start()
                b := [2000]byte{}
                for {
                        var n int
                        n, err = fd.Read(b[:])
                        if err != nil {
                                break
                        }
                        log.Printf("Read %d bytes", n)
                }
                log.Print("Read errored: ", err)
                wait.Done()
        }()
        time.Sleep(time.Second * 15)
        log.Print("Closing")
        err = fd.Close()
        if err != nil {
                log.Print("Close errored: ", err)
        }
        wait.Wait()
        log.Print("Exiting")
}
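
The ordering that makes this work (open, TUNSETIFF, SetNonblock, and only then os.NewFile) could be wrapped in a small helper with error checks on every step. The sketch below is one possible shape, not the code from this thread: it uses the standard syscall package instead of golang.org/x/sys/unix so it has no external dependency, and the ifreq struct and the openTun function are names made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

// ifreq mirrors the part of the kernel's struct ifreq that TUNSETIFF reads:
// a 16-byte name followed by a union whose first member is the flags field.
type ifreq struct {
	name  [syscall.IFNAMSIZ]byte
	flags uint16
	_     [22]byte // pad to 40 bytes, the size of struct ifreq on 64-bit Linux
}

// openTun opens /dev/net/tun, attaches it to the named interface, and only
// then hands the descriptor to the os package, so that the runtime poller
// first sees the fd after the ioctl has been issued.
func openTun(name string) (*os.File, error) {
	if len(name) >= syscall.IFNAMSIZ {
		return nil, fmt.Errorf("interface name %q is too long", name)
	}

	fd, err := syscall.Open("/dev/net/tun", syscall.O_RDWR|syscall.O_CLOEXEC, 0)
	if err != nil {
		return nil, fmt.Errorf("open /dev/net/tun: %w", err)
	}

	var ifr ifreq
	copy(ifr.name[:], name)
	ifr.flags = syscall.IFF_TUN
	_, _, errno := syscall.Syscall(
		syscall.SYS_IOCTL,
		uintptr(fd),
		uintptr(syscall.TUNSETIFF),
		uintptr(unsafe.Pointer(&ifr)),
	)
	if errno != 0 {
		syscall.Close(fd)
		return nil, fmt.Errorf("ioctl TUNSETIFF: %w", errno)
	}

	// os.NewFile only tries to use the poller for descriptors that are
	// already in non-blocking mode, so set that before wrapping the fd.
	if err := syscall.SetNonblock(fd, true); err != nil {
		syscall.Close(fd)
		return nil, fmt.Errorf("set non-blocking: %w", err)
	}

	return os.NewFile(uintptr(fd), "/dev/net/tun"), nil
}

func main() {
	f, err := openTun("cheese")
	if err != nil {
		fmt.Println("openTun failed:", err)
		return
	}
	defer f.Close()
	fmt.Println("opened", f.Name())
}
```

Running it needs the same privileges as the programs above; without them the helper returns the underlying error instead of calling log.Fatal, which leaves the decision to the caller.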

@mikioh mikioh changed the title os, internal/poll, runtime: netpoll doesn't like linux's /dev/net/tun os, internal/poll, runtime: how to use /dev/net/tun on Linux Mar 7, 2019

@mikioh

Contributor

commented Mar 7, 2019

Closing, thanks @crvv for the valuable information.

@mikioh mikioh closed this Mar 7, 2019
