os, runtime: Go 1.9 assumes epoll is non-blocking #21014
Comments
Heschi suggested a simpler approach: call LockOSThread in the FUSE read loop. This would keep epoll and FUSE from interfering with each other.
I assume that the problem occurs in the call to the While it seems clearly more likely when If I understand the FUSE code correctly, which I probably don't, the FUSE code is running a goroutine that sits in
If I understand the problem correctly, then I don't see how
Your analysis of the FUSE code is correct. Part of the time it sits in a Read waiting for the kernel; the rest of the time it tries to write back to the kernel. I haven't tried the suggestion yet, going out for dinner now :-)
I think this would work if the FUSE daemon used I haven't yet been able to think of a workable fix in the Go runtime. If there is a way for the os package to detect that a file is on a FUSE filesystem, then we could avoid using the poller for that file. But I don't know of a way to do that.
Let me see if I can make RawSyscall work. Since this affects tests primarily, that's probably acceptable.
FWIW, on Linux you can find out whether a file is on FUSE by calling fstatfs on the fd and checking the f_type field of the result. It does sound like a lot of overhead, since most files aren't on FUSE, and some care is needed: fstatfs is also forwarded to FUSE, so you cannot call it from the same place you call epoll.
We could call
I fixed it by kludging in something that triggers the POLL opcode before the Go runtime has the chance to. Other FUSE libraries, like the one from @rsc and bazil.org, should likely implement something similar.
See extensive discussion on hanwen/go-fuse#165
Symptoms:
After discussion with Heschi and Austin, we came up with the following explanation:
The poller was implemented on the assumption that epoll never blocks (and it is wired specially into the runtime?). Hence, when it runs on a P, a blocking epoll call prevents other goroutines on the same P from running. If one of those goroutines is either the read syscall that receives the POLL opcode or the write syscall that returns ENOSYS, then the POLL opcode never gets processed, and the runtime deadlocks itself.
This is only a problem if the same program both issues epoll calls and responds to them, in other words, tests for FUSE filesystems, so this is probably not a showstopper for the Go 1.9 release.
We could kludge around this by having go-fuse issue its own epoll call (as an ordinary blocking syscall); this would trigger a FUSE POLL opcode, which we could answer with ENOSYS, preventing further poll problems. (Does epoll work for directory file descriptors?)
Other approaches:
Change the Go runtime to assume that epoll can block. Then the syscalls responding to POLL could be moved onto other Ps. This may be too invasive a change to justify for this use case, though?
Convince the FUSE development team to provide a capability flag for POLL. Then we could prevent POLL calls from happening altogether. This is much cleaner, but the Linux kernel version is outside of our control in many circumstances.