Running tests sometimes hangs at the FUSE level #97
Comments
I have no specific ideas or tips, but I will say (very generically, sorry) that with past issues like this I was able to get a lot of mileage out of digging around in …
gocryptfs developer here. https://github.com/hanwen/go-fuse had a similar problem. Does the hang reproduce every time when you run with …? The workaround in go-fuse is this: https://github.com/hanwen/go-fuse/blob/master/fuse/poll.go
I encountered this problem and this is my working theory/understanding of what is happening. Thanks @rfjakob for the pointer.

Background: Go File IO

Go does file IO using epoll on Linux (or the equivalent on non-Linux targets). Go opens files in non-blocking mode and adds them to the runtime poller. The poller periodically polls IO events from the kernel for the registered files, and reports any goroutine waiting on IO as ready to wake up to the scheduler.

The problem: FUSE + Test == Deadlock

The problem here happens because the FUSE server and the test run in the same process, and therefore share the same runtime poller. When the test makes a file IO call that causes a poll on a file in the filesystem backed by the FUSE server, the test goroutine sleeps until the file is ready. Notice that at this point the FUSE server is also sleeping, waiting for messages from the kernel. When the kernel receives the poll, it passes it to FUSE (kernel side), which forwards it to the FUSE server by writing it to the connection file. The kernel then waits for a reply on that file before returning the poll call to the poller. Unfortunately, the server never gets a chance to read the message from the kernel, because at this point the runtime poller is blocked waiting for the kernel to answer the original poll call. It simply can't wake up the FUSE server goroutine, which means the original poll call can never be answered back to the kernel. Deadlock.

Option 1: Don't poll from tests. (<--- This is what I'm doing)

The os package doesn't offer a straightforward way to do file IO in blocking mode (that I know of). It's possible to implement it yourself. For example:
It's messy, but it works.

Option 2: Run the FUSE server and the test in different processes.

Simple enough to do, but I find it less convenient for debugging.

Option 3: @rfjakob's hack

The workaround posted by @rfjakob consists of: upon initialization, create a special file in the backed filesystem, force a poll on that file, and immediately return ENOSYS. This tells FUSE (kernel side) that the backed filesystem doesn't support poll, so subsequent poll calls to the filesystem return immediately from the kernel, without reaching the FUSE server. Although this hack is very effective, I believe it is a bit of overkill. This deadlock, as I understand it, only happens when the poll (on the backed filesystem) originates from the same process as the FUSE server, which seems a very sketchy thing to do (except in a test).
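Option 2 can be sketched with the standard `os/exec` package. The server binary name below is a placeholder (the sketch launches `sleep` as a stand-in child so it runs without FUSE installed); the point is only the process lifecycle: the child owns its own runtime poller, so the test's file IO can never deadlock against the server's FUSE loop:

```go
package main

import (
	"fmt"
	"os/exec"
)

// startServer launches the filesystem server as a separate OS process,
// so its FUSE read loop runs under its own runtime poller and can
// always be scheduled, regardless of what the test process blocks on.
func startServer(bin string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(bin, args...)
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}

func main() {
	// "sleep" is a stand-in child; a real test would launch the
	// server binary, wait for the mountpoint to appear, run its
	// file IO against it, then unmount.
	srv, err := startServer("sleep", "10")
	if err != nil {
		panic(err)
	}
	fmt.Println("server running, pid", srv.Process.Pid)

	// ... file IO against the mountpoint would happen here ...

	if err := srv.Process.Kill(); err != nil {
		panic(err)
	}
	srv.Wait() // reap the child; returns an error since we killed it
}
```

As noted below, the trade-off is that a separate process cannot share in-memory state with the test, so whitebox assertions need some IPC channel instead.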
Thanks very much for the analysis! I think I would actually prefer option 2 (server and test in separate processes), as it most closely mirrors how jacobsa/fuse will be run in practice.
The downside of option 2 is that the tests won't be able to mock out parts of the filesystem, or do whitebox testing that examines the filesystem's state.
Update: I have since switched my project to run the file system under test in a separate process. The reason is that I could never really remove all sources of …
Steps to reproduce:

```
cd ~/go/src/github.com/jacobsa/fuse
go test -count=1 -v ./...
```

Sometimes, the above `go test` invocation hangs. `ps` shows:

(Sometimes it's the statfs test process that hangs, so it's not just memfs.)

Open file descriptors of the test process:

Adding extra debug logging doesn't seem to help, as `go test` doesn't show the output until the test process finishes.

To unstick your computer, use:

Possibly followed by `fusermount -u <mountpoint>` for all remaining FUSE mountpoints in /tmp.

This issue does not seem to be caused by a recent change, at least not within `github.com/jacobsa/fuse` (perhaps it's on the kernel side?). I can reproduce it with commit e7bcad2 from 2019, and also with the current version.

Any ideas/tips welcome.