
Goroutine race breaks guarantees on write -> flush barrier #3

Closed
jacobsa opened this issue Mar 23, 2015 · 6 comments

jacobsa commented Mar 23, 2015

UPDATE, 2015-04-02: Make sure to see also issue #8. Much of this is incorrectly diagnosed, due to a kernel bug.

If I run the test FlushFSTest.Mmap_MunmapBeforeClose 1,000 times waiting for the first error, like so:

go test -c ./samples/flushfs  -o /tmp/foobar && (for i in {0..1000}; do /tmp/foobar --ogletest.run FlushFSTest.Mmap_MunmapBeforeClose --fuse.debug || exit 1; done) && echo OKAY

it almost always fails with something like the following:

flush_fs_test.go:627:
Expected: elements are: [paco]
Actual:   [taco], whose element 0 doesn't match

Running with --fuse.debug shows this is because the second write request (from the modification to the mapped memory) is received after the flush for the close(2), even though the modification happens before the close in the test sequence.

This is puzzling, given the code walk documented on WriteFile, which appears to prove that fuse writes out all dirty pages before sending a flush request:

  • (http://goo.gl/PheZjf) fuse_flush calls write_inode_now, which appears to start a writeback in the background (it talks about a "flusher thread").
  • (http://goo.gl/1IiepM) fuse_flush then calls fuse_sync_writes, which "[waits] for all pending writepages on the inode to finish".
  • (http://goo.gl/zzvxWv) Only then does fuse_flush finally send the flush request.

The comments there talk about pending writepages requests, but don't use the term "dirty". The commit that introduced this behavior, torvalds/linux@fe38d7d, is clearer: it talks directly about "dirty pages".

I modified the way requests are logged by server.go, and I believe the issue here is simply that the goroutines spawned by server.Serve are racing. That is, fuse is sending "write, write, flush", but we are processing them as "write, flush, write", because fuse sends the write requests without waiting for their responses before sending the flush.
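The race is easy to see in miniature. The sketch below (illustrative names, not the library's actual code) dispatches each incoming request in its own goroutine, as server.Serve does, and records the order in which the handlers finish. A flush with a fast handler can overtake an earlier write with a slow one:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// handleConcurrently dispatches each request in its own goroutine and
// records the order in which they finish being handled.
func handleConcurrently(requests []string) []string {
	var (
		mu      sync.Mutex
		handled []string
		wg      sync.WaitGroup
	)
	for _, req := range requests {
		wg.Add(1)
		go func(req string) {
			defer wg.Done()
			if req == "write-2" {
				// Simulate a slow write handler; in practice a
				// scheduling hiccup is enough to lose the order.
				time.Sleep(50 * time.Millisecond)
			}
			mu.Lock()
			handled = append(handled, req)
			mu.Unlock()
		}(req)
	}
	wg.Wait()
	return handled
}

func main() {
	// The kernel sends these in order, but flush overtakes write-2.
	fmt.Println(handleConcurrently([]string{"write-1", "write-2", "flush"}))
}
```

The kernel's ordering guarantee is about when requests are *sent*; once each request is handled on its own goroutine, that ordering is lost.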


jacobsa commented Mar 23, 2015

So what to do about this? Some options:

  • Do nothing. Don't make the "writes before flushes" guarantee to the file system implementer. This sucks, let's not do it.
  • Process all requests serially. This is nice and simple, but could be very slow.
  • Set up a barrier: wait for responses to all write requests for a given handle to complete before passing on a flush for that handle, potentially blocking the pipeline behind it. Document this publicly.
  • Change the API so we don't call blocking functions but rather just pass on requests, either as a channel or a read method or asynchronous function calls.
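The barrier option could be sketched like this (hypothetical names and structure, assuming requests are keyed by a numeric file handle): track in-flight writes per handle with a sync.WaitGroup, and have the flush path wait for them to drain before its handler runs.

```go
package main

import (
	"fmt"
	"sync"
)

// writeBarrier tracks in-flight write requests per file handle so that a
// flush for a handle can wait for its earlier writes to complete.
type writeBarrier struct {
	mu       sync.Mutex
	inFlight map[uint64]*sync.WaitGroup // keyed by file handle ID
}

func newWriteBarrier() *writeBarrier {
	return &writeBarrier{inFlight: make(map[uint64]*sync.WaitGroup)}
}

func (b *writeBarrier) wgFor(handle uint64) *sync.WaitGroup {
	b.mu.Lock()
	defer b.mu.Unlock()
	wg, ok := b.inFlight[handle]
	if !ok {
		wg = &sync.WaitGroup{}
		b.inFlight[handle] = wg
	}
	return wg
}

// beginWrite is called before spawning the goroutine for a write request.
func (b *writeBarrier) beginWrite(handle uint64) { b.wgFor(handle).Add(1) }

// endWrite is called once the write's response has been sent.
func (b *writeBarrier) endWrite(handle uint64) { b.wgFor(handle).Done() }

// waitForWrites blocks a flush until all earlier writes on the handle finish.
func (b *writeBarrier) waitForWrites(handle uint64) { b.wgFor(handle).Wait() }

func main() {
	b := newWriteBarrier()
	b.beginWrite(1)
	go func() {
		// ... handle the write and respond to the kernel ...
		b.endWrite(1)
	}()
	b.waitForWrites(1) // the flush for handle 1 proceeds only after the write
	fmt.Println("flush may proceed")
}
```

Note the cost: the flush blocks the dispatch loop (or at least its own goroutine) until the handle's writes drain, which is exactly the "potentially blocking the pipeline" caveat above.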


jacobsa commented Mar 23, 2015

This can be made very easy to reproduce by sticking a time.Sleep(time.Second) at the top of the bazilfuse.WriteRequest case in server.go. It makes FlushFSTest.Mmap_MunmapBeforeClose fail every time.

The same is not true of FlushFSTest.CloseReports_ReadWrite, where it appears that each write(2) call is blocking on the fuse write request to finish. This surprises me; I thought writes were supposed to happen asynchronously via the page cache.


jacobsa commented Mar 23, 2015

Closing the loop: I think perhaps the reason that we wait for write(2) is because we don't respond to Init with bazilfuse.InitWritebackCache, corresponding to FUSE_WRITEBACK_CACHE in fuse_kernel.h. This was apparently added in kernel 3.15, so I can't verify on my 3.13 machine.


jacobsa commented Mar 23, 2015

Current favored solution:

  • Kill the FileSystem interface.
  • Instead, have an Op interface with methods Context() context.Context and Respond(error).
  • Concrete implementations are structs with both request and response fields.
  • Take this opportunity to put them in a fuseops sub-package. For example, fuseops.ReadOp.
  • Have a Connection type that exposes a read method for a stream of Ops, directly reflecting /dev/fuse.
  • Move existing method and struct comments onto the op structs.
  • Mount function accepts serveConn func(*Connection).
  • Kill MountedFileSystem.WaitForReady; instead do that waiting in Mount (and document this).
  • Audit package and README comments, update where necessary.


jacobsa commented Mar 23, 2015

This is blocked by #4; we want tests that we can run many times to confirm we've fixed the issue.


jacobsa commented Mar 24, 2015

All done with 336525b.
