lisafs: include file size in FlushReq#12977
Conversation
When host FDs are donated to the sentry, writes bypass the gofer via direct pwritev2. Add the sentry-tracked file size to FlushReq so custom gofer implementations can detect file growth without an extra fstat. The stock gofer does not advertise Flush support, so this field is never sent in the default configuration.
| type FlushReq struct { | ||
| FD FDID | ||
| FD FDID | ||
| Size uint64 // Current file size as tracked by the sentry. | ||
| } |
There was a problem hiding this comment.
We can't amend RPC messages for backwards compatibility. runsc-gofer and runsc are released together so it would not be a problem here. But internally we allow gofer releases and sentry releases to diverge.
There was a problem hiding this comment.
What's the alternative here then?
There was a problem hiding this comment.
Thanks @ayushr2!
Does that mean we leave FlushReq and MID=20 untouched and introduce a new FlushWithSize MID = 34 with FlushWithSizeReq{FD, Size}, so the size field rides on a new MID rather than mutating an existing one?
I noticed similar pattern with RenameAt to RenameAt2.
I can look into a commit accordingly and wire up the rest of bit as well.
There was a problem hiding this comment.
I don't think we should add FlushWithSize.
Flush RPC is meant for flushing cached buffers into the underlying storage. We should not pass arbitrary file metadata in the FlushRPC request. We are spilling gofer implementation details into the lisafs protocol (just so it is convenient for this particular custom gofer to implement Flush).
If the custom gofer has donated an FD, why can't it fstat(2) that file's FD to fetch the file size while handling Flush? It should be fairly cheap. I don't think it warrants adding an entirely new RPC for this.
The gofer's only close-time signal is the Flush RPC, which currently carries only the FDID.
Note that Flush is called when an application FD referencing that file is closed via close(2). However, it does not mean that the FDID is being released. There could be other application FDs to the same dentry. Closing each would trigger a Flush.
We have a Close RPC which is called when the sentry is releasing the lisafs FD resource and closing the FDID.
Without the size hint, these gofers must poll with fstat to detect file growth, typically on a timer
Hmm I don't think they need to poll fstat(2) on a timer. You can just have them call fstat(2) on Flush, which is called per application FD close. A host fstat(2) is pretty cheap. Do we have reason to believe that this would be prohibitively expensive?
There was a problem hiding this comment.
Thanks, both points land and fair. Context on where this came from.
The custom gofer that prompted this uses donated host FDs as scratch space on a size-capped tmpfs and progressively streams dirty chunks to a remote file service. I believe since writeFromBlocksAt prefers the donated FD, the Write RPC almost never fires in production, so the gofer has no direct view of the bytes the sentry writes. I hook Flush, fstat the backing, diff against a streamedSize watermark, mark the grown range dirty, and wake an uploader thing basically.
The idea behind FlushWithSize was to drop the per-close fstat on the hot path when the sentry already had the number. But I agree, it's a very specific requirement here for a protocol addition. The fstat is cheap and I did notice some odd perf spikes so this was an exploration I did.
I think I can definitely find a few other ways to get better visibility and solve for contention too
When host FDs are donated to the sentry, subsequent writes go directly to the donated FD via
pwritev2, bypassing the gofer. The gofer's only close-time signal is theFlushRPC, which currently carries only theFDID. Custom gofer implementations that donate host FDs have no way to learn the resulting file size without an extrafstaton the backing FD.This adds a
Sizefield toFlushReq, populated frominode.sizeat flush time. The sentry already tracks file size through its write path (regular_file.goupdatesinode.sizeon everyPWritethat extends the file), so the value is available at zero cost.This is useful for custom gofers that use donated FDs as scratch space while maintaining their own metadata (e.g. a network-backed gofer that streams dirty regions to a remote store, or an encrypted-at-rest gofer that needs to know the plaintext extent). Without the size hint, these gofers must poll with
fstatto detect file growth, typically on a timer, which adds latency and syscall overhead proportional to the number of open files. With the size hint, the gofer learns the final file size immediately on close without any additional syscall. This is the same principle as pNFSLAYOUTCOMMIT(RFC 5661 §18.42) and CephFS capability flush, where clients that write through a path the metadata server cannot observe report the resulting file size on commit.The stock gofer does not advertise
Flushsupport (SupportedMessagesexcludes it), so this field is never populated or sent in the default configuration. No behavior change for existing users. For special files (pipes, sockets) the size is passed as 0 since it has no meaningful value.This becomes more useful alongside #12950 which adds a
Providerinterface for plugging custom gofer backends into the stock gofer binary.