gofer: add pluggable provider interface for custom filesystem backends #12950
shayonj wants to merge 1 commit into
Conversation
Just to get this on your radar @ayushr2 and @EtiennePerot. This depends on #12923, hence the added commit from there. The more relevant piece of the proposal is this commit - f0bb46f. Happy to also start a design discussion in GitHub issues if that seems more fitting. No rush, and appreciate your time.
shayonj force-pushed from f0bb46f to c56e256
shayonj force-pushed from a61ff57 to 6eed079
Building a custom gofer (e.g. for network-backed storage, encrypted filesystems, or tiered caches) currently requires forking the runsc binary and copying/maintaining unexported setup and seccomp code. This adds a `Provider` interface that lets custom filesystem backends register with the stock gofer and serve LisaFS connections for specific mounts without forking.

The interface follows the `socket.Provider` pattern: `NewServer` returns `(nil, nil)` to decline a mount, and the first registered provider that returns a non-nil server handles it. `NewServer` receives the sandbox's OCI runtime spec so providers can read per-mount configuration from `spec.Annotations` (endpoints, volume keys, auth modes) without a side-channel. Stock fsgofer remains the default when no provider claims a mount. `SeccompRules` lets providers declare additional syscalls, merged via `filter.Rules()` before installation.

Zero behavior change when no providers are registered: the stock fsgofer path runs unchanged, identical to today. This follows the same pattern as the network plugin (`config.NetworkPlugin`): inactive when not configured, no impact on the default path.

New package `runsc/gofer/provider` defines the `Provider` interface and registration. The gofer command (`runsc/cmd/gofer.go`) iterates registered providers for each mount before falling through to fsgofer. The seccomp filter (`runsc/fsgofer/filter`) gains `InstallWithExtra` for merging provider rules with the stock allowlist.

Also adds documentation in `g3doc/user_guide/filesystem.md` and `pkg/lisafs/README.md` describing how to use the provider interface.

Depends on google#12923 (`GoferMountConf` in `specutils`).
shayonj force-pushed from 6eed079 to 0d024ed
Just a gentle bump, in case there is any feedback for me? 😊
Hmm, IMO we should move all this into runsc/fsgofer, since this is very specific to filesystem gofers (as opposed to network gofers).
fsgofer and gofer as sibling packages seems a bit odd. You can have it under runsc/fsgofer/gofer_impl.go to represent other gofer implementations. "Provider" probably doesn't fit this too well, since it is not "providing" the gofer; the custom gofers are expected to register themselves.
We have a similar concept in our shim, where we allow shim extensions: pkg/shim/v1/extension/extension.go. Maybe you can call it GoferRegistry? Up to you.
Yeah, def fair points. I had runsc/gofer/provider/ as a sibling because I was reading runsc/fsgofer/ as one specific filesystem-gofer implementation rather than the umbrella for filesystem gofers, but that distinction is pretty thin and runsc/fsgofer/extension/ is the right spot.
On the name, agreed, Provider does not quite fit because the registrant is the thing providing. I think my first iteration was Extension, based on the shim extension itself, but I saw SocketProvider and got carried away. Happy to keep the Extension pattern going.
```go
if cfg.mountPath != "/dev" {
	for _, p := range provider.Registered() {
		var err error
		srv, err = p.NewServer(spec, cfg.mount, cfg.mountPath, cfg.mountConf, cfg.readonly)
```
Seems like an anti pattern that we create a new server for each connection. The model we try to have is that we create one gofer server which can handle multiple connections. The server embeds lisafs.Server. That struct does a bunch of work, including providing concurrency guarantees across RPCs. See usages of the renameMu and also the entire filesystem tree it maintains. Each node has an opsMu which synchronizes RPCs on the same file.
I think what we should do is only 1 gofer server. And allow lisafs connections to define custom behavior. I think what you want to do is hijack the fsgofer Server Mount implementation:
gvisor/runsc/fsgofer/lisafs.go, lines 116 to 117 at commit 7b19947
Notice that you have the Connection struct there. You also have the full mountPath there. If that is not sufficient to make the decision to pivot into your custom gofer, you can add whatever optional fields in lisafs.Connection to help you make that decision at Mount time. This RPC is called only once per connection.
In that method, you return your custom implementation of lisafs.ControlFDImpl. And that should be all. Every RPC on that connection will then be routed to your lisafs.ControlFDImpl implementation.
Thank you for the detailed read.
Seems like an anti pattern that we create a new server for each connection. The model we try to have is that we create one gofer server which can handle multiple connections…
Yeah, some of this is influenced by that long-lived sentry hardened design :D. The shape in this PR comes from our custom gofer, which implements lisafs.ServerImpl directly because it needs values for SupportedMessages and ServerOpts that diverge from fsgofer's, along with its own per-implementation lifecycle for things like a background uploader and lease heartbeat. For example, it advertises Flush so the sentry sends us the Flush RPC on close(2), which is how I learn about writes that bypassed the gofer through the donated host FD, and it runs with ServerOpts{} because the custom gofer's storage model cannot honor OpenOnDeleted or SetAttrOnDeleted. Those values are configured per-ServerImpl, server-wide, today, but the wire protocol already encodes them per-mount in MountResp.SupportedMs and MountResp.MaxMessageSize. That mismatch is what made multi-Server look like the cheap fix to me when I shaped this PR, since each backend brings its own lisafs.Server and sets those values independently, which is how our fork runs today.
Looking at it again with your framing, there are costs I underweighted for sure. Each registered backend ends up as its own lisafs.Server lifecycle where the gofer command has to fan out Wait and Destroy, and each backend gets its own connWg and root Node tree on top of the points you mentioned. As I am familiarizing myself more with the codebase, I am also realizing this design does not help augment-style backends (the future example posts and documentation on how to bring in a custom backend), because they would have to bring their own server and lose the controlFDLisa infrastructure they would otherwise inherit.
Your point about allowing lisafs connections to define custom behavior, and about adding optional fields in lisafs.Connection, reads to me as opening the door to making those values per-Connection, which would make the single-Server shape work for both augment-style and replace-style backends, IIUC?
If that direction sounds right to you, I think the cleanest path is two PRs where this one gets repurposed for the second. The first would be a pkg/lisafs/ change that moves SupportedMessages, MaxMessageSize, and ServerOpts from per-ServerImpl to per-Connection, with 0 behavior change for fsgofer because it would set per-connection what it sets per-server today. I think that should be ok because the wire format already encodes these per-mount and the per-ServerImpl methods are the inconsistency (?)
Once that lands, I would rebase this PR on top with the reworked extension mechanism, where there is one lisafs.Server and custom backends register an Extension that hijacks LisafsServer.Mount and returns a ControlFDImpl plus the per-connection values. Then any custom augment-style and replace-style extensions/backends plug in the same way, and the relocation and rename from your other comment land here naturally.
What do you think of the sequencing, and is there anything on your end I should be considering before I rework this? Especially the parts around moving SupportedMessages and the rest to per-Connection?
Thanks for helping brainstorm on this too!
Yes, I totally agree with your plan of action. It makes sense. Yeah, the SupportedMessages and MaxMessageSize are already set up on a per-connection basis. Decoupling that from the server makes sense. Looking forward!
BTW thanks a lot for upstreaming your work! This is amazing!
The lisafs wire protocol already encodes the supported message list and the max message size per-mount on MountResp, but the server-side implementation reads both off a server-wide ServerImpl and reads ServerOpts off Server, so a custom implementation that needs to advertise different MIDs or run with different "OnDeleted" semantics has to bring its own lisafs.Server. That multi-Server shape carries its own renameMu, filesystem tree, and per-node opsMu instead of sharing them across mounts.

This moves Mount, SupportedMessages, MaxMessageSize, and the former ServerOpts behind a per-Connection ConnectionImpl. Server keeps the shared renameMu, filesystem tree, connection lifecycle, and an opaque implementation value for existing fsgofer config access, while each Connection reads its supported MIDs from its ConnectionImpl and caches hot-path configuration such as max message size and ConnectionOpts at startup. Stock fsgofer behavior is unchanged because LisafsServer remains the server-wide config carrier and hands out a stock connection implementation.

This is a precursor to a follow-up rework of #12950 that drops the multi-Server fan-out and lets LisafsServer choose a per-mount ConnectionImpl that returns the backend ControlFDImpl, all inside the single shared lisafs.Server.

FUTURE_COPYBARA_INTEGRATE_REVIEW=#13158 from shayonj:s/lisafs-per-connection-opts ead4588
PiperOrigin-RevId: 915155303
Building a custom gofer (e.g. for network-backed storage, encrypted filesystems, or tiered caches) currently requires forking the runsc binary and copying/maintaining unexported setup and seccomp code. This adds a `Provider` interface that lets custom filesystem backends register with the stock gofer and serve LisaFS connections for specific mounts without forking.

This PR establishes the extension point itself. Example custom gofers (a minimal in-memory "hello", etc.) and a fuller custom-gofer guide can follow up as separate PRs that depend on this one.

The interface follows the `socket.Provider` pattern, where `NewServer` returns `(nil, nil)` to decline a mount and the first registered provider that returns a non-nil server handles it. `NewServer` receives the sandbox's OCI runtime spec and the specific `*specs.Mount` being served, so providers can read sandbox-wide configuration from `spec.Annotations` (endpoints, auth modes) and per-mount configuration from the mount itself (`Source`, `Type`, `Options`) without a side-channel. Stock `fsgofer` remains the default when no provider claims a mount. Also, `SeccompRules` lets providers declare additional syscalls, merged via `filter.Rules()` before installation.

There are no behavior changes when no providers are registered, which keeps the stock `fsgofer` path unchanged. This follows the same pattern as the network plugin (`config.NetworkPlugin`).

Also adds documentation in `g3doc/user_guide/filesystem.md` and `pkg/lisafs/README.md` describing how to use the provider interface.