cmd/go: add GOEXPERIMENT=cacheprog to let a child process implement the internal action/output cache #59719
Comments
A related approved proposal is #26232 for adding a similar helper process mechanism for |
See previously #42785. |
Right, thanks! I knew I'd seen something similar before. Unlike that proposal, this one involves no network requests or protocol buffers or auth to figure out or changing Go caching or build semantics. Just JSON over stdin/stdout. |
Via setting GOCACHEPROG to a binary which speaks JSON over stdin/stdout. Updates golang#59719 Signed-off-by: Brad Fitzpatrick <bradfitz@golang.org> Change-Id: I824ff04d5ebdf0ba4d1b5bc2e9fbaee26d34c80f
Change https://go.dev/cl/486715 mentions this issue: |
JSON over stdin is probably the easiest plug-in option. The other one worth ruling out is running a WASI interpreter. WASI has the advantage that you can sandbox it and then run plug-ins you only mostly trust. |
The main issue I foresee with JSON is that it is fairly inefficient for binary blobs, unless the idea is to use JSON to transmit filenames from the regular build cache..? |
@bcmills, see the implementation for details, but in summary: for Gets, only the path on disk is returned. From child process to cmd/go it's all JSON objects with no binary data. For Puts (from cmd/go to the child), the base64 binary is streamed to the client process after the JSON object metadata. So it's technically a bunch of JSON values (some objects, some strings) but can be implemented efficiently without slurping it all into memory. |
@carlmjohnson, if you want to run WASI or Node or C# or a JVM you can pass a path to a program doing that. We're not going to bundle a WASI runtime into cmd/go. :) |
And I've posted example child process code at https://github.com/bradfitz/go-tool-cache. |
I would appreciate if we could get a doc in this proposal the specification of the go cache program:
It would help us separate between implementation and specification. An alternative implementation I have in mind is to leverage https://github.com/bazelbuild/remote-apis/ which many build tools have started to adopt (Bazel, Buck2, Pants2, recc etc...). For example: separation between fetching Action and fetching Object might be ideal. |
Hello, @bradfitz. I am attempting to manually implement the Go cache protocol and have noticed that all implementations require similar concepts and logic surrounding Would it be possible to conceal these concepts within Go's built-in disk cache so that the implementation of cacher can be simplified as follows? Another benefit is that we don't have to transfer content in base64 JSON format through stdin. For example:
// Request is the JSON-encoded message that's sent from cmd/go to
// the GOCACHEPROG child process over stdin. Each JSON object is on its
// own line. A Request of Type "put" with BodySize > 0 will be followed
// by a line containing a base64-encoded JSON string literal of the body.
type Request struct {
// ID is a unique number per process across all requests.
// It must be echoed in the Response from the child.
ID int64
// Command is the type of request.
// The cmd/go tool will only send commands that were declared
// as supported by the child.
Command Cmd
// ObjectID is set for Type "put" and "output-file".
OutputID []byte `json:",omitempty"` // or nil if not used
DiskPath string `json:",omitempty"`
}
// Response is the JSON response from the child process to cmd/go.
//
// With the exception of the first protocol message that the child writes to its
// stdout with ID==0 and KnownCommands populated, these are only sent in
// response to a Request from cmd/go.
//
// Responses can be sent in any order. The ID must match the request they're
// replying to.
type Response struct {
ID int64 // that corresponds to Request; they can be answered out of order
Err string `json:",omitempty"` // if non-empty, the error
// KnownCommands is included in the first message that cache helper program
// writes to stdout on startup (with ID==0). It includes the
// Request.Command types that are supported by the program.
//
// This lets us extend the protocol gracefully over time (adding "get2", etc), or
// fail gracefully when needed. It also lets us verify the program
// wants to be a cache helper.
KnownCommands []Cmd `json:",omitempty"`
// For Get requests.
OutputID []byte `json:",omitempty"`
// DiskPath is the absolute path on disk of the ObjectID corresponding to
// a "get" request's ActionID (on cache hit) or a "put" request's
// provided ObjectID.
DiskPath string `json:",omitempty"`
}
All a cacher would then need to do is handle requests and implement the following API:
// Load the cached output for outputID and write it to diskPath.
func Get(ctx context.Context, outputID string, diskPath string) (err error)
// Read the content at diskPath and store it under outputID.
func Put(ctx context.Context, diskPath string, outputID string) (err error)
@Xuanwo, I don't want cmd/go to pick the path on disk, though. I want cacher programs to be able to return paths to FUSE filesystems back to cmd/go if they'd like. |
I've updated the top comment with the summary of the proposed protocol from the code review. @rsc, what's the process for getting at least a GOEXPERIMENT, so I can get more testing on this over a release cycle without having to carry a big patch for a long time and without getting Hyrum-locked into a particular API if there's a big mistake we didn't notice? Ideally I'd love to get this into Go 1.21 (behind a compile-time GOEXPERIMENT), test it during the next 9 months, and maybe get it into Go 1.22 on by default. |
No change in consensus, so accepted. 🎉 |
This is very exciting, thanks! Sorry for commenting on a closed issue, but: I set out to try it with a flow like:
Based on how Do I understand correctly that as-is this requires something like:
If that's the case, that's fine, but since it would require maintaining a properly-built toolchain it will probably limit how much I'm able to play with it when 1.21 comes out. |
@danp I think you probably need to be setting |
👍 just wanted to make sure that is the intent for this particular experiment. |
Yup. |
Does this apply to Go module cache as well? |
No, the module cache remains separate. (It wasn't cleaned or populated in the same way as the build cache to begin with.) |
Change https://go.dev/cl/556997 mentions this issue: |
This sounds great, I wonder if a similar proposal can be made for the go module cache? Specifically, I think there are some ways go currently handles the module cache that makes it impossible for tools like Bazel to cache those. |
The cmd/go tool has great caching support. Unfortunately, its caching only supports filesystem-based caching.
I'd like to do things like hook into GitHub's native caching system at a lower level (instead of the inefficient thing people do now: untarring/tarring GOCACHE archives on every CI run, which is often slower than the CI action itself) and support things like a P2P cache gossip protocol between [trusted] coworkers within a company.
Clearly both those examples aren't realistic to add to cmd/go itself. So instead:
I propose that cmd/go support a GOCACHEPROG=/path/to/program environment variable (akin to GOCACHE=/path/to/dir) where the GOCACHEPROG is run as a child process and cmd/go speaks to it over stdin/stdout, translating the Go tool's internal cache interface, and then the GOCACHEPROG can do whatever caching mechanism/policy it wants.
I talked to @rsc about this once and he didn't seem opposed so I went off and implemented it and it's looking like it's going to be pretty awesome. (demo programs)
Thoughts, objections, etc?
(And preemptively: I have a soft spot for FUSE but FUSE is not an answer; it doesn't work in enough environments like CI test runner environments and it's finicky on basically all platforms but Linux, but also on Linux)
The protocol (from the code linked above) is currently: