Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: cmd/go: allow -toolexec tools to opt in to build caching #41145

Closed
mvdan opened this issue Aug 31, 2020 · 11 comments
Closed

proposal: cmd/go: allow -toolexec tools to opt in to build caching #41145

mvdan opened this issue Aug 31, 2020 · 11 comments

Comments

@mvdan
Copy link
Member

@mvdan mvdan commented Aug 31, 2020

From go help build:

        -toolexec 'cmd args'
                a program to use to invoke toolchain programs like vet and asm.
                For example, instead of running asm, the go command will run
                'cmd args /path/to/asm <arguments for asm>'.

-toolexec is a very powerful flag to control or modify how Go's toolchain programs get run. The classic use case is for development of Go itself; for example, using a program to measure how much CPU and wall time each compiler and linker invocation takes, or to insert a debugger in a particular toolchain program execution.

Issue #27628 already covers an existing problem: right now, the use of -toolexec does not invalidate the build cache, which is a problem in both of the examples above. We agree that that bug should be fixed, but consider it to be a separate issue.

However, we argue that it would be reasonable for a toolexec program's output to be cached when the program is deterministic. The use case we have in mind is tools which alter a Go build predictably; for example, a code obfuscator which modifies the arguments passed to the compiler and linker in a deterministic and reproducible way. Without caching, such a tool is extremely slow, especially for big projects with many packages.

The way go build handles caching properly for go tool compile seems to be to first ask it compile -V=full, which shows the compiler's full version, and then use that as part of the cache key. This means that the "package build" operations will need to be run again if the compiler version changes or if the input changes, but not otherwise.

We propose adding similar functionality when using -toolexec. For example, when running go build -toolexec=mytool, the Go tool would run the tool with a well defined version flag like mytool -toolversion, similar to the compiler's -V=full. If it returns a non-zero exit code, we do no caching at all, which would align with the non-deterministic use cases in #27628. If the command succeeds, its output is added to the cache key, and proper build caching takes place.

We think this is the best of both worlds, because current non-deterministic toolexec programs would continue to use no caching at all, while other Go-specific complex tools could take advantage of the build cache.

We should also define the bare minimum of information for a -toolversion flag to report. go tool compile -V=full seems to include the toolchain's version as well as the compiler's package build ID:

$ go tool compile -V=full
compile version devel +54e18f1c2a Fri Aug 28 20:01:41 2020 +0000 buildID=DU3bt_lJqYe1kVhgbnOf/1cyl58sUXfIZ9K-RctHM/5cl10_pQMFsyS4FJFGwM/fqivXvY1n7nVWhRpqqqP

We realise that this proposal is somewhat far-fetched, and the easy answer might be "-toolexec was never meant for this kind of use". However, we argue that it opens up a very exciting world of possibilities in Go tooling, especially with tools that closely integrate with the toolchain itself to alter builds.

#29430 is a related proposal, but it also has a very different objective. It proposes an entirely new interface for code rewriting tools, which is yet to be defined. We simply propose to improve the existing -toolexec interface in a very specific and backwards-compatible way. We also think both features could coexist; not all build-altering tools will simply need to alter source code - for example, the code obfuscator shown above also enforces some linker flags like -w -s.

/cc @bcmills @cherrymui @ianlancetaylor @josharian @jayconrod @ianthehat

@mvdan mvdan added this to the Proposal milestone Aug 31, 2020
@bcmills
Copy link
Member

@bcmills bcmills commented Aug 31, 2020

The problem with general caching for toolexec is that we can't tell whether the tool is idempotent (such as a transformer or annotator or some sort of idempotent analyzer), or intended specifically to measure repeated invocations (such as a profiler or tracer), or even consistent from run to run (such as an interactive debugger with user-supplied edits to arbitrary memory locations).

Tool versioning aside, I think we would also need some way to determine whether the output is actually intended to be cached.

@mvdan
Copy link
Member Author

@mvdan mvdan commented Aug 31, 2020

we would also need some way to determine whether the output is actually intended to be cached.

That's what the existence of the flag is meant to convey. We could call it -gocachekey if you think -toolversion is not specific enough.

@ianlancetaylor ianlancetaylor changed the title Proposal: cmd/go: allow -toolexec tools to opt in to build caching proposal: cmd/go: allow -toolexec tools to opt in to build caching Aug 31, 2020
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Aug 31, 2020
@rsc
Copy link
Contributor

@rsc rsc commented Sep 2, 2020

Consider go build -toolexec myexec.
It sounds like you are saying to run myexec -V=full.
Why are we not running myexec compile -V=full?
Why does myexec matter at all?
What matters is what happens in the tools it runs.
Using "myexec compiler -V=full" would work for toolstash.

I see #27628 and wanting to get timing repeated, but that seems orthogonal to the use of toolexec.
You can't run "time go build" twice in a row and get consistent answers either (the second uses results cached by the first).
"go build -toolexec=time" behaving like a fine-grained "time go build" does not bother me too much.
Maybe for that case you want to be able to checkpoint the cache and rewind.
(Delete all cached results past a certain moment, or maybe "go build -do-not-write-new-things-to-cache".)

@mvdan
Copy link
Member Author

@mvdan mvdan commented Sep 5, 2020

It sounds like you are saying to run myexec -V=full.
Why are we not running myexec compile -V=full?
Why does myexec matter at all?

Both myexec -gocachekey and go tool compile -V=full would be in the cache key when running go build -toolexec myexec, now that I think about it. This is because myexec and go tool compile are independent and can change their behavior separately.

Running only myexec compile -V=full and expecting that to contain both cache key inputs at once is an option, too. I don't feel strongly one way or another.

@rsc
Copy link
Contributor

@rsc rsc commented Sep 16, 2020

What kinds of things do you envision myexec doing that would make only using compile -V=full not good enough as a cache key? Myexec's job should be to run some kind of executable. Would it ever change the behavior of that executable so that it didn't work the way it was compiled to work? (How it was compiled to work should be covered by the -V=full output.)

@mvdan
Copy link
Member Author

@mvdan mvdan commented Sep 16, 2020

The immediate use case is indeed altering Go builds, such as changing the input Go source or altering the flags passed to the compiler and linker. This is how I implemented a binary obfuscator, which I tried to explain in the original post.

Any tool which aims to build a Go binary with altered Go source code would also benefit from this. #29430 shows fuzzing as an example, since one needs to instrument the code for fuzzing.

@rsc
Copy link
Contributor

@rsc rsc commented Sep 18, 2020

@mdvan In that case it seems pretty clear that the tool that is altering the behavior of the compiler should be responsible for altering the -V=full output as well.

@mvdan
Copy link
Member Author

@mvdan mvdan commented Sep 18, 2020

That's actually a good point, I'm not sure why I never thought of it. I'll give that a try; please give me a week before making a decision on this proposal.

@rsc
Copy link
Contributor

@rsc rsc commented Sep 18, 2020

For the record (because I wasn't sure above), go build -toolexec=myexec already runs myexec compile -V=full:

% go build -toolexec=echo helloworld.go
go tool compile -V=full: unexpected output:
	/Users/rsc/go/pkg/tool/darwin_amd64/compile -V=full
go tool compile -V=full: unexpected output:
	/Users/rsc/go/pkg/tool/darwin_amd64/compile -V=full
go tool compile -V=full: unexpected output:
	/Users/rsc/go/pkg/tool/darwin_amd64/compile -V=full
@mvdan
Copy link
Member Author

@mvdan mvdan commented Sep 18, 2020

Indeed, it works. Here's a proof of concept I wrote today: burrowers/garble@308e4ae

Unfortunately, via plain go build I don't have a good way to obtain myexec's own version. For now, I just do a sha256 sum of os.Executable, and add the flags passed to myexec which affect the output. Hopefully #37475 can be fixed soon, as then I'd avoid the extra I/O.

I'll leave this proposal open for a few more days, just in case this current solution isn't enough for any other tools or people following this thread. For my own immediate use, it appears I don't need any extra features after all, so I'm happy.

@mvdan
Copy link
Member Author

@mvdan mvdan commented Sep 22, 2020

Alright, withdrawing proposal as per my comment above. Thanks again @rsc.

@mvdan mvdan closed this Sep 22, 2020
@mvdan mvdan moved this from Active to Declined in Proposals Sep 22, 2020
mvdan added a commit to burrowers/garble that referenced this issue Sep 22, 2020
As per the discussion in golang/go#41145, it
turns out that we don't need special support for build caching in
-toolexec. We can simply modify the behavior of "[...]/compile -V=full"
and "[...]/link -V=full" so that they include garble's own version and
options in the build cache key.

We add a number of things to the -V=full output. First, "+garble" since
that is the relatively unique name of the Go program. Second, the
version of Garble itself. Since we can't do this via modules until
golang/go#37475, we instead use the
hex-encoded sha256 of our own binary.

Finally, we need to add the garble options which modify how we obfuscate
code, since each should result in different build cache keys. GOPRIVATE
also affects caching, since a different GOPRIVATE value means that we
might have to garble a different set of packages.

This feature works, with only a minor regression in the ldflags test
since the -X linker flag is now broken with private names. The following
commit will fix that.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Proposals
Declined
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.