Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: do not cache tool output if tools print to stdout/stderr #27628

Open
cherrymui opened this issue Sep 11, 2018 · 28 comments
Open

cmd/go: do not cache tool output if tools print to stdout/stderr #27628

cherrymui opened this issue Sep 11, 2018 · 28 comments

Comments

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Sep 11, 2018

Update, Oct 7 2020: see #27628 (comment) for most recent proposal in this issue.

What version of Go are you using (go version)?

tip (2e5c325), also Go 1.11

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

darwin/amd64

What did you do?

$ go build -toolexec=/usr/bin/time hello.go
# command-line-arguments
        0.01 real         0.00 user         0.00 sys
# command-line-arguments
        0.12 real         0.11 user         0.02 sys
$ go build hello.go 
# command-line-arguments
        0.01 real         0.00 user         0.00 sys
# command-line-arguments
        0.12 real         0.11 user         0.02 sys
$

What did you expect to see?

The second invocation of go build doesn't have -toolexec, so it should not invoke the toolexec command (which I think it doesn't), nor reprint its output.

What did you see instead?

toolexec output is reprinted.

In fact, I think it probably should not cache at all if -toolexec is specified, since the external command that toolexec invokes may do anything, and (intentionally) not reproducible.

cc @dr2chase

@dr2chase
Copy link
Contributor

@dr2chase dr2chase commented Sep 11, 2018

Using a cooked command "wrap" =

#!/bin/bash
echo -n `basename "$1"` 1>&2 ; /usr/bin/time "$@"
date 1>&2 

it keeps reprinting the same date+time. This is definitely cached output.

go build -toolexec wrap .
go build .
rm hello
go build .
@mark-rushakoff
Copy link
Contributor

@mark-rushakoff mark-rushakoff commented Sep 11, 2018

#27207 is about -exec prevents test caching. Maybe there needs to be a more holistic review of -exec, -toolexec, and friends, and how they all relate to caching.

@bcmills
Copy link
Member

@bcmills bcmills commented Sep 12, 2018

I'm not sure whether we should cache the result of -toolexec (see #27207 (comment)), but if we do, the value of the -toolexec flag should be included in the cache key.

So there's definitely a problem here either way: we just need to decide whether the fix is to update the cache key or exclude the result from caching.

@bcmills bcmills added this to the Go1.12 milestone Sep 12, 2018
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Sep 12, 2018

I actually don't think those are the right choices. There is a plausible use of -toolexec to invoke a debugger or other interactive program, in which case we should never cache if -toolexec is used. And there is a plausible use of -toolexec to invoke a logger or other build annotator, in which we should completely ignore -toolexec and its argument for caching purposes. I can't think of an important use case in which we want to cache based on whether we are using the same -toolexec option as before.

I would probably pick "never cache if -toolexec is used" but I don't feel very strongly about it.

@cherrymui
Copy link
Contributor Author

@cherrymui cherrymui commented Sep 12, 2018

I would probably pick "never cache if -toolexec is used"

I vote for this.

@bcmills
Copy link
Member

@bcmills bcmills commented Sep 12, 2018

It looks like we have a consensus. (Moving from NeedsDecision to NeedsFix.)

@rsc rsc changed the title cmd/go: -toolexec is mistakenly cached cmd/go: do not cache tool output when using -toolexec Oct 24, 2018
@bcmills bcmills modified the milestones: Go1.12, Go1.13 Jan 17, 2019
@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@mvdan
Copy link
Member

@mvdan mvdan commented Dec 8, 2019

I agree that this bug needs fixing one way or another. The only way I find -toolexec useful right now is like -a -toolexec=whatever.

I have some thoughts about caching, though. It's unfortunate if we simply throw caching out the window, because it's entirely reasonable for toolexec programs to be deterministic given their input.

The way go build handles caching properly for go tool compile seems to be to first ask it compile -V=full, and use that as part of the cache key. This means the compile operations will need re-doing if the compiler version changes, but not otherwise.

Could we do the same for -toolexec as an opt-in? For example, if one does go build -toolexec=mytool, the Go tool would run a well defined version flag like mytool -toolversion, similar to the compiler's -V=full. If that fails, we do no caching at all with -toolexec=mytool. If it succeeds, its output is added to the cache key, and caching takes place.

I think this is the best of both worlds, because current non-deterministic toolexec programs would continue to use no caching at all, while other Go-specific complex tools could take advantage of the build cache.

We would have to define the bare minimum of information for -toolversion to report, though. For example, would a stringified https://golang.org/pkg/runtime/debug/#Module be enough, if available? Or perhaps a hash of the file contents in https://golang.org/pkg/os/#Executable?

@mvdan
Copy link
Member

@mvdan mvdan commented Dec 9, 2019

Thinking about my comment above a bit more, I think this issue should still be fixed by simply doing no caching when -toolexec is used. That would be strictly better than what we have right now.

Once this issue is fixed, I can then file the "opt-in caching for -toolexec" idea as a separate proposal.

@rsc
Copy link
Contributor

@rsc rsc commented Sep 2, 2020

Going to mark this as a proposal since the answer is unclear and it is bound up in #41145. Maybe they should be merged.

@rsc rsc added Proposal and removed NeedsFix labels Sep 2, 2020
@rsc rsc added this to Active in Proposals Sep 2, 2020
@rsc rsc changed the title cmd/go: do not cache tool output when using -toolexec proposal: cmd/go: do not cache tool output when using -toolexec Sep 2, 2020
@rsc
Copy link
Contributor

@rsc rsc commented Sep 23, 2020

I still think -toolexec should be completely orthogonal from the cache. There are times when you want to force a tool to run (as an extreme case, -a), and we can and should handle those independently of whether -toolexec is being used.

@mvdan
Copy link
Member

@mvdan mvdan commented Sep 24, 2020

I've read your last comment a number of times, but I'm having trouble understanding it, sorry.

If you mean that one should be able to use -toolexec with or without build the build cache, I agree, which is why I raised the other proposal to add such a knob. If you mean something else, you lost me :)

@rsc
Copy link
Contributor

@rsc rsc commented Sep 30, 2020

I mean that the use of the build cache should be completely separate from -toolexec.
The two should operate independently of each other, as they do today.
Neither knows anything about the other. It should stay that way.

If there are problems with wanting to avoid certain entries in the build cache at certain times,
that's a general problem, not limited to -toolexec, and so any solution should not involve -toolexec.

If you don't want a cache, there is currently -a.
If that's too big a hammer, then let's address that separate from -toolexec.

@mvdan
Copy link
Member

@mvdan mvdan commented Sep 30, 2020

Fair enough, I agree that we don't need to make this about toolexec.

If you don't want a cache, there is currently -a.

I don't think that's quite right, though. -a does not disable the cache, it merely forces rebuilding, but it still writes to the cache. This results in unintended behavior when one wants to truly not use the cache at all, since we still write to the cache - for example:

$ cd tools # golang.org/x/tools
$ go build -a -toolexec=time ./cmd/stringer
[...]
# golang.org/x/tools/cmd/stringer
0.23user 0.04system 0:00.17elapsed 160%CPU (0avgtext+0avgdata 75048maxresident)k
0inputs+0outputs (0major+14062minor)pagefaults 0swaps
$ go build ./cmd/stringer
[...]
# golang.org/x/tools/cmd/stringer
0.23user 0.04system 0:00.17elapsed 160%CPU (0avgtext+0avgdata 75048maxresident)k
0inputs+0outputs (0major+14062minor)pagefaults 0swaps

Because stdout/stderr of the tools such as the compiler and linker is also cached, we've now written cache entries with the -toolexec=time effect. The only way to get rid of those is to rebuild again without toolexec, like go build -a ./cmd/stringer. So -a doesn't quite disable the build cache in a useful way.

@rsc
Copy link
Contributor

@rsc rsc commented Oct 1, 2020

I see, thanks. I got this issue mixed up in my head with the other one (#41145).

@mvdan, above you wrote:

I do think the answer is clear here; the default behavior with -toolexec should be to not use the build cache at all.

I disagree that the answer is clear. Maybe it is clear for this one example of -toolexec=time, but that's not what -toolexec is for.

I added -toolexec specifically for toolstash. If you are using -toolexec toolstash you definitely do want caching of results: why should substituting an alternate compiler disable caching?

The build cache came after -toolexec, but its design explicitly contemplates -toolexec and takes care to preserve good performance from -toolexec toolstash. See go doc -u cmd/go/internal/work.toolID. This was not an accident and should not be lightly discarded.

The design of -toolexec was not directly intended for -toolexec=time or -toolexec=gdb. That was an accident - a mostly happy one, but an accident nonetheless. The fact that -toolexec=time fell out naturally from the design doesn't mean it should force awkward changes to the design.

So as far as this proposal issue's current title - "do not cache tool output when using -toolexec" - I don't believe that's the right decision.

One possible way to address -toolexec=time would be to add a new -cachewrite=false flag that allows reading from the build cache but disables writes to it. Then you run -toolexec=time -cachewrite=false. It's easy to forget, though. Also, the existence of this flag would encourage thinking about the cache as something that should be manually managed, which disturbs me quite a bit. A cache that you need to manage by hand to get semantically correct behavior is not actually a cache.

@rsc
Copy link
Contributor

@rsc rsc commented Oct 1, 2020

Counter-proposal: any time a command prints to stdout or stderr, its results are not cached.

The most common time a tool prints to stdout/stderr are failures, and those are already not cached because of the non-zero exit status. The new rule would generalize that: any print cannot be cached.

At first this seems a bit surprising and more tied to tool behavior than you'd expect, but it turns out that it would remove a bunch of special case code that been introduced since the build cache was first added.

The original build cache didn't save stdout/stderr - it just didn't seem necessary.
One unintended effect was that -toolexec=time did not pollute future build output with cached timing prints.

But then go build -gcflags=-m errors didn't reprint the -m output on a second run (#22587).
We fixed that in CL 77110 by caching and reprinting the stdout/stderr too.

But then we still missed stdout/stderr for certain skipped compile+link steps (#23877).
We fixed that in CL 128903 with even more special logic.

And now go build -toolexec=time prints the timing output without -toolexec,
which is kind of the opposite of the -gcflags=-m issue.

(There were probably other issues I am forgetting.)

Instead of continuing to finesse the exact behavior of when stdout/stderr are cached or not,
why not turn back the clock and re-fix #22587 by saying:
"don't cache build artifacts when a command prints anything".
Then #22587 is fixed, #23877 doesn't happen, this issue doesn't happen either,
-toolexec stays independent of the cache decision, and -toolexec=toolstash stays fast.

And it would let us delete all the output-handling code in the build cache (the test cache is separate and would stay).

Thoughts?

@cherrymui
Copy link
Contributor Author

@cherrymui cherrymui commented Oct 1, 2020

I'd be fine with the counter-proposal. Almost all the time when I want the compiler to print something (using toolexec or not), I'd expect it actually invokes the compiler.

@mvdan
Copy link
Member

@mvdan mvdan commented Oct 1, 2020

I disagree that the answer is clear.

For the record, I've changed my mind since then, which is the entire reason I retracted the other proposal. So yes, I agree with your comment.

I'm fine with the counter-proposal too. My tool which wraps the compiler and linker changes their behavior, and alters the output of -V=full to propagate that to the build cache, but otherwise doesn't print anything extra to stdout/stderr. So nothing would break for me.

@thanm
Copy link
Member

@thanm thanm commented Oct 1, 2020

I like the counter-proposal also.

As a rookie compiler developer I used to add code like this to the compiler somewhere while testing/debugging a change:

  if os.Getenv("ENABLE_EXPERIMENTAL_FOOBAR") {
     fmt.Fprintf("enabling the experimental foobar!\n")
     <do something that causes the compiler to emit different code or DWARF>
  }

Since my new environment variable wasn't visible to build caching machinery, this me caused no end of trouble. I've since moved away from this entirely (I now use explicit command line flags, and make sure that I add my new flags to the recordFlags() call in the gc's main). The counter-proposal would make this process a little easier; just have to make sure that the "experimental" code has some trace output.

@rsc rsc changed the title proposal: cmd/go: do not cache tool output when using -toolexec proposal: cmd/go: do not cache tool output if tools print to stdout/stderr Oct 7, 2020
@rsc
Copy link
Contributor

@rsc rsc commented Oct 7, 2020

Retitled and will include in the proposal minutes under the new title, but seems headed for a likely accept.

@cherrymui
Copy link
Contributor Author

@cherrymui cherrymui commented Oct 14, 2020

(Crosslink) #41973 is an example that will be fixed if we do this.

@rsc
Copy link
Contributor

@rsc rsc commented Oct 14, 2020

Based on the discussion above, this seems like a likely accept.

@rsc rsc moved this from Active to Likely Accept in Proposals Oct 14, 2020
@rsc rsc modified the milestones: Backlog, Proposal Oct 14, 2020
@mvdan
Copy link
Member

@mvdan mvdan commented Oct 17, 2020

In case anyone wants to start working on this soon, heads up that I'm writing testdata/script/toolexec.txt for #15677, so it could be reused for this issue later too.

@rsc
Copy link
Contributor

@rsc rsc commented Oct 21, 2020

No change in consensus, so accepted.

@rsc rsc moved this from Likely Accept to Accepted in Proposals Oct 21, 2020
@rsc rsc modified the milestones: Proposal, Backlog Oct 21, 2020
@rsc rsc changed the title proposal: cmd/go: do not cache tool output if tools print to stdout/stderr cmd/go: do not cache tool output if tools print to stdout/stderr Oct 21, 2020
@bcmills
Copy link
Member

@bcmills bcmills commented May 26, 2021

It turns out that this also affects the cmd/go.TestScript/toolexec and cmd/go.TestScript/vet.

The vet test expects go vet to be silent for packages with no vet errors, but if the toolexec test ends up rebuilding shared packages in the standard library (e.g. because they are otherwise stale; see #46347), its output will pollute the vet test.

@bcmills bcmills self-assigned this May 27, 2021
@bcmills bcmills modified the milestones: Backlog, Go1.18 May 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Proposals
Accepted
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
9 participants