cmd/go: go run and go tool are slower than directly executing cached binary #71733

Open
dottedmag opened this issue Feb 13, 2025 · 8 comments
Labels: GoCommand (cmd/go), NeedsInvestigation, Performance

Comments

@dottedmag
Contributor

dottedmag commented Feb 13, 2025

Go version

go version go1.24.0 darwin/arm64

Output of go env in your module/workspace:

AR='ar'
CC='clang'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='clang++'
GCCGO='gccgo'
GO111MODULE=''
GOARCH='arm64'
GOARM64='v8.0'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/Users/dottedmag/Library/Caches/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/Users/dottedmag/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/95/vzwcv5yd32x369z0c9t4bfr00000gn/T/go-build3662448907=/tmp/go-build -gno-record-gcc-switches -fno-common'
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMOD='/Users/dottedmag/tmp/gr-go-run/go.mod'
GOMODCACHE='/Users/dottedmag/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/dottedmag/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/Users/dottedmag/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.0.darwin-arm64'
GOSUMDB='sum.golang.org'
GOTELEMETRY='off'
GOTELEMETRYDIR='/Users/dottedmag/Library/Application Support/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/Users/dottedmag/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.0.darwin-arm64/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.24.0'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

I have tried running go run and go tool with trivial programs, expecting them to exit nearly instantly.

Instead I'm seeing run times of ~50ms on Mac M1 Max with warm caches.

Here's a repository with a reproducer: https://github.com/dottedmag/gr-go-run

Run go test -bench=. in that repository.

The basis for comparison is a tool I wrote some time ago, before link caching was merged into Go. It uses a cache key computation algorithm that is fairly close to the original one (borrowing some code directly from Go), and it still outperforms go run: roughly 5ms versus 50ms.
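
For readers unfamiliar with the approach, a content-based cache key can be sketched roughly as follows. This is neither gr's actual algorithm nor the one in cmd/go; the function name cacheKey and the choice of inputs (toolchain version plus file names and contents) are assumptions made purely for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// cacheKey is a hypothetical helper: it hashes the toolchain version together
// with every source file's name and contents, so any change yields a new key.
func cacheKey(goVersion string, files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for determinism

	h := sha256.New()
	fmt.Fprintf(h, "go %s\n", goVersion)
	for _, name := range names {
		// Length-prefix each field so adjacent fields cannot be confused.
		fmt.Fprintf(h, "file %d %s %d\n", len(name), name, len(files[name]))
		h.Write(files[name])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	files := map[string][]byte{
		"main.go": []byte("package main\n\nfunc main() {}\n"),
	}
	// The key could name a cached binary, e.g. <cache dir>/<key>.
	fmt.Println(cacheKey("go1.24.0", files))
}
```

A real tool would also have to account for dependencies, build flags, and environment variables that affect the build, which this sketch ignores.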

What did you see happen?

% go test -bench=.
goos: darwin
goarch: arm64
pkg: gr-go-run
cpu: Apple M1 Max
BenchmarkGr-10       	    237	  4570610 ns/op
BenchmarkGoRun-10    	     22	 51377985 ns/op
PASS
ok  	gr-go-run	3.038s

What did you expect to see?

go run and go tool should be at least on par with an external tool that does not hook into the compilation process, now that linker outputs are cached.

@seankhliao
Member

Is this a corporate Mac with endpoint security software running on it?

@dottedmag
Copy link
Contributor Author

dottedmag commented Feb 13, 2025

Nope, no enterprise-y shenanigans.

Apple's Gatekeeper is enabled, of course, but it should affect all the solutions equally.

@dmitshur added the Performance, NeedsInvestigation, and GoCommand labels Feb 14, 2025
@dmitshur added this to the Backlog milestone Feb 14, 2025
@dmitshur
Contributor

CC @matloob, @samthanawalla.

@seankhliao changed the title from "cmd/go: go run and go tool are still quite slow (50ms on Mac M1 Max with warm caches)" to "cmd/go: go run and go tool are slower than directly executing cached binary" Feb 22, 2025
@iwahbe

iwahbe commented Mar 12, 2025

I'm seeing this as well.

@matloob
Contributor

matloob commented Mar 13, 2025

The caching support for go run and go tool caches the output of the link step in the build. We still need to run all the actions in the build graph (looking up the cached output using the action id). We also still need to do module and package loading.
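
To illustrate why a fully warm cache is still not free, here is a toy model of walking a build graph and looking up each action by ID. The action type, the run function, and the map-based cache are hypothetical stand-ins for illustration, not cmd/go's actual data structures.

```go
package main

import "fmt"

// action is a toy stand-in for a node in a build graph (not cmd/go's type).
type action struct {
	id   string    // hypothetical action ID used as the cache key
	deps []*action // actions that must be handled first
}

// run walks the graph depth-first. Even when every action is a cache hit,
// each node is still visited and looked up once. It returns the number of
// cache lookups performed and the number of actions actually rebuilt.
func run(root *action, cache map[string]bool) (lookups, rebuilt int) {
	seen := make(map[*action]bool)
	var visit func(a *action)
	visit = func(a *action) {
		if seen[a] {
			return
		}
		seen[a] = true
		for _, d := range a.deps {
			visit(d)
		}
		lookups++
		if !cache[a.id] {
			rebuilt++ // a real build would compile or link here
			cache[a.id] = true
		}
	}
	visit(root)
	return lookups, rebuilt
}

func main() {
	fmtPkg := &action{id: "compile fmt"}
	mainPkg := &action{id: "compile main", deps: []*action{fmtPkg}}
	link := &action{id: "link main", deps: []*action{mainPkg}}

	cache := map[string]bool{}
	fmt.Println(run(link, cache)) // cold cache: prints "3 3"
	fmt.Println(run(link, cache)) // warm cache: prints "3 0"
}
```

In this model the second run rebuilds nothing, yet it still performs one lookup per action, and a real invocation additionally pays for module and package loading before the graph exists at all.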

I'm sure there's a lot we can do to make the go command faster, and I would love to see it become faster, but this level of performance is pretty much what we expect.

I would definitely like to hear about use cases that are adversely impacted by this.

We would also definitely welcome changes that improve the performance of the go command without increasing its complexity.

@dottedmag
Contributor Author

> I would definitely like to hear about use cases that are adversely impacted by this.

With fast caching, Go could be used to write wrappers around CLI tools. 5ms is acceptable, but 50ms is already perceptible in interactive use, and if the wrapper sits in front of an often-called tool (e.g. a compiler), those milliseconds add up quickly.

@iwahbe

iwahbe commented Mar 14, 2025

My use case is around build tooling: wrapping a utility for integrating Go into Makefiles. I need to run the wrapped utility ~30 times per make invocation, so speed is critical (and developers wait in real time). Each invocation runs for ~20ms, and a >100ms overhead is unacceptable.

@dottedmag I've tried to work around this problem by fetching the tool path with go tool -n ${YOUR_TOOL}, and then invoking the binary directly. If you have somewhere to save state, that worked pretty well for me. It hit #72824 in CI though...
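
A minimal sketch of that workaround, written as a small Go launcher: it asks go tool -n for the binary path and then replaces itself with that binary. It assumes a Unix-like system and that go tool -n prints just the path of the built tool, as described above; the toolPath helper and the runtool name are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
	"syscall"
)

// toolPath parses the output of "go tool -n NAME". It assumes the output is
// a single line containing the path of the built binary.
func toolPath(output string) (string, error) {
	p := strings.TrimSpace(output)
	if p == "" || strings.Contains(p, "\n") {
		return "", fmt.Errorf("unexpected output from go tool -n: %q", output)
	}
	return p, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: runtool NAME [ARGS...]")
		os.Exit(2)
	}
	// This step still pays the full go command overhead, so the result
	// should be resolved once and saved by the caller.
	out, err := exec.Command("go", "tool", "-n", os.Args[1]).Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "go tool -n:", err)
		os.Exit(1)
	}
	path, err := toolPath(string(out))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Replace this process with the tool; argv[0] is the program path.
	argv := append([]string{path}, os.Args[2:]...)
	if err := syscall.Exec(path, argv, os.Environ()); err != nil {
		fmt.Fprintln(os.Stderr, "exec:", err)
		os.Exit(1)
	}
}
```

The speedup only materializes when the resolved path is stored somewhere (for example in a Makefile variable) and reused across invocations; the sketch does not handle invalidating that stored path.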

@dottedmag
Contributor Author

@iwahbe

> I've tried to work around this problem by fetching the tool path with go tool -n ${YOUR_TOOL}, and then invoking the binary directly.

The trickiest part is cache revalidation, as usual. I'm striving for a replacement for "go run": a tool that can be used without thinking about stale caches.

With gr (see above) it's already much faster than running go list. I haven't spent much time optimizing it, though, so I guess there's still a lot of performance left on the table. Now that I think about it, I have an idea for getting it down to a small number of syscalls followed by an execve(), with little logic in between. This still isn't free, but I guess I could try to cut it down to under 1ms.

I understand that in your use case you may be reasonably sure that the source code of your wrapper does not change under you while you're building things. In mine, it's a source of frustration when I or somebody else on the team changes branches and then, 30 minutes later, figures out the tool was stale.


5 participants