
cmd/compile: slow escape analysis in large package in the typescript compiler #72815

Open

Jorropo opened this issue Mar 12, 2025 · 22 comments

Labels: compiler/runtime (Issues related to the Go compiler and/or runtime), NeedsInvestigation (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one), Other (None of the above), Performance
@Jorropo (Member) commented Mar 12, 2025

Go version

go version go1.24.1 linux/amd64

Output of go env in your module/workspace:

AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v3'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/tmp/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/hugo/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1913709825=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/home/hugo/k/go/src/go.mod'
GOMODCACHE='/home/hugo/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/hugo/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/hugo/k/go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/hugo/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/home/hugo/k/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.24.1'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

I've tried compiling the new typescript compiler and it took 70s:

________________________________________________________
Executed in   70.51 secs    fish           external
   usr time  159.17 secs  543.00 micros  159.17 secs
   sys time    5.49 secs  249.00 micros    5.49 secs

What tipped me off to an issue was the poor multi-core utilization: 160 ÷ 70 ≈ 2.3.

The biggest outlier is github.com/microsoft/typescript-go/internal/checker:

github.com/microsoft/typescript-go/internal/checker

________________________________________________________
Executed in   44.97 secs    fish           external
   usr time   50.51 secs  413.00 micros   50.51 secs
   sys time    0.32 secs  142.00 micros    0.32 secs

A CPU profile is very suspicious; almost all of the time is spent here:

[image: CPU profile]

I've added a couple of debug statements in these loops. There is a suspicious line:

walkAll 36466 <nil> <nil>

36466 is the length of the queue.
It steadily, slowly goes down; walkOne does roughly ~5000 iterations for each iteration of walkAll.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Mar 12, 2025
@gabyhelp gabyhelp added the Other None of the above. label Mar 12, 2025
@Jorropo (Member Author) commented Mar 12, 2025

I've tried to bisect it, but Go 1.22 is the last version I could easily test since this is a modern codebase, and it is just as bad there.

@jakebailey commented Mar 12, 2025

I said this on the gopher slack, but I'm not sure entirely if this is a regression in Go or just our package scaling poorly as it grew during the port; I plan to gather some data over all of the commits in the repo to see what that looks like.

Of course, a bug in the compiler would certainly be "good news" from the PoV that we wouldn't have to figure out how to break apart the monster.

@Jorropo (Member Author) commented Mar 12, 2025

For context, this package has 2652 functions; 1669 of them are *checker.Checker methods.

@prattmic prattmic added this to the Backlog milestone Mar 12, 2025
@prattmic prattmic changed the title cmd/compile: compiling the typescript compiler is slow cmd/compile: slow escape analysis in large package in the typescript compiler Mar 12, 2025
@prattmic (Member)
cc @golang/compiler

@Jorropo Jorropo added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 12, 2025
@dr2chase (Contributor)
If there's a copy-paste recipe for building the new typescript compiler to show this problem, that would help. I could probably figure it out, but the bug author already knows the answer and that way we'll know that we are looking at the same problem.

@jakebailey
The repo doesn't require any special tooling to build any of the Go code there at all, so it's just:

$ git clone https://github.com/microsoft/typescript-go.git
$ cd typescript-go
$ go build ./internal/checker

@Jorropo (Member Author) commented Mar 12, 2025

I could probably figure it out, but the bug author already knows the answer and that way we'll know that we are looking at the same problem.

How am I gonna claim the fix if I help you investigate it?
Jokes aside, my bad 😄

@prattmic (Member)
FWIW, after internal/checker (60s), the next slowest package is internal/ast, which takes 16s on my machine. That appears to be a different issue. Escape analysis doesn't show up at all. Nothing in particular stands out to me in the profile.

@Jorropo (comment minimized)

@prattmic (Member)
Correct.

internal/checker (60 wall-s): https://pprof.host/jc4g/flamegraph
internal/ast (16 wall-s): https://pprof.host/j84g/flamegraph
runtime (5 wall-s): https://pprof.host/j44g/flamegraph

I include runtime for reference. It seems to be bigger yet builds much faster, but the profiles are fairly similar. The most obvious difference is more time in noder.MakeWrappers compared to runtime, but that is still much less than SSA.

@dr2chase (Contributor) commented Mar 12, 2025

I experimentally turned off inlining, and 44-user-second builds turned into 20-user-second builds.

~/work/src/typescript-go$ time go build -gcflags=all=-d=fmahash=1010101010101010101010101 ./internal/checker

real	0m28.298s
user	0m43.952s
sys	0m2.218s
~/work/src/typescript-go$ time go build -gcflags=all=-l\ -d=fmahash=11010101010101010101010101 ./internal/checker

real	0m11.312s
user	0m21.030s
sys	0m1.863s

So, hmmm.

The -d=fmahash parameter is just an irrelevant difference in flags that will guarantee everything gets recompiled.

@gopherbot (Contributor)
Change https://go.dev/cl/657295 mentions this issue: cmd/compile/internal/escape: targeted optimization when analyzing many locations

@gopherbot (Contributor)
Change https://go.dev/cl/657315 mentions this issue: cmd/compile/escape: cache b.outlives(root, l) in walkOne

@gopherbot (Contributor)
Change https://go.dev/cl/657179 mentions this issue: cmd/compile/internal/escape: improve order of work to speed up analyzing many locations

@thepudds (Contributor) commented Mar 13, 2025

FWIW, I took a quick stab at trying to speed things up in escape analysis for large packages such as typescript-go/internal/checker, and sent two CLs: https://go.dev/cl/657295 and https://go.dev/cl/657179.

The build times reported via the action graph show a reasonable improvement for typescript-go/internal/checker:

go1.24.0:      91.792s
cl-657179-ps1: 17.578s

with timing via:

# build CL 657179 via gotip, then use it to build typescript-go/internal/checker
$ go install golang.org/dl/gotip@latest
$ gotip download 657179    # download and build CL 657179
$ gotip build -a -debug-actiongraph=/tmp/actiongraph-cl-657179-ps1 -v github.com/microsoft/typescript-go/internal/checker

# see a report on the timing from the action graph
$ go install github.com/icio/actiongraph@latest
$ actiongraph top -f /tmp/actiongraph-cl-657179-ps1

The CLs pass the trybots, but definitely consider these results tentative: they are still WIP, and I want to look at more results, check the performance of some other large packages, double-check things, and in general step back and think a bit more about correctness. Depending on how you count, it's effectively ~3 to 4 changes between the two CLs, and I haven't teased apart whether one or more of those changes might not be useful.

That said, I have some cautious hope things can be sped up for the escape analysis piece, perhaps via something like these CLs, or perhaps via something else.

@gopherbot (Contributor)
Change https://go.dev/cl/657077 mentions this issue: sweet: add typescript-go to go-build benchmark

@jakebailey commented Mar 13, 2025

Amazing job on those speedups! 5x would be so good.


I let my machine go through the git history and collect data on how long go build -a ./... takes to run over time. Forgive the unreadable text, but:

[image: chart of go build -a ./... times across the repository's commit history]

So, it does seem somewhat organic. (Not that "organic" growth implies there's nothing to improve, obviously.)

However, that big cliff near the center comes from microsoft/typescript-go@bcce040, which shows that this one commit increased escape analysis time in the checker package by nearly 20 seconds. That seems unusually large for the change made in that commit.

gopherbot pushed a commit to golang/benchmarks that referenced this issue Mar 13, 2025
Building this repository revealed some inefficiencies in the compiler.
This change adds the main command from this repository (tsgo) as a
benchmark of `go build -a` (a cold build) so we can track improvements
and hopefully catch any future regressions.

For golang/go#72815.

Change-Id: I8e01850b7956970000211cce50f200c3e38e54af
Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/657077
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@dr2chase (Contributor) commented Mar 13, 2025

I was looking into whether it would make sense to run escape analysis in parallel, and in the process found that there is a strongly connected component of 1305 functions in the call graph. I think-but-am-not-sure that calls in such a cyclic subgraph are modeled as escaping, so with some work we could break that up into 1305 individual functions, which might also save time and would also reduce the wall time, if not the user time.

Here's the beginning of the list:

github.com/microsoft/typescript-go/internal/checker.(*Checker).resolveEntityName,
(*Checker).resolveQualifiedName,
(*Checker).getSymbol,
(*Checker).getSymbolFlagsEx,
(*Checker).getTypeOnlyAliasDeclarationEx,
(*Checker).resolveSymbolEx,
(*Checker).resolveAlias,
(*Checker).getTargetOfAliasDeclaration,
(*Checker).getTargetOfImportEqualsDeclaration,
(*Checker).resolveExternalModuleTypeByLiteral,
and 1295 more

@Jorropo (Member Author) commented Mar 13, 2025

If my English is not failing me, I think the following sentence is wrong:

I think-but-am-not-sure that calls in such a cyclic subgraph are modeled as escaping

package a

type node struct {
	next *node
}

func stackAllocatedLinkedList(prev *node, budget uint) {
	if budget == 0 {
		return
	}
	someOtherFunction(&node{prev}, budget-1)
}

//go:noinline
func someOtherFunction(prev *node, budget uint) {
	stackAllocatedLinkedList(&node{prev}, budget-1)
}

See how it creates a linked list across the stack frames:

    00010 (+11) MOVQ AX, command-line-arguments..autotmp_3-8(SP)
    00011 (11) DECQ BX
    00012 (11) LEAQ command-line-arguments..autotmp_3-8(SP), AX
    00013 (11) PCDATA $1, $1
    # live at call to someOtherFunction:
    00014 (11) CALL command-line-arguments.someOtherFunction(SB)

I don't know how much real-world code this optimization helps.

@mcy commented Mar 14, 2025

I don't know how much real world code this optimization helps.

A lot of high-throughput compiler-flavored code depends on this. The example here is contrived. A better example would be a recursive function that uses some kind of cursor type to walk levels of a tree (e.g. imagine walking a btree). If each step needs to create a new cursor and pass it to the callee, each cursor will now wind up on the heap. This will also trigger if the callee needs to be passed e.g. a mere int out parameter.

Any graph walking algorithm that was previously allocation-free will now allocate in every frame, generating a lot of surprise garbage.

I wouldn't be surprised if this made the Go compiler itself slower on average, due to missed optimizations when compiling itself...

@dr2chase (Contributor)

@Jorropo thanks very much for checking that. Ten years ago, escape analysis didn't do this; @mdempsky's rewrite made it better. There may still be some domain-specific hacks to make this faster, though.
