Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: slowdown in location list generation, possible remedies #52975

Open
dr2chase opened this issue May 18, 2022 · 0 comments
Open

cmd/compile: slowdown in location list generation, possible remedies #52975

dr2chase opened this issue May 18, 2022 · 0 comments
Assignees
Labels
NeedsDecision ToolSpeed
Milestone

Comments

@dr2chase
Copy link
Contributor

@dr2chase dr2chase commented May 18, 2022

What version of Go are you using (go version)?

$ go version
go version devel go1.19-2a6e13843d Mon May 16 09:35:17 2022 +0000 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOOS="darwin" and also "linux"

CL 397318 fixed a quadratic space consumption problem in location list generation (#51543) with a fancy data structure that shares storage for small changes to sets (and produced a 95% reduction in heap size for the problem case). Unfortunately, it's slower, overall adding about 2% to build user time, but for the worst case, 35%.

So that's "the bug", here's discussion of causes and possible remedies.

The root cause is that the location list generation algorithm is currently performing a quadratic amount of work. For each block, set operations linear in the number of live slots (intersection, difference) are performed. In some cases the number of live slots is linear in program size, the number of blocks is linear in program size, and we get quadratic time. (A "slot" is a variable or a piece of an aggregate-typed variable).

This is not necessarily required; clever preprocessing might allow us to notice that a block B's flow predecessors P and Q were both descendants of a common block R, therefore their intersection might be computed more efficiently by only considering flow from R to P and R to Q, which might be smaller. This is handwavy, potentially complicated, and likely also involves operations with a noticeable constant factor, so pursuing this route would take a little work, and might not pay off.

A more certain plan for improving performance, though not the asymptotic cost, is to reduce the number of conversions between set representations. The clever structures are currently used to record long-lived set data, the slots live at entrance and exit from each block. The operations with a block are applied to a simple set representation, which is created and consumed each time a block's effects on live slots are modeled. This is slightly trickier than just "skip the data conversions and use the shared sets everywhere" because the live data comes in two parts, one mapping slots to where they are found, and the other mapping registers to the slots that are currently bound to that register (this is necessary to know what associations are undone by assignment to a register).

@dr2chase dr2chase added ToolSpeed NeedsDecision labels May 18, 2022
@dr2chase dr2chase added this to the Backlog milestone May 18, 2022
@dr2chase dr2chase self-assigned this May 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsDecision ToolSpeed
Projects
None yet
Development

No branches or pull requests

1 participant