cmd/compile: slowdown in location list generation, possible remedies

### What version of Go are you using (`go version`)?

<pre>
$ go version
go version devel go1.19-2a6e13843d Mon May 16 09:35:17 2022 +0000 darwin/amd64
</pre>

### Does this issue reproduce with the latest release?
Yes

### What operating system and processor architecture are you using (`go env`)?

<details><summary><code>go env</code> Output</summary><br><pre>
$ go env
GOARCH="amd64"
GOOS="darwin" and also "linux"
</pre></details>

[CL 397318](https://go-review.googlesource.com/c/go/+/397318) fixed a quadratic space consumption problem in location list generation (#51543) by using a fancy data structure that shares storage between sets that differ only by small changes; it produced a 95% reduction in heap size for the problem case.  Unfortunately, [it's slower](https://perf.golang.org/search?q=upload:20220411.12): it adds about 2% to build user time overall, and 35% in the worst case.

So that's "the bug"; here's a discussion of causes and possible remedies.

The root cause is that the location list generation algorithm currently performs a quadratic amount of work.  For each block, set operations (intersection, difference) that are linear in the number of live slots are performed.  In some cases both the number of live slots and the number of blocks are linear in program size, and we get quadratic time.  (A "slot" is a variable or a piece of an aggregate-typed variable.)
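To make the cost argument concrete, here is a minimal sketch of the shape of the problem. It is not the compiler's code: `slotSet`, `intersect`, and `mergeWork` are hypothetical names, and the model assumes a straight-line chain of blocks in which every slot stays live. It only counts how many set elements the per-block intersections visit.

```go
package main

import "fmt"

// slotSet is a hypothetical set of live slot IDs; the compiler's real
// representation is different.
type slotSet map[int]struct{}

// intersect returns the intersection of a and b, plus the number of
// elements it had to visit. The cost is linear in len(a).
func intersect(a, b slotSet) (slotSet, int) {
	out := make(slotSet)
	work := 0
	for s := range a {
		work++
		if _, ok := b[s]; ok {
			out[s] = struct{}{}
		}
	}
	return out, work
}

// mergeWork models nBlocks blocks in a chain with nSlots slots live
// throughout. Each block intersects the incoming state once, so the
// total work is nBlocks * nSlots.
func mergeWork(nBlocks, nSlots int) int {
	live := make(slotSet)
	for s := 0; s < nSlots; s++ {
		live[s] = struct{}{}
	}
	total := 0
	for b := 0; b < nBlocks; b++ {
		merged, work := intersect(live, live)
		live = merged
		total += work
	}
	return total
}

func main() {
	// When both blocks and slots scale with program size n,
	// doubling n quadruples the number of elements visited.
	for _, n := range []int{100, 200, 400} {
		fmt.Println(n, mergeWork(n, n))
	}
}
```

In this toy model the work is exactly the product of block count and slot count, which is the quadratic behavior described above once both grow with program size.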

Quadratic work is not necessarily required; clever preprocessing might allow us to notice that a block B's flow predecessors P and Q are both descendants of a common block R, so their intersection might be computed more efficiently by considering only the flow from R to P and from R to Q, which might be smaller.  This is handwavy, potentially complicated, and likely also involves operations with a noticeable constant factor, so pursuing this route would take a little work and might not pay off.

A more certain plan for improving performance, though not the asymptotic cost, is to reduce the number of conversions between set representations.  The clever structures are currently used to record long-lived set data: the slots live at entry to and exit from each block.  The operations within a block are applied to a simple set representation, which is created and consumed each time a block's effects on live slots are modeled.  This is slightly trickier than just "skip the data conversions and use the shared sets everywhere" because the live data comes in two parts, one mapping slots to where they are found, and the other mapping registers to the slots that are currently bound to that register (this is necessary to know what associations are undone by assignment to a register).
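The two-part live data can be illustrated with a small sketch. This is an assumption-laden model, not the compiler's actual types: `liveState`, `assign`, and `located` are hypothetical names, and only register locations are modeled (stack locations are left out). The point is that assigning to a register needs the register-to-slots map to find which slot associations must be undone.

```go
package main

import "fmt"

type slotID int
type regID int

// liveState is a hypothetical, simplified model of the two-part live
// data: one map from slots to the registers holding them, and the
// inverse map from registers to the slots bound to them.
type liveState struct {
	slotRegs map[slotID]map[regID]bool
	regSlots map[regID]map[slotID]bool
}

func newLiveState() *liveState {
	return &liveState{
		slotRegs: make(map[slotID]map[regID]bool),
		regSlots: make(map[regID]map[slotID]bool),
	}
}

// assign records that register r now holds slot s. It first uses the
// register-to-slots map to undo every association r previously had.
func (st *liveState) assign(r regID, s slotID) {
	for old := range st.regSlots[r] {
		delete(st.slotRegs[old], r)
		if len(st.slotRegs[old]) == 0 {
			delete(st.slotRegs, old)
		}
	}
	st.regSlots[r] = map[slotID]bool{s: true}
	if st.slotRegs[s] == nil {
		st.slotRegs[s] = make(map[regID]bool)
	}
	st.slotRegs[s][r] = true
}

// located reports whether slot s is currently held in any register.
func (st *liveState) located(s slotID) bool {
	return len(st.slotRegs[s]) > 0
}

func main() {
	st := newLiveState()
	st.assign(1, 10) // register 1 holds slot 10
	st.assign(1, 20) // overwriting register 1 unbinds slot 10
	fmt.Println(st.located(10), st.located(20))
}
```

Any shared-storage representation used throughout would have to keep both maps consistent in this way, which is why simply skipping the conversions is not enough.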



