Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: use common ancestor blocks in CSE? #16973

josharian opened this issue Sep 2, 2016 · 4 comments


Copy link

@josharian josharian commented Sep 2, 2016

package x

var p *int

func f(b bool) int {
    if b && *p == 0 {
        return 0
    return *p

The load *p occurs regardless of the value of b. That is, the compiler should treat this as equivalent to:

package x

var p *int

func f(b bool) int {
    i := *p
    if b && i == 0 {
        return 0
    return i

It does not. The latter generates better code.

The reason is that neither block containing *p is an ancestor of the other, so CSE ignores the *p values. However, they share a common ancestor in which all of the values' arguments are available.

This happened in practice in CL 22712. Perhaps CSE should be taught to recognize situations like this.

Potential downsides:

  • This might move easily-recalculated values far from their uses, causing additional spills. (Aside: I think this is what went wrong with CL 28219. And yet making tighten more aggressive causes problems as well. We clearly need a more sophisticated approach in general to scheduling, so that we can pick up all these nickels without losing any dollar bills along the way.)
  • This might be computationally expensive, and CSE was long a problem child for performance.

cc @randall77 @dr2chase @aclements @bradfitz @tzneal

If we fix this, we should consider reverting CL 22712.

@josharian josharian added this to the Unplanned milestone Sep 2, 2016

This comment has been minimized.

Copy link

@dr2chase dr2chase commented Sep 2, 2016

0: common ancestor when all descendants compute E.
1: common ancestor when all non-panic descendants compute E.
2: common ancestor when all high-frequency (within-loop) descendants compute E.
3: common ancestor when all high-frequency (according to profiling) descendants compute E.

It might be less of a problem child if we're careful; I recall a lot of the work goes into the partitioning, less into the finding common paths.

Recall we already sort by dominator entry-number -- if there is a common ancestor to a bunch of nodes, then the first one is leftmost at its depth, and all the intervals (modulo numbering offset games, which we can deal with in a minute) must fill the interval of the common ancestor. There's some generational glitches -- suppose a is ancestor, and a.LL, a.LR, a.RRL, a.RRR, a.RL all have a common E, then LL+LR implies L, RRL+RRR implies RR, RR+RL implies R, L+R implies A.

This might be a clever little recursive algorithm that is not that expensive.

Handling the more complex cases might work best if we compute a dominator tree/forest for a flow graph that doesn't include panic edges, and we wouldn't want to extend lifetimes unnecessarily anyway just because a block had a common ancestor with a panic that shared a CSE. I.e., we could compute a different dominator numbering and use that for CSE purposes to get the effect of (1) above


This comment has been minimized.

Copy link

@randall77 randall77 commented Sep 2, 2016

We need to be careful about moving things which can fault. For instance:

if b {
  if p != nil { ... = *p }
} else {
  if p != nil { ... = *p }

We can't move *p up to the common ancestor of the two loads.


This comment has been minimized.

Copy link
Contributor Author

@josharian josharian commented Sep 15, 2016

Keith, I think your recent CL making nil checks participate in the memory store chain should help with the last concern.


This comment has been minimized.

Copy link

@momchil-velikov momchil-velikov commented Oct 14, 2016

FWIW, I've written recently a code hoisting pass here (it passes all tests with ssa/check/on, except some in nilptr3.go, where it hoists some nil checks).

With this pass, the function in the first post compiles to:

"".f t=1 size=40 args=0x10 locals=0x0
    0x0000 00000 (f.go:5)   TEXT    "".f(SB), $0-16
    0x0000 00000 (f.go:5)   FUNCDATA    $0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)
    0x0000 00000 (f.go:5)   FUNCDATA    $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x0000 00000 (f.go:6)   MOVQ    "".p(SB), AX
    0x0007 00007 (f.go:6)   MOVQ    (AX), AX
    0x000a 00010 (f.go:5)   MOVBLZX "".b+8(FP), CX
    0x000f 00015 (f.go:5)   TESTB   CL, CL
    0x0011 00017 (f.go:6)   JEQ 24
    0x0013 00019 (f.go:6)   TESTQ   AX, AX
    0x0016 00022 (f.go:6)   JEQ $0, 30
    0x0018 00024 (f.go:9)   MOVQ    AX, "".~r1+16(FP)
    0x001d 00029 (f.go:9)   RET
    0x001e 00030 (f.go:7)   MOVQ    $0, "".~r1+16(FP)
    0x0027 00039 (f.go:7)   RET

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
4 participants
You can’t perform that action at this time.