Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: branch elimination opportunites when comparing constants #37608

Open
laboger opened this issue Mar 2, 2020 · 20 comments
Open

cmd/compile: branch elimination opportunites when comparing constants #37608

laboger opened this issue Mar 2, 2020 · 20 comments

Comments

@laboger
Copy link
Contributor

@laboger laboger commented Mar 2, 2020

What version of Go are you using (go version)?

latest

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
linux/ppc64le but I see it in the SSA on x86

What did you do?

Inspecting code for other reasons, found cases where there are constant compares and subsequent branches against those constants, resulting in several unnecessary branches.

What did you expect to see?

No unnecessary branches.

What did you see instead?

Unnecessary branching due to constant compares of known values.

This example comes from the ssa output of bytes.(Buffer).WriteByte. Here are some snippets:

v18 00011 (+107) MOVD 8(R6), R7
v38 00012 (107) MOVD 16(R6), R5
v26 00013 (107) SUB R7, R5, R8
v70 00014 (107) CMP R8, $1
// This could be BLT 36. Block starting at 46 sets R3 to 0 then branches to a CMP R3, $0.
b1 00015 (107) BLT 46 <--- Change to BLT 36

v54 00019 (108) MOVD R4, 8(R6)
// Once the block at 46 is eliminated as a predecessor of this block, then the next 3 instructions are not needed, since the CMPW has only a single predecessor and it is comparing 0 against 1 and the BEQ will never happen.
v45 00020 (108) MOVD $1, R3 <---- Remove
v52 00021 (+265) CMPW R3, $0 <---- Remove
b9 00022 (+266) BEQ 36 <---- Remove

v24 00023 (+269) PCDATA $1, $1
v24 00024 (+269) MOVD 8(R6), R4
v34 00025 (+269) PCDATA $0, $2
v34 00026 (269) MOVD (R6), R5
v88 00027 (269) CMPU R4, R7

v63 00036 (267) PCDATA $1, $0
v63 00037 (+267) MOVD R6, 32(R1)
v81 00038 (267) MOVD $1, R3
v65 00039 (267) MOVD R3, 40(R1)
v66 00040 (+267) CALL "".(*Buffer).grow(SB)
v68 00041 (267) MOVD 48(R1), R7
v36 00042 (269) PCDATA $0, $1
v36 00043 (269) PCDATA $1, $1
v36 00044 (269) MOVD "".b(R1), R6
b11 00045 (267) JMP 23

// The MOVD $0, R3 and JMP 21 could be replaced by JMP 36 since the successor is CMP $0, R3 followed by BEQ 36. Once it is decided that 36 is the successor, I think the MOVD $0, R7 could also go away since the block starting at 36 doesn't use it. That also means the predecessor of this block could branch directly to 36.
v21 00046 (267) PCDATA $1, $0 <--- Remove
v21 00047 (267) MOVD $0, R7 <--- Remove
v25 00048 (267) MOVD $0, R3 <--- Remove
b2 00049 (+265) JMP 21 <---- Remove

This scenario is common in stdlib functions. It looks like the same situation occurs in x86, so it is not specific to ppc64/ppc64le.

@josharian I think I reported this problem last fall and you commented on it, but I can't find the issue where that was done.

@randall77
Copy link
Contributor

@randall77 randall77 commented Mar 2, 2020

Parts of this sound like the shortcircuit pass. Is that working correctly here?

@randall77 randall77 added this to the Unplanned milestone Mar 2, 2020
@josharian
Copy link
Contributor

@josharian josharian commented Mar 2, 2020

Here is the email that I wrote to @dr2chase this morning. It looks apropos. (Note that the situation I describe in the email occurs in block b9 in the shortcircuit pass of GOSSAFUNC="(*Buffer).WriteByte" go build bytes.)

Hi!

You know way more than me about this stuff, so I have a question for you...

I was investigating why CL 220684 causes cmd/compile to shrink. It
turns out that it enables the shortcircuit pass to work on a few
functions, and that that causes significant (>5% text size)
improvements.

So I went to look at beefing up the shortcircuit pass. As a reminder,
we're dealing with situations like:

p1         p2         (maybe p3, p4, etc.)
  \         /
   \       /
       b
       phi (true, v1)
       if phi s1 else s2
     /     \
    /       \
s1          s2

Since going from p1 to b guarantees that we'll then go to s1, we
rewrite the p1 outbound branch to go straight to s1.

This currently requires that the phi be the only value in b, and that
it have only one use. That way we can disregard it after the CFG
change.

The case I am looking at is one in which there are also other phis in
b. This isn't necessarily a lost cause, depending on where the other
uses of those phis are.

If the use is dominated by s2, but not reachable from s1 without going
through b, then we know that we went from b to s2. And we know that if
we came in to b from p1, then we would have gone from b to s1 instead.
So we know that we came in to b from p2. And therefore we know the
value of the phi is its second arg.

If the use is dominated by s1, but not reachable from s2 without going
through b, then we know that we went from b to s1. And thus, mutatis
mutandis, that we arrived at s1 through the same predecessor we would
have arrived at b though. Therefore we can move the phi value from b
to s1.

So if there aren't any phi uses whose provenance are ambiguous, then
we can rewrite all phi uses, and thus still do the shortcircuit
optimization of moving the edge.

My questions are:

If there a name for the CFG property of being dominated by s2 but not
reachable by s1 without going through b? (It's sort of like being
dominated by an edge rather than a node.) Is there a nice data
structure for calculating it? Or should I be thinking about this in a
different way?

Sorry for the long email. This stuff is easier with a whiteboard. :(

Thanks!

-josh
@dr2chase
Copy link
Contributor

@dr2chase dr2chase commented Mar 2, 2020

Sorry not to reply earlier.
I don't know of a name for this offhand, but there could be one.
I might want to do a critical-edges transformation to make it easier to reason about, that way "dominated by an edge" is also "dominated by a node", and we know that s1 does not dominate s2 and vice-versa -- which gives us that each use is either

  1. dominated by s1; move xb := phi(x1,x2) to s1.
  2. dominated by s2; s/xb/x2
  3. dominated by neither, in code that is properly dominated by b
  4. dominated by neither, input to a phi that is not properly dominated by b (might be b).

I think case 3 is potentially a mess, it might require a lot of phi insertion depending on the shape of the graph.

Case 4 depends on the source of the input flow edge; it is some block, either falling into case 1, 2, or 3.
1 and 2 are easy, 3 is a mess.

Slightly off-topic, I recall recently seeing a formulation for SSA where blocks with phis are like "functions" and their predecessors "call" them, passing parameters appropriately. This simplifies the description of case 4 above because it is converted to 1, 2, or 3, as appropriate.

@josharian
Copy link
Contributor

@josharian josharian commented Mar 3, 2020

Sorry, David. I didn’t mean to imply that I had a sub-24-hour SLA on random personal emails asking for help!

The idea of doing a critical edge removal pass first is an intriguing one, thanks!

@josharian
Copy link
Contributor

@josharian josharian commented Mar 9, 2020

I'm working on this.

@marigonzes
Copy link

@marigonzes marigonzes commented Mar 11, 2020

@laboger, where you referring to #33713 in the initial message?

@laboger
Copy link
Contributor Author

@laboger laboger commented Mar 11, 2020

@laboger, where you referring to #33713 in the initial message?

Yes. Not sure if they are the same, i.e., will the same fix resolve both? BTW, this does occur quite a bit in the stdlib functions. I saw it several times without looking too hard.

@josharian
Copy link
Contributor

@josharian josharian commented Mar 11, 2020

This is proving to be a rich source of optimizations. The challenge is that every time I try a deeper analysis, it catches more cases and is also slower. Soon (I hope today) I'll start mailing some of the cheap should-definitely-do-this cases, and we can work incrementally from there.

@gopherbot
Copy link

@gopherbot gopherbot commented Mar 11, 2020

Change https://golang.org/cl/222917 mentions this issue: cmd/compile: rename a local variable in shortcircuitBlock

@gopherbot
Copy link

@gopherbot gopherbot commented Mar 11, 2020

Change https://golang.org/cl/222918 mentions this issue: cmd/compile: remove loop in shortcircuit

@gopherbot
Copy link

@gopherbot gopherbot commented Mar 11, 2020

Change https://golang.org/cl/222919 mentions this issue: cmd/compile: more minor cleanup in shortcircuitBlock

@gopherbot
Copy link

@gopherbot gopherbot commented Mar 11, 2020

Change https://golang.org/cl/222923 mentions this issue: cmd/compile: handle some additional phis in shortcircuit

gopherbot pushed a commit that referenced this issue Mar 13, 2020
v is pretty generic. Subsequent changes will make this function
more complicated, so rename it now, independently, for easier review.

v is the control value for the block (or its underlying phi);
call it ctl.

Passes toolstash-check.

Updates #37608

Change-Id: I3fbae3344f1c95aff0a69c1e4f61ef637a54774e
Reviewed-on: https://go-review.googlesource.com/c/go/+/222917
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
gopherbot pushed a commit that referenced this issue Mar 13, 2020
shortcircuitBlock contained a loop to handle blocks like

b: <- p q
  v = Phi true false
If v -> t u

in a single execution.
This change makes shortcircuitBlock do it in two instead,
one for each constant phi arg.

Motivation: Upcoming changes will expand the range of
blocks that the shortcircuit pass can handle.
Those changes need to understand what the CFG
will look like after the rewrite in shortcircuitBlock.
Making shortcircuitBlock do only a single CFG
modification at a time significantly simplifies that code.

In theory, this is less efficient, but not measurably so.
There is minor, unimportant churn in the generated code.

Updates #37608

Change-Id: Ia6dce7011e3e19b546ed1e176bd407575a0ab837
Reviewed-on: https://go-review.googlesource.com/c/go/+/222918
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
gopherbot pushed a commit that referenced this issue Mar 13, 2020
Continue to simplify, rename for clarity,
improve docs, and reduce variable scope.

This is in preparation for this function becoming
more complicated.

Passes toolstash-check.

Updates #37608

Change-Id: I630a4e07c92297c46d18aea69ec29852d6371ff0
Reviewed-on: https://go-review.googlesource.com/c/go/+/222919
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
gopherbot pushed a commit that referenced this issue Apr 8, 2020
Prior to this change, the shortcircuit pass could only
handle blocks containing only a single phi control value,
possibly wrapped in some OpNot and OpCopy values.

This change partially lifts this limitation.
It handles some cases in which the block contains other phi values.
This appears to happen most commonly in cases in which
the conditionals being checked involve the memory state,
in which case there is a phi memory value in the block.

The general idea here is to use the information we have about
the CFG to (1) move the other phi values into other blocks
and/or (2) rewrite uses of the other phi values in other blocks.

For example, consider this CFG:

p   q
 \ /
  b
 / \
t   u

And consider a phi value v in block b.
We'll write v = Phi(p: x, q: y) to say that v has value x corresponding
to inbound block p, and value y for block q.

We will rewrite this CFG to:

p    q
|   /
|  b
|/  \
t    u

What should we do with v?

Any uses of v in u can be replaced with y. Why?
If we are in block u, we came from b, and before that from q.
If prior to b we came from p, then we would have gone to t, not u.
Since we came from q, we know that v took the value y.

Uses of v in t are a bit more complicated.
It is going to end up being a phi value: Phi(p: ?, b: ?).

Suppose, after the rewrite, we came from block p.
Then, before the rewrite, we would have gone to b,
where v would have the value x.
So we have Phi(p: x, b: ?).

Suppose, after the rewrite, we came from block b.
Then we must have come from block q.
If we come from block q, v has value y.
So we have Phi(p: x, b: y).
Uses of v in t can thus be replaced with a new phi value,
with the same values as v, but with altered predecessors.

Similar reasoning can be employed to rewrite or replace
other uses of v elsewhere in the CFG, so that v itself can be eliminated,
and the CFG rewrite can proceed.

This change sets up the infrastructure for such optimizations
and adds a few cheap ones. All optimizations in this change depend
only on the shape of the CFG; future changes may also depend on where
v's uses are. That analysis is more powerful but more expensive,
and should be done incrementally.

The use of closures here is perhaps a bit unusual,
but during development it proved critical to having readable code.
We must decide early on whether we can safely do the CFG modifications,
and then later fix up the phis if so.
Safely storing state and decisions across these two phases is hard to do readably.
Closures solve the problem neatly.

I manually instrumented the code paths in shortcircuitPhiPlan.
During make.bash there are nearly 6000 invocations.
The least-visited code path gets run 85 times,
so all the code in this CL is reasonably well-exercised.

Here is a concrete example of code improved by this change:

func f(e interface{}) int {
	if x, ok := e.(int); ok {
		return x
	}
	return 0
}

Omitting PCDATA, FUNCDATA, and the like, it used to compile to:

"".f STEXT nosplit size=50 args=0x18 locals=0x0
	0x0000 00000 (x.go:4)	LEAQ	type.int(SB), AX
	0x0007 00007 (x.go:4)	MOVQ	"".e+8(SP), CX
	0x000c 00012 (x.go:4)	CMPQ	AX, CX
	0x000f 00015 (x.go:4)	JNE	43
	0x0011 00017 (x.go:4)	MOVQ	"".e+16(SP), AX
	0x0016 00022 (x.go:4)	MOVQ	(AX), AX
	0x0019 00025 (x.go:4)	JNE	33
	0x001b 00027 (x.go:5)	MOVQ	AX, "".~r1+24(SP)
	0x0020 00032 (x.go:5)	RET
	0x0021 00033 (x.go:7)	MOVQ	$0, "".~r1+24(SP)
	0x002a 00042 (x.go:7)	RET
	0x002b 00043 (x.go:7)	MOVL	$0, AX
	0x0030 00048 (x.go:4)	JMP	25

Afterwards, it compiles to:

"".f STEXT nosplit size=41 args=0x18 locals=0x0
	0x0000 00000 (x.go:4)	LEAQ	type.int(SB), AX
	0x0007 00007 (x.go:4)	MOVQ	"".e+8(SP), CX
	0x000c 00012 (x.go:4)	CMPQ	AX, CX
	0x000f 00015 (x.go:4)	JNE	31
	0x0011 00017 (x.go:4)	MOVQ	"".e+16(SP), AX
	0x0016 00022 (x.go:4)	MOVQ	(AX), AX
	0x0019 00025 (x.go:5)	MOVQ	AX, "".~r1+24(SP)
	0x001e 00030 (x.go:5)	RET
	0x001f 00031 (x.go:7)	MOVQ	$0, "".~r1+24(SP)
	0x0028 00040 (x.go:7)	RET

Note that there is now only a single JNE and a single RET $0 path.

Updates #37608

Has a minor good effect on compilation speed and memory use.

Provides widespread improvements to generated code.
The rare, minor regressions I have investigated are due to
register allocation fluctuations.

file      before    after     Δ       %       
addr2line 4376080   4371984   -4096   -0.094% 
api       5945400   5933112   -12288  -0.207% 
asm       5034312   5030216   -4096   -0.081% 
buildid   2844952   2840856   -4096   -0.144% 
cgo       4812872   4804680   -8192   -0.170% 
compile   19622064  19610368  -11696  -0.060% 
cover     5236648   5232552   -4096   -0.078% 
dist      3658312   3654216   -4096   -0.112% 
doc       4653512   4649416   -4096   -0.088% 
fix       3370072   3365976   -4096   -0.122% 
link      6671864   6667768   -4096   -0.061% 
pprof     14781652  14761172  -20480  -0.139% 
trace     11639684  11627396  -12288  -0.106% 
vet       8252280   8231800   -20480  -0.248% 
total     115052984 114934792 -118192 -0.103% 


file                                                                     before   after    Δ       %       
internal/cpu.s                                                           3298     3296     -2      -0.061% 
internal/bytealg.s                                                       1730     1737     +7      +0.405% 
cmd/vendor/golang.org/x/mod/semver.s                                     7332     7283     -49     -0.668% 
image/color.s                                                            8248     8156     -92     -1.115% 
math.s                                                                   35966    35956    -10     -0.028% 
math/cmplx.s                                                             6596     6575     -21     -0.318% 
runtime.s                                                                480566   480053   -513    -0.107% 
sync.s                                                                   16408    16385    -23     -0.140% 
math/rand.s                                                              10447    10406    -41     -0.392% 
internal/reflectlite.s                                                   28408    28366    -42     -0.148% 
errors.s                                                                 2736     2701     -35     -1.279% 
sort.s                                                                   17031    17036    +5      +0.029% 
io.s                                                                     16993    16964    -29     -0.171% 
container/heap.s                                                         2006     1997     -9      -0.449% 
text/tabwriter.s                                                         9570     9552     -18     -0.188% 
bytes.s                                                                  31823    31594    -229    -0.720% 
strconv.s                                                                52760    52717    -43     -0.082% 
vendor/golang.org/x/text/transform.s                                     16713    16706    -7      -0.042% 
strings.s                                                                42590    42563    -27     -0.063% 
bufio.s                                                                  22883    22785    -98     -0.428% 
encoding/base32.s                                                        9586     9531     -55     -0.574% 
syscall.s                                                                82237    82243    +6      +0.007% 
image.s                                                                  37465    37452    -13     -0.035% 
regexp/syntax.s                                                          82827    82769    -58     -0.070% 
image/draw.s                                                             18698    18584    -114    -0.610% 
image/jpeg.s                                                             36560    36549    -11     -0.030% 
time.s                                                                   82557    82526    -31     -0.038% 
context.s                                                                10863    10820    -43     -0.396% 
regexp.s                                                                 64114    64049    -65     -0.101% 
os.s                                                                     51751    51524    -227    -0.439% 
reflect.s                                                                168240   168049   -191    -0.114% 
cmd/go/internal/lockedfile/internal/filelock.s                           2317     2290     -27     -1.165% 
path/filepath.s                                                          17831    17766    -65     -0.365% 
io/ioutil.s                                                              6994     6990     -4      -0.057% 
encoding/binary.s                                                        30791    30726    -65     -0.211% 
cmd/vendor/golang.org/x/sys/unix.s                                       78055    78033    -22     -0.028% 
encoding/pem.s                                                           9280     9247     -33     -0.356% 
crypto/cipher.s                                                          20376    20374    -2      -0.010% 
os/exec.s                                                                29229    29140    -89     -0.304% 
internal/goroot.s                                                        4588     4579     -9      -0.196% 
cmd/internal/browser.s                                                   2246     2240     -6      -0.267% 
cmd/vendor/golang.org/x/crypto/ssh/terminal.s                            27183    27149    -34     -0.125% 
fmt.s                                                                    76625    76484    -141    -0.184% 
encoding/hex.s                                                           6154     6152     -2      -0.032% 
compress/lzw.s                                                           7063     7059     -4      -0.057% 
database/sql/driver.s                                                    18875    18862    -13     -0.069% 
debug/plan9obj.s                                                         8268     8266     -2      -0.024% 
net/url.s                                                                29724    29719    -5      -0.017% 
encoding/csv.s                                                           12872    12856    -16     -0.124% 
debug/gosym.s                                                            25303    25268    -35     -0.138% 
compress/flate.s                                                         50952    51019    +67     +0.131% 
compress/zlib.s                                                          7277     7266     -11     -0.151% 
archive/zip.s                                                            42155    42111    -44     -0.104% 
debug/dwarf.s                                                            107632   107541   -91     -0.085% 
database/sql.s                                                           98373    98028    -345    -0.351% 
os/user.s                                                                14722    14708    -14     -0.095% 
encoding/json.s                                                          105836   105711   -125    -0.118% 
debug/macho.s                                                            32598    32560    -38     -0.117% 
encoding/gob.s                                                           136478   135755   -723    -0.530% 
debug/pe.s                                                               31160    30869    -291    -0.934% 
debug/elf.s                                                              63495    63302    -193    -0.304% 
vendor/golang.org/x/text/unicode/bidi.s                                  27220    27217    -3      -0.011% 
vendor/golang.org/x/text/secure/bidirule.s                               3363     3352     -11     -0.327% 
go/token.s                                                               12036    12035    -1      -0.008% 
flag.s                                                                   22277    22256    -21     -0.094% 
mime.s                                                                   39696    39509    -187    -0.471% 
go/scanner.s                                                             19033    19020    -13     -0.068% 
archive/tar.s                                                            70936    70581    -355    -0.500% 
internal/xcoff.s                                                         22823    22820    -3      -0.013% 
text/scanner.s                                                           11631    11629    -2      -0.017% 
encoding/xml.s                                                           110534   110408   -126    -0.114% 
math/big.s                                                               183636   183545   -91     -0.050% 
image/gif.s                                                              27376    27343    -33     -0.121% 
crypto/dsa.s                                                             6029     5969     -60     -0.995% 
image/png.s                                                              42947    42939    -8      -0.019% 
crypto/rand.s                                                            6866     6854     -12     -0.175% 
vendor/golang.org/x/text/unicode/norm.s                                  66394    66354    -40     -0.060% 
runtime/trace.s                                                          2603     2521     -82     -3.150% 
crypto/ed25519.s                                                         6321     6300     -21     -0.332% 
text/template/parse.s                                                    93910    93844    -66     -0.070% 
crypto/rsa.s                                                             31460    31369    -91     -0.289% 
encoding/asn1.s                                                          57021    57023    +2      +0.004% 
crypto/elliptic.s                                                        51382    51363    -19     -0.037% 
crypto/x509/pkix.s                                                       10386    10342    -44     -0.424% 
vendor/golang.org/x/net/idna.s                                           24482    24466    -16     -0.065% 
vendor/golang.org/x/crypto/cryptobyte.s                                  33479    33280    -199    -0.594% 
crypto/ecdsa.s                                                           11936    11883    -53     -0.444% 
go/constant.s                                                            43670    42663    -1007   -2.306% 
go/ast.s                                                                 80383    80191    -192    -0.239% 
testing.s                                                                68069    68057    -12     -0.018% 
runtime/pprof.s                                                          59613    59603    -10     -0.017% 
testing/iotest.s                                                         4895     4891     -4      -0.082% 
internal/trace.s                                                         78136    78089    -47     -0.060% 
cmd/internal/goobj2.s                                                    13158    13154    -4      -0.030% 
cmd/internal/src.s                                                       17661    17657    -4      -0.023% 
go/parser.s                                                              79046    78880    -166    -0.210% 
cmd/internal/objabi.s                                                    16367    16343    -24     -0.147% 
text/template.s                                                          94899    94486    -413    -0.435% 
go/printer.s                                                             77267    76992    -275    -0.356% 
cmd/internal/goobj.s                                                     25988    25947    -41     -0.158% 
runtime/pprof/internal/profile.s                                         102066   101933   -133    -0.130% 
go/format.s                                                              5419     5371     -48     -0.886% 
cmd/vendor/golang.org/x/arch/ppc64/ppc64asm.s                            37181    37149    -32     -0.086% 
go/doc.s                                                                 74533    74132    -401    -0.538% 
html/template.s                                                          88743    88389    -354    -0.399% 
cmd/asm/internal/lex.s                                                   24881    24872    -9      -0.036% 
cmd/internal/buildid.s                                                   18263    18256    -7      -0.038% 
cmd/vendor/golang.org/x/arch/x86/x86asm.s                                80036    79980    -56     -0.070% 
go/build.s                                                               68905    68737    -168    -0.244% 
cmd/cover.s                                                              46070    45950    -120    -0.260% 
cmd/internal/obj.s                                                       117001   116991   -10     -0.009% 
cmd/doc.s                                                                62700    62419    -281    -0.448% 
cmd/internal/obj/arm.s                                                   66745    66687    -58     -0.087% 
cmd/compile/internal/syntax.s                                            145406   145062   -344    -0.237% 
cmd/internal/obj/wasm.s                                                  44049    44027    -22     -0.050% 
net.s                                                                    291835   291020   -815    -0.279% 
cmd/dist.s                                                               209020   208807   -213    -0.102% 
cmd/cgo.s                                                                241564   241102   -462    -0.191% 
vendor/golang.org/x/net/http/httpproxy.s                                 9407     9399     -8      -0.085% 
log/syslog.s                                                             7921     7909     -12     -0.151% 
go/types.s                                                               319325   317513   -1812   -0.567% 
vendor/golang.org/x/net/http/httpguts.s                                  3834     3825     -9      -0.235% 
mime/multipart.s                                                         21414    21343    -71     -0.332% 
cmd/internal/obj/ppc64.s                                                 119949   119938   -11     -0.009% 
cmd/compile/internal/logopt.s                                            10158    10118    -40     -0.394% 
vendor/golang.org/x/net/nettest.s                                        28012    27991    -21     -0.075% 
go/internal/srcimporter.s                                                6405     6380     -25     -0.390% 
go/internal/gcimporter.s                                                 34525    34493    -32     -0.093% 
net/mail.s                                                               23937    23720    -217    -0.907% 
go/internal/gccgoimporter.s                                              56095    56038    -57     -0.102% 
cmd/compile/internal/types.s                                             47247    47207    -40     -0.085% 
cmd/api.s                                                                39582    39558    -24     -0.061% 
cmd/go/internal/base.s                                                   12572    12551    -21     -0.167% 
cmd/vendor/golang.org/x/xerrors.s                                        17846    17814    -32     -0.179% 
cmd/vendor/golang.org/x/mod/sumdb/note.s                                 18142    18070    -72     -0.397% 
cmd/go/internal/search.s                                                 19994    19876    -118    -0.590% 
cmd/go/internal/imports.s                                                16457    16428    -29     -0.176% 
cmd/vendor/golang.org/x/mod/module.s                                     17838    17759    -79     -0.443% 
cmd/go/internal/cache.s                                                  30551    30514    -37     -0.121% 
cmd/vendor/golang.org/x/mod/sumdb/tlog.s                                 36356    36321    -35     -0.096% 
cmd/internal/test2json.s                                                 9452     9408     -44     -0.466% 
cmd/go/internal/mvs.s                                                    25136    25092    -44     -0.175% 
cmd/go/internal/txtar.s                                                  3488     3461     -27     -0.774% 
cmd/vendor/golang.org/x/mod/zip.s                                        18811    18800    -11     -0.058% 
cmd/go/internal/version.s                                                11213    11171    -42     -0.375% 
cmd/link/internal/benchmark.s                                            4941     4949     +8      +0.162% 
cmd/internal/obj/s390x.s                                                 126865   126849   -16     -0.013% 
cmd/gofmt.s                                                              30684    30596    -88     -0.287% 
cmd/fix.s                                                                87450    86906    -544    -0.622% 
cmd/internal/obj/x86.s                                                   88578    88556    -22     -0.025% 
cmd/vendor/golang.org/x/mod/modfile.s                                    72450    72363    -87     -0.120% 
cmd/oldlink/internal/loader.s                                            16743    16741    -2      -0.012% 
cmd/pack.s                                                               14863    14861    -2      -0.013% 
cmd/go/internal/load.s                                                   106742   106568   -174    -0.163% 
cmd/oldlink/internal/objfile.s                                           21787    21780    -7      -0.032% 
cmd/oldlink/internal/loadmacho.s                                         29309    29317    +8      +0.027% 
cmd/oldlink/internal/loadelf.s                                           35013    35021    +8      +0.023% 
cmd/asm/internal/asm.s                                                   68550    68538    -12     -0.018% 
cmd/link/internal/loader.s                                               94765    94564    -201    -0.212% 
cmd/link/internal/loadelf.s                                              35663    35667    +4      +0.011% 
cmd/link/internal/loadmacho.s                                            29501    29509    +8      +0.027% 
cmd/vendor/golang.org/x/tools/go/analysis.s                              4983     4976     -7      -0.140% 
cmd/vendor/golang.org/x/tools/go/analysis/internal/analysisflags.s       16771    16709    -62     -0.370% 
cmd/vendor/golang.org/x/tools/go/types/objectpath.s                      18481    18456    -25     -0.135% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/internal/analysisutil.s 2100     2085     -15     -0.714% 
cmd/vendor/github.com/google/pprof/profile.s                             150141   149620   -521    -0.347% 
cmd/vendor/github.com/google/pprof/internal/measurement.s                10420    10404    -16     -0.154% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/asmdecl.s               36814    36755    -59     -0.160% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/bools.s                 6688     6673     -15     -0.224% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/cgocall.s               9856     9784     -72     -0.731% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/composite.s             3011     2979     -32     -1.063% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/copylock.s              9737     9682     -55     -0.565% 
cmd/vendor/golang.org/x/tools/go/cfg.s                                   30738    30725    -13     -0.042% 
cmd/vendor/github.com/ianlancetaylor/demangle.s                          175195   174513   -682    -0.389% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/httpresponse.s          3625     3520     -105    -2.897% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/loopclosure.s           2987     2971     -16     -0.536% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/shift.s                 4372     4340     -32     -0.732% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/stdmethods.s            8634     8611     -23     -0.266% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.s                 6189     6164     -25     -0.404% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/structtag.s             8089     8073     -16     -0.198% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unsafeptr.s             2208     2177     -31     -1.404% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s           8050     8047     -3      -0.037% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unusedresult.s          3665     3629     -36     -0.982% 
cmd/vendor/golang.org/x/tools/go/ast/astutil.s                           65773    65680    -93     -0.141% 
cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.s                  13328    13286    -42     -0.315% 
cmd/vendor/golang.org/x/tools/go/types/typeutil.s                        12263    12162    -101    -0.824% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/errorsas.s              1459     1421     -38     -2.605% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/ctrlflow.s              5208     5191     -17     -0.326% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unmarshal.s             1801     1782     -19     -1.055% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/lostcancel.s            9569     9528     -41     -0.428% 
cmd/go/internal/work.s                                                   304928   304756   -172    -0.056% 
crypto/x509.s                                                            147340   147139   -201    -0.136% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.s                34287    34019    -268    -0.782% 
crypto/tls.s                                                             311603   310644   -959    -0.308% 
cmd/oldlink/internal/ld.s                                                533115   532651   -464    -0.087% 
cmd/oldlink/internal/wasm.s                                              16484    16458    -26     -0.158% 
cmd/oldlink/internal/x86.s                                               18832    18830    -2      -0.011% 
cmd/link/internal/ld.s                                                   548200   547626   -574    -0.105% 
cmd/link/internal/wasm.s                                                 16760    16734    -26     -0.155% 
cmd/link/internal/arm64.s                                                20850    20840    -10     -0.048% 
cmd/link/internal/x86.s                                                  17437    17435    -2      -0.011% 
net/http.s                                                               556647   555519   -1128   -0.203% 
net/http/cookiejar.s                                                     15849    15833    -16     -0.101% 
expvar.s                                                                 9521     9508     -13     -0.137% 
net/http/httptest.s                                                      16471    16452    -19     -0.115% 
cmd/vendor/github.com/google/pprof/internal/plugin.s                     4266     4264     -2      -0.047% 
net/http/cgi.s                                                           23448    23428    -20     -0.085% 
cmd/go/internal/web.s                                                    16472    16428    -44     -0.267% 
net/http/httputil.s                                                      39672    39670    -2      -0.005% 
net/rpc.s                                                                33989    33965    -24     -0.071% 
net/http/fcgi.s                                                          19167    19162    -5      -0.026% 
cmd/vendor/github.com/google/pprof/internal/symbolz.s                    5861     5857     -4      -0.068% 
cmd/vendor/github.com/google/pprof/internal/binutils.s                   35842    35823    -19     -0.053% 
cmd/vendor/github.com/google/pprof/internal/symbolizer.s                 11449    11404    -45     -0.393% 
cmd/go/internal/get.s                                                    62726    62582    -144    -0.230% 
cmd/vendor/github.com/google/pprof/internal/report.s                     80032    80022    -10     -0.012% 
cmd/go/internal/modfetch/codehost.s                                      89005    88871    -134    -0.151% 
cmd/trace.s                                                              116607   116496   -111    -0.095% 
cmd/vendor/github.com/google/pprof/internal/driver.s                     143234   143207   -27     -0.019% 
cmd/vendor/github.com/google/pprof/driver.s                              9000     8998     -2      -0.022% 
cmd/go/internal/modfetch.s                                               126300   125726   -574    -0.454% 
cmd/pprof.s                                                              12317    12312    -5      -0.041% 
cmd/go/internal/modconv.s                                                17878    17861    -17     -0.095% 
cmd/go/internal/modload.s                                                150261   149763   -498    -0.331% 
cmd/go/internal/clean.s                                                  11122    11091    -31     -0.279% 
cmd/go/internal/help.s                                                   6523     6521     -2      -0.031% 
cmd/go/internal/generate.s                                               11627    11614    -13     -0.112% 
cmd/go/internal/envcmd.s                                                 22034    21986    -48     -0.218% 
cmd/go/internal/modget.s                                                 38478    38398    -80     -0.208% 
cmd/go/internal/modcmd.s                                                 46430    46229    -201    -0.433% 
cmd/go/internal/test.s                                                   64399    64374    -25     -0.039% 
cmd/compile/internal/ssa.s                                               3615264  3608276  -6988   -0.193% 
cmd/compile/internal/gc.s                                                1538865  1537625  -1240   -0.081% 
cmd/compile/internal/amd64.s                                             33593    33574    -19     -0.057% 
cmd/compile/internal/x86.s                                               30871    30852    -19     -0.062% 
total                                                                    19343565 19311284 -32281  -0.167% 

Change-Id: Ib030eb79458827a5a5b6d0d2f98765f8325a4d7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/222923
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@josharian
Copy link
Contributor

@josharian josharian commented May 2, 2020

Since the 1.15 window has passed, I'm going to jot down a few more related things and test cases I've noticed, just to have them written down somewhere.

(1) The shortcircuit pass can be strengthened considerably, at some compiler performance risk, by figuring out the locations of all the uses of the non-control phis. I have CLs locally that play with this.

(2) The phiopt pass could be strengthened. For example, in runtime.atoi, we find code like this:

	neg := false
	if s[0] == '-' {
		neg = true
		s = s[1:]
	}

This should be ripe for the phiopt pass to convert into code like:

  neg := s[0] == '-'
  if neg {
    s = s[1]
  }

But it doesn't because of the second conditions[0] contains a bounds check. In this case, prove later removes that bounds check. But there are other cases in the tree in which it cannot. To find them, at the end of the phiopt pass, add something like:

	for _, b := range f.Blocks {
		for _, v := range b.Values {
			if v.Op != OpPhi {
				continue
			}
			all := true
			for _, a := range v.Args {
				if a.Op != OpConstBool {
					all = false
					break
				}
			}
			if !all {
				continue
			}
			fmt.Println(f.Name, b, v)
		}
	}

(3) The shortcircuit pass currently looks for situations in which the phi control value of a block is known for a particular inbound edge because it is a constant along that edge.

But there is a structurally similar scenario that can occur in which a control value of a block is known for a particular inbound edge because its predecessor branched on that exact same control value. For example, runtime.atoi contains this code:

	if !neg && un > uint(maxInt) {
		return 0, false
	}
	if neg && un > uint(maxInt)+1 {
		return 0, false
	}

After CSE, the first thing we do is branch on !neg. And if it is false, we immediately branch again on !neg. We should be able to proceed directly to un > uint(maxInt)+1. That is, we should optimize this code to:

	if !neg {
		if un > uint(maxInt) {
			return 0, false
		}
	} else {
		if un > uint(maxInt)+1 {
			return 0, false
		}
	}

Prove can catch some repeated block controls, but not conditional ones; that's where a modified short circuit pass would help.

(4) We should consider making phiopt part of the fuse pass loops, or at least after prove. cmd/compile/internal/types.(*Type).IsInteger is a good test case. It is:

func (t *Type) IsInteger() bool {
	switch t.Etype {
	case TINT8, TUINT8, TINT16, TUINT16, TINT32, TUINT32, TINT64, TUINT64, TINT, TUINT, TUINTPTR:
		return true
	}
	return false
}

This switch case should be optimized into a single unsigned comparison. But it's too complicated for phiopt. Prove simplifies it, but it's too late for phiopt. Fuse can't handle it. We end up with pointless branching.

We might want to make phiopt a fuse function, like we recently did for shortcircuit, and run it after prove.

(5) Running shortcircuit again later helps. The trick is doing so cheaply enough without having to run a deadcode afterwards. I tried this in CL 229058 and failed. The failure is very interesting, and are the effects of doing that CL piecemeal. I still think it's a good idea, but it needs some thought and investigate before trying again.

That's a bunch of stuff. But it all falls under the umbrella of "look at the CFG and do less branching". Help with any/all of these is of course welcome.

cc @mundaym

@erifan
Copy link
Contributor

@erifan erifan commented May 6, 2020

A long time ago, I found that compared with gcc, the generation instruction of function math.IsInf is very cumbersome and there are many jumps, see https://godbolt.org/z/n8wWyG and https://godbolt.org/z/GcWh9S. There doesn't seem to be no improvement at the moment. I look forward to making it as concise as the instructions generated by gcc. I am interested in this case. But I won't be free until next quarter.

@erifan
Copy link
Contributor

@erifan erifan commented Jun 28, 2020

Hi, I'm doing the third one.

@zhangfannie
Copy link
Contributor

@zhangfannie zhangfannie commented Jul 20, 2020

If there are no other volunteers, I will look into the second one. Thank you. 😊

@gopherbot
Copy link

@gopherbot gopherbot commented Jul 24, 2020

Change https://golang.org/cl/244737 mentions this issue: cmd/compile/ssa: optimize the derivable known branch of If block

@gopherbot
Copy link

@gopherbot gopherbot commented Sep 3, 2020

Change https://golang.org/cl/252937 mentions this issue: cmd/compile/internal/ssa: strengthen phiopt pass

@gopherbot
Copy link

@gopherbot gopherbot commented Sep 28, 2020

Change https://golang.org/cl/257981 mentions this issue: cmd/compile: optimize the Phi values

@erifan
Copy link
Contributor

@erifan erifan commented Oct 22, 2020

This comment is an explanation of CL https://go-review.googlesource.com/c/go/+/244737 @randall77
Take math.IsInf as an example.
func IsInf(f float64, sign int) bool {
return sign >= 0 && f > MaxFloat64 || sign <= 0 && f < -MaxFloat64
}
The IR graph before shortcircuit:
image
(before)
The control value of b2 is v15, a phi op, one of its argument is v11, which is the control value of b1. So if b2 is coming from b1, we know v11 is false, then v15 is false, and we can simply redirect b1 to b5. This is what shortcircuit does. CL_244737 can't do this optimization, because CL_244737 can't infer the value of v15 from v11. From the op of v11, CL_244737 knows the relationship of v10 and v8, from the edge b1->b5, it also knows the v11 is false, but it doesn't know the value of v15 from the above knowledge.
What is missing here is to derive the output value of phi through its inputs. This is exactly what shortcircuit does. So CL_244737 needs to be based on shortcircuit.

After shortcircuit, the graph becomes as follow, then CL_244737 can play a role.
image
(after)
If b5 if coming from b1, then from the control value (b11) of b1, CL_244737 knows that v10 > v8, the control value of b5 is v17, so CL_244737 knows that b5's true branch is the taken branch. So we can redirect b1 to b7 directly.
It needs to be pointed out that prove pass can't handle what CL_244737 does, because b5 has multiple predecessors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
9 participants
You can’t perform that action at this time.