Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: restore walkinrange optimization (by moving to SSA) #30645

Open
josharian opened this issue Mar 6, 2019 · 15 comments

Comments

@josharian
Copy link
Contributor

commented Mar 6, 2019

I’m on my phone, but I suspect that CL 165617 disabled the walkinrange optimization. This makes sense—order now rewrites away OANDAND and OOROR nodes, but those are what walkinrange operates on.

One symptom (I believe) is this:

https://go-review.googlesource.com/c/go/+/165617/1/test/checkbce.go#36

We have long wanted to move walkinrange to the SSA backend; this is a good motivation.

I’m marking this as a 1.13 release blocker, since IIRC walkinrange is a significant optimization for some hot code (usually parsing/formatting code looking for bytes in a range).

@josharian

This comment has been minimized.

Copy link
Contributor Author

commented Mar 6, 2019

@josharian

This comment has been minimized.

Copy link
Contributor Author

commented Mar 6, 2019

(This is also an indicator that this optimization needs an explicit asm test.)

@ALTree ALTree changed the title cmd/compile: restore walkingrange optimization (by moving to SSA) cmd/compile: restore walkinrange optimization (by moving to SSA) Mar 7, 2019
@ALTree

This comment has been minimized.

Copy link
Member

commented Mar 7, 2019

I suspect that CL 165617 disabled the walkinrange optimization

It appears that's the case. Given

package p

func f(c uint64) bool {
	return 0x10 <= c && c < 0x20
}

on 178a2c4 (a little before that CL):

RETURN l(4) tc(1)
RETURN-list
.   AS l(4) tc(1)
.   .   NAME-p.~r1 a(true) g(1) l(3) x(8) class(PPARAMOUT) bool
.   .   LT l(4) tc(1) bool
.   .   .   CONVNOP l(4) tc(1) uint64
.   .   .   .   SUB l(4) tc(1) uint64
.   .   .   .   .   NAME-p.c a(true) g(2) l(3) x(0) class(PPARAM) tc(1) used uint64
.   .   .   .   .   LITERAL-16 l(4) tc(1) uint64
.   .   .   LITERAL-16 l(4) tc(1) uint64

on a2ace8e (after):

AS l(4) tc(1)
.   NAME-p..autotmp_2 a(true) l(4) x(0) class(PAUTO) esc(N) tc(1) assigned used bool
.   LE l(4) tc(1) bool
.   .   LITERAL-16 l(4) tc(1) uint64
.   .   NAME-p.c a(true) g(2) l(3) x(0) class(PPARAM) tc(1) used uint64

IF l(4)
.   NAME-p..autotmp_2 a(true) l(4) x(0) class(PAUTO) esc(N) tc(1) assigned used bool
IF-body
.   AS l(4) tc(1)
.   .   NAME-p..autotmp_2 a(true) l(4) x(0) class(PAUTO) esc(N) tc(1) assigned used bool
.   .   LT l(4) tc(1) bool
.   .   .   NAME-p.c a(true) g(2) l(3) x(0) class(PPARAM) tc(1) used uint64
.   .   .   LITERAL-32 l(4) tc(1) uint64

RETURN l(4) tc(1)
RETURN-list
.   AS l(4) tc(1)
.   .   NAME-p.~r1 a(true) g(1) l(3) x(8) class(PPARAMOUT) bool
.   .   NAME-p..autotmp_2 a(true) l(4) x(0) class(PAUTO) esc(N) tc(1) assigned used bool

VARKILL tc(1)
.   NAME-p..autotmp_2 a(true) l(4) x(0) class(PAUTO) esc(N) tc(1) assigned used bool

And the generated code contains two CMPQ instead of one.

@mundaym

This comment has been minimized.

Copy link
Member

commented Mar 7, 2019

Not sure if this is exactly what you're after but I've been tinkering with some simple logical optimizations for control flow graphs (it's very rough code at the moment, I was just experimenting, but it does work):

https://go-review.googlesource.com/c/go/+/165998/

This can do optimizations such as this for disjunctions (and similar optimizations for conjunctions):

x == 1 || x == 2 -> unsigned(x - 1) <= 1
x == 4 || x == 6 -> (x | 2) == 6
x != 0 || y != 0 -> (x | y) != 0

It also has some optimizations to clean up some uses of IsSliceInBounds which maybe shouldn't have been generated in the first place:

x >= 0 && IsSliceInBounds(x, y) -> IsSliceInBounds(x, y)

I scheduled it after prove since I was a bit worried it might obscure some facts.

Unfortunately it is quite unpleasant adding new optimizations, especially if you want to add handle multiple bit widths. So I was thinking of extending the rewrite rules a bit so you could write something like this and have a new mergeSuccessor function handle all the gnarly stuff around assessing cost/mergability and value/phi updating:

// x < 0 || y < 0 -> (x|y) < 0
(If (Less(64|32|16|8) x (Const(64|32|16|8) [0])) (If (Less(64|32|16|8) y (Const(64|32|16|8) [0]) yes no) no)
        && mergeSuccessor(b, 0)
        -> (If (Less(64|32|16|8) (Or(64|32|16|8) x y) (Const(64|32|16|8) [0]) yes no)
@josharian

This comment has been minimized.

Copy link
Contributor Author

commented Mar 7, 2019

Not sure if this is exactly what you're after

Yes, indeed, something like this.

I scheduled it after prove since I was a bit worried it might obscure some facts.

Makes sense. The right place to document that is an entry in passOrder.

Unfortunately it is quite unpleasant adding new optimizations

Indeed. That’s why I wrote this optimization during walk the first time around. :)

I was thinking of extending the rewrite rules a bit

I’m AFK for a while (which is also why I haven’t replied to your comments on the unsafe.Unreachable issue), and it’s not quite obvious to me whether this would work in practice. It’d be lovely if it did. Perhaps we could also move the CMOV optimizations into such a form as well.

In a pinch we could also create a new kind of rewrite rule + rulegen + runner for this kind of optimization, one geared specifically to this kind of optimization.

josharian added a commit to josharian/go that referenced this issue May 7, 2019
An acceptRange is used to check whether a byte falls in a range.
The existing code used the naive way: lo <= x <= hi.
Because bytes are unsigned, this can be rewritten as x-lo <= hi-lo.

The compiler does this optimization (modulo golang#30645),
but only when lo and hi are constants.
In this case, lo and hi are effectively constants,
but the compiler can't see that.

Change-Id: I9a4a87393915168bd59e28c8048e85e4c688aca5
@andybons

This comment has been minimized.

Copy link
Member

commented May 16, 2019

@josharian would you still like to do this for 1.13?

@josharian

This comment has been minimized.

Copy link
Contributor Author

commented May 16, 2019

I think this is important to do for 1.13, yes. It looked to me like maybe @mundaym was going to do it. I guess I'll take a look at it.

@gopherbot

This comment has been minimized.

Copy link

commented May 20, 2019

Change https://golang.org/cl/178197 mentions this issue: cmd/compile: optimize If blocks with ConstBool phi args

@josharian

This comment has been minimized.

Copy link
Contributor Author

commented May 27, 2019

I'm going to punt this to 1.14 after all. I finally have a working version, based very loosely on @mundaym's work. But it is definitely non-trivial, and Keith would (very reasonably) prefer to wait for 1.14 for CL 178197, on which my CL depends. I'll see about cleaning up and mailing my CL sooner rather than later, so it can be ready when the tree re-opens or in case of need.

@josharian josharian modified the milestones: Go1.13, Go1.14 May 27, 2019
gopherbot pushed a commit that referenced this issue Aug 27, 2019
While working on #30645, I noticed that many instances
in which the walkinrange optimization could apply
were not even being considered.

This was because of extraneous blocks in the CFG,
of the type that shortcircuit normally removes.

The change improves the shortcircuit pass to handle
most of those cases. (There are a few that can only be
reasonably detected later in compilation, after other
optimizations have been run, but not enough to be worth chasing.)

Notable changes:

* Instead of calculating live-across-blocks values, use v.Uses == 1.
  This is cheaper and more straightforward.
  v.Uses did not exist when this pass was initially written.
* Incorporate a fusePlain and loop until stable.
  This is necessary to find many of the instances.
* Allow Copy and Not wrappers around Phi values.
  This significantly increases effectiveness.
* Allow removal of all preds, creating a dead block.
  The previous pass stopped unnecessarily at one pred.
* Use phielimValue during cleanup instead of manually
  setting the op to OpCopy.

The result is marginally faster compilation and smaller code.

name        old time/op       new time/op       delta
Template          213ms ± 2%        212ms ± 2%  -0.63%  (p=0.002 n=49+48)
Unicode          90.0ms ± 2%       89.8ms ± 2%    ~     (p=0.122 n=48+48)
GoTypes           710ms ± 3%        711ms ± 2%    ~     (p=0.433 n=45+49)
Compiler          3.23s ± 2%        3.22s ± 2%    ~     (p=0.124 n=47+49)
SSA               10.0s ± 1%        10.0s ± 1%  -0.43%  (p=0.000 n=48+50)
Flate             135ms ± 3%        135ms ± 2%    ~     (p=0.311 n=49+49)
GoParser          158ms ± 2%        158ms ± 2%    ~     (p=0.757 n=48+48)
Reflect           447ms ± 2%        447ms ± 2%    ~     (p=0.815 n=49+48)
Tar               189ms ± 2%        189ms ± 3%    ~     (p=0.530 n=47+49)
XML               251ms ± 3%        250ms ± 1%  -0.75%  (p=0.002 n=49+48)
[Geo mean]        427ms             426ms       -0.25%

name        old user-time/op  new user-time/op  delta
Template          265ms ± 2%        265ms ± 2%    ~     (p=0.969 n=48+50)
Unicode           119ms ± 6%        119ms ± 6%    ~     (p=0.738 n=50+50)
GoTypes           923ms ± 2%        925ms ± 2%    ~     (p=0.057 n=43+47)
Compiler          4.37s ± 2%        4.37s ± 2%    ~     (p=0.691 n=50+46)
SSA               13.4s ± 1%        13.4s ± 1%    ~     (p=0.282 n=42+49)
Flate             162ms ± 2%        162ms ± 2%    ~     (p=0.774 n=48+50)
GoParser          186ms ± 2%        186ms ± 3%    ~     (p=0.213 n=47+47)
Reflect           572ms ± 2%        573ms ± 3%    ~     (p=0.303 n=50+49)
Tar               240ms ± 3%        240ms ± 2%    ~     (p=0.939 n=46+44)
XML               302ms ± 2%        302ms ± 2%    ~     (p=0.399 n=47+47)
[Geo mean]        540ms             541ms       +0.07%

name        old alloc/op      new alloc/op      delta
Template         36.8MB ± 0%       36.7MB ± 0%  -0.42%  (p=0.008 n=5+5)
Unicode          28.1MB ± 0%       28.1MB ± 0%    ~     (p=0.151 n=5+5)
GoTypes           124MB ± 0%        124MB ± 0%  -0.26%  (p=0.008 n=5+5)
Compiler          571MB ± 0%        566MB ± 0%  -0.84%  (p=0.008 n=5+5)
SSA              1.86GB ± 0%       1.85GB ± 0%  -0.58%  (p=0.008 n=5+5)
Flate            22.8MB ± 0%       22.8MB ± 0%  -0.17%  (p=0.008 n=5+5)
GoParser         27.3MB ± 0%       27.3MB ± 0%  -0.20%  (p=0.008 n=5+5)
Reflect          79.5MB ± 0%       79.3MB ± 0%  -0.20%  (p=0.008 n=5+5)
Tar              34.7MB ± 0%       34.6MB ± 0%  -0.42%  (p=0.008 n=5+5)
XML              45.4MB ± 0%       45.3MB ± 0%  -0.29%  (p=0.008 n=5+5)
[Geo mean]       80.0MB            79.7MB       -0.34%

name        old allocs/op     new allocs/op     delta
Template           378k ± 0%         377k ± 0%  -0.22%  (p=0.008 n=5+5)
Unicode            339k ± 0%         339k ± 0%    ~     (p=0.643 n=5+5)
GoTypes           1.36M ± 0%        1.36M ± 0%  -0.10%  (p=0.008 n=5+5)
Compiler          5.51M ± 0%        5.50M ± 0%  -0.13%  (p=0.008 n=5+5)
SSA               17.5M ± 0%        17.5M ± 0%  -0.14%  (p=0.008 n=5+5)
Flate              234k ± 0%         234k ± 0%  -0.04%  (p=0.008 n=5+5)
GoParser           299k ± 0%         299k ± 0%  -0.05%  (p=0.008 n=5+5)
Reflect            978k ± 0%         979k ± 0%  +0.02%  (p=0.016 n=5+5)
Tar                351k ± 0%         351k ± 0%  -0.04%  (p=0.008 n=5+5)
XML                435k ± 0%         435k ± 0%  -0.11%  (p=0.008 n=5+5)
[Geo mean]         840k              840k       -0.08%

file      before    after     Δ       %
go        14794788  14770212  -24576  -0.166%
addr2line 4203688   4199592   -4096   -0.097%
api       5954056   5941768   -12288  -0.206%
asm       4862704   4846320   -16384  -0.337%
cgo       4778920   4770728   -8192   -0.171%
compile   24001568  23923792  -77776  -0.324%
cover     5198440   5190248   -8192   -0.158%
dist      3595248   3587056   -8192   -0.228%
doc       4618504   4610312   -8192   -0.177%
fix       3337416   3333320   -4096   -0.123%
link      6120408   6116312   -4096   -0.067%
nm        4149064   4140872   -8192   -0.197%
objdump   4555608   4547416   -8192   -0.180%
pprof     14616324  14595844  -20480  -0.140%
test2json 2766328   2762232   -4096   -0.148%
trace     11638844  11622460  -16384  -0.141%
vet       8274936   8258552   -16384  -0.198%
total     132520780 132270972 -249808 -0.189%

Change-Id: Ifcd235a2a6e5f13ed5c93e62523e2ef61321fccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/178197
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
tomocy added a commit to tomocy/go that referenced this issue Sep 1, 2019
While working on golang#30645, I noticed that many instances
in which the walkinrange optimization could apply
were not even being considered.

This was because of extraneous blocks in the CFG,
of the type that shortcircuit normally removes.

The change improves the shortcircuit pass to handle
most of those cases. (There are a few that can only be
reasonably detected later in compilation, after other
optimizations have been run, but not enough to be worth chasing.)

Notable changes:

* Instead of calculating live-across-blocks values, use v.Uses == 1.
  This is cheaper and more straightforward.
  v.Uses did not exist when this pass was initially written.
* Incorporate a fusePlain and loop until stable.
  This is necessary to find many of the instances.
* Allow Copy and Not wrappers around Phi values.
  This significantly increases effectiveness.
* Allow removal of all preds, creating a dead block.
  The previous pass stopped unnecessarily at one pred.
* Use phielimValue during cleanup instead of manually
  setting the op to OpCopy.

The result is marginally faster compilation and smaller code.

name        old time/op       new time/op       delta
Template          213ms ± 2%        212ms ± 2%  -0.63%  (p=0.002 n=49+48)
Unicode          90.0ms ± 2%       89.8ms ± 2%    ~     (p=0.122 n=48+48)
GoTypes           710ms ± 3%        711ms ± 2%    ~     (p=0.433 n=45+49)
Compiler          3.23s ± 2%        3.22s ± 2%    ~     (p=0.124 n=47+49)
SSA               10.0s ± 1%        10.0s ± 1%  -0.43%  (p=0.000 n=48+50)
Flate             135ms ± 3%        135ms ± 2%    ~     (p=0.311 n=49+49)
GoParser          158ms ± 2%        158ms ± 2%    ~     (p=0.757 n=48+48)
Reflect           447ms ± 2%        447ms ± 2%    ~     (p=0.815 n=49+48)
Tar               189ms ± 2%        189ms ± 3%    ~     (p=0.530 n=47+49)
XML               251ms ± 3%        250ms ± 1%  -0.75%  (p=0.002 n=49+48)
[Geo mean]        427ms             426ms       -0.25%

name        old user-time/op  new user-time/op  delta
Template          265ms ± 2%        265ms ± 2%    ~     (p=0.969 n=48+50)
Unicode           119ms ± 6%        119ms ± 6%    ~     (p=0.738 n=50+50)
GoTypes           923ms ± 2%        925ms ± 2%    ~     (p=0.057 n=43+47)
Compiler          4.37s ± 2%        4.37s ± 2%    ~     (p=0.691 n=50+46)
SSA               13.4s ± 1%        13.4s ± 1%    ~     (p=0.282 n=42+49)
Flate             162ms ± 2%        162ms ± 2%    ~     (p=0.774 n=48+50)
GoParser          186ms ± 2%        186ms ± 3%    ~     (p=0.213 n=47+47)
Reflect           572ms ± 2%        573ms ± 3%    ~     (p=0.303 n=50+49)
Tar               240ms ± 3%        240ms ± 2%    ~     (p=0.939 n=46+44)
XML               302ms ± 2%        302ms ± 2%    ~     (p=0.399 n=47+47)
[Geo mean]        540ms             541ms       +0.07%

name        old alloc/op      new alloc/op      delta
Template         36.8MB ± 0%       36.7MB ± 0%  -0.42%  (p=0.008 n=5+5)
Unicode          28.1MB ± 0%       28.1MB ± 0%    ~     (p=0.151 n=5+5)
GoTypes           124MB ± 0%        124MB ± 0%  -0.26%  (p=0.008 n=5+5)
Compiler          571MB ± 0%        566MB ± 0%  -0.84%  (p=0.008 n=5+5)
SSA              1.86GB ± 0%       1.85GB ± 0%  -0.58%  (p=0.008 n=5+5)
Flate            22.8MB ± 0%       22.8MB ± 0%  -0.17%  (p=0.008 n=5+5)
GoParser         27.3MB ± 0%       27.3MB ± 0%  -0.20%  (p=0.008 n=5+5)
Reflect          79.5MB ± 0%       79.3MB ± 0%  -0.20%  (p=0.008 n=5+5)
Tar              34.7MB ± 0%       34.6MB ± 0%  -0.42%  (p=0.008 n=5+5)
XML              45.4MB ± 0%       45.3MB ± 0%  -0.29%  (p=0.008 n=5+5)
[Geo mean]       80.0MB            79.7MB       -0.34%

name        old allocs/op     new allocs/op     delta
Template           378k ± 0%         377k ± 0%  -0.22%  (p=0.008 n=5+5)
Unicode            339k ± 0%         339k ± 0%    ~     (p=0.643 n=5+5)
GoTypes           1.36M ± 0%        1.36M ± 0%  -0.10%  (p=0.008 n=5+5)
Compiler          5.51M ± 0%        5.50M ± 0%  -0.13%  (p=0.008 n=5+5)
SSA               17.5M ± 0%        17.5M ± 0%  -0.14%  (p=0.008 n=5+5)
Flate              234k ± 0%         234k ± 0%  -0.04%  (p=0.008 n=5+5)
GoParser           299k ± 0%         299k ± 0%  -0.05%  (p=0.008 n=5+5)
Reflect            978k ± 0%         979k ± 0%  +0.02%  (p=0.016 n=5+5)
Tar                351k ± 0%         351k ± 0%  -0.04%  (p=0.008 n=5+5)
XML                435k ± 0%         435k ± 0%  -0.11%  (p=0.008 n=5+5)
[Geo mean]         840k              840k       -0.08%

file      before    after     Δ       %
go        14794788  14770212  -24576  -0.166%
addr2line 4203688   4199592   -4096   -0.097%
api       5954056   5941768   -12288  -0.206%
asm       4862704   4846320   -16384  -0.337%
cgo       4778920   4770728   -8192   -0.171%
compile   24001568  23923792  -77776  -0.324%
cover     5198440   5190248   -8192   -0.158%
dist      3595248   3587056   -8192   -0.228%
doc       4618504   4610312   -8192   -0.177%
fix       3337416   3333320   -4096   -0.123%
link      6120408   6116312   -4096   -0.067%
nm        4149064   4140872   -8192   -0.197%
objdump   4555608   4547416   -8192   -0.180%
pprof     14616324  14595844  -20480  -0.140%
test2json 2766328   2762232   -4096   -0.148%
trace     11638844  11622460  -16384  -0.141%
vet       8274936   8258552   -16384  -0.198%
total     132520780 132270972 -249808 -0.189%

Change-Id: Ifcd235a2a6e5f13ed5c93e62523e2ef61321fccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/178197
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
t4n6a1ka added a commit to t4n6a1ka/go that referenced this issue Sep 5, 2019
While working on golang#30645, I noticed that many instances
in which the walkinrange optimization could apply
were not even being considered.

This was because of extraneous blocks in the CFG,
of the type that shortcircuit normally removes.

The change improves the shortcircuit pass to handle
most of those cases. (There are a few that can only be
reasonably detected later in compilation, after other
optimizations have been run, but not enough to be worth chasing.)

Notable changes:

* Instead of calculating live-across-blocks values, use v.Uses == 1.
  This is cheaper and more straightforward.
  v.Uses did not exist when this pass was initially written.
* Incorporate a fusePlain and loop until stable.
  This is necessary to find many of the instances.
* Allow Copy and Not wrappers around Phi values.
  This significantly increases effectiveness.
* Allow removal of all preds, creating a dead block.
  The previous pass stopped unnecessarily at one pred.
* Use phielimValue during cleanup instead of manually
  setting the op to OpCopy.

The result is marginally faster compilation and smaller code.

name        old time/op       new time/op       delta
Template          213ms ± 2%        212ms ± 2%  -0.63%  (p=0.002 n=49+48)
Unicode          90.0ms ± 2%       89.8ms ± 2%    ~     (p=0.122 n=48+48)
GoTypes           710ms ± 3%        711ms ± 2%    ~     (p=0.433 n=45+49)
Compiler          3.23s ± 2%        3.22s ± 2%    ~     (p=0.124 n=47+49)
SSA               10.0s ± 1%        10.0s ± 1%  -0.43%  (p=0.000 n=48+50)
Flate             135ms ± 3%        135ms ± 2%    ~     (p=0.311 n=49+49)
GoParser          158ms ± 2%        158ms ± 2%    ~     (p=0.757 n=48+48)
Reflect           447ms ± 2%        447ms ± 2%    ~     (p=0.815 n=49+48)
Tar               189ms ± 2%        189ms ± 3%    ~     (p=0.530 n=47+49)
XML               251ms ± 3%        250ms ± 1%  -0.75%  (p=0.002 n=49+48)
[Geo mean]        427ms             426ms       -0.25%

name        old user-time/op  new user-time/op  delta
Template          265ms ± 2%        265ms ± 2%    ~     (p=0.969 n=48+50)
Unicode           119ms ± 6%        119ms ± 6%    ~     (p=0.738 n=50+50)
GoTypes           923ms ± 2%        925ms ± 2%    ~     (p=0.057 n=43+47)
Compiler          4.37s ± 2%        4.37s ± 2%    ~     (p=0.691 n=50+46)
SSA               13.4s ± 1%        13.4s ± 1%    ~     (p=0.282 n=42+49)
Flate             162ms ± 2%        162ms ± 2%    ~     (p=0.774 n=48+50)
GoParser          186ms ± 2%        186ms ± 3%    ~     (p=0.213 n=47+47)
Reflect           572ms ± 2%        573ms ± 3%    ~     (p=0.303 n=50+49)
Tar               240ms ± 3%        240ms ± 2%    ~     (p=0.939 n=46+44)
XML               302ms ± 2%        302ms ± 2%    ~     (p=0.399 n=47+47)
[Geo mean]        540ms             541ms       +0.07%

name        old alloc/op      new alloc/op      delta
Template         36.8MB ± 0%       36.7MB ± 0%  -0.42%  (p=0.008 n=5+5)
Unicode          28.1MB ± 0%       28.1MB ± 0%    ~     (p=0.151 n=5+5)
GoTypes           124MB ± 0%        124MB ± 0%  -0.26%  (p=0.008 n=5+5)
Compiler          571MB ± 0%        566MB ± 0%  -0.84%  (p=0.008 n=5+5)
SSA              1.86GB ± 0%       1.85GB ± 0%  -0.58%  (p=0.008 n=5+5)
Flate            22.8MB ± 0%       22.8MB ± 0%  -0.17%  (p=0.008 n=5+5)
GoParser         27.3MB ± 0%       27.3MB ± 0%  -0.20%  (p=0.008 n=5+5)
Reflect          79.5MB ± 0%       79.3MB ± 0%  -0.20%  (p=0.008 n=5+5)
Tar              34.7MB ± 0%       34.6MB ± 0%  -0.42%  (p=0.008 n=5+5)
XML              45.4MB ± 0%       45.3MB ± 0%  -0.29%  (p=0.008 n=5+5)
[Geo mean]       80.0MB            79.7MB       -0.34%

name        old allocs/op     new allocs/op     delta
Template           378k ± 0%         377k ± 0%  -0.22%  (p=0.008 n=5+5)
Unicode            339k ± 0%         339k ± 0%    ~     (p=0.643 n=5+5)
GoTypes           1.36M ± 0%        1.36M ± 0%  -0.10%  (p=0.008 n=5+5)
Compiler          5.51M ± 0%        5.50M ± 0%  -0.13%  (p=0.008 n=5+5)
SSA               17.5M ± 0%        17.5M ± 0%  -0.14%  (p=0.008 n=5+5)
Flate              234k ± 0%         234k ± 0%  -0.04%  (p=0.008 n=5+5)
GoParser           299k ± 0%         299k ± 0%  -0.05%  (p=0.008 n=5+5)
Reflect            978k ± 0%         979k ± 0%  +0.02%  (p=0.016 n=5+5)
Tar                351k ± 0%         351k ± 0%  -0.04%  (p=0.008 n=5+5)
XML                435k ± 0%         435k ± 0%  -0.11%  (p=0.008 n=5+5)
[Geo mean]         840k              840k       -0.08%

file      before    after     Δ       %
go        14794788  14770212  -24576  -0.166%
addr2line 4203688   4199592   -4096   -0.097%
api       5954056   5941768   -12288  -0.206%
asm       4862704   4846320   -16384  -0.337%
cgo       4778920   4770728   -8192   -0.171%
compile   24001568  23923792  -77776  -0.324%
cover     5198440   5190248   -8192   -0.158%
dist      3595248   3587056   -8192   -0.228%
doc       4618504   4610312   -8192   -0.177%
fix       3337416   3333320   -4096   -0.123%
link      6120408   6116312   -4096   -0.067%
nm        4149064   4140872   -8192   -0.197%
objdump   4555608   4547416   -8192   -0.180%
pprof     14616324  14595844  -20480  -0.140%
test2json 2766328   2762232   -4096   -0.148%
trace     11638844  11622460  -16384  -0.141%
vet       8274936   8258552   -16384  -0.198%
total     132520780 132270972 -249808 -0.189%

Change-Id: Ifcd235a2a6e5f13ed5c93e62523e2ef61321fccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/178197
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@randall77

This comment has been minimized.

Copy link
Contributor

commented Oct 15, 2019

@mundaym @josharian Were either of you guys planning on submitting something here?

@mundaym

This comment has been minimized.

Copy link
Member

commented Oct 15, 2019

I'd like to make block rewrite rules more powerful but I don't think I'll be able to get it done before the freeze because it is quite fiddly (happy to help if someone else wants to take a look). I'm happy to write a pass to do this specific optimization though if that would be useful. Most of the tricky stuff I worked through in the CL I posted previously.

@randall77

This comment has been minimized.

Copy link
Contributor

commented Oct 15, 2019

Just finishing your CL for this specific optimization would be good.

@josharian

This comment has been minimized.

Copy link
Contributor Author

commented Oct 15, 2019

I will also dig up my work in progress and mail it, in case it is helpful. Probably this afternoon.

@gopherbot

This comment has been minimized.

Copy link

commented Oct 15, 2019

Change https://golang.org/cl/201206 mentions this issue: cmd/compile: optimize integer-in-range checks

@josharian

This comment has been minimized.

Copy link
Contributor Author

commented Oct 15, 2019

Mailed CL 201206 in the hopes that it is helpful. I only vaguely remember where I left it, but I'm pretty sure it's not done. Sorry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.