Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: group write barrier calls more aggresively #19838

Closed
cherrymui opened this issue Apr 4, 2017 · 9 comments

Comments

Projects
None yet
6 participants
@cherrymui
Copy link
Contributor

commented Apr 4, 2017

Currently, for things like

v0 = some memory
v1 = Store ...  v0 // write barrier
v2 = Store ...  v1 // no write barrier
v3 = Store ...  v2 // write barrier
v4 = Store ...  v3 // no write barrier
v5 = Store ...  v4 // write barrier

The compiler generates separate branches for v1, v3, v5. Something like:

v0 = some memory
if write barrier is on {
    ... // prepare args
    v1a = Call runtime.writebarrierPtr ...
} else {
    v1b = Store ...  v0 // regular store
}
v1 = Phi v1a v1b
v2 = Store ...  v1 // no write barrier
if write barrier is on {
    ... // prepare args
    v3a = Call runtime.writebarrierPtr ...
} else {
    v3b = Store ...  v2 // regular store
}
v3 = Phi v3a v3b
v4 = Store ...  v3 // no write barrier
if write barrier is on {
    ... // prepare args
    v5a = Call runtime.writebarrierPtr ...
} else {
    v5b = Store ...  v4 // regular store
}
v5 = Phi v5a v5b

Sometimes it creates too many blocks. (When building SSA we try to put pointer writes together exactly for this reason.)

It would be better to group them in a single branch, when it is safe. But it needs to know whether it is ok to reorder stores.

Alternatively, maybe we could duplicate v2, v4 to both side of the branch (some conditions needed, at least v2.Uses == 1). But the question is how many of them we want to duplicate. What if there are 5 no-WB stores between two WB stores?

@randall77

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2017

A simple rule of thumb would be to do whatever generates less total code. So if the source is

storeptr
store * N
storeptr

Compare (some approximation of) the code size between

if wbon {
  storeptrwb
} else {
  storeptr
}
store * N
if wbon {
  storeptrwb
} else {
  storeptr
}

and

if wbon {
   storeptrwb
   store *N
   storeptrwb
} else {
   storeptr
   store *N
   storeptr
}

So basically, if the size of the N stores is smaller than an additional condition (load, compare, conditional branch, unconditional branch).

@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 27, 2017

I've started experimenting with this. I hope to some data tomorrow.

@gopherbot

This comment has been minimized.

Copy link

commented Apr 27, 2017

CL https://golang.org/cl/42010 mentions this issue.

@gopherbot

This comment has been minimized.

Copy link

commented Apr 27, 2017

CL https://golang.org/cl/42011 mentions this issue.

@gopherbot

This comment has been minimized.

Copy link

commented Apr 27, 2017

CL https://golang.org/cl/42013 mentions this issue.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 27, 2017

The CL series just mailed takes a look at this. The final CL needs some numbers. I'd love suggestions about what other than binary size to look at--e.g. regular benchmarks to run.

Unfortunately, my original personal motivation for digging into this--improving compilation time of large static literals--isn't helped, since that involves duplicating calls, not stores, and that's probably a bad idea. So back to the drawing board on those.

gopherbot pushed a commit that referenced this issue Apr 27, 2017

cmd/compile: minor writebarrier cleanup
This CL mainly moves some work to the switch on w.Op,
to make a follow-up change simpler and clearer.

Updates #19838

Change-Id: I86f3181c380dd60960afcc24224f655276b8956c
Reviewed-on: https://go-review.googlesource.com/42010
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>

josharian added a commit to josharian/go that referenced this issue May 2, 2017

cmd/compile: allow OpVarXXX calls to be duplicated in writebarrier bl…
…ocks

OpVarXXX Values don't generate instructions,
so there's no reason not to duplicate them,
and duplicating them generates better code
(fewer branches).

This requires changing the start/end accounting
to correctly handle the case in which we have run
of Values beginning with an OpVarXXX, e.g.
OpVarDef, OpZeroWB, OpMoveWB.
In that case, the sequence of values should begin
at the OpZeroWB, not the OpVarDef.

This also lays the groundwork for experimenting
with allowing duplication of some scalar stores.

Shrinks function text sizes a tiny amount:

name        old object-bytes  new object-bytes  delta
Template           381k ± 0%         381k ± 0%  -0.01%  (p=0.008 n=5+5)
Unicode            203k ± 0%         203k ± 0%  -0.04%  (p=0.008 n=5+5)
GoTypes           1.17M ± 0%        1.17M ± 0%  -0.01%  (p=0.008 n=5+5)
SSA               8.24M ± 0%        8.24M ± 0%  -0.00%  (p=0.008 n=5+5)
Flate              230k ± 0%         230k ± 0%    ~     (all equal)
GoParser           286k ± 0%         286k ± 0%    ~     (all equal)
Reflect           1.00M ± 0%        1.00M ± 0%    ~     (all equal)
Tar                189k ± 0%         189k ± 0%    ~     (all equal)
XML                415k ± 0%         415k ± 0%  -0.01%  (p=0.008 n=5+5)

Updates golang#19838

Change-Id: Ic5ef30855919f1468066eba08ae5c4bd9a01db27

josharian added a commit to josharian/go that referenced this issue May 2, 2017

cmd/compile: duplicate some scalar stores in wb blocks
DO NOT SUBMIT

[mailing for initial feedback, needs benchmark numbers]

Duplicate some scalar stores on both halves of a
writeBarrier.enabled block, to reduce branching.

There are other memory ops besides OpVarXXX and OpStore
that we could consider here: OpMove, OpZero, Op*Call,
atomic stores, OpSelect1, OpPhi.

However, OpVarXXX and OpStore are by far the most common,
and they correspond to a fixed number of instructions.

The others are all rare and are either tricky
(OpSelect1, OpPhi, atomic stores) or
correspond to potentially lots of code.

Fixes golang#19838

Change-Id: I0d5a68c2dfb3b6a6916b92151b3e64df533d4581

josharian added a commit to josharian/go that referenced this issue May 2, 2017

cmd/compile: minor writebarrier cleanup
This CL mainly moves some work to the switch on w.Op,
to make a follow-up change simpler and clearer.

Updates golang#19838

Change-Id: I86f3181c380dd60960afcc24224f655276b8956c

gopherbot pushed a commit that referenced this issue May 9, 2017

cmd/compile: allow OpVarXXX calls to be duplicated in writebarrier bl…
…ocks

OpVarXXX Values don't generate instructions,
so there's no reason not to duplicate them,
and duplicating them generates better code
(fewer branches).

This requires changing the start/end accounting
to correctly handle the case in which we have run
of Values beginning with an OpVarXXX, e.g.
OpVarDef, OpZeroWB, OpMoveWB.
In that case, the sequence of values should begin
at the OpZeroWB, not the OpVarDef.

This also lays the groundwork for experimenting
with allowing duplication of some scalar stores.

Shrinks function text sizes a tiny amount:

name        old object-bytes  new object-bytes  delta
Template           381k ± 0%         381k ± 0%  -0.01%  (p=0.008 n=5+5)
Unicode            203k ± 0%         203k ± 0%  -0.04%  (p=0.008 n=5+5)
GoTypes           1.17M ± 0%        1.17M ± 0%  -0.01%  (p=0.008 n=5+5)
SSA               8.24M ± 0%        8.24M ± 0%  -0.00%  (p=0.008 n=5+5)
Flate              230k ± 0%         230k ± 0%    ~     (all equal)
GoParser           286k ± 0%         286k ± 0%    ~     (all equal)
Reflect           1.00M ± 0%        1.00M ± 0%    ~     (all equal)
Tar                189k ± 0%         189k ± 0%    ~     (all equal)
XML                415k ± 0%         415k ± 0%  -0.01%  (p=0.008 n=5+5)

Updates #19838

Change-Id: Ic5ef30855919f1468066eba08ae5c4bd9a01db27
Reviewed-on: https://go-review.googlesource.com/42011
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>

josharian added a commit to josharian/go that referenced this issue May 10, 2017

cmd/compile: allow OpVarXXX calls to be duplicated in writebarrier bl…
…ocks

OpVarXXX Values don't generate instructions,
so there's no reason not to duplicate them,
and duplicating them generates better code
(fewer branches).

This requires changing the start/end accounting
to correctly handle the case in which we have run
of Values beginning with an OpVarXXX, e.g.
OpVarDef, OpZeroWB, OpMoveWB.
In that case, the sequence of values should begin
at the OpZeroWB, not the OpVarDef.

This also lays the groundwork for experimenting
with allowing duplication of some scalar stores.

Shrinks function text sizes a tiny amount:

name        old object-bytes  new object-bytes  delta
Template           381k ± 0%         381k ± 0%  -0.01%  (p=0.008 n=5+5)
Unicode            203k ± 0%         203k ± 0%  -0.04%  (p=0.008 n=5+5)
GoTypes           1.17M ± 0%        1.17M ± 0%  -0.01%  (p=0.008 n=5+5)
SSA               8.24M ± 0%        8.24M ± 0%  -0.00%  (p=0.008 n=5+5)
Flate              230k ± 0%         230k ± 0%    ~     (all equal)
GoParser           286k ± 0%         286k ± 0%    ~     (all equal)
Reflect           1.00M ± 0%        1.00M ± 0%    ~     (all equal)
Tar                189k ± 0%         189k ± 0%    ~     (all equal)
XML                415k ± 0%         415k ± 0%  -0.01%  (p=0.008 n=5+5)

Updates golang#19838

Change-Id: Ic5ef30855919f1468066eba08ae5c4bd9a01db27

josharian added a commit to josharian/go that referenced this issue May 10, 2017

cmd/compile: duplicate some scalar stores in wb blocks
DO NOT SUBMIT

[mailing for initial feedback, needs benchmark numbers]

Duplicate some scalar stores on both halves of a
writeBarrier.enabled block, to reduce branching.

There are other memory ops besides OpVarXXX and OpStore
that we could consider here: OpMove, OpZero, Op*Call,
atomic stores, OpSelect1, OpPhi.

However, OpVarXXX and OpStore are by far the most common,
and they correspond to a fixed number of instructions.

The others are all rare and are either tricky
(OpSelect1, OpPhi, atomic stores) or
correspond to potentially lots of code.

Fixes golang#19838

Change-Id: I0d5a68c2dfb3b6a6916b92151b3e64df533d4581
@josharian

This comment has been minimized.

Copy link
Contributor

commented May 15, 2017

Anything more on this front should wait for 1.10.

@josharian josharian modified the milestones: Go1.10, Go1.9 May 15, 2017

@bradfitz bradfitz modified the milestones: Go1.10, Go1.11 Nov 28, 2017

@bradfitz bradfitz modified the milestones: Go1.11, Unplanned May 18, 2018

@aclements

This comment has been minimized.

Copy link
Member

commented May 18, 2018

This optimization would interact poorly with safe-points everywhere, since we have to disallow safe-points between the write barrier flag check and the last write barrier that depends on it (or, if we can construct the write barrier as a restartable sequence, back up to the flag load, but that doesn't help this optimization). Though I think this optimization may be less important now, too, since we generate significantly less code for a write barrier call than we used to. We could generate even less if we pulled the write out of runtime.gcWriteBarrier.

@josharian

This comment has been minimized.

Copy link
Contributor

commented May 18, 2018

I never got conclusive improvements when I implemented this. Let’s focus any effort on this front on improving write barriers generally. (E.g. #24686)

@josharian josharian closed this May 18, 2018

@golang golang locked and limited conversation to collaborators May 18, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.