cmd/compile: group write barrier calls more aggressively #19838
Currently, for things like
The compiler generates separate branches for v1, v3, v5. Something like:
Sometimes it creates too many blocks. (When building SSA we try to put pointer writes together exactly for this reason.)
It would be better to group them in a single branch, when it is safe. But it needs to know whether it is ok to reorder stores.
Alternatively, maybe we could duplicate v2, v4 to both sides of the branch (some conditions needed, at least v2.Uses == 1). But the question is how many of them we want to duplicate. What if there are 5 no-WB stores between two WB stores?
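To make the duplication idea concrete, here is a rough model in plain Go of the two code shapes. It is only a sketch: `writeBarrierEnabled`, `barrierOn`, `wbStore`, and the struct `T` are hypothetical stand-ins invented for illustration, not the runtime's or the compiler's actual names, and the real transformation happens on SSA values rather than on source code.

```go
package main

// Hypothetical model of the generated code shape; none of these names
// come from the compiler or runtime.

// writeBarrierEnabled stands in for the runtime's write barrier flag.
var writeBarrierEnabled bool

// flagChecks counts how many times the flag is loaded and tested.
var flagChecks int

func barrierOn() bool {
	flagChecks++
	return writeBarrierEnabled
}

// wbStore stands in for a write barrier call that also performs the store.
func wbStore(dst **int, p *int) { *dst = p }

type T struct {
	p1 *int // pointer field: store needs a write barrier
	n  int  // non-pointer field: plain store
	p2 *int // pointer field: store needs a write barrier
}

// separate tests the flag once per pointer store, which is roughly the
// shape described above: one branch for each write barrier store.
func separate(t *T, a, b *int) {
	if barrierOn() {
		wbStore(&t.p1, a)
	} else {
		t.p1 = a
	}
	t.n = 42
	if barrierOn() {
		wbStore(&t.p2, b)
	} else {
		t.p2 = b
	}
}

// grouped tests the flag once and duplicates the plain store into both
// sides of the branch.
func grouped(t *T, a, b *int) {
	if barrierOn() {
		wbStore(&t.p1, a)
		t.n = 42
		wbStore(&t.p2, b)
	} else {
		t.p1 = a
		t.n = 42
		t.p2 = b
	}
}
```

In this model `grouped` does one flag check where `separate` does two, at the cost of emitting the `t.n = 42` store twice. That is the trade-off behind the "how many do we want to duplicate" question.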
A simple rule of thumb would be to do whatever generates less total code. So if the source is
Compare (some approximation of) the code size between
So basically, duplicate the N stores if their size is smaller than that of an additional condition (load, compare, conditional branch, unconditional branch).
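As a sketch of that rule of thumb, something like the following could decide whether to duplicate. The function name `shouldGroup`, the constant `condCost`, and the unit of "size" (a rough instruction count) are all assumptions made for illustration; the actual CLs may measure cost differently.

```go
package main

// condCost is an assumed size estimate for one extra write barrier flag
// check: load, compare, conditional branch, unconditional branch.
const condCost = 4

// shouldGroup reports whether duplicating the non-WB stores that sit
// between two WB stores is estimated to be smaller than emitting a second
// flag check. storeSizes holds a hypothetical size estimate per store.
func shouldGroup(storeSizes []int) bool {
	total := 0
	for _, s := range storeSizes {
		total += s
	}
	// Grouping adds one extra copy of the stores but saves one condition.
	return total < condCost
}
```

With these assumed numbers, two one-instruction stores would be duplicated (`shouldGroup([]int{1, 1})` is true), while the five-store case raised earlier would not (`shouldGroup([]int{1, 1, 1, 1, 1})` is false).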
The CL series just mailed takes a look at this. The final CL needs some numbers. I'd love suggestions about what other than binary size to look at--e.g. regular benchmarks to run.
Unfortunately, my original personal motivation for digging into this--improving compilation time of large static literals--isn't helped, since that involves duplicating calls, not stores, and that's probably a bad idea. So back to the drawing board on those.
This optimization would interact poorly with safe-points everywhere, since we have to disallow safe-points between the write barrier flag check and the last write barrier that depends on it (or, if we can construct the write barrier as a restartable sequence, back up to the flag load, but that doesn't help this optimization). Though I think this optimization may be less important now, too, since we generate significantly less code for a write barrier call than we used to. We could generate even less if we pulled the write out of runtime.gcWriteBarrier.