proposal: cmd/compile: appropriately disable garbage collector #24299

Closed
pciet opened this issue Mar 7, 2018 · 13 comments

@pciet
Contributor

commented Mar 7, 2018

Disabling the garbage collector in cmd/compile via debug.SetGCPercent(-1) saves significant time according to compilebench. Since the compile command is used per-package there are cases where the garbage collector is freeing memory the OS is about to free anyway, so this proposal is to define a feature to disable the collector in certain cases for cmd/compile.

name       old time/op       new time/op       delta
Template         172ms ± 1%        162ms ± 1%   -5.88%  (p=0.000 n=9+8)
Unicode         80.2ms ± 2%       74.3ms ± 1%   -7.38%  (p=0.000 n=10+8)
GoTypes          572ms ± 1%        542ms ± 0%   -5.24%  (p=0.000 n=9+9)
Compiler         2.63s ± 1%        2.50s ± 1%   -4.91%  (p=0.000 n=10+10)
SSA              6.67s ± 1%        6.27s ± 0%   -6.00%  (p=0.000 n=10+10)
Flate            111ms ± 1%        105ms ± 3%   -5.44%  (p=0.000 n=9+10)
GoParser         137ms ± 1%        130ms ± 1%   -5.38%  (p=0.000 n=8+9)
Reflect          365ms ± 1%        346ms ± 1%   -5.13%  (p=0.000 n=9+10)
Tar              161ms ± 1%        155ms ± 4%   -3.56%  (p=0.004 n=10+10)
XML              193ms ± 1%        185ms ± 1%   -4.09%  (p=0.000 n=9+9)
StdCmd           16.7s ± 1%        12.8s ± 0%  -23.46%  (p=0.000 n=9+10)


name       old user-time/op  new user-time/op  delta
Template         221ms ± 4%        165ms ± 8%  -25.32%  (p=0.000 n=10+10)
Unicode          112ms ± 7%         77ms ± 7%  -31.07%  (p=0.000 n=10+10)
GoTypes          718ms ± 3%        564ms ± 2%  -21.50%  (p=0.000 n=10+10)
Compiler         3.31s ± 2%        2.60s ± 1%  -21.59%  (p=0.000 n=10+10)
SSA              8.75s ± 2%        6.53s ± 1%  -25.38%  (p=0.000 n=10+10)
Flate            135ms ± 8%        105ms ± 8%  -22.49%  (p=0.000 n=10+10)
GoParser         172ms ± 3%        135ms ± 2%  -21.22%  (p=0.000 n=8+9)
Reflect          448ms ± 3%        350ms ± 2%  -21.92%  (p=0.000 n=9+9)
Tar              202ms ± 9%        160ms ± 3%  -21.01%  (p=0.000 n=10+9)
XML              242ms ± 4%        185ms ± 6%  -23.34%  (p=0.000 n=10+10)


name       old alloc/op      new alloc/op      delta
Template        37.9MB ± 0%       37.9MB ± 0%   -0.03%  (p=0.005 n=10+10)
Unicode         28.8MB ± 0%       28.8MB ± 0%     ~     (p=0.093 n=10+10)
GoTypes          112MB ± 0%        112MB ± 0%   -0.01%  (p=0.029 n=10+10)
Compiler         466MB ± 0%        466MB ± 0%     ~     (p=0.105 n=10+10)
SSA             1.48GB ± 0%       1.48GB ± 0%     ~     (p=0.105 n=10+10)
Flate           24.3MB ± 0%       24.3MB ± 0%   -0.04%  (p=0.002 n=10+10)
GoParser        30.7MB ± 0%       30.7MB ± 0%   -0.04%  (p=0.000 n=9+10)
Reflect         76.3MB ± 0%       76.3MB ± 0%   -0.02%  (p=0.000 n=7+10)
Tar             39.2MB ± 0%       39.2MB ± 0%   -0.03%  (p=0.002 n=10+9)
XML             41.5MB ± 0%       41.4MB ± 0%   -0.02%  (p=0.000 n=10+9)


name       old allocs/op     new allocs/op     delta
Template          385k ± 0%         385k ± 0%   -0.03%  (p=0.000 n=10+10)
Unicode           342k ± 0%         342k ± 0%     ~     (p=0.118 n=10+10)
GoTypes          1.19M ± 0%        1.19M ± 0%   -0.02%  (p=0.000 n=9+10)
Compiler         4.52M ± 0%        4.52M ± 0%   -0.00%  (p=0.000 n=10+10)
SSA              12.2M ± 0%        12.2M ± 0%   -0.00%  (p=0.000 n=9+10)
Flate             234k ± 0%         234k ± 0%   -0.04%  (p=0.000 n=10+10)
GoParser          318k ± 0%         317k ± 0%   -0.03%  (p=0.000 n=10+8)
Reflect           974k ± 0%         974k ± 0%   -0.01%  (p=0.000 n=10+10)
Tar               395k ± 0%         395k ± 0%   -0.03%  (p=0.000 n=10+9)
XML               404k ± 0%         404k ± 0%   -0.02%  (p=0.000 n=10+10)

(with go version devel +1b1c8b3 Sat Feb 17 18:35:41 2018 +0000 linux/amd64, four cores, and 'performance' CPU frequency governor)

Running the benchmark and compiling the Go toolchain worked on an 8GB linux/amd64 computer with the garbage collector disabled.

Two concerns raised in https://groups.google.com/forum/#!topic/golang-dev/atj2hJIJj4o are limited systems such as the Raspberry Pi and large packages that may be created by generating code, but one conclusion there is that a careful cmd/compile change may still be worthwhile.

I plan to report results here from:

  • how low can memory be limited on my 8GB linux/amd64 computer with and without GC enabled
  • adding a large generated code case to compilebench

@gopherbot gopherbot added this to the Proposal milestone Mar 7, 2018

@gopherbot gopherbot added the Proposal label Mar 7, 2018

@ALTree

Member

commented Mar 7, 2018

Potentially unbounded memory growth while compiling for a ~20% speed-up in compilation times for a typical package seems a bad trade-off. If this had the potential to cut compilation times in half it could be worth it, but 20% is not much. CPU-time reduction is also very small, and I wonder if filling up the memory when compiling many packages in parallel could make it even less worthwhile.

And anyway users can already do this with GOGC=off.

> this proposal is to define a feature to disable the collector in certain cases for cmd/compile.

Which cases? It's not clear from the proposal. Like on certain systems? Or when compiling certain packages? Or both?

@pciet

Contributor Author

commented Mar 7, 2018

> Potentially unbounded memory growth while compiling for a ~5% speed-up in compilation times for a typical package seems a bad trade-off. If this had the potential to cut compilation times in half it could be worth it, but 5% is not much.

I may misunderstand, but I think the benchmark means 5% in kernel and 20-25% in application, which is a difference a person would notice.

> Which cases? It's not clear from the proposal. Like on certain systems? Or when compiling certain packages? Or both?

We’re missing data from everything that isn’t linux/amd64. I’d like to try with large open source projects. @ianlancetaylor mentioned very large packages built by generating code. The proposal is to define these cases and the feature that meets all needs. My guess is that disabling it may help 80% of people without any crashes, and in the remaining cases we can re-enable it based on some check.

> And anyway users can already do this with GOGC=off.

Yes, but I’d prefer not to have to worry about that, and a free, noticeable improvement is good for the project.

@ALTree

Member

commented Mar 7, 2018

Would you mind if I labeled this proposal as "on hold" until you have all the data you need to come up with a concrete plan? The proposal process is usually used for concrete proposals with most of the details already worked out.

@pciet

Contributor Author

commented Mar 7, 2018

@ALTree that's fine. I could go back to the golang-dev thread and work through it there too. Thanks.

@agnivade

Member

commented Mar 7, 2018

I am a bit apprehensive about this. IMO, this really feels like a slippery slope. There are a whole lot of cases where disabling GC gives a boost. But changing the compiler to dynamically switch off GC seems like a cop-out to me.

We should optimize the runtime further instead of switching off GC to improve performance, especially when there is already a switch (GOGC) exposed to the user.
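
The existing GOGC switch can be applied to a single build without touching the compiler. The sketch below assumes the `go` command is on PATH and that the compile processes it starts inherit its environment; `withGCOff` is a hypothetical helper, and `./...` is just an example package pattern.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// withGCOff returns a copy of env with GOGC=off appended. os/exec uses the
// last value when a key appears more than once, so this overrides any
// GOGC already present. Hypothetical helper for illustration.
func withGCOff(env []string) []string {
	out := make([]string, 0, len(env)+1)
	out = append(out, env...)
	return append(out, "GOGC=off")
}

func main() {
	cmd := exec.Command("go", "build", "./...")
	cmd.Env = withGCOff(os.Environ())
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "build failed:", err)
		os.Exit(1)
	}
}
```

Note that setting the variable this way also turns the collector off for the `go` command itself, not only for the compiler.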

@ALTree ALTree added the Proposal-Hold label Mar 7, 2018

@pciet

Contributor Author

commented Mar 7, 2018

On the disabling front: what about a goroutine in cmd/compile that periodically (every 100ms?) checks memory usage and, if it is over a platform default that can be adjusted with an environment variable, turns regular garbage collection back on and returns?

@mvdan

Member

commented Mar 7, 2018

There's always making the compiler generate less garbage. For example, at the moment it parses files via cmd/compile/internal/syntax, and translates that AST to cmd/compile/internal/gc's. That results in every AST node being allocated twice, and lots of little objects for the GC to keep track of.

That will eventually be cleaned up, though. I would imagine that once the compiler gets better at generating less garbage, turning the GC off will have less of an impact.

@josharian

Contributor

commented Mar 7, 2018

The long-term plan is indeed to use less memory. Skipping the intermediate AST is one big piece of that. The other is lazy importing, since much of what gets imported is unused. @mdempsky is actively working on the latter, I believe.

@pciet

Contributor Author

commented Mar 9, 2018

> That will eventually be cleaned up, though. I would imagine that once the compiler gets better at generating less garbage, turning the GC off will have less of an impact.

If this isn’t the case by the end of the Go 1.11 development cycle then I think invisibly (no regressions because of memory use changes) disabling the collector for (assumed) widespread 10-30% time reduction is the right move. This would require a Go 1.12 issue to revisit the workaround.

@mvdan

Member

commented Mar 9, 2018

That seems to imply that making the compiler 5% faster is an immediate priority. Sure, the compiler could be faster, and it is made faster every release. But I don't see the need for this kind of urgency, especially when this workaround has many potential downsides. And also since it's already available via GOGC.

For example, if one compiles very large programs, I wouldn't be surprised if turning off the GC doubled the peak memory use of the compiler. Have you measured the downsides to disabling the GC in any way? Also remember the machines that have low RAM - if I remember correctly, even with GC on some ARM builders were having issues with memory.

@pciet

Contributor Author

commented Mar 9, 2018

> That seems to imply that making the compiler 5% faster is an immediate priority.

5% doesn’t seem worth a workaround effort, but 20-30% does to me. I may be misunderstanding the benchmark. The benchmark system is Ubuntu server without a GUI, and I’m assuming the time and user-time add.

> For example, if one compiles very large programs, I wouldn't be surprised if turning off the GC doubled the peak memory use of the compiler. Have you measured the downsides to disabling the GC in any way?

From compilebench I assume the total allocations (without each package’s memory being released to the OS taken into account) generally don’t go beyond a few GB, which is fine for a typical desktop compile. Disabling the GC doesn’t change that number, and I assume most large programs consist of packages that fit within the compilebench constraints.
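
The difference between cumulative allocation and heap size matters for this argument. The sketch below uses `runtime.MemStats` to show that `TotalAlloc` counts every byte ever allocated regardless of the collector, while `HeapAlloc` only stays close to that total when nothing is freed. It is a standalone illustration with hypothetical helpers (`churn`, `measure`); it does not claim to reproduce how compilebench derives alloc/op.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// sink keeps each buffer reachable until the next one replaces it, so
// every allocation is a real heap allocation.
var sink []byte

// churn allocates n buffers of size bytes; each becomes garbage as soon
// as the next one is stored in sink.
func churn(n, size int) {
	for i := 0; i < n; i++ {
		sink = make([]byte, size)
	}
	sink = nil
}

// measure runs churn under the given GC percentage and returns how much
// TotalAlloc (cumulative) and HeapAlloc (currently held) grew.
func measure(gcPercent int) (total, held uint64) {
	old := debug.SetGCPercent(gcPercent)
	defer debug.SetGCPercent(old)

	runtime.GC() // start from a freshly collected heap
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	churn(64, 1<<20) // 64 buffers of 1 MiB
	runtime.ReadMemStats(&after)

	total = after.TotalAlloc - before.TotalAlloc
	if after.HeapAlloc > before.HeapAlloc {
		held = after.HeapAlloc - before.HeapAlloc
	}
	return total, held
}

func main() {
	for _, p := range []int{100, -1} {
		total, held := measure(p)
		fmt.Printf("GC percent %4d: allocated %d MiB in total, heap grew by %d MiB\n",
			p, total>>20, held>>20)
	}
}
```

With the collector off, the held heap tracks the cumulative total, so a package that allocates a few GB over its compile would also need roughly that much memory at its peak.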

> Also remember the machines that have low RAM - if I remember correctly, even with GC on some ARM builders were having issues with memory.

Having a dynamic reenable like I suggested earlier would cover these cases. An idea is that if memory is overused then the memory threshold environment variable could be updated by the toolchain (so all future compiles use that) then the compile could be retried. Worst case the GC is back to always on and the user didn’t see any difference.

@pciet

Contributor Author

commented Mar 11, 2018

linux/amd64 with 1 GB of memory (kernel flag mem=1G) and spinning hard drive:

name       old time/op       new time/op       delta
Template         208ms ±12%        228ms ±11%     +9.93%  (p=0.010 n=10+9)
Unicode          109ms ±37%        128ms ±65%       ~     (p=0.218 n=10+10)
GoTypes          592ms ± 4%        589ms ± 5%       ~     (p=0.661 n=10+9)
Compiler         2.74s ± 4%        2.70s ± 3%       ~     (p=0.095 n=10+9)
SSA              6.79s ± 2%      161.30s ±14%  +2276.02%  (p=0.000 n=10+10)
Flate            111ms ± 6%        600ms ±20%   +439.18%  (p=0.000 n=8+10)
GoParser         144ms ±17%        293ms ± 7%   +103.54%  (p=0.000 n=10+10)
Reflect          378ms ± 4%        395ms ± 7%     +4.27%  (p=0.043 n=10+9)
Tar              180ms ± 1%        257ms ±10%    +43.12%  (p=0.000 n=8+9)
XML              215ms ±10%        230ms ± 1%       ~     (p=0.408 n=10+8)
StdCmd           27.2s ±21%       153.6s ±17%   +465.76%  (p=0.000 n=8+10)

name       old user-time/op  new user-time/op  delta
Template         221ms ± 4%        168ms ±10%    -24.10%  (p=0.000 n=9+10)
Unicode          114ms ± 6%         78ms ± 8%    -31.47%  (p=0.000 n=10+10)
GoTypes          720ms ± 2%        560ms ± 4%    -22.12%  (p=0.000 n=10+10)
Compiler         3.26s ± 1%        2.60s ± 2%    -20.10%  (p=0.000 n=9+10)
SSA              8.74s ± 1%        7.59s ± 4%    -13.20%  (p=0.000 n=10+10)
Flate            139ms ± 4%        113ms ±19%    -18.37%  (p=0.000 n=9+10)
GoParser         174ms ± 3%        139ms ± 5%    -19.82%  (p=0.000 n=10+10)
Reflect          454ms ± 0%        353ms ± 7%    -22.24%  (p=0.000 n=9+10)
Tar              208ms ± 2%        161ms ± 4%    -22.84%  (p=0.000 n=10+10)
XML              242ms ± 1%        193ms ± 3%    -20.04%  (p=0.000 n=9+9)

name       old alloc/op      new alloc/op      delta
Template        37.9MB ± 0%       37.9MB ± 0%     -0.03%  (p=0.002 n=10+10)
Unicode         28.8MB ± 0%       28.8MB ± 0%     -0.01%  (p=0.015 n=10+10)
GoTypes          112MB ± 0%        112MB ± 0%       ~     (p=0.113 n=9+10)
Compiler         466MB ± 0%        466MB ± 0%     -0.01%  (p=0.003 n=9+10)
SSA             1.48GB ± 0%       1.48GB ± 0%       ~     (p=0.093 n=10+10)
Flate           24.3MB ± 0%       24.3MB ± 0%     -0.04%  (p=0.000 n=10+10)
GoParser        30.7MB ± 0%       30.7MB ± 0%     -0.04%  (p=0.000 n=10+10)
Reflect         76.3MB ± 0%       76.3MB ± 0%     -0.02%  (p=0.000 n=10+10)
Tar             39.2MB ± 0%       39.2MB ± 0%     -0.02%  (p=0.009 n=10+10)
XML             41.5MB ± 0%       41.4MB ± 0%     -0.02%  (p=0.019 n=10+10)

name       old allocs/op     new allocs/op     delta
Template          385k ± 0%         385k ± 0%     -0.03%  (p=0.000 n=10+10)
Unicode           342k ± 0%         342k ± 0%     -0.01%  (p=0.004 n=10+10)
GoTypes          1.19M ± 0%        1.19M ± 0%     -0.01%  (p=0.000 n=10+10)
Compiler         4.52M ± 0%        4.52M ± 0%     -0.01%  (p=0.000 n=9+10)
SSA              12.2M ± 0%        12.2M ± 0%     -0.00%  (p=0.000 n=10+10)
Flate             234k ± 0%         234k ± 0%     -0.04%  (p=0.000 n=9+10)
GoParser          318k ± 0%         317k ± 0%     -0.03%  (p=0.000 n=9+9)
Reflect           974k ± 0%         974k ± 0%     -0.01%  (p=0.000 n=10+10)
Tar               395k ± 0%         395k ± 0%     -0.02%  (p=0.000 n=10+9)
XML               404k ± 0%         404k ± 0%     -0.02%  (p=0.000 n=10+10)

Disabling on platforms with virtual memory (all of them?) shouldn’t cause crashes (assuming ample drive space), but for large cases performance can be severely impacted by memory swapping to the point of being unusable.

The compile command is used on each package separately but has to load the entirety of its dependency object code, so it appears a lot of memory can be used in a single call especially at the root package. But for compilebench cases this number appears to be under 8 GB.

Conclusion

The memory needs of cmd/compile are unbounded in relation to program size, and past a knee point of memory use the garbage collector helps performance enormously. Below that point, a range that is substantial on most development computers for small to medium programs, we can see a 10%-30% reduction in cmd/compile time by disabling the garbage collector.

@pciet

Contributor Author

commented Mar 15, 2018

I misunderstood the benchmark. The first table (time/op) is wall time, and the second table (user-time/op) is the value reported by os.ProcessState.UserTime(). I was thinking this was kernel vs app time, and I also can’t just add the percentages, even though the values are somewhat close.
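
The two measurements can be taken side by side as in the sketch below. `timeCommand` is a hypothetical helper, and `go version` is only a convenient child process that assumes the `go` command is on PATH; this is not how compilebench itself is implemented.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

// timeCommand runs a command and returns its wall-clock duration and the
// user CPU time reported for the child process. User time is summed over
// all of the child's threads, so it can exceed wall time on a multi-core
// machine.
func timeCommand(name string, args ...string) (wall, user time.Duration, err error) {
	cmd := exec.Command(name, args...)
	start := time.Now()
	err = cmd.Run()
	wall = time.Since(start)
	if cmd.ProcessState != nil { // nil when the process never started
		user = cmd.ProcessState.UserTime()
	}
	return wall, user, err
}

func main() {
	wall, user, err := timeCommand("go", "version")
	if err != nil {
		fmt.Fprintln(os.Stderr, "run failed:", err)
		os.Exit(1)
	}
	fmt.Printf("wall time: %v, user CPU time: %v\n", wall, user)
}
```

This is consistent with the tables above, where user-time/op is larger than time/op for the same benchmark: CPU time spent on other cores, including by the collector, adds to user time without lengthening the wall time by the same amount.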

For the user the wall time is what they’ll perceive, so we are looking at ~5% like @mvdan said. That doesn’t seem like a worthwhile increase for a workaround, so I’ll close this. Thanks.

@pciet pciet closed this Mar 15, 2018

@golang golang locked and limited conversation to collaborators Mar 15, 2019
