runtime: program appears to spend 10% more time in GC on tip 3c47ead than on Go 1.13.3 #35430
What version of Go are you using (`go version`)?
This is likely related to golang.org/cl/200439, which allows the GC to assist more than 25% in cases where there's a high rate of allocation.
Although this seems like a regression, please stay tuned. I'm currently in the process of landing a set of patches related to #35112 and by the end, with this additional GC use, it's a net win for heavily allocating applications (AFAICT).
The reason we're allowing GC to exceed 25% in these cases is that #35112 makes the page allocator fast enough to outrun the GC and drive the trigger ratio to very low values (like 0.01). That means the next mark phase starts almost immediately, so pretty much all new memory would be allocated black, leading to an unnecessary RSS increase. By bounding the trigger ratio as in golang.org/cl/200439, your application may end up assisting more, but in my experiments the latency win from #35112 still beats that latency hit by a significant margin.
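To make the trigger-ratio discussion concrete, here is a minimal sketch of the arithmetic (illustrative only, not the runtime's actual pacer code): roughly speaking, the next GC cycle starts once the heap grows past the live heap from the last cycle times (1 + trigger ratio), so a ratio of 0.01 means marking begins almost as soon as the previous cycle ends.

```go
package main

import "fmt"

func main() {
	// Illustrative arithmetic only: the pacer starts the next GC
	// roughly when the heap grows past heapMarked * (1 + triggerRatio).
	heapMarked := 100.0 // MB of live data after the last mark phase

	for _, triggerRatio := range []float64{1.0, 0.5, 0.01} {
		trigger := heapMarked * (1 + triggerRatio)
		fmt.Printf("triggerRatio=%.2f -> next GC starts at ~%.0f MB\n",
			triggerRatio, trigger)
	}
}
```

At a ratio of 1.0 (the classic GOGC=100 behavior) the heap can double before the next cycle; at 0.01 there is essentially no headroom, which is why all new allocations end up black.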
I'll poke this thread again when I've finished landing the full stack of changes, so please try again at that point.
In the meantime, could you provide some information about your application? In particular:
This will help me get a better idea of whether this will be a win, or whether this is a loss in single-threaded performance or something else.
This runs as a 12-threaded Go program. The code uses a pool of 12 goroutines, and the GC keeps the heap at 4 MB. In the version of the code that creates a goroutine per file, I see the heap grow as high as 80 MB.
The program is opening, reading, decoding, and searching 4000 files, so it's fairly memory intensive. Throwing 4000 goroutines at this problem on tip finishes the work faster than using a pool. That was never the case in 1.13.
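For reference, a minimal sketch of the pooled strategy described above (this is not the poster's actual code; `processFile` is a stand-in for the open/read/decode/search work):

```go
package main

import (
	"fmt"
	"sync"
)

// processFile is a placeholder for the real per-file work.
func processFile(name string) int { return len(name) }

// pooled distributes files over a fixed number of worker goroutines,
// in contrast to the goroutine-per-file version of the program.
func pooled(files []string, workers int) int {
	jobs := make(chan string)
	results := make(chan int)
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for f := range jobs {
				results <- processFile(f)
			}
		}()
	}
	go func() {
		for _, f := range files {
			jobs <- f
		}
		close(jobs)
	}()
	go func() { wg.Wait(); close(results) }()

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	files := []string{"a.txt", "bb.txt", "ccc.txt"}
	fmt.Println(pooled(files, 12)) // 18
}
```

The goroutine-per-file variant would simply `go processFile(f)` for each file; the tradeoff discussed in this thread is that the unbounded version allocates far more concurrently, which the new allocator handles better than Go 1.13 did.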
I find this interesting. This is my understanding.
A priority of the pacer is to maintain a smaller heap over time and to reduce mark assist (MA) so more Ms can be used for application work during any GC cycle. A GC may start early (before the heap reaches the GC percent threshold) if that reduces MA time. In the end, the total GC time would stay at or below 25%.
This change allows the GC time to grow above 25% to help reduce the size of the heap in some heavy-allocation scenarios. Will this increase the amount of MA time and reduce application throughput during a GC?
Your hope is the performance loss there is gained back in the allocator?
In the end, the heap size remains as small as possible?
Pretty much, though I wouldn't characterize it as "may start early", but rather as just "starts earlier". It's the pacer's job to drive GC use to 25%, and its primary tool for doing so is deciding when to start a GC.
Both latency and throughput, but yes that's correct.
Correct. A heavily allocating RPC benchmark was able to drive the pacer to start a GC at the half-way point (trigger ratio = 0.5) in Go 1.13. The same benchmark drove the trigger ratio to 0.01 with the new allocator. The most convincing evidence of this being that the allocator just got faster was that the only thing that brought the trigger ratio back up was adding a sleep on the critical path.
In the end, this RPC benchmark saw a significant improvement in tail latency (-20% or more) and throughput (+30% or more), even with the new threshold.
Not quite. The threshold in the CL above was chosen to keep the heap size roughly the same across Go versions.