runtime: first benchmark run is much slower than subsequent runs #28595
The first benchmark run is similar to the others, maybe a bit slower because caches are cold. Sample:
The first benchmark run is drastically slower. Sample:
I noticed this while trying to understand some wild swings (50%+) in go1 benchmark performance that are not limited to the first benchmark run. I don't know whether the root cause is the same or not, but this issue at least reproduces reliably.
Thanks for bisecting and for reporting the issue!
I suspect that the regression you're seeing is actually the result of the runtime scavenging (i.e. releasing pages back to the OS) immediately after the first heap growth. For small heaps this shouldn't really be an issue, since the scavenging routine should back out quite fast, but it seems that it's doing more work than expected in this case and is perhaps being a little too aggressive.
I'll investigate and see what I can do.
I found the issue while I was writing this, so the rest of this comment reads a little more like "how I figured it out". The TL;DR is that we're calling madvise way too often due to some bad logic on my part. CL incoming.
It seems that on x64 Linux it's not so bad.
On Darwin I was able to reproduce it, and I confirmed the cause is my change. If I run the same command with
The runtime.madvise being so high up definitely points to my change(s). madvise might be more expensive on Darwin than on Linux. If I run with
Then it's not quite so high up, since those madvise calls are likely mostly made during the first run. Maybe during a heap growth.
To make sure, I tracked down where the madvise calls were coming from with some print debugging. Turns out I was wrong! The heap-growth scavenging routine only actually scavenges one enormous span. The real problem arises primarily from a performance bug in a previous change of mine.
When freeing a span, the runtime may try to coalesce a scavenged span with a neighboring unscavenged span, so on merging, we re-scavenge the whole thing. Freeing a span happens more frequently than you might think, since when we're allocating a new span we search for a "best fit" free span, and if it's not of the same size we trim off the excess. In this particular case, we have one enormous free span which we scavenge. Then, lots of smaller allocations come in, and they end up hitting the space where that scavenged span lived, trimming it and returning the trimmings (which are considered scavenged) each time, only for those trimmings to get allocated out of again.
In this freeing process, due to a bug, we actually end up re-scavenging the span we're freeing if it's already scavenged, even if it didn't coalesce with anything. So, we basically have a lot of unnecessary additional madvise calls happening, which has a big impact if madvise is expensive on the platform.
I hacked together a fix for the freeing logic locally and it does in fact fix this issue. CL incoming.