Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: heap goal overrun due to scheduler delays in mark termination #52433

Open
mknyszek opened this issue Apr 19, 2022 · 2 comments
Open

runtime: heap goal overrun due to scheduler delays in mark termination #52433

mknyszek opened this issue Apr 19, 2022 · 2 comments
Assignees
Labels
NeedsInvestigation
Milestone

Comments

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Apr 19, 2022

Summary

In tests that have goroutines which heap-allocate and nothing else, I've observed these goroutines allocate beyond the heap goal by up to tens of MiB. This happens because of an inopportune scheduling delay, either by the OS scheduler or the Go scheduler, of a thread trying to terminate the GC's mark phase, where simply assisting the GC with scan work is not enough.

This is unlikely to affect real Go programs, but when it does, it is more likely to affect those with smaller heaps.

Full explanation

While writing a test for the implementation of #48409, I noticed that the memory limit might be passed.

The test in question has a single goroutine allocating heap memory in a loop, and a goroutine checking stats to ensure the memory limit is maintained. Occasionally, with a target limit of 64 MiB, the runtime could exceed the memory limit. Raising the limit to 512 MiB or more effectively mitigated the problem.

Digging deeper, I discovered the cause: the runtime was allowing the allocating goroutine to exceed the heap goal. GC assists weren't kicking in, because there was no mark work left to do anyway. With more logging I further discovered that the GC mark worker, which was in mark termination, would disappear entirely for 1-3ms when dropping into a sleep that has a target of 100µs. Replacing that sleep with an osyield could reproduce the same issue. I was also able to reproduce it by removing any kind of sleep and having it spin, though the point at which the thread could disappear was now more variable, when it did happen. It was also easier to reproduce with multiple instances running, but in all cases I ensured the CPU hardware still had plenty of breathing room (50% headroom in # of cores).

When investigating the spinning case, I also noticed that if a GC mark worker running mark termination algorithm notices that there's still work to do, it will by default park its goroutine. By the time that worker would wake back up and run mark termination again, the allocating goroutine would have blown past the heap goal. This was another common source of overrun.

Outside of this parking issue, however, I suspected an OS scheduler latency issue (this was on Linux). To confirm this, I traced OS scheduler events (perf sched record -k CLOCK_MONOTONIC) and lined them up with my logging to discover that indeed, the thread was getting context switched away for a couple milliseconds in at least the osyield and spinning cases (I haven't confirmed the original, but the symptoms are identical). It appears that due to kernel workers and background processes, the system could become transiently oversubscribed, and tasks could get kicked across CPUs, incurring this degree of latency.

But this isn't necessarily an issue on the OS's side. OS schedulers are always going to have some latency. More fundamentally, the runtime isn't always robust to this kind of external latency.

Aside from this test I wrote, the Go team has observed two other tests occasionally failing with mysterious heap overrun issues: #37331 and #49564. I suspect, but have not confirmed yet, that these flakes arise from precisely the same issue. Both involve tests that perform heap allocations in a loop.

Note that in practice this kind of heap overrun is rarely, if ever, an issue. In order to allocate at this rate, the application would have to do nothing with that memory, not even write to it or read from it, which is pointless. The overrun is also proportional to these external latency factors, which are effectively constant, so this is typically a lot less noticeable for larger heaps (i.e. going over by a few MiB is a much bigger fraction of the heap goal for 64 MiB than for 1 GiB).

Mitigation

The short-term mitigation for tests is to just allocate at a slower rate, reducing the amount of possible overrun. Long-term, the runtime should be more robust to these situations. A few ideas on how to do this:

  • For the situation where a mark worker parks after a failure to terminate the mark phase, forcing allocating goroutines to assist more often would actually help here, because assists can trigger mark termination as well. They can take over.
  • @aclements notes that typically, wait-free algorithms are used to combat these sorts of situations. A wait-free distributed mark termination algorithm is a tall order, but that would tackle this particular situation directly, as several goroutines could be helping each other through mark termination, as opposed to one goroutine coordinating it at a time.

CC @prattmic @aclements

@mknyszek mknyszek added the NeedsInvestigation label Apr 19, 2022
@mknyszek mknyszek added this to the Backlog milestone Apr 19, 2022
@mknyszek mknyszek self-assigned this Apr 19, 2022
@bcmills
Copy link
Member

@bcmills bcmills commented Apr 20, 2022

(This might also explain #51019, although it doesn't really explain why that would occur on the -11 builder and not the -10.)

@gopherbot
Copy link

@gopherbot gopherbot commented May 27, 2022

Change https://go.dev/cl/408825 mentions this issue: test/heapsampling.go: slow down allocation rate and reduce iterations

gopherbot pushed a commit that referenced this issue May 27, 2022
As far as I can tell, this test suffers from #52433. For some reason,
this seems to become more of a problem on the windows/386 than anywhere
else. This CL is an attempt at a mitigation by slowing down the
allocation rate by inserting runtime.Gosched call in the inner loop. It
also cuts the iteration count which should help too (as less memory is
allocated in total), but the main motivation is to make sure the test
doesn't take too long to run.

Fixes #49564.

Change-Id: I8cc622b06a69cdfa66f680a30e1ccf334eea2164
Reviewed-on: https://go-review.googlesource.com/c/go/+/408825
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation
Projects
Status: Todo
Development

No branches or pull requests

3 participants