New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: goroutines linger in GC assist queue #22654

Open
rhysh opened this Issue Nov 10, 2017 · 1 comment

Comments

Projects
None yet
2 participants
@rhysh
Contributor

rhysh commented Nov 10, 2017

What version of Go are you using (go version)?

go version devel +1ac8846984 Fri Nov 3 14:06:21 2017 +0000 linux/amd64

Does this issue reproduce with the latest release?

This report is about the behavior on tip. I've also seen this recently with Go 1.9 and Go 1.8.

What operating system and processor architecture are you using (go env)?

linux/amd64

What did you do?

I have a server that makes HTTPS requests for large JSON blobs consisting mainly of base64 encoded protobuf messages. It decodes and processes the messages, retaining them in memory for a short period. I collected and viewed an execution trace.

What I found in the execution trace didn't match the way I think the GC is expected to work.

This server isn't very latency-sensitive, but since the GC is preventing mutators from doing work the program's throughput is stifled.

What did you expect to see?

I expected to see the background GC worker running on 1/4th of the threads (9 of 36 here), regularly releasing assist credit to the goroutines that have become blocked on assist. I expected to see any goroutines that blocked on assist credit to be satisfied in significantly less than 1ms.

What did you see instead?

I see goroutines sitting in the GCWaiting state for several milliseconds, and see less than a quarter of Ps running the background mark worker. User goroutines that allocate memory during those several milliseconds are transitioned to the GCWaiting state.

This doesn't happen during every GC cycle for this program—maybe during one out of five cycles, and to varying degrees. This is the "best" example I've seen out of maybe 40 GC cycles (three screens of execution trace UI).

I'd expect the background workers to flush their credit at least dozens of times each millisecond, since gcCreditSlack = 2000 seems pretty small. I'd expect there to be enough work for many background mark workers, since the GC splits large jobs are split into oblets of 128 kB. I'd expect for the GC-aware parts of the scheduler to allow and encourage 1/4th of the Ps to run background mark workers. If background mark workers intentionally go idle when they aren't able to find work, I'd expect them to be reactivated when gcFlushBgCredit finds that there's unsatisfied demand for scan credit.


Here's an overview of my program's execution trace. We'll zoom in on the GC cycle at 215–229 ms in a moment:

screen shot 2017-11-06 at 10 59 48 pm

A GC workers start at 216ms, and 9/36 Ps run the gcBgMarkWorker for 6ms. All Ps are busy for another 4ms (hard to see if 1/4th of them are running the background worker).

Around 225ms, the number of goroutines in the GCWaiting state climbs to the 80s, where it remains until 228ms. Only Ps 9 and 12 run the background mark worker. About a dozen Ps run user goroutines. A couple of those goroutines attempt to allocate and so also enter the GCWaiting state.

screen shot 2017-11-06 at 10 35 06 pm

When the background mark worker on P 9 stops, its assist credit allows a bunch of the blocked goroutines to run. Those goroutines soon return to the GCWaiting state.

screen shot 2017-11-06 at 10 40 05 pm

screen shot 2017-11-06 at 10 41 15 pm

@aclements do you understand what's going on here?

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Mar 29, 2018

@ianlancetaylor ianlancetaylor added this to the Go1.11 milestone Mar 29, 2018

@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Go1.12 Jul 10, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment