Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: GC assists are inefficient at the beginning of marking #15361

Closed
dvyukov opened this issue Apr 19, 2016 · 7 comments

Comments

Projects
None yet
6 participants
@dvyukov
Copy link
Member

commented Apr 19, 2016

go version devel +318da8d Sat Apr 16 02:55:03 2016 +0000 linux/amd64

I've looked at http benchmark trace:
trace
trace.zip

At the beginning of marking all goroutines are both in debt and cannot scan anything (because GC goroutines are busy with roots and did not produce enough workbufs yet). So what happens is that all procs cycle through all runnable goroutines and park them on assistQueue. That's first 200us of GC with small goroutine boxes all blocking in gcAssistAlloc.

gcAssistAlloc must be able to help with root scanning, because it may be the only available GC work. Switching to other goroutines makes small sense, because the first malloc will trigger assist and will bring us to where we started.

Maybe it also makes sense to give goroutines some temporal credit in the beginning of GC. Currently we start concurrent phase just to force every goroutine to assist. If it turns out that we need aggressive assists, we can make up for it later. But in the expected case GC is triggered at such point that we don't need any assists. The idea is similar to https://go-review.googlesource.com/#/c/20969 but not black and white (GOGC=199 requires assists, while GOGC=200 does not need them entirely).

You can also see that Proc 4 is idle for some reason and does not help GC.

@RLH

This comment has been minimized.

Copy link
Contributor

commented Apr 19, 2016

Having the goroutines attempting to help out early in the GC while roots
are still being scanned seems like the problem. Fighting for a small amount
of work at this point might not be efficient. In the case where the root
scan is not complete and there are no work bufs available we could just
allow the goroutine to go ahead and allocate. This is part of the larger
discussion about what kinds of availability promises the Go GC wants to
make. At this point 200us is in the noise, hopefully it won't be in the
noise in the 1.8 timeframe.

On Tue, Apr 19, 2016 at 2:01 AM, Dmitry Vyukov notifications@github.com
wrote:

go version devel +318da8d Sat Apr 16 02:55:03 2016 +0000 linux/amd64

I've looked at http benchmark trace:
[image: trace]
https://cloud.githubusercontent.com/assets/1095328/14628858/f047c932-0601-11e6-8405-ce35de5bb539.png
trace.zip https://github.com/golang/go/files/225283/trace.zip

At the beginning of marking all goroutines are both in debt and cannot
scan anything (because GC goroutines are busy with roots and did not
produce enough workbufs yet). So what happens is that all procs cycle
through all runnable goroutines and park them on assistQueue. That's first
200us of GC with small goroutine boxes all blocking in gcAssistAlloc.

gcAssistAlloc must be able to help with root scanning, because it may be
the only available GC work. Switching to other goroutines makes small
sense, because the first malloc will trigger assist and will bring us to
where we started.

Maybe it also makes sense to give goroutines some temporal credit in the
beginning of GC. Currently we start concurrent phase just to force every
goroutine to assist. If it turns out that we need aggressive assists, we
can make up for it later. But in the expected case GC is triggered at such
point that we don't need any assists. The idea is similar to
https://go-review.googlesource.com/#/c/20969 but not black and white
(GOGC=199 requires assists, while GOGC=200 does not need them entirely).

You can also see that Proc 4 is idle for some reason and does not help GC.


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#15361

@aclements

This comment has been minimized.

Copy link
Member

commented Apr 19, 2016

I agree with @dvyukov that assists should be able to scan roots. I don't see why they shouldn't; the amount of fighting is bounded by the number of Ps whether it's assists doing it or the background mark worker. The difficulty is that we currently don't count root scanning as "scan work", so assists can't get credit for it (and can't satisfy their debt with it). This also means that the background worker won't build up global credit while it's scanning roots, which is probably contributing to the delay. Tracking the stats on that is somewhat subtle, which is why we don't do it right now, but it's definitely doable.

I believe we'll also want this when we start re-scanning stacks during concurrent mark so that assists can help out with that.

In the case where the root scan is not complete and there are no work bufs available we could just allow the goroutine to go ahead and allocate.

We used to do this and then I worked very hard to eliminate this uncontrolled allocation window because @dvyukov had some great benchmarks that used this window to completely blow out the heap goal.

@aclements aclements added this to the Go1.8Early milestone Apr 19, 2016

@dvyukov

This comment has been minimized.

Copy link
Member Author

commented Apr 19, 2016

Tracking the stats on that is somewhat subtle, which is why we don't do it right now, but it's definitely doable.

But it should be positive without stats. Threads will do useful work instead of switching goroutines.

We used to do this and then I worked very hard to eliminate this uncontrolled allocation window because @dvyukov had some great benchmarks that used this window to completely blow out the heap goal.

Can't we give them temporal credit? The intention is that background scanners will cover that backlog soon and we can allow goroutines to work without assists at all in the expected case. But if background scanners do not, then goroutines will have to assists aggressively and compensate for initial allocations. The duration of the temporal credit must not be time based, it must be allocation based. That is, if we expect to do scanning while application allocates X bytes, we can give credit for X/10 allocations. I don't know the assist system good enough to say whether such credit system can avoid overallocation.
It also looks like a more reliable solution to the problem described in cl/20969. The change turns off assists when GOGC>200. But that still can lead to overallocation. Instead we could give goroutines credit for longer period of time.

@quentinmit

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2016

@aclements Is this bug still relevant with the new changes to reduce the amount of extra work that is done for each assist?

@quentinmit quentinmit added the NeedsFix label Sep 28, 2016

@aclements

This comment has been minimized.

Copy link
Member

commented Sep 29, 2016

@aclements Is this bug still relevant with the new changes to reduce the amount of extra work that is done for each assist?

Yes. This issue arises because there's no work at all for assists at the beginning of the cycle, not because they want more work than they can get. Reducing the extra work does mean assists become useful earlier in the cycle, just not yet at the beginning.

@rsc rsc modified the milestones: Go1.8, Go1.8Early Oct 20, 2016

@gopherbot

This comment has been minimized.

Copy link

commented Oct 31, 2016

CL https://golang.org/cl/32431 mentions this issue.

@gopherbot

This comment has been minimized.

Copy link

commented Oct 31, 2016

CL https://golang.org/cl/32432 mentions this issue.

gopherbot pushed a commit that referenced this issue Oct 31, 2016

runtime: lift systemstack part of gcAssistAlloc
This lifts the part of gcAssistAlloc that runs on the system stack to
its own function in preparation for letting assists perform root jobs
(notably stack scanning). This makes it easy to see that there are no
references to the user stack once we've entered gcAssistAlloc1, which
means it's safe to shrink the stack while in gcAssistAlloc1.

This does not yet make assists perform root jobs, so it's not actually
possible for the stack to shrink yet. That will happen in the next
commit.

The code in gcAssistAlloc1 is identical to the code that's currently
passed in a closure to systemstack with one exception. Currently, we
set the "completed" variable in the enclosing scope to indicate that
the assist completed the mark phase. This is exactly the sort of
cross-stack reference lifting this function is meant to prevent. We
replace this variable with setting gp.param to nil or non-nil to
indicate the completion status.

Updates #15361.

Change-Id: Iba7cfb758c781070a441aea86c0117b399a24dbd
Reviewed-on: https://go-review.googlesource.com/32431
TryBot-Result: Gobot Gobot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>

@gopherbot gopherbot closed this in d35dfd4 Oct 31, 2016

@golang golang locked and limited conversation to collaborators Oct 31, 2017

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.