OK, I see what's going on here. We're going into a syscall and handing off our P, but the GC is trying to terminate and so asks each P to flush its workbuf. Technically we're flushing the workbuf with a P, but this P has already been released from the G and the M (via handoffp(releasep()) in entersyscallblock_handoff). I had assumed that the page/span allocator is always called with a P, but this is a rare case where it isn't (and it didn't technically have this requirement before).
Unfortunately, I believe this means we just need to support this case, so there needs to be a fallback for updating the stats when we don't have a P (either that or we plumb the mcache through, though that sounds onerous).
Also, this is a potential problem on all platforms, not just aix/ppc64. Definitely a release-blocker.
This change moves the responsibility of throwing if an mcache is not
available to the caller, because the inlining cost of throw is set very
high in the compiler. Even if it were reduced to the cost of a usual
function call, it would still be too expensive, so just move it out.
This choice also makes sense in the context of #42339 since we're going
to have to handle the case where we don't have an mcache to update stats
in a few contexts anyhow.
Also, add getMCache to the list of functions that should be inlined to
prevent future regressions.
getMCache is called on the allocation fast path, and because it's not
inlined, it actually causes a significant regression (~10%) in some
Trust: Michael Knyszek <email@example.com>
Run-TryBot: Michael Knyszek <firstname.lastname@example.org>
Reviewed-by: Michael Pratt <email@example.com>
TryBot-Result: Go Bot <firstname.lastname@example.org>
Ah, it occurs to me that this failure isn't exactly common on aix/ppc64 so that's not necessarily proof of a fix, but I definitely do understand the problem. I suppose I'll land and we'll reopen this issue if we see it again?