runtime: hot vars and cache lines #14980

Open
josharian opened this Issue Mar 26, 2016 · 4 comments

@josharian
Contributor

josharian commented Mar 26, 2016

Naive question. The runtime has a bunch of top-level vars, some of which are fairly hot, e.g. the writeBarrier struct (checked before every write barrier call), the debug struct (checked during every malloc for e.g. allocfreetrace), and the trace struct (to know whether tracing is enabled). Some are written a lot (writeBarrier), whereas others are read-mostly (debug, trace).

They are organized for readability and thus end up potentially scattered around the final binary. However, I wonder whether it would be better to ensure that all the hottest read-mostly variables are in a single cache line and ensure that the hottest read-write variables don't trigger false sharing.

Many of these aren't easy to move around and experiment with, because of compiler integration. So first: Any instincts about whether this is likely to matter in practice?

cc @dvyukov @aclements

@josharian josharian added this to the Unplanned milestone Mar 26, 2016

@randall77

Contributor

randall77 commented Mar 26, 2016

Having a few hot reads scattered about cache lines instead of all in one cache line shouldn't matter much. It will only take a tiny bit more work and cache space to cache them.

Keeping hot writes away from each other (and from other hot reads) will matter much more.

Is writeBarrier really written that much? Just twice per GC cycle, as far as I can tell.

@aclements

Member

aclements commented Mar 26, 2016

I don't recall seeing serious contention on any globals when I ran https://godoc.org/github.com/aclements/go-perf/cmd/memlat, but that was quite a while ago and I wasn't necessarily looking. It would be easy enough to run that again. Particularly if it's run on a multi-node system, any globals with poor cacheability or false sharing should stick out as expensive remote DRAM events.

It would also be easy enough to crank up the PEBS recording rate and just write a simple tool to look for hot globals. Sort of like https://godoc.org/github.com/aclements/go-perf/cmd/memanim, but obviously looking for different things in the memory trace. With memanim, I found the hardware could easily record every single load that took over 50 cycles.

If we do find any, the cheap solution is to add padding variables around them. We already do this in a few places (grep for CacheLineSize), but I think those are all based on assumptions about hot cache lines and aren't backed up by measurements.

@josharian

Contributor

josharian commented Mar 29, 2016

Thanks, Keith and Austin. I don't have a Linux machine lying around right now, but I should soon(ish), and I will play with this then.

@dvyukov

Member

dvyukov commented Apr 3, 2016

Frequent write sharing can be very expensive and can prevent scaling to higher core counts. We need to get rid of each and every case.
But note that processors don't have circuitry to distinguish between false and true sharing. They penalize both equally. So it is not about adding padding and shuffling variables around; it is about eliminating frequently written variables. See the following changes for examples:
d6ed1b7
d839a80
66d5c9b
909f318
013ad89
c9152a8
86e7323
The scheduler (distributed run queues), the memory allocator (MCache), and the parallel GC (Workbuf, parfor) were designed around the idea of not creating heavy write sharing in the first place.
If new frequently written variables have been added since then, we need to get rid of them as well.
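As a rough illustration of avoiding write sharing by design, the sketch below replaces one shared counter with per-worker shards that are only summed on read. This is user-level Go, not how the runtime itself is written; `shardedCounter`, `shard`, and the 64-byte `cacheLineSize` are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// cacheLineSize is an assumption (64 bytes is common on x86-64).
const cacheLineSize = 64

// shard holds one worker's count. Each shard is exactly
// cacheLineSize bytes, so the n fields of neighboring shards are
// a full cache line apart and never share a line with each other.
type shard struct {
	n uint64
	_ [cacheLineSize - 8]byte
}

// shardedCounter is a hypothetical counter in which each worker
// writes only to its own shard, so the hot path has no write sharing.
type shardedCounter struct {
	shards []shard
}

func newShardedCounter(n int) *shardedCounter {
	return &shardedCounter{shards: make([]shard, n)}
}

// Add increments the shard selected by the worker index.
func (c *shardedCounter) Add(worker int, delta uint64) {
	atomic.AddUint64(&c.shards[worker%len(c.shards)].n, delta)
}

// Sum reads every shard. Reads are expected to be rare, so the
// cross-core traffic is paid here rather than on every write.
func (c *shardedCounter) Sum() uint64 {
	var total uint64
	for i := range c.shards {
		total += atomic.LoadUint64(&c.shards[i].n)
	}
	return total
}

func main() {
	const workers, perWorker = 8, 100000
	c := newShardedCounter(workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				c.Add(w, 1)
			}
		}(w)
	}
	wg.Wait()
	fmt.Println("total:", c.Sum())
}
```

The padding here only keeps the shards apart; the actual benefit comes from each worker writing to memory that no other worker writes.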
