Currently we have 3 performance issues with accesses to thread-local data (g/m/p):
1. Accesses require non-inlinable function calls.
2. The only thread-local var is now g, while the most frequently accessed data is in m. So
most of the accesses have an additional indirection.
3. We do lots of duplicate loads of g/m.
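To make issue 2 concrete, here is a minimal C sketch of the two layouts. The struct definitions and the names `tls_g` and `tls_m` are simplified, hypothetical stand-ins, not the runtime's actual declarations; the point is only the number of dependent loads needed to reach a field of m.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the runtime's M and G. */
typedef struct M {
    int   locks;
    void *mcache;   /* hot field that lives in m */
} M;

typedef struct G {
    M *m;           /* g points at the m it is currently running on */
} G;

_Thread_local G *tls_g;  /* current scheme: only g is thread-local */
_Thread_local M *tls_m;  /* proposed scheme: m is thread-local */

/* Current scheme: load g from TLS, then load g->m, then load the field. */
void *mcache_via_g(void) {
    assert(tls_g != NULL);  /* guard for the sketch only */
    return tls_g->m->mcache;
}

/* Proposed scheme: load m from TLS, then load the field. */
void *mcache_via_m(void) {
    assert(tls_m != NULL);  /* guard for the sketch only */
    return tls_m->mcache;
}
```

Both functions return the same pointer; the first one just goes through one more pointer to get there.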
We need to:
1. Make the thread-local var m (instead of g).
2. Move the stack guard of the current g into m (that's the only hot data in g).
3. Declare a runtime.curm variable in the runtime, and teach the compiler to recognize it
and turn it into a TLS access.
4. Teach compiler to not do unnecessary duplicate loads of curm (like in
I believe that changing from g to m is a mistake.
The most frequently accessed thread-local data is g->stackguard0, which is in g. It is
accessed once per function call. g is also much easier to reason about in programs,
because it cannot change from line to line as a particular function executes.
Eventually I would like to put g back into a dedicated register on amd64, like we do on
arm. Then getting at g->stackguard0 will be just one load, and getting at m will be just
one load too.
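As a rough illustration of the per-call check referred to above, here is a C sketch. It assumes a simplified G containing only `stackguard0` and a thread-local pointer named `tls_g`; both are hypothetical stand-ins, and the real check is emitted by the toolchain in each function's prologue rather than written as a C function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified G: only the field the check needs. */
typedef struct G {
    uintptr_t stackguard0;  /* lowest stack address the function may use */
} G;

_Thread_local G *tls_g;     /* stand-in for "the current g" */

/*
 * Sketch of the check done on entry to a function. The stack grows down,
 * so the check fails (more stack is needed) when the stack pointer has
 * reached or passed the guard. Returns 1 on failure, 0 otherwise.
 */
int stack_check_fails(uintptr_t sp) {
    assert(tls_g != NULL);  /* guard for the sketch only */
    return sp <= tls_g->stackguard0;
}
```

With g held in a dedicated register, reading `stackguard0` here is a single load from that register's target; with g reached through TLS, obtaining g comes first.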
> The most frequently accessed thread-local data is g->stackguard0, which is in g.
Yes, it's the most frequently accessed; that's why I propose to move it to M. But there are
also m->mcache, m->locks, m->p and m->ptr/scalarargs. Duplicating them in G looks bad
because it will bloat G and open the door to bugs, whereas what was called stackguard0 can
be moved to M rather than duplicated.
> g is also much easier to reason about in programs, because it cannot change from line
> to line as a particular function executes.
It's true that it can change, but I don't see how naming things differently changes
anything. It can change regardless of whether you call it 'm' or 'g->m'. If you want to
prevent m from changing, you do 'm->locks++' or 'g->m->locks++'. No difference (other
than the additional indirection).
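A small C sketch of that point, under the assumption of simplified, hypothetical M and G structs and thread-local pointers named `tls_g` and `tls_m` (none of these are the runtime's real declarations): both spellings bump the same counter, and only the path to it differs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the runtime's M and G. */
typedef struct M { int locks; } M;
typedef struct G { M *m; } G;

_Thread_local G *tls_g;  /* thread-local g, as today */
_Thread_local M *tls_m;  /* thread-local m, as proposed */

/* 'm->locks++': one load to reach m, then the increment. */
void pin_via_m(void) { tls_m->locks++; }

/* 'g->m->locks++': same effect, one extra load to get from g to m. */
void pin_via_g(void) { tls_g->m->locks++; }

/* Matching decrement; the assert only catches unbalanced calls in this sketch. */
void unpin_via_m(void) {
    assert(tls_m->locks > 0);
    tls_m->locks--;
}
```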