Thread stats are locally locked, meaning the stats are only ever changed by the thread that owns the lock. There is one lock per worker thread (single writer; potentially many readers, but realistically one).
The only other times the locks are held are when running the "stats" command, or, I guess, anything else that internally gathers per-thread stats.
It might be tempting to simply replace them all with individual atomics, but that will likely cause a slowdown, as the various atomic counters still need memory barriers (or worse: SMP synchronization, etc.). Atomics might be useful for some of the remaining stats which use the global `STATS_LOCK()`, since they are more likely to contend in the first place.
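For the globally locked stats, a rough sketch of what an atomic counter could look like, using C11 `<stdatomic.h>`. The counter name and helper functions here are hypothetical and not taken from the memcached source; the point is only that many writers can bump one counter with a relaxed fetch-add instead of taking a global lock.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global counter standing in for a stat guarded by STATS_LOCK(). */
static _Atomic uint64_t total_conns_sketch;

/* Multi-writer increment: an atomic read-modify-write, no mutex.
 * Relaxed ordering is enough for a counter that orders nothing else. */
static inline void global_stat_incr(_Atomic uint64_t *c) {
    atomic_fetch_add_explicit(c, 1, memory_order_relaxed);
}

/* Each writer thread bumps the shared counter 1000 times. */
static void *bump_many(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000; i++)
        global_stat_incr(&total_conns_sketch);
    return NULL;
}

/* Runs up to 16 concurrent writers and returns the final counter value. */
static uint64_t global_stat_demo(int nthreads) {
    pthread_t tid[16];
    int started = 0;

    if (nthreads > 16)
        nthreads = 16;
    atomic_store(&total_conns_sketch, 0);
    for (; started < nthreads; started++) {
        if (pthread_create(&tid[started], NULL, bump_many, NULL) != 0)
            break;
    }
    /* pthread_join makes every writer's increments visible here. */
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return atomic_load_explicit(&total_conns_sketch, memory_order_relaxed);
}
```

The fetch-add is still a synchronized instruction on SMP hardware, which is why this looks more attractive for the already contended global stats than for the per-thread ones.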
There are two paths I like, and I like the second one better:
1. If the thread stats are all 8-byte aligned, writes/reads are atomic on at least x86 systems. Then you simply remove all the locks. This doesn't work on 32-bit systems (a 64-bit counter can tear there) and may not work on systems with looser memory architectures (e.g., ARM).
2. Run the stats call on a side thread and use the worker notify pipes to gather snapshots of the per-thread stats.
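A minimal sketch of the lock-free option (removing the locks and relying on aligned 8-byte accesses), written with C11 relaxed atomics so the compiler is told about the cross-thread access rather than relying on plain loads and stores. The struct and function names are made up for illustration. On x86-64 these relaxed loads and stores should compile to ordinary `mov` instructions; on other targets the cost depends on the platform.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-thread stats block; field names are illustrative only. */
struct thread_stats_sketch {
    _Atomic uint64_t get_cmds;
    _Atomic uint64_t get_misses;
};

/* Called only by the owning worker. With a single writer, a relaxed load
 * followed by a relaxed store is enough; no atomic read-modify-write. */
static inline void stat_incr(_Atomic uint64_t *c) {
    uint64_t v = atomic_load_explicit(c, memory_order_relaxed);
    atomic_store_explicit(c, v + 1, memory_order_relaxed);
}

/* Called by whichever thread serves "stats". The value cannot be torn,
 * but it may be slightly stale. */
static inline uint64_t stat_read(_Atomic uint64_t *c) {
    return atomic_load_explicit(c, memory_order_relaxed);
}

/* Tally one counter across all workers without taking any lock. */
static uint64_t stats_sum_get_cmds(struct thread_stats_sketch *t, int n) {
    uint64_t total = 0;
    for (int i = 0; i < n; i++)
        total += stat_read(&t[i].get_cmds);
    return total;
}
```

Each counter is individually consistent, but a tally across several counters is not a single atomic snapshot; for example `get_misses` could briefly appear ahead of `get_cmds`.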
We have the concept of side threads (lru_crawler, log watcher), which with a little more work could share more generalized code. This could be used to create an admin/stats side thread. When a stats command is issued, the socket gets handed to the side thread. The side thread then writes a command to the notify pipe of each worker thread; they then memcpy their stats and check in.
The side thread then tallies with what should be mostly existing code, and all of the thread stats locks are removed completely.
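The side-thread flow could be sketched roughly as below, using plain pthreads and pipes. This is a toy under several assumptions, not memcached's code: every name is hypothetical, the workers block on `read()` instead of running a libevent loop, a `'g'` byte simulates a request so the stats have something in them, and the "check in" step is simplified to the worker writing its copied stats down a shared reply pipe (a write that small is atomic on a pipe).

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_WORKERS 4

struct stats_sketch { uint64_t get_cmds; };

struct worker_sketch {
    pthread_t tid;
    int notify_fd[2];          /* side thread -> worker commands */
    int reply_fd;              /* write end of the shared check-in pipe */
    struct stats_sketch stats; /* touched only by the owning worker: no lock */
};

static void *worker_main(void *arg) {
    struct worker_sketch *w = arg;
    char cmd;
    while (read(w->notify_fd[0], &cmd, 1) == 1 && cmd != 'q') {
        if (cmd == 'g') {               /* simulated request */
            w->stats.get_cmds++;
        } else if (cmd == 's') {        /* snapshot request */
            struct stats_sketch snap;
            memcpy(&snap, &w->stats, sizeof(snap));
            if (write(w->reply_fd, &snap, sizeof(snap)) != (ssize_t)sizeof(snap))
                break;
        }
    }
    return NULL;
}

static int send_cmd(int fd, char c) {
    return write(fd, &c, 1) == 1 ? 0 : -1;
}

/* Plays the side thread: requests a snapshot from every worker and tallies.
 * Returns the summed get_cmds, or -1 on error (cleanup on error paths is
 * omitted to keep the sketch short). */
static int64_t stats_snapshot_demo(void) {
    struct worker_sketch w[SKETCH_WORKERS];
    int reply[2];
    uint64_t total = 0;
    int err = 0;

    if (pipe(reply) != 0)
        return -1;
    for (int i = 0; i < SKETCH_WORKERS; i++) {
        w[i].stats.get_cmds = 0;
        w[i].reply_fd = reply[1];
        if (pipe(w[i].notify_fd) != 0)
            return -1;
        if (pthread_create(&w[i].tid, NULL, worker_main, &w[i]) != 0)
            return -1;
        for (int r = 0; r <= i; r++)    /* worker i handles i+1 requests */
            err |= send_cmd(w[i].notify_fd[1], 'g');
    }
    for (int i = 0; i < SKETCH_WORKERS; i++)
        err |= send_cmd(w[i].notify_fd[1], 's');
    for (int i = 0; i < SKETCH_WORKERS && !err; i++) {
        struct stats_sketch snap;
        if (read(reply[0], &snap, sizeof(snap)) != (ssize_t)sizeof(snap))
            err = -1;
        else
            total += snap.get_cmds;
    }
    for (int i = 0; i < SKETCH_WORKERS; i++) {
        send_cmd(w[i].notify_fd[1], 'q');
        close(w[i].notify_fd[1]);       /* EOF also stops the worker */
        pthread_join(w[i].tid, NULL);
        close(w[i].notify_fd[0]);
    }
    close(reply[0]);
    close(reply[1]);
    return err ? -1 : (int64_t)total;
}
```

Because each worker only reads and writes its own `stats`, nothing here needs a mutex; the pipes carry both the request and the copied data. The snapshots are taken at slightly different moments per worker, which seems acceptable for a stats command.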
I like the idea so I'm documenting it, but I'm filing it as low priority. I haven't done any testing to show whether commenting out the thread locks improves IPC at all; I imagine it barely moves the needle. A bigger potential benefit is that we could add more stats with less concern for impacting performance.