
Commit cb6c4aa

dramborleg authored and Sasha Levin committed
tracing: Fix false sharing in hwlat get_sample()
[ Upstream commit f743435 ]

The get_sample() function in the hwlat tracer assumes the caller holds hwlat_data.lock, but this is not actually happening. The result is unprotected data access to hwlat_data, and in per-cpu mode can result in false sharing which may show up as false positive latency events.

The specific case of false sharing observed was primarily between hwlat_data.sample_width and hwlat_data.count. These are separated by just 8B and are therefore likely to share a cache line. When one thread modifies count, the cache line is in a modified state so when other threads read sample_width in the main latency detection loop, they fetch the modified cache line. On some systems, the fetch itself may be slow enough to count as a latency event, which could set up a self-reinforcing cycle of latency events as each event increments count which then causes more latency events, continuing the cycle.

The other result of the unprotected data access is that hwlat_data.count can end up with duplicate or missed values, which was observed on some systems in testing.

Convert hwlat_data.count to atomic64_t so it can be safely modified without locking, and prevent false sharing by pulling sample_width into a local variable.

One system this was tested on was a dual-socket server with 32 CPUs on each NUMA node. With settings of 1us threshold, 1000us width, and 2000us window, this change reduced the number of latency events from 500 per second down to approximately 1 event per minute. Some machines tested did not exhibit measurable latency from the false sharing.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260210074810.6328-1-clord@mykolab.com
Signed-off-by: Colin Lord <clord@mykolab.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
1 parent 9566c87 commit cb6c4aa
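
The cache-line argument in the commit message can be checked with a small userspace sketch. Everything here is an assumption for illustration: the struct is a hypothetical stand-in (the mutex is replaced by 32 bytes of padding, so the real offsets in trace_hwlat.c will differ), the cache line is assumed to be 64 bytes, and the object is assumed to start on a cache-line boundary.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines (common on x86-64, not universal). */
#define ASSUMED_CACHE_LINE 64

static_assert(sizeof(uint64_t) == 8, "sketch assumes 8-byte u64");

/* Hypothetical stand-in for struct hwlat_data, not the kernel definition. */
struct demo_hwlat_data {
	char lock_placeholder[32];	/* stands in for struct mutex */
	_Atomic uint64_t count;		/* written on every latency event */
	uint64_t sample_window;
	uint64_t sample_width;		/* read in the sampling loop */
};

/* Cache-line index of a byte offset, assuming the object is line-aligned. */
static size_t demo_line_index(size_t offset)
{
	return offset / ASSUMED_CACHE_LINE;
}

/* Returns 1 if count and sample_width land on the same assumed line. */
int demo_fields_share_line(void)
{
	return demo_line_index(offsetof(struct demo_hwlat_data, count)) ==
	       demo_line_index(offsetof(struct demo_hwlat_data, sample_width));
}

/* Bytes between the end of count and the start of sample_width. */
size_t demo_gap_bytes(void)
{
	size_t end_of_count = offsetof(struct demo_hwlat_data, count) +
			      sizeof(((struct demo_hwlat_data *)0)->count);

	return offsetof(struct demo_hwlat_data, sample_width) - end_of_count;
}
```

Under these assumptions both fields sit on one line, so a write to count by one CPU invalidates the line that every other CPU reads sample_width from. Copying sample_width into a local before the loop removes that read from the hot path.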

1 file changed: +7 -8 lines changed
kernel/trace/trace_hwlat.c

Lines changed: 7 additions & 8 deletions
@@ -102,9 +102,9 @@ struct hwlat_sample {
 /* keep the global state somewhere. */
 static struct hwlat_data {
 
-	struct mutex lock; /* protect changes */
+	struct mutex lock; /* protect changes */
 
-	u64 count; /* total since reset */
+	atomic64_t count; /* total since reset */
 
 	u64 sample_window; /* total sampling window (on+off) */
 	u64 sample_width; /* active sampling portion of window */
@@ -195,8 +195,7 @@ void trace_hwlat_callback(bool enter)
  * get_sample - sample the CPU TSC and look for likely hardware latencies
  *
  * Used to repeatedly capture the CPU TSC (or similar), looking for potential
- * hardware-induced latency. Called with interrupts disabled and with
- * hwlat_data.lock held.
+ * hardware-induced latency. Called with interrupts disabled.
  */
 static int get_sample(void)
 {
@@ -206,6 +205,7 @@ static int get_sample(void)
 	time_type start, t1, t2, last_t2;
 	s64 diff, outer_diff, total, last_total = 0;
 	u64 sample = 0;
+	u64 sample_width = READ_ONCE(hwlat_data.sample_width);
 	u64 thresh = tracing_thresh;
 	u64 outer_sample = 0;
 	int ret = -1;
@@ -269,7 +269,7 @@ static int get_sample(void)
 		if (diff > sample)
 			sample = diff; /* only want highest value */
 
-	} while (total <= hwlat_data.sample_width);
+	} while (total <= sample_width);
 
 	barrier(); /* finish the above in the view for NMIs */
 	trace_hwlat_callback_enabled = false;
@@ -287,8 +287,7 @@ static int get_sample(void)
 		if (kdata->nmi_total_ts)
 			do_div(kdata->nmi_total_ts, NSEC_PER_USEC);
 
-		hwlat_data.count++;
-		s.seqnum = hwlat_data.count;
+		s.seqnum = atomic64_inc_return(&hwlat_data.count);
 		s.duration = sample;
 		s.outer_duration = outer_sample;
 		s.nmi_total_ts = kdata->nmi_total_ts;
@@ -837,7 +836,7 @@ static int hwlat_tracer_init(struct trace_array *tr)
 
 	hwlat_trace = tr;
 
-	hwlat_data.count = 0;
+	atomic64_set(&hwlat_data.count, 0);
 	tr->max_latency = 0;
 	save_tracing_thresh = tracing_thresh;
