Trimming down time to process work items #8

Closed
stevej opened this issue Apr 19, 2017 · 5 comments

stevej commented Apr 19, 2017

In an attempt to fix linkerd/linkerd-tcp#35 I'm benchmarking master vs ver/synchro-system (using `time cargo run --release --example multithread | tee output.txt`) and want to share some numbers.

ver/synchro-system output (this is the last report from the metrics reporter)

metrics_count 5
loop_iters_count{role="worker"} 1000000000
loop_iters_curr{role="worker"} 999999999
total_time_ms{role="worker"} 654672
heartbeats{role="heartbeat"} 653
loop_time_us{stat="count", role="worker"} 1086186
loop_time_us{stat="mean", role="worker"} 0.0055248364460598835
loop_time_us{stat="min", role="worker"} 0
loop_time_us{stat="max", role="worker"} 414
loop_time_us{stat="stddev", role="worker"} 0.42562936784848004
loop_time_us{stat="p50", role="worker"} 0
loop_time_us{stat="p90", role="worker"} 0
loop_time_us{stat="p95", role="worker"} 0
loop_time_us{stat="p99", role="worker"} 0
loop_time_us{stat="p999", role="worker"} 0
loop_time_us{stat="p9999", role="worker"} 0

real	10m54.844s
user	8m5.564s
sys	2m49.460s

master output (note that the last report doesn't show the last batch of items being processed)

metrics_count 4
loop_iters_count{role="worker"} 999826348
loop_iters_curr{role="worker"} 999826347
heartbeats{role="heartbeat"} 5377
loop_time_us{stat="count", role="worker"} 374274
loop_time_us{stat="mean", role="worker"} 1.7634861091072316
loop_time_us{stat="min", role="worker"} 0
loop_time_us{stat="max", role="worker"} 137
loop_time_us{stat="stddev", role="worker"} 2.118960934392008
loop_time_us{stat="p50", role="worker"} 0
loop_time_us{stat="p90", role="worker"} 0
loop_time_us{stat="p95", role="worker"} 0
loop_time_us{stat="p99", role="worker"} 0
loop_time_us{stat="p999", role="worker"} 0
loop_time_us{stat="p9999", role="worker"} 0

real	89m39.084s
user	102m42.164s
sys	63m15.424s

The example on master is roughly 8x slower than the version on ver/synchro-system (5,379s vs 655s wall-clock). I'll build some flame graphs to compare the two.
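
As a quick sanity check on that ratio, here is the per-iteration wall-clock cost implied by the numbers above (the values are copied straight from the two runs; nothing here is re-measured):

```rust
// Back-of-the-envelope check on the wall-clock numbers reported above.
fn main() {
    let iters = 1_000_000_000_f64;
    let synchro_secs = 10.0 * 60.0 + 54.844; // ver/synchro-system `real`
    let master_secs = 89.0 * 60.0 + 39.084; // master `real`
    println!("ver/synchro-system: {:.0} ns/iter", synchro_secs * 1e9 / iters);
    println!("master:             {:.0} ns/iter", master_secs * 1e9 / iters);
    println!("slowdown:           {:.1}x", master_secs / synchro_secs);
}
```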

stevej self-assigned this Apr 19, 2017

stevej commented Apr 19, 2017

The mean loop time (not a great way to compare, but we don't have enough precision with only microsecond timing) is roughly 5ns per iteration on ver/synchro-system vs 1.7µs on master.
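
To illustrate the precision problem (a standalone sketch, not tacho's actual recording path): anything under a microsecond truncates to 0 when recorded in whole microseconds, which is also why every percentile above reads 0 and only the mean over ~1M samples carries any signal.

```rust
use std::time::Instant;

fn main() {
    let start = Instant::now();
    let mut acc: u64 = 0;
    for i in 0..1_000u64 {
        acc = acc.wrapping_add(i);
    }
    let elapsed = start.elapsed();
    // A sub-microsecond loop body records as 0us at whole-microsecond
    // resolution, even though the nanosecond reading is non-zero.
    println!("acc = {}", acc);
    println!("recorded = {}us ({}ns)", elapsed.as_micros(), elapsed.as_nanos());
}
```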

stevej commented Apr 19, 2017

This zip file contains the flame graphs of both branches, each running the multithread benchmark loop 50MM times: flame_graphs.zip
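
For reference, perf-based flame graphs like these are usually produced along these lines (assuming Brendan Gregg's FlameGraph scripts are on the PATH; this may not be the exact invocation used for the zip above):

```sh
# Record with call graphs, then fold stacks and render an SVG.
perf record --call-graph dwarf -- ./target/release/examples/multithread
perf script | stackcollapse-perf.pl | flamegraph.pl > multithread.svg
```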

stevej commented Apr 19, 2017

Top CPU users from `perf report` for master:

+   38.36%     0.00%  multithread  [unknown]           [k] 0000000000000000
+   35.26%     2.31%  multithread  [kernel.kallsyms]   [k] entry_SYSCALL_64_fastpath
+   26.84%     0.31%  multithread  [kernel.kallsyms]   [k] sys_futex
+   26.53%     0.37%  multithread  [kernel.kallsyms]   [k] do_futex
+   23.96%     0.51%  multithread  libc-2.24.so        [.] __lll_unlock_wake_private
+   23.12%     0.00%  multithread  [unknown]           [k] 0x0000000000000001
+   21.16%     1.23%  multithread  [kernel.kallsyms]   [k] futex_wake
+   19.45%    19.45%  multithread  [kernel.kallsyms]   [k] _raw_spin_unlock_irqrestore
+   17.51%     0.05%  multithread  [kernel.kallsyms]   [k] wake_up_q
+   17.44%     0.04%  multithread  [kernel.kallsyms]   [k] try_to_wake_up
+   11.00%    10.95%  multithread  libc-2.24.so        [.] _int_free
+    9.66%     0.00%  multithread  [unknown]           [.] 0x0000558d7b0e78b0
+    7.64%     7.59%  multithread  libc-2.24.so        [.] malloc
+    6.70%     1.78%  multithread  libc-2.24.so        [.] __lll_lock_wait_private
+    6.31%     6.25%  multithread  multithread         [.] _ZN73_$LT$tacho..aggregator..Aggregator$u20$as$u20$
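
The master profile is dominated by futex traffic, i.e. worker threads sleeping and waking on a contended lock. As a rough illustration of the kind of pattern that produces this shape (a minimal sketch, not tacho's actual aggregator), several threads hitting one shared Mutex on every iteration spend most of their time in sys_futex/futex_wake rather than in the loop body:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared, lock-protected state touched on every iteration.
    let counter = Arc::new(Mutex::new(0u64));
    let mut handles = Vec::new();
    for _ in 0..4 {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..1_000_000u64 {
                // Each iteration contends on the same futex-backed lock; under
                // contention the kernel side shows up as sys_futex/futex_wake.
                *counter.lock().unwrap() += 1;
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    println!("total = {}", *counter.lock().unwrap());
}
```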

stevej commented Apr 19, 2017

Top CPU users from `perf report` for ver/synchro-system:

+   33.61%     6.36%  multithread  [vdso]              [.] __vdso_clock_gettime
+   27.14%     8.26%  multithread  [kernel.kallsyms]   [k] entry_SYSCALL_64_fastpath
+   26.12%     0.00%  multithread  [unknown]           [k] 0x0000000000000001
+   15.93%     1.23%  multithread  [kernel.kallsyms]   [k] sys_clock_gettime
+   14.78%     0.27%  multithread  [kernel.kallsyms]   [k] posix_ktime_get_ts
+   14.58%     1.62%  multithread  [kernel.kallsyms]   [k] ktime_get_ts64
+   14.38%     0.00%  multithread  multithread         [.] _ZN50_$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT
+   14.38%     0.00%  multithread  libpthread-2.24.so  [.] start_thread
+   12.84%     0.35%  multithread  [kernel.kallsyms]   [k] kvm_clock_get_cycles
+   12.72%    12.72%  multithread  [kernel.kallsyms]   [k] pvclock_clocksource_read
+   10.10%    10.10%  multithread  multithread         [.] _ZN56_$LT$twox_hash..XxHash$u20$as$u20$core..hash..Has
+    9.32%     0.72%  multithread  libc-2.24.so        [.] __clock_gettime
+    8.28%     8.28%  multithread  libpthread-2.24.so  [.] __pthread_rwlock_wrlock
+    8.25%     8.25%  multithread  libpthread-2.24.so  [.] __pthread_rwlock_unlock
+    8.03%     8.03%  multithread  multithread         [.] _ZN56_$LT$twox_hash..XxHash$u20$as$u20$core..hash..Has
+    6.27%     0.00%  multithread  [unknown]           [.] 0x0000000000000002
+    6.26%     0.00%  multithread  [unknown]           [.] 0x0000000000000004
+    6.26%     0.00%  multithread  [unknown]           [.] 0x000072656b726f77
+    4.05%     4.05%  multithread  multithread         [.] _ZN3std11collections4hash5table9make_hash17h79b66d50

stevej commented Apr 20, 2017

Closed now that #7 is merged.

stevej closed this as completed Apr 20, 2017