
Performance evaluation (and possible optimization?) #41

Closed
mariomac opened this issue Mar 30, 2023 · 5 comments

@mariomac
Contributor

In previous projects using a similar architecture, 200K events/second consumed 0.5 CPUs. The impact was caused mostly by the userspace process having to wake up on each message in the ringbuffer. In the case of the linked project, moving from a ringbuffer to a hashmap allowed aggregating millions of events/second with 0.1% CPU.
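As a rough illustration of that hashmap approach (a sketch only, not code from that project or from this repo; the key/value layout and names are made up): the BPF program accumulates per-key counters in a hash map, and userspace scrapes the map on a timer instead of waking up for every event.

```c
// Illustrative sketch: aggregate request metrics in a BPF hash map so
// userspace can read them periodically instead of waking up per event.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct agg_key {                   // hypothetical aggregation key
    __u32 pid;
    __u16 status;
};

struct agg_val {                   // hypothetical aggregated value
    __u64 count;
    __u64 total_duration_ns;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1 << 16);
    __type(key, struct agg_key);
    __type(value, struct agg_val);
} http_aggregates SEC(".maps");

static __always_inline void record_request(struct agg_key *key, __u64 duration_ns)
{
    struct agg_val *val = bpf_map_lookup_elem(&http_aggregates, key);
    if (val) {
        // Existing entry: bump the counters atomically.
        __sync_fetch_and_add(&val->count, 1);
        __sync_fetch_and_add(&val->total_duration_ns, duration_ns);
        return;
    }
    // First event for this key: create the entry.
    struct agg_val init = { .count = 1, .total_duration_ns = duration_ns };
    bpf_map_update_elem(&http_aggregates, key, &init, BPF_NOEXIST);
}
```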

In the case of HTTP/gRPC servers, it seems unlikely that a single host will need to process so many events/second, so our current implementation should be fine. However, we could create performance tests to see how many resources we consume in typical high-load scenarios.

If we decide to optimize the performance, I would recommend addressing the userspace wakeups on each HTTP request message.

We can configure the ringbuffer to accumulate messages and send them to user space in batches of X (user-configurable). If, after a user-configured timeout, the batch hasn't reached size X, we send it anyway. This would decrease the number of userspace wakeups by orders of magnitude.
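A rough sketch of what the BPF side of that could look like (assumptions only: libbpf-style ringbuffer helpers, and a hypothetical batch size, counter and payload; the timeout-based flush of partial batches would live in the userspace reader):

```c
// Sketch only: wake userspace once per batch instead of once per event.
// WAKEUP_BATCH would come from user configuration; a userspace timeout would
// still drain partial batches.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define WAKEUP_BATCH 64            // hypothetical, user-configurable

struct http_event {                // illustrative payload
    __u64 start_ns;
    __u64 end_ns;
    __u16 status;
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 20);
} events SEC(".maps");

static __u64 submitted;            // global counter (BPF global data)

static __always_inline void submit_http_event(const struct http_event *src)
{
    struct http_event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return;                    // buffer full: drop the event
    __builtin_memcpy(e, src, sizeof(*e));

    // Suppress the userspace notification unless this record completes a batch.
    __u64 n = __sync_fetch_and_add(&submitted, 1);
    if ((n + 1) % WAKEUP_BATCH == 0)
        bpf_ringbuf_submit(e, BPF_RB_FORCE_WAKEUP);
    else
        bpf_ringbuf_submit(e, BPF_RB_NO_WAKEUP);
}
```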

@grcevski
Contributor

I did a couple of benchmark runs in HTTP mode, using our wrk2-based driver on top of the simple pingserver we have in our tests. I didn't add any delay in the API requests, so this is pretty much the worst-case scenario for instrumentation overhead.

Here are the results for 10,000 QPS (which is very unrealistic for a single service):

Environment

Linux kernel version 5.19.0-38
12th Gen Intel(R) Core(TM) i7-1280P, 2 GHz, 6 performance cores, 8 low-power cores, running without CPU throttling
64 GB total system memory

Notes

Latency varies a bit from run to run, so the latency histograms are on par across runs.

Base

server 85% CPU utilization

Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%  609.00us
 75.000%    0.88ms
 90.000%    1.04ms
 99.000%    1.16ms
 99.900%    1.24ms
 99.990%    1.74ms
 99.999%    4.63ms
100.000%    6.49ms

NOOP tracing, just collecting and ignoring the events

otelhttp 15%
server from 85% to 95%

Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%  632.00us
 75.000%    0.89ms
 90.000%    1.05ms
 99.000%    1.17ms
 99.900%    1.25ms
 99.990%    1.40ms
 99.999%    2.26ms
100.000%    2.63ms

Full OTel HTTP with local Mimir, Tempo, Agent

server from 85% to 95%
otelhttp 95%

Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%  603.00us
 75.000%    0.86ms
 90.000%    1.01ms
 99.000%    1.12ms
 99.900%    1.63ms
 99.990%    2.11ms
 99.999%    2.46ms
100.000%    3.07ms 

@mariomac
Contributor Author

@grcevski do you have the stress scripts somewhere?

@mariomac
Contributor Author

@grcevski would it be possible to repeat your experiments with the version in the following PR?

#61

@grcevski
Contributor

> @grcevski do you have the stress scripts somewhere?

Yes, I committed the script in test/cmd/pingserver/driver.sh. I use wrk2 to create a constant load, regardless of whether the service can keep up. I slightly modified wrk2 locally to close and re-establish the connection each time, thinking it's probably closer to what happens in the real world.

@grcevski
Contributor

> @grcevski would it be possible to repeat your experiments with the version in the following PR?
>
> #61

Definitely, I'll get these runs done today!
