Performance evaluation (and possible optimization?) #41
I did a couple of benchmark runs in HTTP mode, using our wrk2-based driver on top of the simple pingserver we have in our tests. I didn't use any delay in the API requests, so this is pretty much the worst-case scenario: all instrumentation overhead. Here are the results for 10,000 QPS (which is very unrealistic for a single service):

**Environment:** Linux kernel version 5.19.0-38

**Notes:** Latency varies a bit from run to run, so the latency histograms are on-par.

| Configuration | CPU utilization |
| --- | --- |
| Base | server 85% |
| NOOP tracing, just collecting and ignoring the events | otelhttp 15% |
| Full OTel HTTP with local Mimir, Tempo, Agent | server from 85% to 95% |
@grcevski do you have the stress scripts somewhere?
Yes, I committed the script in test/cmd/pingserver/driver.sh. I use wrk2 to create a constant load, regardless of whether the service can keep up. I had slightly modified wrk2 locally to close and re-establish the connection each time, thinking it's possibly closer to what happens in the real world.
In projects previously using a similar architecture, 200K events/second consumed 0.5 CPUs. The impact was mostly caused by user space having to wake up on each message in the ringbuffer. In the linked project, moving from a ringbuffer to a hashmap allowed aggregating millions of events/second with 0.1% CPU.
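The hashmap approach trades per-event wakeups for periodic polling: the kernel side increments a per-key counter in a BPF hash map, and user space drains the whole map on a timer, so one wakeup covers millions of events. A minimal pure-Go model of that user-space side (all names are illustrative, not an existing API in this project):

```go
package main

import (
	"fmt"
	"sync"
)

// Aggregator models the hashmap approach: the in-kernel probe would
// increment a counter keyed by, e.g., HTTP route, and user space reads
// and resets the map periodically instead of waking up once per event.
type Aggregator struct {
	mu     sync.Mutex
	counts map[string]uint64
}

func NewAggregator() *Aggregator {
	return &Aggregator{counts: make(map[string]uint64)}
}

// Record stands in for the in-kernel counter increment (cheap, no wakeup).
func (a *Aggregator) Record(key string) {
	a.mu.Lock()
	a.counts[key]++
	a.mu.Unlock()
}

// Flush stands in for the periodic user-space read: a single wakeup
// drains the aggregated counters for every event since the last flush.
func (a *Aggregator) Flush() map[string]uint64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	out := a.counts
	a.counts = make(map[string]uint64)
	return out
}

func main() {
	agg := NewAggregator()
	for i := 0; i < 1_000_000; i++ {
		agg.Record("GET /ping")
	}
	fmt.Println(agg.Flush()["GET /ping"]) // 1000000
}
```

The obvious trade-off is that aggregation discards per-event detail (timings, payload attributes), so it fits metrics better than traces.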
In the case of HTTP/gRPC servers, it seems unlikely that a single host will need to process so many events/second, so our current implementation should be fine. However, we could create performance tests to see how many resources we consume in typical high-load scenarios.
If we decide to optimize the performance, I would recommend addressing the userspace wakeups on each HTTP request message.
We could configure the ringbuffer to accumulate messages and send them to user space in batches of X (user-configurable). If, after a user-configured timeout, the batch hasn't reached size X, we send it anyway. This would decrease the number of userspace wakeups by orders of magnitude.