Comparison against alternative crates? #39
Comments
Thanks for bringing this up! This is an interesting but complicated topic ... First, the simple thing: SPSC is always faster than MPSC/SPMC/MPMC. All When it comes to comparing SPSC implementations, you should consider multiple aspects:
Correctness is very hard to check, because all SPSC implementations use a fair amount of I think that I think the most important difference between the existing SPSC crates is their API. What kind of operations are you planning to use? If there is something missing in the API of Finally, performance ... It's very hard to create meaningful benchmarks. I've created some in the benches directory, which can be run with If you want to compare All this is probably meaningless, because the benchmark code will most likely not reflect your actual usage pattern, and the real-life performance differences between SPSC implementations are probably negligibly small anyway. |
Keeping in mind that all benchmarks are wrong, here is one result from running the You should of course not trust the good result for Changes of about 10% are common between runs, sometimes there are even 20% changes. If you have any ideas how to make the results more stable (or more meaningful in general), please let me know! |
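As a rough illustration of how run-to-run variation of this size could be quantified, here is a minimal sketch that summarizes repeated timing samples by their median and relative spread. It is not taken from the benches directory; the function name `median_and_spread` and the sample values are made up for the example.

```rust
/// Returns the median of `samples` and the relative spread
/// `(max - min) / median`, or `None` if there are no samples.
/// Hypothetical helper, not part of any crate discussed here.
fn median_and_spread(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    // total_cmp gives a total order on f64, so sorting cannot panic on NaN.
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    let median = if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    };
    // Not meaningful if the median is zero; real timings never are.
    let spread = (sorted[sorted.len() - 1] - sorted[0]) / median;
    Some((median, spread))
}

fn main() {
    // Made-up per-run timings in nanoseconds, for illustration only.
    let runs = [100.0, 110.0, 90.0, 120.0, 100.0];
    if let Some((median, spread)) = median_and_spread(&runs) {
        println!("median: {median} ns, relative spread: {:.0}%", spread * 100.0);
    }
}
```

The median is less sensitive to a single outlier run than the mean, which is why it is used here; whether it makes the comparison more meaningful is a separate question.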
I've just tried it with the suggested The differences are not that big, but it looks like the Could that mean that maybe an
Are you talking about changes to the Linux scheduler? This might have an influence on how the secondary thread (the one that generates contention on the atomic variables) is scheduled and therefore distort the measurements.
I followed the Red Hat low latency tuning guide, https://access.redhat.com/sites/default/files/attachments/201501-perf-brief-low-latency-tuning-rhel7-v1.1.pdf. I disabled hyperthreading and turned on all the kernel commands mentioned in the doc, and I use the low-latency tuned-adm profile setting. I also used cpuset to clear the core I use for the benchmark.
Thanks @zhenpingfeng for the information about latency tuning. This looks very promising and I'll have a closer look when I have more time. In the meantime, I've modified the benchmark code a bit: #42. I also modified the performance comparison and created a new branch: https://github.com/mgeier-forks/rtrb/tree/performance-comparison2
@TheButlah Coming back to your original question about I tried my latest benchmark with I must say I'm quite surprised how fast they are! In the uncontended case they are quite a bit slower than most SPSC implementations but faster than In the contended case it's much closer. |
I have some questions. The elapsed time I get by using the above code is about 4us (FIFO scheduler). Is my method of using this library wrong? Or does sending an Instant structure really take this amount of time? How can I reduce the latency to the nanosecond level? New update:
After removing the println! macro, it now only costs around 120ns per send. Problem solved.
Thanks @zhenpingfeng for running the benchmarks again, the results seem pretty much consistent with mine, which is good! The idea of sending an BTW, I think you could replace your |
Actually, the elapsed function call does matter: if I remove it, each send only costs around 100ns.
Yes, sure, the We are not interested in benchmarking the If it takes 100ns without the |
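The measurement discussed in the last few comments can be sketched roughly as follows. This is not the original code from the thread (which was lost) and it does not use the rtrb API: `std::sync::mpsc::sync_channel` is used as a stand-in so that the example only needs the standard library, and the name `measure_latencies` is made up. The point it illustrates is that the timestamps are collected into a pre-allocated `Vec` and nothing is printed until the loop is done.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::{Duration, Instant};

/// Sends `n` timestamps through a bounded channel and returns the
/// producer-to-consumer latency observed for each of them.
fn measure_latencies(n: usize, capacity: usize) -> Vec<Duration> {
    // Stand-in for an SPSC ring buffer; this is NOT the rtrb API.
    let (tx, rx) = sync_channel::<Instant>(capacity);

    let producer = thread::spawn(move || {
        for _ in 0..n {
            // Blocks while the channel is full.
            tx.send(Instant::now()).expect("receiver hung up");
        }
        // `tx` is dropped here, which ends the consumer loop below.
    });

    // Pre-allocate so the receiving loop does no allocation and no I/O.
    let mut latencies = Vec::with_capacity(n);
    for sent_at in rx.iter() {
        latencies.push(sent_at.elapsed());
    }
    producer.join().expect("producer panicked");
    latencies
}

fn main() {
    let mut latencies = measure_latencies(10_000, 1024);
    latencies.sort();
    // Print once, after the measurement, instead of once per message.
    println!("median latency: {:?}", latencies[latencies.len() / 2]);
}
```

Note that each value includes the time a message spent waiting in the buffer as well as the cost of `Instant::now()` and `elapsed()` themselves, so it is an upper bound on the cost of a single send rather than a measurement of it.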
Thanks @zhenpingfeng for the updated measurement! It's interesting that BTW, there is a new branch with additional crates for performance comparison: https://github.com/mgeier-forks/rtrb/tree/performance-comparison3. And I've removed |
Can we get this kind of pics in the README on the landing page? Super interesting.
Thanks for the hint @kasparthommen, I have never heard of it! I didn't quite understand the API though ... I have added the performance comparison to the codebase (see #123), would you like to add a PR adding the crates you suggested?

Speaking of which, I have updated the benchmarks recently, so I think it's time to share some plots again. I did those on Linux, with an Intel(R) Core(TM) i5-7Y54 CPU.

I split the benchmarks in two parts. One uses a very small buffer size (only 2 elements!), which means there is a lot of contention and many of the attempted read and write operations will fail:

The other benchmark uses a very large buffer size, and therefore no contention at all, so that every single intended read and write operation will succeed:

I think those are the worst-case and best-case scenarios, respectively, and any real use case will be somewhere in between.

Note that On the other hand, If anyone else wants to share their results (especially on other CPU architectures), please go ahead!
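To make the difference between the two buffer sizes concrete, here is a small single-threaded sketch. It again uses `std::sync::mpsc::sync_channel` as a stand-in rather than any of the benchmarked crates, and `count_push_results` is a made-up name. With nobody reading, a buffer of 2 elements rejects every write after the second one, while a large buffer accepts all of them.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Tries to push `attempts` items without blocking while nobody is reading.
/// Returns `(succeeded, failed_because_full)`.
fn count_push_results(capacity: usize, attempts: usize) -> (usize, usize) {
    // Stand-in for an SPSC ring buffer; `_rx` keeps the receiver alive.
    let (tx, _rx) = sync_channel::<usize>(capacity);
    let mut ok = 0;
    let mut full = 0;
    for i in 0..attempts {
        match tx.try_send(i) {
            Ok(()) => ok += 1,
            Err(TrySendError::Full(_)) => full += 1,
            Err(TrySendError::Disconnected(_)) => {
                unreachable!("receiver is still alive")
            }
        }
    }
    (ok, full)
}

fn main() {
    // Tiny buffer: most write attempts fail.
    println!("{:?}", count_push_results(2, 10)); // prints "(2, 8)"
    // Large buffer: every write attempt succeeds.
    println!("{:?}", count_push_results(1024, 10)); // prints "(10, 0)"
}
```

In the actual benchmarks a consumer thread drains the buffer concurrently, so the failure rate depends on scheduling; the sketch only shows the two extremes.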
Hi, I'm considering using this crate but am unsure whether the performance is any better than the other SPSC wait-free ring buffers. A comparison to crossbeam_channel might also be merited, since I could see using their bounded queues in a very similar way to the ring buffer.