Please support runtimes other than tokio #6
Comments
I will look into doing this! Flume looks excellent btw. I'll see what I can do with rayon threadpools, and whether I can make things contributor-friendly so more backends can be added easily. I agree that […]
On Tue, Aug 10, 2021 at 07:05:46AM -0700, Seth wrote:

> I will look into doing this! Flume looks excellent btw.

Yeah, I've found it to be incredibly useful in a variety of projects. I particularly enjoy that it supports both sync and async operations on the same channel, so that you can (for instance) have one end in a thread and the other in an async task. Great for bridging the two worlds.
> If you had to choose one backend that would be acceptable across other projects would you have a preference?

Definitely Rayon. There's no one async backend that'll work for the majority of people, and folks not using async will prefer not to have an async backend at all. Rayon should be acceptable in any project.

Thanks for looking into it!
So, I have a few branches going and some interesting initial results. The notable conclusion is that rayon threadpools lag in performance when using fewer threads. I'm pretty sure that this is mainly a result of the rayon version basically just spinning on 2 threads.
Additionally, the rayon version adds an extra channel that acts a bit like a future to eventually pull a result. Maybe there is a better way to organize all this on a rayon threadpool. In general the rayon version performs as well as the futures async version minus 2 cores, i.e. Gzip/6 with rayon ~= Gzip/4 with futures. The tokio runtime with its explicit […]. I'll try to formalize this some more with tables comparing things. I'm mostly surprised that the tokio version is so much faster than the […]. I'm not fully convinced all things are equal between the impls yet; these are just some interesting preliminary results.
It'd be worth trying tokio with flume channels, to see if that's causing any performance delta (in either direction). I don't know anything about the performance of the […]. For Rayon, you may also want to try […].
Flume ends up making a negligible difference. The real difference is `tokio::task::spawn_blocking`, which ends up being way more performant than a regular task.
I did manage to get rayon down to a more reasonable performance. It's worth noting that running the same benchmark with vanilla single-threaded gzip encoding takes about 6.6s. So for all runtimes except tokio, 2 threads just breaks even with the overhead of multithreading, and 4 sees some substantial performance improvements. Getting rayon / sync threads to be speedy here required some largish changes.

Branches: […]

Make bench data:

```sh
cd bench-data
cp shakespeare.txt shakespeare.txt.orig
for i in {0..100}; do cat shakespeare.txt.orig >> shakespeare.txt; done
```

Run benchmarks:

```sh
cargo bench --features pargz,zlib-ng-compat,parsnap_default -- Gzip --sample-size 10
```

At the moment I'm inclined to say that tokio is pulling its weight here and proves to be a worthwhile dependency.
Is `Gzip/2` supposed to use 2 compression threads in rayon? Your code runs it with only 1:

```rust
let handle = std::thread::spawn(move || {
    ParGz::run(rx_compressor, rx_writer, self.writer, self.num_threads - 1, comp_level)
});
```

Similarly, `Gzip/4` uses 3 threads (you can observe this with `htop`).
By contrast, your tokio (blocking) version doesn't perform any concurrency throttling at all: you spawn a new thread for each chunk. Could it be you're comparing apples and oranges here?
@godmar, that is correct, the […]. The tokio runtimes have the thread count set in the tokio runtime builder, and have the same […]:

```rust
let rt = tokio::runtime::Builder::new_multi_thread()
    .worker_threads(num_threads)
    .build()?;
```

The docs on […]. So, I'm pretty sure this is all still apples to apples, but I appreciate you trying to find holes in it, as I've been staring at it for too long at this point.
I repeated your experiments and monitored the CPU usage. Of course, if you're using more CPUs, results come in faster (as long as you're not out of CPUs, as in the case with 30 threads, where you'll see the performance become roughly equal). This is not an apples-to-apples comparison. Your rayon threadpool is explicitly instructed to use only 1 thread, and it does that. Since the compression is CPU-bound, this uses 100% of one core or CPU. If you're asking about the impact of different threadpools, you need to apply the same concurrency-throttling strategy to all scenarios, in my opinion, or else the results don't make sense.
The "background" thread does hardly any CPU work, so I wouldn't count it.
This is referring to the threadpool tokio uses for async tasks, which again do very little work here, if any. You offload all CPU-intensive work onto the so-called "spawn_blocking" threadpool, which is not subject to concurrency control unless the (very large, larger than the number of CPUs) limit is reached.
You are correct. I thought that […]. I'm going to rework these benchmarks and likely just move entirely to rayon. Thanks for pushing on this until I read the docs 👍
Please see release v0.4.0 for […].
I'd love to use gzp in libraries that are already using a different async runtime, and I'd like to avoid adding the substantial additional dependencies of a separate async runtime that isn't otherwise used by the library.
Would you consider either supporting a thread pool like Rayon, or supporting other async runtimes?
(If it helps, you could use a channel library like flume that works on any runtime, so that you only need a spawn function.)