Multithreading support #55
Probably need to add new types, one for writing (compress) and one for reading (decompress), that handle blocks in parallel. Don't have any free time at the moment to tackle this, but if you do, feel free to send PRs!
Here are some of my experiences: it could be part of the Reader/Writer. The compression side is probably the one with the most potential. Decompression should be really fast on a single thread and would likely just waste a lot of CPU cycles on synchronization. For S2 compression I made a pretty simple setup which scales fine:

- Input is collected until there is enough data for a block; the block is then handed off to a goroutine for compression.
- When data for the block is compressed, the output is sent, in order, to the underlying writer.
- A mutex-protected error state is kept in the Writer.
- For cases where the user doesn't want concurrency, the Writer falls back to using sync compression and doesn't use goroutines.

While getting concurrent compression up and running is pretty easy, the tricky parts are error handling, proper flushing, and releasing resources. Some users may also expect that writes to the output will only happen while the Writer itself is being called. While this can be done, it will of course be slightly less efficient than async writes.
@klauspost thank you for your insights, it looks like an elegant way to deal with concurrent compression.
First version of the concurrent writer added to branches master and v3. |
@pierrec Speed-wise it is looking fine compared to S2. Using the test files here: https://github.com/klauspost/compress/tree/master/s2#performance, plus http://mattmahoney.net/dc/10gb.html and https://files.klauspost.com/compress/rawstudio-mint14.7z
I've added numbers at the bottom here: https://docs.google.com/spreadsheets/d/1nuNE2nPfuINCZJRMt6wFWhKpToF95I47XjSsc-1rbPQ/edit?usp=sharing
@pierrec Looks nice. One thing I noticed is that
Another minor thing: this attempt at BCE (bounds-check elimination) does not do anything positive. In the following line
The best result I could get was simply to remove the line, which brings it from 3 down to 2 bounds checks.
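For context, a BCE hint of the kind being discussed usually looks like the generic sketch below (the function `sum4` is made up for illustration, not taken from the library). The early dummy index expression proves to the compiler that the slice is long enough, so the later reads need no per-access bounds checks; as noted above, whether such a hint actually helps depends on the specific code, and sometimes removing it is better.

```go
package main

import "fmt"

// sum4 reads four bytes from b. The `_ = b[3]` line is a common
// bounds-check-elimination hint: it panics up front if len(b) < 4,
// which lets the compiler drop the checks on the four reads below.
func sum4(b []byte) int {
	_ = b[3] // BCE hint: proves len(b) >= 4
	return int(b[0]) + int(b[1]) + int(b[2]) + int(b[3])
}

func main() {
	fmt.Println(sum4([]byte{1, 2, 3, 4})) // prints 10
}
```

You can inspect which checks remain with `go build -gcflags="-d=ssa/check_bce"`, which is how differences like "3 vs 2 bounds checks" are typically measured.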
@klauspost thanks a lot for this analysis!
Looks very good! The crashes are fixed.
So it is definitely "competitive". The margins versus S2 are pretty slim in most cases, and I think the only real difference comes down to the encoding chosen by each format. I included LZ4 as a very competitive format for Go compression in a talk I just gave.
Thanks. It has not been released yet as there are races, so bear with me while I fix them. Once this is done (hopefully this weekend) I will issue a release.
I noticed that this library uses only one CPU core. That may be a problem when processing large data. Do you have any ideas about how to add multithreading support?