-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decompress with ISA-L #418
Conversation
I found two issues:
I’ll leave this for now, there are bigger improvements to be made elsewhere. |
Htslib uses libdeflate so maybe that would be a better option? |
The Python isal bindings support multiblock gzips, so it is doable with ISA-L (maybe only a newer version is needed). |
add isa-l to build fix isal include and library path add separate class to hold the file. add IsalIO class to read gzipped file add RawIO class. Add double buffering to RawIO fix github action Fix when file names do not include file extensions. fix github action remove warning remove inline fix method name fix classname typo fix typo Switched to using the package manager's version instead of building ISA-L. fix CI apply clang-format to iowrap.hpp fix include guard fix forgot to add override to RawIO::ReaderName apply clang-format to iowrap.cpp remove meaningless line fix isa-l include Change class names move find_package(PkgConfig) up Adjust buffer size rename ReaderName to name and add maybe_unused add maybe_unused fixed test failure on MacOS adjust buffer size
The difference is small and maybe not significant, but if anything, this makes it faster.
It is now unused, but we leave it in for now so that we can later enable it as a fallback in case isa-l is not available.
Still not working as only the first block is processed
I’ve managed to get the code to work with multiblock gzips (concatenated gzips) and also observed that there’s not actually an increase in memory usage. So from that side, everything seems to be fine. However, when trying to compile this on our cluster, I noticed that the ISA-L library is not available and so I get an error at build time. I cannot just Also, we’ve earlier changed the build system in such a way that no external code is downloaded during the build because this was requested at some point in time. So the most flexible solution would allow for three possibilities:
Let’s see whether I can get the above to work somehow. |
5a4a022
to
462301e
Compare
Some helpful inspiration for the CMakeLists.txt voodo comes from https://github.com/charlesnicholson/cmake-external-project-test/blob/master/CMakeLists.txt
Ok, this is now done. Without any options, CMake tries to use a system-installed libisal. If that does not work, an error message will be printed telling the user to try I left out the fallback to regular gzip because that requires more code changes, and I don’t really see the advantage of not using isal. We can still add that if anyone requests it. Performance-wise, it looks really good: With this, running strobealign with 128 cores will fully utilize all cores, see #435 (comment). |
This is great! But I get a more cryptic error message (both as default build and with Built on two Macbook pros (one with intel and one with M1) with the same error.
Printing the
|
By the way, does this PR mean that there is no need to do
but instead we could feed the
? |
Also, I noticed in #435 (comment) that the runtimes with ISA-L are slower for 64 and 32 cores (6% and 18%, resp) even though having higher CPU utilization. So I imagine running strobealign with fewer cores (8,4,2) would suffer even more? I still think it is nice to get the speedup for highly parallelised runs as it represents a use case. But it would be good to see what the trade-off will be for the other use cases. |
As you can tell, I haven’t tested on Mac ... but since the CI checks run on Mac, I’m sure that it can be made to work somehow. You need
Yes!
I ran these measurements just once on busy nodes (with other jobs running), the aim was not to get accurate runtimes, but to see the CPU usage. I am quite sure the differences are not due to actual runtime differences. I tested this locally on my machine with 8 cores, and there the isal version is consistently 3% faster. I can do some more measurements to be sure, but I am quite certain that there is no tradeoff, this is just faster even when running on a single core. |
I did
Then I added
as I wanted to try this option (i.e., I never did
|
Hm, yes, you need to do I made the error message when NASM is missing a bit more helpful. |
Thanks, I like the new error messages. If I do
I get the error pasted at bottom. If i instead do
the installation completes without errors. So maybe there is something with the download option that doesn’t work for OSx? Error
|
Can you try running |
Ok, now I did:
Error
|
I think you need to install the GNU version of |
Great, got it to work according to your suggestion, i.e.,
Trusting you in that there is no substantial speed trade-off for few cores I approve a merge. |
I’ve done a few more tests to convince myself. On my local machine, the version in this PR is consistently ~2% faster (both wall-clock and CPU time) up to 8 cores (which is how many I have). On the KTH cluster (dardel), the variance is very high, but it appears that
Some notes for when we want to or need to revisit this:
|
Just FYI that the addition of isal breaks Windows builds due to usage of sys/mman.h. |
We don’t support strobealign on Windows at the moment. If it compiled previously, that was a happy accident. If you would like us to support Windows, can you open a separate issue please? Then we can discuss how to achieve that. |
This is a continuation of #269 where I fixed the things I commented on.
I will need to re-measure memory usage. At some point, it was higher than previously, but that seems to be fixed now.
As intended, runtime improves slightly.
Marked as draft because two issues remain:
mmap
usage, which I don’t think makes so much sense.