Crc+libdeflate#581
Conversation
|
Some stats: Flagstat for BAM read test (1 thread). Conclusion.
NB Libdeflate version also has the additional CRC check PR merged, not present in any others. Hence the additional flagstat tests. The flagstat on uncompressed BAM test shows time spent in other code (hashing etc) rather than uncompression. We can also see the impact of checking our CRCs with libdeflate. Sizes - default compress levels look broadly similar in size. None |
When specifying CPPFLAGS, LDFLAGS and LIBS we normally have those fields being added to whatever the compilation needs for itself. This is true for CPPFLAGS and LDFLAGS, but not LIBS which needed e.g. "-lz -llzma -lbz2" manually adding in also to our own extra LIBS.
Currently there is no autoconf setup. Compile with:
make CPPFLAGS="-DHAVE_LIBDEFLATE -I$HOME/ftp/compression/libdeflate" LDFLAGS="-L$HOME/ftp/compression/libdeflate" LIBS="-ldeflate"
This overly complex and ideally should just be a single parameter:
./configure --with-libdeflate=$HOME/ftp/compression/libdeflate
make
It fails one test in htslib; rebgzip, which is fundamentally broken by
design and cannot be fixed. I'm tempted to suggest we just skip the
test in this scenario.
It fails several in samtools, but these are probably also broken tests
which would also fail if we switched to e.g. Intel or CloudFlare
zlibs.
Speed tests on a small file:
u.BAM -> BAM Was 13.5s, now 6.3s.
BAM -> u.BAM Was 2.5s, now 1.8s.
It's disabled by default.
24be5f1 to
7780e00
Compare
Frankly this ability is suspect. It's there for specific use cases, but we cannot reliably test with it given there is no requirement that htslib only uses vanilla zlib. We could add a test for this perhaps, but it doesn't seem worth it for this one corner case.
c9fc8a5 to
455cfe6
Compare
|
Updated to work with autoconf and fixed (OK removed!) the duff test. I experimented with using libdeflate in CRAM, but dropped it as a) it's a tiny speed change and b) it's hard to get right given cram mixes deflate and gzip (index) streams. |
I don't see how this worked before with HTS_PROG_CC_WARNINGS.
|
Finally enough to get travis working. I view this as ready for review again. See also the samtools/samtools#777 to make tests work there. |
|
|
||
| script: | ||
| - if test "$USE_CONFIG" = "yes" ; then autoreconf && ./configure ; fi && make -e && make test | ||
| - if test "$USE_CONFIG" = "yes" ; then autoreconf -I m4 && ./configure ; fi && make -e && make test |
There was a problem hiding this comment.
This should not be necessary. Put an m4_include line in configure.ac instead to allow autoconf to find the file.
We should also have at least one test that actually builds and tests against libdeflate.
There was a problem hiding this comment.
Ah that explains how come the warnings m4 worked withput the -I m4. I was scratching my head on that but failed to find it - I'd suggest moving the include down to adjacent to the usage of it.
However surely it's simpler to have one minor command line tweak than two much longer lines to specify the include files? Trivial to fix though.
| EXTRA_CFLAGS_PIC = -fpic | ||
| LDFLAGS = | ||
| LIBS = $(htslib_default_libs) | ||
| HTS_LIBS = $(htslib_default_libs) |
There was a problem hiding this comment.
This does not work properly with autoconf, which expects to be able to override the default in LIBS. The result is that builds using configure end up with duplicated libraries listed on the link line. It also make it impossible to remove any of the defaults should you want to.
This change (and the related ones to the link lines) should be reverted.
| then | ||
| if test "$_libdeflate_with" != "yes" | ||
| then | ||
| if test -f "${_libdeflate_with}/include/libdeflate.h" |
There was a problem hiding this comment.
This introduces behaviour which is different to all the other library-related with- and enable- options. It's also behaviour which is discouraged, for example the GNU coding standards state:
Do not use a ‘--with’ option to specify the file name to use to find certain files. That is outside the scope of what ‘--with’ options are for.
(Yes, I know we don't follow them in lots of other respects. I think we should in this one).
I suspect these tests could be fragile, and I don't see why this library has to be special compared to all the others we use.
There was a problem hiding this comment.
The original PR didn't have any configure script at all because I knew you'd object to the way I do it, but after languishing for 6 months I decided to move it along and do it. GNU themselves aren't sure of how to do this, hence I did it my way.
-
This is compatible with the wording of the autoconf manual, which states "An argument that is neither ‘yes’ nor ‘no’ could include a name or number of a version of the other package, to specify more precisely which other package this program is supposed to work with". A directory is a very precise way of indicating the package to work with and at no point does it forbid this. https://www.gnu.org/software/autoconf/manual/autoconf-2.66/html_node/External-Software.html#External-Software
-
GNU themselves also recommend use of it in this manner. Eg https://www.gnu.org/software/libidn/manual/html_node/Autoconf-tests.html
-
It's useful. Let's face it, people aren't going to find libdeflate on the system automatically. I could understand the argument for something common and available for install on most systems, but custom software is basically going to boil down to manually specifying LIBS, CPPFLAGS and LDFLAGS. In which case you may as well also add CFLAGS=-DHAVE__LIBDEFLATE and ignore the whole autoconf in the first place for what little gain it gives you.
-
If you really dislike this, then I assume you'll be modifying samtools configure.ac to disallow --with-htslib=DIR too? There is precedent within this package.
-
Specifying a directory isn't mandatory. You could always use
--with-libdeflate=yesand then add in the directory it resides, twice (once for CPPFLAGS and once for LDFLAGS), if you enjoy typing. :-)
| [ | ||
| AC_ARG_WITH(libdeflate, | ||
| AC_HELP_STRING([--with-libdeflate=DIR],[look for libdeflate in DIR]), | ||
| [_libdeflate_with=$withval],[_libdeflate_with="no"]) |
There was a problem hiding this comment.
Variables in autoconf macros should start with ac_. See https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Coding-Style.html
| else | ||
| CPPFLAGS="$CPPFLAGS -I${_libdeflate_with}" | ||
| fi | ||
| if test -f "${_libdeflate_with}/lib/libdeflate.a" -o -f "${_libdeflate_with}/lib/libdeflate.so" |
| [AC_CHECK_HEADER(libdeflate.h, [libdeflate_ok=yes LIBS="$LIBS -ldeflate"], libdeflate_ok=no)]) | ||
| if test "$libdeflate_ok" != "yes" | ||
| then | ||
| AC_MSG_WARN("--with-libdeflate specified, but non functioning") |
There was a problem hiding this comment.
This should be an error if you actually asked for libdeflate. The message is also very terse compared to the other ones we print when tests fail.
| fi | ||
|
|
||
| AH_TEMPLATE([HAVE_LIBDEFLATE], [Define if libdeflate is installed]) | ||
| AM_CONDITIONAL(HAVE_LIBDEFLATE, test "$libdeflate_ok" = "yes") |
There was a problem hiding this comment.
This is an automake macro. It adds dependency on automake to the entire configure script. The results are also not used, so it can be removed.
| [], [enable_s3=check]) | ||
|
|
||
| AX_LIBDEFLATE([ | ||
| pc_requires="$pc_requires $LIBDEFLATE_LDFLAGS" |
There was a problem hiding this comment.
pc_requires is for pkg-config packages, not libraries.
You need private_LIBS
| #include <libdeflate.h> | ||
| #define crc32(a,b,c) libdeflate_crc32((a),(b),(c)) | ||
| #endif | ||
|
|
There was a problem hiding this comment.
Will there be a libdeflate equivalent of zlib_mem_deflate()?
| if test "$libdeflate_ok" = "yes" | ||
| then | ||
| AC_DEFINE(HAVE_LIBDEFLATE, 1, | ||
| [Define to 1 if you have a functional libz.]) |
A proof of concept for integrating htslib with https://github.com/ebiggers/libdeflate
NB: this includes #579 (fixes #579).
Speed wise, it thrashes zlib and looks to beat intel and cloudflare implementations too, apart from compression level 1 (specialised in Intel). Plus it works on ARM neon. The API is totally different to zlib (ie it's not insane) and it doesn't support compression level 0 so that needed special hacks.
More work is needed on test harness (especially samtools) due to binary comparison of gz. These would equally fail on intel & cloudflare implementations too so it is the tests that are at fault.
More work is needed on configuration, pkgconfig, etc. See the commit message for more. I'm not going to do this myself though as it's clear from discussions here that my autoconf mods would be rejected anyway! So over to someone else! ;-)