Link plugins against libhts.so/.dylib and fix dynamically unloading HTSlib #1072

daviesrob merged 5 commits into samtools:develop
Conversation
When libhts.so/.dylib has been dynamically loaded with RTLD_LOCAL (which is the default on Linux), plugins that use hts_always_remote() etc will not be loadable unless they make libhts's symbols available themselves.

When libhts.so/.dylib has been dynamically loaded, some plugins have been opened, and dlclose(htslib) is called: htslib is not actually unloaded (or its atexit functions e.g. hfile_exit() called) as the plugins are still using libhts symbols. Eventually at program exit, all atexit functions are called; so the plugins are closed by hfile_exit(), htslib is suddenly unloaded when the final plugin is unloaded, and a segfault occurs because this all happens within hfile_exit() -- which has just been unloaded.

To break this cycle, introduce hts_lib_shutdown() which must be called before dlclose(htslib), if that is called explicitly prior to program exit. This unloads the plugins so that dlclose() can immediately unload htslib.

Add test exercising libhts via dlopen(). This ensures that all available hFILE plugins have been loaded, then calls hts_lib_shutdown() and dlclose(htslib).

As this is the first htslib test that tests things linked to the shared library, we need to set $LD_LIBRARY_PATH etc so that the freshly-built libhts.so/.dylib/etc is used. Add with-shlib.sh script that creates a suitable temporary libdir containing *only* the freshly-built shared libhts. On Windows platforms, this directory is added to %PATH% so shouldn't contain anything else. On macOS, having hfile_*.bundle files in $DYLD_LIBRARY_PATH would reduce the ability to test against different sets of plugins by varying $HTS_PATH.
For people building their own rather than using packaged htslib/samtools/etc, the requirement to set up …
I'm wondering who voluntarily uses plugins in anger bar the irods one, and I doubt many other than Sanger are using that. One benefit is that we make linking against libhts.a easier as we don't have to know the list of additional dynamic dependencies (which can be many with libcurl), but that just shifts the problem somewhere else rather than solving it. However by default they're not built, so people presumably cope fine, unless everyone is enabling them (unlikely; we can barely get the user-base to run configure first).

It's always felt a bit cumbersome that if we enable plugins in order to gain irods support, then we are forced to disable having things like https and s3 internal to htslib. That feels rather like a case of the tail wagging the dog.

So I'd be up for linking the plugins that are distributed within htslib into htslib itself and then enabling external plugins by default rather than having it as a configure option. That would mean the default libhts.so and libhts.a have exactly the same list of dependencies, but we've gained the ability to add additional plugins later if we choose to build them.

If we really want flexibility, then it could be better achieved by a new configure option listing which plugins should be compiled as external objects and loaded via dlopen(). That then gives the best of both worlds and puts the choice back in the hands of the user (or whoever is bundling your package for you).

I know this doesn't quite answer you regarding whether the plugins should have a …
It's basically the same question as “who voluntarily uses a shared libhts.so?”. Most people building htslib/bcftools/samtools themselves don't bother, and in fact the vanilla makefiles still hardcode the static library (in most configurations other than …). OTOH packagers (Debian, Fedora, bioconda, etc) work a little harder to make a good distribution, and they configure samtools/bcftools to use the shared libhts.so from their htslib packages. Similarly the sensible ones work a little harder to produce an htslib package with well-configured plugins.
This is a significant benefit (and was one of the motivations for introducing the plugin mechanism). It shifts the problem to one place (building the plugin, perhaps built by your distribution or experienced local person e.g. system administrator) rather than having the problem every time you want to build an end user program that uses htslib. For example, @daviesrob's htslib-crypt4gh builds only as a plugin and I doubt he has any plan to build it inside libhts.
That may be true, though I note that the maintainers have had five years to change this if they really feel like it's cumbersome. OTOH things like https and S3 also benefit from the linking isolation and separate installation and upgradability afforded by building them as plugins. For example, the samtools/openssl condapocalypse (proximate causes being incorrectly written package dependencies and conda's inability to represent shared library soversion dependencies) was entirely preventable: if conda's htslib had been built using plugins at the time, much of that fallout could have been avoided. Similarly, people could update hfile_s3.so for new-style AWS signatures or add hfile_s3_write.so and gain these facilities in their pre-existing installations of samtools and any other htslib-using tools.
This PR fixes a serious bug rather than just raising the question of whether plugins are worthwhile (though there are some advantages being traded away). They have indeed always used functions declared in hfile_internal.h; previously it had been hoped that they could access these functions via the already-loaded htslib that opened the plugins, but this only works when that htslib was loaded with RTLD_GLOBAL.
I just looked at Ubuntu and it doesn't appear to be using the plugins. I don't really see the benefit for them to do this as it adds complexity while not actually solving anything for them. If they have a bug in htslib they'll just issue a new package, which covers all bar crypt4gh.
I've never found it problematic myself, but generally I'm not explicitly forcing static linking. If you do, then that's the whole point of the pkgconfig, which we support. I'd also say it moves the problem from every time you build a package to every time you run that package, with the requirement of maintaining the environment. (Sometimes at least, depending on how it was built.)
Cheap shot! You know as well as I that there are more things to do than time to do them in and the lack of them being done doesn't mean we don't know they're problems. You're the author of the plugin system, so let's just leave it as a user request that I'd prefer there to be an explicit choice of which plugins to compile in and which not to.
A poor example I think. While it is indeed possible to download a new htslib release, compile it up and install just one single hfile_s3.so I think it's an unlikely use case given they're bundled in the same package. I've never heard of people upgrading an individual .so while not updating the others including the main library at the same time. It'd be a maintenance / packaging nightmare. It's valid for the crypt4gh though as that's a separate package and may well be updated out of sync with other things. I'm not arguing for doing away with the plugin system. Simply that I feel it's a bit binary: all or nothing.
This PR is not about whether the plugin system is a good idea or not, or how best to configure the common libcurl-based facilities. It is about fixing a bug for those who do have plugins activated.
(That's because the Debian packagers haven't read and considered …)
Bump: This PR fixes a significant bug for those who do have plugins activated. It would be great if someone would like to review it. BTW you will find that …
Sorry, time for HTSlib is a bit limited at the moment. It would be nice if the library could still clean up after itself without having to call an extra function before dlclose(). The other thing I've been looking into is whether there's any risk of getting #964-like problems on Unix. I think it should be OK as long as the plug-in links the HTSlib shared library, as the necessary symbols are already present. I haven't tried plug-ins linked statically to HTSlib yet.
After more experimentation it seems there's no way to get around having hts_lib_shutdown().
Branch https://github.com/daviesrob/htslib/tree/pr1072_exp2 implements this, along with a couple of other fixes I had to make when I tried getting Travis to build with plugins enabled.
Avoid -pedantic warnings about data pointer <-> function pointer conversions (which are required to work on POSIX platforms so that dlsym() works, but are nonetheless warned about by compilers). As we're providing wrappers around dlsym() anyway, we can provide separate function and data wrappers; change load_plugin()'s return type as it (practically) always returns a function. This uses the workaround recommended in POSIX Issue 6 dlsym() (since removed in Issue 7 by [1]; at least for GCC, type-punning through a union might be preferable). Hat tip Rob Davies. [1] https://www.austingroupbugs.net/view.php?id=74
Facilitates adding debugging printfs to e.g. hfile_shutdown() to observe exactly where in the sequence they occur.
While you're looking at this, I noticed a few minutes ago that htslib …
Using Address Sanitizer on Ubuntu Xenial with gcc-8 somehow breaks the configure test to see if -ldl is needed, making it incorrectly decide that it isn't. This causes a link failure when building test/plugins-dlhts when the compiler decides that it really did want -ldl after all. Oddly, other outputs that reference dlopen(), dlsym() etc. have already built successfully by this point. Fix by making configure test for dlsym() instead of dlopen(), which seems to work better.
That should be fixed by the version of your “fix odd -ldl configure test breakage” commit that I just pushed to this PR.
Thanks for experimenting with it. Re your experiment 1 with …

On to experiment 2, your pr1072_exp2 branch. Thanks for reminding me that despite POSIX's wishes, function and data pointers are different things! I've pushed a commit that fixes this by providing a better API rather than writing out the awful POSIX workaround multiple times. I've also applied an updated version of your commit enabling plugins on travis and the resulting configure.ac tweak.

Now to the main part, “Don't dl_close() [sic] plugins in atexit() handler”. The purpose of this is to defang forgetting to call hts_lib_shutdown(). However personally I'd prefer not to pander to this scenario, mainly because it pessimises your scenario 2. It means that simple code that e.g. statically links to libhts.a and uses some plugins doesn't clean up after itself. So you'd get additional valgrind warnings about not having closed the plugins. (Admittedly this is not a strong argument as — with glibc at least — using dlopen at all leads to ‘leaks’ reported by valgrind.) Alternatively, even code that uses static libhts.a has the option of using hts_lib_shutdown().

Fortunately this final change is entirely internal to libhts and doesn't affect any APIs. And python3 …
Prevents a segfault when HTSlib has been imported using dlopen(), a plugin that itself links against HTSlib is in use and hts_lib_shutdown() was not called before dlclose(htslib). If hts_lib_shutdown() is not called, dlclose(htslib) will not remove libhts.so if any plugins linking it are installed. In this case the atexit() handler runs at program exit where it can clean up any memory still in use. There's no need to remove the plugin shared libraries (which causes the segfault as it removes libhts.so while code inside it is running) as the runtime will tidy them up anyway. If hts_lib_shutdown() has been called, the plugin shared libraries will already have been removed.
I get memory leaks from code that statically links libhts.a. Most of this came from … Using this PR where the … It's not easy to work out where the lost memory came from in this case because the functions involved have been unloaded by the time the back-trace gets printed, so the addresses no longer exist and the names are missing. Stopping the …

Given this, I'm inclined to go with the version that doesn't call dlclose() on the plugins in the atexit() handler.
No …
Thanks for merging it, Rob. Bioconda's pysam package will now (in due course) be able to return to using their htslib package rather than bundling an incomplete build. |
PR samtools#1072 changed plugin linking so that plugins are linked back to the dynamic libhts.so/.dylib, to facilitate use when libhts is itself dynamically dlopen()ed with RTLD_LOCAL, e.g., by the Python runtime, which uses default dlopen() flags — which on Linux means RTLD_LOCAL.

This broke plugin loading on macOS when opening plugins in an executable in which libhts.a has been statically linked, as there were then two copies of the library globals (notably hfile.c::schemes), one from the executable's libhts.a and one from the plugin's libhts.NN.dylib. (The Linux loading model does not suffer from this issue.)

The default dlopen() flag on macOS is RTLD_GLOBAL, so this can be fixed by reverting the change (on macOS only) and depending on the symbols supplied by a static libhts.a, a dynamically linked libhts.NN.dylib, or a RTLD_GLOBALly dlopen()ed libhts.NN.dylib. This rebreaks the case of dlopen()ing libhts on macOS while explicitly specifying RTLD_LOCAL, but this is not a common case. Fixes samtools#1176.

Disable the `plugins-dlhts -l` test case on macOS. Add a test of accessing plugins from an executable with a statically linked libhts.a (namely, htsfile) to test/test.pl.
Previously I was trying to avoid linking the hfile_*.so plugins back to libhts, probably mainly because I didn't want arbitrary numbers of libhtses in the address space at once — but of course dlopen's reference counting prevents any such weirdness. Other reasons for not linking them to libhts.so/.dylib/etc are:

- the plugins then need $LD_LIBRARY_PATH or rpath or etc so that you can find libhts.so/.dylib

But there is a very good reason for linking them to libhts.so:

- it makes code that loads libhts with default dlopen flags on Linux able to use htslib plugins at all

Who on earth dynamically loads libhts? Language bindings. Notably Python loads Cython modules with default dlopen flags, so this enables pysam on Linux to use remote file access plugins. Which is a pretty big win.

Also a new test program test/plugins-dlhts that exercises this. This program dlopen()s htslib, accesses a few dummy files, and dlclose()s htslib. Which led to a segfault. The other half of this commit fixes the segfault, by adding a hts_lib_shutdown() function that must be called if you want to dlclose(htslib) prior to general program exit. See the commit message for details.

As this is the first htslib test that tests things linked to the shared library, we need to set $LD_LIBRARY_PATH etc so that the freshly-built libhts.so/.dylib/etc is used. This commit also adds test/with-shlib.sh, a script that creates a suitable temporary libdir containing only the freshly-built shared libhts.