Skip to content

Link plugins against libhts.so/.dylib and fix dynamically unloading HTSlib#1072

Merged
daviesrob merged 5 commits intosamtools:developfrom
jmarshall:dlhts
Jul 7, 2020
Merged

Link plugins against libhts.so/.dylib and fix dynamically unloading HTSlib#1072
daviesrob merged 5 commits intosamtools:developfrom
jmarshall:dlhts

Conversation

@jmarshall
Copy link
Member

Previously I was trying to avoid linking the hfile_*.so plugins back to libhts, probably mainly because I didn't want arbitrary numbers of libhtses in the address space at once — but of course dlopen's reference counting prevents any such weirdness. Other reasons for not linking them to libhts.so/.dylib/etc are:

  • Avoiding plugins being tied to a particular libhts SOVERSION
  • Especially if your main program (e.g. samtools) has libhts.a statically linked, avoiding needing to dynamically link to libhts at all and so avoiding needing to set $LD_LIBRARY_PATH or rpath or etc so that you can find libhts.so/.dylib

But there is a very good reason for linking them to libhts.so:

  • Making programs that dynamically load libhts with the default dlopen flags on Linux able to use htslib plugins at all

Who on earth dynamically loads libhts? Language bindings.

Notably Python loads Cython modules with default dlopen flags, so this enables pysam on Linux to use remote file access plugins. Which is a pretty big win.


Also a new test program test/plugins-dlhts that exercises this. This program dlopen()s htslib, accesses a few dummy files, and dlclose()s htslib. Which led to a segfault. The other half of this commit fixes the segfault, by adding a hts_lib_shutdown() function that must be called if you want to dlclose(htslib) prior to general program exit. See the commit message for details.

As this is the first htslib test that tests things linked to the shared library, we need to set $LD_LIBRARY_PATH etc so that the freshly-built libhts.so/.dylib/etc is used. This commit also adds test/with-shlib.sh, a script that creates a suitable temporary libdir containing only the freshly-built shared libhts.

When libhts.so/.dylib has been dynamically loaded with RTLD_LOCAL
(which is the default on Linux), plugins that use hts_always_remote()
etc will not be loadable unless they make libhts's symbols available
themselves.

When libhts.so/.dylib has been dynamically loaded, some plugins have been
opened, and dlclose(htslib) is called: htslib is not actually unloaded
(or its atexit functions e.g. hfile_exit() called) as the plugins are still
using libhts symbols. Eventually at program exit, all atexit functions are
called; so the plugins are closed by hfile_exit(), htslib is suddenly
unloaded when the final plugin is unloaded, and a segfault occurs because
this all happens within hfile_exit() -- which has just been unloaded.

To break this cycle, introduce hts_lib_shutdown() which must be called
before dlclose(htslib), if that is called explicitly prior to program exit.
This unloads the plugins so that dlclose() can immediately unload htslib.

Add test exercising libhts via dlopen(). This ensures that all available
hFILE plugins have been loaded, then calls hts_lib_shutdown() and
dlclose(htslib). As this is the first htslib test that tests things
linked to the shared library, we need to set $LD_LIBRARY_PATH etc so
that the freshly-built libhts.so/.dylib/etc is used.

Add with-shlib.sh script that creates a suitable temporary libdir
containing *only* the freshly-built shared libhts. On Windows platforms,
this directory is added to %PATH% so shouldn't contain anything else.
On macOS, having hfile_*.bundle files in $DYLD_LIBRARY_PATH would reduce
the ability to test against different sets of plugins by varying $HTS_PATH.
@jmarshall jmarshall changed the title Link plugins against libhts.so./.dylib and fix dynamically loading HTSlib Link plugins against libhts.so./.dylib and fix dynamically unloading HTSlib May 18, 2020
@jmarshall
Copy link
Member Author

For people building their own rather than using packaged htslib/samtools/etc, the requirement to set up LD_LIBRARY_PATH or equivalent might be painful enough that it would be worth facilitating building the plugins twice, once with libhts linked in and once without…

@jmarshall jmarshall changed the title Link plugins against libhts.so./.dylib and fix dynamically unloading HTSlib Link plugins against libhts.so/.dylib and fix dynamically unloading HTSlib May 18, 2020
@jkbonfield
Copy link
Contributor

I'm wondering who voluntarily uses plugins in anger bar the irods one, and I doubt many other than Sanger are using that. One benefit is that we make linking against libhts.a easier as we don't have to know the list of additional dynamic dependencies (which can be many with libcurl), but that just shifts the problem somewhere else rather than solving it. However by default they're not built so people presumably cope fine, unless everyone is enabling them (unlikely; we can barely get the user-base to do configure first).

It's always felt a bit cumbersome that if we enable plugins in order to gain irods support, then we are forced to disable having things like https and s3 internal to htslib. That feels rather like a case of the tail wagging the dog. So I'd be up for linking the plugins that are distributed within htslib into htslib itself and then enabling external plugins by default rather than having it as a configure option. That would mean the default libhts.so and libhts.a have exactly the same list of dependencies, but we've gained the ability to add additional plugins later if we choose to build it.

If we really want flexibility, then it could be better achieved by a new configure option listing which plugins should be compiled as external objects and built via dlopen. That then gives the best of both worlds and puts the choice back in the hands of the user (or whoever is bundling your package for you).

I know this doesn't quite answer you regarding whether the plugins should have a -lhts in their link lines. I'm ambivalent to that, but note the plugins do infact already use hfile_internal.h so have some (minimal) tie it anyway.

@jmarshall
Copy link
Member Author

jmarshall commented May 19, 2020

I'm wondering who voluntarily uses plugins in anger

It's basically the same question as “who voluntarily uses a shared libhts.so?”. Most people building htslib/bcftools/samtools themselves don't bother and in fact the vanilla makefiles still hardcode the static library (in most configurations other than --with-htslib=system) for deployment simplicity.

OTOH packagers (Debian, Fedora, bioconda, etc) work a little harder to make a good distribution, and they configure samtools/bcftools to use the shared libhts.so from their htslib packages. Similarly the sensible ones work a little harder to produce an htslib package with well-configured plugins.

One benefit is that we make linking against libhts.a easier as we don't have to know the list of additional dynamic dependencies (which can be many with libcurl), but that just shifts the problem somewhere else rather than solving it.

This is a significant benefit (and was one of the motivations for introducing the plugin mechanism). It shifts the problem to one place (building the plugin, perhaps built by your distribution or experienced local person e.g. system administrator) rather than having the problem every time you want to build an end user program that uses htslib.

For example, @daviesrob's htslib-crypt4gh builds only as a plugin and I doubt he has any plan to build it inside libhts.

It's always felt a bit cumbersome that if we enable plugins in order to gain irods support, then we are forced to disable having things like https and s3 internal to htslib.

That may be true, though I note that the maintainers have had five years to change this if they really feel like it's cumbersome. OTOH things like https and S3 also benefit from the linking isolation and separate installation and upgradability afforded by building them as plugins.

For example, the samtools/openssl condapocalyse (proximate causes being incorrectly written package dependencies and conda's inability to represent shared library soversion dependencies) was entirely preventable. If conda's htslib had been built using plugins at the time, samtools (other than S3 access) would have run fine whether the user's conda environment had a compatible libssl/libcrypto or not.

For example, people could update hfile_s3.so for new-style AWS signatures or add hfile_s3_write.so and gain these facilities in their pre-existing installations of samtools and any other htslib-using tools.

I know this doesn't quite answer you regarding whether the plugins should have a -lhts in their link lines. I'm ambivalent to that, but note the plugins do infact already use hfile_internal.h so have some (minimal) tie it anyway.

This PR fixes a serious bug rather than just asking a whether question (though there are some advantages being traded away).

They have indeed always used functions declared in hfile_internal.h; previously it has been hoped that they could access these functions via the already-loaded htslib that opened the plugins, but this only works when that htslib was loaded RTLD_GLOBAL which can't be depended on in all circumstances. Basically it's the same problem as I changed the plugin mechanism for and added hfile_irods_wrapper for to solve iRODS 4.x compatibility problems a few years ago.

@jkbonfield
Copy link
Contributor

jkbonfield commented May 19, 2020

OTOH packagers (Debian, Fedora, bioconda, etc) work a little harder to make a good distribution, and they configure samtools/bcftools to use the shared libhts.so from their htslib packages. Similarly the sensible ones work a little harder to produce an htslib package with well-configured plugins.

I just looked at Ubuntu and it doesn't appear to be using the plugins. I don't really see the benefit for them to do this as it adds complexity while not actually solving anything for them. If they have a bug in htslib they'll just issue a new package, which covers all bar crypt4gh.

One benefit is that we make linking against libhts.a easier as we don't have to know the list of additional dynamic dependencies (which can be many with libcurl), but that just shifts the problem somewhere else rather than solving it.

This is a significant benefit (and was one of the motivations for introducing the plugin mechanism). It shifts the problem to one place (building the plugin, perhaps built by your distribution or experienced local person e.g. system administrator) rather than having the problem every time you want to build an end user program that uses htslib.

I've never found it problematic myself, but generally I'm not explicitly forcing static linking. If you do then that's the whole point of the pkgconfig, which we support.

I'd also say it moves the problem from every time you build a package to every time you run that package, with the requirement in maintaining the environment. (Sometimes at least, depending on how it was built.)

It's always felt a bit cumbersome that if we enable plugins in order to gain irods support, then we are forced to disable having things like https and s3 internal to htslib.

That may be true, though I note that the maintainers have had five years to change this if they really feel like it's cumbersome.

Cheap shot! You know as well as I that there are more things to do than time to do them in and the lack of them being done doesn't mean we don't know they're problems.

You're the author of the plugin system, so let's just leave it as a user request that I'd prefer there to be an explicit choice of which plugins to compile in and which not to.

For example, people could update hfile_s3.so for new-style AWS signatures or add hfile_s3_write.so and gain these facilities in their pre-existing installations of samtools and any other htslib-using tools.

A poor example I think. While it is indeed possible to download a new htslib release, compile it up and install just one single hfile_s3.so I think it's an unlikely use case given they're bundled in the same package. I've never heard of people upgrading an individual .so while not updating the others including the main library at the same time. It'd be a maintenance / packaging nightmare.

It's valid for the crypt4gh though as that's a separate package and may well be updated out of sync with other things. I'm not arguing for doing away with the plugin system. Simply that I feel it's a bit binary: all or nothing.

@jmarshall
Copy link
Member Author

This PR is not about whether the plugin system is a good idea or not, or how best to configure the common libcurl-based facilities.

It is about fixing a bug for those who do have plugins activated.

@jmarshall
Copy link
Member Author

jmarshall commented May 19, 2020

I just looked at Ubuntu and it doesn't appear to be using the plugins.

(That's because the Debian packagers haven't read and considered INSTALL sufficiently, and I haven't yet advocated for it to them. Distribution packagers are clearly in a better position than average users to get this right, and then everyone using those packages gets to benefit. Look at the Fedora or bioconda htslib/samtools/bcftools packages for examples.)

@jmarshall
Copy link
Member Author

Bump: This PR fixes a significant bug for those who do have plugins activated. It would be great if someone would like to review it.

BTW you will find that __attribute__((destructor)) functions fail in the same way as atexit functions.

When libhts.so/.dylib has been dynamically loaded, some plugins have been opened, and dlclose(htslib) is called: htslib is not actually unloaded (or its atexit functions e.g. hfile_exit() called hence destructors are not called) as the plugins are still using libhts symbols.

@daviesrob
Copy link
Member

Sorry, time for HTSlib is a bit limited at the moment.

I see __attribute__((destructor)) has an interesting effect on Macs, which is a pity as it works very well on Linux.

It would be nice if the library could still clean up after itself without having to call an extra function before dlclose() - especially when forgetting to do so mostly works, apart from when someone loads a plugin at which point they get a segmentation fault.

The other thing I've been looking into is if there's any risk of getting #964 like problems on Unix. I think it should be OK as long as the plug-in links the HTSlib shared library as the necessary symbols are already present. I haven't tried plug-ins linked statically to HTSlib yet.

@daviesrob
Copy link
Member

After more experimentation is seems there's no way to get around having hts_lib_shutdown(). As you noted, it's required to get HTSlib itself to unload. However, I also concluded that the atexit() handler should not call dl_close(), which fixes the segmentation fault if you forget to call hts_lib_shutdown() before trying to close libhts. As I see it, there are three basic scenarios where the atexit() handler is called:

  1. hts_lib_shutdown() gets called before dl_close(htslib). The plugins have already been unloaded, so the atexit() handler doesn't need to dl_close() anything.
  2. It's program exit time. The atexit() handler only needs to clean up the memory; the operating system will close the libraries for us. This is the case you can arrive at by failing to call hts_lib_shutdown() prior to dl_close(htslib) and have plugins that link libhts.so; by not calling dl_close(htslib); or when htslib is linked directly instead of via dl_open().
  3. You failed to call hts_lib_shutdown(), did call dl_close(htslib) and there are no plugins that link to libhts.so. In this case the plugins will remain loaded until program exit, where they will be cleaned up. I don't think this is a major problem.

Branch https://github.com/daviesrob/htslib/tree/pr1072_exp2 implements this, along with a couple of other fixes I had to make when I tried getting Travis to build with --enable-plugins.

Avoid -pedantic warnings about data pointer <-> function pointer
conversions (which are required to work on POSIX platforms so that
dlsym() works, but are nonetheless warned about by compilers).

As we're providing wrappers around dlsym() anyway, we can provide
separate function and data wrappers; change load_plugin()'s return
type as it (practically) always returns a function. This uses the
workaround recommended in POSIX Issue 6 dlsym() (since removed in
Issue 7 by [1]; at least for GCC, type-punning through a union
might be preferable). Hat tip Rob Davies.

[1] https://www.austingroupbugs.net/view.php?id=74
Facilitates adding debugging printfs to e.g. hfile_shutdown()
to observe exactly where in the sequence they occur.
@daviesrob
Copy link
Member

While you're looking at this, I noticed a few minutes ago that htslib --enable-plugins seems to break the samtools build. I think it's due to -ldl not being added to static-LIBS in configure, resulting in missing dlopen() etc. while trying to link some of samtool's test programs. Might be worth adding a fix here?

Using Address Sanitizer on Ubuntu Xenial with gcc-8 somehow
breaks the configure test to see if -ldl is needed, making it
incorrectly decide that it isn't.  This causes a link failure
when building test/plugins-dlhts when the compiler decides
that it really did want -ldl after all.  Oddly, other
outputs that reference dlopen(), dlsym() etc. have already
built successfully by this point.

Fix by making configure test for dlsym() instead of dlopen(),
which seems to work better.
@jmarshall
Copy link
Member Author

That should be fixed by the version of your “fix odd -ldl configure test breakage” commit that I just pushed to this PR.

@jmarshall
Copy link
Member Author

jmarshall commented Jun 24, 2020

Thanks for experimenting with it. Re your experiment 1 with __attribute__((destructor)), I see it works on Linux as, unlike atexit functions in the normal text sections, the dtor sections aren't released until the destructor functions therein have been run. I thought I tested this both ways on Linux, but I guess I only tested the destructor version on macOS — where unfortunately it crashes and burns.

On to experiment 2, your pr1072_exp2 branch.

Thanks for reminding me that despite POSIX's wishes, function and data pointers are different things! I've pushed a commit that fixes this by providing a better API rather than writing out the awful POSIX workaround multiple times.

I've also applied an updated version of your commit enabling plugins on travis and the resulting configure.ac tweak.

Now to the main part, “Don't dl_close() [sic] plugins in atexit() handler”. The purpose of this is to defang forgetting to call hts_lib_shutdown() in the experts-only scenario of dlclosing a dynamically opened HTSlib — changing the result from a segfault to a bit of a leak until the main program exits. (The main purpose BTW of all this shutting down infrastructure is to provide plugins an opportunity to call things like curl_global_cleanup(), and I see that is unaffected by this commit as the plugin.destroy() call is unchanged. +1 to that.)

However personally I'd prefer not to pander to this scenario, mainly because it pessimises your scenario 2. It means that simple code that e.g. staticly links to libhts.a and uses some plugins doesn't clean up after itself. So you'd get additional valgrind warnings about not having closed the plugins. (Admittedly this is not a strong argument as — with glibc at least — using dlopen at all leads to ‘leaks’ reported by valgrind.) Alternatively even code that uses static libhts.a has the option of using hts_lib_shutdown() itself, but the point of this defanging commit is to make hts_lib_shutdown less necessary, not more!

Fortunately this final change is entirely internal to libhts and doesn't affect any APIs. And python3 import pysam; pysam.AlignmentFile('s3:expect_invalid_argument') now works on Linux as well either with or without this final change. So I'm not going to mind either way, and really it's up to whoever merges this PR to decide whether to add that final commit or not.

Prevents a segfault when HTSlib has been imported using dlopen(),
a plugin that itself links against HTSlib is in use and
hts_lib_shutdown() was not called before dlclose(htslib).

If hts_lib_shutdown() is not called, dlclose(htslib) will
not remove libhts.so if any plugins linking it are installed.
In this case the atexit() handler runs at program exit where
it can clean up any memory still in use.  There's no need to
remove the plugin shared libraries (which causes the segfault
as it removes libhts.so while code inside it is running) as
the runtime will tidy them up anyway.

If hts_lib_shutdown() has been called, the plugin shared
libraries will already have been removed.
@daviesrob
Copy link
Member

I get memory leaks from code that statically links libhts.a when using plug-ins even from the current develop branch (82d01ba):

==7890== LEAK SUMMARY:
==7890==    definitely lost: 0 bytes in 0 blocks
==7890==    indirectly lost: 0 bytes in 0 blocks
==7890==      possibly lost: 0 bytes in 0 blocks
==7890==    still reachable: 57,663 bytes in 161 blocks
==7890==         suppressed: 0 bytes in 0 blocks

Most of this came from dlopen() and OpenSSL.

Using this PR where the atexit() handler calls dlclose() changes this, but doesn't get rid of the leaks completely:

==5606== LEAK SUMMARY:
==5606==    definitely lost: 208 bytes in 2 blocks
==5606==    indirectly lost: 784 bytes in 4 blocks
==5606==      possibly lost: 0 bytes in 0 blocks
==5606==    still reachable: 1,581 bytes in 5 blocks
==5606==         suppressed: 0 bytes in 0 blocks

It's not easy to work out where the lost memory came from in this case because the functions involved have been unloaded by the time the back-trace gets printed, so the addresses no longer exist and the names are missing.

Stopping the atexit() handler from calling dlclose() gives results similar to develop:

==5954== LEAK SUMMARY:
==5954==    definitely lost: 0 bytes in 0 blocks
==5954==    indirectly lost: 0 bytes in 0 blocks
==5954==      possibly lost: 0 bytes in 0 blocks
==5954==    still reachable: 57,695 bytes in 162 blocks
==5954==         suppressed: 0 bytes in 0 blocks

Given this, I'm inclined to go with the version that doesn't call dlclose() in atexit(). We can always revisit it if we come up with a better solution later.

@daviesrob
Copy link
Member

No dlclose() commit added....

@daviesrob daviesrob merged commit c9fdcbf into samtools:develop Jul 7, 2020
@jmarshall
Copy link
Member Author

Thanks for merging it, Rob. Bioconda's pysam package will now (in due course) be able to return to using their htslib package rather than bundling an incomplete build.

@jmarshall jmarshall deleted the dlhts branch July 7, 2020 13:31
jmarshall added a commit to jmarshall/htslib that referenced this pull request Dec 1, 2020
PR samtools#1072 changed plugin linking so that plugins are linked back to
the dynamic libhts.so/.dylib, to facilitate use when libhts is itself
dynamically dlopen()ed with RTLD_LOCAL, e.g., by the Python runtime
which uses default dlopen() flags which on Linux means RTLD_LOCAL.

This broke plugin loading on macOS when opening plugins in an executable
in which libhts.a has been statically linked, as there were then two
copies of the library globals (notably hfile.c::schemes), one from the
executable's libhts.a and one from the plugin's libhts.NN.dylib.
(The Linux loading model does not suffer from this issue.)

The default dlopen() flag on macOS is RTLD_GLOBAL, so this can be fixed
by reverting the change (on macOS only) and depending on the symbols
supplied by a static libhts.a, a dynamically linked libhts.NN.dylib,
or a RTLD_GLOBALly dlopen()ed libhts.NN.dylib. This rebreaks the case
of dlopen()ing libhts on macOS while explicitly specifying RTLD_LOCAL,
but this is not a common case. Fixes samtools#1176.

Disable the `plugins-dlhts -l` test case on macOS. Add a test of
accessing plugins from an executable with a statically linked libhts.a
(namely, htsfile) to test/test.pl.
jmarshall added a commit to jmarshall/htslib that referenced this pull request Dec 1, 2020
PR samtools#1072 changed plugin linking so that plugins are linked back to
the dynamic libhts.so/.dylib, to facilitate use when libhts is itself
dynamically dlopen()ed with RTLD_LOCAL, e.g., by the Python runtime
which uses default dlopen() flags which on Linux means RTLD_LOCAL.

This broke plugin loading on macOS when opening plugins in an executable
in which libhts.a has been statically linked, as there were then two
copies of the library globals (notably hfile.c::schemes), one from the
executable's libhts.a and one from the plugin's libhts.NN.dylib.
(The Linux loading model does not suffer from this issue.)

The default dlopen() flag on macOS is RTLD_GLOBAL, so this can be fixed
by reverting the change (on macOS only) and depending on the symbols
supplied by a static libhts.a, a dynamically linked libhts.NN.dylib,
or a RTLD_GLOBALly dlopen()ed libhts.NN.dylib. This rebreaks the case
of dlopen()ing libhts on macOS while explicitly specifying RTLD_LOCAL,
but this is not a common case. Fixes samtools#1176.

Disable the `plugins-dlhts -l` test case on macOS. Add a test of
accessing plugins from an executable with a statically linked libhts.a
(namely, htsfile) to test/test.pl.
daviesrob pushed a commit that referenced this pull request Dec 15, 2020
PR #1072 changed plugin linking so that plugins are linked back to
the dynamic libhts.so/.dylib, to facilitate use when libhts is itself
dynamically dlopen()ed with RTLD_LOCAL, e.g., by the Python runtime
which uses default dlopen() flags which on Linux means RTLD_LOCAL.

This broke plugin loading on macOS when opening plugins in an executable
in which libhts.a has been statically linked, as there were then two
copies of the library globals (notably hfile.c::schemes), one from the
executable's libhts.a and one from the plugin's libhts.NN.dylib.
(The Linux loading model does not suffer from this issue.)

The default dlopen() flag on macOS is RTLD_GLOBAL, so this can be fixed
by reverting the change (on macOS only) and depending on the symbols
supplied by a static libhts.a, a dynamically linked libhts.NN.dylib,
or a RTLD_GLOBALly dlopen()ed libhts.NN.dylib. This rebreaks the case
of dlopen()ing libhts on macOS while explicitly specifying RTLD_LOCAL,
but this is not a common case. Fixes #1176.

Disable the `plugins-dlhts -l` test case on macOS. Add a test of
accessing plugins from an executable with a statically linked libhts.a
(namely, htsfile) to test/test.pl.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments