Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add symbol versions to libhts.so #1560

Merged
merged 4 commits into from
Feb 8, 2023

Conversation

daviesrob
Copy link
Member

@daviesrob daviesrob commented Feb 6, 2023

The first commit with the htslib.map file is entirely based on the content in this comment from @jmarshall, so I've taken the liberty of making him the author of it.

I've also added configure support, so people who dislike versioned symbols can turn them off using ./configure --disable-versioned-symbols. The option name was copied from the same feature in curl's configure script.

There's also a rudimentary Makefile rule to update the htslib.map file with any new exported symbols. It includes checks to ensure that HTSlib's version number has been updated before adding anything. If you want to force it to do something for testing, it's possible by overriding the PACKAGE_VERSION variable, e.g.:

make htslib.map PACKAGE_VERSION=1.17

Note that the htslib.map file should only be updated on package release, otherwise you risk breaking compatibility for programs that link against the library when the latter gets updated.

To see how symbol versions affect linking, it's possible to make multiple copies of the shared library with different map files installed in separate directories. Then you can try building e.g. samtools against each version using configure --with-htslib=/tmp/htslib_1.16 etc., and use LD_LIBRARY_PATH to force it to use the different versions of the shared library to see what happens. I've summarised the results for the current samtools and HTSlib develop branches below. The three versions I used were:

"unversioned" : Symbol versioning turned off using ./configure --disable-versioned-symbols

"HTSLIB_1.16" : The htslib.map file in this PR. New symbols added since 1.16 are unversioned.

"HTSLIB_1.17" : Using an updated htslib.map with the new symbols in an HTSLIB_1.17 section. This is what the next release should look like.

Built against ↓ / runtime linked to → unversioned HTSLIB_1.16 HTSLIB_1.17
unversioned works works works
HTSLIB_1.16 works works works
HTSLIB_1.17 works fails works

So samtools build against an unversioned library will work with anything, and an unversioned library will work with samtools even if it was built against a versioned one. samtools built against the HTSLIB_1.16 library will still work even if the library is upgraded to HTSLIB_1.17. But samtools expecting HTSLIB_1.17 will complain about "version 'HTSLIB_1.17' not found" and refuse to run if given the HTSLIB_1.16 library (even though the symbols needed are present in unversioned form).

Fixes #1505

jmarshall and others added 4 commits February 6, 2023 10:08
To ease dependency tracking.  In particular, Linux distributions
will be able to automatically check dependencies at package
build time.
Default is to try to enable versioned symbols in libhts.so.
This can be overridden using --disable-versioned-symbols.  A
simple test is made to check that the linker option is recognised.
The test uses the htslib.map file in the source - it seems to work
even though the symbols in the map aren't referenced in the test
program being linked.

The Makefile and config.mk.in are adjusted so that the configure
result overrides the Makefile default.
Adds a very basic script to do this.  It expects an htslib.map
file to be already present, assumes that all entries are
in order and that it has a suitable format for the part that
extracts the previous version number.  It extracts any new symbols
from the .so file and appends them to the map, but only if the
HTSlib version number (extracted from PACKAGE_VERSION) has changed.

Apart from for testing, this rule should only be used when building
a release.  Between releases new symbols should be left unversioned
so that binaries linked against a libhts.so will continue to work
if the library is upgraded from one build against develop to the
next release.

After running this rule, the shared library should be rebuilt
so that it picks up the new version names, which makes for a
circular dependency between libhts.so and htslib.map.  This rule
does not attempt to solve that, so the shared library will
currently have to be rebuilt manually.
@jkbonfield
Copy link
Contributor

"HTSLIB_1.17" : Using an updated htslib.map with the new symbols in an HTSLIB_1.17 section. This is what the next release should look like.
Built against ↓ / runtime linked to → unversioned HTSLIB_1.16 HTSLIB_1.17
unversioned works works works
HTSLIB_1.16 works works works
HTSLIB_1.17 works fails works

So samtools build against an unversioned library will work with anything, and an unversioned library will work with samtools even if it was built against a versioned one. samtools built against the HTSLIB_1.16 library will still work even if the library is upgraded to HTSLIB_1.17. But samtools expecting HTSLIB_1.17 will complain about "version 'HTSLIB_1.17' not found" and refuse to run if given the HTSLIB_1.16 library (even though the symbols needed are present in unversioned form).

I'm genuinely curious on this one. What's actually going on here?

If current samtools can build, link and uses htslib 1.16, then it clearly doesn't need any symbols from 1.17. So why if we build it against 1.17 and then replace the shared library with a 1.16 one does it die?

Is it due to some inline functions used by samtools that have changed, that now refer to htslib 1.17 symbols only?

Naively what I'd expect is samtools when built and linked against a newer versions would be specifying all the symbols it uses and their version numbers, and as long as those versions are present in whatever library we use at run time then it should work.

@daviesrob
Copy link
Member Author

That's because ld.so doesn't resolve all symbols at start-up (unless you set LD_BIND_NOW). So when versioned symbols are present, it doesn't bother to check all the functions to see if they're around. Instead it does a quick check to ensure that all the version names it expects are present, and fails if any of them are missing.

@jkbonfield
Copy link
Contributor

jkbonfield commented Feb 7, 2023

For clarity incase others didn't read the above very carefully, when the table states samtools "built against HTSLIB_1.16", it doesn't mean built against the released htslib 1.16 tarball or package install. Instead it means samtools develop built against htslib develop (pre-1.17) with a 1.16 symbol version file.

Hence my confusion as to why the versioning wasn't apparently helping with binary compatibility.

@whitwham
Copy link
Contributor

whitwham commented Feb 8, 2023

This does not seem to work with bcftools. Is that expected behaviour?

@daviesrob
Copy link
Member Author

It should work. What went wrong?

@whitwham
Copy link
Contributor

whitwham commented Feb 8, 2023

I tried a 117 version of bcftools and it worked quite happily on the 116 version of htslib instead of throwing an error.

@jkbonfield
Copy link
Contributor

jkbonfield commented Feb 8, 2023

I think that's expected to happen, as the 1.16 version didn't have any symbol versioning.

To verify this is working as intended what we ought to do is create a local version of 1.15, 1.16, 1.17 built with their appropriate symbol table files. This gives us something to experiment with to verify how things will work in the future. I'm not proposing we go back and rerelease things, but just to use this as a test bed to verify it does what we want it to.

Edit: although I wouldn't be suprised if someone like Debian did take the symbol version file and retrofit it to older LTS releases, as once the leg work has been done it's quite easy for them to add it as a patch.

@whitwham
Copy link
Contributor

whitwham commented Feb 8, 2023

I was trying to reproduce the results in the table of the original comment but with bcftools rather than samtools.

@daviesrob
Copy link
Member Author

I think bcftools develop works because it hasn't started to use any of the symbols introduced since 1.16 yet. If I try it built against my HTSLIB_1.17 library, I find it referenced symbols from these versions:

nm -D -g --with-symbol-versions bcftools | grep HTSLIB | sed 's/.*@//' | sort | uniq -c
    139 HTSLIB_1.0
      3 HTSLIB_1.1
      2 HTSLIB_1.10
      1 HTSLIB_1.13
      3 HTSLIB_1.16
      4 HTSLIB_1.2.1
      5 HTSLIB_1.3
     21 HTSLIB_1.4
      1 HTSLIB_1.5
      3 HTSLIB_1.6
      1 HTSLIB_1.7

So no HTSLIB_1.17 in there. This isn't too surprising as most of the new symbols in HTSlib and bam and cram-related. The ones bcftools might be most interested in are bcf_strerror(), and the faidx interfaces fai_adjust_region(), faidx_seq_len64(), and fai_line_length(). None of them have made it into its code-base yet.

@whitwham
Copy link
Contributor

whitwham commented Feb 8, 2023

Thanks, that would explain it.

@whitwham whitwham merged commit f68c84f into samtools:develop Feb 8, 2023
@jengelh
Copy link

jengelh commented Feb 8, 2023

You used -Wl,-version-script, but this may need to be -Wl,--version-script—that's how GNU ld documents it.

@daviesrob daviesrob deleted the version_symbols branch February 8, 2023 18:06
@daviesrob
Copy link
Member Author

Hm, so it does. Although -version-script evidently works.

@jmarshall
Copy link
Member

jmarshall commented Feb 9, 2023

The single-hyphen ‑version-script also appears in my hacky draft over on the original issue.

I don't recall whether this was a typo or transcribed from one of the other projects I was looking at or influenced by ‑shared and ‑soname on the same link command line (which are documented as being spelled with one hyphen, presumably for compatibility with older vendor linkers).

The same GNU ld documentation page also says

For options whose names are multiple letters, either one dash or two can precede the option name; for example, ‘-trace-symbol’ and ‘--trace-symbol’ are equivalent. [etc]

Looking at the source code (see ONE_DASH/TWO_DASHES/EXACTLY_TWO_DASHES in ld/lexsup.c) this has been like this for many years.

Likewise the gold source code documentation says it accepts either.

Are there any other linkers that accept a similar version script option? If so, they might provide a ruling. Otherwise… ¯\_(ツ)_/¯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use a more conservative approach for SOVERSION
5 participants