Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please avoid bundling libraries #742

Open
teythoon opened this issue Feb 14, 2016 · 14 comments
Open

Please avoid bundling libraries #742

teythoon opened this issue Feb 14, 2016 · 14 comments

Comments

@teythoon
Copy link
Contributor

genometools bundles quite a number of libraries:

teythoon@europa /tmp/genometools-1.5.8 % ls -1 src/external
bzip2-1.0.6/
cgilua-5.1.3/
expat-2.0.1/
lpeg-0.10.2/
lua-5.1.5/
luafilesystem-1.5.0/
md5-1.1.2/
samtools-0.1.18/
sqlite-3.8.7.1/
tre/
zlib-1.2.8/

This is generally frowned upon. See the Debian policy [0] or the Fedora equivalent [1]. The latter gives quite a number of reasons why this is considered bad practice. In short, it creates more work for both the genometools maintainers and all packagers.

0: https://www.debian.org/doc/debian-policy/ch-source.html#s-embeddedfiles
1: https://fedoraproject.org/wiki/Archive:Bundled_Library_Packaging_Draft#Why_no_Bundled_Libraries

@gordon
Copy link
Member

gordon commented Feb 14, 2016

I appreciate the argument from Debian and Fedora. However, our bundling strategy has served us very well in the past. Because we bundle everything except libcairo, users usually can compile GenomeTools out of the box with make cairo=no. Many users run on older systems which do not have the libraries installed or where the libraries are outdated, which would lead to problems. On university systems the users often do not have root access to install the libraries either.

If you want to, you can compile GenomeTools with linking against shared libraries by using useshared=yes. That's how our Debian packages are build to comply with the Debian policy (and I agree that it makes total sense in that case).

Bundling the necessary libraries also makes it possible for us to cross-compile GenomeTools for many platforms, because we simply cross-compile all dependencies with it.

So I don't see us dropping bundled libraries anytime soon. Sorry.

@satta
Copy link
Member

satta commented Feb 14, 2016

I completely agree with @gordon. This is a compromise we are making for our users' convenience. IMHO allowing people with no access to proper package management to build and run a version at all outweighs the indeed very legitimate disadvantages you mention. Please also notice that we usually do not introduce patches to the bundled code or require GenomeTools to be built using the code copies.

However, now that you mention it I (as the maintainer of the GenomeTools Debian package) will make sure that the embedded code copies disappear from Debian's source packages. The embedded source files are all DFSG free, so it's merely a question of cleanliness I guess. I will exclude src/external from the next upstream version onward.

Thanks for having such a watchful eye on the code though, much appreciated :)

@teythoon
Copy link
Contributor Author

On university systems the users often do not have root access to install the libraries either.

That's a red herring. You don't need root to do that. If you did, you couldn't install genometools either.

If you want to, you can compile GenomeTools with linking against shared libraries by using useshared=yes.

Interesting, however, that option is not documented.

IMHO allowing people with no access to proper package management to build and run a version at all outweighs the indeed very legitimate disadvantages you mention.

It also exposes them to the risk of running outdated versions of said libraries. In effect you shift the burden of making sure the libraries are up to date from distribution maintainers to the genometools maintainers. I feel that users should opt-in to that, not opt-out.

Please also notice that we usually do not introduce patches to the bundled code

A cursory look over the commits touching src/external says otherwise.

@satta
Copy link
Member

satta commented Feb 15, 2016

On university systems the users often do not have root access to install the libraries either.

That's a red herring. You don't need root to do that. If you did, you couldn't install genometools either.

Maybe the notion of absolutely needing root was misleading here. The reality is that many typical GenomeTools users work in environments where they are not their own system administrators. That means, to install new software, there are usually two options:

  • get the institution's sysadmin to install the software using native package management, which might me difficult to motivate e.g. if one just wants to evaluate a new tool, or
  • compile the code from source.

Of course the second option is always available as long as a compiler toolchain is available on the target system, which is a reasonable expectation IMHO.
However, we want to avoid having a user (who might well be struggling with make options to begin with!) to go through the pains of unpackaged software installation:

  • locate and download all library dependencies (and their dependencies!) as source tarballs
  • build all of them, potentially using different build systems, and
  • make sure GenomeTools finds the new libraries' headers and object files, which are in
    unpredictable directories.

From my experience it would be a major obstacle for our user base if we required this kind of build process from source, and the support requests we've had over the years seem to confirm that building from source is usually one of the main sources of problems.

If you want to, you can compile GenomeTools with linking against shared libraries by using useshared=yes.

Interesting, however, that option is not documented.

Indeed it isn't. Thanks for pointing this out, will add a note to the INSTALL documentation.

IMHO allowing people with no access to proper package management to build and run a version at all outweighs the indeed very legitimate disadvantages you mention.

It also exposes them to the risk of running outdated versions of said libraries.

As I said, this is not the case when GenomeTools is part of a distribution. In this case it uses the shared libraries provided by the distribution, which can be updated separately.

In effect you shift the burden of making sure the libraries are up to date from distribution maintainers to the genometools maintainers.

See above. Do not get me wrong -- I understand your concerns and they bother me too. For Debian I went into quite a bit of additional work to allow the use of shared libraries, not only to comply with their policy but also to ensure the viability of GenomeTools in a more organized software environment such as Debian's. But scientific software also exists to serve a specific purpose, and this means one has to consider and address the user's computing reality by balancing ease of packaging versus ease of installation by end users.

I think that documenting that the option of using shared libraries exists is a step in the right direction. The point is that packagers are (at least expected to be) more knowledgeable about how dependencies should be organized, and it's reasonable to shift the burden of getting this right to them than to the end users.

I feel that users should opt-in to that, not opt-out.

Are you proposing to use a separate usebundle=yes option and making useshared=yes the default?

Please also notice that we usually do not introduce patches to the bundled code

A cursory look over the commits touching src/external says otherwise.

That's why I said 'usually', not 'absolutely' ;)
The patches to the lua source mostly deal with making the code compile with our default -Werror on newer compilers, IIRC. What I meant was that we don't add major API adjustments to the source, making it incompatible with unmodified upstream (and I've seen that case multiple times when packaging third party software). Replacing them with unmodified counterparts is still possible.

satta added a commit to satta/genometools that referenced this issue Feb 15, 2016
@teythoon
Copy link
Contributor Author

Are you proposing to use a separate usebundle=yes option and making useshared=yes the default?

Yes.

The patches to the lua source mostly deal with making the code compile with our default -Werror on newer compilers, IIRC.

So again this is creating more work for you that could have been better spent developing genometools or fixing the issues in the libraries upstream.

@Garonenur
Copy link
Member

well, if anyone has the time and energy to prepare something like docker packages with genometools we could get rid of the external libraries. But as @satta said: we not only have to think about the work for us, but the user base.
And I know how much I hate it to get dependencies right for some software I want to use but can't install directly because of no root - many of our users are not as experienced.
Then again, those should maybe use just the binaries (possibly inside docker or something so it works everywere).

@teythoon
Copy link
Contributor Author

@satta: The Debian package uses the bundled libraries header files, I'm working on a patch to address this.

@satta
Copy link
Member

satta commented Feb 15, 2016

Excellent, thanks for your work on this :) Looking forward to your patch.

@teythoon
Copy link
Contributor Author

While working on that I realized that you don't only embed the libraries, but also build them using your own Makefile and just link them together. I'm somewhat surprised that this approach works on that scale.

@satta
Copy link
Member

satta commented Feb 20, 2016

Yep. That approach started out when substantially less external code was used and indeed grew over time.

@gordon
Copy link
Member

gordon commented Feb 22, 2016

It proves the capabilities of our powerful build system 😄

As argued in #748 (comment) this allows to build statically linked binaries quite easily.

The classic paper Recursive Make Considered Harmful argues why it it is better to define all dependencies in a single makefile.

@teythoon
Copy link
Contributor Author

Err, no. Any decent build system allows the creation of statically linked binaries, no need to either bundle libraries nor, if you must do that, side-step the build system of said libraries.

That paper speaks about dependencies within a project. Also, it is about speed, and "classic" is a euphemism for "18 years old".

@mmokrejs
Copy link

mmokrejs commented Apr 12, 2018

Hi, I am occasionally contributing packages into Gentoo Linux. I agree with others that you should not bundle copies of 3rd-party's code but cleanly reluy on that. Would you make the package cleaner it could be added to major distro's easily. Exactly due to the above issues it won't happen so quickly.

We (maintainers) need to tag packages with a LICENSE string and we are not going to pollute it with licenses of pango, samtools, sqlite, ... At least in my case they were linked in dynamically but still, our system-wide copies were well tested and often were carefully patched. We just cannot accept you code uses a local bundled copy.

I hit this thread because the *.so library does not have a version number. That blocks installation of this pakcage on Gentoo at the moment completely.

>>> Source compiled.
>>> Test phase [not enabled]: sci-biology/genometools-1.5.10

>>> Install genometools-1.5.10 into /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image/ category sci-biology
make -j2 DESTDIR=/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image/ install 
test -d /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/bin || mkdir -p /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/bin
cp bin/gt /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/bin
strip /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/bin/gt
mkdir -p /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/share/genometools
cp -a gtdata /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/share/genometools/
test -d /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/core \
  || mkdir -p /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/core
cp src/core/*_api.h /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/core
test -d /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/extended \
          || mkdir -p /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/extended
cp src/extended/*_api.h /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/extended
test -d /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/annotationsketch \
          || mkdir -p /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/annotationsketch
cp src/annotationsketch/*_api.h \
          /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/annotationsketch
test -d /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/ltr \
          || mkdir -p /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/ltr
cp src/ltr/*_api.h /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools/ltr
cp obj/gt_config.h /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools
cp src/genometools.h /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include/genometools
test -d /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/lib || mkdir -p /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/lib
cp lib/libgenometools.a /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/lib
cp lib/libgenometools.so /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/lib
[build config script install]
sed -e 's!@CC@!cc!' -e 's!@CFLAGS@!-O2 -pipe -mpclmul -mpopcnt -march=native -ftree-vectorize!' \
  -e 's!@CPPFLAGS@!-I\\"/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/include\\"  -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DHAVE_MEMMOVE -D_LARGEFILE64_SOURCE=1 -DHAVE_HIDDEN -DLUA_DL_DLOPEN -DLUA_USE_MKSTEMP -DSQLITE_THREADSAFE=0 -DHAVE_SQLITE!' \
  -e 's!@CXX@!c++!' -e 's!@CXXFLAGS@!-O2 -pipe -mpclmul -mpopcnt -march=native -ftree-vectorize!' \
  -e 's!@LDFLAGS@!-L/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/lib -Wl,-O1 -Wl,--as-needed -L/usr/local/lib!' \
  -e 's!@LIBS@! -lm  -ldl -lpango-1.0 -lgobject-2.0 -lglib-2.0 -lcairo -lpangocairo-1.0 -lpango-1.0 -lgobject-2.0 -lglib-2.0 -lcairo -lpthread -ldl!' -e "s!@VERSION@!`cat VERSION`!" \
  -e 's!@BUILDSTAMP@!"2018-04-12 09:34:33"!' \
  -e 's!@SYSTEM@!Linux!' <src/genometools-config.in \
  >/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/bin/genometools-config
chmod 755 /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image//usr/bin/genometools-config
>>> Completed installing genometools-1.5.10 into /scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image/

 * Final size of build directory: 252792 KiB (246.8 MiB)
 * Final size of installed tree:   75092 KiB ( 73.3 MiB)


 * QA Notice: The following shared libraries lack a SONAME
 * /usr/lib/libgenometools.so

Files matching a file type that is not allowed:
   usr/lib/libgenometools.so
 * ERROR: sci-biology/genometools-1.5.10::science failed:
 *   multilib-strict check failed!
 * 
 * Call stack:
 *   misc-functions.sh, line 603:  Called install_qa_check
 *   misc-functions.sh, line 217:  Called source 'install_symlink_html_docs'
 *   80multilib-strict, line  46:  Called multilib_strict_check
 *   80multilib-strict, line  42:  Called die
 * The specific snippet of code:
 *   		[[ ${abort} == yes ]] && die "multilib-strict check failed!"
 * 
 * If you need support, post the output of `emerge --info '=sci-biology/genometools-1.5.10::science'`,
 * the complete build log and the output of `emerge -pqv '=sci-biology/genometools-1.5.10::science'`.
 * The complete build log is located at '/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/temp/build.log'.
 * The ebuild environment file is located at '/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/temp/environment'.
 * Working directory: '/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/image'
 * S: '/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/work/genometools-1.5.10'
!!! post install failed; exiting.

>>> Failed to emerge sci-biology/genometools-1.5.10, Log file:

>>>  '/scratch/var/tmp/portage/sci-biology/genometools-1.5.10/temp/build.log'

gentoo-bot pushed a commit to gentoo/sci that referenced this issue Apr 12, 2018
The build packaging system is suboptimal. Hand-edit the Makefile
to force use of shared libs from system-wide. This brings us
to a non-functional package because two lua-based packages are
not yet in the tree.

genometools/genometools#742
genometools/genometools#748

Package-Manager: Portage-2.3.27, Repoman-2.3.9
@gordon
Copy link
Member

gordon commented Apr 23, 2018

Getting rid of src/external is not an option, that would complicate the build process for normal users. However, Debian has a GenomeTools package and they have the same requirements. @satta did a lot of work to make this work. @mmokrejs If you send a PR that helps with building GenomeTools as a Gentoo package without removing src/external or breaking the Debian package I'm happy to merge it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants