
RFC: change `dynlib` handling on linux to use standard dynamic linking instead of runtime loading via `dlopen` #58

Open
arnetheduck opened this issue Aug 16, 2018 · 17 comments

@arnetheduck

commented Aug 16, 2018

In the current scheme, when using dynlib on Linux, Nim generates code that loads the library via dlopen and a series of dlsym calls. This is problematic for several reasons:

  • linking on Linux should typically be done using the soname. Linking via the file name appears to work on the developer's system, but will cause issues when the binary is used on a machine where the named file is symlinked to a different version; see also http://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html. Note that when a library name is passed to the C compiler via -lXXX, the compiler typically translates XXX (the linker convenience name) to the soname as part of the linking process. In Nim, one has to be extra careful, because wrapping libraries without using header files may lead to subtle ABI mismatches.

  • opening the library via dlopen means a larger binary that takes longer to load - the generated code is verbose and slow compared to what the dynamic loader can do using the special linking sections in the executable

  • linking via dlopen means that tools like ldd, gdb and others show incomplete information about application dependencies, making dependency analysis harder than it has to be

  • using the plain libXXX.so file (without the versioning information given by the soname) requires the plain .so to be installed both when running and when compiling the program. This plain file is a symlink that is only installed by -devel packages, which users typically don't keep installed; this makes it harder to both distribute and compile Nim programs. It also promotes situations where things work on developer machines but not on end-user systems, leading to unnecessary friction in the release process.

Instead, I'd like to suggest that Nim use standard Linux linking for dynlib - that would solve all of the above problems: fewer ABI issues, faster startup/load times, smaller binaries, working standard tools, and no -devel packages needed.
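To make the contrast concrete, here is a sketch of the runtime-loading pattern the issue describes, expressed with Python's ctypes (which wraps dlopen/dlsym directly). It assumes a glibc-based Linux system where the C math library's soname is libm.so.6; the library and symbol choice are illustrative only.

```python
import ctypes
import ctypes.util

# Runtime loading, the pattern Nim's dynlib glue emits at startup:
# ctypes.CDLL wraps dlopen(), attribute access wraps dlsym().
# "libm.so.6" is the usual soname on glibc-based Linux (an assumption).
name = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(name)            # dlopen(name, ...)
cos = libm.cos                      # dlsym(handle, "cos")
cos.restype = ctypes.c_double
cos.argtypes = [ctypes.c_double]
print(cos(0.0))                     # 1.0

# With standard linking, by contrast, the compiler is passed -lm, the
# linker records the soname (libm.so.6) in a DT_NEEDED entry, and the
# dynamic loader resolves "cos" at load time - no per-symbol lookup
# code is emitted into the binary.
```

The dlopen path requires one library-open call plus one lookup call per imported symbol in generated code, which is the verbosity the RFC refers to.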

@awr1


commented Aug 16, 2018

What's your evidence that dlopen()/dlsym() is significantly faster than implicit linking via ld?

@zielmicha


commented Aug 16, 2018

You can already use --dynlibOverrideAll and --passL:"-lXXX".

But the dynlib solution makes Nim programs more portable - the wrapper author can specify a whole range of library versions that will work (for example {.dynlib: "libtcl(|8.5|8.4|8.3).so.(1|0)".}).
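The version pattern above expands to a list of concrete file names that are tried in turn at runtime. A rough sketch of that expansion in Python (the exact candidate ordering Nim uses is an assumption inferred from the pragma syntax, not taken from the compiler source):

```python
import itertools
import re

def expand(pattern):
    """Expand a dynlib-style pattern such as
    "libtcl(|8.5|8.4|8.3).so.(1|0)" into concrete candidate
    file names. Sketch only; ordering is an assumption."""
    # Split into literal parts and parenthesized alternative groups.
    parts = re.split(r"\(([^)]*)\)", pattern)
    # Odd-indexed parts are groups: split their alternatives on "|".
    choices = [p.split("|") if i % 2 else [p]
               for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]

print(expand("libtcl(|8.5|8.4|8.3).so.(1|0)"))
# 8 candidates, e.g. libtcl.so.1, libtcl.so.0, libtcl8.5.so.1, ...
```

The runtime can then dlopen each candidate until one succeeds, which is what makes a single wrapper work across several installed library versions.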

@awr1


commented Aug 16, 2018

Wasn't Nim also going to get (optional) weak-linking/lazy-loading features? IMO that would necessitate dlsym() usage.

@arnetheduck

Author

commented Aug 16, 2018

What's your evidence that dlopen()/dlsym() is significantly faster than implicit linking via ld?

The other way around - dlopen/dlsym is slower.

@awr1


commented Aug 16, 2018

Whoops, lol, mixed up my words. What is the performance difference, then, between implicit and explicit linking?

@yglukhov

Member

commented Aug 16, 2018

While I agree that system linking is more appropriate than the dlopen glue in most cases, there are rare cases where the glue approach is nice, because you don't have to have the DLLs at build time. Also, as noted above, the --dynlibOverride approach offers finer-grained control over the linkage rules on a per-library basis.

If we're really going to do anything about the current linkage, I'd suggest a system that allows consistent, fine-grained control.
E.g. instead of the {.dynlib: "foo".} pragma, introduce a {.lib: "foo".} pragma (the naming doesn't matter, but it should not be biased towards dynamic linking). The link mode for "foo" is default, by default. Available modes:

  • system (equivalent to -lfoo, doesn't distinguish between static or dynamic)
  • static
  • dynamic - on windows this is equivalent to dynamic_glue
  • dynamic_glue
  • default = system on unix, dynamic on windows
  • custom??? Crazy idea, but why not: provide your own glue or something like that.

So for every library, you could control the way it is linked, e.g. --libMode:foo=static --libPath:foo=~/.nimble/pkgs/foo/libfoo.a or something like that.

@arnetheduck

Author

commented Aug 16, 2018

@zielmicha Yeah, I've seen this feature, and as far as I can tell, it looks like a real fringe case for when:

  • you're wrapping a (small) stable subset of a library
  • upstream (the ones controlling the soname / ABI versioning) is frivolously updating the version number when it's not needed

Are there other uses? Otherwise, it seems that this should not be the focus or default mode of linking, but rather an extra that can be explicitly enabled once whoever is writing the wrapper has made sure that there is indeed no impactful ABI difference between versions.

@yglukhov good idea! In the source file, you specify a name, and by default the compiler does "the right thing", but there's an opportunity to change that behavior on a per-library basis (using the name as a "key"), or possibly by changing the default ("I want a fully static binary, when possible"). This also leaves enough room for the slower glue mode that @zielmicha points out. Very flexible and nice, but with a sane default.

@cheatfate

Member

commented Aug 17, 2018

I can't agree with this change, because it breaks cross-platform behavior. FreeBSD, for example, keeps .so symlinks for the latest libraries without any -devel packages.

Also, these statements are not entirely correct:

  • opening library via dlopen means a larger binary that takes longer to load - the generated code is verbose and slow compared to what ld can do using the special linking sections in the executable.

The Linux loader still needs to map the shared objects, resolve dependencies and fix up import tables, so loading speed will be almost equal, or differ only very slightly. The generated binary code will also not be significantly bigger. You forgot about section alignment: even if your import table has only 2 functions, the table will still be rounded up to the page size (4096 bytes, for example). So actually, using dlopen() makes the binary smaller in most cases, and equal or a little bigger in some situations.

  • linking via dlopen means tools like ldd, gdb and others show incomplete information about application dependencies, making dependency analysis harder than it has to be.

I can't say anything about ldd, but when using gdb and doing something like break NimMainModule, at the point of that breakpoint you will have full information about the loaded libraries.

I also want to bring up the recent OpenSSL story, when most Linux distros switched from OpenSSL version 0.9.x to 1.1.x. The distros just replaced their OpenSSL .so files, and most of the software that was compiled against 0.9.x became broken...

So implicit dynamic linking is not a good solution at all. I understand that dlopen also sometimes makes our life more complex, but it's impossible to satisfy every single OS in the world with the best solution it uses, so we need to find a compromise - and dlopen was actually such a compromise. Of course we can add more options to dynlib, and maybe that is one more compromise.

Also, with explicit linking your application loses flexibility: with dlopen, instead of failing and exiting, it can resolve dependencies in some other, application-specific way.

@awr1


commented Aug 17, 2018

@cheatfate isn't using dlopen()/dlsym() etc. considered explicit linking, and using ld considered implicit? I guess if you think about it, in Nim's case dlopen() is used in an implicit manner, as all the linking happens at the start of the program, before any user code runs.

In any case, thanks for clarifying my question wrt performance.

@cheatfate

Member

commented Aug 17, 2018

@awr1 you are right, I always confuse these implicit/explicit terms.

@cheatfate

Member

commented Aug 17, 2018

You can also read https://nullprogram.com/blog/2018/05/27/. The most interesting parts of that article are the sections on Procedure Linkage Tables and Indirect Dynamic Calls.

@zah

Member

commented Aug 17, 2018

With classic C-style linking, the linked library name/path often depends on the build settings (e.g. release vs debug, MSVC vs GCC, etc). For this reason, it's best if the specific paths are managed in the build environment (in nim.cfg files, in build scripts that call Nim, etc). dynlibOverride seems to provide sufficient control to implement any behavior in cfg files (@yglukhov, would you agree that all your cases can be expressed there?), so the two questions are:

  1. Should we keep the unusual Nim behaviour as the default? (Araq's answer would probably be that it simplifies things for people who don't want to deal with the complexities of C-style linking.)

  2. Can we handle all the complexity in the Nim wrappers? This would mean specifying correct library names across OSes/compilers/debug vs release settings and so on. I don't think this can be reasonably solved, but perhaps a system where default names are provided is possible.

@yglukhov

Member

commented Aug 17, 2018

@yglukhov, would you agree that all your cases can be expressed there?

@zah, yes, dynlibOverride is ingeniously dead simple, and gets the job done just fine. It might not be the fanciest way of doing this kind of thing, but that's not a big deal for me.

@runkharr


commented Nov 29, 2018

Hi!

I have a small problem for you.

Some '.so' files are not really shared libraries but 'linker scripts', and 'dlopen()' is simply unable to load them. So, what now about 'dynlib'? I did some research on this matter, but I found no solution. The 'ncurses' module in particular fails because of this.
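The distinction is easy to check mechanically: ELF shared objects start with the magic bytes \x7fELF, while GNU ld linker scripts are plain text (on Debian-style systems, the development libncursesw.so typically contains something along the lines of INPUT(libncursesw.so.6 -ltinfo), though the exact contents are distro-specific). A minimal sniff in Python, using only the standard library; the fabricated script content below is an illustrative assumption:

```python
import os
import tempfile

def is_elf(path):
    """True if the file starts with the ELF magic. GNU ld linker
    scripts are plain text, so dlopen() rejects them."""
    with open(path, "rb") as f:
        return f.read(4) == b"\x7fELF"

# A fabricated example of a development ".so" that is really a
# linker script (the exact contents vary between distributions).
with tempfile.NamedTemporaryFile(suffix=".so", delete=False) as f:
    f.write(b"INPUT(libncursesw.so.6 -ltinfo)\n")
    fake = f.name

print(is_elf(fake))   # False: dlopen() would fail on this file
os.unlink(fake)
```

A wrapper generator could use a check like this to decide between dlopen-ing the file directly and falling back to the versioned library the script points at.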

Either someone develops an 'ld script interpreter' frontend for 'dlopen()', which would be difficult because such linker scripts may even contain references to static libraries, or the corresponding modules need to be rewritten/modified. (BTW, thanks to 'zielmicha' for the hint about the '--dynlibOverrideAll --passL' options. This way, everything worked as expected.)

The problem is (BTW) not Linux-specific, because 'linker scripts' are often used throughout the UNIX world. And you should check whether Windows uses a similar mechanism ...

@Araq

Member

commented Nov 29, 2018

  1. Windows does not use a similar mechanism.
  2. dlopen is widely used by Python, Ruby, Perl, ... What you're really telling me here is that your OS is fundamentally broken. That might be news to you, but it isn't to me.

@runkharr


commented Nov 30, 2018

No, my OS is not "fundamentally broken". And yes, Python, Ruby, Perl, ... use dynamic loading. But there is a difference: they do not use the developer files for dynamic loading! 'libncursesw.so' (e.g.) is a developer file which is to be used solely by the (static) linker 'ld'. The right way would be to use 'libncursesw.so.<number>'. You simply cannot assume that the developer file contains the correct data (for 'dlopen'), and I think the number of developer '.so' files which are in fact linker scripts will increase over time. You simply have to use the correct files (links), or you must think about another mechanism for dynamic loading. And, especially in the 'ncurses.nim' case, you have the problem of the dependency on 'libtinfo.so'. Instead, a wrapper should be constructed which is linked at build time against the 'ncurses' and 'tinfo' libraries; this wrapper itself may be a dynamically loadable module.

Dynamic loading just isn't always the right way. Sometimes you must use a wrapper or something like it (which is linked at build time against the required libraries), or even link the module itself at compile time.

And I don't think you're going to tell me that Nim (which is marvelous, by the way) will fail just because it doesn't work together with some slightly more modern libraries.

P.S.: I don't know much about Ruby or Python, but Perl's curses module uses dynamic linking; it has a "stub" (or wrapper) with the name 'Curses.so', which itself is linked correctly against 'libncursesw.so.6' (in my case) and 'libtinfo.so.6'.

@runkharr


commented Nov 30, 2018

Sorry, I didn't check the last sentence. What I meant was that Perl's 'curses' module uses dynamic linking, but this linking is done against a wrapper with the name 'Curses.so', which itself is linked at build time against 'libncursesw.so.6' and 'libtinfo.so.6'. This is, by the way, enforced by the linker script 'libncursesw.so'.
