ELF section .note.ABI-tag breaks shared libraries #3023

Closed
sthalik opened this issue Mar 14, 2018 · 44 comments

@sthalik

sthalik commented Mar 14, 2018

Your Windows build number: (Type ver at a Windows Command Prompt)

10.0.16299.192
Linux ananke 4.4.0-43-Microsoft #1-Microsoft Wed Dec 31 14:42:53 PST 2014 x86_64 GNU/Linux
Debian sid

What you're doing and what's happening: (Copy&paste specific commands and their output, or include screen shots)

Executing lupdate from Qt5 tools, LD_DEBUG=all ldd /usr/lib/libQt5Xml.so, etc.

The dynamic linker doesn't treat libQt5Core.so.5 as a suitable library, as shown by this error:

lupdate: error while loading shared libraries: libQt5Core.so.5: cannot open shared object file: No such file or directory

What's wrong / what should be happening instead:

As long as the ELF section .note.ABI-tag is present, other binaries can't link against the library at run time. It can still be executed directly through the ld-linux linker, however.

The workaround is to strip --remove-section=.note.ABI-tag /usr/lib/libQt5Core.so.5.10.1.

See how the section's presence influences file(1) output:

/usr/lib/libQt5Core.so.5.10.1: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=147dcf3333ff6b490dabb4028a217effa08013e3, for GNU/Linux 3.17.0, stripped

Finally, without the section:

/usr/lib/libQt5Core.so.5.10.1: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=147dcf3333ff6b490dabb4028a217effa08013e3, stripped

Strace of the failing command, if applicable[...]

openat of the library, read, lseek, close, barf out error message on the standard error file descriptor and exit_group.

Similarly LD_DEBUG=all shows opening the ELF without further information other than an error message.

      3954:     file=libQt5Core.so.5 [0];  needed by lupdate [0]
      3954:     find library=libQt5Core.so.5 [0]; searching
      3954:      search path=/usr/lib           (system search path)
      3954:       trying file=/usr/lib/libQt5Core.so.5
      3954:      search cache=/etc/ld.so.cache
      3954:      search path=/usr/lib           (system search path)
      3954:       trying file=/usr/lib/libQt5Core.so.5
      3954:
lupdate: error while loading shared libraries: libQt5Core.so.5: cannot open shared object file: No such file or directory

After removing the tag, the runtime linking process proceeds normally.

@sthalik
Author

sthalik commented Mar 14, 2018

This "atomic bomb" workaround helps too:

find /lib /usr/lib /usr/libexec -name '*.so' | xargs strip --remove-section=.note.ABI-tag

Will barf out a few errors where libraries are actually linker scripts. No ill effects.
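
A more selective variant of the one-liner above can be sketched in Python. This is only a sketch, not something posted by the thread's author: it checks the 4-byte ELF magic first so that linker scripts named *.so are skipped rather than producing errors. The helper names (is_elf, find_elf_libraries, strip_abi_tag) are hypothetical; the strip invocation is the same one used above.

```python
import os
import subprocess

ELF_MAGIC = b"\x7fELF"


def is_elf(path):
    """Return True if the file starts with the 4-byte ELF magic number."""
    try:
        with open(path, "rb") as f:
            return f.read(4) == ELF_MAGIC
    except OSError:
        # Unreadable files and dangling symlinks are simply skipped.
        return False


def find_elf_libraries(roots):
    """Yield the resolved path of every *.so under roots that is a real ELF file."""
    seen = set()
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if not name.endswith(".so"):
                    continue
                real = os.path.realpath(os.path.join(dirpath, name))
                if real in seen or not is_elf(real):
                    continue
                seen.add(real)
                yield real


def strip_abi_tag(path):
    """Run the strip command from the thread on a single file (needs write access)."""
    subprocess.run(["strip", "--remove-section=.note.ABI-tag", path], check=True)
```

Printing the result of find_elf_libraries(["/lib", "/usr/lib", "/usr/libexec"]) first gives a dry run; calling strip_abi_tag on each path then modifies the files in place, with the same caveats as the shell version.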

@jkj

jkj commented Apr 17, 2018

Just came across this in a Google search. Not sure if it is relevant or not but just FYI glibc creates an ABI-invalid .note section for that exact section. I have just filed a bug against glibc about that.
https://sourceware.org/bugzilla/show_bug.cgi?id=23072
I don't know exactly what lupdate is, but if it is a strictly conforming gABI ELF processor it will get wrong values from that section. Just in case that helps.

@sthalik
Author

sthalik commented May 17, 2018

Works for me on Linux in a VM. The ELF standard notwithstanding, having the runtime linker not work due to a quirk in WSL is an issue.

Interestingly enough, the newer shared objects added the option to call them directly, having them say more or less "I'm a library". Calling that .so directly works, it's only linking to it that's broken.

@fweimer

fweimer commented May 19, 2018

What, exactly, is causing this? The ELF note in a shared object is only parsed by the glibc dynamic linker during program loading, not the kernel, so it is unclear to me why there would be a difference between Linux and Windows in this context.

Perhaps the auxiliary vector contains some data which tells glibc not to load the shared object with this kind of ABI note?

@therealkenc
Collaborator

therealkenc commented May 19, 2018

What, exactly, is causing this? The ELF note in a shared object is only parsed by the glibc dynamic linker during program loading, not the kernel

I've been also curious for my own edification: What, exactly, is special about the libQt5Core.so build process that generates the (let's call it) unusual binary? I understand the bug cited by jkj. I am just not clear on why libQt5Core trips it in this scenario. How would one go about creating a helloworld.so library that exhibits the fail? And why does libQt5Core generate the binary that way? I could do a deep dive but I suspect someone in this thread (or Ben) already takes the reason as obvious.

@fweimer

fweimer commented May 19, 2018

Yes, it would certainly be interesting to see the output of objdump -s -j .note.ABI-tag /usr/lib/libQt5Core.so.5.10.1. So far, we only know that the object requires kernel version 3.17.

If the Windows kernel identifies itself as something earlier than kernel version 3.17, then the dynamic linker is completely right to ignore the object.

@therealkenc
Collaborator

If the Windows kernel self-identifies itself as something earlier than kernel version 3.17

WSL self identifies as 4.4. But I'd be real real surprised if the linker looks. [I have been surprised before though.]

@fweimer

fweimer commented May 19, 2018

@therealkenc Actually, the glibc dynamic linker looks at the ABI notes and refuses to load incompatible DSOs. To determine the kernel version, it prefers the version reported in the vDSO, followed by uname, followed by the contents of /proc/sys/kernel/osrelease. I wonder if the version in the vDSO isn't 4.4.
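
A rough Python sketch of the two userspace fallbacks named above (uname, then /proc/sys/kernel/osrelease) may help when comparing what a given system reports. It is not glibc's code and it does not read the vDSO note; parse_release and userspace_kernel_version are hypothetical helper names.

```python
import os
import re


def parse_release(release):
    """Turn a release string such as '4.4.0-43-Microsoft' into (4, 4, 0)."""
    m = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # Missing minor or patch components are treated as 0.
    return tuple(int(g) if g else 0 for g in m.groups())


def userspace_kernel_version():
    """Try uname first, then fall back to /proc/sys/kernel/osrelease."""
    try:
        return parse_release(os.uname().release)
    except (AttributeError, ValueError):
        with open("/proc/sys/kernel/osrelease") as f:
            return parse_release(f.read().strip())
```

On the WSL install from the original report, both sources would be expected to yield (4, 4, 0), going by the uname line quoted at the top of the issue.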

@sthalik
Author

sthalik commented May 19, 2018

I can't remember the Linux distribution that had the binary. It was either Debian sid or Arch testing. Sorry guys. You can list sections for your own binaries though.

@therealkenc
Collaborator

therealkenc commented May 19, 2018

I can't remember the Linux distribution that had the binary.

Ah. That was kinda relevant to mention in the OP (grumble). This is possibly related to #2820, which was on Arch. In that issue libQt5Core.so.5 exists but could not be opened. The discussion there subsequently went downhill to what was almost certainly a renameat2() blind alley, and was ultimately closed for lack of a repro. Something this issue is in need of as well; and is likely a contributing factor to the chirping crickets.

@sthalik
Author

sthalik commented May 28, 2018

@noahbaxter it's somewhere in site-packages.

@therealkenc the issue's reproducible on Debian sid. You can install it into a chroot via debootstrap into the default Ubuntu environment, or some other distro on MSFT Store. When debootstrap doesn't work, download cdebootstrap-static and manually unpack the .deb archive. That one works even on Android.

No idea what causes it to begin with, binutils, gcc, GNU libc or something else...?

@Yamakuzure

Reproducible on Gentoo with Qt-5.11.1, Glibc-2.27 and gcc-8.1.0

@sthalik : Thanks for your "atomic bomb", this one saved my day! 👍

kdudka added a commit to csutils/csmock that referenced this issue Jun 27, 2018
... by stripping an incompatible ABI note as suggested here:

microsoft/WSL#3023
@ChrisTX

ChrisTX commented Aug 8, 2018

Ah. That was kinda relevant to mention in the OP (grumble). This is possibly related to #2820, which was on Arch.

I can confirm that stripping the tag resolves the problem with Arch. Afterwards Qt5 works fine.

@FrankHB

FrankHB commented Sep 27, 2018

This "atomic bomb" workaround helps too:

find /lib /usr/lib /usr/libexec -name '*.so' | xargs strip --remove-section=.note.ABI-tag

Will barf out a few errors where libraries are actually linker scripts. No ill effects.

This breaks programs that rely on the information strip removes from the binary files. For example, the modified /usr/lib/ld-2.28.so (linked as /usr/lib/ld-linux-x86-64.so.2) does not work with valgrind on my Arch Linux instance. Reinstalling the package that provides the unmodified binary files fixes this, but it also undoes the effect of the strip command.

@du291

du291 commented Dec 7, 2018

Hello, I have a problem that Qt5 based applications won't run on WSL. After googling I was drawn here.
I haven't yet stripped the ABI tag from the lib, and hope to provide more insight into the problem.

Yes, it would certainly be interesting to see the output of objdump -s -j .note.ABI-tag /usr/lib/libQt5Core.so.5.10.1. So far, we only know that the object requires kernel version 3.17.

Here's the dump

$ objdump -s -j .note.ABI-tag /lib/libQt5Core.so.5|less

/lib/libQt5Core.so.5: file format elf64-x86-64

Contents of section .note.ABI-tag:
4ef138 04000000 10000000 01000000 474e5500 ............GNU.
4ef148 00000000 03000000 11000000 00000000 ................

Feel free to ask for more info.

@ChrisTX

ChrisTX commented Jan 8, 2019

For Arch Linux users, it's possible to use pacman hooks to automate the stripping of libQt5Core.so during the installation or upgrade of Qt. Simply create a .hook file in /etc/pacman.d/hooks with the following contents:

[Trigger]
Operation = Install
Operation = Upgrade
Type = Package
Target = qt5-base

[Action]
Depends = binutils
When = PostTransaction
Exec = /usr/bin/strip --remove-section=.note.ABI-tag /usr/lib/libQt5Core.so

Upon every Qt5 upgrade after that, the offending section will be stripped automatically. If qt5-base was already installed, simply force a reinstallation.

@du291

du291 commented Jan 8, 2019

Thanks for the tip. Does the stripping result in any instability? If the ABI versions don't match I would imagine Qt issuing an unsupported syscall sooner or later?

@ChrisTX

ChrisTX commented Jan 8, 2019

No, the field is to be read as follows, see here and here: The minimum kernel version for the binary is required to be the Linux kernel version given by the last 3 blocks, i.e. in your case 0x03, 0x11, 0x00, so 3.17.0. For latest Arch binaries it's 0x04, 0x0b, 0x00, so 4.11.0. Given that WSL identifies as Linux kernel 4.4.0, the binary is incompatible.
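
To make that reading concrete, here is a small Python sketch that decodes the exact bytes from the objdump output posted above. It assumes the standard ELF note layout (three little-endian 32-bit words for name size, descriptor size and type, then the padded name, then the descriptor); parse_abi_tag is a hypothetical helper name, not part of any tool.

```python
import struct

# Bytes of .note.ABI-tag, copied from the objdump output earlier in the thread.
NOTE = bytes.fromhex(
    "04000000 10000000 01000000 474e5500"
    "00000000 03000000 11000000 00000000"
)


def parse_abi_tag(note):
    """Decode a GNU ABI-tag note into (name, type, os_id, (major, minor, patch))."""
    namesz, descsz, note_type = struct.unpack_from("<3I", note, 0)
    name = note[12:12 + namesz].rstrip(b"\0")
    if name != b"GNU" or descsz != 16:
        raise ValueError("not a GNU ABI-tag note")
    # The name field is padded to a multiple of 4 bytes before the descriptor.
    desc_start = 12 + ((namesz + 3) & ~3)
    os_id, major, minor, patch = struct.unpack_from("<4I", note, desc_start)
    return name, note_type, os_id, (major, minor, patch)


print(parse_abi_tag(NOTE))  # (b'GNU', 1, 0, (3, 17, 0))
```

The first descriptor word (0 here) identifies the OS as Linux; the remaining three are the 3.17.0 that file(1) reports as "for GNU/Linux 3.17.0".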

It's worth noting that the ABI tag is specified and required for LSB binaries, and that this is not an issue with Qt5 at all. I can't say why Qt5 uses the field, but LSB requires support for it, and the low kernel version emulated by Microsoft is the issue here. This is by design enforced by the dynamic linker and thus the installed glibc version of the user space, and the respective Linux distro. It would be nice if Microsoft worked towards finding a better version scheme or updated the version identifier at least somewhat regularly in upgrades to WSL.

FWIW, stripping the tag only disables the check by the dynamic linker for the kernel version to be at least as high and has no other effects. So if a given binary works after the check was removed, it won't cause any particular issues or changes in behavior. For WSL the kernel version isn't particularly meaningful anyways.

@ChrisTX

ChrisTX commented Jan 8, 2019

Okay, I've debugged this a bit, there's actually another issue at hand here. Since I've got this on Arch with a requirement of 4.11.0 I thought it was just the fact that the kernel version is too low. However that's not quite the case, and that's why people with the older 3.17.0 version see this too.

This needs a bit of explanation of how glibc works, so bear with me. The version check we ultimately run into lives in the glibc source file /sysdeps/unix/sysv/linux/dl-sysdep.c. The responsible function there is _dl_discover_osversion. If the version that function returns is lower than what .note.ABI-tag requires, glibc refuses to load the binary at all; see /elf/dl-load.c for the respective code segment.

Now, _dl_discover_osversion tries discovering the kernel version by the following three methods:

  1. Check the loaded kernel vDSO for a .note section that identifies the kernel.
  2. Use uname(2)
  3. Use /proc/sys/kernel/osrelease

The problem for us is method 1; the other two identify as kernel 4.4.0. To see what method 1 gives, we need to dump the kernel vDSO ourselves. The easy way to do this is by opening gdb on any arbitrary process, using info proc map to see the mapping of [vdso], and then dumping it using the dump command. For example, it looked like this for me:

(gdb) info proc map
[…]
      0x7ffffffef000     0x7fffffff0000     0x1000        0x0 [vdso]
(gdb) dump binary memory vdso.dump 0x7ffffffef000 0x7fffffff0000

Now with vdso.dump on disk, we can analyze the section glibc parses, namely:

$ objdump -s -j .note vdso.dump

vdso.dump:     file format elf64-x86-64

Contents of section .note:
 ffffffffff700318 06000000 04000000 00000000 4c696e75  ............Linu
 ffffffffff700328 78000000 0b0d0300                    x.......

The relevant part is the last 4 bytes, which are to be read as kernel version 0x03.0x0d.0x0b, i.e. kernel 3.13.11. This is not correct, however, since WSL wants to identify as kernel 4.4.0, and the vDSO note has to match uname(2). This is precisely why people have been seeing the error with a requirement of kernel 3.17.0.
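
The same bytes can be decoded mechanically. The sketch below assumes the descriptor is a single little-endian 32-bit value packed as (major << 16) | (minor << 8) | patch, which is how the bytes above are being read; parse_vdso_version is a hypothetical helper name. Note that the minor byte 0x0d is 13 in decimal.

```python
import struct

# Bytes of the vDSO .note section, copied from the objdump output above.
VDSO_NOTE = bytes.fromhex("06000000 04000000 00000000 4c696e75 78000000 0b0d0300")


def parse_vdso_version(note):
    """Decode the 'Linux' note from a vDSO dump into (name, (major, minor, patch))."""
    namesz, descsz, _note_type = struct.unpack_from("<3I", note, 0)
    name = note[12:12 + namesz].rstrip(b"\0")
    if name != b"Linux" or descsz != 4:
        raise ValueError("not a Linux version note")
    # "Linux\0" is 6 bytes, padded to 8 before the 4-byte descriptor.
    desc_start = 12 + ((namesz + 3) & ~3)
    (code,) = struct.unpack_from("<I", note, desc_start)
    return name, ((code >> 16) & 0xFF, (code >> 8) & 0xFF, code & 0xFF)


print(parse_vdso_version(VDSO_NOTE))  # (b'Linux', (3, 13, 11))
```

Either way the value is below 3.17.0, so a library tagged "for GNU/Linux 3.17.0" fails the check even though uname reports 4.4.0.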

@therealkenc can you forward this to whoever is responsible for the vDSO?

@therealkenc
Collaborator

therealkenc commented Jan 8, 2019

Very nice investigation. 🏆

@therealkenc can you forward this to whoever is responsible for the vDSO?

I don't really know who's around anymore. @benhillis maybe.

What might help the chances of this one being looked at immeasurably would be specific repro steps to create a shared library with one int hello() function that results in the error while loading shared libraries when the library is linked with a one-liner main() that calls return hello().
Doing that has the nice properties of: (a) avoids saying 'libQt' which is technically out of scope (b) avoids saying 'Arch', which isn't in the Store. Which is at base why/how this one went dark.

Same was suggested in May 2018. It might be overkill, your analysis might be self-evident, and this might be a matter of changing a {0x03,0x0d,0x0b} constant in the vDSO. But that's a lot of mights, and a new well-formed issue is how I'd try to get attention.

@ChrisTX

ChrisTX commented Jan 8, 2019

Right, I'll see if I can generate a simple example. It's easily reproduced by using a similar trick to what Qt5 itself does. They explicitly add an assembly file with the section information and have a header that differentiates between the available kernel features it was configured with. Qt5 can use renameat2(2), getentropy(2) and statx(2), for which it requires kernel 3.16, 3.17, and 4.11 respectively. It should be noted that WSL (as of 1809 at least) supports all three syscalls despite identifying as 4.4. I suspect the original reason people blamed renameat2(2) is related to this, too: if you configured Qt5 with it, it requires kernel 3.16, but to glibc it appears as if 3.13.11 (the version in the vDSO note) was in use, making the binary invalid.
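
The interaction described above can be sketched as a small table lookup. This is an illustration only, not Qt's or glibc's code: the dictionary keys are hypothetical labels for the three features, the minimum versions are the ones given in this comment, and the two detected versions are the vDSO bytes (0x03, 0x0d, 0x0b) and the 4.4.0 that uname reports.

```python
# Minimum kernel version per optional feature, as stated in the comment above.
# The keys are illustrative labels, not Qt configure option names.
FEATURE_MIN_KERNEL = {
    "renameat2": (3, 16, 0),
    "getentropy": (3, 17, 0),
    "statx": (4, 11, 0),
}

VDSO_VERSION = (0x03, 0x0D, 0x0B)  # what the WSL vDSO note encodes
UNAME_VERSION = (4, 4, 0)          # what uname(2) reports on WSL


def required_kernel(enabled):
    """The ABI note carries the highest minimum among the enabled features."""
    return max((FEATURE_MIN_KERNEL[f] for f in enabled), default=(0, 0, 0))


def loadable(enabled, detected):
    """Mimic the loader's rule: refuse if the detected kernel is too old."""
    return detected >= required_kernel(enabled)


for features in (["renameat2"], ["renameat2", "getentropy"], ["statx"]):
    print(features, loadable(features, VDSO_VERSION), loadable(features, UNAME_VERSION))
```

With the vDSO value every combination is rejected, while with the uname value only the statx build would be, which matches the two separate problems described in this thread.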

Aside from the mismatch, which is easy to reproduce and easy to argue is an issue (LSB also demands that LSB-compliant binaries require the proper kernel version, and if renameat2(2) is used, that should be 3.16), the second problem at hand here is that the kernel version of 4.4 is too low. It makes little sense to retain an older kernel identifier other than for kernel modules, which WSL obviously does not have.

That aside, I don't understand why Qt5 would be entirely out of scope. Qt is not a GUI library in itself, and there are a lot of features in the library that have nothing to do with GUI or hardware abstractions, like XML parsers, an SQL abstraction layer, networking, and concurrency extensions, and these are also commonplace in the C++ world, even on servers. The problem also has nothing to do with the combination of Qt and certain distros, but rather depends on what syscalls are exposed in the glibc of the system the binary is being built on. If you were to compile Qt5 yourself on Ubuntu, you'd still run into this problem, albeit not in the statx(2) case, since it took until glibc 2.28 for that to be implemented in the C library. On the upcoming Ubuntu 19.04, this is going to be the case, though.

@fweimer

fweimer commented Jan 8, 2019

Qt has fallback code for other kernels. They could do run-time detection and drop the ABI note. The Qt in Red Hat Enterprise Linux 8 has been patched to do just this, to preserve compatibility with the Red Hat Enterprise Linux 7 kernel.
Of course there is still a WSL bug here, the version really should be consistent. (Although calling the kernel 4.4 is a bit of a stretch, considering that various system calls are not implemented properly.)

@therealkenc
Collaborator

Qt is not a GUI library in itself, and there's a lot of features in the library that have nothing to do with GUI or hardware abstractions

Preaching to the choir, brother. Personally I don't think any of these libraries have thing-one to do with 'graphics'. Some of the functions end up reading and writing bytes down a socket, maybe, sometimes, depending on the function.

But, sadly, this line of reasoning (most unfortunately) falls down, because of issue submissions along the lines of (say, made-up example) "VLC doesn't run I get <some libQt* function s---ts bed>". Which is, in effect, useless for tracking down syscall surface bugs. By taking libqt or say libgtk+ off the table, and limiting the scope to CLI (ie tty) scenarios, there is at least a snowball's chance of getting an actionable repro with a manageable strace demonstrating a divergence. The problem has nothing to do with pixels. The distinction doesn't really have to do with ttys either, which is why "server scenarios" are out of scope too. Made-up example would be "My Apache PHP service isn't working, I get error <whatever>". Today's necropost on some mysql EIO problem comes to mind.

You can game the system of course, and that's exactly what folks should do. Any large graphic or server scenario can be turned into a small CLI one. Chrome (about as big as it gets) used to die because getdents64() was subtly broken. People can post "Chrome is broken", and then sit back and enjoy the soothing sound of the chirping crickets. Or folks can post a CLI repro like this and have a fighting chance.

@sthalik
Author

sthalik commented Jan 9, 2019

They explicitly add an assembly file with the section information and have a header that differentiates between the available kernel features it was configured with.

This being a universal issue in itself is good in a way. At some point it's bound to become a problem serious enough for WSL's vendor to fix it. Especially given Debian's and Ubuntu's preference for pulling in all possible compile-time dependencies.

Does the stripping result in any instability?

Not in practice, either. I'm running some Qt project with -platform offscreen at the very least.

@therealkenc
Collaborator

Especially given Debian's and Ubuntu's preference for pulling in all possible compile-time dependencies.

Part of the reason this one went sideways though is Qt apps, even stuff like VLC, run alright on Ubuntu 18.04 from the Store. Then it gets revealed after the fact it wasn't Ubuntu from the Store and the coffin nails came out. I can't tell you why Qt5 is okay on Ubuntu, because I never looked. But it doesn't require any stripping of .note tricks on WSL, even if the WSL vDSO is getting kernel versions crossed.

$ cat /proc/version
Linux version 4.4.0-18309-Microsoft (Microsoft@Microsoft.com) (gcc version 5.4.0 (GCC) ) #1000-Microsoft Thu Dec 20 12:56:00 PST 2018
$ lsb_release -d
Description:    Ubuntu 18.04.1 LTS
$ sudo apt-get install qt5-default
[...blah blah kitchen sink]
$ apt-cache show qt5-default | grep Version
Version: 5.9.5+dfsg-0ubuntu1
$ /usr/lib/qt5/bin/qtpaths --qt-version   # look ma no error while loading shared libs
5.9.5
$ ldd /usr/lib/qt5/bin/qtpaths | grep libQt
libQt5Core.so.5 => /usr/lib/x86_64-linux-gnu/libQt5Core.so.5 (0x00007f3683d20000)

I suspect (or would like to hope) this issue would have gotten more grease if qtpaths or qmake or whatever (legit CLI scenarios if there is such a thing) faceplanted on a supported Debian or Ubuntu distro. But that's not how it went down.

@sthalik
Author

sthalik commented Jan 9, 2019

I suspect (or would like to hope) this issue would have gotten more grease if qtpaths or qmake or whatever (legit CLI scenarios if there is such a thing) faceplanted on a supported Debian or Ubuntu distro.

So you're saying that some recent Debian version doesn't exhibit this behavior? My sid/experimental with Qt 5.11 has this.

it doesn't require any stripping of .note tricks on WSL, even if the WSL vDSO is getting kernel versions crossed.

Qt has some dubious stuff where Qt5Core can be invoked as an executable directly too. I don't think it should exist in the tree to begin with, but of course WSL ought to support ELF binaries correctly.

@gojimmypi

gojimmypi commented May 19, 2020

As detailed in nextpnr #444 post-close comments, I am seeing this error in a fresh WSL1 Ubuntu 20.04 install:

error while loading shared libraries: libQt5Core.so.5: cannot open shared object file: No such file or directory
$ ls /lib/x86_64-linux-gnu/libQt5Core.so.5 -al
lrwxrwxrwx 1 root root 20 Apr  9 00:57 /lib/x86_64-linux-gnu/libQt5Core.so.5 -> libQt5Core.so.5.12.8

$ file /lib/x86_64-linux-gnu/libQt5Core.so.5.12.8
/lib/x86_64-linux-gnu/libQt5Core.so.5.12.8: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=e54a6cea97c3942fbcec1973574545fb3ad74015, for GNU/Linux 3.17.0, stripped

$ uname -a
Linux DESKTOP-AHQNIJN 4.4.0-18362-Microsoft #836-Microsoft Mon May 05 16:04:00 PST 2020 x86_64 x86_64 x86_64 GNU/Linux

edit: version info WSL on Microsoft Windows [Version 10.0.18362.836]

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04 LTS
Release:        20.04
Codename:       focal
Python 2.7.18rc1
Python 3.8.2
cmake version 3.16.3

CMake suite maintained and supported by Kitware (kitware.com/cmake).
clang version 10.0.0-4ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
qtf_default Version: 5.12.8+dfsg-0ubuntu1
libboost-all-dev Version: 1.71.0.0ubuntu2
build-essential Version: 12.8ubuntu1
clang Version: 1:10.0-50~exp1
bison Version: 2:3.5.1+dfsg-1
flex Version: 2.6.4-6.2
libreadline-dev Version: 8.0-4
gawk Version: 1:5.0.1+dfsg-1
tcl-dev Version: 8.6.9+1
libffi-dev Version: 3.3-4
git Version: 1:2.25.1-1ubuntu3
mercurial Version: 5.3.1-1ubuntu1
graphviz Version: 2.42.2-3build2
xdot Version: 1.1-2
pkg-config Version: 0.29.1-0ubuntu4
python
python3 Version: 3.8.2-0ubuntu2
libftdi-dev Version: 0.20-4build8
qt5-default Version: 5.12.8+dfsg-0ubuntu1
python3-dev Version: 3.8.2-0ubuntu2
libboost-dev Version: 1.71.0.0ubuntu2
nextpnr-ecp5: error while loading shared libraries: libQt5Core.so.5: cannot open shared object file: No such file or directory
binutils is already the newest version (2.34-6ubuntu1).
binutils set to manually installed.
The following package was automatically installed and is no longer required:
  guile-2.0-libs

@1337bowen

I had an error with libQt5Core on WSL and I fixed it by using this workaround: YosysHQ/nextpnr#444 (comment)
I hope WSL can fix this issue.

@mikofski

this also happens for octave install in WSL1 Ubuntu focal (20.04LTS) from store on Windows 10.0.18363.1556:

$ sudo apt install octave

$ octave  # doesn't work
octave: error while loading shared libraries: libQt5Core.so.5: cannot open shared object file: No such file or directory

$ sudo strip --remove-section=.note.ABI-tag /usr/lib/x86_64-linux-gnu/libQt5Core.so.5 

$ octave  # works fine

@thisisnotmyrealname

Considering running GUI linux apps in win11 is supposed to be a feature, you'd think this 4+ year old bug would have been looked at

@fweimer

fweimer commented Jul 10, 2022

glibc 2.36 will remove the kernel version check. From the NEWS file:

  • The Linux kernel version check has been removed along with the
    LD_ASSUME_KERNEL environment variable. The minimum kernel used to built
    glibc is still provided through NT_GNU_ABI_TAG ELF note and also printed
    when libc.so is issued directly.
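
For anyone wondering whether the strip workaround is still needed on a given system, a minimal Python sketch along these lines can report the running glibc version. It assumes a glibc-based Linux where os.confstr knows the CS_GNU_LIBC_VERSION name; the helper names are hypothetical, and the (2, 36) threshold is taken from the NEWS entry quoted above.

```python
import os
import re


def parse_glibc_version(text):
    """Extract (major, minor) from a string such as 'glibc 2.36'."""
    m = re.search(r"(\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def running_glibc_version():
    """Return the running glibc version, or None if it cannot be determined."""
    try:
        text = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        # Not glibc, or not a platform that provides confstr.
        return None
    return parse_glibc_version(text) if text else None


def kernel_check_removed(version):
    """True if this glibc no longer enforces the ABI-note kernel version."""
    return version is not None and version >= (2, 36)
```

If kernel_check_removed(running_glibc_version()) is true, the loader should no longer reject libraries based on the ABI note, so stripping the section would be unnecessary.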

@afidegnum

I'm also facing a similar situation. Applying strip --remove-section=.note.ABI-tag everywhere Qt5Core.so is located didn't help. What else can I do?

Contributor

This issue has been automatically closed since it has not had any activity for the past year. If you're still experiencing this issue please re-file this as a new issue or feature request.

Thank you!
