compiler.find_library() function considers unusable libraries to be found #10936

Open
Arfrever opened this issue Oct 20, 2022 · 4 comments

@Arfrever

Arfrever commented Oct 20, 2022

compiler.find_library() function considers unusable "libraries" to be found:

1. Linker scripts referring to non-existent files

As seen in scylladb/seastar#533 and systemd/systemd#25069, this situation occurs on Red Hat / Fedora, where /usr/lib/gcc/x86_64-redhat-linux/8/libatomic.so (provided by the gcc package) is a linker script containing:

INPUT ( /usr/lib64/libatomic.so.1.2.0 )

However, the libatomic package providing /usr/lib64/libatomic.so.1.2.0 is not necessarily installed.
But Meson's find_library('atomic', required : false) still claims to have found such a library.

Generic steps to reproduce:

# echo 'INPUT ( /lib64/libnonexistent )' > /lib64/liba.so

In test project:

$ cat meson.build
project('test', 'c')
cc = meson.get_compiler('c')
a = cc.find_library('a', required: false)
mylib = library('mylib', [], dependencies: [a])
$ meson build
...
Library a found: YES
...
$ meson compile -C build -v
ninja: Entering directory `/tmp/meson-test/build'
[1/1] cc  -o libmylib.so  -Wl,--as-needed -Wl,--no-undefined -shared -fPIC -Wl,--start-group -Wl,-soname,libmylib.so /x86_64-pc-linux-gnu/gcc-bin/12/../../../lib/gcc/x86_64-pc-linux-gnu/12/../../../../lib64/liba.so -Wl,--end-group
FAILED: libmylib.so 
cc  -o libmylib.so  -Wl,--as-needed -Wl,--no-undefined -shared -fPIC -Wl,--start-group -Wl,-soname,libmylib.so /x86_64-pc-linux-gnu/gcc-bin/12/../../../lib/gcc/x86_64-pc-linux-gnu/12/../../../../lib64/liba.so -Wl,--end-group
/x86_64-pc-linux-gnu/gcc-bin/12/../../../lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find /lib64/libnonexistent
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
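
As a rough illustration of case 1, a configure-time check could read a text linker script and look for absolute paths that do not exist. The sketch below is only that: the function name missing_linker_script_inputs is hypothetical and not part of Meson, and it handles only flat INPUT ( ... ) and GROUP ( ... ) commands, not nested forms such as AS_NEEDED ( ... ) or -l entries.

```python
import re
from pathlib import Path
from typing import List

# Matches a flat INPUT ( ... ) or GROUP ( ... ) command. Nested forms such as
# AS_NEEDED ( ... ) are deliberately not handled in this sketch.
_INPUT_RE = re.compile(r'\b(?:INPUT|GROUP)\s*\(([^()]*)\)')


def missing_linker_script_inputs(script_text: str) -> List[str]:
    """Return absolute paths named by INPUT/GROUP that do not exist on disk."""
    missing = []
    for body in _INPUT_RE.findall(script_text):
        # Entries may be separated by whitespace or commas.
        for entry in body.replace(',', ' ').split():
            if entry.startswith('/') and not Path(entry).exists():
                missing.append(entry)
    return missing


if __name__ == '__main__':
    # Same script content as in the reproduction steps above.
    print(missing_linker_script_inputs('INPUT ( /lib64/libnonexistent )'))
```

Parsing linker scripts by hand is fragile, so this is better read as a diagnostic aid than as a replacement for asking the linker itself.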

2. Corrupted libraries

# echo $'\x80\x80\x80\x80' > /lib64/liba.so

In test project:

$ meson build
...
Library a found: YES
...
$ meson compile -C build -v
ninja: Entering directory `/tmp/meson-test/build'
[1/1] cc  -o libmylib.so  -Wl,--as-needed -Wl,--no-undefined -shared -fPIC -Wl,--start-group -Wl,-soname,libmylib.so /x86_64-pc-linux-gnu/gcc-bin/12/../../../lib/gcc/x86_64-pc-linux-gnu/12/../../../../lib64/liba.so -Wl,--end-group
FAILED: libmylib.so 
cc  -o libmylib.so  -Wl,--as-needed -Wl,--no-undefined -shared -fPIC -Wl,--start-group -Wl,-soname,libmylib.so /x86_64-pc-linux-gnu/gcc-bin/12/../../../lib/gcc/x86_64-pc-linux-gnu/12/../../../../lib64/liba.so -Wl,--end-group
/x86_64-pc-linux-gnu/gcc-bin/12/../../../lib/gcc/x86_64-pc-linux-gnu/12/../../../../lib64/liba.so: file not recognized: file format not recognized
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
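
For case 2, one cheap heuristic would be to look at the first bytes of the file. The sketch below assumes a Linux/ELF host and uses a hypothetical helper name, looks_like_elf_or_archive. It would also reject text linker scripts, so it could not be the only criterion.

```python
from pathlib import Path

ELF_MAGIC = b'\x7fELF'
# Regular and thin `ar` archives both start with an 8-byte magic string.
AR_MAGICS = (b'!<arch>\n', b'!<thin>\n')


def looks_like_elf_or_archive(path: Path) -> bool:
    """Return True if the file starts with an ELF or ar archive magic number."""
    try:
        with open(path, 'rb') as f:
            head = f.read(8)
    except OSError:
        # Missing or unreadable files are treated as "not a library".
        return False
    return head.startswith(ELF_MAGIC) or head in AR_MAGICS
```

A file written with echo $'\x80\x80\x80\x80' as in the steps above starts with neither magic number, so this helper returns False for it.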

3. Inaccessible libraries

# chmod a-r /lib64/liba.so

In test project:

$ meson build
...
Library a found: YES
...
$ meson compile -C build -v
ninja: Entering directory `/tmp/meson-test/build'
[1/1] cc  -o libmylib.so  -Wl,--as-needed -Wl,--no-undefined -shared -fPIC -Wl,--start-group -Wl,-soname,libmylib.so /x86_64-pc-linux-gnu/gcc-bin/12/../../../lib/gcc/x86_64-pc-linux-gnu/12/../../../../lib64/liba.so -Wl,--end-group
FAILED: libmylib.so 
cc  -o libmylib.so  -Wl,--as-needed -Wl,--no-undefined -shared -fPIC -Wl,--start-group -Wl,-soname,libmylib.so /x86_64-pc-linux-gnu/gcc-bin/12/../../../lib/gcc/x86_64-pc-linux-gnu/12/../../../../lib64/liba.so -Wl,--end-group
/x86_64-pc-linux-gnu/gcc-bin/12/../../../lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find /x86_64-pc-linux-gnu/gcc-bin/12/../../../lib/gcc/x86_64-pc-linux-gnu/12/../../../../lib64/liba.so: Permission denied
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
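
For case 3, the difference between "exists" and "can be read" can be sketched as follows. The helper name first_readable_file is hypothetical. Note that os.access() reports the file as readable when running as root, regardless of the mode bits.

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def first_readable_file(paths: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that is a regular file readable by this user."""
    for p in paths:
        # is_file() alone is still True after `chmod a-r`; os.access() is not
        # (except for root).
        if p.is_file() and os.access(p, os.R_OK):
            return p
    return None
```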
@eli-schwartz
Member

#3 is obviously a fake example. "Doctor, it hurts if I chmod my libraries -r" -> "Then don't chmod any libraries -r". There's no good reason for this to happen, and if it happened anyway then things should break.

#2 I would be pretty skeptical about as well. If your system has file corruption, then maybe it isn't Meson's place to report that, but I would also say that it's not Meson's problem if builds fail as a result.

In both of these situations, the user has an inherently buggy system and that is what needs to be fixed, not Meson.

However, #1 is not only all that is worth discussing, but also IMO all that needs to be discussed.

@eli-schwartz
Member

#1 is actually a valid discussion to have, because this is not a library that was installed with the intended purpose of being usable and then fails because it is unfortunately broken for incomprehensible reasons. The library is installed with the intended purpose of being a proxy redirector that loads something else instead, but the optional thing it loads is not available.

I do find Red Hat's behavior questionable -- IMHO it seems like they should not install this linker script as part of the GCC package, and install it as part of the libatomic package instead. But it's not a broken library, it's obviously intended to produce a situation where the real library cannot be found.

The question is what to do about it. Here's Meson's implementation of "this is a list of possible filenames where we might be able to find a library, check this list and try to find that library":

def _get_file_from_list(env: 'Environment', paths: T.List[Path]) -> Path:
    '''
    We just check whether the library exists. We can't do a link check
    because the library might have unresolved symbols that require other
    libraries. On macOS we check if the library matches our target
    architecture.
    '''
    # If not building on macOS for Darwin, do a simple file check
    if not env.machines.host.is_darwin() or not env.machines.build.is_darwin():
        for p in paths:
            if p.is_file():
                return p
    # Run `lipo` and check if the library supports the arch we want
    for p in paths:
        if not p.is_file():
            continue
        archs = mesonlib.darwin_get_object_archs(str(p))
        if archs and env.machines.host.cpu_family in archs:
            return p
        else:
            mlog.debug(f'Rejected {p}, supports {archs} but need {env.machines.host.cpu_family}')
    return None

I don't really understand this rationale here, but plainly there is some sort of situation whereby people want to cc.find_library('foo', static: false), and this fails to link due to missing symbols, but it should still be found. G-d only knows why. The code comment itself was introduced in this PR: #3894 (comment)
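
To make the consequence of the existence-only check concrete, here is a small sketch. The function first_existing_file is a simplified, hypothetical stand-in for the non-Darwin branch quoted above, not Meson's actual code path. It accepts a linker script that points at a missing file, because Path.is_file() is the only test applied.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def first_existing_file(paths: List[Path]) -> Optional[Path]:
    """Simplified stand-in for the non-Darwin branch: existence is the only test."""
    for p in paths:
        if p.is_file():
            return p
    return None


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / 'liba.so'
    # A linker script whose INPUT target does not exist.
    script.write_text('INPUT ( /lib64/libnonexistent )\n')
    found = first_existing_file([Path(tmp) / 'liba.a', script])
    print(found == script)  # True: the dangling linker script is accepted
```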

The comment I linked to discusses some past rationale about:

  • some poor soul that discovered gsl only links correctly with certain values of -fuse-ld, and the library is probably broken, but "even so, a past version of Meson detected this library regardless of how broken, so that should be respected; meson should still find it"
  • libboost_python.so cannot be linked to without also manually sideloading libpythonX.Y.so, but that's not a bug at all! Why not? I have no earthly clue

The code comment is documenting some old code, and removing part of that old code. The code itself came from #3833, which makes a distinction between compiler default dirs and custom specified dirs, because only (???) in the latter case could the library be a static library that doesn't work with link_whole:

both because it doesn't have compatible compiler arguments (in my case the static library uses hard floating point so you get lots of [library] uses VFP register arguments, /tmp/tmpjjtvpdqe/output.exe does not), but it also just doesn't have a bunch of symbols that it would need to link because it was never designed to be linked with some arbitrary test file...

In short, I do not understand this mess, and the more I link surf the more confused I get. But it seems wrong. I am tempted to say that things have been wrong for a very long time, and were repeatedly papered over with hacks, but then people started to claim that the hacks were the original point. And some of them weren't very good solutions to begin with (like changing how shared libraries work, because of static libraries).

@Arfrever
Author

Arfrever commented Oct 21, 2022

> def _get_file_from_list(env: 'Environment', paths: T.List[Path]) -> Path:
>     '''
>     We just check whether the library exists. We can't do a link check
>     because the library might have unresolved symbols that require other
>     libraries. On macOS we check if the library matches our target
>     architecture.
>     '''
>     # If not building on macOS for Darwin, do a simple file check
>     if not env.machines.host.is_darwin() or not env.machines.build.is_darwin():
>         for p in paths:
>             if p.is_file():
>                 return p
>     # Run `lipo` and check if the library supports the arch we want
>     for p in paths:
>         if not p.is_file():
>             continue
>         archs = mesonlib.darwin_get_object_archs(str(p))
>         if archs and env.machines.host.cpu_family in archs:
>             return p
>         else:
>             mlog.debug(f'Rejected {p}, supports {archs} but need {env.machines.host.cpu_family}')
>     return None
>
> I don't really understand this rationale here, but plainly there is some sort of situation whereby people want to cc.find_library('foo', static: false), and this fails to link due to missing symbols, but it should still be found. G-d only knows why. The code comment itself was introduced in this PR: #3894 (comment)
>
> The comment I linked to discusses some past rationale about:
>
> * some poor soul that discovered gsl only links correctly with certain values of -fuse-ld, and the library is probably broken, but "even so, a past version of Meson detected this library regardless of how broken, so that should be respected; meson should still find it"
>
> * libboost_python.so cannot be linked to without also manually sideloading libpythonX.Y.so, but that's not a bug at all! Why not? I have no earthly clue

The "We can't do a link check because the library might have unresolved symbols that require other libraries." part of the code comment seems to be untrue (with both GNU linkers and LLD) for link operations that create shared libraries (not executables), except when the --no-allow-shlib-undefined flag is used (it is supported by both GNU linkers and LLD):

$ gcc -shared -fPIC -o lib1.so -x c - <<<"void y(); void x(){y();}"
$ gcc -shared -fPIC -o lib2.so -x c - </dev/null -L. -l1
$ nm -CD lib1.so
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 w __cxa_finalize@GLIBC_2.2.5
                 w __gmon_start__
                 U __stack_chk_fail@GLIBC_2.4
0000000000001138 T x
                 U y
$ nm -CD lib2.so
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 w __cxa_finalize@GLIBC_2.2.5
                 w __gmon_start__
$ gcc -shared -fPIC -fuse-ld=gold -o lib2.so -x c - </dev/null -L. -l1
$ gcc -shared -fPIC -fuse-ld=lld -o lib2.so -x c - </dev/null -L. -l1
$ gcc -shared -fPIC -Wl,--no-allow-shlib-undefined -o lib2.so -x c - </dev/null -L. -l1
/x86_64-pc-linux-gnu/gcc-bin/12/../../../lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: ./lib1.so: undefined reference to `y'
collect2: error: ld returned 1 exit status
$ gcc -o test -x c - <<<"int main(){}" -L. -l1
/x86_64-pc-linux-gnu/gcc-bin/12/../../../lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: ./lib1.so: undefined reference to `y'
collect2: error: ld returned 1 exit status

@Arfrever
Author

Arfrever commented Oct 21, 2022

A solution that sufficiently handles these situations (and also works for static libraries with undefined references) would be to perform the link check by creating a shared library, not an executable.

# echo 'INPUT ( /lib64/libnonexistent )' > /lib64/liba.so
$ gcc -shared -fPIC -o lib.so -x c - </dev/null -la
/x86_64-pc-linux-gnu/gcc-bin/12/../../../lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find /lib64/libnonexistent
collect2: error: ld returned 1 exit status
# echo $'\x80\x80\x80\x80' > /lib64/liba.so
$ gcc -shared -fPIC -o lib.so -x c - </dev/null -la
/x86_64-pc-linux-gnu/gcc-bin/12/../../../lib/gcc/x86_64-pc-linux-gnu/12/../../../../lib64/liba.so: file not recognized: file format not recognized
collect2: error: ld returned 1 exit status
# chmod a-r /lib64/liba.so
$ gcc -shared -fPIC -o lib.so -x c - </dev/null -la
/x86_64-pc-linux-gnu/gcc-bin/12/../../../lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find -la
collect2: error: ld returned 1 exit status
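
The proposed check could be sketched in Python roughly as follows. Everything here is an assumption for illustration: the names run_quietly, links_into_shared_library and first_linkable_library are hypothetical and not Meson's API, and the compiler is assumed to be a GCC-compatible driver that accepts -shared and -fPIC. An empty C file stands in for the </dev/null input used in the commands above, and the candidate library is passed by full path instead of -la.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Sequence

Runner = Callable[[List[str]], int]


def run_quietly(cmd: List[str]) -> int:
    """Run a command, discard its output, and return the exit status."""
    return subprocess.run(cmd, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode


def links_into_shared_library(compiler: Sequence[str], library: Path,
                              run: Runner = run_quietly) -> bool:
    """Try to link `library` into a throwaway shared library."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / 'empty.c'
        source.write_text('')  # empty translation unit, like `</dev/null`
        cmd = [*compiler, '-shared', '-fPIC',
               '-o', str(Path(tmp) / 'probe.so'), str(source), str(library)]
        return run(cmd) == 0


def first_linkable_library(compiler: Sequence[str], paths: Iterable[Path],
                           run: Runner = run_quietly) -> Optional[Path]:
    """Return the first candidate that exists and passes the link probe."""
    for p in paths:
        if p.is_file() and links_into_shared_library(compiler, p, run):
            return p
    return None
```

The run parameter exists only so the probe can be exercised without a real toolchain. A call such as first_linkable_library(['gcc'], candidates) would use the real compiler.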

Test for a static library (lib1.a) with an undefined reference:

$ gcc -c -fPIC -o lib1.o -x c - <<<"void y(); void x(){y();}"
$ ar cr lib1.a lib1.o
$ nm -C lib1.a

lib1.o:
                 U __stack_chk_fail
0000000000000000 T x
                 U y
$ gcc -c -fPIC -o lib2.o -x c - </dev/null
$ gcc -shared -fPIC -o lib2.so lib2.o lib1.a
$ gcc -shared -fPIC -Wl,--no-allow-shlib-undefined -o lib2.so lib2.o lib1.a

When creating an executable linked against a static library (lib1.a) with an undefined reference, the result depends on whether anything references the code containing the undefined reference:

$ gcc -c -o main1.o -x c - <<<"int main(){}"
$ gcc -c -o main2.o -x c - <<<"void x(); int main(){x();}"
$ gcc -o main1 main1.o lib1.a
$ gcc -o main2 main2.o lib1.a
/x86_64-pc-linux-gnu/gcc-bin/12/../../../lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: lib1.a(lib1.o): in function `x':
:(.text+0x1d): undefined reference to `y'
collect2: error: ld returned 1 exit status
