Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LTO] Objects brought in by LTO-generated libcalls do not handle deplibs. #56070

Closed
mysterymath opened this issue Jun 16, 2022 · 13 comments · Fixed by #98565
Closed

[LTO] Objects brought in by LTO-generated libcalls do not handle deplibs. #56070

mysterymath opened this issue Jun 16, 2022 · 13 comments · Fixed by #98565
Labels
bug Indicates an unexpected problem or unintended behavior lld:ELF

Comments

@mysterymath
Copy link
Contributor

mysterymath commented Jun 16, 2022

We've been seeing failures when attempting to build ICU and FFmpeg on aarch64-...-fuchsia with LTO enabled:

ld.lld: error: undefined symbol: _zx_system_get_features
>>> referenced by cpu_model.c:818 (.../cpu_model.c:818)
>>>               cpu_model.c.obj:(init_have_lse_atomics) in archive .../clang/linux-x64/lib/clang/15.0.0/lib/aarch64-unknown-fuchsia/libclang_rt.builtins.a
clang++: error: ld.lld command failed with exit code 1 (use -v to see invocation)

We've traced this down to an LTO corner case. LTO code generation can introduce new undefined symbols by emitting external references to libcalls; this can, for example, happen if external atomics are used on aarch64. From the comments in the ELF LTO driver, it appears that it's not generally desirable to pull these in before LTO (except if absolutely necessary, say, if they contain bitcode), since any given libcall may not actually ever be emitted, and this may pull in unnecessary dependencies. Until LTO code generation is finished, the precise set of libcalls needed is unknown, and cannot be determined by examining the bitcode.

The LTO object file seems to be parsed and added to the link largely if it were an input file; this may pull in other object files to satisfy these new libcall references. This seems to work fine, as far as it goes; the trouble is that the objects files pulled in are only partially processed. In particular, if they contain a .deplibs section pointing to another library, then that library is never fully added to the link, which can cause symbols in the object file (_zx_system_get_features above) to remain unresolved.

A candidate fix (https://reviews.llvm.org/D127885) is to add parseFile() calls to handle any newly-introduced files (such as those produced by .deplibs) as they would have been handled had they been present before LTO occurred.

@mysterymath mysterymath added the bug Indicates an unexpected problem or unintended behavior label Jun 16, 2022
@llvmbot
Copy link
Collaborator

llvmbot commented Jun 16, 2022

@llvm/issue-subscribers-bug

@llvmbot
Copy link
Collaborator

llvmbot commented Jun 16, 2022

@llvm/issue-subscribers-lld-elf

@MaskRay
Copy link
Member

MaskRay commented Jun 16, 2022

@pcc @teresajohnson

@tstellar
Copy link
Collaborator

This seems similar to the problem I ran into with rust: https://discourse.llvm.org/t/lto-and-probe-stack-functions/59462

@bd1976bris
Copy link
Collaborator

Interesting, just shows how difficult it is to introduce even a comparatively simple feature like .deplibs! Currently, the design of LTO (AIUI) is that the bitcode inputs must provide all of the information required by the linker's scan phase (COMDATs, symbols etc..) as if the linker was scanning the complied object file generated from that bitcode file. Later in the link, when the LTO-compiled files are processed, they only fill in the stuff expected after scan - they don't add e.g. new symbols.

libcalls are at odds with this design as there is no way to know the list of required libcalls without compiling the bitcode. In order to resolve this problem we currently pretend that the bitcode contains references to all of the libcalls - but as explained in the following comment taken from the source we are conservative in doing this:

"""
// However, adding all libcall symbols to the link can have undesired
// consequences. For example, the libgcc implementation of
// __sync_val_compare_and_swap_8 on 32-bit ARM pulls in an .init_array entry
// that aborts the program if the Linux kernel does not support 64-bit
// atomics, which would prevent the program from running even if it does not
// use 64-bit atomics.
//
// Therefore, we only add libcall symbols to the link before LTO if we have
// to, i.e. if the symbol's definition is in bitcode. Any other required
// libcall symbols will be added to the link after LTO when we add the LTO
// object file to the link.
"""

So, given the current design, some possibilities to fix this are (off the top of my head):

  • Ignore the problem but document it - some libraries would need modifying to work with LTO (e.g. by replacing current object file archive members with bitcode equivalents), or perhaps .deplibs couldn't be used with LTO in certain libraries.
  • Pull in both object and bitcode archive members if they resolve a libcall - would need to resolve problems with existing libraries as noted in the code comment.
  • Pull in object file archive members if they resolve a libcall and they have .deplibs - maybe that would be restrictive enough to avoid the problems mentioned in the comment.

@mysterymath
Copy link
Contributor Author

I'm very far from a linker expert, so take all the following with a grain of salt.

If I'm understanding the present situation correctly, at the end of the link, the new LTO-ed object files may add new references to lazy symbols present in the link. This would in turn call resolve on those symbols, which would call Symbol->extract(), which then call parseFile() on the containing file. This would happen after the link, and could add additional symbols. (?)

If it's already the case that parseFile is called on some files post-LTO, why not call it again on those added as .depfiles? The halfway situation we have seems odder than either recursively pulling in everything libcall-related post-LTO or preemptively pulling in everything possibly libcall-related pre-LTO.

@mysterymath
Copy link
Contributor Author

Ah, found the error in my thinking (I think); the libraries to use when satisfying post-LTO symbols have already been added to the link, and stock taken of their symbols when they were scanned in the first place. Extracting and explicitly parsing one of their object files shouldn't change the symbol table, right?

@mysterymath
Copy link
Contributor Author

mysterymath commented Jun 17, 2022

Of the options you gave, the last seems like it'd be most desirable here. It's not as dramatic as pulling completely new undefined symbol references into the link; adding a library's symbols to the link lazily won't necessarily bring in any object files at all (right?); it seems like the most dramatic difference would be that certain weak symbols might go from unsatisfied to satisfied. (Which would then also pull in possibly-unnecessary object files.)

There's two big implementation issues I could see with this off the bat.

First, the AArch64 libcalls aren't in the list used by lld; generally anytime a target calls setLibcallNames their libcalls run the risk of getting missed by LTO. Would we want to just add those to the linker's list as well? Should that become a repository of all libcalls, used by anything, anywhere? Or maybe it would be better to try to initialize a target machine somehow and query its libcalls?

Second, in handleLibcalls, we'd need to scan the containing object files and see if their were any deplibs sections. This would be a somewhat more work than usual, but might not be too bad.

@mysterymath
Copy link
Contributor Author

I've reworked https://reviews.llvm.org/D127885 to parse deplib sections from libcall-supplying object files before LTO occurs; please take a look.

@MaskRay
Copy link
Member

MaskRay commented Jun 30, 2022

Thanks for the clear report and analysis.
libclang_rt.builtins.a or libgcc.a is really intended to be a leaf library (with exceptions that some part may call very specific libc/libunwind functions).
I am unsure having .deplibs in such a library is an intended use case, therefore your use case looks like an unsupported use to me.
https://reviews.llvm.org/D127885 introduces substantial complexity to symbol resolution so I am very nervous.
The hard-coded list of AArch64 libcalls in lld/ELF/Arch/AArch64.cpp also makes me concerned.

Can you figure out a way without using .deplibs? Then we can just note down this known limitation and close this as wontfix.

@mysterymath
Copy link
Contributor Author

Thanks for the clear report and analysis. libclang_rt.builtins.a or libgcc.a is really intended to be a leaf library (with exceptions that some part may call very specific libc/libunwind functions).

I took a moment to track down where the deplibs is coming from:

#pragma comment(lib, "zircon")

Looks like the determination of whether certain atomics are available on Fuchsia requires a direct syscall via the libzircon.so vDSO provided by the kernel.

@petrhosek
Copy link
Member

This is a perfectly valid use case for .deplibs and I'm not aware of any other solution that isn't inferior so my preference would be the solution implemented in https://reviews.llvm.org/D127885.

@mysterymath
Copy link
Contributor Author

So, I finally got around to reproducing the test case to work against COFF lld-link, just to see how it behaves with the equivalent COFF feature. (MSVC's pragma comment was the origin of the deplibs feature, right?) It seems to work out of the box; COFF lld maintains a task queue, and any deplibs encountered when processing post-LTO objects will add new items on the queue. These may in turn add to the queue, which is run until completion at various points in the COFF link, including right after LTO code generation.

The difference in behavior is a bit surprising; it's more similar to the original patch I had given near the top of the discussion. I'm not sure whether this creates any oddities in the COFF LTO symbol resolution semantics either; not sure what to look for along those lines.

mysterymath added a commit to mysterymath/llvm-project that referenced this issue Jul 12, 2024
Parsing the new input file's symbols might invalidate LTO codegen, but
the semantics of deplibs require them to be parsed. Accordingly, report
an error unless the file had already been added to the link.

Fixes llvm#56070
mysterymath added a commit that referenced this issue Jul 12, 2024
Parsing the new input file's symbols might invalidate LTO codegen, but
the semantics of deplibs require them to be parsed. Accordingly, report
an error unless the file had already been added to the link.

Fixes #56070
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this issue Jul 14, 2024
Parsing the new input file's symbols might invalidate LTO codegen, but
the semantics of deplibs require them to be parsed. Accordingly, report
an error unless the file had already been added to the link.

Fixes llvm#56070
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior lld:ELF
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants