Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lld:MachO crashes when linking with same symbol from different architectures' archives #56386

Open
PRESIDENT810 opened this issue Jul 5, 2022 · 6 comments
Labels
crash Prefer [crash-on-valid] or [crash-on-invalid] lld:MachO

Comments

@PRESIDENT810
Copy link
Contributor

PRESIDENT810 commented Jul 5, 2022

How to produce:
foo.cpp:

void foo(){
  return;
}

bar.cpp

extern void foo();

int bar(){
  foo();
  return 0;
}

main.cpp

extern int bar();

int main() {
  bar();
  return 0;
}

Build the executable with the following command:

clang -arch arm64 foo.cpp -c -o foo_arm64.o
ar rcs libfoo1.a foo_arm64.o
clang -arch x86_64 foo.cpp -c -o foo_x86_64.o
ar rcs libfoo2.a foo_x86_64.o
clang -arch x86_64 bar.cpp -c -o bar.o
ar rcs libbar.a bar.o
clang -arch x86_64 main.cpp libfoo1.a libfoo2.a libbar.a -fuse-ld=/Users/bytedance/github/llvm-project/build/bin/ld64.lld

Then ld64.lld gives the following crash log:

ld64.lld: warning: libfoo1.a(foo_arm64.o) has architecture arm64 which is incompatible with target architecture x86_64
cannot be TLV
UNREACHABLE executed at /Users/bytedance/github/llvm-project/lld/MachO/Symbols.h:68!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /Users/bytedance/github/llvm-project/build/bin/ld64.lld -demangle -lto_library /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/libLTO.dylib -no_deduplicate -dynamic -arch x86_64 -platform_version macos 12.0.0 12.3 -syslibroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -o a.out -L/usr/local/lib /var/folders/mc/j3dq2z114_l_cpcpjvnlywnr0000gp/T/main-122511.o libfoo1.a libfoo2.a libbar.a -lSystem /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/13.1.6/lib/darwin/libclang_rt.osx.a
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  lld                      0x0000000104c1127d llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 61
1  lld                      0x0000000104c117fb PrintStackTraceSignalHandler(void*) + 27
2  lld                      0x0000000104c0f616 llvm::sys::RunSignalHandlers() + 134
3  lld                      0x0000000104c12e9f SignalHandler(int) + 223
4  libsystem_platform.dylib 0x00007ff80ba66dfd _sigtramp + 29
5  libsystem_platform.dylib 0x00007ff7bb5ab4a0 _sigtramp + 18446744072362411712
6  libsystem_c.dylib        0x00007ff80b99cd24 abort + 123
7  lld                      0x0000000104a49590 llvm::install_out_of_memory_new_handler() + 0
8  lld                      0x00000001053432f4 lld::macho::Symbol::isTlv() const + 36
9  lld                      0x00000001053262bb lld::macho::validateSymbolRelocation(lld::macho::Symbol const*, lld::macho::InputSection const*, lld::macho::Reloc const&) + 123
10 lld                      0x00000001053a2ce8 (anonymous namespace)::Writer::scanRelocations() + 408
11 lld                      0x0000000105398b01 void (anonymous namespace)::Writer::run<lld::macho::LP64>() + 113
12 lld                      0x0000000105398a31 void lld::macho::writeResult<lld::macho::LP64>() + 49
13 lld                      0x0000000105257fc4 lld::macho::link(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, bool, bool) + 16596
14 lld                      0x00000001049565a9 lldMain(int, char const**, llvm::raw_ostream&, llvm::raw_ostream&, bool) + 153
15 lld                      0x0000000104956314 main + 180
16 dyld                     0x000000012480851e start + 462
clang: error: unable to execute command: Abort trap: 6
clang: error: linker command failed due to signal (use -v to see invocation)

The problem occurs because by the linking order, lld first loads libfoo1.a, which is an arm64 archive. At this time lld only reads the header of libfoo1.a, and add void foo() as a lazy archive symbol to lld's symbol table. Then lld loads libfoo2.a, which is a x86_64 archive, and it contains the void foo() symbol that we want, because we are linking a x86_64 executable. However, lld finds that its symbol table already has void foo() from the arm64 archive, so it will not add this x86_64 one to its symbol table.

When lld tries to solve the undefined symbol int bar() in main.o, it loads bar.o from libbar.a, and realizes that there is an undefined symbol void foo() in bar.o, so it tries to load void foo() from foo_arm64.o from libfoo1.a according to info from its symbol table. At this time, lld finds that foo_arm64's architecture is incompatible with our target architecture, so it stops parsing foo_arm64.o. However, void foo() still remains as a lazy archive symbol in lld's symbol table. Therefore, in the relocation stage, lld goes into lld::macho::validateSymbolRelocation:

if (relocAttrs.hasAttr(RelocAttrBits::TLV) != sym->isTlv())
    error(message(Twine("requires that symbol ") + sym->getName() + " " +
                  (sym->isTlv() ? "not " : "") + "be thread-local"));

Because lld cannot determine whether a lazy archive symbol is a TLV (and there is no method override for LazyArchive symbol), it goes to the virtual method in the parent class (Symbol), which gives us the llvm_unreachable error.

I think we can fix this by making ArchiveFile::addLazySymbols check the architecture of the archive it reads. If the architecture doesn't match our target architecture, we shouldn't add lazy archive symbols from its header.

I'm not sure if this is the right way to fix it. Need some suggestions here.

By the way, if you switch the order of libfoo1.a and libfoo2.a, the linking will succeed (lld will load the x86_64 symbol first). I think we should make it works regardless of the linking order we use, and ignore symbols from archives with incompatible architectures?

@EugeneZelenko EugeneZelenko added lld:MachO crash Prefer [crash-on-valid] or [crash-on-invalid] and removed new issue labels Jul 5, 2022
@llvmbot
Copy link
Collaborator

llvmbot commented Jul 5, 2022

@llvm/issue-subscribers-lld-macho

@int3
Copy link
Contributor

int3 commented Jul 6, 2022

I think we can fix this by making ArchiveFile::addLazySymbols check the architecture of the archive it reads. If the architecture doesn't match our target architecture, we shouldn't add lazy archive symbols from its header.

Seems reasonable; it looks like ld64 is also filtering out archive members from non-matching targets. Though from what I understand, architecture information isn't stored in the archive header, so we have to read the archive member itself to figure this out. I'm slightly concerned about perf cost of the additional I/O this involves, but it looks like LLD-ELF also pages in archive member headers when loading archives, so I guess it might be okay. Some benchmarks would be nice though.

@PRESIDENT810
Copy link
Contributor Author

I think we can fix this by making ArchiveFile::addLazySymbols check the architecture of the archive it reads. If the architecture doesn't match our target architecture, we shouldn't add lazy archive symbols from its header.

Seems reasonable; it looks like ld64 is also filtering out archive members from non-matching targets. Though from what I understand, architecture information isn't stored in the archive header, so we have to read the archive member itself to figure this out. I'm slightly concerned about perf cost of the additional I/O this involves, but it looks like LLD-ELF also pages in archive member headers when loading archives, so I guess it might be okay. Some benchmarks would be nice though.

I can come up with a dumb workaround. We can read the first member's memory buffer from each archives we are about to load their symbol table, and check its header to see if the first member's architecture matches our target. Since an archive cannot consists of object files of different architectures, checking one child's header can give us enough information of this archive's overall architecture type. I guess this could only introduce the overhead of reading one more object file's header per archive? Not sure if this performance cost could be serious or not. Though it could be a little bit ugly. Is there any more clever or elegant ways to do this?

PRESIDENT810 added a commit to PRESIDENT810/llvm-project that referenced this issue Jul 6, 2022
@smeenai
Copy link
Collaborator

smeenai commented Jul 6, 2022

Though from what I understand, architecture information isn't stored in the archive header, so we have to read the archive member itself to figure this out. I'm slightly concerned about perf cost of the additional I/O this involves, but it looks like LLD-ELF also pages in archive member headers when loading archives, so I guess it might be okay. Some benchmarks would be nice though.

Not sure exactly what you're referring to, but it may have been part of a recent change to handle archives the same way as --start-lib objects: https://reviews.llvm.org/D119074. The motivation there was to avoid interning symbols twice for archives, and it turned out to be a speedup in practice. CC @MaskRay.

@int3
Copy link
Contributor

int3 commented Jul 6, 2022

Ah yeah I was just looking at the code, not the diff that implemented it. Makes sense though. I guess implementing that is a bit out of scope for this issue though. We can fix this first, measure the perf impact, and then implement that later on if necessary to speed things up.

Since an archive cannot consists of object files of different architectures

I was playing with this yesterday and it seems like while cctools' ar refuses to generate archives with multiple archs, llvm-ar is capable of doing so. Not sure if anyone actually does that in practice, but it seems prudent not to assume that all object files in an archive have the same architecture.

@smeenai
Copy link
Collaborator

smeenai commented Jul 6, 2022

man libtool on macOS implies that an archive containing universal objects is technically invalid, and I'd expect the same to extend to archives containing objects of different architectures:

Ranlib takes all correct forms of libraries (universal files containing archives, and simple archives) and updates the table of contents for all archives in the file. Ranlib also takes one common incorrect form of archive, an archive whose members are universal object files, adding or updating the table of contents and producing the library in correct form (a universal file containing multiple archives).

If cctools-ar disallows that and llvm-ar allows it, that sounds like an llvm-ar bug, though that's also a different issue, of course.

PRESIDENT810 added a commit to PRESIDENT810/llvm-project that referenced this issue Jul 12, 2022
This fixes llvm#56386, but it's based on the assumption that an archive cannot contain object files of multiple architecture. In practice however, such archives can be produced by llvm-ar, which should be a bug because cctool's ar would refuse to do so.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crash Prefer [crash-on-valid] or [crash-on-invalid] lld:MachO
Projects
None yet
Development

No branches or pull requests

5 participants