[LinkerWrapper] Relax ordering of static libraries for offloading #87532
Summary:
The linker wrapper attempts to maintain consistent semantics with
existing host invocations. By default, static libraries only extract a
member if there are non-weak symbols that remain undefined. However,
linkers differ in the meaning they attach to input ordering. The
`ld.bfd` linker requires static libraries to appear after the inputs
that reference their symbols, while `ld.lld` relaxes this rule. The
linker wrapper went with the former because it was the easier solution.
However, this has caused a lot of issues: I've had to explain this rule
to several people, and it also makes it difficult to include things like
`libc` in the OpenMP runtime because it would sometimes be linked before
or after.
This patch reworks the logic to perform roughly the following steps
for static libraries.
1. Split library / object inputs.
2. Include every object input and record its undefined symbols
3. Repeatedly try to extract static libraries to resolve these
symbols. If a file is extracted we need to check every library
again to resolve any new undefined symbols.
This allows the following to work and will cause fewer issues when
replacing HIP, which uses `--whole-archive`, so it's very likely the old
logic would regress there.
```console
$ clang -lfoo main.c -fopenmp --offload-arch=native
```
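The three steps above can be sketched in isolation as follows. This is a minimal model rather than the wrapper's actual code: `Member`, `SymbolState`, and `selectInputs` are hypothetical names, symbols are plain strings, and weak symbols and per-target handling are left out.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an object file or an archive member: the symbols
// it defines and the symbols it references without defining.
struct Member {
  std::string Name;
  std::set<std::string> Defined;
  std::set<std::string> Undefined;
};

// Symbols seen across every input selected so far.
struct SymbolState {
  std::set<std::string> Defined;
  std::set<std::string> Undefined;
};

// Record a member's symbols: its definitions resolve pending references, and
// its own unresolved references become pending.
static void addSymbols(SymbolState &State, const Member &M) {
  for (const std::string &S : M.Defined) {
    State.Defined.insert(S);
    State.Undefined.erase(S);
  }
  for (const std::string &S : M.Undefined)
    if (!State.Defined.count(S))
      State.Undefined.insert(S);
}

// True if the member defines at least one symbol that is still undefined.
static bool resolvesUndefined(const SymbolState &State, const Member &M) {
  for (const std::string &S : M.Defined)
    if (State.Undefined.count(S))
      return true;
  return false;
}

// Steps 2 and 3: include every object, then rescan the archive members until
// a full pass extracts nothing. Returns the selected names in selection order.
std::vector<std::string> selectInputs(const std::vector<Member> &Objects,
                                      const std::vector<Member> &Archives) {
  SymbolState State;
  std::vector<std::string> Selected;
  for (const Member &Obj : Objects) {
    addSymbols(State, Obj);
    Selected.push_back(Obj.Name);
  }

  std::vector<bool> Taken(Archives.size(), false);
  bool Extracted = true;
  while (Extracted) {
    Extracted = false;
    for (std::size_t I = 0; I < Archives.size(); ++I) {
      if (Taken[I] || !resolvesUndefined(State, Archives[I]))
        continue;
      addSymbols(State, Archives[I]);
      Selected.push_back(Archives[I].Name);
      Taken[I] = true;
      Extracted = true;
      break; // New undefined symbols may exist, so restart the scan.
    }
  }
  return Selected;
}
```

Because objects are processed before any archive member, the position of `-lfoo` relative to `main.c` no longer matters in this model. A chain of dependencies inside one archive is also picked up, since each extraction triggers another scan.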
@llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-clang
Author: Joseph Huber (jhuber6)
Full diff: https://github.com/llvm/llvm-project/pull/87532.diff
2 Files Affected:
diff --git a/clang/test/Driver/linker-wrapper-libs.c b/clang/test/Driver/linker-wrapper-libs.c
index 9a78200d7d3cfc..119e306857187e 100644
--- a/clang/test/Driver/linker-wrapper-libs.c
+++ b/clang/test/Driver/linker-wrapper-libs.c
@@ -44,6 +44,8 @@ int bar() { return weak; }
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
// RUN: --linker-path=/usr/bin/ld %t.o %t.a -o a.out 2>&1 \
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
+// RUN: --linker-path=/usr/bin/ld %t.a %t.o -o a.out 2>&1 \
// RUN: | FileCheck %s --check-prefix=LIBRARY-RESOLVES
// LIBRARY-RESOLVES: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 {{.*}}.o {{.*}}.o
@@ -66,6 +68,8 @@ int bar() { return weak; }
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
// RUN: --linker-path=/usr/bin/ld %t.o %t.a -o a.out 2>&1 \
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
+// RUN: --linker-path=/usr/bin/ld %t.a %t.o -o a.out 2>&1 \
// RUN: | FileCheck %s --check-prefix=LIBRARY-GLOBAL
// LIBRARY-GLOBAL: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 {{.*}}.o {{.*}}.o
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index c60be2789bd61e..7eb57e1f6562b2 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -1447,9 +1447,9 @@ getDeviceInput(const ArgList &Args) {
StringSaver Saver(Alloc);
// Try to extract device code from the linker input files.
- DenseMap<OffloadFile::TargetID, SmallVector<OffloadFile>> InputFiles;
- DenseMap<OffloadFile::TargetID, DenseMap<StringRef, Symbol>> Syms;
bool WholeArchive = Args.hasArg(OPT_wholearchive_flag) ? true : false;
+ SmallVector<OffloadFile> ObjectFilesToExtract;
+ SmallVector<OffloadFile> ArchiveFilesToExtract;
for (const opt::Arg *Arg : Args.filtered(
OPT_INPUT, OPT_library, OPT_whole_archive, OPT_no_whole_archive)) {
if (Arg->getOption().matches(OPT_whole_archive) ||
@@ -1485,50 +1485,87 @@ getDeviceInput(const ArgList &Args) {
if (Error Err = extractOffloadBinaries(Buffer, Binaries))
return std::move(Err);
- // We only extract archive members that are needed.
- bool IsArchive = identify_magic(Buffer.getBuffer()) == file_magic::archive;
- bool Extracted = true;
- while (Extracted) {
- Extracted = false;
- for (OffloadFile &Binary : Binaries) {
- // If the binary was previously extracted it will be set to null.
- if (!Binary.getBinary())
+ for (auto &OffloadFile : Binaries) {
+ if (identify_magic(Buffer.getBuffer()) == file_magic::archive &&
+ !WholeArchive)
+ ArchiveFilesToExtract.emplace_back(std::move(OffloadFile));
+ else
+ ObjectFilesToExtract.emplace_back(std::move(OffloadFile));
+ }
+ }
+
+ // Link all standard input files and update the list of symbols.
+ DenseMap<OffloadFile::TargetID, SmallVector<OffloadFile>> InputFiles;
+ DenseMap<OffloadFile::TargetID, DenseMap<StringRef, Symbol>> Syms;
+ for (OffloadFile &Binary : ObjectFilesToExtract) {
+ if (!Binary.getBinary())
+ continue;
+
+ SmallVector<OffloadFile::TargetID> CompatibleTargets = {Binary};
+ for (const auto &[ID, Input] : InputFiles)
+ if (object::areTargetsCompatible(Binary, ID))
+ CompatibleTargets.emplace_back(ID);
+
+ for (const auto &[Index, ID] : llvm::enumerate(CompatibleTargets)) {
+ Expected<bool> ExtractOrErr = getSymbols(
+ Binary.getBinary()->getImage(), Binary.getBinary()->getOffloadKind(),
+ /*IsArchive=*/false, Saver, Syms[ID]);
+ if (!ExtractOrErr)
+ return ExtractOrErr.takeError();
+
+ // If another target needs this binary it must be copied instead.
+ if (Index == CompatibleTargets.size() - 1)
+ InputFiles[ID].emplace_back(std::move(Binary));
+ else
+ InputFiles[ID].emplace_back(Binary.copy());
+ }
+ }
+
+ // Archive members only extract if they define needed symbols. We do this
+ // after every regular input file so that libraries may be included out of
+ // order. This follows 'ld.lld' semantics which are more lenient.
+ bool Extracted = true;
+ while (Extracted) {
+ Extracted = false;
+ for (OffloadFile &Binary : ArchiveFilesToExtract) {
+ // If the binary was previously extracted it will be set to null.
+ if (!Binary.getBinary())
+ continue;
+
+ SmallVector<OffloadFile::TargetID> CompatibleTargets = {Binary};
+ for (const auto &[ID, Input] : InputFiles)
+ if (object::areTargetsCompatible(Binary, ID))
+ CompatibleTargets.emplace_back(ID);
+
+ for (const auto &[Index, ID] : llvm::enumerate(CompatibleTargets)) {
+ // Only extract an if we have an an object matching this target.
+ if (!InputFiles.count(ID))
continue;
- SmallVector<OffloadFile::TargetID> CompatibleTargets = {Binary};
- for (const auto &[ID, Input] : InputFiles)
- if (object::areTargetsCompatible(Binary, ID))
- CompatibleTargets.emplace_back(ID);
-
- for (const auto &[Index, ID] : llvm::enumerate(CompatibleTargets)) {
- // Only extract an if we have an an object matching this target.
- if (IsArchive && !WholeArchive && !InputFiles.count(ID))
- continue;
-
- Expected<bool> ExtractOrErr = getSymbols(
- Binary.getBinary()->getImage(),
- Binary.getBinary()->getOffloadKind(), IsArchive, Saver, Syms[ID]);
- if (!ExtractOrErr)
- return ExtractOrErr.takeError();
-
- Extracted = !WholeArchive && *ExtractOrErr;
-
- // Skip including the file if it is an archive that does not resolve
- // any symbols.
- if (IsArchive && !WholeArchive && !Extracted)
- continue;
-
- // If another target needs this binary it must be copied instead.
- if (Index == CompatibleTargets.size() - 1)
- InputFiles[ID].emplace_back(std::move(Binary));
- else
- InputFiles[ID].emplace_back(Binary.copy());
- }
+ Expected<bool> ExtractOrErr =
+ getSymbols(Binary.getBinary()->getImage(),
+ Binary.getBinary()->getOffloadKind(), /*IsArchive=*/true,
+ Saver, Syms[ID]);
+ if (!ExtractOrErr)
+ return ExtractOrErr.takeError();
- // If we extracted any files we need to check all the symbols again.
- if (Extracted)
- break;
+ Extracted = *ExtractOrErr;
+
+ // Skip including the file if it is an archive that does not resolve
+ // any symbols.
+ if (!Extracted)
+ continue;
+
+ // If another target needs this binary it must be copied instead.
+ if (Index == CompatibleTargets.size() - 1)
+ InputFiles[ID].emplace_back(std::move(Binary));
+ else
+ InputFiles[ID].emplace_back(Binary.copy());
}
+
+ // If we extracted any files we need to check all the symbols again.
+ if (Extracted)
+ break;
}
}
  // after every regular input file so that libraries may be included out of
  // order. This follows 'ld.lld' semantics which are more lenient.
  bool Extracted = true;
  while (Extracted) {
I haven't read through this patch yet, but before going further you may want to consider lld/ELF's algorithm (the basic structure is used roughly everywhere except GNU ld/gold and some ancient Unix linkers).
There is a global symbol table (a mapping from name to Symbol).
For def.a undef.o, lld scans def.a and makes the global symbol table entry a LazySymbol (a pending state, see https://github.com/llvm/llvm-project/blob/main/lld/ELF/Symbols.cpp#L628). Then lld scans undef.o and extracts the LazySymbol (https://github.com/llvm/llvm-project/blob/main/lld/ELF/Symbols.cpp#L660, LazySymbol -> Defined).
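A rough sketch of that lazy-symbol idea, assuming a single pass over the inputs in command-line order. This is not lld's code: `Input`, `LazyLinker`, and the three-state `Kind` enum are hypothetical simplifications, and weak symbols are ignored.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical input: either a regular object or one member of an archive.
struct Input {
  std::string Name;
  bool IsArchiveMember;
  std::set<std::string> Defined;
  std::set<std::string> Undefined;
};

class LazyLinker {
public:
  explicit LazyLinker(std::vector<Input> In) : Inputs(std::move(In)) {}

  // Scan inputs once, in order, and return the names that end up linked.
  std::vector<std::string> run() {
    for (std::size_t I = 0; I < Inputs.size(); ++I) {
      if (Inputs[I].IsArchiveMember)
        addLazy(I);
      else
        link(I);
    }
    return Linked;
  }

private:
  enum class Kind { Undefined, Lazy, Defined };
  struct Symbol {
    Kind K;
    std::size_t Owner; // Index of the input that introduced the symbol.
  };

  // Register an archive member's definitions as lazy. If one of them is
  // already wanted by an earlier input, extract the member right away.
  void addLazy(std::size_t I) {
    bool Needed = false;
    for (const std::string &S : Inputs[I].Defined) {
      auto It = Table.find(S);
      if (It == Table.end())
        Table[S] = {Kind::Lazy, I};
      else if (It->second.K == Kind::Undefined)
        Needed = true;
    }
    if (Needed)
      link(I);
  }

  // Link an input: its definitions become Defined, and each reference either
  // extracts a pending lazy member or is recorded as Undefined.
  void link(std::size_t I) {
    if (!Done.insert(I).second)
      return;
    Linked.push_back(Inputs[I].Name);
    for (const std::string &S : Inputs[I].Defined)
      Table[S] = {Kind::Defined, I};
    for (const std::string &S : Inputs[I].Undefined) {
      auto It = Table.find(S);
      if (It == Table.end())
        Table[S] = {Kind::Undefined, I};
      else if (It->second.K == Kind::Lazy)
        link(It->second.Owner);
    }
  }

  std::vector<Input> Inputs;
  std::map<std::string, Symbol> Table;
  std::set<std::size_t> Done;
  std::vector<std::string> Linked;
};
```

In this model both `def.a undef.o` and `undef.o def.a` select the same files, and no repeated scan over the archives is needed because the pending state lives in the symbol table.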
The logic here basically seeks to get the list of files that participate in the link. This implementation does a repeated fixed-point scan until nothing changes, since that seemed to be the simplest way to query whether or not a file should be passed to the linker. It's kind of like a first pass that turns lib.a into its constituent object files and then passes them along, as in ld.lld obj1.o obj3.o.
ping
If a.o depends on b.o in libb.a, and b.o depends on c.o in libb.a, will the linker wrapper be able to link all the dependencies? For an archive of objects with embedded device bitcode, why not create an archive of device bitcode for each required target ID and let lld handle the dependencies in its usual way?
Given an invocation like this with those dependencies
$ clang main.o libb.a liba.a
main.o will extract
I tried doing that at a very early stage, two reasons.
For 2. I'm thinking of embedding callable kernels into the host symbol table somehow. That way we can make the offloading extraction rules work more like "If this kernel is externally visible and used by the host it should be extracted".
LGTM. Thanks