Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LinkerWrapper] Relax ordering of static libraries for offloading #87532

Merged
merged 1 commit into from
Apr 10, 2024

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Apr 3, 2024

Summary:
The linker wrapper attempts to maintain consistent semantics with
existing host invocations. Static libraries by default only extract if
there are non-weak symbols that remain undefined. However, we have
situations between linkers that put different meanings on ordering. The
ld.bfd linker requires static libraries to be defined after the symbols,
while ld.lld relaxes this rule. The linker wrapper went with the
former as it's the easier solution, however this has caused a lot of
issues as I've had to explain this rule to several people, it also make
it difficult to include things like libc in the OpenMP runtime because
it would sometimes be linked before or after.

This patch reworks the logic to more or less perform the following logic
for static libraries.

  1. Split library / object inputs.
  2. Include every object input and record its undefined symbols
  3. Repeatedly try to extract static libraries to resolve these
    symbols. If a file is extracted we need to check every library
    again to resolve any new undefined symbols.

This allows the following to work and will cause fewer issues when
replacing HIP, which does --whole-archive so it's very likely the old
logic will regress.

$ clang -lfoo main.c -fopenmp --offload-arch=native

Summary:
The linker wrapper attempts to maintain consistent semantics with
existing host invocations. Static libraries by default only extract if
there are non-weak symbols that remain undefined. However, we have
situations between linkers that put different meanings on ordering. The
ld.bfd linker requires static libraries to be defined after the symbols,
while `ld.lld` relaxes this rule. The linker wrapper went with the
former as it's the easier solution, however this has caused a lot of
issues as I've had to explain this rule to several people, it also make
it difficult to include things like `libc` in the OpenMP runtime because
it would sometimes be linked before or after.

This patch reworks the logic to more or less perform the following logic
for static libraries.

  1. Split library / object inputs.
  2. Include every object input and record its undefined symbols
  3. Repeatedly try to extract static libraries to resolve these
     symbols. If a file is extracted we need to check every library
     again to resolve any new undefined symbols.

This allows the following to work and will cause fewer issues when
replacing HIP, which does `--whole-archive` so it's very likely the old
logic will regress.
```console
$ clang -lfoo main.c -fopenmp --offload-arch=native
```
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels Apr 3, 2024
@llvmbot
Copy link
Member

llvmbot commented Apr 3, 2024

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
The linker wrapper attempts to maintain consistent semantics with
existing host invocations. Static libraries by default only extract if
there are non-weak symbols that remain undefined. However, we have
situations between linkers that put different meanings on ordering. The
ld.bfd linker requires static libraries to be defined after the symbols,
while ld.lld relaxes this rule. The linker wrapper went with the
former as it's the easier solution, however this has caused a lot of
issues as I've had to explain this rule to several people, it also make
it difficult to include things like libc in the OpenMP runtime because
it would sometimes be linked before or after.

This patch reworks the logic to more or less perform the following logic
for static libraries.

  1. Split library / object inputs.
  2. Include every object input and record its undefined symbols
  3. Repeatedly try to extract static libraries to resolve these
    symbols. If a file is extracted we need to check every library
    again to resolve any new undefined symbols.

This allows the following to work and will cause fewer issues when
replacing HIP, which does --whole-archive so it's very likely the old
logic will regress.

$ clang -lfoo main.c -fopenmp --offload-arch=native

Full diff: https://github.com/llvm/llvm-project/pull/87532.diff

2 Files Affected:

  • (modified) clang/test/Driver/linker-wrapper-libs.c (+4)
  • (modified) clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp (+79-42)
diff --git a/clang/test/Driver/linker-wrapper-libs.c b/clang/test/Driver/linker-wrapper-libs.c
index 9a78200d7d3cfc..119e306857187e 100644
--- a/clang/test/Driver/linker-wrapper-libs.c
+++ b/clang/test/Driver/linker-wrapper-libs.c
@@ -44,6 +44,8 @@ int bar() { return weak; }
 // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
 // RUN:   --linker-path=/usr/bin/ld %t.o %t.a -o a.out 2>&1 \
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
+// RUN:   --linker-path=/usr/bin/ld %t.a %t.o -o a.out 2>&1 \
 // RUN: | FileCheck %s --check-prefix=LIBRARY-RESOLVES
 
 // LIBRARY-RESOLVES: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 {{.*}}.o {{.*}}.o
@@ -66,6 +68,8 @@ int bar() { return weak; }
 // RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
 // RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
 // RUN:   --linker-path=/usr/bin/ld %t.o %t.a -o a.out 2>&1 \
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
+// RUN:   --linker-path=/usr/bin/ld %t.a %t.o -o a.out 2>&1 \
 // RUN: | FileCheck %s --check-prefix=LIBRARY-GLOBAL
 
 // LIBRARY-GLOBAL: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx1030 {{.*}}.o {{.*}}.o
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index c60be2789bd61e..7eb57e1f6562b2 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -1447,9 +1447,9 @@ getDeviceInput(const ArgList &Args) {
   StringSaver Saver(Alloc);
 
   // Try to extract device code from the linker input files.
-  DenseMap<OffloadFile::TargetID, SmallVector<OffloadFile>> InputFiles;
-  DenseMap<OffloadFile::TargetID, DenseMap<StringRef, Symbol>> Syms;
   bool WholeArchive = Args.hasArg(OPT_wholearchive_flag) ? true : false;
+  SmallVector<OffloadFile> ObjectFilesToExtract;
+  SmallVector<OffloadFile> ArchiveFilesToExtract;
   for (const opt::Arg *Arg : Args.filtered(
            OPT_INPUT, OPT_library, OPT_whole_archive, OPT_no_whole_archive)) {
     if (Arg->getOption().matches(OPT_whole_archive) ||
@@ -1485,50 +1485,87 @@ getDeviceInput(const ArgList &Args) {
     if (Error Err = extractOffloadBinaries(Buffer, Binaries))
       return std::move(Err);
 
-    // We only extract archive members that are needed.
-    bool IsArchive = identify_magic(Buffer.getBuffer()) == file_magic::archive;
-    bool Extracted = true;
-    while (Extracted) {
-      Extracted = false;
-      for (OffloadFile &Binary : Binaries) {
-        // If the binary was previously extracted it will be set to null.
-        if (!Binary.getBinary())
+    for (auto &OffloadFile : Binaries) {
+      if (identify_magic(Buffer.getBuffer()) == file_magic::archive &&
+          !WholeArchive)
+        ArchiveFilesToExtract.emplace_back(std::move(OffloadFile));
+      else
+        ObjectFilesToExtract.emplace_back(std::move(OffloadFile));
+    }
+  }
+
+  // Link all standard input files and update the list of symbols.
+  DenseMap<OffloadFile::TargetID, SmallVector<OffloadFile>> InputFiles;
+  DenseMap<OffloadFile::TargetID, DenseMap<StringRef, Symbol>> Syms;
+  for (OffloadFile &Binary : ObjectFilesToExtract) {
+    if (!Binary.getBinary())
+      continue;
+
+    SmallVector<OffloadFile::TargetID> CompatibleTargets = {Binary};
+    for (const auto &[ID, Input] : InputFiles)
+      if (object::areTargetsCompatible(Binary, ID))
+        CompatibleTargets.emplace_back(ID);
+
+    for (const auto &[Index, ID] : llvm::enumerate(CompatibleTargets)) {
+      Expected<bool> ExtractOrErr = getSymbols(
+          Binary.getBinary()->getImage(), Binary.getBinary()->getOffloadKind(),
+          /*IsArchive=*/false, Saver, Syms[ID]);
+      if (!ExtractOrErr)
+        return ExtractOrErr.takeError();
+
+      // If another target needs this binary it must be copied instead.
+      if (Index == CompatibleTargets.size() - 1)
+        InputFiles[ID].emplace_back(std::move(Binary));
+      else
+        InputFiles[ID].emplace_back(Binary.copy());
+    }
+  }
+
+  // Archive members only extract if they define needed symbols. We do this
+  // after every regular input file so that libraries may be included out of
+  // order. This follows 'ld.lld' semantics which are more lenient.
+  bool Extracted = true;
+  while (Extracted) {
+    Extracted = false;
+    for (OffloadFile &Binary : ArchiveFilesToExtract) {
+      // If the binary was previously extracted it will be set to null.
+      if (!Binary.getBinary())
+        continue;
+
+      SmallVector<OffloadFile::TargetID> CompatibleTargets = {Binary};
+      for (const auto &[ID, Input] : InputFiles)
+        if (object::areTargetsCompatible(Binary, ID))
+          CompatibleTargets.emplace_back(ID);
+
+      for (const auto &[Index, ID] : llvm::enumerate(CompatibleTargets)) {
+        // Only extract an if we have an an object matching this target.
+        if (!InputFiles.count(ID))
           continue;
 
-        SmallVector<OffloadFile::TargetID> CompatibleTargets = {Binary};
-        for (const auto &[ID, Input] : InputFiles)
-          if (object::areTargetsCompatible(Binary, ID))
-            CompatibleTargets.emplace_back(ID);
-
-        for (const auto &[Index, ID] : llvm::enumerate(CompatibleTargets)) {
-          // Only extract an if we have an an object matching this target.
-          if (IsArchive && !WholeArchive && !InputFiles.count(ID))
-            continue;
-
-          Expected<bool> ExtractOrErr = getSymbols(
-              Binary.getBinary()->getImage(),
-              Binary.getBinary()->getOffloadKind(), IsArchive, Saver, Syms[ID]);
-          if (!ExtractOrErr)
-            return ExtractOrErr.takeError();
-
-          Extracted = !WholeArchive && *ExtractOrErr;
-
-          // Skip including the file if it is an archive that does not resolve
-          // any symbols.
-          if (IsArchive && !WholeArchive && !Extracted)
-            continue;
-
-          // If another target needs this binary it must be copied instead.
-          if (Index == CompatibleTargets.size() - 1)
-            InputFiles[ID].emplace_back(std::move(Binary));
-          else
-            InputFiles[ID].emplace_back(Binary.copy());
-        }
+        Expected<bool> ExtractOrErr =
+            getSymbols(Binary.getBinary()->getImage(),
+                       Binary.getBinary()->getOffloadKind(), /*IsArchive=*/true,
+                       Saver, Syms[ID]);
+        if (!ExtractOrErr)
+          return ExtractOrErr.takeError();
 
-        // If we extracted any files we need to check all the symbols again.
-        if (Extracted)
-          break;
+        Extracted = *ExtractOrErr;
+
+        // Skip including the file if it is an archive that does not resolve
+        // any symbols.
+        if (!Extracted)
+          continue;
+
+        // If another target needs this binary it must be copied instead.
+        if (Index == CompatibleTargets.size() - 1)
+          InputFiles[ID].emplace_back(std::move(Binary));
+        else
+          InputFiles[ID].emplace_back(Binary.copy());
       }
+
+      // If we extracted any files we need to check all the symbols again.
+      if (Extracted)
+        break;
     }
   }
 

// after every regular input file so that libraries may be included out of
// order. This follows 'ld.lld' semantics which are more lenient.
bool Extracted = true;
while (Extracted) {
Copy link
Member

@MaskRay MaskRay Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't read through this patch yet, before you may consider lld/ELF's algorithm (the basic structure is roughly used everywhere except GNU ld/gold and some ancient Unix linkers)

There is a global symbol table (mapping from name => Symbol).
For def.a undef.o, lld scans def.a and makes the global symbol table entry LazySymbol (a pending state https://github.com/llvm/llvm-project/blob/main/lld/ELF/Symbols.cpp#L628). Then lld scans undef.o and extracts the LazySymbol (https://github.com/llvm/llvm-project/blob/main/lld/ELF/Symbols.cpp#L660 LazySymbol -> Defined).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here basically seeks to get a list of files that participates in the linking. This implementation kind of does a repeated fixed point scanning until it's done since that seemed to be the simplest way to query whether or not a file should be passed to the linker. It's kind of like a first pass that seeks to turn lib.a into its constituent object files and then pass it to ld.lld obj1.o obj3.o.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Apr 9, 2024

ping

@yxsamliu
Copy link
Collaborator

yxsamliu commented Apr 9, 2024

if a.o depends on b.o in libb.a, and b.o depends on c.o in libb.a, will the linker wrapper be able to link all the dependencies?

For archive of objects with embedded device bitcode, why not creating an archive of device bitcode for each required target ID and let lld handle the dependencies in its usual way?

@jhuber6
Copy link
Contributor Author

jhuber6 commented Apr 9, 2024

if a.o depends on b.o in libb.a, and b.o depends on c.o in libb.a, will the linker wrapper be able to link all the dependencies?

Given an invocation like this with those dependencies

$ clang main.o libb.a liba.a

main.o will extract a.o from liba.a. We will then re-scan all the libraries so a.o will extract b.o from libb.a. This will then re-scan all the libraries and find nothing left to extract so we are done.

For archive of objects with embedded device bitcode, why not creating an archive of device bitcode for each required target ID and let lld handle the dependencies in its usual way?

I tried doing that at a very early stage, two reasons.

  1. This needs to work for CUDA / x86_64 linking so we can't rely on LLD specific handling
  2. Offloading requires special handling for kernels and globals since we always want to extract callable kernels. This is a little hacky.

For 2. I'm thinking of embedding callable kernels into the host symbol table somehow. That way we can make the offloading extraction rules work more like "If this kernel is externally visible and used by the host it should be extracted".

Copy link
Collaborator

@yxsamliu yxsamliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks

@jhuber6 jhuber6 merged commit 50d368a into llvm:main Apr 10, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants