Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLD] [COFF] Error out if new LTO objects are pulled in after the main LTO compilation #71337

Merged
merged 1 commit into from
Nov 7, 2023

Conversation

mstorsjo
Copy link
Member

@mstorsjo mstorsjo commented Nov 5, 2023

Normally, this shouldn't happen. It can happen in exceptional circumstances, if the compiled output of a bitcode object file references symbols that weren't listed as undefined in the bitcode object file itself.

This can at least happen in the following cases:

  • A custom SEH personality is set via asm()
  • Compiler generated calls to builtin helper functions, such as __chkstk, or __rt_sdiv on arm

Both of these produce undefined references to symbols after compiling to a regular object file, that aren't visible on the level of the IR object file.

This is only an issue if the referenced symbols are provided as LTO objects themselves; loading regular object files after the LTO compilation works fine.

Custom SEH personalities are rare, but one CRT startup file in mingw-w64 does this. The referenced pesonality function is usually provided via an import library, but for WinStore targets, a local dummy reimplementation in C is used, which can be an LTO object.

Generated calls to builtins is very common, but the builtins aren't usually provided as LTO objects (compiler-rt's builtins explicitly pass -fno-lto when building), and many of the builtins are provided as raw .S assembly files, which don't get built as LTO objects anyway, even if built with -flto.

If hitting this unusual, but possible, situation, error out cleanly with a clear message rather than crashing.

…n LTO compilation

Normally, this shouldn't happen. It can happen in exceptional
circumstances, if the compiled output of a bitcode object file
references symbols that weren't listed as undefined in the bitcode
object file itself.

This can at least happen in the following cases:
- A custom SEH personality is set via asm()
- Compiler generated calls to builtin helper functions, such as
  __chkstk, or __rt_sdiv on arm

Both of these produce undefined references to symbols after
compiling to a regular object file, that aren't visible on the
level of the IR object file.

This is only an issue if the referenced symbols are provided as
LTO objects themselves; loading regular object files after the
LTO compilation works fine.

Custom SEH personalities are rare, but one CRT startup file in
mingw-w64 does this. The referenced pesonality function is usually
provided via an import library, but for WinStore targets, a local
dummy reimplementation in C is used, which can be an LTO object.

Generated calls to builtins is very common, but the builtins aren't
usually provided as LTO objects (compiler-rt's builtins explicitly
pass -fno-lto when building), and many of the builtins are provided
as raw .S assembly files, which don't get built as LTO objects
anyway, even if built with -flto.

If hitting this unusual, but possible, situation, error out cleanly
with a clear message rather than crashing.
@mstorsjo mstorsjo added lld:COFF platform:windows LTO Link time optimization (regular/full LTO or ThinLTO) labels Nov 5, 2023
@mstorsjo mstorsjo requested review from rnk and MaskRay November 5, 2023 22:40
@llvmbot llvmbot added the lld label Nov 5, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 5, 2023

@llvm/pr-subscribers-lld
@llvm/pr-subscribers-lld-coff
@llvm/pr-subscribers-platform-windows

@llvm/pr-subscribers-lto

Author: Martin Storsjö (mstorsjo)

Changes

Normally, this shouldn't happen. It can happen in exceptional circumstances, if the compiled output of a bitcode object file references symbols that weren't listed as undefined in the bitcode object file itself.

This can at least happen in the following cases:

  • A custom SEH personality is set via asm()
  • Compiler generated calls to builtin helper functions, such as __chkstk, or __rt_sdiv on arm

Both of these produce undefined references to symbols after compiling to a regular object file, that aren't visible on the level of the IR object file.

This is only an issue if the referenced symbols are provided as LTO objects themselves; loading regular object files after the LTO compilation works fine.

Custom SEH personalities are rare, but one CRT startup file in mingw-w64 does this. The referenced pesonality function is usually provided via an import library, but for WinStore targets, a local dummy reimplementation in C is used, which can be an LTO object.

Generated calls to builtins is very common, but the builtins aren't usually provided as LTO objects (compiler-rt's builtins explicitly pass -fno-lto when building), and many of the builtins are provided as raw .S assembly files, which don't get built as LTO objects anyway, even if built with -flto.

If hitting this unusual, but possible, situation, error out cleanly with a clear message rather than crashing.


Full diff: https://github.com/llvm/llvm-project/pull/71337.diff

4 Files Affected:

  • (modified) lld/COFF/SymbolTable.cpp (+5)
  • (modified) lld/COFF/SymbolTable.h (+1)
  • (added) lld/test/COFF/lto-late-arm.ll (+39)
  • (added) lld/test/COFF/lto-late-personality.ll (+48)
diff --git a/lld/COFF/SymbolTable.cpp b/lld/COFF/SymbolTable.cpp
index d9673e159769fbd..15c76461a84a92c 100644
--- a/lld/COFF/SymbolTable.cpp
+++ b/lld/COFF/SymbolTable.cpp
@@ -61,6 +61,10 @@ void SymbolTable::addFile(InputFile *file) {
     if (auto *f = dyn_cast<ObjFile>(file)) {
       ctx.objFileInstances.push_back(f);
     } else if (auto *f = dyn_cast<BitcodeFile>(file)) {
+      if (ltoCompilationDone) {
+        error("LTO object file " + toString(file) + " linked in after "
+              "doing LTO compilation.");
+      }
       ctx.bitcodeFileInstances.push_back(f);
     } else if (auto *f = dyn_cast<ImportFile>(file)) {
       ctx.importFileInstances.push_back(f);
@@ -876,6 +880,7 @@ Symbol *SymbolTable::addUndefined(StringRef name) {
 }
 
 void SymbolTable::compileBitcodeFiles() {
+  ltoCompilationDone = true;
   if (ctx.bitcodeFileInstances.empty())
     return;
 
diff --git a/lld/COFF/SymbolTable.h b/lld/COFF/SymbolTable.h
index 511e60d1e3a0873..fc623c2840d401b 100644
--- a/lld/COFF/SymbolTable.h
+++ b/lld/COFF/SymbolTable.h
@@ -133,6 +133,7 @@ class SymbolTable {
 
   llvm::DenseMap<llvm::CachedHashStringRef, Symbol *> symMap;
   std::unique_ptr<BitcodeCompiler> lto;
+  bool ltoCompilationDone = false;
 
   COFFLinkerContext &ctx;
 };
diff --git a/lld/test/COFF/lto-late-arm.ll b/lld/test/COFF/lto-late-arm.ll
new file mode 100644
index 000000000000000..0e2f148ef74c6c8
--- /dev/null
+++ b/lld/test/COFF/lto-late-arm.ll
@@ -0,0 +1,39 @@
+; REQUIRES: arm
+
+;; A bitcode file can generate undefined references to symbols that weren't
+;; listed as undefined on the bitcode file itself, when lowering produces
+;; calls to e.g. builtin helper functions. If these functions are provided
+;; as LTO bitcode, the linker would hit an unhandled state. (In practice,
+;; compiler-rt builtins are always compiled with -fno-lto, so this shouldn't
+;; happen.)
+
+; RUN: rm -rf %t.dir
+; RUN: split-file %s %t.dir
+; RUN: llvm-as %t.dir/main.ll -o %t.main.obj
+; RUN: llvm-as %t.dir/sdiv.ll -o %t.sdiv.obj
+; RUN: llvm-ar rcs %t.sdiv.lib %t.sdiv.obj
+
+; RUN: env LLD_IN_TEST=1 not lld-link /entry:entry %t.main.obj %t.sdiv.lib /out:%t.exe /subsystem:console 2>&1 | FileCheck %s
+
+; CHECK: error: LTO object file lto-late-arm.ll.tmp.sdiv.lib(lto-late-arm.ll.tmp.sdiv.obj) linked in after doing LTO compilation.
+
+;--- main.ll
+target datalayout = "e-m:w-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
+target triple = "thumbv7-w64-windows-gnu"
+
+@num = dso_local global i32 100
+
+define dso_local arm_aapcs_vfpcc i32 @entry(i32 %param) {
+entry:
+  %0 = load i32, ptr @num
+  %div = sdiv i32 %0, %param
+  ret i32 %div
+}
+;--- sdiv.ll
+target datalayout = "e-m:w-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
+target triple = "thumbv7-w64-windows-gnu"
+
+define dso_local arm_aapcs_vfpcc void @__rt_sdiv() {
+entry:
+  ret void
+}
diff --git a/lld/test/COFF/lto-late-personality.ll b/lld/test/COFF/lto-late-personality.ll
new file mode 100644
index 000000000000000..f768f65f4521e40
--- /dev/null
+++ b/lld/test/COFF/lto-late-personality.ll
@@ -0,0 +1,48 @@
+; REQUIRES: x86
+
+;; A bitcode file can generate undefined references to symbols that weren't
+;; listed as undefined on the bitcode file itself, if there's a reference to
+;; an unexpected personality routine via asm(). If the personality function
+;; is provided as LTO bitcode, the linker would hit an unhandled state.
+
+; RUN: rm -rf %t.dir
+; RUN: split-file %s %t.dir
+; RUN: llvm-as %t.dir/main.ll -o %t.main.obj
+; RUN: llvm-as %t.dir/other.ll -o %t.other.obj
+; RUN: llvm-as %t.dir/personality.ll -o %t.personality.obj
+; RUN: llvm-ar rcs %t.personality.lib %t.personality.obj
+
+; RUN: env LLD_IN_TEST=1 not lld-link /entry:entry %t.main.obj %t.other.obj %t.personality.lib /out:%t.exe /subsystem:console /opt:lldlto=0 /debug:symtab 2>&1 | FileCheck %s
+
+; CHECK: error: LTO object file lto-late-personality.ll.tmp.personality.lib(lto-late-personality.ll.tmp.personality.obj) linked in after doing LTO compilation.
+
+;--- main.ll
+target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-w64-windows-gnu"
+
+define i32 @entry() {
+entry:
+  tail call void @other()
+  tail call void asm sideeffect ".seh_handler __C_specific_handler, @except\0A", "~{dirflag},~{fpsr},~{flags}"()
+  ret i32 0
+}
+
+declare dso_local void @other()
+
+;--- other.ll
+target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-w64-windows-gnu"
+
+define dso_local void @other() {
+entry:
+  ret void
+}
+
+;--- personality.ll
+target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-w64-windows-gnu"
+
+define void @__C_specific_handler() {
+entry:
+  ret void
+}

Copy link

github-actions bot commented Nov 5, 2023

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 1d95a071d6fc43fb413a0d3f5a9d1e52a18abab0 373f2c7d9e8043f50d6be1849d3feebde44ce9d1 -- lld/COFF/SymbolTable.cpp lld/COFF/SymbolTable.h
View the diff from clang-format here.
diff --git a/lld/COFF/SymbolTable.cpp b/lld/COFF/SymbolTable.cpp
index 15c76461a..27e1bc75d 100644
--- a/lld/COFF/SymbolTable.cpp
+++ b/lld/COFF/SymbolTable.cpp
@@ -62,7 +62,8 @@ void SymbolTable::addFile(InputFile *file) {
       ctx.objFileInstances.push_back(f);
     } else if (auto *f = dyn_cast<BitcodeFile>(file)) {
       if (ltoCompilationDone) {
-        error("LTO object file " + toString(file) + " linked in after "
+        error("LTO object file " + toString(file) +
+              " linked in after "
               "doing LTO compilation.");
       }
       ctx.bitcodeFileInstances.push_back(f);

Copy link
Collaborator

@rnk rnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

@mstorsjo mstorsjo merged commit 7f9a004 into llvm:main Nov 7, 2023
6 of 7 checks passed
@mstorsjo mstorsjo deleted the lld-lto-late-lto branch November 7, 2023 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lld:COFF lld LTO Link time optimization (regular/full LTO or ThinLTO) platform:windows
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants