Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLD][COFF] Add -build-id flag to generate .buildid section. #71433

Merged
merged 13 commits into from
Dec 5, 2023

Conversation

ZequanWu
Copy link
Contributor

@ZequanWu ZequanWu commented Nov 6, 2023

RFC

Before, lld-link only generate the debug directory containing guid when generating PDB with the hash of PDB content.

With this change, lld-link can generate the debug directory when only /build-id is given:

  1. If generating PDB, /build-id is ignored. Same behaviour as before.
  2. Not generating PDB, using hash of the binary.
    • Not under MinGW, the debug directory is still in .rdata section.
    • Under MinGW, place the debug directory into new .buildid section.

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 6, 2023

@llvm/pr-subscribers-lld-coff
@llvm/pr-subscribers-platform-windows

@llvm/pr-subscribers-lld

Author: Zequan Wu (ZequanWu)

Changes

RFC

With this change, lld-link generates .buildid section that contains build id hash under two cases:

  1. /buildid flag is given.
  2. /lldmingw and /debug are given and not generating PDB.

So, it will generate duplicated debug directories when /buildid and /pdb: are given, one in .buildid section and another one in .rdata section.


Full diff: https://github.com/llvm/llvm-project/pull/71433.diff

6 Files Affected:

  • (modified) lld/COFF/Config.h (+1)
  • (modified) lld/COFF/Driver.cpp (+5)
  • (modified) lld/COFF/Options.td (+2)
  • (modified) lld/COFF/Writer.cpp (+15-7)
  • (modified) lld/MinGW/Options.td (+1-1)
  • (modified) lld/test/COFF/rsds.test (+34-20)
diff --git a/lld/COFF/Config.h b/lld/COFF/Config.h
index 1c338cc63fa87d2..4e4811357d26a46 100644
--- a/lld/COFF/Config.h
+++ b/lld/COFF/Config.h
@@ -317,6 +317,7 @@ struct Configuration {
   bool writeCheckSum = false;
   EmitKind emit = EmitKind::Obj;
   bool allowDuplicateWeak = false;
+  bool buildID = false;
 };
 
 } // namespace lld::coff
diff --git a/lld/COFF/Driver.cpp b/lld/COFF/Driver.cpp
index 5613c2e6993a5af..0fa6fd6d34cf6a7 100644
--- a/lld/COFF/Driver.cpp
+++ b/lld/COFF/Driver.cpp
@@ -2332,6 +2332,11 @@ void LinkerDriver::linkerMain(ArrayRef<const char *> argsArr) {
     }
   }
 
+  // Generate build id hash in .buildid section when:
+  // 1. /build-id
+  // 2. /lldmingw + /debug and not generating pdb file.
+  config->buildID = args.hasArg(OPT_build_id) ||
+                    (config->mingw && config->debug && !shouldCreatePDB);
   // Set default image base if /base is not given.
   if (config->imageBase == uint64_t(-1))
     config->imageBase = getDefaultImageBase();
diff --git a/lld/COFF/Options.td b/lld/COFF/Options.td
index 977657a433dc581..cc7ce4fdbeea037 100644
--- a/lld/COFF/Options.td
+++ b/lld/COFF/Options.td
@@ -299,6 +299,8 @@ def : Flag<["--"], "time-trace">, Alias<time_trace_eq>,
 def time_trace_granularity_eq: Joined<["--"], "time-trace-granularity=">,
     HelpText<"Minimum time granularity (in microseconds) traced by time profiler">;
 
+def build_id: F<"build-id">, HelpText<"Generate build ID">;
+
 // Flags for debugging
 def lldmap : F<"lldmap">;
 def lldmap_file : P_priv<"lldmap">;
diff --git a/lld/COFF/Writer.cpp b/lld/COFF/Writer.cpp
index 43d8e7c1d530859..dca148a62e9d171 100644
--- a/lld/COFF/Writer.cpp
+++ b/lld/COFF/Writer.cpp
@@ -1041,14 +1041,14 @@ void Writer::createMiscChunks() {
 
   // Create Debug Information Chunks
   OutputSection *debugInfoSec = config->mingw ? buildidSec : rdataSec;
-  if (config->debug || config->repro || config->cetCompat) {
+  if (config->buildID || config->debug || config->repro || config->cetCompat) {
     debugDirectory =
         make<DebugDirectoryChunk>(ctx, debugRecords, config->repro);
     debugDirectory->setAlignment(4);
     debugInfoSec->addChunk(debugDirectory);
   }
 
-  if (config->debug) {
+  if (config->debug || config->buildID) {
     // Make a CVDebugRecordChunk even when /DEBUG:CV is not specified.  We
     // output a PDB no matter what, and this chunk provides the only means of
     // allowing a debugger to match a PDB and an executable.  So we need it even
@@ -1069,6 +1069,16 @@ void Writer::createMiscChunks() {
     debugInfoSec->addChunk(r.second);
   }
 
+  // If .buildid section was not chosen and /build-id is given, create .buildid
+  // section.
+  if (config->buildID && debugInfoSec != buildidSec) {
+    buildidSec->addChunk(debugDirectory);
+    for (std::pair<COFF::DebugType, Chunk *> r : debugRecords) {
+      r.second->setAlignment(4);
+      buildidSec->addChunk(r.second);
+    }
+  }
+
   // Create SEH table. x86-only.
   if (config->safeSEH)
     createSEHTable();
@@ -2028,7 +2038,7 @@ void Writer::writeBuildId() {
   // PE contents.
   Configuration *config = &ctx.config;
 
-  if (config->debug) {
+  if (config->debug || config->buildID) {
     assert(buildId && "BuildId is not set!");
     // BuildId->BuildId was filled in when the PDB was written.
   }
@@ -2043,16 +2053,14 @@ void Writer::writeBuildId() {
 
   uint32_t timestamp = config->timestamp;
   uint64_t hash = 0;
-  bool generateSyntheticBuildId =
-      config->mingw && config->debug && config->pdbPath.empty();
 
-  if (config->repro || generateSyntheticBuildId)
+  if (config->repro || config->buildID)
     hash = xxh3_64bits(outputFileData);
 
   if (config->repro)
     timestamp = static_cast<uint32_t>(hash);
 
-  if (generateSyntheticBuildId) {
+  if (config->buildID) {
     // For MinGW builds without a PDB file, we still generate a build id
     // to allow associating a crash dump to the executable.
     buildId->buildId->PDB70.CVSignature = OMF::Signature::PDB70;
diff --git a/lld/MinGW/Options.td b/lld/MinGW/Options.td
index fa4c4ecc75d6543..87b5c4fff68cef6 100644
--- a/lld/MinGW/Options.td
+++ b/lld/MinGW/Options.td
@@ -196,6 +196,7 @@ defm guard_longjmp : B<"guard-longjmp",
   "Do not enable Control Flow Guard long jump hardening">;
 defm error_limit:
   EqLong<"error-limit", "Maximum number of errors to emit before stopping (0 = no limit)">;
+def build_id: F<"build-id">, HelpText<"Generate build ID">;
 
 // Alias
 def alias_Bdynamic_call_shared: Flag<["-"], "call_shared">, Alias<Bdynamic>;
@@ -213,7 +214,6 @@ def alias_undefined_u: JoinedOrSeparate<["-"], "u">, Alias<undefined>;
 // Ignored options
 def: Joined<["-"], "O">;
 def: F<"as-needed">;
-def: F<"build-id">;
 def: F<"disable-auto-image-base">;
 def: F<"enable-auto-image-base">;
 def: F<"end-group">;
diff --git a/lld/test/COFF/rsds.test b/lld/test/COFF/rsds.test
index 475249ca4056669..783ac08255c031b 100644
--- a/lld/test/COFF/rsds.test
+++ b/lld/test/COFF/rsds.test
@@ -22,9 +22,23 @@
 # RUN: lld-link /Brepro /debug /dll /out:%t.dll /entry:DllMain %t.obj
 # RUN: llvm-readobj --coff-debug-directory %t.dll | FileCheck --check-prefix REPRODEBUG %s
 
+# Generate .buildid section when /lldmingw + /debug:dwarf are given and not generating pdb file.
 # RUN: rm -f %t.dll %t.pdb
 # RUN: lld-link /lldmingw /debug:dwarf /dll /out:%t.dll /entry:DllMain %t.obj
-# RUN: llvm-readobj --coff-debug-directory %t.dll | FileCheck --check-prefix MINGW %s
+# RUN: llvm-readobj --coff-debug-directory %t.dll | FileCheck --check-prefix BUILDID %s
+
+# RUN: rm -f %t.dll %t.pdb
+# RUN: lld-link /lldmingw /debug:dwarf /build-id /dll /out:%t.dll /entry:DllMain %t.obj
+# RUN: llvm-readobj --coff-debug-directory %t.dll | FileCheck --check-prefix BUILDID %s
+
+# Generate .buildid section when /build-id is given.
+# RUN: rm -f %t.dll %t.pdb
+# RUN: lld-link /build-id /dll /out:%t.dll /entry:DllMain %t.obj
+# RUN: llvm-readobj --coff-debug-directory %t.dll | FileCheck --check-prefix BUILDID %s
+
+# RUN: rm -f %t.dll %t.pdb
+# RUN: lld-link /build-id /debug /dll /out:%t.dll /entry:DllMain %t.obj
+# RUN: llvm-readobj --coff-debug-directory %t.dll | FileCheck --check-prefix BUILDID %s
 
 # CHECK: File: [[FILE:.*]].dll
 # CHECK: DebugDirectory [
@@ -148,25 +162,25 @@
 # REPRODEBUG:   }
 # REPRODEBUG: ]
 
-# MINGW: File: {{.*}}.dll
-# MINGW: DebugDirectory [
-# MINGW:   DebugEntry {
-# MINGW:     Characteristics: 0x0
-# MINGW:     TimeDateStamp:
-# MINGW:     MajorVersion: 0x0
-# MINGW:     MinorVersion: 0x0
-# MINGW:     Type: CodeView (0x2)
-# MINGW:     SizeOfData: 0x{{[^0]}}
-# MINGW:     AddressOfRawData: 0x{{[^0]}}
-# MINGW:     PointerToRawData: 0x{{[^0]}}
-# MINGW:     PDBInfo {
-# MINGW:       PDBSignature: 0x53445352
-# MINGW:       PDBGUID: [[GUID:\(([A-Za-z0-9]{2} ?){16}\)]]
-# MINGW:       PDBAge: 1
-# MINGW:       PDBFileName:
-# MINGW:     }
-# MINGW:   }
-# MINGW: ]
+# BUILDID: File: {{.*}}.dll
+# BUILDID: DebugDirectory [
+# BUILDID:   DebugEntry {
+# BUILDID:     Characteristics: 0x0
+# BUILDID:     TimeDateStamp:
+# BUILDID:     MajorVersion: 0x0
+# BUILDID:     MinorVersion: 0x0
+# BUILDID:     Type: CodeView (0x2)
+# BUILDID:     SizeOfData: 0x{{[^0]}}
+# BUILDID:     AddressOfRawData: 0x{{[^0]}}
+# BUILDID:     PointerToRawData: 0x{{[^0]}}
+# BUILDID:     PDBInfo {
+# BUILDID:       PDBSignature: 0x53445352
+# BUILDID:       PDBGUID: [[GUID:\(([A-Za-z0-9]{2} ?){16}\)]]
+# BUILDID:       PDBAge: 1
+# BUILDID:       PDBFileName:
+# BUILDID:     }
+# BUILDID:   }
+# BUILDID: ]
 --- !COFF
 header:
   Machine:         IMAGE_FILE_MACHINE_I386

Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly reasonable to me, thanks! A couple comments below.

Also, you've got a typo in the subject, "genrate".

lld/COFF/Writer.cpp Outdated Show resolved Hide resolved
lld/COFF/Options.td Outdated Show resolved Hide resolved
lld/MinGW/Options.td Outdated Show resolved Hide resolved
@ZequanWu ZequanWu changed the title [LLD][COFF] Add -build-id flag to genrate .buildid section. [LLD][COFF] Add -build-id flag to generate .buildid section. Nov 7, 2023
Copy link

github-actions bot commented Nov 7, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

lld/test/MinGW/driver.test Outdated Show resolved Hide resolved
lld/MinGW/Options.td Show resolved Hide resolved
lld/MinGW/Driver.cpp Outdated Show resolved Hide resolved
lld/test/MinGW/driver.test Outdated Show resolved Hide resolved
Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MinGW side parts LGTM, with a few minor comments on the test.

For the COFF side (sorry for forgetting to recheck that one earlier), I think the changes are ok, but I'd appreciate if someone else who is familiar with those parts wants to have a look at those bits as well. If not I guess I can give it a second closer look.

There's some restructurings for how the different hashes/structures are made (I presume since you're not reusing the chunks but creating two separate sets, if needed) - have you checked that the output is binary identical to before (where applicable)?

lld/test/MinGW/driver.test Outdated Show resolved Hide resolved
lld/test/MinGW/driver.test Show resolved Hide resolved
Copy link
Contributor Author

@ZequanWu ZequanWu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some restructurings for how the different hashes/structures are made (I presume since you're not reusing the chunks but creating two separate sets, if needed) - have you checked that the output is binary identical to before (where applicable)?

Oh, I found a bug that it generates .buildid when it shouldn't. It's complicated on how those flags interacting with each other. After the fix, the output binary is identical to before when not generating pdb and with /Brepro. With PDB, the PDB content hash in debug directory are different which also happens on lld without any changes, maybe executable file name is part of the PDB hashing. (the only difference is executable names when comparing)

lld/test/MinGW/driver.test Outdated Show resolved Hide resolved
lld/test/MinGW/driver.test Show resolved Hide resolved
@mstorsjo
Copy link
Member

mstorsjo commented Nov 9, 2023

Oh, I found a bug that it generates .buildid when it shouldn't. It's complicated on how those flags interacting with each other. After the fix, the output binary is identical to before when not generating pdb and with /Brepro. With PDB, the PDB content hash in debug directory are different which also happens on lld without any changes, maybe executable file name is part of the PDB hashing. (the only difference is executable names when comparing)

Oh, good catches! I think the code in Writer.cpp looks more straightforward/understandable in this case now as well.

lld/COFF/Writer.cpp Outdated Show resolved Hide resolved
lld/test/MinGW/driver.test Outdated Show resolved Hide resolved
lld/test/MinGW/driver.test Outdated Show resolved Hide resolved
Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If nobody else wants to comment on this, I guess I can approve it.

(OTOH, if potential reviewers had a long weekend with a holiday, perhaps we can wait another day still before landing it?)

@ZequanWu
Copy link
Contributor Author

Reid is busy this week. Will ping him Friday for reviewing.

@ZequanWu
Copy link
Contributor Author

  1. I feel like it's not necessary to create a separate .buildid section at all when not using MinGW driver. Initially, I thought if I want to find build id at runtime, I need to somehow find the program header and locate .buildid. With a separate section, it's easier to find it. @rnk suggested adding a synthetic __lld_buildid symbol to lld-link pointing to the chunk, so it's easier to locate it. I have a local patch for it. With that, it doesn't matter if buildid is in a separate section or not. So, I think just always put it in .rdata under non-MinGW mode is enough.
  2. Maybe GUID is a better name for the flag option and variable (__lld_guid?), etc? Build id is a term commonly used in ELF.

@mstorsjo
Copy link
Member

  1. I feel like it's not necessary to create a separate .buildid section at all when not using MinGW driver. Initially, I thought if I want to find build id at runtime, I need to somehow find the program header and locate .buildid. With a separate section, it's easier to find it. @rnk suggested adding a synthetic __lld_buildid symbol to lld-link pointing to the chunk, so it's easier to locate it. I have a local patch for it. With that, it doesn't matter if buildid is in a separate section or not. So, I think just always put it in .rdata under non-MinGW mode is enough.

Hmm, if you're using a symbol, how are you locating the symbol later when inspecting things? MSVC style builds normally don't include a symbol table at all - do you build with debug info and resolve it from the PDB file somehow? Or do you use -debug:symtab?

@ZequanWu
Copy link
Contributor Author

  1. I feel like it's not necessary to create a separate .buildid section at all when not using MinGW driver. Initially, I thought if I want to find build id at runtime, I need to somehow find the program header and locate .buildid. With a separate section, it's easier to find it. @rnk suggested adding a synthetic __lld_buildid symbol to lld-link pointing to the chunk, so it's easier to locate it. I have a local patch for it. With that, it doesn't matter if buildid is in a separate section or not. So, I think just always put it in .rdata under non-MinGW mode is enough.

Hmm, if you're using a symbol, how are you locating the symbol later when inspecting things? MSVC style builds normally don't include a symbol table at all - do you build with debug info and resolve it from the PDB file somehow? Or do you use -debug:symtab?

By using symbol, I meant to let lld-link add a synthetic symbol __lld_buildid if /build-id is passed. That points the the chunk contains build id, just like __CTOR_LIST__ symbol

ctx.symtab.addAbsolute(mangle("__CTOR_LIST__"), 0);
.

At the source that references __lld_buildid, it can use extern "C" __declspec(selectany) uint8_t __lld_buildid[24]; to read the build id.
For tools like llvm-readobj, it is already able to read build id even if it's inside .rdata by looking through the program header.

@mstorsjo
Copy link
Member

At the source that references __lld_buildid, it can use extern "C" __declspec(selectany) uint8_t __lld_buildid[24]; to read the build id.

Ah, this was the bit I was missing. Yes, in that case it should be quite straightforward - I thought of how to read it from outside of the binary.

For tools like llvm-readobj, it is already able to read build id even if it's inside .rdata by looking through the program header.

Indeed, from outside of the process, the current approach should be fine.

@ZequanWu
Copy link
Contributor Author

Updated to not generate extra .buildid section under non-MinGW mode. Reflected on description.

@@ -1067,6 +1071,7 @@ void Writer::createMiscChunks() {
for (std::pair<COFF::DebugType, Chunk *> r : debugRecords) {
r.second->setAlignment(4);
debugInfoSec->addChunk(r.second);
debugDirectory->addRecord(r.first, r.second);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you no longer can write the build ID to two separate sections, you could revert these changes for the records array, to just keep a reference to the existing debugRecords, simplifying the patch a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Reverted changes on record vectors.

add("-build-id");
}
} else
add("-build-id");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for bringing up more changes to this bit here, but I realized that this change breaks one existing usage pattern slightly. In mstorsjo/llvm-mingw#352 I helped a user to the conclusion that if linking with -s or -S, lld no longer produces a build id automatically.

So for this else case here, when there's no --build-id argument present, we should probably do this:

} else { // no OPT_build_id present
  if (args.hasArg(OPT_strip_debug) || args.hasArg(OPT_strip_all))
    add("-build-id:no");
  else
    add("-build-id");
}

(Plus a corresponding matching testcase update.)
That would preserve that aspect of the existing behaviour.

Sorry for the extra work on this bit, I guess this is what we get from changing the behaviour of the lld-link interface. :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for your persistence!

Andarwinux referenced this pull request in curl/curl-for-win Dec 3, 2023
https://best.openssf.org/Compiler-Hardening-Guides/Compiler-Options-Hardening-Guide-for-C-and-C++.html

- use `-fstrict-flex-arrays=3` for llvm 16 and gcc 13
- use `-mbranch-protection=standard` for arm64
- use `-fstack-clash-protection` for x64/x86 with llvm, and all gcc
- use `-fcf-protection=full` for x64/x86
- use `-Wl,-z,nodlopen`
- set `-Wl,-z,noexecstack`, though it's the default according to earlier tests.

OpenSFF recommends `-fstack-protector-strong`. We were already using
`-fstack-protector-all` that supercedes it according to the documentation.

Also:
- Prepare for gcc 14 `-fhardened` option. Subject to future tests.

  https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fhardened

  With the advantage of automatic of `_FORTIFY_SOURCE=3` when glibc
  version allows it. It also uses -fPIE which still needs to be tested
  when gcc 14 is out, because enabling it now manually breaks almost
  all builds:
  https://github.com/curl/curl-for-win/actions/runs/7053671784/job/19201167850#step:3:6484
Copy link
Collaborator

@zmodem zmodem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me too, just some minor nits. And are there any docs somewhere that should be updated as well?

lld/COFF/Options.td Outdated Show resolved Hide resolved
@@ -196,6 +196,9 @@ defm guard_longjmp : B<"guard-longjmp",
"Do not enable Control Flow Guard long jump hardening">;
defm error_limit:
EqLong<"error-limit", "Maximum number of errors to emit before stopping (0 = no limit)">;
def build_id: J<"build-id=">, HelpText<"Generate build ID note (pass none to disable)">,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is one supposed to pass after = to enable it?

If we only support -build-id and -build-id=none, perhaps those should just be two flags instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to be consistent with elf's build-id options, though it doesn't matter. The error message "unsupported build id hashing: {} using default hashing." is more user friendly than unknown flags if we handle it inside driver.

lld/test/COFF/rsds.test Outdated Show resolved Hide resolved
@ZequanWu ZequanWu merged commit aaf3a8d into llvm:main Dec 5, 2023
3 of 4 checks passed
@ZequanWu ZequanWu deleted the lld-build-id branch December 5, 2023 19:59
@petrhosek
Copy link
Member

petrhosek commented Dec 5, 2023

Related to __lld_buildid discussion, I'd prefer using a pair of symbols: __lld_buildid_start and __lld_buildid_end, so users of these symbols don't need to make assumptions about the size. This would also let us change the size in the future (if needed) without breaking all users.

In the past, I proposed a similar change D76482 for ld.lld which generates a pair of synthetic symbols __build_id_start and __build_id_end. Ideally, we would revive and land that change, and also implement a similar change for ld64.lld so the code that's intended to support different file formats (like the compiler-rt profile runtime) can use the same logic to read the build ID rather than needing separate logic for each format.

@ZequanWu
Copy link
Contributor Author

ZequanWu commented Dec 5, 2023

Related to __lld_buildid discussion, I'd prefer using a pair of symbols: __lld_buildid_start and __lld_buildid_end, so users of these symbols don't need to make assumptions about the size. This would also let us change the size in the future (if needed) without breaking all users.

For COFF, I think it's okay to assume that build id size is a constant 16 bytes (or 20 bytes if we include the 4 byte age field). Many existing tools that makes uses of build id (or guid in the PDB term) already assume it's 16 bytes and use that to match binary and pdb.

ZequanWu added a commit that referenced this pull request Dec 14, 2023
After #71433, lld-link is able to always generate build id even when PDB
is not generated.

This adds the `__buildid` symbol to points to the start of 16 bytes guid
(which is after `RSDS`) and allows profile runtime to access it and dump
it to raw profile.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants