-
Notifications
You must be signed in to change notification settings - Fork 11.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LLD][COFF] Add -build-id flag to generate .buildid section. #71433
Conversation
@llvm/pr-subscribers-lld-coff @llvm/pr-subscribers-lld Author: Zequan Wu (ZequanWu) ChangesWith this change, lld-link generates
So, it will generate duplicated debug directories when Full diff: https://github.com/llvm/llvm-project/pull/71433.diff 6 Files Affected:
diff --git a/lld/COFF/Config.h b/lld/COFF/Config.h
index 1c338cc63fa87d2..4e4811357d26a46 100644
--- a/lld/COFF/Config.h
+++ b/lld/COFF/Config.h
@@ -317,6 +317,7 @@ struct Configuration {
bool writeCheckSum = false;
EmitKind emit = EmitKind::Obj;
bool allowDuplicateWeak = false;
+ bool buildID = false;
};
} // namespace lld::coff
diff --git a/lld/COFF/Driver.cpp b/lld/COFF/Driver.cpp
index 5613c2e6993a5af..0fa6fd6d34cf6a7 100644
--- a/lld/COFF/Driver.cpp
+++ b/lld/COFF/Driver.cpp
@@ -2332,6 +2332,11 @@ void LinkerDriver::linkerMain(ArrayRef<const char *> argsArr) {
}
}
+ // Generate build id hash in .buildid section when:
+ // 1. /build-id
+ // 2. /lldmingw + /debug and not generating pdb file.
+ config->buildID = args.hasArg(OPT_build_id) ||
+ (config->mingw && config->debug && !shouldCreatePDB);
// Set default image base if /base is not given.
if (config->imageBase == uint64_t(-1))
config->imageBase = getDefaultImageBase();
diff --git a/lld/COFF/Options.td b/lld/COFF/Options.td
index 977657a433dc581..cc7ce4fdbeea037 100644
--- a/lld/COFF/Options.td
+++ b/lld/COFF/Options.td
@@ -299,6 +299,8 @@ def : Flag<["--"], "time-trace">, Alias<time_trace_eq>,
def time_trace_granularity_eq: Joined<["--"], "time-trace-granularity=">,
HelpText<"Minimum time granularity (in microseconds) traced by time profiler">;
+def build_id: F<"build-id">, HelpText<"Generate build ID">;
+
// Flags for debugging
def lldmap : F<"lldmap">;
def lldmap_file : P_priv<"lldmap">;
diff --git a/lld/COFF/Writer.cpp b/lld/COFF/Writer.cpp
index 43d8e7c1d530859..dca148a62e9d171 100644
--- a/lld/COFF/Writer.cpp
+++ b/lld/COFF/Writer.cpp
@@ -1041,14 +1041,14 @@ void Writer::createMiscChunks() {
// Create Debug Information Chunks
OutputSection *debugInfoSec = config->mingw ? buildidSec : rdataSec;
- if (config->debug || config->repro || config->cetCompat) {
+ if (config->buildID || config->debug || config->repro || config->cetCompat) {
debugDirectory =
make<DebugDirectoryChunk>(ctx, debugRecords, config->repro);
debugDirectory->setAlignment(4);
debugInfoSec->addChunk(debugDirectory);
}
- if (config->debug) {
+ if (config->debug || config->buildID) {
// Make a CVDebugRecordChunk even when /DEBUG:CV is not specified. We
// output a PDB no matter what, and this chunk provides the only means of
// allowing a debugger to match a PDB and an executable. So we need it even
@@ -1069,6 +1069,16 @@ void Writer::createMiscChunks() {
debugInfoSec->addChunk(r.second);
}
+ // If .buildid section was not chosen and /build-id is given, create .buildid
+ // section.
+ if (config->buildID && debugInfoSec != buildidSec) {
+ buildidSec->addChunk(debugDirectory);
+ for (std::pair<COFF::DebugType, Chunk *> r : debugRecords) {
+ r.second->setAlignment(4);
+ buildidSec->addChunk(r.second);
+ }
+ }
+
// Create SEH table. x86-only.
if (config->safeSEH)
createSEHTable();
@@ -2028,7 +2038,7 @@ void Writer::writeBuildId() {
// PE contents.
Configuration *config = &ctx.config;
- if (config->debug) {
+ if (config->debug || config->buildID) {
assert(buildId && "BuildId is not set!");
// BuildId->BuildId was filled in when the PDB was written.
}
@@ -2043,16 +2053,14 @@ void Writer::writeBuildId() {
uint32_t timestamp = config->timestamp;
uint64_t hash = 0;
- bool generateSyntheticBuildId =
- config->mingw && config->debug && config->pdbPath.empty();
- if (config->repro || generateSyntheticBuildId)
+ if (config->repro || config->buildID)
hash = xxh3_64bits(outputFileData);
if (config->repro)
timestamp = static_cast<uint32_t>(hash);
- if (generateSyntheticBuildId) {
+ if (config->buildID) {
// For MinGW builds without a PDB file, we still generate a build id
// to allow associating a crash dump to the executable.
buildId->buildId->PDB70.CVSignature = OMF::Signature::PDB70;
diff --git a/lld/MinGW/Options.td b/lld/MinGW/Options.td
index fa4c4ecc75d6543..87b5c4fff68cef6 100644
--- a/lld/MinGW/Options.td
+++ b/lld/MinGW/Options.td
@@ -196,6 +196,7 @@ defm guard_longjmp : B<"guard-longjmp",
"Do not enable Control Flow Guard long jump hardening">;
defm error_limit:
EqLong<"error-limit", "Maximum number of errors to emit before stopping (0 = no limit)">;
+def build_id: F<"build-id">, HelpText<"Generate build ID">;
// Alias
def alias_Bdynamic_call_shared: Flag<["-"], "call_shared">, Alias<Bdynamic>;
@@ -213,7 +214,6 @@ def alias_undefined_u: JoinedOrSeparate<["-"], "u">, Alias<undefined>;
// Ignored options
def: Joined<["-"], "O">;
def: F<"as-needed">;
-def: F<"build-id">;
def: F<"disable-auto-image-base">;
def: F<"enable-auto-image-base">;
def: F<"end-group">;
diff --git a/lld/test/COFF/rsds.test b/lld/test/COFF/rsds.test
index 475249ca4056669..783ac08255c031b 100644
--- a/lld/test/COFF/rsds.test
+++ b/lld/test/COFF/rsds.test
@@ -22,9 +22,23 @@
# RUN: lld-link /Brepro /debug /dll /out:%t.dll /entry:DllMain %t.obj
# RUN: llvm-readobj --coff-debug-directory %t.dll | FileCheck --check-prefix REPRODEBUG %s
+# Generate .buildid section when /lldmingw + /debug:dwarf are given and not generating pdb file.
# RUN: rm -f %t.dll %t.pdb
# RUN: lld-link /lldmingw /debug:dwarf /dll /out:%t.dll /entry:DllMain %t.obj
-# RUN: llvm-readobj --coff-debug-directory %t.dll | FileCheck --check-prefix MINGW %s
+# RUN: llvm-readobj --coff-debug-directory %t.dll | FileCheck --check-prefix BUILDID %s
+
+# RUN: rm -f %t.dll %t.pdb
+# RUN: lld-link /lldmingw /debug:dwarf /build-id /dll /out:%t.dll /entry:DllMain %t.obj
+# RUN: llvm-readobj --coff-debug-directory %t.dll | FileCheck --check-prefix BUILDID %s
+
+# Generate .buildid section when /build-id is given.
+# RUN: rm -f %t.dll %t.pdb
+# RUN: lld-link /build-id /dll /out:%t.dll /entry:DllMain %t.obj
+# RUN: llvm-readobj --coff-debug-directory %t.dll | FileCheck --check-prefix BUILDID %s
+
+# RUN: rm -f %t.dll %t.pdb
+# RUN: lld-link /build-id /debug /dll /out:%t.dll /entry:DllMain %t.obj
+# RUN: llvm-readobj --coff-debug-directory %t.dll | FileCheck --check-prefix BUILDID %s
# CHECK: File: [[FILE:.*]].dll
# CHECK: DebugDirectory [
@@ -148,25 +162,25 @@
# REPRODEBUG: }
# REPRODEBUG: ]
-# MINGW: File: {{.*}}.dll
-# MINGW: DebugDirectory [
-# MINGW: DebugEntry {
-# MINGW: Characteristics: 0x0
-# MINGW: TimeDateStamp:
-# MINGW: MajorVersion: 0x0
-# MINGW: MinorVersion: 0x0
-# MINGW: Type: CodeView (0x2)
-# MINGW: SizeOfData: 0x{{[^0]}}
-# MINGW: AddressOfRawData: 0x{{[^0]}}
-# MINGW: PointerToRawData: 0x{{[^0]}}
-# MINGW: PDBInfo {
-# MINGW: PDBSignature: 0x53445352
-# MINGW: PDBGUID: [[GUID:\(([A-Za-z0-9]{2} ?){16}\)]]
-# MINGW: PDBAge: 1
-# MINGW: PDBFileName:
-# MINGW: }
-# MINGW: }
-# MINGW: ]
+# BUILDID: File: {{.*}}.dll
+# BUILDID: DebugDirectory [
+# BUILDID: DebugEntry {
+# BUILDID: Characteristics: 0x0
+# BUILDID: TimeDateStamp:
+# BUILDID: MajorVersion: 0x0
+# BUILDID: MinorVersion: 0x0
+# BUILDID: Type: CodeView (0x2)
+# BUILDID: SizeOfData: 0x{{[^0]}}
+# BUILDID: AddressOfRawData: 0x{{[^0]}}
+# BUILDID: PointerToRawData: 0x{{[^0]}}
+# BUILDID: PDBInfo {
+# BUILDID: PDBSignature: 0x53445352
+# BUILDID: PDBGUID: [[GUID:\(([A-Za-z0-9]{2} ?){16}\)]]
+# BUILDID: PDBAge: 1
+# BUILDID: PDBFileName:
+# BUILDID: }
+# BUILDID: }
+# BUILDID: ]
--- !COFF
header:
Machine: IMAGE_FILE_MACHINE_I386
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly reasonable to me, thanks! A couple comments below.
Also, you've got a typo in the subject, "genrate".
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MinGW side parts LGTM, with a few minor comments on the test.
For the COFF side (sorry for forgetting to recheck that one earlier), I think the changes are ok, but I'd appreciate if someone else who is familiar with those parts wants to have a look at those bits as well. If not I guess I can give it a second closer look.
There's some restructurings for how the different hashes/structures are made (I presume since you're not reusing the chunks but creating two separate sets, if needed) - have you checked that the output is binary identical to before (where applicable)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some restructurings for how the different hashes/structures are made (I presume since you're not reusing the chunks but creating two separate sets, if needed) - have you checked that the output is binary identical to before (where applicable)?
Oh, I found a bug that it generates .buildid when it shouldn't. It's complicated on how those flags interacting with each other. After the fix, the output binary is identical to before when not generating pdb and with /Brepro
. With PDB, the PDB content hash in debug directory are different which also happens on lld without any changes, maybe executable file name is part of the PDB hashing. (the only difference is executable names when comparing)
Oh, good catches! I think the code in Writer.cpp looks more straightforward/understandable in this case now as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If nobody else wants to comment on this, I guess I can approve it.
(OTOH, if potential reviewers had a long weekend with a holiday, perhaps we can wait another day still before landing it?)
Reid is busy this week. Will ping him Friday for reviewing. |
|
Hmm, if you're using a symbol, how are you locating the symbol later when inspecting things? MSVC style builds normally don't include a symbol table at all - do you build with debug info and resolve it from the PDB file somehow? Or do you use |
By using symbol, I meant to let lld-link add a synthetic symbol llvm-project/lld/COFF/Driver.cpp Line 2373 in ebb8ffd
At the source that references |
Ah, this was the bit I was missing. Yes, in that case it should be quite straightforward - I thought of how to read it from outside of the binary.
Indeed, from outside of the process, the current approach should be fine. |
Updated to not generate extra |
lld/COFF/Writer.cpp
Outdated
@@ -1067,6 +1071,7 @@ void Writer::createMiscChunks() { | |||
for (std::pair<COFF::DebugType, Chunk *> r : debugRecords) { | |||
r.second->setAlignment(4); | |||
debugInfoSec->addChunk(r.second); | |||
debugDirectory->addRecord(r.first, r.second); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you no longer can write the build ID to two separate sections, you could revert these changes for the records array, to just keep a reference to the existing debugRecords
, simplifying the patch a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Reverted changes on record vectors.
lld/MinGW/Driver.cpp
Outdated
add("-build-id"); | ||
} | ||
} else | ||
add("-build-id"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for bringing up more changes to this bit here, but I realized that this change breaks one existing usage pattern slightly. In mstorsjo/llvm-mingw#352 I helped a user to the conclusion that if linking with -s
or -S
, lld no longer produces a build id automatically.
So for this else case here, when there's no --build-id
argument present, we should probably do this:
} else { // no OPT_build_id present
if (args.hasArg(OPT_strip_debug) || args.hasArg(OPT_strip_all))
add("-build-id:no");
else
add("-build-id");
}
(Plus a corresponding matching testcase update.)
That would preserve that aspect of the existing behaviour.
Sorry for the extra work on this bit, I guess this is what we get from changing the behaviour of the lld-link interface. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for your persistence!
https://best.openssf.org/Compiler-Hardening-Guides/Compiler-Options-Hardening-Guide-for-C-and-C++.html - use `-fstrict-flex-arrays=3` for llvm 16 and gcc 13 - use `-mbranch-protection=standard` for arm64 - use `-fstack-clash-protection` for x64/x86 with llvm, and all gcc - use `-fcf-protection=full` for x64/x86 - use `-Wl,-z,nodlopen` - set `-Wl,-z,noexecstack`, though it's the default according to earlier tests. OpenSFF recommends `-fstack-protector-strong`. We were already using `-fstack-protector-all` that supercedes it according to the documentation. Also: - Prepare for gcc 14 `-fhardened` option. Subject to future tests. https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fhardened With the advantage of automatic of `_FORTIFY_SOURCE=3` when glibc version allows it. It also uses -fPIE which still needs to be tested when gcc 14 is out, because enabling it now manually breaks almost all builds: https://github.com/curl/curl-for-win/actions/runs/7053671784/job/19201167850#step:3:6484
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me too, just some minor nits. And are there any docs somewhere that should be updated as well?
@@ -196,6 +196,9 @@ defm guard_longjmp : B<"guard-longjmp", | |||
"Do not enable Control Flow Guard long jump hardening">; | |||
defm error_limit: | |||
EqLong<"error-limit", "Maximum number of errors to emit before stopping (0 = no limit)">; | |||
def build_id: J<"build-id=">, HelpText<"Generate build ID note (pass none to disable)">, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is one supposed to pass after =
to enable it?
If we only support -build-id
and -build-id=none
, perhaps those should just be two flags instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to be consistent with elf's build-id options, though it doesn't matter. The error message "unsupported build id hashing: {} using default hashing." is more user friendly than unknown flags if we handle it inside driver.
Related to In the past, I proposed a similar change D76482 for |
For COFF, I think it's okay to assume that build id size is a constant 16 bytes (or 20 bytes if we include the 4 byte age field). Many existing tools that makes uses of build id (or guid in the PDB term) already assume it's 16 bytes and use that to match binary and pdb. |
After #71433, lld-link is able to always generate build id even when PDB is not generated. This adds the `__buildid` symbol to points to the start of 16 bytes guid (which is after `RSDS`) and allows profile runtime to access it and dump it to raw profile.
RFC
Before, lld-link only generate the debug directory containing guid when generating PDB with the hash of PDB content.
With this change, lld-link can generate the debug directory when only
/build-id
is given:/build-id
is ignored. Same behaviour as before..rdata
section..buildid
section.