Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] [LLD] [COFF] Restructure /debug: option handling, allow controlling features separately #74820

Closed
wants to merge 5 commits into from

Conversation

mstorsjo
Copy link
Member

@mstorsjo mstorsjo commented Dec 8, 2023

This is an RFC for a restructuring of the handling of the /debug: options in LLD/COFF. This consists of a series of small bitesize refactoring commits, to clearly show the procession. For actual review, if there's agreement on the general direction, I probably would split them up into individual PRs for all the steps.

The motive of the refactoring, is to allow controlling individual debug info features. Currently, if producing a PDB, one passes /debug to lld-link. However, if any of the input files also contain DWARF debug info, the output executable will also contain those bits of DWARF - /debug includes any DWARF debug info, which is omitted if /debug is left out. It would be good to be able to control things more granularly, like create a PDB, but skip any DWARF, but still retain a symbol table.

We do have a bunch of /debug:<foo> options, like /debug:dwarf and /debug:symtab, plus a bunch of variants like /debug:ghash and /debug:noghash.

(For reference, MS link.exe has none of these options, it only has /debug, and it passes DWARF sections into the output regardless of whther the /debug option was set.)

The main inflexibility with our current /debug: option handling, is that it is treated as an enum - we inspect the last option provided, and it selects one behaviour out of a number profiles. But if one want to be able to control all of these aspects individually, we're in a combinatorial nightmare, we'd need almost 4x4 potential combinations - PDB on/off, GHash on/off, DWARF on/off, symtab on/off, etc.

This branch tries to refactor the handling of the /debug: options so that instead of just inspecting the last, we iterate over all of them, progressively applying the settings from each of them. This mostly tries to keep the effect of the last flag dictating the chosen behaviour, like before, but e.g. /debug:dwarf or /debug:symtab doesn't disable PDB writing, even if there was an earlier /debug:full parsed before. On top of these, I add new flags /debug:nodwarf and /debug:nosymtab. This allows doing things like /debug:full,nodwarf,symtab to achieve the particular combination I mentioned before.

This allows handling the -s and -S options properly in the MinGW linker frontend, to omit the DWARF debug info (but possibly keep a symbol table) while creating a PDB.

… NFC.

This shouldn't have any user visible effect, but makes the logic
within the linker implementation more explicit.

Note how DWARF debug info sections were retained even if enabling
a link with PDB info only; that behaviour is preserved for now.
Don't treat the options as unique enum items, but more as flags that
can be composed, like the /opt: options.
This is almost an NFC.

Most option handling is like it was before; the last /debug: option
takes effect, however options like /debug:dwarf or /debug:symtab
don't disable writing a PDB.
These allow tweaking what gets implied by /debug and /debug:dwarf.
This allows avoiding including some stray DWARF sections (e.g. from
toolchain provided files), when writing a PDB file.

While that probably could be considered reasonable default behaviour,
PDB writing and including DWARF sections are two entirely orthogonal
concepts, and GNU ld (which can generate PDB files these days)
does include DWARF unless -S/-s is passed, when creating a PDB.
@llvmbot
Copy link
Collaborator

llvmbot commented Dec 8, 2023

@llvm/pr-subscribers-lld
@llvm/pr-subscribers-platform-windows

@llvm/pr-subscribers-lld-coff

Author: Martin Storsjö (mstorsjo)

Changes

This is an RFC for a restructuring of the handling of the /debug: options in LLD/COFF. This consists of a series of small bitesize refactoring commits, to clearly show the procession. For actual review, if there's agreement on the general direction, I probably would split them up into individual PRs for all the steps.

The motive of the refactoring, is to allow controlling individual debug info features. Currently, if producing a PDB, one passes /debug to lld-link. However, if any of the input files also contain DWARF debug info, the output executable will also contain those bits of DWARF - /debug includes any DWARF debug info, which is omitted if /debug is left out. It would be good to be able to control things more granularly, like create a PDB, but skip any DWARF, but still retain a symbol table.

We do have a bunch of /debug:&lt;foo&gt; options, like /debug:dwarf and /debug:symtab, plus a bunch of variants like /debug:ghash and /debug:noghash.

(For reference, MS link.exe has none of these options, it only has /debug, and it passes DWARF sections into the output regardless of whther the /debug option was set.)

The main inflexibility with our current /debug: option handling, is that it is treated as an enum - we inspect the last option provided, and it selects one behaviour out of a number profiles. But if one want to be able to control all of these aspects individually, we're in a combinatorial nightmare, we'd need almost 4x4 potential combinations - PDB on/off, GHash on/off, DWARF on/off, symtab on/off, etc.

This branch tries to refactor the handling of the /debug: options so that instead of just inspecting the last, we iterate over all of them, progressively applying the settings from each of them. This mostly tries to keep the effect of the last flag dictating the chosen behaviour, like before, but e.g. /debug:dwarf or /debug:symtab doesn't disable PDB writing, even if there was an earlier /debug:full parsed before. On top of these, I add new flags /debug:nodwarf and /debug:nosymtab. This allows doing things like /debug:full,nodwarf,symtab to achieve the particular combination I mentioned before.

This allows handling the -s and -S options properly in the MinGW linker frontend, to omit the DWARF debug info (but possibly keep a symbol table) while creating a PDB.


Full diff: https://github.com/llvm/llvm-project/pull/74820.diff

8 Files Affected:

  • (modified) lld/COFF/Config.h (+2-2)
  • (modified) lld/COFF/Driver.cpp (+67-57)
  • (modified) lld/COFF/InputFiles.cpp (+2-2)
  • (modified) lld/COFF/Writer.cpp (+1-1)
  • (modified) lld/MinGW/Driver.cpp (+5)
  • (modified) lld/test/COFF/sort-debug.test (+4)
  • (modified) lld/test/COFF/symtab.test (+4)
  • (modified) lld/test/MinGW/driver.test (+7)
diff --git a/lld/COFF/Config.h b/lld/COFF/Config.h
index 24126f635a06f..48c48cefe91d8 100644
--- a/lld/COFF/Config.h
+++ b/lld/COFF/Config.h
@@ -130,9 +130,9 @@ struct Configuration {
   bool forceMultipleRes = false;
   bool forceUnresolved = false;
   bool debug = false;
-  bool debugDwarf = false;
+  bool includeDwarfChunks = false;
   bool debugGHashes = false;
-  bool debugSymtab = false;
+  bool writeSymtab = false;
   bool driver = false;
   bool driverUponly = false;
   bool driverWdm = false;
diff --git a/lld/COFF/Driver.cpp b/lld/COFF/Driver.cpp
index 99c1a60735adc..ab14817e635dd 100644
--- a/lld/COFF/Driver.cpp
+++ b/lld/COFF/Driver.cpp
@@ -858,46 +858,6 @@ static std::string createResponseFile(const opt::InputArgList &args,
   return std::string(data.str());
 }
 
-enum class DebugKind {
-  Unknown,
-  None,
-  Full,
-  FastLink,
-  GHash,
-  NoGHash,
-  Dwarf,
-  Symtab
-};
-
-static DebugKind parseDebugKind(const opt::InputArgList &args) {
-  auto *a = args.getLastArg(OPT_debug, OPT_debug_opt);
-  if (!a)
-    return DebugKind::None;
-  if (a->getNumValues() == 0)
-    return DebugKind::Full;
-
-  DebugKind debug = StringSwitch<DebugKind>(a->getValue())
-                        .CaseLower("none", DebugKind::None)
-                        .CaseLower("full", DebugKind::Full)
-                        .CaseLower("fastlink", DebugKind::FastLink)
-                        // LLD extensions
-                        .CaseLower("ghash", DebugKind::GHash)
-                        .CaseLower("noghash", DebugKind::NoGHash)
-                        .CaseLower("dwarf", DebugKind::Dwarf)
-                        .CaseLower("symtab", DebugKind::Symtab)
-                        .Default(DebugKind::Unknown);
-
-  if (debug == DebugKind::FastLink) {
-    warn("/debug:fastlink unsupported; using /debug:full");
-    return DebugKind::Full;
-  }
-  if (debug == DebugKind::Unknown) {
-    error("/debug: unknown option: " + Twine(a->getValue()));
-    return DebugKind::None;
-  }
-  return debug;
-}
-
 static unsigned parseDebugTypes(const opt::InputArgList &args) {
   unsigned debugTypes = static_cast<unsigned>(DebugType::None);
 
@@ -1647,12 +1607,72 @@ void LinkerDriver::linkerMain(ArrayRef<const char *> argsArr) {
   if (args.hasArg(OPT_force, OPT_force_multipleres))
     config->forceMultipleRes = true;
 
+  // Don't warn about long section names, such as .debug_info, for mingw (or
+  // when -debug:dwarf is requested, handled below).
+  if (config->mingw)
+    config->warnLongSectionNames = false;
+
+  bool doGC = true;
+
   // Handle /debug
-  DebugKind debug = parseDebugKind(args);
-  if (debug == DebugKind::Full || debug == DebugKind::Dwarf ||
-      debug == DebugKind::GHash || debug == DebugKind::NoGHash) {
-    config->debug = true;
-    config->incremental = true;
+  bool shouldCreatePDB = false;
+  for (auto *arg : args.filtered(OPT_debug, OPT_debug_opt)) {
+    std::string str;
+    if (arg->getOption().getID() == OPT_debug)
+      str = "full";
+    else
+      str = StringRef(arg->getValue()).lower();
+    SmallVector<StringRef, 1> vec;
+    StringRef(str).split(vec, ',');
+    for (StringRef s : vec) {
+      if (s == "fastlink") {
+        warn("/debug:fastlink unsupported; using /debug:full");
+        s = "full";
+      }
+      if (s == "none") {
+        config->debug = false;
+        config->incremental = false;
+        config->includeDwarfChunks = false;
+        config->debugGHashes = false;
+        shouldCreatePDB = false;
+        doGC = true;
+      } else if (s == "full") {
+        config->debug = true;
+        config->incremental = true;
+        config->includeDwarfChunks = true;
+        config->debugGHashes = true;
+        shouldCreatePDB = true;
+        doGC = false;
+      } else if (s == "ghash") { // LLD extensions
+        config->debug = true;
+        config->incremental = true;
+        config->includeDwarfChunks = true;
+        config->debugGHashes = true;
+        shouldCreatePDB = true;
+        doGC = false;
+      } else if (s == "noghash") {
+        config->debug = true;
+        config->incremental = true;
+        config->includeDwarfChunks = true;
+        shouldCreatePDB = true;
+        doGC = false;
+      } else if (s == "dwarf") {
+        config->debug = true;
+        config->incremental = true;
+        config->includeDwarfChunks = true;
+        config->writeSymtab = true;
+        config->warnLongSectionNames = false;
+      } else if (s == "nodwarf") {
+        config->includeDwarfChunks = false;
+      } else if (s == "symtab") {
+        doGC = false;
+        config->writeSymtab = true;
+      } else if (s == "nosymtab") {
+        config->writeSymtab = false;
+      } else {
+        error("/debug: unknown option: " + Twine(arg->getValue()));
+      }
+    }
   }
 
   // Handle /demangle
@@ -1672,9 +1692,6 @@ void LinkerDriver::linkerMain(ArrayRef<const char *> argsArr) {
       config->driverUponly || config->driverWdm || args.hasArg(OPT_driver);
 
   // Handle /pdb
-  bool shouldCreatePDB =
-      (debug == DebugKind::Full || debug == DebugKind::GHash ||
-       debug == DebugKind::NoGHash);
   if (shouldCreatePDB) {
     if (auto *arg = args.getLastArg(OPT_pdb))
       config->pdbPath = arg->getValue();
@@ -1837,8 +1854,9 @@ void LinkerDriver::linkerMain(ArrayRef<const char *> argsArr) {
 
   config->noimplib = args.hasArg(OPT_noimplib);
 
+  if (args.hasArg(OPT_profile))
+    doGC = true;
   // Handle /opt.
-  bool doGC = debug == DebugKind::None || args.hasArg(OPT_profile);
   std::optional<ICFLevel> icfLevel;
   if (args.hasArg(OPT_profile))
     icfLevel = ICFLevel::None;
@@ -2028,9 +2046,6 @@ void LinkerDriver::linkerMain(ArrayRef<const char *> argsArr) {
     parseSwaprun(arg->getValue());
   config->terminalServerAware =
       !config->dll && args.hasFlag(OPT_tsaware, OPT_tsaware_no, true);
-  config->debugDwarf = debug == DebugKind::Dwarf;
-  config->debugGHashes = debug == DebugKind::GHash || debug == DebugKind::Full;
-  config->debugSymtab = debug == DebugKind::Symtab;
   config->autoImport =
       args.hasFlag(OPT_auto_import, OPT_auto_import_no, config->mingw);
   config->pseudoRelocs = args.hasFlag(
@@ -2047,11 +2062,6 @@ void LinkerDriver::linkerMain(ArrayRef<const char *> argsArr) {
   if (args.hasFlag(OPT_inferasanlibs, OPT_inferasanlibs_no, false))
     warn("ignoring '/inferasanlibs', this flag is not supported");
 
-  // Don't warn about long section names, such as .debug_info, for mingw or
-  // when -debug:dwarf is requested.
-  if (config->mingw || config->debugDwarf)
-    config->warnLongSectionNames = false;
-
   if (config->incremental && args.hasArg(OPT_profile)) {
     warn("ignoring '/incremental' due to '/profile' specification");
     config->incremental = false;
diff --git a/lld/COFF/InputFiles.cpp b/lld/COFF/InputFiles.cpp
index dd2e1419bb10a..cd744800cb0de 100644
--- a/lld/COFF/InputFiles.cpp
+++ b/lld/COFF/InputFiles.cpp
@@ -236,8 +236,8 @@ SectionChunk *ObjFile::readSection(uint32_t sectionNumber,
   // CodeView needs linker support. We need to interpret debug info,
   // and then write it to a separate .pdb file.
 
-  // Ignore DWARF debug info unless /debug is given.
-  if (!ctx.config.debug && name.starts_with(".debug_"))
+  // Ignore DWARF debug info unless requested to be included.
+  if (!ctx.config.includeDwarfChunks && name.starts_with(".debug_"))
     return nullptr;
 
   if (sec->Characteristics & llvm::COFF::IMAGE_SCN_LNK_REMOVE)
diff --git a/lld/COFF/Writer.cpp b/lld/COFF/Writer.cpp
index 7b1ff8071e2e3..ec5d10ed99032 100644
--- a/lld/COFF/Writer.cpp
+++ b/lld/COFF/Writer.cpp
@@ -1376,7 +1376,7 @@ void Writer::createSymbolAndStringTable() {
     sec->setStringTableOff(addEntryToStringTable(sec->name));
   }
 
-  if (ctx.config.debugDwarf || ctx.config.debugSymtab) {
+  if (ctx.config.writeSymtab) {
     for (ObjFile *file : ctx.objFileInstances) {
       for (Symbol *b : file->getSymbols()) {
         auto *d = dyn_cast_or_null<Defined>(b);
diff --git a/lld/MinGW/Driver.cpp b/lld/MinGW/Driver.cpp
index d22b617cf2f01..94f0ae7993e62 100644
--- a/lld/MinGW/Driver.cpp
+++ b/lld/MinGW/Driver.cpp
@@ -297,6 +297,11 @@ bool link(ArrayRef<const char *> argsArr, llvm::raw_ostream &stdoutOS,
     StringRef v = a->getValue();
     if (!v.empty())
       add("-pdb:" + v);
+    if (args.hasArg(OPT_strip_all)) {
+      add("-debug:nodwarf,nosymtab");
+    } else if (args.hasArg(OPT_strip_debug)) {
+      add("-debug:nodwarf,symtab");
+    }
   } else if (args.hasArg(OPT_strip_debug)) {
     add("-debug:symtab");
   } else if (!args.hasArg(OPT_strip_all)) {
diff --git a/lld/test/COFF/sort-debug.test b/lld/test/COFF/sort-debug.test
index bbe2ecd0efd8d..30cad39466548 100644
--- a/lld/test/COFF/sort-debug.test
+++ b/lld/test/COFF/sort-debug.test
@@ -7,6 +7,10 @@
 # RUN: llvm-readobj --sections %t.exe | FileCheck -check-prefix=NODEBUG %s
 # RUN: lld-link /debug:symtab /out:%t.exe /entry:main %t.obj
 # RUN: llvm-readobj --sections %t.exe | FileCheck -check-prefix=NODEBUG %s
+# RUN: lld-link /debug:full,nodwarf /out:%t.exe /entry:main %t.obj
+# RUN: llvm-readobj --sections %t.exe | FileCheck -check-prefix=NODEBUG %s
+# RUN: lld-link /debug:full /debug:nodwarf /out:%t.exe /entry:main %t.obj
+# RUN: llvm-readobj --sections %t.exe | FileCheck -check-prefix=NODEBUG %s
 
 # CHECK: Name: .text
 # CHECK: Name: .reloc
diff --git a/lld/test/COFF/symtab.test b/lld/test/COFF/symtab.test
index 48c749957a422..45e8ed39737a4 100644
--- a/lld/test/COFF/symtab.test
+++ b/lld/test/COFF/symtab.test
@@ -5,9 +5,13 @@
 # RUN: llvm-readobj --symbols %t.exe | FileCheck %s
 # RUN: lld-link /debug:symtab /opt:noref /out:%t.exe /entry:main %t.obj %p/Inputs/std64.lib
 # RUN: llvm-readobj --symbols %t.exe | FileCheck %s
+# RUN: lld-link /debug:full,symtab /opt:noref /out:%t.exe /entry:main %t.obj %p/Inputs/std64.lib
+# RUN: llvm-readobj --symbols %t.exe | FileCheck %s
 
 # RUN: lld-link /debug /out:%t.exe /entry:main %t.obj %p/Inputs/std64.lib
 # RUN: llvm-readobj --symbols %t.exe | FileCheck -check-prefix=NO %s
+# RUN: lld-link /debug:dwarf,nosymtab /out:%t.exe /entry:main %t.obj %p/Inputs/std64.lib
+# RUN: llvm-readobj --symbols %t.exe | FileCheck -check-prefix=NO %s
 
 # CHECK:      Symbols [
 # CHECK-NEXT:   Symbol {
diff --git a/lld/test/MinGW/driver.test b/lld/test/MinGW/driver.test
index d08c64258be89..74a21f20fc28f 100644
--- a/lld/test/MinGW/driver.test
+++ b/lld/test/MinGW/driver.test
@@ -143,6 +143,13 @@ PDB-NOT: -debug:dwarf
 RUN: ld.lld -### -m i386pep foo.o -pdb= 2>&1 | FileCheck -check-prefix PDB-DEFAULT %s
 PDB-DEFAULT: -debug
 PDB-DEFAULT-NOT: -pdb:{{.*}}
+PDB-DEFAULT-NOT: -debug:
+
+RUN: ld.lld -### -m i386pep foo.o -pdb= -S 2>&1 | FileCheck -check-prefix PDB-NODWARF %s
+PDB-NODWARF: -debug -debug:nodwarf,symtab
+
+RUN: ld.lld -### -m i386pep foo.o -pdb= -s 2>&1 | FileCheck -check-prefix PDB-NODWARF-NOSYMTAB %s
+PDB-NODWARF-NOSYMTAB: -debug -debug:nodwarf,nosymtab
 
 RUN: ld.lld -### -m i386pep foo.o --large-address-aware 2>&1 | FileCheck -check-prefix LARGE-ADDRESS-AWARE %s
 LARGE-ADDRESS-AWARE: -largeaddressaware

.CaseLower("full", DebugKind::Full)
.CaseLower("fastlink", DebugKind::FastLink)
// LLD extensions
.CaseLower("ghash", DebugKind::GHash)
Copy link
Member

@aganea aganea Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all options like 'full', 'fastlink', 'ghash', 'noghash' should just act in the same way and enable full debug with GHASH merging. We have GHASH now for several years now and it works well, I think we could disable selection of the traditional, non-multithreaded, type merging. Maybe warn on 'noghash' like we do for 'fastlink' and say it's no longer supported. +@rnk

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aganea I guess that sounds reasonable, although I'd prefer not to take on that kind of a change in behaviour as part of this patchset, it's tricky enough as it is (even if I guess it would make the code a bit simpler in some places) - I'd leave that to someone more familiar with PDB/ghash overall.

Copy link
Contributor

@ZequanWu ZequanWu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on it. This makes it much more readable and easier reason about the interaction of different debug flags. LTGM.

Copy link
Collaborator

@tru tru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a nice cleanup! Makes the code much more readable!

bool debugGHashes = false;
bool debugSymtab = false;
bool writeSymtab = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't writeDebugSymtab a better name here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dunno - the symtab itself is entirely separate from debug info (even if one can't really use it for much anything else than help for inspecting things and debugging).

@@ -130,9 +130,9 @@ struct Configuration {
bool forceMultipleRes = false;
bool forceUnresolved = false;
bool debug = false;
bool debugDwarf = false;
bool includeDwarfChunks = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While discarding DWARF sections (.debug_*) was and is the default for lld-link, it's not the default for mingw, is it? I think the "obvious", non-format-aware thing for the linker to do is to treat .debug_* sections as normal sections.

Perhaps it would be more straightforward to invert the sense of the boolean, rename this to discardDwarfChunks, default it to false, and have the driver set it to true in non-mingw mode to preserve the existing behavior of lld-link. And then the question becomes, is that even the right behavior anymore? I think this default existed to defend against incorrectly emitted DWARF, but I'm not sure we really need it. I believe MSVC links debug info normally, although it truncates the section names.

Feel free to proceed with incremental patches, this is just an idea. If we can make these changes, then we can remove many of the includeDwarfChunks flag updates when parsing /debug:.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, MSVC link.exe just includes any .debug_whatever sections as is regardless of /debug.

Indeed, for mingw mode, the default is to include any .debug_* sections, but it can be overridden if linking with -S or -s (--strip-debug or --strip-all). But I'm not really keen on changing this behaviour here on the lld-link level, this is already handled quite nicely by the mingw frontend passing in /debug:dwarf or /debug:symtab as needed.

I don't think we gain anything really from changing the default of includeDwarfChunks to true for mingw mode though. If it would be good for clarity, we could of course invert the meaning of the variable from includeDwarfChunks to discardDwarfChunks, but that would just be an internal variable rename and negation.

In the end, from the mingw frontend, we need to be able to flag whether to include or dicard dwarf debug info, separately from whether to make a PDB.

@mstorsjo
Copy link
Member Author

Closing this one; these were reviewed and merged separately in #75172, #75175, #75178, #75180 and #75181.

@mstorsjo mstorsjo closed this Dec 15, 2023
@mstorsjo mstorsjo deleted the lld-debug-type branch December 15, 2023 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants