Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][lld] Add lld warning for checking discarded sections via COMDAT #74151

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

PiJoules
Copy link
Contributor

@PiJoules PiJoules commented Dec 1, 2023

(Just an experiment, not sure if this is worth landing.)

The warning asserts that all sections that would be discarded from the rules for comdat groups have the same contents and sections as the ones retained in the final binary. Specifically, for a given comdat group signature, this checks:

  • The number of sections in that group is the same for all comdat groups with that signature.
  • The names of all sections in that group are the same for all other comdat groups with that signature.
  • The contents of all sections with the same name in all groups for that signature are the same.

The idea here was to detect ODR violations at the link-level. On rare occasions we've run into instances where completely different implementations for a function or data structure win at link time which this might help catch. Note all instances caught by this may actually be indicative of bugs. Due to interprocedural optimizations, it can be entirely possible for the same function in different object files to have slightly different assembly. This wouldn't be practical for relocation sections as well. This comes with an exclude list for disallowing checks on those sections.

Practically, the most I can see this being used for is checking that comdat groups accross all object files have the same named sections in them, and that data sections in those groups have the same contents.

(Just an experiment, not sure if this is worth landing.)

The warning asserts that all sections that would be discarded from the
rules for comdat groups have the same contents and sections as the ones
retained in the final binary. Specifically, for a given comdat group
signature, this checks:
- The number of sections in that group is the same for all comdat groups
  with that signature.
- The names of all sections in that group are the same for all other
  comdat groups with that signature.
- The contents of all sections with the same name in all groups for that
  signature are the same.

The idea here was to detect ODR violations at the link-level. On rare
occasions we've run into instances where completely different
implementations for a function or data structure win at link time which
this might help catch. Note all instances caught by this may actually be
indicative of bugs. Due to interprocedural optimizations, it can be
entirely possible for the same function in different object files to
have slightly different assembly. This wouldn't be practical for
relocation sections as well. This comes with an exclude list for
disallowing checks on those sections.

Practically, the most I can see this being used for is checking that
comdat groups accross all object files have the same named sections in
them, and that data sections in those groups have the same contents.
@llvmbot
Copy link
Collaborator

llvmbot commented Dec 1, 2023

@llvm/pr-subscribers-lld-elf

Author: None (PiJoules)

Changes

(Just an experiment, not sure if this is worth landing.)

The warning asserts that all sections that would be discarded from the rules for comdat groups have the same contents and sections as the ones retained in the final binary. Specifically, for a given comdat group signature, this checks:

  • The number of sections in that group is the same for all comdat groups with that signature.
  • The names of all sections in that group are the same for all other comdat groups with that signature.
  • The contents of all sections with the same name in all groups for that signature are the same.

The idea here was to detect ODR violations at the link-level. On rare occasions we've run into instances where completely different implementations for a function or data structure win at link time which this might help catch. Note all instances caught by this may actually be indicative of bugs. Due to interprocedural optimizations, it can be entirely possible for the same function in different object files to have slightly different assembly. This wouldn't be practical for relocation sections as well. This comes with an exclude list for disallowing checks on those sections.

Practically, the most I can see this being used for is checking that comdat groups accross all object files have the same named sections in them, and that data sections in those groups have the same contents.


Full diff: https://github.com/llvm/llvm-project/pull/74151.diff

13 Files Affected:

  • (modified) lld/ELF/Config.h (+3)
  • (modified) lld/ELF/Driver.cpp (+14)
  • (modified) lld/ELF/InputFiles.cpp (+87-4)
  • (modified) lld/ELF/InputFiles.h (+4)
  • (modified) lld/ELF/Options.td (+9)
  • (modified) lld/ELF/SymbolTable.h (+10)
  • (added) lld/test/ELF/mismatch-sections-in-comdat-group-warning/Inputs/func.s (+4)
  • (added) lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-group-sizes.s (+17)
  • (added) lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-section-contents.s (+12)
  • (added) lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-section-names.s (+12)
  • (added) lld/test/ELF/mismatch-sections-in-comdat-group-warning/disabled-warning.s (+13)
  • (added) lld/test/ELF/mismatch-sections-in-comdat-group-warning/ignore-section.s (+9)
  • (added) lld/test/ELF/mismatch-sections-in-comdat-group-warning/no-warning.s (+11)
diff --git a/lld/ELF/Config.h b/lld/ELF/Config.h
index 56229334f9a44ae..ed0aacb8ae755e3 100644
--- a/lld/ELF/Config.h
+++ b/lld/ELF/Config.h
@@ -295,6 +295,9 @@ struct Config {
   bool useAndroidRelrTags = false;
   bool warnBackrefs;
   llvm::SmallVector<llvm::GlobPattern, 0> warnBackrefsExclude;
+  bool warnMismatchSectionsInComdatGroups;
+  llvm::SmallVector<llvm::GlobPattern, 0>
+      warnMismatchSectionsInComdatGroupsExclude;
   bool warnCommon;
   bool warnMissingEntry;
   bool warnSymbolOrdering;
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index 6bef09eeca015aa..0042ad555d43925 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -1437,6 +1437,9 @@ static void readConfigs(opt::InputArgList &args) {
       OPT_use_android_relr_tags, OPT_no_use_android_relr_tags, false);
   config->warnBackrefs =
       args.hasFlag(OPT_warn_backrefs, OPT_no_warn_backrefs, false);
+  config->warnMismatchSectionsInComdatGroups =
+      args.hasFlag(OPT_warn_mismatch_sections_in_comdat_groups,
+                   OPT_no_warn_mismatch_sections_in_comdat_groups, false);
   config->warnCommon = args.hasFlag(OPT_warn_common, OPT_no_warn_common, false);
   config->warnSymbolOrdering =
       args.hasFlag(OPT_warn_symbol_ordering, OPT_no_warn_symbol_ordering, true);
@@ -1709,6 +1712,17 @@ static void readConfigs(opt::InputArgList &args) {
             pattern);
   }
 
+  for (opt::Arg *arg :
+       args.filtered(OPT_warn_mismatch_sections_in_comdat_groups_exclude)) {
+    StringRef pattern(arg->getValue());
+    if (Expected<GlobPattern> pat = GlobPattern::create(pattern))
+      config->warnMismatchSectionsInComdatGroupsExclude.push_back(
+          std::move(*pat));
+    else
+      error(arg->getSpelling() + ": " + toString(pat.takeError()) + ": " +
+            pattern);
+  }
+
   // For -no-pie and -pie, --export-dynamic-symbol specifies defined symbols
   // which should be exported. For -shared, references to matched non-local
   // STV_DEFAULT symbols are not bound to definitions within the shared object,
diff --git a/lld/ELF/InputFiles.cpp b/lld/ELF/InputFiles.cpp
index 06a3d565deb765e..0cde39ef1fad655 100644
--- a/lld/ELF/InputFiles.cpp
+++ b/lld/ELF/InputFiles.cpp
@@ -561,6 +561,84 @@ uint32_t ObjFile<ELFT>::getSectionIndex(const Elf_Sym &sym) const {
       this);
 }
 
+template <class ELFT>
+void ObjFile<ELFT>::checkComdatGroups(bool recordedNewSection,
+                                      const InputFile *otherFile,
+                                      ArrayRef<Elf_Word> sectionsInGroup,
+                                      StringRef signature) {
+  ArrayRef<Elf_Shdr> objSections = this->getELFShdrs<ELFT>();
+  object::ELFFile<ELFT> obj = this->getObj();
+  StringRef shstrtab = CHECK(obj.getSectionStringTable(objSections), this);
+  const auto &exclude = config->warnMismatchSectionsInComdatGroupsExclude;
+
+  // First check for a given comdat group, we have all the necessary sections.
+  if (recordedNewSection) {
+    // The signature was recorded - we need to save the section names for this
+    // group.
+    for (uint32_t secIdx : sectionsInGroup) {
+      const Elf_Shdr &section = objSections[secIdx];
+      StringRef sectionName =
+          CHECK(obj.getSectionName(section, shstrtab), this);
+      symtab.comdatGroupSectionNames[CachedHashStringRef(signature)].insert(
+          sectionName);
+    }
+  } else {
+    // The signature was not recorded - this comdat group has a set of section
+    // names associated with it. Check the sets match.
+    const llvm::StringSet<> &cachedSectionNames =
+        symtab.comdatGroupSectionNames.at(CachedHashStringRef(signature));
+    size_t cachedSize = cachedSectionNames.size();
+    size_t foundSize = sectionsInGroup.size();
+    if (cachedSize != foundSize) {
+      warn("comdat group with signature '" + signature + "' in '" +
+           otherFile->getName() + "' has " + Twine(cachedSize) +
+           " section(s) while group in '" + this->getName() + "' has " +
+           Twine(foundSize) + " section(s)");
+      return;
+    }
+
+    // Group sizes match, but may have different section names.
+    for (uint32_t secIdx : sectionsInGroup) {
+      const Elf_Shdr &section = objSections[secIdx];
+      StringRef sectionName =
+          CHECK(obj.getSectionName(section, shstrtab), this);
+      if (!cachedSectionNames.contains(sectionName)) {
+        warn("comdat group with signature '" + signature + "' in '" +
+             otherFile->getName() + "' does not have section '" + sectionName +
+             "' which is part of comdat group in '" + this->getName() + "'");
+        return;
+      }
+    }
+  }
+
+  // Get all sections part of the group seen here.
+  for (uint32_t secIdx : sectionsInGroup) {
+    const Elf_Shdr &section = objSections[secIdx];
+    ArrayRef<uint8_t> contents =
+        CHECK(obj.template getSectionContentsAsArray<uint8_t>(section), this);
+    StringRef sectionName = CHECK(obj.getSectionName(section, shstrtab), this);
+
+    auto pair = symtab.comdatGroupSectionContents.try_emplace(
+        CachedHashStringRef(sectionName), contents);
+    bool added = pair.second;
+    ArrayRef<uint8_t> cachedContents = pair.first->second;
+
+    bool ignoreSection =
+        std::any_of(exclude.begin(), exclude.end(),
+                    [&sectionName](const llvm::GlobPattern &pat) {
+                      return pat.match(sectionName);
+                    });
+
+    if (!added && !ignoreSection && this != otherFile &&
+        !contents.equals(cachedContents)) {
+      warn("section '" + sectionName + "' for comdat group with signature '" +
+           signature + "' has different contents between '" + this->getName() +
+           "' and '" + otherFile->getName() + "'");
+      return;
+    }
+  }
+}
+
 template <class ELFT> void ObjFile<ELFT>::parse(bool ignoreComdats) {
   object::ELFFile<ELFT> obj = this->getObj();
   // Read a section table. justSymbols is usually false.
@@ -646,10 +724,15 @@ template <class ELFT> void ObjFile<ELFT>::parse(bool ignoreComdats) {
     if (flag && flag != GRP_COMDAT)
       fatal(toString(this) + ": unsupported SHT_GROUP format");
 
-    bool keepGroup =
-        (flag & GRP_COMDAT) == 0 || ignoreComdats ||
-        symtab.comdatGroups.try_emplace(CachedHashStringRef(signature), this)
-            .second;
+    bool keepGroup = (flag & GRP_COMDAT) == 0 || ignoreComdats;
+    if (!keepGroup) {
+      auto pair =
+          symtab.comdatGroups.try_emplace(CachedHashStringRef(signature), this);
+      keepGroup = pair.second;
+      if (config->warnMismatchSectionsInComdatGroups)
+        checkComdatGroups(pair.second, pair.first->second, entries.slice(1),
+                          signature);
+    }
     if (keepGroup) {
       if (config->relocatable)
         this->sections[i] = createInputSection(
diff --git a/lld/ELF/InputFiles.h b/lld/ELF/InputFiles.h
index ab98d78fcf1455a..a2ad394d4adf18f 100644
--- a/lld/ELF/InputFiles.h
+++ b/lld/ELF/InputFiles.h
@@ -304,6 +304,10 @@ template <class ELFT> class ObjFile : public ELFFileBase {
 
   bool shouldMerge(const Elf_Shdr &sec, StringRef name);
 
+  void checkComdatGroups(bool recordedNewSection, const InputFile *otherFile,
+                         ArrayRef<Elf_Word> sectionsInGroup,
+                         StringRef signature);
+
   // Each ELF symbol contains a section index which the symbol belongs to.
   // However, because the number of bits dedicated for that is limited, a
   // symbol can directly point to a section only when the section index is
diff --git a/lld/ELF/Options.td b/lld/ELF/Options.td
index c2c9cabc92a4da4..4fd6ac3735e513e 100644
--- a/lld/ELF/Options.td
+++ b/lld/ELF/Options.td
@@ -510,6 +510,15 @@ defm warn_backrefs_exclude
          "which should be ignored for --warn-backrefs.">,
       MetaVarName<"<glob>">;
 
+defm warn_mismatch_sections_in_comdat_groups: BB<"warn-mismatch-sections-in-comdat-groups",
+    "Warn about discarded sections in a comdat group that have different contents from a retained section in a comdat group with the same key",
+    "Do not warn about discarded sections in a comdat group that have different contents from a retained section in a comdat group with the same key">;
+
+defm warn_mismatch_sections_in_comdat_groups_exclude
+    : EEq<"warn-mismatch-sections-in-comdat-groups-exclude",
+         "Glob describing sections to ignore checking for --warn-mismatch-sections-in-comdat-groups.">,
+      MetaVarName<"<glob>">;
+
 defm warn_common: B<"warn-common",
     "Warn about duplicate common symbols",
     "Do not warn about duplicate common symbols (default)">;
diff --git a/lld/ELF/SymbolTable.h b/lld/ELF/SymbolTable.h
index 37e31d23729637b..1b26bae9c8f81c6 100644
--- a/lld/ELF/SymbolTable.h
+++ b/lld/ELF/SymbolTable.h
@@ -65,6 +65,16 @@ class SymbolTable {
   // is used to uniquify them.
   llvm::DenseMap<llvm::CachedHashStringRef, const InputFile *> comdatGroups;
 
+  // Map from a section name for a comdat section to its contents. These
+  // sections are the ones not discarded.
+  llvm::DenseMap<llvm::CachedHashStringRef, ArrayRef<uint8_t>>
+      comdatGroupSectionContents;
+
+  // Map from a comdat group signature to the names of sections part of the
+  // group.
+  llvm::DenseMap<llvm::CachedHashStringRef, llvm::StringSet<>>
+      comdatGroupSectionNames;
+
   // The Map of __acle_se_<sym>, <sym> pairs found in the input objects.
   // Key is the <sym> name.
   llvm::SmallMapVector<StringRef, ArmCmseEntryFunction, 1> cmseSymMap;
diff --git a/lld/test/ELF/mismatch-sections-in-comdat-group-warning/Inputs/func.s b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/Inputs/func.s
new file mode 100644
index 000000000000000..767d52f0f47a80f
--- /dev/null
+++ b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/Inputs/func.s
@@ -0,0 +1,4 @@
+  .global func
+  .section .data.func,"awG",@progbits,func,comdat
+func:
+  .word 7
diff --git a/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-group-sizes.s b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-group-sizes.s
new file mode 100644
index 000000000000000..2f39b5ff7e93779
--- /dev/null
+++ b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-group-sizes.s
@@ -0,0 +1,17 @@
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %p/Inputs/func.s -o %t0.o
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t1.o
+// RUN: ld.lld -shared %t0.o %t1.o -o /dev/null --warn-mismatch-sections-in-comdat-groups 2>&1 | FileCheck %s
+// RUN: ld.lld -shared %t1.o %t0.o -o /dev/null --warn-mismatch-sections-in-comdat-groups 2>&1 | FileCheck %s --check-prefix=REVERSE
+
+// CHECK:   warning: comdat group with signature 'func' in '{{.*}}0.o' has 1 section(s) while group in '{{.*}}1.o' has 2 section(s)
+// REVERSE: warning: comdat group with signature 'func' in '{{.*}}1.o' has 2 section(s) while group in '{{.*}}0.o' has 1 section(s)
+
+  .global func
+  .section .data.func,"awG",@progbits,func,comdat
+func:
+  .word 7
+
+  .global func2
+  .section .data.func2,"awG",@progbits,func,comdat
+func2:
+  .word 7
diff --git a/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-section-contents.s b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-section-contents.s
new file mode 100644
index 000000000000000..2313921021ecc8b
--- /dev/null
+++ b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-section-contents.s
@@ -0,0 +1,12 @@
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %p/Inputs/func.s -o %t0.o
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t1.o
+// RUN: ld.lld -shared %t0.o %t1.o -o /dev/null --warn-mismatch-sections-in-comdat-groups 2>&1 | FileCheck %s
+// RUN: ld.lld -shared %t1.o %t0.o -o /dev/null --warn-mismatch-sections-in-comdat-groups 2>&1 | FileCheck %s --check-prefix=REVERSE
+
+// CHECK:   warning: section '.data.func' for comdat group with signature 'func' has different contents between '{{.*}}1.o' and '{{.*}}0.o'
+// REVERSE: warning: section '.data.func' for comdat group with signature 'func' has different contents between '{{.*}}0.o' and '{{.*}}1.o'
+
+  .global func
+  .section .data.func,"awG",@progbits,func,comdat
+func:
+  .word 6
diff --git a/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-section-names.s b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-section-names.s
new file mode 100644
index 000000000000000..d21c33354dc9a3d
--- /dev/null
+++ b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-section-names.s
@@ -0,0 +1,12 @@
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %p/Inputs/func.s -o %t0.o
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t1.o
+// RUN: ld.lld -shared %t0.o %t1.o -o /dev/null --warn-mismatch-sections-in-comdat-groups 2>&1 | FileCheck %s
+// RUN: ld.lld -shared %t1.o %t0.o -o /dev/null --warn-mismatch-sections-in-comdat-groups 2>&1 | FileCheck %s --check-prefix=REVERSE
+
+// CHECK:   warning: comdat group with signature 'func' in '{{.*}}0.o' does not have section '.data.func2' which is part of comdat group in '{{.*}}1.o'
+// REVERSE: warning: comdat group with signature 'func' in '{{.*}}1.o' does not have section '.data.func' which is part of comdat group in '{{.*}}0.o'
+
+  .global func
+  .section .data.func2,"awG",@progbits,func,comdat
+func:
+  .word 7
diff --git a/lld/test/ELF/mismatch-sections-in-comdat-group-warning/disabled-warning.s b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/disabled-warning.s
new file mode 100644
index 000000000000000..09ebdc798ef694b
--- /dev/null
+++ b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/disabled-warning.s
@@ -0,0 +1,13 @@
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %p/Inputs/func.s -o %t0.o
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t1.o
+// RUN: ld.lld -shared %t0.o %t1.o -o /dev/null --fatal-warnings
+// RUN: ld.lld -shared %t1.o %t0.o -o /dev/null --fatal-warnings
+// RUN: ld.lld -shared %t0.o %t1.o -o /dev/null --fatal-warnings --no-warn-mismatch-sections-in-comdat-groups
+// RUN: ld.lld -shared %t1.o %t0.o -o /dev/null --fatal-warnings --no-warn-mismatch-sections-in-comdat-groups
+
+// No error since the warning is disabled, despite the mismatch.
+
+  .global func
+  .section .data.func,"awG",@progbits,func,comdat
+func:
+  .word 6
diff --git a/lld/test/ELF/mismatch-sections-in-comdat-group-warning/ignore-section.s b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/ignore-section.s
new file mode 100644
index 000000000000000..a5a59f447799dd5
--- /dev/null
+++ b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/ignore-section.s
@@ -0,0 +1,9 @@
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %p/Inputs/func.s -o %t0.o
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t1.o
+// RUN: ld.lld -shared %t0.o %t1.o -o /dev/null --fatal-warnings --warn-mismatch-sections-in-comdat-groups --warn-mismatch-sections-in-comdat-groups-exclude=.data.f*
+// RUN: ld.lld -shared %t1.o %t0.o -o /dev/null --fatal-warnings --warn-mismatch-sections-in-comdat-groups --warn-mismatch-sections-in-comdat-groups-exclude=.data.f*
+
+  .global func
+  .section .data.func,"awG",@progbits,func,comdat
+func:
+  .word 6
diff --git a/lld/test/ELF/mismatch-sections-in-comdat-group-warning/no-warning.s b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/no-warning.s
new file mode 100644
index 000000000000000..f80ea070e358ae9
--- /dev/null
+++ b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/no-warning.s
@@ -0,0 +1,11 @@
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %p/Inputs/func.s -o %t0.o
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t1.o
+// RUN: ld.lld -shared %t0.o %t1.o -o /dev/null --fatal-warnings --warn-mismatch-sections-in-comdat-groups
+// RUN: ld.lld -shared %t1.o %t0.o -o /dev/null --fatal-warnings --warn-mismatch-sections-in-comdat-groups
+
+// No error since the sections in the comdat groups for `func` in both object files are the same.
+
+  .global func
+  .section .data.func,"awG",@progbits,func,comdat
+func:
+  .word 7

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 1, 2023

@llvm/pr-subscribers-lld

Author: None (PiJoules)

Changes

(Just an experiment, not sure if this is worth landing.)

The warning asserts that all sections that would be discarded from the rules for comdat groups have the same contents and sections as the ones retained in the final binary. Specifically, for a given comdat group signature, this checks:

  • The number of sections in that group is the same for all comdat groups with that signature.
  • The names of all sections in that group are the same for all other comdat groups with that signature.
  • The contents of all sections with the same name in all groups for that signature are the same.

The idea here was to detect ODR violations at the link-level. On rare occasions we've run into instances where completely different implementations for a function or data structure win at link time which this might help catch. Note all instances caught by this may actually be indicative of bugs. Due to interprocedural optimizations, it can be entirely possible for the same function in different object files to have slightly different assembly. This wouldn't be practical for relocation sections as well. This comes with an exclude list for disallowing checks on those sections.

Practically, the most I can see this being used for is checking that comdat groups accross all object files have the same named sections in them, and that data sections in those groups have the same contents.


Full diff: https://github.com/llvm/llvm-project/pull/74151.diff

13 Files Affected:

  • (modified) lld/ELF/Config.h (+3)
  • (modified) lld/ELF/Driver.cpp (+14)
  • (modified) lld/ELF/InputFiles.cpp (+87-4)
  • (modified) lld/ELF/InputFiles.h (+4)
  • (modified) lld/ELF/Options.td (+9)
  • (modified) lld/ELF/SymbolTable.h (+10)
  • (added) lld/test/ELF/mismatch-sections-in-comdat-group-warning/Inputs/func.s (+4)
  • (added) lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-group-sizes.s (+17)
  • (added) lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-section-contents.s (+12)
  • (added) lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-section-names.s (+12)
  • (added) lld/test/ELF/mismatch-sections-in-comdat-group-warning/disabled-warning.s (+13)
  • (added) lld/test/ELF/mismatch-sections-in-comdat-group-warning/ignore-section.s (+9)
  • (added) lld/test/ELF/mismatch-sections-in-comdat-group-warning/no-warning.s (+11)
diff --git a/lld/ELF/Config.h b/lld/ELF/Config.h
index 56229334f9a44ae..ed0aacb8ae755e3 100644
--- a/lld/ELF/Config.h
+++ b/lld/ELF/Config.h
@@ -295,6 +295,9 @@ struct Config {
   bool useAndroidRelrTags = false;
   bool warnBackrefs;
   llvm::SmallVector<llvm::GlobPattern, 0> warnBackrefsExclude;
+  bool warnMismatchSectionsInComdatGroups;
+  llvm::SmallVector<llvm::GlobPattern, 0>
+      warnMismatchSectionsInComdatGroupsExclude;
   bool warnCommon;
   bool warnMissingEntry;
   bool warnSymbolOrdering;
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index 6bef09eeca015aa..0042ad555d43925 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -1437,6 +1437,9 @@ static void readConfigs(opt::InputArgList &args) {
       OPT_use_android_relr_tags, OPT_no_use_android_relr_tags, false);
   config->warnBackrefs =
       args.hasFlag(OPT_warn_backrefs, OPT_no_warn_backrefs, false);
+  config->warnMismatchSectionsInComdatGroups =
+      args.hasFlag(OPT_warn_mismatch_sections_in_comdat_groups,
+                   OPT_no_warn_mismatch_sections_in_comdat_groups, false);
   config->warnCommon = args.hasFlag(OPT_warn_common, OPT_no_warn_common, false);
   config->warnSymbolOrdering =
       args.hasFlag(OPT_warn_symbol_ordering, OPT_no_warn_symbol_ordering, true);
@@ -1709,6 +1712,17 @@ static void readConfigs(opt::InputArgList &args) {
             pattern);
   }
 
+  for (opt::Arg *arg :
+       args.filtered(OPT_warn_mismatch_sections_in_comdat_groups_exclude)) {
+    StringRef pattern(arg->getValue());
+    if (Expected<GlobPattern> pat = GlobPattern::create(pattern))
+      config->warnMismatchSectionsInComdatGroupsExclude.push_back(
+          std::move(*pat));
+    else
+      error(arg->getSpelling() + ": " + toString(pat.takeError()) + ": " +
+            pattern);
+  }
+
   // For -no-pie and -pie, --export-dynamic-symbol specifies defined symbols
   // which should be exported. For -shared, references to matched non-local
   // STV_DEFAULT symbols are not bound to definitions within the shared object,
diff --git a/lld/ELF/InputFiles.cpp b/lld/ELF/InputFiles.cpp
index 06a3d565deb765e..0cde39ef1fad655 100644
--- a/lld/ELF/InputFiles.cpp
+++ b/lld/ELF/InputFiles.cpp
@@ -561,6 +561,84 @@ uint32_t ObjFile<ELFT>::getSectionIndex(const Elf_Sym &sym) const {
       this);
 }
 
+template <class ELFT>
+void ObjFile<ELFT>::checkComdatGroups(bool recordedNewSection,
+                                      const InputFile *otherFile,
+                                      ArrayRef<Elf_Word> sectionsInGroup,
+                                      StringRef signature) {
+  ArrayRef<Elf_Shdr> objSections = this->getELFShdrs<ELFT>();
+  object::ELFFile<ELFT> obj = this->getObj();
+  StringRef shstrtab = CHECK(obj.getSectionStringTable(objSections), this);
+  const auto &exclude = config->warnMismatchSectionsInComdatGroupsExclude;
+
+  // First check for a given comdat group, we have all the necessary sections.
+  if (recordedNewSection) {
+    // The signature was recorded - we need to save the section names for this
+    // group.
+    for (uint32_t secIdx : sectionsInGroup) {
+      const Elf_Shdr &section = objSections[secIdx];
+      StringRef sectionName =
+          CHECK(obj.getSectionName(section, shstrtab), this);
+      symtab.comdatGroupSectionNames[CachedHashStringRef(signature)].insert(
+          sectionName);
+    }
+  } else {
+    // The signature was not recorded - this comdat group has a set of section
+    // names associated with it. Check the sets match.
+    const llvm::StringSet<> &cachedSectionNames =
+        symtab.comdatGroupSectionNames.at(CachedHashStringRef(signature));
+    size_t cachedSize = cachedSectionNames.size();
+    size_t foundSize = sectionsInGroup.size();
+    if (cachedSize != foundSize) {
+      warn("comdat group with signature '" + signature + "' in '" +
+           otherFile->getName() + "' has " + Twine(cachedSize) +
+           " section(s) while group in '" + this->getName() + "' has " +
+           Twine(foundSize) + " section(s)");
+      return;
+    }
+
+    // Group sizes match, but may have different section names.
+    for (uint32_t secIdx : sectionsInGroup) {
+      const Elf_Shdr &section = objSections[secIdx];
+      StringRef sectionName =
+          CHECK(obj.getSectionName(section, shstrtab), this);
+      if (!cachedSectionNames.contains(sectionName)) {
+        warn("comdat group with signature '" + signature + "' in '" +
+             otherFile->getName() + "' does not have section '" + sectionName +
+             "' which is part of comdat group in '" + this->getName() + "'");
+        return;
+      }
+    }
+  }
+
+  // Get all sections part of the group seen here.
+  for (uint32_t secIdx : sectionsInGroup) {
+    const Elf_Shdr &section = objSections[secIdx];
+    ArrayRef<uint8_t> contents =
+        CHECK(obj.template getSectionContentsAsArray<uint8_t>(section), this);
+    StringRef sectionName = CHECK(obj.getSectionName(section, shstrtab), this);
+
+    auto pair = symtab.comdatGroupSectionContents.try_emplace(
+        CachedHashStringRef(sectionName), contents);
+    bool added = pair.second;
+    ArrayRef<uint8_t> cachedContents = pair.first->second;
+
+    bool ignoreSection =
+        std::any_of(exclude.begin(), exclude.end(),
+                    [&sectionName](const llvm::GlobPattern &pat) {
+                      return pat.match(sectionName);
+                    });
+
+    if (!added && !ignoreSection && this != otherFile &&
+        !contents.equals(cachedContents)) {
+      warn("section '" + sectionName + "' for comdat group with signature '" +
+           signature + "' has different contents between '" + this->getName() +
+           "' and '" + otherFile->getName() + "'");
+      return;
+    }
+  }
+}
+
 template <class ELFT> void ObjFile<ELFT>::parse(bool ignoreComdats) {
   object::ELFFile<ELFT> obj = this->getObj();
   // Read a section table. justSymbols is usually false.
@@ -646,10 +724,15 @@ template <class ELFT> void ObjFile<ELFT>::parse(bool ignoreComdats) {
     if (flag && flag != GRP_COMDAT)
       fatal(toString(this) + ": unsupported SHT_GROUP format");
 
-    bool keepGroup =
-        (flag & GRP_COMDAT) == 0 || ignoreComdats ||
-        symtab.comdatGroups.try_emplace(CachedHashStringRef(signature), this)
-            .second;
+    bool keepGroup = (flag & GRP_COMDAT) == 0 || ignoreComdats;
+    if (!keepGroup) {
+      auto pair =
+          symtab.comdatGroups.try_emplace(CachedHashStringRef(signature), this);
+      keepGroup = pair.second;
+      if (config->warnMismatchSectionsInComdatGroups)
+        checkComdatGroups(pair.second, pair.first->second, entries.slice(1),
+                          signature);
+    }
     if (keepGroup) {
       if (config->relocatable)
         this->sections[i] = createInputSection(
diff --git a/lld/ELF/InputFiles.h b/lld/ELF/InputFiles.h
index ab98d78fcf1455a..a2ad394d4adf18f 100644
--- a/lld/ELF/InputFiles.h
+++ b/lld/ELF/InputFiles.h
@@ -304,6 +304,10 @@ template <class ELFT> class ObjFile : public ELFFileBase {
 
   bool shouldMerge(const Elf_Shdr &sec, StringRef name);
 
+  void checkComdatGroups(bool recordedNewSection, const InputFile *otherFile,
+                         ArrayRef<Elf_Word> sectionsInGroup,
+                         StringRef signature);
+
   // Each ELF symbol contains a section index which the symbol belongs to.
   // However, because the number of bits dedicated for that is limited, a
   // symbol can directly point to a section only when the section index is
diff --git a/lld/ELF/Options.td b/lld/ELF/Options.td
index c2c9cabc92a4da4..4fd6ac3735e513e 100644
--- a/lld/ELF/Options.td
+++ b/lld/ELF/Options.td
@@ -510,6 +510,15 @@ defm warn_backrefs_exclude
          "which should be ignored for --warn-backrefs.">,
       MetaVarName<"<glob>">;
 
+defm warn_mismatch_sections_in_comdat_groups: BB<"warn-mismatch-sections-in-comdat-groups",
+    "Warn about discarded sections in a comdat group that have different contents from a retained section in a comdat group with the same key",
+    "Do not warn about discarded sections in a comdat group that have different contents from a retained section in a comdat group with the same key">;
+
+defm warn_mismatch_sections_in_comdat_groups_exclude
+    : EEq<"warn-mismatch-sections-in-comdat-groups-exclude",
+         "Glob describing sections to ignore checking for --warn-mismatch-sections-in-comdat-groups.">,
+      MetaVarName<"<glob>">;
+
 defm warn_common: B<"warn-common",
     "Warn about duplicate common symbols",
     "Do not warn about duplicate common symbols (default)">;
diff --git a/lld/ELF/SymbolTable.h b/lld/ELF/SymbolTable.h
index 37e31d23729637b..1b26bae9c8f81c6 100644
--- a/lld/ELF/SymbolTable.h
+++ b/lld/ELF/SymbolTable.h
@@ -65,6 +65,16 @@ class SymbolTable {
   // is used to uniquify them.
   llvm::DenseMap<llvm::CachedHashStringRef, const InputFile *> comdatGroups;
 
+  // Map from a section name for a comdat section to its contents. These
+  // sections are the ones not discarded.
+  llvm::DenseMap<llvm::CachedHashStringRef, ArrayRef<uint8_t>>
+      comdatGroupSectionContents;
+
+  // Map from a comdat group signature to the names of sections part of the
+  // group.
+  llvm::DenseMap<llvm::CachedHashStringRef, llvm::StringSet<>>
+      comdatGroupSectionNames;
+
   // The Map of __acle_se_<sym>, <sym> pairs found in the input objects.
   // Key is the <sym> name.
   llvm::SmallMapVector<StringRef, ArmCmseEntryFunction, 1> cmseSymMap;
diff --git a/lld/test/ELF/mismatch-sections-in-comdat-group-warning/Inputs/func.s b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/Inputs/func.s
new file mode 100644
index 000000000000000..767d52f0f47a80f
--- /dev/null
+++ b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/Inputs/func.s
@@ -0,0 +1,4 @@
+  .global func
+  .section .data.func,"awG",@progbits,func,comdat
+func:
+  .word 7
diff --git a/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-group-sizes.s b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-group-sizes.s
new file mode 100644
index 000000000000000..2f39b5ff7e93779
--- /dev/null
+++ b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-group-sizes.s
@@ -0,0 +1,17 @@
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %p/Inputs/func.s -o %t0.o
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t1.o
+// RUN: ld.lld -shared %t0.o %t1.o -o /dev/null --warn-mismatch-sections-in-comdat-groups 2>&1 | FileCheck %s
+// RUN: ld.lld -shared %t1.o %t0.o -o /dev/null --warn-mismatch-sections-in-comdat-groups 2>&1 | FileCheck %s --check-prefix=REVERSE
+
+// CHECK:   warning: comdat group with signature 'func' in '{{.*}}0.o' has 1 section(s) while group in '{{.*}}1.o' has 2 section(s)
+// REVERSE: warning: comdat group with signature 'func' in '{{.*}}1.o' has 2 section(s) while group in '{{.*}}0.o' has 1 section(s)
+
+  .global func
+  .section .data.func,"awG",@progbits,func,comdat
+func:
+  .word 7
+
+  .global func2
+  .section .data.func2,"awG",@progbits,func,comdat
+func2:
+  .word 7
diff --git a/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-section-contents.s b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-section-contents.s
new file mode 100644
index 000000000000000..2313921021ecc8b
--- /dev/null
+++ b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-section-contents.s
@@ -0,0 +1,12 @@
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %p/Inputs/func.s -o %t0.o
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t1.o
+// RUN: ld.lld -shared %t0.o %t1.o -o /dev/null --warn-mismatch-sections-in-comdat-groups 2>&1 | FileCheck %s
+// RUN: ld.lld -shared %t1.o %t0.o -o /dev/null --warn-mismatch-sections-in-comdat-groups 2>&1 | FileCheck %s --check-prefix=REVERSE
+
+// CHECK:   warning: section '.data.func' for comdat group with signature 'func' has different contents between '{{.*}}1.o' and '{{.*}}0.o'
+// REVERSE: warning: section '.data.func' for comdat group with signature 'func' has different contents between '{{.*}}0.o' and '{{.*}}1.o'
+
+  .global func
+  .section .data.func,"awG",@progbits,func,comdat
+func:
+  .word 6
diff --git a/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-section-names.s b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-section-names.s
new file mode 100644
index 000000000000000..d21c33354dc9a3d
--- /dev/null
+++ b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/different-section-names.s
@@ -0,0 +1,12 @@
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %p/Inputs/func.s -o %t0.o
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t1.o
+// RUN: ld.lld -shared %t0.o %t1.o -o /dev/null --warn-mismatch-sections-in-comdat-groups 2>&1 | FileCheck %s
+// RUN: ld.lld -shared %t1.o %t0.o -o /dev/null --warn-mismatch-sections-in-comdat-groups 2>&1 | FileCheck %s --check-prefix=REVERSE
+
+// CHECK:   warning: comdat group with signature 'func' in '{{.*}}0.o' does not have section '.data.func2' which is part of comdat group in '{{.*}}1.o'
+// REVERSE: warning: comdat group with signature 'func' in '{{.*}}1.o' does not have section '.data.func' which is part of comdat group in '{{.*}}0.o'
+
+  .global func
+  .section .data.func2,"awG",@progbits,func,comdat
+func:
+  .word 7
diff --git a/lld/test/ELF/mismatch-sections-in-comdat-group-warning/disabled-warning.s b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/disabled-warning.s
new file mode 100644
index 000000000000000..09ebdc798ef694b
--- /dev/null
+++ b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/disabled-warning.s
@@ -0,0 +1,13 @@
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %p/Inputs/func.s -o %t0.o
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t1.o
+// RUN: ld.lld -shared %t0.o %t1.o -o /dev/null --fatal-warnings
+// RUN: ld.lld -shared %t1.o %t0.o -o /dev/null --fatal-warnings
+// RUN: ld.lld -shared %t0.o %t1.o -o /dev/null --fatal-warnings --no-warn-mismatch-sections-in-comdat-groups
+// RUN: ld.lld -shared %t1.o %t0.o -o /dev/null --fatal-warnings --no-warn-mismatch-sections-in-comdat-groups
+
+// No error since the warning is disabled, despite the mismatch.
+
+  .global func
+  .section .data.func,"awG",@progbits,func,comdat
+func:
+  .word 6
diff --git a/lld/test/ELF/mismatch-sections-in-comdat-group-warning/ignore-section.s b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/ignore-section.s
new file mode 100644
index 000000000000000..a5a59f447799dd5
--- /dev/null
+++ b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/ignore-section.s
@@ -0,0 +1,9 @@
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %p/Inputs/func.s -o %t0.o
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t1.o
+// RUN: ld.lld -shared %t0.o %t1.o -o /dev/null --fatal-warnings --warn-mismatch-sections-in-comdat-groups --warn-mismatch-sections-in-comdat-groups-exclude=.data.f*
+// RUN: ld.lld -shared %t1.o %t0.o -o /dev/null --fatal-warnings --warn-mismatch-sections-in-comdat-groups --warn-mismatch-sections-in-comdat-groups-exclude=.data.f*
+
+  .global func
+  .section .data.func,"awG",@progbits,func,comdat
+func:
+  .word 6
diff --git a/lld/test/ELF/mismatch-sections-in-comdat-group-warning/no-warning.s b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/no-warning.s
new file mode 100644
index 000000000000000..f80ea070e358ae9
--- /dev/null
+++ b/lld/test/ELF/mismatch-sections-in-comdat-group-warning/no-warning.s
@@ -0,0 +1,11 @@
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %p/Inputs/func.s -o %t0.o
+// RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t1.o
+// RUN: ld.lld -shared %t0.o %t1.o -o /dev/null --fatal-warnings --warn-mismatch-sections-in-comdat-groups
+// RUN: ld.lld -shared %t1.o %t0.o -o /dev/null --fatal-warnings --warn-mismatch-sections-in-comdat-groups
+
+// No error since the sections in the comdat groups for `func` in both object files are the same.
+
+  .global func
+  .section .data.func,"awG",@progbits,func,comdat
+func:
+  .word 7

@MaskRay
Copy link
Member

MaskRay commented Dec 2, 2023

I haven't looked at the patch in detail. Some thoughts:

(It is valid to have two translation units, where a.o has C++ deleting and base object destructors (D0/D2) in their own COMDAT groups, while b.o has D0/D2 in the D5 COMDAT group (constructor/destructor alias). So, we cannot say that a COMDAT discarded section only uses one COMDAT section signature. However, this approach this patch utilizes, should work.)

@smithp35
Copy link
Collaborator

smithp35 commented Dec 4, 2023

Not commenting directly on the code, just some experience from Arm's proprietary toolchain which, in its earlier form had a compiler and linker developed together so we had a bit more control over the incoming groups.

From the description your definition of ODR violation is very strict, I think the only way it could be guaranteed to be satisfied is if every group where both identical in source, and compiled by the same compiler using the same compliation flages.

To summarise:

  • Group contents can vary for legitimate reasons without being an ODR violation.
  • LTO can trip these up, particularly for some C++ virtual methods at least with the libLLVMLTO code generator interface.
  • The interface of what makes up a group is not well defined in ELF, I don't think section names matter at all.

The compiler also made much heavier use of groups for debug information to avoid dangling references when sections are removed (good for avoiding dangling references, bad for small object size and link time as it results in a lot more duplicate debug information).

As we understood ELF, the contents of comdat groups must be substitutable, in that the linker may choose any of the groups without affecting the program behaviour. There can be many situations where the contents of groups are different in contents for legitimate reasons. For example different optimisation levels, exceptions support (on Arm we have .ARM.exidx sections in groups), different debug (when debug is present in comdat group). Admittedly this isn't likely to be a problem on a system with homogenous command line choice, but it can happen when mixing binary only static libraries from different object producers.

We've also found instances where LTO (via the libLLVMLTO code generator which has a somewhat limited interface) will remove unused functions from groups which can defeat an ODR checker looking for the same "interface" as defined by global symbols.

In ELF, as far as we can tell, the only significant parts of a group are the global symbols that it defines. As these must be marked as undefined when the group is discarded. It is not permitted to make references to local symbols from outside the group so these are just implementation detail.

@frobtech
Copy link
Contributor

frobtech commented Dec 5, 2023

I concur with Peter's assessment. This is far more strict than the actual requirement, and I'm skeptical it's useful even as an opt-in check because it's so strict. Something like verifying that all alternate groups define all the same non-local symbols is I think the strongest check that can be enforced in the general case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants