Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ELF] Change .debug_names tombstone value to UINT32_MAX/UINT64_MAX #74686

Merged
merged 2 commits into from
Dec 22, 2023

Conversation

MaskRay
Copy link
Member

@MaskRay MaskRay commented Dec 7, 2023

clang -g -gpubnames -fdebug-types-section now emits .debug_names
section with references to local type unit entries defined in COMDAT
.debug_info sections.

.section        .debug_info,"G",@progbits,5657452045627120676,comdat
.Ltu_begin0:
...

.section        .debug_names,"",@progbits
...
// DWARF32
.long   .Ltu_begin0                     # Type unit 0
// DWARF64
// .long   .Ltu_begin0                     # Type unit 0

When .Ltu_begin0 is relative to a non-prevailing .debug_info section,
the relocation resolves to 0, which is a valid offset within the
.debug_info section.

cat > a.cc <<e
struct A { int x; };
inline A foo() { return {1}; }
int main() { foo(); }
e
cat > b.cc <<e
struct A { int x; };
inline A foo() { return {1}; }
void use() { foo(); }
e
clang++ -g -gpubnames -fdebug-types-section -fuse-ld=lld a.cc b.cc -o old
% llvm-dwarfdump old
...
  Local Type Unit offsets [
    LocalTU[0]: 0x00000000
  ]
...
  Local Type Unit offsets [
    LocalTU[0]: 0x00000000  // indistinguishable from a valid offset within .debug_info
  ]

https://dwarfstd.org/issues/231013.1.html proposes that we use a
tombstone value instead to inform consumers. This patch implements the
idea. The second LocalTU entry will now use 0xffffffff.

https://reviews.llvm.org/D84825 has a TODO that we should switch the
tombstone value for most .debug_* sections to UINT64_MAX. We have
postponed the change for more than three years for consumers to migrate.
At some point we shall make the change, so that .debug_names is no long
different from other debug section that is not .debug_loc/.debug_ranges.

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 7, 2023

@llvm/pr-subscribers-lld

Author: Fangrui Song (MaskRay)

Changes

clang -g -gpubnames -fdebug-types-section now emits .debug_names
section with references to local type unit entries defined in COMDAT
.debug_info sections.

.section        .debug_info,"G",@<!-- -->progbits,5657452045627120676,comdat
.Ltu_begin0:
...

.section        .debug_names,"",@<!-- -->progbits
...
// DWARF32
.long   .Ltu_begin0                     # Type unit 0
// DWARF64
// .long   .Ltu_begin0                     # Type unit 0

When .Ltu_begin0 is relative to a non-prevailing .debug_info section,
the relocation resolves to 0, which is a valid offset within the
.debug_info section.

cat &gt; a.cc &lt;&lt;e
struct A { int x; };
inline A foo() { return {1}; }
int main() { foo(); }
e
cat &gt; b.cc &lt;&lt;e
struct A { int x; };
inline A foo() { return {1}; }
void use() { foo(); }
e
clang++ -g -gpubnames -fdebug-types-section -fuse-ld=lld a.cc b.cc -o old
% llvm-dwarfdump old
...
  Local Type Unit offsets [
    LocalTU[0]: 0x00000000
  ]
...
  Local Type Unit offsets [
    LocalTU[0]: 0x00000000  // indistinguishable from a valid offset within .debug_info
  ]

https://dwarfstd.org/issues/231013.1.html proposes that we use a
tombstone value instead to inform consumers. This patch implements the
idea. The second LocalTU entry will now use 0xffffffff.

https://reviews.llvm.org/D84825 has a TODO that we should switch the
tombstone value for most .debug_* sections to UINT64_MAX. We have
postponed the change for more than three years for consumers to migrate.
At some point we shall make the change, so that .debug_names is no long
different from other debug section that is not .debug_loc/.debug_ranges.


Full diff: https://github.com/llvm/llvm-project/pull/74686.diff

3 Files Affected:

  • (modified) lld/ELF/InputSection.cpp (+17-7)
  • (modified) lld/test/ELF/debug-dead-reloc-32.s (+11)
  • (modified) lld/test/ELF/debug-dead-reloc.s (+22-2)
diff --git a/lld/ELF/InputSection.cpp b/lld/ELF/InputSection.cpp
index 145d55d9a0a4b..5a62b14f4c973 100644
--- a/lld/ELF/InputSection.cpp
+++ b/lld/ELF/InputSection.cpp
@@ -898,10 +898,16 @@ void InputSection::relocateNonAlloc(uint8_t *buf, ArrayRef<RelTy> rels) {
   const TargetInfo &target = *elf::target;
   const auto emachine = config->emachine;
   const bool isDebug = isDebugSection(*this);
-  const bool isDebugLocOrRanges =
-      isDebug && (name == ".debug_loc" || name == ".debug_ranges");
   const bool isDebugLine = isDebug && name == ".debug_line";
-  std::optional<uint64_t> tombstone;
+  std::optional<uint64_t> tombstone, debugTombstone;
+  if (isDebug) {
+    if (name == ".debug_loc" || name == ".debug_ranges")
+      debugTombstone = 1;
+    else if (name == ".debug_names")
+      debugTombstone = UINT64_MAX; // DWARF Issue 231013.1
+    else
+      debugTombstone = 0;
+  }
   for (const auto &patAndValue : llvm::reverse(config->deadRelocInNonAlloc))
     if (patAndValue.first.match(this->name)) {
       tombstone = patAndValue.second;
@@ -954,8 +960,7 @@ void InputSection::relocateNonAlloc(uint8_t *buf, ArrayRef<RelTy> rels) {
       return;
     }
 
-    if (tombstone ||
-        (isDebug && (type == target.symbolicRel || expr == R_DTPREL))) {
+    if (tombstone || (isDebug && (expr == R_ABS || expr == R_DTPREL))) {
       // Resolve relocations in .debug_* referencing (discarded symbols or ICF
       // folded section symbols) to a tombstone value. Resolving to addend is
       // unsatisfactory because the result address range may collide with a
@@ -986,8 +991,13 @@ void InputSection::relocateNonAlloc(uint8_t *buf, ArrayRef<RelTy> rels) {
       // value. Enable -1 in a future release.
       if (!sym.getOutputSection() || (ds && ds->folded && !isDebugLine)) {
         // If -z dead-reloc-in-nonalloc= is specified, respect it.
-        const uint64_t value = tombstone ? SignExtend64<bits>(*tombstone)
-                                         : (isDebugLocOrRanges ? 1 : 0);
+        uint64_t value;
+        if (tombstone)
+          value = SignExtend64<bits>(*tombstone);
+        else if (type == target.symbolicRel)
+          value = *debugTombstone;
+        else // .debug_names uses 32-bit local TU offsets for DWARF32
+          value = static_cast<uint32_t>(*debugTombstone);
         target.relocateNoSym(bufLoc, type, value);
         continue;
       }
diff --git a/lld/test/ELF/debug-dead-reloc-32.s b/lld/test/ELF/debug-dead-reloc-32.s
index b2708a744f288..7bf30b98e6890 100644
--- a/lld/test/ELF/debug-dead-reloc-32.s
+++ b/lld/test/ELF/debug-dead-reloc-32.s
@@ -13,6 +13,8 @@
 # CHECK-NEXT:  0000 01000000
 # CHECK-NEXT: Contents of section .debug_addr:
 # CHECK-NEXT:  0000 00000000
+# CHECK-NEXT: Contents of section .debug_names:
+# CHECK-NEXT:  0000 ffffffff
 
 .section .text.1,"axe"
   .byte 0
@@ -27,3 +29,12 @@
 ## Resolved to UINT32_C(0), with the addend ignored.
 .section .debug_addr
   .long .text.1+8
+
+.section  .debug_info,"eG",@progbits,5657452045627120676,comdat
+.Ltu_begin0:
+
+.section .debug_names
+## .debug_names may reference a local type unit defined in a COMDAT .debug_info
+## section (-g -gpubnames -fdebug-types-section). If the referenced section is
+## non-prevailing, resolve to UINT32_MAX.
+.long .Ltu_begin0
diff --git a/lld/test/ELF/debug-dead-reloc.s b/lld/test/ELF/debug-dead-reloc.s
index fcf53205079ed..627513c2672c4 100644
--- a/lld/test/ELF/debug-dead-reloc.s
+++ b/lld/test/ELF/debug-dead-reloc.s
@@ -16,9 +16,12 @@
 # CHECK:      Contents of section .debug_addr:
 # CHECK-NEXT:  0000 {{.*}}000 00000000 {{.*}}000 00000000
 # CHECK-NEXT:  0010 00000000  00000000 {{.*}}000 00000000
+# CHECK:      Contents of section .debug_names:
+# CHECK-NEXT:  0000 00000000 00000000 00000000 ffffffff .
+# CHECK-NEXT:  0010 ffffffff ffffffff                   .
 # CHECK:      Contents of section .debug_foo:
-# CHECK-NEXT:  0000 00000000 00000000 08000000 00000000
-# CHECK-NEXT:  0010 00000000 00000000 08000000 00000000
+# CHECK-NEXT:  0000 00000000 00000000 00000000 00000000
+# CHECK-NEXT:  0010 00000000 00000000 00000000 00000000
 
 # REL:      Relocations [
 # REL-NEXT:   .rela.text {
@@ -38,6 +41,12 @@
 # REL-NEXT:     0x10 R_X86_64_NONE - 0x18
 # REL-NEXT:     0x18 R_X86_64_64 group 0x20
 # REL-NEXT:   }
+# REL-NEXT:   .rela.debug_names {
+# REL-NEXT:     0x0 R_X86_64_32 .debug_info 0x0
+# REL-NEXT:     0x4 R_X86_64_64 .debug_info 0x0
+# REL-NEXT:     0xC R_X86_64_NONE - 0x0
+# REL-NEXT:     0x10 R_X86_64_NONE - 0x0
+# REL-NEXT:   }
 # REL-NEXT:   .rela.debug_foo {
 # REL-NEXT:     0x0 R_X86_64_NONE - 0x8
 # REL-NEXT:     0x8 R_X86_64_NONE - 0x8
@@ -77,6 +86,17 @@ group:
 ## resolved to the prevailing copy.
   .quad group+32
 
+.section  .debug_info,"G",@progbits,5657452045627120676,comdat
+.Ltu_begin0:
+
+.section .debug_names
+## .debug_names may reference a local type unit defined in a COMDAT .debug_info
+## section (-g -gpubnames -fdebug-types-section). If the referenced section is
+## non-prevailing, resolve to UINT32_MAX.
+.long .Ltu_begin0
+## ... or UINT64_MAX for DWARF64.
+.quad .Ltu_begin0
+
 .section .debug_foo
   .quad .text.1+8
 

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 7, 2023

@llvm/pr-subscribers-lld-elf

Author: Fangrui Song (MaskRay)

Changes

clang -g -gpubnames -fdebug-types-section now emits .debug_names
section with references to local type unit entries defined in COMDAT
.debug_info sections.

.section        .debug_info,"G",@<!-- -->progbits,5657452045627120676,comdat
.Ltu_begin0:
...

.section        .debug_names,"",@<!-- -->progbits
...
// DWARF32
.long   .Ltu_begin0                     # Type unit 0
// DWARF64
// .long   .Ltu_begin0                     # Type unit 0

When .Ltu_begin0 is relative to a non-prevailing .debug_info section,
the relocation resolves to 0, which is a valid offset within the
.debug_info section.

cat &gt; a.cc &lt;&lt;e
struct A { int x; };
inline A foo() { return {1}; }
int main() { foo(); }
e
cat &gt; b.cc &lt;&lt;e
struct A { int x; };
inline A foo() { return {1}; }
void use() { foo(); }
e
clang++ -g -gpubnames -fdebug-types-section -fuse-ld=lld a.cc b.cc -o old
% llvm-dwarfdump old
...
  Local Type Unit offsets [
    LocalTU[0]: 0x00000000
  ]
...
  Local Type Unit offsets [
    LocalTU[0]: 0x00000000  // indistinguishable from a valid offset within .debug_info
  ]

https://dwarfstd.org/issues/231013.1.html proposes that we use a
tombstone value instead to inform consumers. This patch implements the
idea. The second LocalTU entry will now use 0xffffffff.

https://reviews.llvm.org/D84825 has a TODO that we should switch the
tombstone value for most .debug_* sections to UINT64_MAX. We have
postponed the change for more than three years for consumers to migrate.
At some point we shall make the change, so that .debug_names is no long
different from other debug section that is not .debug_loc/.debug_ranges.


Full diff: https://github.com/llvm/llvm-project/pull/74686.diff

3 Files Affected:

  • (modified) lld/ELF/InputSection.cpp (+17-7)
  • (modified) lld/test/ELF/debug-dead-reloc-32.s (+11)
  • (modified) lld/test/ELF/debug-dead-reloc.s (+22-2)
diff --git a/lld/ELF/InputSection.cpp b/lld/ELF/InputSection.cpp
index 145d55d9a0a4b..5a62b14f4c973 100644
--- a/lld/ELF/InputSection.cpp
+++ b/lld/ELF/InputSection.cpp
@@ -898,10 +898,16 @@ void InputSection::relocateNonAlloc(uint8_t *buf, ArrayRef<RelTy> rels) {
   const TargetInfo &target = *elf::target;
   const auto emachine = config->emachine;
   const bool isDebug = isDebugSection(*this);
-  const bool isDebugLocOrRanges =
-      isDebug && (name == ".debug_loc" || name == ".debug_ranges");
   const bool isDebugLine = isDebug && name == ".debug_line";
-  std::optional<uint64_t> tombstone;
+  std::optional<uint64_t> tombstone, debugTombstone;
+  if (isDebug) {
+    if (name == ".debug_loc" || name == ".debug_ranges")
+      debugTombstone = 1;
+    else if (name == ".debug_names")
+      debugTombstone = UINT64_MAX; // DWARF Issue 231013.1
+    else
+      debugTombstone = 0;
+  }
   for (const auto &patAndValue : llvm::reverse(config->deadRelocInNonAlloc))
     if (patAndValue.first.match(this->name)) {
       tombstone = patAndValue.second;
@@ -954,8 +960,7 @@ void InputSection::relocateNonAlloc(uint8_t *buf, ArrayRef<RelTy> rels) {
       return;
     }
 
-    if (tombstone ||
-        (isDebug && (type == target.symbolicRel || expr == R_DTPREL))) {
+    if (tombstone || (isDebug && (expr == R_ABS || expr == R_DTPREL))) {
       // Resolve relocations in .debug_* referencing (discarded symbols or ICF
       // folded section symbols) to a tombstone value. Resolving to addend is
       // unsatisfactory because the result address range may collide with a
@@ -986,8 +991,13 @@ void InputSection::relocateNonAlloc(uint8_t *buf, ArrayRef<RelTy> rels) {
       // value. Enable -1 in a future release.
       if (!sym.getOutputSection() || (ds && ds->folded && !isDebugLine)) {
         // If -z dead-reloc-in-nonalloc= is specified, respect it.
-        const uint64_t value = tombstone ? SignExtend64<bits>(*tombstone)
-                                         : (isDebugLocOrRanges ? 1 : 0);
+        uint64_t value;
+        if (tombstone)
+          value = SignExtend64<bits>(*tombstone);
+        else if (type == target.symbolicRel)
+          value = *debugTombstone;
+        else // .debug_names uses 32-bit local TU offsets for DWARF32
+          value = static_cast<uint32_t>(*debugTombstone);
         target.relocateNoSym(bufLoc, type, value);
         continue;
       }
diff --git a/lld/test/ELF/debug-dead-reloc-32.s b/lld/test/ELF/debug-dead-reloc-32.s
index b2708a744f288..7bf30b98e6890 100644
--- a/lld/test/ELF/debug-dead-reloc-32.s
+++ b/lld/test/ELF/debug-dead-reloc-32.s
@@ -13,6 +13,8 @@
 # CHECK-NEXT:  0000 01000000
 # CHECK-NEXT: Contents of section .debug_addr:
 # CHECK-NEXT:  0000 00000000
+# CHECK-NEXT: Contents of section .debug_names:
+# CHECK-NEXT:  0000 ffffffff
 
 .section .text.1,"axe"
   .byte 0
@@ -27,3 +29,12 @@
 ## Resolved to UINT32_C(0), with the addend ignored.
 .section .debug_addr
   .long .text.1+8
+
+.section  .debug_info,"eG",@progbits,5657452045627120676,comdat
+.Ltu_begin0:
+
+.section .debug_names
+## .debug_names may reference a local type unit defined in a COMDAT .debug_info
+## section (-g -gpubnames -fdebug-types-section). If the referenced section is
+## non-prevailing, resolve to UINT32_MAX.
+.long .Ltu_begin0
diff --git a/lld/test/ELF/debug-dead-reloc.s b/lld/test/ELF/debug-dead-reloc.s
index fcf53205079ed..627513c2672c4 100644
--- a/lld/test/ELF/debug-dead-reloc.s
+++ b/lld/test/ELF/debug-dead-reloc.s
@@ -16,9 +16,12 @@
 # CHECK:      Contents of section .debug_addr:
 # CHECK-NEXT:  0000 {{.*}}000 00000000 {{.*}}000 00000000
 # CHECK-NEXT:  0010 00000000  00000000 {{.*}}000 00000000
+# CHECK:      Contents of section .debug_names:
+# CHECK-NEXT:  0000 00000000 00000000 00000000 ffffffff .
+# CHECK-NEXT:  0010 ffffffff ffffffff                   .
 # CHECK:      Contents of section .debug_foo:
-# CHECK-NEXT:  0000 00000000 00000000 08000000 00000000
-# CHECK-NEXT:  0010 00000000 00000000 08000000 00000000
+# CHECK-NEXT:  0000 00000000 00000000 00000000 00000000
+# CHECK-NEXT:  0010 00000000 00000000 00000000 00000000
 
 # REL:      Relocations [
 # REL-NEXT:   .rela.text {
@@ -38,6 +41,12 @@
 # REL-NEXT:     0x10 R_X86_64_NONE - 0x18
 # REL-NEXT:     0x18 R_X86_64_64 group 0x20
 # REL-NEXT:   }
+# REL-NEXT:   .rela.debug_names {
+# REL-NEXT:     0x0 R_X86_64_32 .debug_info 0x0
+# REL-NEXT:     0x4 R_X86_64_64 .debug_info 0x0
+# REL-NEXT:     0xC R_X86_64_NONE - 0x0
+# REL-NEXT:     0x10 R_X86_64_NONE - 0x0
+# REL-NEXT:   }
 # REL-NEXT:   .rela.debug_foo {
 # REL-NEXT:     0x0 R_X86_64_NONE - 0x8
 # REL-NEXT:     0x8 R_X86_64_NONE - 0x8
@@ -77,6 +86,17 @@ group:
 ## resolved to the prevailing copy.
   .quad group+32
 
+.section  .debug_info,"G",@progbits,5657452045627120676,comdat
+.Ltu_begin0:
+
+.section .debug_names
+## .debug_names may reference a local type unit defined in a COMDAT .debug_info
+## section (-g -gpubnames -fdebug-types-section). If the referenced section is
+## non-prevailing, resolve to UINT32_MAX.
+.long .Ltu_begin0
+## ... or UINT64_MAX for DWARF64.
+.quad .Ltu_begin0
+
 .section .debug_foo
   .quad .text.1+8
 

lld/ELF/InputSection.cpp Outdated Show resolved Hide resolved
lld/ELF/InputSection.cpp Outdated Show resolved Hide resolved
MaskRay added a commit that referenced this pull request Dec 7, 2023
Add 32-bit test for -z dead-reloc-in-nonalloc= and add tests for a
non-x86 64-bit (x86-64 is unique in discerning signed/unsigned 32-bit
absolute relocations (R_X86_64_32/R_X86_64_32S).
AArch64/PPC64/RISC-V/etc don't have the distinction). Having a test will
improve coverage for #74686
`clang -g -gpubnames -fdebug-types-section` now emits .debug_names
section with references to local type unit entries defined in COMDAT
.debug_info sections.

```
.section        .debug_info,"G",@progbits,5657452045627120676,comdat
.Ltu_begin0:
...

.section        .debug_names,"",@progbits
...
// DWARF32
.long   .Ltu_begin0                     # Type unit 0
// DWARF64
// .long   .Ltu_begin0                     # Type unit 0
```

When `.Ltu_begin0` is relative to a non-prevailing .debug_info section,
the relocation resolves to 0, which is a valid offset within the
.debug_info section.

```
cat > a.cc <<e
struct A { int x; };
inline A foo() { return {1}; }
int main() { foo(); }
e
cat > b.cc <<e
struct A { int x; };
inline A foo() { return {1}; }
void use() { foo(); }
e
clang++ -g -gpubnames -fdebug-types-section -fuse-ld=lld a.cc b.cc -o old
```
```
% llvm-dwarfdump old
...
  Local Type Unit offsets [
    LocalTU[0]: 0x00000000
  ]
...
  Local Type Unit offsets [
    LocalTU[0]: 0x00000000  // indistinguishable from a valid offset within .debug_info
  ]
```

https://dwarfstd.org/issues/231013.1.html proposes that we use a
tombstone value instead to inform consumers. This patch implements the
idea. The second LocalTU entry will now use 0xffffffff.

https://reviews.llvm.org/D84825 has a TODO that we should switch the
tombstone value for most `.debug_*` sections to UINT64_MAX. We have
postponed the change for more than three years for consumers to migrate.
At some point we shall make the change, so that .debug_names is no long
different from other debug section that is not .debug_loc/.debug_ranges.
@MaskRay
Copy link
Member Author

MaskRay commented Dec 7, 2023

Rebase to resolve a test conflict. No other change to the two commits.

@ayermolo
Copy link
Contributor

ayermolo commented Dec 8, 2023

Thank you for putting up a PR. Looking at it I am not sure what the fundamental differences are between your patch and mine. It looks like it is somewhat where my patch has started where additional relocation checks remained, and consequently removed per reviewers suggestions. I thought my PR was going in the right direction per your comment #70701 (review)
Can you explain why the direction you chose to go with is better?

Thanks.

@MaskRay
Copy link
Member Author

MaskRay commented Dec 8, 2023

Thank you for putting up a PR. Looking at it I am not sure what the fundamental differences are between your patch and mine. It looks like it is somewhat where my patch has started where additional relocation checks remained, and consequently removed per reviewers suggestions. I thought my PR was going in the right direction per your comment #70701 (review) Can you explain why the direction you chose to go with is better?

Thanks.

Hi, yes. I started as trying to preserve the RelExpr constraint so that if there is a R_PC (erroneous in a non-ALLOC section), -z dead-reloc-in-nonalloc= does not lose the error. I think other than this difference, functionality-wise the two patches are the same.

@ayermolo
Copy link
Contributor

ayermolo commented Dec 15, 2023

Thank you for putting up a PR. Looking at it I am not sure what the fundamental differences are between your patch and mine. It looks like it is somewhat where my patch has started where additional relocation checks remained, and consequently removed per reviewers suggestions. I thought my PR was going in the right direction per your comment #70701 (review) Can you explain why the direction you chose to go with is better?
Thanks.

Hi, yes. I started as trying to preserve the RelExpr constraint so that if there is a R_PC (erroneous in a non-ALLOC section), -z dead-reloc-in-nonalloc= does not lose the error. I think other than this difference, functionality-wise the two patches are the same.

Thank you for your explanation. If the two patches are functionally equivalent, I'd suggest us continuing in the original PR instead.

@MaskRay
Copy link
Member Author

MaskRay commented Dec 15, 2023

reviews.llvm.org/D84825 has a TODO that we should switch the
tombstone value for most .debug_* sections to UINT64_MAX.

Your input has been valuable in guiding this enhancement. I truly appreciate the .debug_names work you initiated and put in this linker patch.
Now, the code here isn't too complex but the two implementations have some nuances, though not every aspect is tested.

  1. What shall we do for a non-R_ABS tombsone value? [LLD] Tombstone LocalTU entry in .debug_names #70701 will apply the tombstone value and [ELF] Change .debug_names tombstone value to UINT32_MAX/UINT64_MAX #74686 won't.

I feel that we can reject it. My initial -z dead-reloc-in-nonalloc= patch (which I do want to support non-symbolic R_ABS relocations) did not consider it as it would complicate the code.
Now we have to handle it. I feel that if (tombstone && (expr == R_ABS || expr == R_DTPREL)) { as in this patch is better.

  1. What shall we do for a short R_ABS tombstone value?

#70701 uses if (!tombstone && type == target.symbolicRel).
With some thoughts, I believe we should use if (emachine == EM_X86_64 && type == R_X86_64_32) to make it clear it's x86-64 specific to differentiate 32-bit signed/unsigned relocation types.
If we eventually make relocateNonAlloc target-specific (as I did for relocateAlloc), the R_X86_64_32 code can be make X86_64.cpp specific.

Considering the necessity for certain improvements, I've put efforts to write a detailed description and tests (including the test enhancement to debug-dead-reloc.s and a 32-bit test (which I omitted for my initial -z dead-reloc-in-nonalloc= patch)) that align better with the existing convention of these modified tests.
For example, I think it is important to call out the TODO from reviews.llvm.org/D84825 in the description.

I'd be more than happy to include you as a co-author acknowledging your contribution.
If you'd prefer any specific recognition or involvement moving forward, please let me know. Your input has been valuable in guiding this enhancement.

@ayermolo
Copy link
Contributor

reviews.llvm.org/D84825 has a TODO that we should switch the
tombstone value for most .debug_* sections to UINT64_MAX.

Your input has been valuable in guiding this enhancement. I truly appreciate the .debug_names work you initiated and put in this linker patch. Now, the code here isn't too complex but the two implementations have some nuances, though not every aspect is tested.

  1. What shall we do for a non-R_ABS tombsone value? [LLD] Tombstone LocalTU entry in .debug_names #70701 will apply the tombstone value and [ELF] Change .debug_names tombstone value to UINT32_MAX/UINT64_MAX #74686 won't.

I feel that we can reject it. My initial -z dead-reloc-in-nonalloc= patch (which I do want to support non-symbolic R_ABS relocations) did not consider it as it would complicate the code. Now we have to handle it. I feel that if (tombstone && (expr == R_ABS || expr == R_DTPREL)) { as in this patch is better.

  1. What shall we do for a short R_ABS tombstone value?

#70701 uses if (!tombstone && type == target.symbolicRel). With some thoughts, I believe we should use if (emachine == EM_X86_64 && type == R_X86_64_32) to make it clear it's x86-64 specific to differentiate 32-bit signed/unsigned relocation types. If we eventually make relocateNonAlloc target-specific (as I did for relocateAlloc), the R_X86_64_32 code can be make X86_64.cpp specific.

Considering the necessity for certain improvements, I've put efforts to write a detailed description and tests (including the test enhancement to debug-dead-reloc.s and a 32-bit test (which I omitted for my initial -z dead-reloc-in-nonalloc= patch)) that align better with the existing convention of these modified tests. For example, I think it is important to call out the TODO from reviews.llvm.org/D84825 in the description.

I'd be more than happy to include you as a co-author acknowledging your contribution. If you'd prefer any specific recognition or involvement moving forward, please let me know. Your input has been valuable in guiding this enhancement.

Thank you for explanation. I think it would be valuable for me, and for the health of the project, if next time we could continue discussion in the original patch I think it would benefit everyone. :)

@ayermolo
Copy link
Contributor

@dwblaikie Are you good with how this patch looks? It would be great to close on it soon. It's the last piece missing for full enablement of .debug_names + types. :)

Copy link
Contributor

@ayermolo ayermolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need 64 bit test?

@MaskRay
Copy link
Member Author

MaskRay commented Dec 22, 2023

Thank you for explanation. I think it would be valuable for me, and for the health of the project, if next time we could continue discussion in the original patch I think it would benefit everyone. :)

Thank you for the review and your input. Quality and cohesion are paramount for the project's success.
I consider how we can balance both aspects moving forward to ensure a smoother collaborative process while upholding the project's standards.
I sometimes suggest the description but sometimes it might be better to have a different version.

Do we need 64 bit test?

This version includes both 64-bit and 32-bit tests:)

@MaskRay MaskRay merged commit 26ddf4e into llvm:main Dec 22, 2023
4 checks passed
@MaskRay MaskRay deleted the lld-debug-names branch December 22, 2023 02:59
@bevin-hansson
Copy link
Contributor

Hi @MaskRay ! This patch seems to have uncovered some odd behavior in our downstream, and I'm having a hard time understanding how this tombstoning is supposed to work.

Given the following conditions:

if (tombstone && (expr == R_ABS || expr == R_DTPREL)) {
  if (!sym.getOutputSection() || (ds && ds->folded && !isDebugLine)) {

It seems to me that relocations in debug sections for which the symbols are defined and absolute will always be tombstoned. Is this really the correct behavior? This means that if you have a symbol definition in the debug info from a source file, but where the symbol is instead defined to an absolute value through a linker script or command line option, the debug info relocation will be tombstoned even though the symbol existed. This situation can happen if the symbol in the source file is weak.

How is this intended to work, exactly?

@MaskRay
Copy link
Member Author

MaskRay commented Jan 23, 2024

Hi @MaskRay ! This patch seems to have uncovered some odd behavior in our downstream, and I'm having a hard time understanding how this tombstoning is supposed to work.

Given the following conditions:

if (tombstone && (expr == R_ABS || expr == R_DTPREL)) {
  if (!sym.getOutputSection() || (ds && ds->folded && !isDebugLine)) {

It seems to me that relocations in debug sections for which the symbols are defined and absolute will always be tombstoned. Is this really the correct behavior? This means that if you have a symbol definition in the debug info from a source file, but where the symbol is instead defined to an absolute value through a linker script or command line option, the debug info relocation will be tombstoned even though the symbol existed. This situation can happen if the symbol in the source file is weak.

How is this intended to work, exactly?

Do you have more information about a .debug_* section referencing a SHN_ABS symbol?

I believe a SHN_ABS symbol has never been considered for InputSection::relocateNonAlloc. I personally haven't seen such uses.

Before this patch, the code did made it work in the absence of -z dead-reloc-in-nonalloc=. However, that seems unintentional. I think it makes sense to support it, but we need to move away from !sym.getOutputSection() to discern SHN_ABS and a symbol whose section has been discarded.

@bevin-hansson
Copy link
Contributor

The case where it happens (at least for us) is something like

__attribute__((weak) int symbol;

int main() {
  return (int)&symbol;
}

$ clang foo.c -fuse-ld=lld -Xlinker --defsym=symbol=1

In this case, the value of the symbol becomes 1, but the relocation for the value of the symbol in the .debug_info section gets a 0.

MaskRay added a commit that referenced this pull request Jan 24, 2024
…on-SHF_ALLOC sections (#79238)

A SHN_ABS symbol has never been considered for
InputSection::relocateNonAlloc.
Before #74686, the code did made it work in the absence of `-z
dead-reloc-in-nonalloc=`.
There is now a report about such SHN_ABS uses

(#74686 (comment))
and I think it makes sense for non-SHF_ALLOC to support SHN_ABS, like
SHF_ALLOC sections do.

```
// clang -g
__attribute__((weak)) int symbol;
int *foo() { return &symbol; }

0x00000023:   DW_TAG_variable [2]   (0x0000000c)
                ...
                DW_AT_location [DW_FORM_exprloc]        (DW_OP_addrx 0x0)

```

.debug_addr references `symbol`, which can be redefined by a symbol
assignment or --defsym to become a SHN_ABS symbol.

The problem is that `!sym.getOutputSection()` cannot discern SHN_ABS
from a symbol whose section has been discarded. Since commit
1981b1b, a symbol relative to a
discarded section is changed to `Undefined`, so the `SHN_ABS` check
become trivial.

We currently apply tombstone for a relocation referencing
`SharedSymbol`. This patch does not change the behavior.
llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Jan 24, 2024
…on-SHF_ALLOC sections (llvm#79238)

A SHN_ABS symbol has never been considered for
InputSection::relocateNonAlloc.
Before llvm#74686, the code did made it work in the absence of `-z
dead-reloc-in-nonalloc=`.
There is now a report about such SHN_ABS uses

(llvm#74686 (comment))
and I think it makes sense for non-SHF_ALLOC to support SHN_ABS, like
SHF_ALLOC sections do.

```
// clang -g
__attribute__((weak)) int symbol;
int *foo() { return &symbol; }

0x00000023:   DW_TAG_variable [2]   (0x0000000c)
                ...
                DW_AT_location [DW_FORM_exprloc]        (DW_OP_addrx 0x0)

```

.debug_addr references `symbol`, which can be redefined by a symbol
assignment or --defsym to become a SHN_ABS symbol.

The problem is that `!sym.getOutputSection()` cannot discern SHN_ABS
from a symbol whose section has been discarded. Since commit
1981b1b, a symbol relative to a
discarded section is changed to `Undefined`, so the `SHN_ABS` check
become trivial.

We currently apply tombstone for a relocation referencing
`SharedSymbol`. This patch does not change the behavior.

(cherry picked from commit 8abf8d1)
tstellar pushed a commit that referenced this pull request Jan 26, 2024
…on-SHF_ALLOC sections (#79238)

A SHN_ABS symbol has never been considered for
InputSection::relocateNonAlloc.
Before #74686, the code did made it work in the absence of `-z
dead-reloc-in-nonalloc=`.
There is now a report about such SHN_ABS uses

(#74686 (comment))
and I think it makes sense for non-SHF_ALLOC to support SHN_ABS, like
SHF_ALLOC sections do.

```
// clang -g
__attribute__((weak)) int symbol;
int *foo() { return &symbol; }

0x00000023:   DW_TAG_variable [2]   (0x0000000c)
                ...
                DW_AT_location [DW_FORM_exprloc]        (DW_OP_addrx 0x0)

```

.debug_addr references `symbol`, which can be redefined by a symbol
assignment or --defsym to become a SHN_ABS symbol.

The problem is that `!sym.getOutputSection()` cannot discern SHN_ABS
from a symbol whose section has been discarded. Since commit
1981b1b, a symbol relative to a
discarded section is changed to `Undefined`, so the `SHN_ABS` check
become trivial.

We currently apply tombstone for a relocation referencing
`SharedSymbol`. This patch does not change the behavior.

(cherry picked from commit 8abf8d1)
tstellar pushed a commit to tstellar/llvm-project that referenced this pull request Feb 14, 2024
…on-SHF_ALLOC sections (llvm#79238)

A SHN_ABS symbol has never been considered for
InputSection::relocateNonAlloc.
Before llvm#74686, the code did made it work in the absence of `-z
dead-reloc-in-nonalloc=`.
There is now a report about such SHN_ABS uses

(llvm#74686 (comment))
and I think it makes sense for non-SHF_ALLOC to support SHN_ABS, like
SHF_ALLOC sections do.

```
// clang -g
__attribute__((weak)) int symbol;
int *foo() { return &symbol; }

0x00000023:   DW_TAG_variable [2]   (0x0000000c)
                ...
                DW_AT_location [DW_FORM_exprloc]        (DW_OP_addrx 0x0)

```

.debug_addr references `symbol`, which can be redefined by a symbol
assignment or --defsym to become a SHN_ABS symbol.

The problem is that `!sym.getOutputSection()` cannot discern SHN_ABS
from a symbol whose section has been discarded. Since commit
1981b1b, a symbol relative to a
discarded section is changed to `Undefined`, so the `SHN_ABS` check
become trivial.

We currently apply tombstone for a relocation referencing
`SharedSymbol`. This patch does not change the behavior.

(cherry picked from commit 8abf8d1)
tstellar pushed a commit to tstellar/llvm-project that referenced this pull request Feb 14, 2024
…on-SHF_ALLOC sections (llvm#79238)

A SHN_ABS symbol has never been considered for
InputSection::relocateNonAlloc.
Before llvm#74686, the code did made it work in the absence of `-z
dead-reloc-in-nonalloc=`.
There is now a report about such SHN_ABS uses

(llvm#74686 (comment))
and I think it makes sense for non-SHF_ALLOC to support SHN_ABS, like
SHF_ALLOC sections do.

```
// clang -g
__attribute__((weak)) int symbol;
int *foo() { return &symbol; }

0x00000023:   DW_TAG_variable [2]   (0x0000000c)
                ...
                DW_AT_location [DW_FORM_exprloc]        (DW_OP_addrx 0x0)

```

.debug_addr references `symbol`, which can be redefined by a symbol
assignment or --defsym to become a SHN_ABS symbol.

The problem is that `!sym.getOutputSection()` cannot discern SHN_ABS
from a symbol whose section has been discarded. Since commit
1981b1b, a symbol relative to a
discarded section is changed to `Undefined`, so the `SHN_ABS` check
become trivial.

We currently apply tombstone for a relocation referencing
`SharedSymbol`. This patch does not change the behavior.

(cherry picked from commit 8abf8d1)
tstellar pushed a commit to tstellar/llvm-project that referenced this pull request Feb 14, 2024
…on-SHF_ALLOC sections (llvm#79238)

A SHN_ABS symbol has never been considered for
InputSection::relocateNonAlloc.
Before llvm#74686, the code did made it work in the absence of `-z
dead-reloc-in-nonalloc=`.
There is now a report about such SHN_ABS uses

(llvm#74686 (comment))
and I think it makes sense for non-SHF_ALLOC to support SHN_ABS, like
SHF_ALLOC sections do.

```
// clang -g
__attribute__((weak)) int symbol;
int *foo() { return &symbol; }

0x00000023:   DW_TAG_variable [2]   (0x0000000c)
                ...
                DW_AT_location [DW_FORM_exprloc]        (DW_OP_addrx 0x0)

```

.debug_addr references `symbol`, which can be redefined by a symbol
assignment or --defsym to become a SHN_ABS symbol.

The problem is that `!sym.getOutputSection()` cannot discern SHN_ABS
from a symbol whose section has been discarded. Since commit
1981b1b, a symbol relative to a
discarded section is changed to `Undefined`, so the `SHN_ABS` check
become trivial.

We currently apply tombstone for a relocation referencing
`SharedSymbol`. This patch does not change the behavior.

(cherry picked from commit 8abf8d1)
tstellar pushed a commit to tstellar/llvm-project that referenced this pull request Feb 14, 2024
…on-SHF_ALLOC sections (llvm#79238)

A SHN_ABS symbol has never been considered for
InputSection::relocateNonAlloc.
Before llvm#74686, the code did made it work in the absence of `-z
dead-reloc-in-nonalloc=`.
There is now a report about such SHN_ABS uses

(llvm#74686 (comment))
and I think it makes sense for non-SHF_ALLOC to support SHN_ABS, like
SHF_ALLOC sections do.

```
// clang -g
__attribute__((weak)) int symbol;
int *foo() { return &symbol; }

0x00000023:   DW_TAG_variable [2]   (0x0000000c)
                ...
                DW_AT_location [DW_FORM_exprloc]        (DW_OP_addrx 0x0)

```

.debug_addr references `symbol`, which can be redefined by a symbol
assignment or --defsym to become a SHN_ABS symbol.

The problem is that `!sym.getOutputSection()` cannot discern SHN_ABS
from a symbol whose section has been discarded. Since commit
1981b1b, a symbol relative to a
discarded section is changed to `Undefined`, so the `SHN_ABS` check
become trivial.

We currently apply tombstone for a relocation referencing
`SharedSymbol`. This patch does not change the behavior.

(cherry picked from commit 8abf8d1)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants