Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DWARFVerifier] Fix debug_str_offsets DWARF version detection #81210

Conversation

felipepiovezan
Copy link
Contributor

The DWARF 5 debug_str_offsets section starts with a header, which must be skipped in order to access the underlying strps.

However, the verifier supports some pre-standardization version of this section (with the same section name), which does not have a header. In this case, the offsets start on the first byte of the section, although it's not clear where this is documented.

How does The DWARF verifier figure out which version to use? It manually reads the first header in debug_info and uses that. This is wrong when multiple debug_str_offset sections have been linked together, in particular it is wrong in the following two cases:

  1. A standard DWARF 4 object file (i.e. no debug_str_offsets) linked with a standard DWARF 5 object file.
  2. A non-standard DWARF 4 object file (i.e. containing the header-less debug_str_offsets section) linked with a standard DWARF 5 object file.

This patch provides a quick to fix case (1): we use the MaxVersion from the DWARFContext instead of reading it from the debug_info section manually. Since this is dealing with standard-conforming formats, which should be linked together without issues, the verifier must handle it.

Fixing case 2 would require a lot of rework, restructuring how each piece of the debug_str_offsets is visited and, since this is dealing with a non-standard format, is left for the future in case anyone cares enough about this case.

The DWARF 5 debug_str_offsets section starts with a header, which must be
skipped in order to access the underlying `strp`s.

However, the verifier supports some pre-standardization version of this section
(with the same section name), which does not have a header. In this case, the
offsets start on the first byte of the section, although it's not clear where
this is documented.

How does The DWARF verifier figure out which version to use? It manually reads
the **first** header in debug_info and uses that. This is wrong when multiple
debug_str_offset sections have been linked together, in particular it is wrong
in the following two cases:

1. A standard DWARF 4 object file (i.e. no debug_str_offsets) linked with a
standard DWARF 5 object file.
2. A non-standard DWARF 4 object file (i.e. containing the header-less
debug_str_offsets section) linked with a standard DWARF 5 object file.

This patch provides a quick to fix case (1): we use the `MaxVersion` from the
DWARFContext instead of reading it from the debug_info section manually. Since
this is dealing with standard-conforming formats, which should be linked
together without issues, the verifier must handle it.

Fixing case 2 would require a lot of rework, restructuring how each piece of the
debug_str_offsets is visited and, since this is dealing with a non-standard
format, is left for the future in case anyone cares enough about this case.
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 9, 2024

@llvm/pr-subscribers-debuginfo

Author: Felipe de Azevedo Piovezan (felipepiovezan)

Changes

The DWARF 5 debug_str_offsets section starts with a header, which must be skipped in order to access the underlying strps.

However, the verifier supports some pre-standardization version of this section (with the same section name), which does not have a header. In this case, the offsets start on the first byte of the section, although it's not clear where this is documented.

How does The DWARF verifier figure out which version to use? It manually reads the first header in debug_info and uses that. This is wrong when multiple debug_str_offset sections have been linked together, in particular it is wrong in the following two cases:

  1. A standard DWARF 4 object file (i.e. no debug_str_offsets) linked with a standard DWARF 5 object file.
  2. A non-standard DWARF 4 object file (i.e. containing the header-less debug_str_offsets section) linked with a standard DWARF 5 object file.

This patch provides a quick to fix case (1): we use the MaxVersion from the DWARFContext instead of reading it from the debug_info section manually. Since this is dealing with standard-conforming formats, which should be linked together without issues, the verifier must handle it.

Fixing case 2 would require a lot of rework, restructuring how each piece of the debug_str_offsets is visited and, since this is dealing with a non-standard format, is left for the future in case anyone cares enough about this case.


Full diff: https://github.com/llvm/llvm-project/pull/81210.diff

3 Files Affected:

  • (modified) llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h (+2-1)
  • (modified) llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp (+14-11)
  • (added) llvm/test/tools/llvm-dwarfdump/X86/debug-str-offsets-mixed-dwarf-4-5.yaml (+57)
diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
index ea73664b1e46ca..6c5df409fe6de8 100644
--- a/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
+++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFVerifier.h
@@ -361,7 +361,8 @@ class DWARFVerifier {
   /// \returns true if the .debug_line verifies successfully, false otherwise.
   bool handleDebugStrOffsets();
   bool verifyDebugStrOffsets(
-      StringRef SectionName, const DWARFSection &Section, StringRef StrData,
+      uint8_t MaxVersion, StringRef SectionName, const DWARFSection &Section,
+      StringRef StrData,
       void (DWARFObject::*)(function_ref<void(const DWARFSection &)>) const);
 
   /// Emits any aggregate information collected, depending on the dump options
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
index 2124ff835c5727..805f40af217e1e 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
@@ -1882,29 +1882,31 @@ bool DWARFVerifier::handleDebugStrOffsets() {
   const DWARFObject &DObj = DCtx.getDWARFObj();
   bool Success = true;
   Success &= verifyDebugStrOffsets(
-      ".debug_str_offsets.dwo", DObj.getStrOffsetsDWOSection(),
-      DObj.getStrDWOSection(), &DWARFObject::forEachInfoDWOSections);
+      DCtx.getMaxDWOVersion(), ".debug_str_offsets.dwo",
+      DObj.getStrOffsetsDWOSection(), DObj.getStrDWOSection(),
+      &DWARFObject::forEachInfoDWOSections);
   Success &= verifyDebugStrOffsets(
-      ".debug_str_offsets", DObj.getStrOffsetsSection(), DObj.getStrSection(),
-      &DWARFObject::forEachInfoSections);
+      DCtx.getMaxVersion(), ".debug_str_offsets", DObj.getStrOffsetsSection(),
+      DObj.getStrSection(), &DWARFObject::forEachInfoSections);
   return Success;
 }
 
 bool DWARFVerifier::verifyDebugStrOffsets(
-    StringRef SectionName, const DWARFSection &Section, StringRef StrData,
+    uint8_t MaxVersion, StringRef SectionName, const DWARFSection &Section,
+    StringRef StrData,
     void (DWARFObject::*VisitInfoSections)(
         function_ref<void(const DWARFSection &)>) const) {
   const DWARFObject &DObj = DCtx.getDWARFObj();
-  uint16_t InfoVersion = 0;
-  DwarfFormat InfoFormat = DwarfFormat::DWARF32;
+
+  std::optional<DwarfFormat> MaybeInfoFormat;
   (DObj.*VisitInfoSections)([&](const DWARFSection &S) {
-    if (InfoVersion)
+    if (MaybeInfoFormat)
       return;
     DWARFDataExtractor DebugInfoData(DObj, S, DCtx.isLittleEndian(), 0);
     uint64_t Offset = 0;
-    InfoFormat = DebugInfoData.getInitialLength(&Offset).second;
-    InfoVersion = DebugInfoData.getU16(&Offset);
+    MaybeInfoFormat = DebugInfoData.getInitialLength(&Offset).second;
   });
+  DwarfFormat InfoFormat = MaybeInfoFormat.value_or(DwarfFormat::DWARF32);
 
   DWARFDataExtractor DA(DObj, Section, DCtx.isLittleEndian(), 0);
 
@@ -1915,7 +1917,8 @@ bool DWARFVerifier::verifyDebugStrOffsets(
     DwarfFormat Format;
     uint64_t Length;
     uint64_t StartOffset = C.tell();
-    if (InfoVersion == 4) {
+    if (MaxVersion == 4) {
+      // Pre-standardization debug_str_offsets had no header.
       Format = InfoFormat;
       Length = DA.getData().size();
       NextUnit = C.tell() + Length;
diff --git a/llvm/test/tools/llvm-dwarfdump/X86/debug-str-offsets-mixed-dwarf-4-5.yaml b/llvm/test/tools/llvm-dwarfdump/X86/debug-str-offsets-mixed-dwarf-4-5.yaml
new file mode 100644
index 00000000000000..d10460896171d6
--- /dev/null
+++ b/llvm/test/tools/llvm-dwarfdump/X86/debug-str-offsets-mixed-dwarf-4-5.yaml
@@ -0,0 +1,57 @@
+# RUN: yaml2obj %s -o %t.o
+# RUN: llvm-dwarfdump -debug-str-offsets -verify %t.o | FileCheck %s
+
+# CHECK: Verifying .debug_str_offsets...
+# CHECK: No errors
+
+# Check that when mixing standard DWARF 4 debug information with standard DWARF
+# 5 debug information, the verifier correctly interprets the debug_str_offsets
+# section as a standards-conforming DWARF 5 section.
+
+--- !ELF
+FileHeader:
+  Class: ELFCLASS64
+  Data:  ELFDATA2LSB
+  Type:  ET_EXEC
+DWARF:
+  debug_str:
+    - 'cu1'
+    - 'cu2'
+  debug_str_offsets:
+    - Offsets:
+        - 0x0
+  debug_abbrev:
+    - Table:
+        - Code:            0x1
+          Tag:             DW_TAG_compile_unit
+          Children:        DW_CHILDREN_no
+          Attributes:
+            - Attribute:       DW_AT_name
+              Form:            DW_FORM_strp
+        - Code:            0x2
+          Tag:             DW_TAG_compile_unit
+          Children:        DW_CHILDREN_no
+          Attributes:
+            - Attribute:       DW_AT_name
+              Form:            DW_FORM_strx1
+            - Attribute:       DW_AT_str_offsets_base
+              Form:            DW_FORM_sec_offset
+  debug_info:
+    - Version:         4
+      AbbrevTableID:   0
+      AbbrOffset:      0x0
+      AddrSize:        8
+      Entries:
+        - AbbrCode:        0x1
+          Values:
+            - Value:           0x4
+    - Version:         5
+      UnitType:        DW_UT_compile
+      AbbrOffset:      0x0
+      AddrSize:        8
+      AbbrevTableID:   0
+      Entries:
+        - AbbrCode:        0x2
+          Values:
+            - Value:           0x0
+            - Value:           0x8 # str offsets base

@felipepiovezan
Copy link
Contributor Author

@dwblaikie I believe you originally added this piece of the verifier.

We have lots of downstream tests failing because they link "plain old dwarf 4" with dwarf 5, and the verifier is incorrectly interpreting the debug_str_offsets section as DWARF 4, so I'd appreciate it if you could have a look at this.

@dwblaikie
Copy link
Collaborator

The legacy DWARFv4 Split DWARF mode str offsets format can only appear in .dwo files (it was/is never emitted into .o files - many of the Split DWARF features that were generalized to be applicable to .o files (& to non-split DWARF entirely) were not done in the prototype/legacy version, only in the DWARFv5 standard version)

And a DWP file can't contain mixed 4/5 Split DWARF anyway - so I think we could leave the "use the first CU version" in place, but only do it for .debug_str_offsets.dwo - and always use the header-ful parsing of .debug_str_offsets (nondwo) - it'll be future proof and can mix/match versions because it'll always have the length+version encoding.

Does that make sense/work?

@felipepiovezan
Copy link
Contributor Author

The legacy DWARFv4 Split DWARF mode str offsets format can only appear in .dwo files (it was/is never emitted into .o files - many of the Split DWARF features that were generalized to be applicable to .o files (& to non-split DWARF entirely) were not done in the prototype/legacy version, only in the DWARFv5 standard version)

And a DWP file can't contain mixed 4/5 Split DWARF anyway - so I think we could leave the "use the first CU version" in place, but only do it for .debug_str_offsets.dwo - and always use the header-ful parsing of .debug_str_offsets (nondwo) - it'll be future proof and can mix/match versions because it'll always have the length+version encoding.

Does that make sense/work?

I think this makes sense, thank you, let me give this a try and update the patch!
DWARFVerifier::handleDebugStrOffsets calls DWARFVerifier::verifyDebugStrOffsets twice, once for the dwo version and once for the non-dwo version; that might be the place we can make this distinction.

@felipepiovezan
Copy link
Contributor Author

Since I am essentially going to implement a new approach and will need to redo the commit message / pr message, I will close this and open a new PR.

felipepiovezan added a commit to felipepiovezan/llvm-project that referenced this pull request Feb 9, 2024
The DWARF 5 debug_str_offsets section starts with a header, which must be
skipped in order to access the underlying `strp`s.

However, the verifier supports some pre-standardization version of this section
(with the same section name), which does not have a header. In this case, the
offsets start on the first byte of the section, although it's not clear where
this is documented.

How does The DWARF verifier figure out which version to use? It manually reads
the **first** header in debug_info and uses that. This is wrong when multiple
debug_str_offset sections have been linked together, in particular it is wrong
in the following two cases:

1. A standard DWARF 4 object file (i.e. no debug_str_offsets) linked with a
standard DWARF 5 object file.
2. A non-standard DWARF 4 object file (i.e. containing the header-less
debug_str_offsets section) linked with a standard DWARF 5 object file.

Based on discussions in llvm#81210,
the legacy version is only possible with dwo files, and dwo files cannot mix the
legacy version with the dwarf 5 version. As such, we change the verifier to only
check the debug_info header in the case of dwo files. If it sees a dwarf 4
version, it handles it the legacy way.

Note: the modified test _never_ worked to being with. It was emitting the error
message by accident, because it treated the "legacy" dwarf 4 section as a dwarf
5 section. To see why, simply note that the test contained no `debug_info.dwo`
sections, so the call to DWARFObject::forEachInfoDWOSections was doing nothing.
felipepiovezan added a commit that referenced this pull request Feb 12, 2024
The DWARF 5 debug_str_offsets section starts with a header, which must
be skipped in order to access the underlying `strp`s.

However, the verifier supports some pre-standardization version of this
section (with the same section name), which does not have a header. In
this case, the offsets start on the first byte of the section. More in
[1] and [2] about this legacy section.

How does The DWARF verifier figure out which version to use? It manually
reads the **first** header in debug_info and uses that. This is wrong
when multiple debug_str_offset sections have been linked together, in
particular it is wrong in the following two cases:

1. A standard DWARF 4 object file (i.e. no debug_str_offsets) linked
with a standard DWARF 5 object file.
2. A non-standard DWARF 4 object file (i.e. containing the header-less
debug_str_offsets section) linked with a standard DWARF 5 object file.

Based on discussions in #81210,
the legacy version is only possible with dwo files, and dwo files cannot
mix the legacy version with the dwarf 5 version. As such, we change the
verifier to only check the debug_info header in the case of dwo files.
If it sees a dwarf 4 version, it handles it the legacy way.

Note: the modified test was technically testing an unsupported
combination of dwarf version + non-dwo sections. To see why, simply note
that the test contained no `debug_info.dwo` sections, so the call to
DWARFObject::forEachInfoDWOSections was doing nothing. We were finding
the error through the "standard version", which shouldn't happen.

[1]: https://gcc.gnu.org/wiki/DebugFission 
[2]: https://gcc.gnu.org/wiki/DebugFissionDWP
felipepiovezan added a commit to felipepiovezan/llvm-project that referenced this pull request Feb 12, 2024
…1303)

The DWARF 5 debug_str_offsets section starts with a header, which must
be skipped in order to access the underlying `strp`s.

However, the verifier supports some pre-standardization version of this
section (with the same section name), which does not have a header. In
this case, the offsets start on the first byte of the section. More in
[1] and [2] about this legacy section.

How does The DWARF verifier figure out which version to use? It manually
reads the **first** header in debug_info and uses that. This is wrong
when multiple debug_str_offset sections have been linked together, in
particular it is wrong in the following two cases:

1. A standard DWARF 4 object file (i.e. no debug_str_offsets) linked
with a standard DWARF 5 object file.
2. A non-standard DWARF 4 object file (i.e. containing the header-less
debug_str_offsets section) linked with a standard DWARF 5 object file.

Based on discussions in llvm#81210,
the legacy version is only possible with dwo files, and dwo files cannot
mix the legacy version with the dwarf 5 version. As such, we change the
verifier to only check the debug_info header in the case of dwo files.
If it sees a dwarf 4 version, it handles it the legacy way.

Note: the modified test was technically testing an unsupported
combination of dwarf version + non-dwo sections. To see why, simply note
that the test contained no `debug_info.dwo` sections, so the call to
DWARFObject::forEachInfoDWOSections was doing nothing. We were finding
the error through the "standard version", which shouldn't happen.

[1]: https://gcc.gnu.org/wiki/DebugFission
[2]: https://gcc.gnu.org/wiki/DebugFissionDWP

(cherry picked from commit 20948df)
@felipepiovezan felipepiovezan deleted the felipe/dwarf-verifier-debug-str branch February 14, 2024 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants