[LLVM][DWARF] Chnage order for .debug_names abbrev print out #80229

ayermolo · 2024-02-01T01:49:21Z

This stems from the conversation in: #77457 (comment).
Right now the abbrev code is a combination of the DIE tag and other attributes.
In the future it will be changed to be an index. Since DenseSet does not
preserve order, added a sort based on abbrev code. Once the change to an index is
made, it will print out abbrevs in the order they are stored.

llvmbot · 2024-02-01T01:49:51Z

@llvm/pr-subscribers-debuginfo

Author: Alexander Yermolovich (ayermolo)

Full diff: https://github.com/llvm/llvm-project/pull/80229.diff

3 Files Affected:

(modified) llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp (+9-2)
(modified) llvm/test/DebugInfo/X86/debug-names-dwarf64.ll (+4-4)
(modified) llvm/test/DebugInfo/X86/debug-names-types.ll (+12-12)

diff --git a/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp b/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
index a427dd604ade7..1a51c2354dc29 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
@@ -847,8 +847,15 @@ void DWARFDebugNames::NameIndex::dumpForeignTUs(ScopedPrinter &W) const {
 
 void DWARFDebugNames::NameIndex::dumpAbbreviations(ScopedPrinter &W) const {
   ListScope AbbrevsScope(W, "Abbreviations");
-  for (const auto &Abbr : Abbrevs)
-    Abbr.dump(W);
+  std::vector<const Abbrev *> AbbrevsVect;
+  for (const llvm::DWARFDebugNames::Abbrev &Abbr : Abbrevs)
+    AbbrevsVect.push_back(&Abbr);
+  std::sort(AbbrevsVect.begin(), AbbrevsVect.end(),
+            [](const Abbrev *LHS, const Abbrev *RHS) {
+              return LHS->Code < RHS->Code;
+            });
+  for (const llvm::DWARFDebugNames::Abbrev *Abbr : AbbrevsVect)
+    Abbr->dump(W);
 }
 
 void DWARFDebugNames::NameIndex::dumpBucket(ScopedPrinter &W,
diff --git a/llvm/test/DebugInfo/X86/debug-names-dwarf64.ll b/llvm/test/DebugInfo/X86/debug-names-dwarf64.ll
index c15e2ad1d56b0..62ab8de44f0a2 100644
--- a/llvm/test/DebugInfo/X86/debug-names-dwarf64.ll
+++ b/llvm/test/DebugInfo/X86/debug-names-dwarf64.ll
@@ -40,13 +40,13 @@
 ; CHECK-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-NEXT:     }
-; CHECK-NEXT:     Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
-; CHECK-NEXT:       Tag: DW_TAG_variable
+; CHECK-NEXT:     Abbreviation [[ABBREV_SP:0x[0-9a-f]*]] {
+; CHECK-NEXT:       Tag: DW_TAG_subprogram
 ; CHECK-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-NEXT:     }
-; CHECK-NEXT:     Abbreviation [[ABBREV_SP:0x[0-9a-f]*]] {
-; CHECK-NEXT:       Tag: DW_TAG_subprogram
+; CHECK-NEXT:     Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
+; CHECK-NEXT:       Tag: DW_TAG_variable
 ; CHECK-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-NEXT:     }
diff --git a/llvm/test/DebugInfo/X86/debug-names-types.ll b/llvm/test/DebugInfo/X86/debug-names-types.ll
index f41bb5524b9c3..ed32e56fa71b6 100644
--- a/llvm/test/DebugInfo/X86/debug-names-types.ll
+++ b/llvm/test/DebugInfo/X86/debug-names-types.ll
@@ -37,14 +37,13 @@
 ; CHECK-NEXT:        LocalTU[0]: 0x00000000
 ; CHECK-NEXT:      ]
 ; CHECK:        Abbreviations [
-; CHECK-NEXT:     Abbreviation [[ABBREV3:0x[0-9a-f]*]] {
+; CHECK-NEXT:     Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
 ; CHECK-NEXT:       Tag: DW_TAG_structure_type
-; CHECK-NEXT:       DW_IDX_type_unit: DW_FORM_data1
 ; CHECK-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-NEXT:     }
-; CHECK-NEXT:     Abbreviation [[ABBREV4:0x[0-9a-f]*]] {
-; CHECK-NEXT:       Tag: DW_TAG_base_type
+; CHECK-NEXT:     Abbreviation [[ABBREV3:0x[0-9a-f]*]] {
+; CHECK-NEXT:       Tag: DW_TAG_structure_type
 ; CHECK-NEXT:       DW_IDX_type_unit: DW_FORM_data1
 ; CHECK-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-NEXT:       DW_IDX_parent: DW_FORM_flag_present
@@ -54,8 +53,9 @@
 ; CHECK-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-NEXT:     }
-; CHECK-NEXT:     Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
-; CHECK-NEXT:       Tag: DW_TAG_structure_type
+; CHECK-NEXT:     Abbreviation [[ABBREV4:0x[0-9a-f]*]] {
+; CHECK-NEXT:       Tag: DW_TAG_base_type
+; CHECK-NEXT:       DW_IDX_type_unit: DW_FORM_data1
 ; CHECK-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-NEXT:     }
@@ -140,14 +140,13 @@
 ; CHECK-SPLIT-NEXT:     ForeignTU[0]: 0x675d23e4f33235f2
 ; CHECK-SPLIT-NEXT:   ]
 ; CHECK-SPLIT-NEXT:   Abbreviations [
-; CHECK-SPLIT-NEXT:     Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
+; CHECK-SPLIT-NEXT:     Abbreviation [[ABBREV:0x[0-9a-f]*]] {
 ; CHECK-SPLIT-NEXT:       Tag: DW_TAG_structure_type
-; CHECK-SPLIT-NEXT:       DW_IDX_type_unit: DW_FORM_data1
 ; CHECK-SPLIT-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-SPLIT-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-SPLIT-NEXT:     }
-; CHECK-SPLIT-NEXT:     Abbreviation [[ABBREV4:0x[0-9a-f]*]] {
-; CHECK-SPLIT-NEXT:       Tag: DW_TAG_base_type
+; CHECK-SPLIT-NEXT:     Abbreviation [[ABBREV1:0x[0-9a-f]*]] {
+; CHECK-SPLIT-NEXT:       Tag: DW_TAG_structure_type
 ; CHECK-SPLIT-NEXT:       DW_IDX_type_unit: DW_FORM_data1
 ; CHECK-SPLIT-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-SPLIT-NEXT:       DW_IDX_parent: DW_FORM_flag_present
@@ -157,8 +156,9 @@
 ; CHECK-SPLIT-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-SPLIT-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-SPLIT-NEXT:     }
-; CHECK-SPLIT-NEXT:     Abbreviation [[ABBREV:0x[0-9a-f]*]] {
-; CHECK-SPLIT-NEXT:       Tag: DW_TAG_structure_type
+; CHECK-SPLIT-NEXT:     Abbreviation [[ABBREV4:0x[0-9a-f]*]] {
+; CHECK-SPLIT-NEXT:       Tag: DW_TAG_base_type
+; CHECK-SPLIT-NEXT:       DW_IDX_type_unit: DW_FORM_data1
 ; CHECK-SPLIT-NEXT:       DW_IDX_die_offset: DW_FORM_ref4
 ; CHECK-SPLIT-NEXT:       DW_IDX_parent: DW_FORM_flag_present
 ; CHECK-SPLIT-NEXT:     }

ayermolo · 2024-02-01T02:01:37Z

An alternative is to keep a vector of abbrevs as they are read in. I looked into SetVector, but it doesn't look like it has a find API, which is used in NameIndex::getEntry. It also seems a tad wasteful, since the order only matters for printing.

clayborg

When we dump things, we should dump them in the order in which they appear in the .debug_xxx section. Here we are sorting and dumping them in some order. I realize the abbrevs will eventually have an index, and that will make the order more natural, but when dumping the contents of a section we should always use the order in which they are defined in the file, since we want to see the contents of the section we asked to dump. There aren't usually that many abbreviation combinations, right? So maybe the dump function can parse the info again, dump each entry it finds in the right order, and throw away the temporary llvm::DWARFDebugNames::Abbrev item as it dumps each one? Or we can maintain an offset in each llvm::DWARFDebugNames::Abbrev entry and then sort by that when dumping?

ayermolo · 2024-02-01T22:07:36Z

I agree we should print in the same order as it's in section.
Changed to sort by offset.
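
A minimal sketch of what "sort by offset" can look like, not the code that was merged. The `Abbrev` struct, its `AbbrevOffset` and `Code` fields, and the helper name `abbrevsInSectionOrder` are assumptions for illustration, and `std::unordered_set` stands in for `llvm::DenseSet` so the example builds without LLVM headers.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <vector>

// Simplified stand-in for DWARFDebugNames::Abbrev (field names are assumed).
struct Abbrev {
  uint64_t AbbrevOffset; // Where the abbreviation starts in the section.
  uint32_t Code;         // Abbreviation code, used as the set key.
  bool operator==(const Abbrev &Other) const { return Code == Other.Code; }
};

struct AbbrevHash {
  std::size_t operator()(const Abbrev &A) const {
    return std::hash<uint32_t>()(A.Code);
  }
};

// Unordered container standing in for llvm::DenseSet: iteration order is
// unrelated to the order in which entries were read from the section.
using AbbrevSet = std::unordered_set<Abbrev, AbbrevHash>;

// Collect pointers to the set's elements and order them by section offset,
// so a dump can walk them in the order they appear in the file.
std::vector<const Abbrev *> abbrevsInSectionOrder(const AbbrevSet &Abbrevs) {
  std::vector<const Abbrev *> Ordered;
  Ordered.reserve(Abbrevs.size());
  for (const Abbrev &A : Abbrevs)
    Ordered.push_back(&A);
  std::sort(Ordered.begin(), Ordered.end(),
            [](const Abbrev *LHS, const Abbrev *RHS) {
              return LHS->AbbrevOffset < RHS->AbbrevOffset;
            });
  return Ordered;
}
```

Sorting pointers rather than copies keeps the set as the only owner of the entries, so the extra cost is paid only when dumping.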

clayborg

A few quick things in the inline comments and this will be good to go.

llvm/include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h

llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp

dwblaikie · 2024-02-01T22:19:24Z

How's this compare to how we handle .debug_abbrevs? (perhaps we could be sharing some parsing infrastructure, the same as I'm suggesting/hoping we share some generation infrastructure - but even if not shared code, bringing two different implementations into alignment so they do/express things more similarly would be good)

clayborg · 2024-02-01T23:35:18Z

The abbrevs currently make a DWARFAbbreviationDeclarationSet, which contains a std::vector<DWARFAbbreviationDeclaration> Decls; it also maintains a value called FirstAbbrCode, which is set to UINT32_MAX if the abbrev codes are not consecutive from some starting number. Compilers like to use 1 as the starting number. Then when we ask for an abbrev from the DWARFAbbreviationDeclarationSet, it checks whether FirstAbbrCode is set to UINT32_MAX, and if it isn't, it returns a direct access using:

return &Decls[AbbrCode - FirstAbbrCode];

So O(1) lookups. Otherwise it falls back to a very costly linear search. But most compilers emit the abbrevs with an index, so this works well for 99% of the cases.

The DWARFDebugNames::Entry is quite different and stored in a hash map as the abbrev codes are not indexed.
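
The lookup scheme described above can be sketched roughly as follows. This is a simplified illustration, not the LLVM class: `Decl`, `DeclSet`, `add`, `find` and `isDirectlyIndexed` are hypothetical names, and only the `FirstAbbrCode` sentinel and the `Decls[AbbrCode - FirstAbbrCode]` access come from the comment.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified declaration record.
struct Decl {
  uint32_t Code;
  uint16_t Tag;
};

class DeclSet {
  std::vector<Decl> Decls;    // Stored in the order they were read.
  uint32_t FirstAbbrCode = 0; // UINT32_MAX means "codes are not consecutive".

public:
  void add(Decl D) {
    if (Decls.empty())
      FirstAbbrCode = D.Code;
    else if (FirstAbbrCode != UINT32_MAX &&
             D.Code != FirstAbbrCode + Decls.size())
      FirstAbbrCode = UINT32_MAX; // Gap or reordering: give up on indexing.
    Decls.push_back(D);
  }

  bool isDirectlyIndexed() const { return FirstAbbrCode != UINT32_MAX; }

  const Decl *find(uint32_t AbbrCode) const {
    if (FirstAbbrCode == UINT32_MAX) {
      // Fallback: linear search over the stored declarations.
      for (const Decl &D : Decls)
        if (D.Code == AbbrCode)
          return &D;
      return nullptr;
    }
    // Fast path: consecutive codes map straight onto vector slots.
    if (AbbrCode < FirstAbbrCode || AbbrCode >= FirstAbbrCode + Decls.size())
      return nullptr;
    return &Decls[AbbrCode - FirstAbbrCode];
  }
};
```

Because the vector keeps entries in read order, the same container can also drive a dump in section order without a separate sort.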

felipepiovezan

LGTM!

llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp

felipepiovezan · 2024-02-01T23:36:28Z

llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp

+  std::vector<const Abbrev *> AbbrevsVect;
+  for (const llvm::DWARFDebugNames::Abbrev &Abbr : Abbrevs)
+    AbbrevsVect.push_back(&Abbr);
+  std::sort(AbbrevsVect.begin(), AbbrevsVect.end(),


We should never use std::sort, instead using the range-based (with some shuffling magic for expensive asserts) sort from stl extras:

sort(AbbrevsVect, [] (...){});

https://llvm.org/docs/CodingStandards.html#beware-of-non-deterministic-sorting-order-of-equal-elements

ayermolo · 2024-02-01

My interpretation is that this only applies if we have equal elements, which shouldn't be the case here. Abbrev offsets are monotonically increasing.
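
To illustrate the point under discussion, here is a rough sketch of a range-based sort helper that can shuffle its input before sorting. `sortRange`, `codesByOffset` and the `Abbrev` struct are hypothetical names for this example only; this is not the LLVM implementation of `llvm::sort`. With unique sort keys, as with abbrev offsets, the result is the same with or without the shuffle; only equal keys could end up in a different relative order.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: sorts a whole container, optionally shuffling it
// first so that hidden reliance on the relative order of equal elements
// shows up during testing.
template <typename Container, typename Compare>
void sortRange(Container &C, Compare Comp, bool ShuffleFirst = false) {
  if (ShuffleFirst) {
    std::mt19937 Generator(0x5eed); // Fixed seed keeps the sketch repeatable.
    std::shuffle(C.begin(), C.end(), Generator);
  }
  std::sort(C.begin(), C.end(), Comp);
}

// Simplified abbreviation record (field names are assumed).
struct Abbrev {
  uint64_t Offset;
  uint32_t Code;
};

// Returns the abbrev codes ordered by their (unique) section offsets.
std::vector<uint32_t> codesByOffset(std::vector<Abbrev> Abbrevs,
                                    bool ShuffleFirst) {
  sortRange(
      Abbrevs,
      [](const Abbrev &LHS, const Abbrev &RHS) {
        return LHS.Offset < RHS.Offset;
      },
      ShuffleFirst);
  std::vector<uint32_t> Codes;
  for (const Abbrev &A : Abbrevs)
    Codes.push_back(A.Code);
  return Codes;
}
```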

ayermolo · 2024-02-01T23:51:24Z

Can we revisit this after llvm switches to indices?

clayborg · 2024-02-02T00:22:54Z

I am fine with taking care of switching to a more DebugAbbrev like solution later.

llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp

dwblaikie · 2024-02-05T21:51:14Z

> > How's this compare to how we handle .debug_abbrevs? (perhaps we could be sharing some parsing infrastructure, the same as I'm suggesting/hoping we share some generation infrastructure - but even if not shared code, bringing two different implementations into alignment so they do/express things more similarly would be good)
>
> The abbrevs currently make a DWARFAbbreviationDeclarationSet, which contains a std::vector<DWARFAbbreviationDeclaration> Decls; it also maintains a value called FirstAbbrCode, which is set to UINT32_MAX if the abbrev codes are not consecutive from some starting number. Compilers like to use 1 as the starting number. Then when we ask for an abbrev from the DWARFAbbreviationDeclarationSet, it checks whether FirstAbbrCode is set to UINT32_MAX, and if it isn't, it returns a direct access using:
> return &Decls[AbbrCode - FirstAbbrCode];
> So O(1) lookups. Otherwise it falls back to a very costly linear search. But most compilers emit the abbrevs with an index, so this works well for 99% of the cases.
>
> The DWARFDebugNames::Entry is quite different and stored in a hash map as the abbrev codes are not indexed.

I'm not sure I understand - what do you mean by "the abbrev codes are not indexed"? Because of LLVM's current output that uses the weird bit fiddling?

I'd consider that a bug/suboptimality, and I'd be fine with llvm-dwarfdump devolving to a linear search through abbrevs in the case where the abbrevs are not monotonically increasing.

clayborg · 2024-02-05T22:01:22Z

> > So O(1) lookups. Else it falls back to a very costly linear search. But most compilers emit the abbrevs with an index so this works well for 99% of the cases.
> > The DWARFDebugNames::Entry is quite different and stored in a hash map as the abbrev codes are not indexed.
>
> I'm not sure I understand - what do you mean by "the abbrev codes are not indexed"? Because of LLVM's current output that uses the weird bit fiddling?

The abbreviation codes are not 1 based indexes like they are for .debug_abbrev:

  Abbreviations [
    Abbreviation 0x1718 {
      Tag: DW_TAG_subprogram
      DW_IDX_die_offset: DW_FORM_ref4
      DW_IDX_parent: DW_FORM_flag_present
    }
    Abbreviation 0xb18 {
      Tag: DW_TAG_typedef
      DW_IDX_die_offset: DW_FORM_ref4
      DW_IDX_parent: DW_FORM_flag_present
    }
    Abbreviation 0x998 {
      Tag: DW_TAG_structure_type
      DW_IDX_die_offset: DW_FORM_ref4
      DW_IDX_parent: DW_FORM_flag_present
    }
    Abbreviation 0x1218 {
      Tag: DW_TAG_base_type
      DW_IDX_die_offset: DW_FORM_ref4
      DW_IDX_parent: DW_FORM_flag_present
    }
  ]

> I'd consider that a bug/suboptimality, and I'd be fine with llvm-dwarfdump devolving to a linear search through abbrevs in the case where the abbrevs are not monotonically increasing.

I was mostly commenting that .debug_abbrev code lookup is O(1), whereas if we don't use indexes for the abbrev codes we must do some sort of search, and if this happens in .debug_abbrev the search is linear. And of course any code that tries to find abbrevs should work (linear or direct access) depending on what the input is. The main point is that it is much more efficient to use 1-based indexes (zero is reserved), so that we can do efficient lookups. Most of what I was pointing out was around what .debug_abbrev does right and how it is different from what we are doing for .debug_names abbrev codes.

dwblaikie · 2024-02-07T00:33:30Z

Fair - thanks for helping me understand what we're doing today (on the LLVM emission side - we aren't emitting monotonically increasing debug_names abbreviation numbers, compared to debug_abbrev where we do use monotonically increasing abbrev numbers).

I think that's a mistake in LLVM's emission code, and not one we should worry about when designing llvm-dwarfdump. Both sides (LLVM DWARF emission and llvm-dwarfdump parsing) should be fixed to behave similarly to debug_abbrev, but I don't think we need to wait for the emission to be fixed before we make llvm-dwarfdump handle things well.

I'd prefer to see the llvm-dwarfdump code not use a hash map, ideally reuse a generalized form of the debug_abbrev handling code - that's efficient for monotonically increasing abbrev numbers, and falls back to a linear search otherwise - and the printing should print from that list that's stored in the same order it's read from the input - and that list can be directly indexed if it's monotonically increasing, or linearly searched if it's not.

ayermolo · 2024-02-08T00:33:22Z

OK, put up a PR for .debug_names in BOLT (no parent index support for now). Let me circle back to this, and change implementation to be sequential on LLVM side.

Based on the discussion in #80229 changed implementation to align with how .debug_abbrev is handled. So that .debug_names abbrev tag is a monotonically increasing index. This allows for tools like LLDB to access it in constant time using array like data structure.

clang-19 debug build
before change
[41] .debug_names PROGBITS 0000000000000000 8f9e0350 137fdbe0 00 0 0 4
after change
[41] .debug_names PROGBITS 0000000000000000 8f9e0350 125bfdec 00 0 0 4

Reduction ~19.1MB
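
The quoted reduction can be checked from the two section sizes reported above, assuming the hex value after the file offset in each section-header line is the section size in bytes. The constant names below are made up for this check.

```cpp
#include <cstdint>

// Section sizes taken from the before/after section-header lines (hex bytes).
constexpr uint64_t SizeBefore = 0x137fdbe0;
constexpr uint64_t SizeAfter = 0x125bfdec;

// Bytes saved in .debug_names by switching abbrev codes to an index.
constexpr uint64_t SavedBytes = SizeBefore - SizeAfter;

// 19,127,796 bytes, i.e. roughly 19.1 MB when counting 10^6 bytes per MB.
static_assert(SavedBytes == 19127796, "difference of the two section sizes");
```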

llvmbot added the debuginfo label Feb 1, 2024

ayermolo requested a review from felipepiovezan February 1, 2024 01:59

ayermolo force-pushed the dwarfDumpAbbrevInOrder branch from 219fce8 to da9e927 Compare February 1, 2024 19:16

ayermolo requested a review from clayborg February 1, 2024 19:59

clayborg reviewed Feb 1, 2024

Sort by offset

7682bb9

clayborg reviewed Feb 1, 2024

llvm/include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h

llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp Outdated

moved struct members around, marked const

deb32f4

felipepiovezan approved these changes Feb 1, 2024

removed llvm

23795c5

switched to llvm::sort

e7f39c3

felipepiovezan reviewed Feb 2, 2024

llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp Outdated

removed begin/end

a98f430

ayermolo merged commit 095367a into llvm:main Feb 2, 2024
3 of 4 checks passed

ayermolo deleted the dwarfDumpAbbrevInOrder branch February 2, 2024 20:36

ayermolo mentioned this pull request Feb 8, 2024

[LLVM][DWARF] Change .debug_names abbrev to be an index #81200

Merged
