[LLVM][DWARF] Refactor code for generating DWARF V5 .debug_names #82394

cmtice · 2024-02-20T18:00:58Z

[LLVM][DWARF] Refactor code for generating DWARF v5 .debug_names

Refactor the code that uniques the entries and computes the bucket count for the DWARF V5 .debug_names accelerator table.

…lerator table. Update computeDebugNamesUniqueHashes function to start with lowercase letter, to use a MutableArrayRef for the hashes parameter instead of SmallVector, and made uniqueHashCount a reference parameter, to allow the accelerator table class member to be properly updated. Also removed redundant namespace qualifier.

llvmbot · 2024-02-20T18:01:31Z

@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-debuginfo

Author: None (cmtice)

Changes

…lerator table.

Update computeDebugNamesUniqueHashes function to start with lowercase letter, to use a MutableArrayRef for the hashes parameter instead of SmallVector, and made uniqueHashCount a reference parameter, to allow the accelerator table class member to be properly updated. Also removed redundant namespace qualifier.

Full diff: https://github.com/llvm/llvm-project/pull/82394.diff

2 Files Affected:

(modified) llvm/include/llvm/BinaryFormat/Dwarf.h (+19)
(modified) llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp (+3-12)

diff --git a/llvm/include/llvm/BinaryFormat/Dwarf.h b/llvm/include/llvm/BinaryFormat/Dwarf.h
index 869352b35e3235..c926b34f9d9048 100644
--- a/llvm/include/llvm/BinaryFormat/Dwarf.h
+++ b/llvm/include/llvm/BinaryFormat/Dwarf.h
@@ -613,6 +613,25 @@ enum AcceleratorTable {
   DW_hash_function_djb = 0u
 };
 
+// Uniquify the string hashes and calculate the bucket count for the
+// DWARF v5 Accelerator Table.
+inline uint32_t
+computeDebugNamesUniqueHashes(MutableArrayRef<uint32_t> hashes,
+                              uint32_t &uniqueHashCount) {
+  uint32_t BucketCount = 0;
+
+  sort(hashes);
+  uniqueHashCount = llvm::unique(hashes) - hashes.begin();
+  if (uniqueHashCount > 1024)
+    BucketCount = uniqueHashCount / 4;
+  else if (uniqueHashCount > 16)
+    BucketCount = uniqueHashCount / 2;
+  else
+    BucketCount = std::max<uint32_t>(uniqueHashCount, 1);
+
+  return BucketCount;
+}
+
 // Constants for the GNU pubnames/pubtypes extensions supporting gdb index.
 enum GDBIndexEntryKind {
   GIEK_NONE,
diff --git a/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp b/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
index 22d995a9cc3c56..f8528b489c3c0b 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
@@ -33,22 +33,13 @@ using namespace llvm;
 
 void AccelTableBase::computeBucketCount() {
   // First get the number of unique hashes.
-  std::vector<uint32_t> Uniques;
+  SmallVector<uint32_t, 0> Uniques;
   Uniques.reserve(Entries.size());
   for (const auto &E : Entries)
     Uniques.push_back(E.second.HashValue);
-  array_pod_sort(Uniques.begin(), Uniques.end());
-  std::vector<uint32_t>::iterator P =
-      std::unique(Uniques.begin(), Uniques.end());
 
-  UniqueHashCount = std::distance(Uniques.begin(), P);
-
-  if (UniqueHashCount > 1024)
-    BucketCount = UniqueHashCount / 4;
-  else if (UniqueHashCount > 16)
-    BucketCount = UniqueHashCount / 2;
-  else
-    BucketCount = std::max<uint32_t>(UniqueHashCount, 1);
+  BucketCount = llvm::dwarf::computeDebugNamesUniqueHashes(Uniques,
+                                                           UniqueHashCount);
 }
 
 void AccelTableBase::finalize(AsmPrinter *Asm, StringRef Prefix) {

github-actions · 2024-02-20T18:03:27Z

✅ With the latest revision this PR passed the C/C++ code formatter.

cmtice · 2024-02-20T18:04:10Z

Could not fix git push/pull/rebase issues with #82264 so had to create this new pull request. I've tried to address all the review comments from the previous PR:

Update computeDebugNamesUniqueHashes function to start with lowercase letter, to use a MutableArrayRef for the hashes parameter instead of SmallVector, and made uniqueHashCount a reference parameter, to allow the accelerator table class member to be properly updated. Also removed redundant namespace qualifier.

Fix clang format issues.

dwblaikie · 2024-02-20T20:46:37Z

llvm/include/llvm/BinaryFormat/Dwarf.h

+  uint32_t BucketCount = 0;
+
+  sort(hashes);
+  uniqueHashCount = llvm::unique(hashes) - hashes.begin();


Without returning the iterator from llvm::unique or otherwise communicating info about hashes out to the caller - I guess the caller isn't expected to use their copy of the data structure again? Might be worth detailing in the doc comment that this function effectively consumes the hashes array? Or is it somehow usable afterwards?

I'm not sure I understand your comment? Yes, I want the changes to hashes to make it back to the caller.

I thought making it a MutableArray meant that the changes would get back to the user? Originally I had it as a SmallVector &, but felipepivezan asked me to change it to a MutableArray instead. LLVM would not allow me to declare the MutableArray as a reference?

LLVM would not allow me to declare the MutableArray as a reference

MutableArrays are meant to be passed by copy, they are just a "non const ptr" + "size" abstraction.

I thought making it a MutableArray meant that the changes would get back to the user?

The changes make it back to the user, but this is not the important distinction between MutableArrayRef and SmallVector. The SmallVector API exposes resizing capabilities, which we are not using inside this function. So you communicate this intent by using the more restricted API.

That said, David's point is that the erase functions only moves elements around, it doesn't actually erase them (contrary to what the name suggests). Because of that, the caller of this function can't possibly have a useful vector after this function returns.

Ok, I went back and double checked the callers; they do NOT expect a usable vector after this call, so that is fine. However I'm getting a compile time error using the MutableArrayRef, which I could use to help figuring out how to fix:

third_party/llvm/llvm-project/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp:43:50: error: non-const lvalue reference to type 'MutableArrayRef<uint32_t>' (aka 'MutableArrayRef') cannot bind to a value of unrelated type 'SmallVector<uint32_t, 0>' (aka 'SmallVector<unsigned int, 0>')
43 | llvm::dwarf::computeDebugNamesUniqueHashes(Uniques, UniqueHashCount);
| ^~~~~~~
third_party/llvm/llvm-project/llvm/include/llvm/BinaryFormat/Dwarf.h:618:74: note: passing argument to parameter 'hashes' here
618 | inline uint32_t computeDebugNamesUniqueHashes(MutableArrayRef<uint32_t> &hashes,
| ^
1 error generated.

('Uniques', at the call site, line 43, is a SmallVector...)

618 | inline uint32_t computeDebugNamesUniqueHashes( MutableArrayRef<uint32_t> &hashes,

Did you make it a reference? It should not be a reference

No, this function calculates and returns two different values: the bucket count (for which the function is named) and the unique hash count.

In hindsight, we should have just returned a pair

This could be adjusted post-merge? With C++17 structured bindings, returning multiple values via pairs/tuples should really be the norm, rather than using reference parameters as outputs, in my opinion.

I'll try to create another PR (probably in a day or two) to fix this. Do you know of any examples I could look at to see how this is done well?

With C++17 structured bindings, returning multiple values via pairs/tuples should really be the norm, rather than using reference parameters as outputs, in my opinion.

FWIW I 100% agree with you, but every now and then this runs into the LLVM's community general hesitancy to use auto more aggressively.

I'll try to create another PR (probably in a day or two) to fix this. Do you know of any examples I could look at to see how this is done well?

Can't think of any example this early, but you could probably do something like:

inline std::pair<uint32_t, uint32_t> getDebugNamesBucketAndHashCount((MutableArrayRef<uint32_t> hashes) { ... return {BucketCount, HashCount}; }

And then on the call site you'd normally do something like:

auto [BuckerCount, HashCount] = getDebugNamesBucketAndHashCount(...)

However, since those are actually member variables, you can't really use a declaration like that. You'll need to store the pair in a temporary variable and assign to the member variables using .first and .second

dwblaikie · 2024-02-20T20:47:19Z

llvm/include/llvm/BinaryFormat/Dwarf.h

@@ -613,6 +613,24 @@ enum AcceleratorTable {
  DW_hash_function_djb = 0u
 };

+// Uniquify the string hashes and calculate the bucket count for the
+// DWARF v5 Accelerator Table.
+inline uint32_t computeDebugNamesUniqueHashes(MutableArrayRef<uint32_t> hashes,


looks like this maybe should be getDebugNamesBucketCount? (because it doesn't return the unique number of hashes, it returns the number of buckets)

Well it does return the unique number of hashes, in the second parameter, but I'm Ok with renaming the function...

Ah. Yeah, naming it after what it returns seems the most legible.

If the intent is to use the uniqued hashes array after the call (not relevant to the current caller, but possibly relevant to the new usage you want to add) maybe the simplest thing is to have the function take a SmallVectorImpl<uint32_t>&, and after calling uique, use that result to shorten the vector down to the size of the uniqued range (by calling the vector's resize function) - then the caller can inspect the vector's size after calling, and doesn't need the secondary out parameter?

I've renamed the function as suggested, and added a note in the function comment that it consumes the array parameter.

Change function name from 'computeDebugNamesUniqueHashes' to 'getDebugNamesBucketCount', and add coment that this function consumes its input parameter.

cmtice · 2024-02-20T23:08:01Z

I think I've addressed all the review comments now? Please let me know if there's anything else I should do for this pull request.

Clang format fixes.

cmtice requested a review from felipepiovezan February 20, 2024 18:01

llvmbot added debuginfo llvm:binary-utilities labels Feb 20, 2024

cmtice requested review from dwblaikie and MaskRay February 20, 2024 18:01

[LLVM][DWARF] Refactor code for generating DWARF V5.debug_names

3ca2172

Fix clang format issues.

cmtice changed the title ~~[LLVM][DWARF] Refactor code for generating DWARF V4 .debug_names acce…~~ [LLVM][DWARF] Refactor code for generating DWARF V5 .debug_names Feb 20, 2024

cmtice mentioned this pull request Feb 20, 2024

[LLVM][DWARF] Refactor code for generating DWARF V5 .debug_names acce… #82264

Closed

dwblaikie reviewed Feb 20, 2024

View reviewed changes

[LLVM][DWARF] Refactor code for generating DWARF v5 .debug_names

1877363

Change function name from 'computeDebugNamesUniqueHashes' to 'getDebugNamesBucketCount', and add coment that this function consumes its input parameter.

[LLVM][DWARF] Refactor code for generating DWARF v5 .debug_names

fc2fbf8

Clang format fixes.

dwblaikie approved these changes Feb 21, 2024

View reviewed changes

felipepiovezan approved these changes Feb 21, 2024

View reviewed changes

cmtice merged commit 453b1a2 into llvm:main Feb 21, 2024
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[LLVM][DWARF] Refactor code for generating DWARF V5 .debug_names #82394

[LLVM][DWARF] Refactor code for generating DWARF V5 .debug_names #82394

cmtice commented Feb 20, 2024 •

edited

llvmbot commented Feb 20, 2024 •

edited

github-actions bot commented Feb 20, 2024 •

edited

cmtice commented Feb 20, 2024

dwblaikie Feb 20, 2024

cmtice Feb 20, 2024 •

edited

felipepiovezan Feb 20, 2024

cmtice Feb 20, 2024 •

edited

felipepiovezan Feb 20, 2024 •

edited

cmtice Feb 21, 2024

felipepiovezan Feb 21, 2024

jh7370 Feb 22, 2024

cmtice Feb 22, 2024

felipepiovezan Feb 22, 2024

dwblaikie Feb 20, 2024

cmtice Feb 20, 2024

dwblaikie Feb 20, 2024

cmtice Feb 20, 2024

cmtice commented Feb 20, 2024

[LLVM][DWARF] Refactor code for generating DWARF V5 .debug_names #82394

[LLVM][DWARF] Refactor code for generating DWARF V5 .debug_names #82394

Conversation

cmtice commented Feb 20, 2024 • edited

llvmbot commented Feb 20, 2024 • edited

github-actions bot commented Feb 20, 2024 • edited

cmtice commented Feb 20, 2024

Choose a reason for hiding this comment

cmtice Feb 20, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cmtice Feb 20, 2024 • edited

Choose a reason for hiding this comment

felipepiovezan Feb 20, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cmtice commented Feb 20, 2024

cmtice commented Feb 20, 2024 •

edited

llvmbot commented Feb 20, 2024 •

edited

github-actions bot commented Feb 20, 2024 •

edited

cmtice Feb 20, 2024 •

edited

cmtice Feb 20, 2024 •

edited

felipepiovezan Feb 20, 2024 •

edited