Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLVM][DWARF] Refactor code for generating DWARF V5 .debug_names #82394

Merged
merged 4 commits into from
Feb 21, 2024

Conversation

cmtice
Copy link
Contributor

@cmtice cmtice commented Feb 20, 2024

[LLVM][DWARF] Refactor code for generating DWARF v5 .debug_names

Refactor the code that uniques the entries and computes the bucket count for the DWARF V5 .debug_names accelerator table.

…lerator table.

Update computeDebugNamesUniqueHashes function to start with lowercase letter,
to use a MutableArrayRef for the hashes parameter instead of SmallVector,
and made uniqueHashCount a reference parameter, to allow the accelerator
table class member to be properly updated. Also removed redundant
namespace qualifier.
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 20, 2024

@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-debuginfo

Author: None (cmtice)

Changes

…lerator table.

Update computeDebugNamesUniqueHashes function to start with lowercase letter, to use a MutableArrayRef for the hashes parameter instead of SmallVector, and made uniqueHashCount a reference parameter, to allow the accelerator table class member to be properly updated. Also removed redundant namespace qualifier.


Full diff: https://github.com/llvm/llvm-project/pull/82394.diff

2 Files Affected:

  • (modified) llvm/include/llvm/BinaryFormat/Dwarf.h (+19)
  • (modified) llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp (+3-12)
diff --git a/llvm/include/llvm/BinaryFormat/Dwarf.h b/llvm/include/llvm/BinaryFormat/Dwarf.h
index 869352b35e3235..c926b34f9d9048 100644
--- a/llvm/include/llvm/BinaryFormat/Dwarf.h
+++ b/llvm/include/llvm/BinaryFormat/Dwarf.h
@@ -613,6 +613,25 @@ enum AcceleratorTable {
   DW_hash_function_djb = 0u
 };
 
+// Uniquify the string hashes and calculate the bucket count for the
+// DWARF v5 Accelerator Table.
+inline uint32_t
+computeDebugNamesUniqueHashes(MutableArrayRef<uint32_t> hashes,
+                              uint32_t &uniqueHashCount) {
+  uint32_t BucketCount = 0;
+
+  sort(hashes);
+  uniqueHashCount = llvm::unique(hashes) - hashes.begin();
+  if (uniqueHashCount > 1024)
+    BucketCount = uniqueHashCount / 4;
+  else if (uniqueHashCount > 16)
+    BucketCount = uniqueHashCount / 2;
+  else
+    BucketCount = std::max<uint32_t>(uniqueHashCount, 1);
+
+  return BucketCount;
+}
+
 // Constants for the GNU pubnames/pubtypes extensions supporting gdb index.
 enum GDBIndexEntryKind {
   GIEK_NONE,
diff --git a/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp b/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
index 22d995a9cc3c56..f8528b489c3c0b 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
@@ -33,22 +33,13 @@ using namespace llvm;
 
 void AccelTableBase::computeBucketCount() {
   // First get the number of unique hashes.
-  std::vector<uint32_t> Uniques;
+  SmallVector<uint32_t, 0> Uniques;
   Uniques.reserve(Entries.size());
   for (const auto &E : Entries)
     Uniques.push_back(E.second.HashValue);
-  array_pod_sort(Uniques.begin(), Uniques.end());
-  std::vector<uint32_t>::iterator P =
-      std::unique(Uniques.begin(), Uniques.end());
 
-  UniqueHashCount = std::distance(Uniques.begin(), P);
-
-  if (UniqueHashCount > 1024)
-    BucketCount = UniqueHashCount / 4;
-  else if (UniqueHashCount > 16)
-    BucketCount = UniqueHashCount / 2;
-  else
-    BucketCount = std::max<uint32_t>(UniqueHashCount, 1);
+  BucketCount = llvm::dwarf::computeDebugNamesUniqueHashes(Uniques,
+                                                           UniqueHashCount);
 }
 
 void AccelTableBase::finalize(AsmPrinter *Asm, StringRef Prefix) {

Copy link

github-actions bot commented Feb 20, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@cmtice
Copy link
Contributor Author

cmtice commented Feb 20, 2024

Could not fix git push/pull/rebase issues with #82264 so had to create this new pull request. I've tried to address all the review comments from the previous PR:

Update computeDebugNamesUniqueHashes function to start with lowercase letter, to use a MutableArrayRef for the hashes parameter instead of SmallVector, and made uniqueHashCount a reference parameter, to allow the accelerator table class member to be properly updated. Also removed redundant namespace qualifier.

@cmtice cmtice changed the title [LLVM][DWARF] Refactor code for generating DWARF V4 .debug_names acce… [LLVM][DWARF] Refactor code for generating DWARF V5 .debug_names Feb 20, 2024
uint32_t BucketCount = 0;

sort(hashes);
uniqueHashCount = llvm::unique(hashes) - hashes.begin();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without returning the iterator from llvm::unique or otherwise communicating info about hashes out to the caller - I guess the caller isn't expected to use their copy of the data structure again? Might be worth detailing in the doc comment that this function effectively consumes the hashes array? Or is it somehow usable afterwards?

Copy link
Contributor Author

@cmtice cmtice Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand your comment? Yes, I want the changes to hashes to make it back to the caller.

I thought making it a MutableArray meant that the changes would get back to the user? Originally I had it as a SmallVector &, but felipepivezan asked me to change it to a MutableArray instead. LLVM would not allow me to declare the MutableArray as a reference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM would not allow me to declare the MutableArray as a reference

MutableArrays are meant to be passed by copy, they are just a "non const ptr" + "size" abstraction.

I thought making it a MutableArray meant that the changes would get back to the user?

The changes make it back to the user, but this is not the important distinction between MutableArrayRef and SmallVector. The SmallVector API exposes resizing capabilities, which we are not using inside this function. So you communicate this intent by using the more restricted API.

That said, David's point is that the erase functions only moves elements around, it doesn't actually erase them (contrary to what the name suggests). Because of that, the caller of this function can't possibly have a useful vector after this function returns.

Copy link
Contributor Author

@cmtice cmtice Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I went back and double checked the callers; they do NOT expect a usable vector after this call, so that is fine. However I'm getting a compile time error using the MutableArrayRef, which I could use to help figuring out how to fix:

third_party/llvm/llvm-project/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp:43:50: error: non-const lvalue reference to type 'MutableArrayRef<uint32_t>' (aka 'MutableArrayRef') cannot bind to a value of unrelated type 'SmallVector<uint32_t, 0>' (aka 'SmallVector<unsigned int, 0>')
43 | llvm::dwarf::computeDebugNamesUniqueHashes(Uniques, UniqueHashCount);
| ^~~~~~~
third_party/llvm/llvm-project/llvm/include/llvm/BinaryFormat/Dwarf.h:618:74: note: passing argument to parameter 'hashes' here
618 | inline uint32_t computeDebugNamesUniqueHashes(MutableArrayRef<uint32_t> &hashes,
| ^
1 error generated.

('Uniques', at the call site, line 43, is a SmallVector...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

618 | inline uint32_t computeDebugNamesUniqueHashes(
    MutableArrayRef<uint32_t> &hashes,

Did you make it a reference? It should not be a reference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this function calculates and returns two different values: the bucket count (for which the function is named) and the unique hash count.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In hindsight, we should have just returned a pair

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be adjusted post-merge? With C++17 structured bindings, returning multiple values via pairs/tuples should really be the norm, rather than using reference parameters as outputs, in my opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to create another PR (probably in a day or two) to fix this. Do you know of any examples I could look at to see how this is done well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With C++17 structured bindings, returning multiple values via pairs/tuples should really be the norm, rather than using reference parameters as outputs, in my opinion.

FWIW I 100% agree with you, but every now and then this runs into the LLVM's community general hesitancy to use auto more aggressively.

I'll try to create another PR (probably in a day or two) to fix this. Do you know of any examples I could look at to see how this is done well?

Can't think of any example this early, but you could probably do something like:

inline std::pair<uint32_t, uint32_t> getDebugNamesBucketAndHashCount((MutableArrayRef<uint32_t> hashes) {
   ...
   return {BucketCount, HashCount};
}

And then on the call site you'd normally do something like:

auto [BuckerCount, HashCount] = getDebugNamesBucketAndHashCount(...)

However, since those are actually member variables, you can't really use a declaration like that. You'll need to store the pair in a temporary variable and assign to the member variables using .first and .second



@@ -613,6 +613,24 @@ enum AcceleratorTable {
DW_hash_function_djb = 0u
};

// Uniquify the string hashes and calculate the bucket count for the
// DWARF v5 Accelerator Table.
inline uint32_t computeDebugNamesUniqueHashes(MutableArrayRef<uint32_t> hashes,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this maybe should be getDebugNamesBucketCount? (because it doesn't return the unique number of hashes, it returns the number of buckets)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it does return the unique number of hashes, in the second parameter, but I'm Ok with renaming the function...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Yeah, naming it after what it returns seems the most legible.

If the intent is to use the uniqued hashes array after the call (not relevant to the current caller, but possibly relevant to the new usage you want to add) maybe the simplest thing is to have the function take a SmallVectorImpl<uint32_t>&, and after calling uique, use that result to shorten the vector down to the size of the uniqued range (by calling the vector's resize function) - then the caller can inspect the vector's size after calling, and doesn't need the secondary out parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed the function as suggested, and added a note in the function comment that it consumes the array parameter.

Change function name from 'computeDebugNamesUniqueHashes' to
'getDebugNamesBucketCount', and add coment that this function
consumes its input parameter.
@cmtice
Copy link
Contributor Author

cmtice commented Feb 20, 2024

I think I've addressed all the review comments now? Please let me know if there's anything else I should do for this pull request.

@cmtice cmtice merged commit 453b1a2 into llvm:main Feb 21, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants