-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LLVM][DWARF] Refactor code for generating DWARF V5 .debug_names #82394
Conversation
…lerator table. Update computeDebugNamesUniqueHashes function to start with lowercase letter, to use a MutableArrayRef for the hashes parameter instead of SmallVector, and made uniqueHashCount a reference parameter, to allow the accelerator table class member to be properly updated. Also removed redundant namespace qualifier.
@llvm/pr-subscribers-llvm-binary-utilities @llvm/pr-subscribers-debuginfo Author: None (cmtice) Changes…lerator table. Update computeDebugNamesUniqueHashes function to start with lowercase letter, to use a MutableArrayRef for the hashes parameter instead of SmallVector, and made uniqueHashCount a reference parameter, to allow the accelerator table class member to be properly updated. Also removed redundant namespace qualifier. Full diff: https://github.com/llvm/llvm-project/pull/82394.diff 2 Files Affected:
diff --git a/llvm/include/llvm/BinaryFormat/Dwarf.h b/llvm/include/llvm/BinaryFormat/Dwarf.h
index 869352b35e3235..c926b34f9d9048 100644
--- a/llvm/include/llvm/BinaryFormat/Dwarf.h
+++ b/llvm/include/llvm/BinaryFormat/Dwarf.h
@@ -613,6 +613,25 @@ enum AcceleratorTable {
DW_hash_function_djb = 0u
};
+// Uniquify the string hashes and calculate the bucket count for the
+// DWARF v5 Accelerator Table.
+inline uint32_t
+computeDebugNamesUniqueHashes(MutableArrayRef<uint32_t> hashes,
+ uint32_t &uniqueHashCount) {
+ uint32_t BucketCount = 0;
+
+ sort(hashes);
+ uniqueHashCount = llvm::unique(hashes) - hashes.begin();
+ if (uniqueHashCount > 1024)
+ BucketCount = uniqueHashCount / 4;
+ else if (uniqueHashCount > 16)
+ BucketCount = uniqueHashCount / 2;
+ else
+ BucketCount = std::max<uint32_t>(uniqueHashCount, 1);
+
+ return BucketCount;
+}
+
// Constants for the GNU pubnames/pubtypes extensions supporting gdb index.
enum GDBIndexEntryKind {
GIEK_NONE,
diff --git a/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp b/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
index 22d995a9cc3c56..f8528b489c3c0b 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
@@ -33,22 +33,13 @@ using namespace llvm;
void AccelTableBase::computeBucketCount() {
// First get the number of unique hashes.
- std::vector<uint32_t> Uniques;
+ SmallVector<uint32_t, 0> Uniques;
Uniques.reserve(Entries.size());
for (const auto &E : Entries)
Uniques.push_back(E.second.HashValue);
- array_pod_sort(Uniques.begin(), Uniques.end());
- std::vector<uint32_t>::iterator P =
- std::unique(Uniques.begin(), Uniques.end());
- UniqueHashCount = std::distance(Uniques.begin(), P);
-
- if (UniqueHashCount > 1024)
- BucketCount = UniqueHashCount / 4;
- else if (UniqueHashCount > 16)
- BucketCount = UniqueHashCount / 2;
- else
- BucketCount = std::max<uint32_t>(UniqueHashCount, 1);
+ BucketCount = llvm::dwarf::computeDebugNamesUniqueHashes(Uniques,
+ UniqueHashCount);
}
void AccelTableBase::finalize(AsmPrinter *Asm, StringRef Prefix) {
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Could not fix git push/pull/rebase issues with #82264 so had to create this new pull request. I've tried to address all the review comments from the previous PR: Update computeDebugNamesUniqueHashes function to start with lowercase letter, to use a MutableArrayRef for the hashes parameter instead of SmallVector, and made uniqueHashCount a reference parameter, to allow the accelerator table class member to be properly updated. Also removed redundant namespace qualifier. |
Fix clang format issues.
uint32_t BucketCount = 0; | ||
|
||
sort(hashes); | ||
uniqueHashCount = llvm::unique(hashes) - hashes.begin(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without returning the iterator from llvm::unique
or otherwise communicating info about hashes
out to the caller - I guess the caller isn't expected to use their copy of the data structure again? Might be worth detailing in the doc comment that this function effectively consumes the hashes
array? Or is it somehow usable afterwards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand your comment? Yes, I want the changes to hashes to make it back to the caller.
I thought making it a MutableArray meant that the changes would get back to the user? Originally I had it as a SmallVector &, but felipepivezan asked me to change it to a MutableArray instead. LLVM would not allow me to declare the MutableArray as a reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM would not allow me to declare the MutableArray as a reference
MutableArrays are meant to be passed by copy, they are just a "non const ptr" + "size" abstraction.
I thought making it a MutableArray meant that the changes would get back to the user?
The changes make it back to the user, but this is not the important distinction between MutableArrayRef and SmallVector. The SmallVector API exposes resizing capabilities, which we are not using inside this function. So you communicate this intent by using the more restricted API.
That said, David's point is that the erase
functions only moves elements around, it doesn't actually erase them (contrary to what the name suggests). Because of that, the caller of this function can't possibly have a useful vector after this function returns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I went back and double checked the callers; they do NOT expect a usable vector after this call, so that is fine. However I'm getting a compile time error using the MutableArrayRef, which I could use to help figuring out how to fix:
third_party/llvm/llvm-project/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp:43:50: error: non-const lvalue reference to type 'MutableArrayRef<uint32_t>' (aka 'MutableArrayRef') cannot bind to a value of unrelated type 'SmallVector<uint32_t, 0>' (aka 'SmallVector<unsigned int, 0>')
43 | llvm::dwarf::computeDebugNamesUniqueHashes(Uniques, UniqueHashCount);
| ^~~~~~~
third_party/llvm/llvm-project/llvm/include/llvm/BinaryFormat/Dwarf.h:618:74: note: passing argument to parameter 'hashes' here
618 | inline uint32_t computeDebugNamesUniqueHashes(MutableArrayRef<uint32_t> &hashes,
| ^
1 error generated.
('Uniques', at the call site, line 43, is a SmallVector...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
618 | inline uint32_t computeDebugNamesUniqueHashes(
MutableArrayRef<uint32_t> &hashes,
Did you make it a reference? It should not be a reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this function calculates and returns two different values: the bucket count (for which the function is named) and the unique hash count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In hindsight, we should have just returned a pair
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be adjusted post-merge? With C++17 structured bindings, returning multiple values via pairs/tuples should really be the norm, rather than using reference parameters as outputs, in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to create another PR (probably in a day or two) to fix this. Do you know of any examples I could look at to see how this is done well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With C++17 structured bindings, returning multiple values via pairs/tuples should really be the norm, rather than using reference parameters as outputs, in my opinion.
FWIW I 100% agree with you, but every now and then this runs into the LLVM's community general hesitancy to use auto
more aggressively.
I'll try to create another PR (probably in a day or two) to fix this. Do you know of any examples I could look at to see how this is done well?
Can't think of any example this early, but you could probably do something like:
inline std::pair<uint32_t, uint32_t> getDebugNamesBucketAndHashCount((MutableArrayRef<uint32_t> hashes) {
...
return {BucketCount, HashCount};
}
And then on the call site you'd normally do something like:
auto [BuckerCount, HashCount] = getDebugNamesBucketAndHashCount(...)
However, since those are actually member variables, you can't really use a declaration like that. You'll need to store the pair in a temporary variable and assign to the member variables using .first
and .second
@@ -613,6 +613,24 @@ enum AcceleratorTable { | |||
DW_hash_function_djb = 0u | |||
}; | |||
|
|||
// Uniquify the string hashes and calculate the bucket count for the | |||
// DWARF v5 Accelerator Table. | |||
inline uint32_t computeDebugNamesUniqueHashes(MutableArrayRef<uint32_t> hashes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this maybe should be getDebugNamesBucketCount
? (because it doesn't return the unique number of hashes, it returns the number of buckets)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it does return the unique number of hashes, in the second parameter, but I'm Ok with renaming the function...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. Yeah, naming it after what it returns seems the most legible.
If the intent is to use the uniqued hashes array after the call (not relevant to the current caller, but possibly relevant to the new usage you want to add) maybe the simplest thing is to have the function take a SmallVectorImpl<uint32_t>&
, and after calling uique
, use that result to shorten the vector down to the size of the uniqued range (by calling the vector's resize
function) - then the caller can inspect the vector's size after calling, and doesn't need the secondary out parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've renamed the function as suggested, and added a note in the function comment that it consumes the array parameter.
Change function name from 'computeDebugNamesUniqueHashes' to 'getDebugNamesBucketCount', and add coment that this function consumes its input parameter.
I think I've addressed all the review comments now? Please let me know if there's anything else I should do for this pull request. |
Clang format fixes.
[LLVM][DWARF] Refactor code for generating DWARF v5 .debug_names
Refactor the code that uniques the entries and computes the bucket count for the DWARF V5 .debug_names accelerator table.