Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLVM][ADT] Convert llvm::hash_code to unsigned explicitly in DenseMapInfo #77743

Merged
merged 1 commit into from
Jan 18, 2024

Conversation

andrey-golubev
Copy link
Contributor

The getHashValue() signature returns a value of type 'unsigned' while the hash_code could only be implicitly converted to 'size_t'. Depending on the C++ implementation, this may or may not be a narrowing conversion.

On some platform/compiler combination, this becomes a warning. To avoid the warning (and better highlight the narrowing), do an explicit conversion instead.

…pInfo

The getHashValue() signature returns a value of type 'unsigned' while
the hash_code could only be implicitly converted to 'size_t'. Depending
on the C++ implementation, this may or may not be a narrowing conversion.

On some platform/compiler combination, this becomes a warning. To
avoid the warning (and better highlight the narrowing), do an explicit
conversion instead.

Co-authored-by: Orest Chura <orest.chura@intel.com>
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 11, 2024

@llvm/pr-subscribers-llvm-adt

Author: Andrei Golubev (andrey-golubev)

Changes

The getHashValue() signature returns a value of type 'unsigned' while the hash_code could only be implicitly converted to 'size_t'. Depending on the C++ implementation, this may or may not be a narrowing conversion.

On some platform/compiler combination, this becomes a warning. To avoid the warning (and better highlight the narrowing), do an explicit conversion instead.


Full diff: https://github.com/llvm/llvm-project/pull/77743.diff

1 Files Affected:

  • (modified) llvm/include/llvm/ADT/Hashing.h (+3-1)
diff --git a/llvm/include/llvm/ADT/Hashing.h b/llvm/include/llvm/ADT/Hashing.h
index 4e82ba9a6fba2a..a5477362a50793 100644
--- a/llvm/include/llvm/ADT/Hashing.h
+++ b/llvm/include/llvm/ADT/Hashing.h
@@ -677,7 +677,9 @@ template <typename T> hash_code hash_value(const std::optional<T> &arg) {
 template <> struct DenseMapInfo<hash_code, void> {
   static inline hash_code getEmptyKey() { return hash_code(-1); }
   static inline hash_code getTombstoneKey() { return hash_code(-2); }
-  static unsigned getHashValue(hash_code val) { return val; }
+  static unsigned getHashValue(hash_code val) {
+    return static_cast<unsigned>(size_t(val));
+  }
   static bool isEqual(hash_code LHS, hash_code RHS) { return LHS == RHS; }
 };
 

@andrey-golubev
Copy link
Contributor Author

@lattner @dwblaikie @joker-eph @River707 @silvasean judging from your "recent" activity in the area, you might be good reviewers here. Please feel free to add any other person who might be more suitable.

@joker-eph
Copy link
Collaborator

Shouldn't this return size_t instead maybe? Why is the implicit narrowing OK here?

@andrey-golubev
Copy link
Contributor Author

andrey-golubev commented Jan 11, 2024

Shouldn't this return size_t instead maybe? Why is the implicit narrowing OK here?

I agree but I think the unsigned return type is by design as the "main" template specialization expects the unsigned value (unsigned long long specialization seems like a noteworthy candidate as well).

between changing the type (~signature) everywhere and doing the explicit thing for the case I picked the latter. Guessing overall this is a bigger topic for discussion on whether the hashing should really return (potentially) 32-bit values when e.g. 64-bit values are used as input (and current API also contradicts std::hash which, I believe, is unfortunate).

P.S.: this problem only appeared in our project due to this specialization participating in some MLIR code, would've gone completely unnoticed otherwise.

@dwblaikie
Copy link
Collaborator

Yeah, if getHashValue returns size_t, we'd have the same narrowing issue inside DenseMap itself.

But I think the right solution would be to do the same thing as is done for unsigned long long - maybe even just have the size_t specialization delegate to the unsigned long long implementation (maybe only if sizeo(size_t) > sizeof(unsigned)?)?

@andrey-golubev
Copy link
Contributor Author

So from what I understand, there's:
a) a bunch of functionality around hash_code. we're mainly interested in hash_value that seems to produce actual 64-bit values (e.g. it actually utilizes both low and high 32 bits in integer hashing) - does "proper" hashing to begin with.
b) DenseMapInfo<> that seems to deal with just keys into DenseMap. For simpler cases (e.g. unsigned long long and friends) it, in a nutshell, performs a 1-to-1 mapping by multiplying the given value by some constant (this might potentially overflow but we don't care for unsigned) so that we get nice hashing that is also cheap to compute? (But then we trim half of the bytes before returning the hash?)
c) Then, for DenseMapInfo<hash_code> the current behavior just takes "first" sizeof(unsigned) bytes.

Now,

But I think the right solution would be to do the same thing as is done for unsigned long long - maybe even just have the size_t specialization delegate to the unsigned long long implementation (maybe only if sizeo(size_t) > sizeof(unsigned)?)?

this would essentially mean smth like:

  // constexpr auto Constant = 37ULL;
  static unsigned getHashValue(hash_code val) { return (unsigned)(size_t(val) * Constant); }

my gut feeling is that we don't really need constant multiplication: given that the hash_code was computed using some proper hashing mechanism, multiplying by a constant does produce equivalent value distribution? (it's the original but just "shifted" by that multiplication?). BUT: this is valid if we don't "trim" the bytes afterwards, which we do by taking only unsigned part.

So the bulk of the discussion is how * 37 affects the end result and whether we care.

@dwblaikie
Copy link
Collaborator

So from what I understand, there's: a) a bunch of functionality around hash_code. we're mainly interested in hash_value that seems to produce actual 64-bit values (e.g. it actually utilizes both low and high 32 bits in integer hashing) - does "proper" hashing to begin with. b) DenseMapInfo<> that seems to deal with just keys into DenseMap. For simpler cases (e.g. unsigned long long and friends) it, in a nutshell, performs a 1-to-1 mapping by multiplying the given value by some constant (this might potentially overflow but we don't care for unsigned) so that we get nice hashing that is also cheap to compute? (But then we trim half of the bytes before returning the hash?) c) Then, for DenseMapInfo<hash_code> the current behavior just takes "first" sizeof(unsigned) bytes.

Now,

But I think the right solution would be to do the same thing as is done for unsigned long long - maybe even just have the size_t specialization delegate to the unsigned long long implementation (maybe only if sizeo(size_t) > sizeof(unsigned)?)?

this would essentially mean smth like:

  // constexpr auto Constant = 37ULL;
  static unsigned getHashValue(hash_code val) { return (unsigned)(size_t(val) * Constant); }

my gut feeling is that we don't really need constant multiplication: given that the hash_code was computed using some proper hashing mechanism, multiplying by a constant does produce equivalent value distribution? (it's the original but just "shifted" by that multiplication?). BUT: this is valid if we don't "trim" the bytes afterwards, which we do by taking only unsigned part.

So the bulk of the discussion is how * 37 affects the end result and whether we care.

Fair enough.

@andrey-golubev
Copy link
Contributor Author

@dwblaikie do you mind clicking the merge as well? (I don't have the rights at the moment!)

@dwblaikie dwblaikie merged commit 15c1c85 into llvm:main Jan 18, 2024
5 checks passed
@andrey-golubev andrey-golubev deleted the dense_map_info_hash_conversion branch January 18, 2024 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants