
[ADT] Update hash function of uint64_t for DenseMap #95734

Merged
merged 4 commits into from
Jun 20, 2024

Conversation

ChuanqiXu9
Member

@ChuanqiXu9 ChuanqiXu9 commented Jun 17, 2024

(Background: See the comment:
#92083 (comment))

It looks like the hash function for 64-bit integers is not very good:

  static unsigned getHashValue(const unsigned long long& Val) {
    return (unsigned)(Val * 37ULL);
  }

Since the result is truncated to 32 bits, the higher 32 bits won't contribute to the result, so 0x1'00000001 will hash to the same value as 0x2'00000001, 0x3'00000001, ...

Then we may hit a lot of collisions in such cases. I feel it should generally be good to include the higher 32 bits in the hash function.

Not sure who the appropriate reviewer is; adding some people by impression.

@llvmbot
Collaborator

llvmbot commented Jun 17, 2024

@llvm/pr-subscribers-llvm-adt

Author: Chuanqi Xu (ChuanqiXu9)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/95734.diff

1 file affected:

  • (modified) llvm/include/llvm/ADT/DenseMapInfo.h (+1-1)
diff --git a/llvm/include/llvm/ADT/DenseMapInfo.h b/llvm/include/llvm/ADT/DenseMapInfo.h
index 5b7dce7b53c62..61869d8e7fbb0 100644
--- a/llvm/include/llvm/ADT/DenseMapInfo.h
+++ b/llvm/include/llvm/ADT/DenseMapInfo.h
@@ -151,7 +151,7 @@ template<> struct DenseMapInfo<unsigned long long> {
   static inline unsigned long long getTombstoneKey() { return ~0ULL - 1ULL; }
 
   static unsigned getHashValue(const unsigned long long& Val) {
-    return (unsigned)(Val * 37ULL);
+    return DenseMapInfo<unsigned>(Val) ^ DenseMapInfo<unsigned>(Val >> 32);
   }
 
   static bool isEqual(const unsigned long long& LHS,

Collaborator

Can we just use some code from ADT/Hashing.h? It seems to contain some well-mixing routines. Seems like something like llvm::hash_value(Val) would be enough.

Member Author

That looks good too. But I am not a hashing expert so I don't have an opinion here. Let's see what others propose.

Contributor

I concur with the suggestion to use llvm::hash_value from ADT/Hashing.h. It implements a murmur-like algorithm, which mixes bits enough.

Contributor

While I think that DenseMap's hash functions could use some overhaul, I don't think that switching to Hashing.h in this one place would be appropriate. If we want to do that, we should do so for all the DenseMapInfo hashes, after properly analyzing the cost of the more expensive hash vs the benefit of better hash distribution.

We may also have to refactor Hashing.h to reduce the build-time overhead for places like this that don't need the full infrastructure and its build overhead.

Collaborator

That looks good too. But I am not a hashing expert so I don't have an opinion here. Let's see what others propose.

Here is an easy way to reason about this, just in case: ^ mixes very poorly, in that a single-bit change of the input changes only a single bit of the output. Ideally, mixing/combining hash values should produce results that do not depend so trivially on the original hash values being combined. This greatly reduces the possibility of collisions from "related" inputs.

Member Author

Done with detail::combineHashValue, without using other functions from ADT/Hashing.h.

Contributor

Maybe we should switch all DenseMapInfo for all integer types to llvm::hash_value?
@chandlerc suggested to use it in his response and he does not expect this to cause any problems.
The code would be simple and we will get to reuse it in more places.

I am not sure how much worse the build times will get, but this should be easy to address by moving the declarations of the integer-hashing functions into a separate header.
I feel it should not be a blocker to using the hash function that we otherwise feel is better.

Contributor

The current version should also relieve the immediate pain, but I wonder if other people also feel that using murmur-like hashing is a better long-term option anyway.

@@ -151,7 +151,8 @@ template<> struct DenseMapInfo<unsigned long long> {
   static inline unsigned long long getTombstoneKey() { return ~0ULL - 1ULL; }

   static unsigned getHashValue(const unsigned long long& Val) {
-    return (unsigned)(Val * 37ULL);
+    return DenseMapInfo<unsigned>::getHashValue(Val) ^
Member

the easy way is detail::combineHashValue

Contributor

+1 to that

Member Author

Done

Contributor

@nikic nikic left a comment

This looks conceptually fine, but I think you need to do the same thing for the unsigned long type for the case where it is 64 bits wide. On 64-bit Linux targets uint64_t will typically be a typedef for unsigned long, not unsigned long long.

@ilya-biryukov
Contributor

Are we worried about implications of changing the hash function for something as widely used as 64 bit ints?
I can see a few potential problems:

  • code sensitive to performance of hash function might become slower,
  • code relying on existing order in hash tables breaks when the order changes (there shouldn't be code like this, but Hyrum's law).
  • the hash function we use may have collisions on some important cases (using hash_combine as suggested by others will alleviate this concern, though).

I wonder if this change should be postponed until after Clang 19? Or am I too pessimistic in my predictions and things will just work out?

Also a few open questions:

  • The C++ standard returns 64 bit hash values (on 64 bit platforms), maybe we should do the same in LLVM? I suspect the code of our hash tables would work just fine with it (although it might add some overhead if we store precomputed hashes somewhere).
  • Are there better hash functions we could use for ints/in general? E.g. I've heard about murmurhash, but I'm not sure if we can or should use it in LLVM.

It would be nice to get someone expert in state-of-the-art hash functions and hash tables to review this.

@AaronBallman
Collaborator

It would be nice to get someone expert in state-of-the-art hash functions and hash tables to review this.

@chandlerc -- any chance you could weigh in here?

@nikic
Contributor

nikic commented Jun 17, 2024

Are we worried about implications of changing the hash function for something as widely used as 64 bit ints?

I don't think there's particular cause to worry.

I can see a few potential problems:

* code sensitive to performance of hash function might become slower,

We would of course confirm this first -- the patch as-is does not have any impact on compile-time, but I think this is mostly because it modifies the wrong overload. I can re-test this after the patch is updated.

* code relying on existing order in hash tables breaks when the order changes (there shouldn't be code like this, but [Hyrum's law](https://www.hyrumslaw.com/)).

There may be some test fallout to deal with, but I think we should be in a fairly good position thanks to reverse-iteration testing.

I wonder if this change should be postponed until after Clang 19? Or am I too pessimistic in my predictions and things will just work out?

Also a few open questions:

* The C++ standard returns 64 bit hash values (on 64 bit platforms), maybe we should do the same in LLVM? I suspect the code of our hash tables would work just fine with it (although it might add some overhead if we store precomputed hashes somewhere).

We currently only use the low bits of the hash, so extending it to 64 bits is not useful at present.

* Are there better hash functions we could use for ints/in general? E.g. I've heard about murmurhash, but I'm not sure if we can or should use it in LLVM.

It would be nice to get someone expert in state-of-the-art hash functions and hash tables to review this.

LLVM is definitely behind the state of the art -- if someone finds the time to port DenseMap and SmallPtrSet to something like swiss tables, that would be great. (And not just for performance, we could also get rid of the need to specify explicit tombstone and empty keys for types.)

@ilya-biryukov
Contributor

* code relying on existing order in hash tables breaks when the order changes (there shouldn't be code like this, but [Hyrum's law](https://www.hyrumslaw.com/)).

There may be some test fallout to deal with, but I think we should be in a fairly good position thanks to reverse-iteration testing.

Thanks for the pointers, I didn't know we had this.
Looking at the implementation, we only seem to do it for pointers. I expect maps with integer keys to be very frequent too, and they don't seem to be covered by the existing machinery for reverse iteration. Especially if we go with replacing hash functions for all integer types/all types rather than just 64-bit ints, as you suggested in the comment thread.

Another potential problem is downstream uses that aren't covered by the upstream tests we have. I am sure it's manageable, but we would want to announce the change a little in advance so that people know it's coming.

* The C++ standard returns 64 bit hash values (on 64 bit platforms), maybe we should do the same in LLVM? I suspect the code of our hash tables would work just fine with it (although it might add some overhead if we store precomputed hashes somewhere).

We currently only use the low bits of the hash, so extending it to 64 bits is not useful at present.

Sorry for the confusing wording on my part; I referred not just to extending the returned hash value to 64 bits, but also to making sure we use all 64 bits in our hash table implementation.

LLVM is definitely behind the state of the art -- if someone finds the time to port DenseMap and SmallPtrSet to something like swiss tables, that would be great. (And not just for performance, we could also get rid of the need to specify explicit tombstone and empty keys for types.)

+1, I know that @chandlerc had some new ideas for hash tables in the works too, but I'm not sure how fleshed out they are and whether he has time to land them / describe them in enough detail that someone else can pick them up.

Member

@kuhar kuhar left a comment

I think this makes sense on its own and we can improve the hash function without going all in on a better dense map implementation.

However, it would be good to have data on how this actually affects performance on real-world programs. For example, you could manually instrument dense map-ish containers and count the number of key operations (resize here?) when bootstrapping clang or compiling some other larger inputs.

@alexfh
Contributor

alexfh commented Jun 17, 2024

However, it would be good to have data on how this actually affects performance on real-world programs.

The patch was prompted by a recent ~5x regression in compilation speed, the corresponding profiles are quoted here: #92083 (comment)

@ilya-biryukov found that changing the hash function effectively resolves the regression (which I would expect to more or less remove the DenseMap methods from the hot code). But if necessary, I can profile clang on the same compilation with and without this PR.

@kuhar
Member

kuhar commented Jun 17, 2024

@ilya-biryukov found that changing the hash function effectively resolves the regression (which I would expect to more or less remove the DenseMap methods from the hot code). But if necessary, I can profile clang on the same compilation with and without this PR.

I'd like to confirm that this not only improves the overall performance, but that the speedup can be attributed to 'better' hashing instead of, say, the elements being (coincidentally) in a better order. I don't know the low-level details, but wouldn't this reduce the number of collisions and the number of resizes?

@ChuanqiXu9
Member Author

ChuanqiXu9 commented Jun 18, 2024

What's updated:

  • Adopt @nikic's suggestion to not use interfaces from ADT/Hashing.h.
  • Adopt @nikic's suggestion to add the overload for unsigned long.
  • Adopt @MaskRay's suggestion to use detail::combineHashValue.

For performance, we need benchmarking. I tested simple local workloads but didn't see any observable change. I feel this should be conceptually fine too, since the previous implementation is clearly not good for 64-bit integers.

@chandlerc
Member

It would be nice to get someone expert in state-of-the-art hash functions and hash tables to review this.

@chandlerc -- any chance you could weigh in here?

Happy to, I've been studying this freshly for the past 6 months.

Generally, neither the old multiplication nor the shift is going to work well. But they may get lucky with the current inputs and appear to work well.

Not sure why the concern over ADT/Hashing.h -- that code has held up quite well and remains a reasonably strong balance of strong hashing at reasonable cost.

In particular, for a 64-bit integer, I wouldn't expect it to be much slower than the combineHashValue being called in the current iteration. It might actually be faster.

I have recently developed a hashing function that is only slightly worse than ADT/Hashing.h and is very, very competitive with the very best (I suspect faster, but the proof will take time to tell). It is open source in Carbon and under the LLVM license:
https://github.com/carbon-language/carbon-lang/blob/trunk/common/hashing.h

This routine is dramatically faster than LLVM's Hashing.h, and equal to or faster than essentially everything else I've been able to evaluate for small objects (integers, pointers, tuples of those). For long strings there are a few faster approaches using specialized hardware (AES building blocks), but not by enough to matter for a compiler, I strongly suspect.

The lower quality hashing should largely be fine as long as the hash table load factor is low enough. I've done a good amount of DenseMap benchmarking with that hash function, you can see code that benchmarks it directly here:
https://github.com/carbon-language/carbon-lang/blob/trunk/common/map_benchmark.cpp#L184

Carbon also just got a new hashtable implementation that tries to be as close to DenseMap as I could possibly make it for small tables and very sparse tables (low load factor), while operating with a very high load factor (7/8) and with superb performance on large tables. It's based on SwissTable, with comparable or better performance. If there are hashtables that are struggling with DenseMap's design, that would be what I would suggest. But it is very, very hard to compete with DenseMap -- the performance is fantastic and almost unbeatable for very small tables and low load factors.

@ChuanqiXu9
Member Author

It looks like there is some discussion about how best to implement the hash for DenseMap in practice. This is great and valuable, but I feel I don't have the capability to handle that. So how about opening a separate RFC for that discussion, and on this page let's try to focus on whether the patch itself is good to go?

@nikic
Contributor

nikic commented Jun 18, 2024

Compile-time for the PR as proposed: http://llvm-compile-time-tracker.com/compare.php?from=3ca17443ef4af21bdb1f3b4fbcfff672cbc6176c&to=9cd0b838265006ff699153bfbb1a1a39ebfb9cdd&stat=instructions:u

Compile-time using xor mixing (that is, 820edb5 applied on top): http://llvm-compile-time-tracker.com/compare.php?from=3ca17443ef4af21bdb1f3b4fbcfff672cbc6176c&to=820edb52ce1abc090f41c5210dcb04edb6203f36&stat=instructions%3Au

It doesn't seem like we get any benefit out of the better mixing using combineHashValue() for average-case compilation, so I think you should stick to your first variant using xor.

@chandlerc
Member

Compile-time for the PR as proposed: http://llvm-compile-time-tracker.com/compare.php?from=3ca17443ef4af21bdb1f3b4fbcfff672cbc6176c&to=9cd0b838265006ff699153bfbb1a1a39ebfb9cdd&stat=instructions:u

Compile-time using xor mixing (that is, 820edb5 applied on top): http://llvm-compile-time-tracker.com/compare.php?from=3ca17443ef4af21bdb1f3b4fbcfff672cbc6176c&to=820edb52ce1abc090f41c5210dcb04edb6203f36&stat=instructions%3Au

It doesn't seem like we get any benefit out of the better mixing using combineHashValue() for average-case compilation, so I think you should stick to your first variant using xor.

Not sure the % changes in the stage2 builds are large enough to be a significant worry...

But more importantly, while the XOR mixing looks good with a small test (compiling Clang itself), that doesn't make it robust when used with much broader inputs. FWIW, when working on hashtables, it is easy to have benchmarks and test cases that show simple solutions as faster, but that then struggle when the wrong input shows up; I suspect that's how the original use of multiply ran into issues.

@MaskRay
Member

MaskRay commented Jun 18, 2024

Compile-time for the PR as proposed: http://llvm-compile-time-tracker.com/compare.php?from=3ca17443ef4af21bdb1f3b4fbcfff672cbc6176c&to=9cd0b838265006ff699153bfbb1a1a39ebfb9cdd&stat=instructions:u

Compile-time using xor mixing (that is, 820edb5 applied on top): http://llvm-compile-time-tracker.com/compare.php?from=3ca17443ef4af21bdb1f3b4fbcfff672cbc6176c&to=820edb52ce1abc090f41c5210dcb04edb6203f36&stat=instructions%3Au

It doesn't seem like we get any benefit out of the better mixing using combineHashValue() for average-case compilation, so I think you should stick to your first variant using xor.

People use metaheuristics to find good bit mixer functions that achieve good ratings on some metrics (e.g. PractRand)
https://jonkagstrom.com/bit-mixer-construction/

combineHashValue is a custom bit mixer from 2008 (5fc8ab6) that is probably not good. I propose to change it in #95970

This patch can still use combineHashValue , but probably change combineHashValue((uint32_t)x, x>>32) to combineHashValue(x>>32, (uint32_t)x) to avoid a ROR operation.


I agree that we should remove the empty and tombstone keys from DenseMap.

~2 years ago I tried different hash map implementations to improve the performance of the global symbol table in lld/ELF. I think I have measured negligible performance difference.

@chandlerc
Member

Compile-time for the PR as proposed: http://llvm-compile-time-tracker.com/compare.php?from=3ca17443ef4af21bdb1f3b4fbcfff672cbc6176c&to=9cd0b838265006ff699153bfbb1a1a39ebfb9cdd&stat=instructions:u
Compile-time using xor mixing (that is, 820edb5 applied on top): http://llvm-compile-time-tracker.com/compare.php?from=3ca17443ef4af21bdb1f3b4fbcfff672cbc6176c&to=820edb52ce1abc090f41c5210dcb04edb6203f36&stat=instructions%3Au
It doesn't seem like we get any benefit out of the better mixing using combineHashValue() for average-case compilation, so I think you should stick to your first variant using xor.

People use metaheuristics to find good bit mixer functions that achieve good ratings on some metrics (e.g. PractRand) jonkagstrom.com/bit-mixer-construction

combineHashValue is a custom bit mixer from 2008 (5fc8ab6) that is probably not good. I propose to change it in #95970

This patch can still use combineHashValue , but probably change combineHashValue((uint32_t)x, x>>32) to combineHashValue(x>>32, (uint32_t)x) to avoid a ROR operation.

If you want to improve the hash functions used in LLVM, that seems like an interesting project. Again, I posted up thread a link to a very good hash function I have been developing based on experience with Abseil's, and it works very well with DenseMap as well as other hashtables.

But I don't understand why the goal isn't to use the LLVM hashing library that is currently in the codebase, and if desired, add improvements to it. That seems better than slowly re-creating another hashing library here. =[ The original intent of the ADT/Hashing.h code was to provide a good replacement for DenseMap and other usages. If it needs to be improved to do that, by all means.

@ChuanqiXu9
Member Author

This patch can still use combineHashValue , but probably change combineHashValue((uint32_t)x, x>>32) to combineHashValue(x>>32, (uint32_t)x) to avoid a ROR operation.

Done

@MaskRay
Member

MaskRay commented Jun 19, 2024

Thanks for these pointers!

DenseMap extracts low bits from a 32-bit getHashValue.
(It would probably be nice to switch to a 64-bit hash, perhaps with a new member function.)
This limits the effectiveness of pure multiplicative hashing. We need one xorshift step, which is done in #95970.

A lot of work can be done to both Hashing.h and DenseMap.
For example, we could still do a better job at discouraging reliance on the iteration order of DenseMap.
While LLVM_ENABLE_REVERSE_ITERATION helps, I had to fix 3 use cases in llvm/ and clang/ to change getHashValue for std::pair.

Incorporating Hashing.h into DenseMap and switching to hash_value(42) or hash_combine(42, 43) would mix bits in a better way, but increase the code size and cause some slowdown without clear benefits.


I have read some code of carbon-lang/common/hashing.h and absl/hash for integer types and std::pair.
For integer types <= 8 bytes,

  • llvm/include/llvm/ADT/Hashing.h uses a mxmxm variant hash_16_bytes (Murmur-inspired) that has larger latency and probably better avalanche behavior (though likely unnecessarily "strong").
  • absl uses a multiply-xorshift Mix and uint128 on 64-bit pointer machines.
  • Carbon uses a multiply-bswap WeakMix using unsigned _BitInt(128).

Waiting for Hashing.h and DenseMap improvement would take too long.
To address the immediate needs, this patch might leverage densemap::detail::mix for the DenseMapInfo unsigned long and unsigned long long specializations. @ChuanqiXu9


When we are ready to switch more stuff to Carbon style hashing, we can probably use the following multiplication fallback for non-GCC-non-Clang compilers.

std::pair<uint64_t, uint64_t> mul64(uint64_t a, uint64_t b) {
  uint64_t a0 = a & 0xffffffff, a1 = a >> 32;
  uint64_t b0 = b & 0xffffffff, b1 = b >> 32;
  uint64_t t = a0 * b0;
  uint64_t u = t & 0xffffffff;
  t = a1 * b0 + (t >> 32);
  uint64_t v = t >> 32;
  t = (a0 * b1) + (t & 0xffffffff);
  return {(t << 32) + u, a1 * b1 + v + (t >> 32)};
}

@@ -137,7 +137,10 @@ template<> struct DenseMapInfo<unsigned long> {
   static inline unsigned long getTombstoneKey() { return ~0UL - 1L; }

   static unsigned getHashValue(const unsigned long& Val) {
-    return (unsigned)(Val * 37UL);
+    if constexpr (sizeof(Val) == 4)
Member

Now that we have densemap::detail::mix with very low latency, we can use it unconditionally for now.

Contributor

I think it's best to keep the conditional so you get the same hashing behavior if you use a different spelling for what is essentially the same type.

Member Author

Yeah, I'd like to keep the conditional too.

@ChuanqiXu9
Member Author

Thanks for these pointers!

DenseMap extracts low bits from a 32-bit getHashValue. (It would probably be nice to switch to a 64-bit hash, perhaps with a new member function.) This limits the effectiveness of pure multiplicative hashing. We need one xorshift step, which is done in #95970.

A lot of work can be done to both Hashing.h and DenseMap. For example, we could still do a better job at discouraging reliance on the iteration order of DenseMap. While LLVM_ENABLE_REVERSE_ITERATION helps, I had to fix 3 use cases in llvm/ and clang/ to change getHashValue for std::pair.

Incorporating Hashing.h into DenseMap and switching to hash_value(42) or hash_combine(42, 43) would mix bits in a better way, but increase the code size and cause some slowdown without clear benefits.

I have read some code of carbon-lang/common/hashing.h and absl/hash for integer types and std::pair. For integer types <= 8 bytes,

  • llvm/include/llvm/ADT/Hashing.h uses a mxmxm variant hash_16_bytes (Murmur-inspired) that has larger latency and probably better avalanche behavior (though likely unnecessarily "strong").
  • absl uses a multiply-xorshift Mix and uint128 on 64-bit pointer machines.
  • Carbon uses a multiply-bswap WeakMix using unsigned _BitInt(128).

Waiting for Hashing.h and DenseMap improvement would take too long. To address the immediate needs, this patch might leverage densemap::detail::mix for the DenseMapInfo unsigned long and unsigned long long specializations. @ChuanqiXu9

When we are ready to switch more stuff to Carbon style hashing, we can probably use the following multiplication fallback for non-GCC-non-Clang compilers.

std::pair<uint64_t, uint64_t> mul64(uint64_t a, uint64_t b) {
  uint64_t a0 = a & 0xffffffff, a1 = a >> 32;
  uint64_t b0 = b & 0xffffffff, b1 = b >> 32;
  uint64_t t = a0 * b0;
  uint64_t u = t & 0xffffffff;
  t = a1 * b0 + (t >> 32);
  uint64_t v = t >> 32;
  t = (a0 * b1) + (t & 0xffffffff);
  return {(t << 32) + u, a1 * b1 + v + (t >> 32)};
}

Thanks for the high-quality summary. Looking forward to further improvements!

@ChuanqiXu9 ChuanqiXu9 merged commit ad79a14 into llvm:main Jun 20, 2024
7 checks passed
@llvm-ci
Collaborator

llvm-ci commented Jun 20, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-ppc64le-linux running on ppc64le-sanitizer while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/72/builds/235

Here is the relevant piece of the build log for the reference:

Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
files in cache                    719994
cache size                          17.5 GB
max cache size                      20.0 GB
++ CMAKE_COMMON_OPTIONS+=' -DLLVM_CCACHE_BUILD=ON'
++ ld.lld --version
LLD 16.0.1 (compatible with GNU linkers)
++ CMAKE_COMMON_OPTIONS+=' -DLLVM_USE_LINKER=lld'
++ include_config
++ local P=.
++ true
++ local F=./sanitizer_buildbot_config
++ [[ -f ./sanitizer_buildbot_config ]]
++ [[ . -ef / ]]
++ P=./..
++ true
++ local F=./../sanitizer_buildbot_config
++ [[ -f ./../sanitizer_buildbot_config ]]
++ [[ ./.. -ef / ]]
++ P=./../..
++ true
++ local F=./../../sanitizer_buildbot_config
++ [[ -f ./../../sanitizer_buildbot_config ]]
++ [[ ./../.. -ef / ]]
++ P=./../../..
++ true
++ local F=./../../../sanitizer_buildbot_config
++ [[ -f ./../../../sanitizer_buildbot_config ]]
++ [[ ./../../.. -ef / ]]
++ P=./../../../..
++ true
++ local F=./../../../../sanitizer_buildbot_config
++ [[ -f ./../../../../sanitizer_buildbot_config ]]
++ [[ ./../../../.. -ef / ]]
++ P=./../../../../..
++ true
++ local F=./../../../../../sanitizer_buildbot_config
++ [[ -f ./../../../../../sanitizer_buildbot_config ]]
++ [[ ./../../../../.. -ef / ]]
++ P=./../../../../../..
++ true
++ local F=./../../../../../../sanitizer_buildbot_config
++ [[ -f ./../../../../../../sanitizer_buildbot_config ]]
++ [[ ./../../../../../.. -ef / ]]
++ P=./../../../../../../..
++ true
++ local F=./../../../../../../../sanitizer_buildbot_config
++ [[ -f ./../../../../../../../sanitizer_buildbot_config ]]
++ [[ ./../../../../../../.. -ef / ]]
++ break
++ echo @@@BUILD_STEP Info@@@
Step 11 (test compiler-rt debug) failure: test compiler-rt debug (failure)
...

@ChuanqiXu9
Member Author

I feel like the failure may not be related to this change, since compiler-rt has its own implementation of DenseMap (https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_common/sanitizer_dense_map.h) and its own hash functions (https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_common/sanitizer_dense_map_info.h).

@RKSimon
Collaborator

RKSimon commented Jun 20, 2024

@ChuanqiXu9 I'm seeing warnings on MSVC builds:

E:\llvm\llvm-project\llvm\include\llvm/ADT/DenseMapInfo.h(145): error C2220: the following warning is treated as an error
E:\llvm\llvm-project\llvm\include\llvm/ADT/DenseMapInfo.h(145): warning C4293: '>>': shift count negative or too big, undefined behavior

if constexpr (sizeof(Val) == 4)
return DenseMapInfo<unsigned>::getHashValue(Val);
else
return detail::combineHashValue(Val >> 32, Val);
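To see the collision concretely: with the old multiply-and-truncate hash, any two keys differing only in their high 32 bits collide, while combining the two halves separates them. A minimal sketch (the `combineHashValue` here is a hypothetical stand-in mixer, not the exact `detail::combineHashValue` in tree):

```cpp
#include <cstdint>

// Old hash: multiply then truncate to 32 bits. The high word of Val
// never reaches the result, so 0x1'00000001, 0x2'00000001, ... collide.
static unsigned oldHash(uint64_t Val) { return (unsigned)(Val * 37ULL); }

// Hypothetical stand-in for detail::combineHashValue (not the exact
// LLVM implementation): folds both 32-bit halves into the result.
static unsigned combineHashValue(unsigned A, unsigned B) {
  uint64_t Key = (uint64_t)A << 32 | B;
  Key *= 0x9E3779B97F4A7C15ULL; // 64-bit odd (golden-ratio) constant
  return (unsigned)(Key >> 32);
}

// New scheme from this PR: hash both halves of the 64-bit key.
static unsigned newHash(uint64_t Val) {
  return combineHashValue((unsigned)(Val >> 32), (unsigned)Val);
}
```

With these definitions, `oldHash(0x100000001)` and `oldHash(0x200000001)` both evaluate to 37, while the corresponding `newHash` values differ.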
Collaborator


@ChuanqiXu9 To silence MSVC warnings, could we change this to:

return detail::combineHashValue(Val >> (4 * sizeof(Val)), Val);
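The reason the sizeof-based shift silences the warning: MSVC diagnoses the literal `>> 32` even inside the discarded `if constexpr` branch, whereas `4 * sizeof(Val)` is 16 for a 4-byte type and 32 for an 8-byte type, so no instantiation contains an over-wide shift. A stand-alone sketch (the names and the `mix` helper are illustrative, not the actual DenseMapInfo code):

```cpp
#include <cstdint>

// Illustrative stand-in mixer, not the real detail::combineHashValue.
static unsigned mix(unsigned A, unsigned B) { return A * 37U ^ B; }

// For a 4-byte T the shift amount is 16; for an 8-byte T it is 32.
// Because the amount depends on the type, MSVC no longer flags a
// hard-coded out-of-range shift in the branch it analyzes anyway.
template <typename T> unsigned hashInteger(T Val) {
  if constexpr (sizeof(Val) == 4)
    return (unsigned)(Val * 37U);
  else
    return mix((unsigned)(Val >> (4 * sizeof(Val))), (unsigned)Val);
}
```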

Member Author


Thanks. You can land that directly to make the build green. But I don't understand in what cases this is a problem: I think I've already handled the case where the size is 4, and I don't see how sizeof(Val) could evaluate to 2 here.

Collaborator


MSVC has a tendency to analyze additional paths such as the non-constexpr else clause - we've hit similar problems before :(

RKSimon added a commit to RKSimon/llvm-project that referenced this pull request Jun 20, 2024
…ounds shift warning

Fixes MSVC warning after llvm#95734: despite taking the `sizeof(Val) == 4` path, MSVC still warns that the 32-bit unsigned long shift by 32 is out of bounds, so avoid it by deriving the hard-coded shift amount from sizeof() instead.
RKSimon added a commit that referenced this pull request Jun 20, 2024
…ounds shift warning (#96173)

Fixes MSVC warning after #95734: despite taking the `sizeof(Val) == 4` path, it still warns that the 32-bit unsigned long shift by 32 is out of bounds.
@dwblaikie
Collaborator

Compile-time for the PR as proposed: http://llvm-compile-time-tracker.com/compare.php?from=3ca17443ef4af21bdb1f3b4fbcfff672cbc6176c&to=9cd0b838265006ff699153bfbb1a1a39ebfb9cdd&stat=instructions:u
Compile-time using xor mixing (that is, 820edb5 applied on top): http://llvm-compile-time-tracker.com/compare.php?from=3ca17443ef4af21bdb1f3b4fbcfff672cbc6176c&to=820edb52ce1abc090f41c5210dcb04edb6203f36&stat=instructions%3Au
It doesn't seem like we get any benefit out of the better mixing using combineHashValue() for average-case compilation, so I think you should stick to your first variant using xor.
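The "xor variant" presumably folds the high word into the low word with a single exclusive-or before the multiply; a hypothetical sketch of that shape (not the exact commit being benchmarked):

```cpp
#include <cstdint>

// Hypothetical xor-folding hash (illustrative shape only): one xor and
// one multiply, so the high 32 bits cheaply perturb the truncated result.
static unsigned xorFoldHash(uint64_t Val) {
  return (unsigned)((Val ^ (Val >> 32)) * 37ULL);
}
```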

People use metaheuristics to find good bit mixer functions that achieve good ratings on some metrics (e.g. PractRand): jonkagstrom.com/bit-mixer-construction
combineHashValue is a custom bit mixer from 2008 (5fc8ab6) that is probably not good; I propose changing it in #95970.
This patch can still use combineHashValue, but should probably change combineHashValue((uint32_t)x, x>>32) to combineHashValue(x>>32, (uint32_t)x) to avoid a ROR operation.

If you want to improve the hash functions used in LLVM, that seems like an interesting project. Again, I posted up thread a link to a very good hash function I have been developing based on experience with Abseil's, and it works very well with DenseMap as well as other hashtables.

But I don't understand why the goal isn't to use the LLVM hashing library that is currently in the codebase, and if desired, add improvements to it. That seems better than slowly re-creating another hashing library here. =[ The original intent of the ADT/Hashing.h code was to provide a good replacement for DenseMap and other usages. If it needs to be improved to do that, by all means.

+1 to this

Waiting for Hashing.h and DenseMap improvement would take too long.

@MaskRay what do you mean by this? Wouldn't it be a matter of applying these changes to Hashing.h instead of here?

@MaskRay
Member

MaskRay commented Jun 22, 2024

If you want to improve the hash functions used in LLVM, that seems like an interesting project. Again, I posted up thread a link to a very good hash function I have been developing based on experience with Abseil's, and it works very well with DenseMap as well as other hashtables.
But I don't understand why the goal isn't to use the LLVM hashing library that is currently in the codebase, and if desired, add improvements to it. That seems better than slowly re-creating another hashing library here. =[ The original intent of the ADT/Hashing.h code was to provide a good replacement for DenseMap and other usages. If it needs to be improved to do that, by all means.

+1 to this

Waiting for Hashing.h and DenseMap improvement would take too long.

@MaskRay what do you mean by this? Wouldn't it be a matter of applying these changes to Hashing.h instead of here?

Current bit mixers in Hashing.h are too expensive for DenseMapInfo<unsigned long>.
We need a simpler mixer that just combines low and high bits from the input. MurmurHash-like strong mixers are overkill due to higher latency.

We've switched to densemap::detail::mix for 64-bit integer hashing, which is simpler and avoids including Hashing.h in DenseMapInfo.h.
(Simplifying the bit mixer for DenseMapInfo<pair<X,Y>> yields a noticeable compile-time improvement: https://llvm-compile-time-tracker.com/compare.php?from=58d7a6e0e6361871442df956bb88798ce602b09d&to=fb17bbce80cf76ce1a31eff463f451f626bc36b5&stat=instructions:u)
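The kind of low-latency mixer meant here can be sketched as one fold, one multiply, and one more fold over the 64-bit key (illustrative only; the in-tree densemap::detail::mix may use different constants or structure):

```cpp
#include <cstdint>

// Illustrative xorshift-multiply mixer in the spirit of the splitmix64
// finalizer; the in-tree densemap::detail::mix may differ.
static uint64_t mix64(uint64_t x) {
  x ^= x >> 32;               // fold high bits into low bits
  x *= 0x9E3779B97F4A7C15ULL; // odd multiplicative constant
  x ^= x >> 32;               // fold again so all bits reach the low word
  return x;
}

static unsigned hash64(uint64_t x) { return (unsigned)mix64(x); }
```

This costs a handful of cycles per key, far less than a MurmurHash-style finalization chain, while still letting both halves of the input influence the 32-bit result.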


Initially, I thought changing Hashing.h would be complex due to Hyrum's Law: many clients relied incorrectly on the iteration order of DenseMap<StringRef, A> or the deterministic behavior of hash_value(StringRef), while Hashing.h is designed to be non-deterministic.
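The iteration-order trap generalizes to any unordered container, not just DenseMap; a standard-library analogue of the client-side fix (sort before comparing, never rely on bucket order):

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Iteration order of an unordered container is unspecified and shifts
// whenever the hash function changes; that is the Hyrum's Law trap
// described above. Portable clients sort keys before comparing/printing.
static std::vector<std::string>
sortedKeys(const std::unordered_map<std::string, int> &M) {
  std::vector<std::string> Keys;
  Keys.reserve(M.size());
  for (const auto &KV : M)
    Keys.push_back(KV.first);
  std::sort(Keys.begin(), Keys.end());
  return Keys;
}
```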

I have now fixed these clients (10+ commits) and proposed #96282 for improvement.

Should we include Hashing.h in DenseMapInfo.h eventually? Likely, but it involves more work than it seems.

ChuanqiXu9 added a commit that referenced this pull request Jun 24, 2024
…:getHashValue

The FIXME says to revert this when the underlying issue is fixed. Now that the underlying issue has been fixed in
#95734, I think it should be
fine to revert that one now.
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
…ounds shift warning (llvm#96173)

AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
…:getHashValue
