Issues with hash-function for float64 version of klib's hash-map #21866
Hash-maps for float64 use the following hash-function
However, in order to guarantee consistent behavior, the following constraints must be met:
Under IEEE 754, equality on floats isn't an equivalence relation (NaN != NaN); this is fixed by defining "equal" as
Thus, apart from trivial equivalence classes, there are the following two:
for which the second constraint doesn't hold:
Due to the way klib uses this hash value, the values "0.0" and "-0.0" end up in different buckets only if there are already more than 2^29 (ca. 5.4e8) elements in the hash-map. The same holds for
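To make the two non-trivial equivalence classes concrete, here is a minimal Python sketch (not klib's C code) showing that values which compare "equal" can still have different 64-bit patterns, so any hash computed directly from the raw bits may differ for them. The helper name `float_bits` and the two NaN payloads are picked purely for illustration.

```python
import math
import struct


def float_bits(x: float) -> int:
    """Return the IEEE-754 bit pattern of a float64 as an unsigned 64-bit int."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(bits: int) -> float:
    """Inverse of float_bits: build a float64 from a raw 64-bit pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# Signed zeros: equal under ==, but the sign bit differs.
print(0.0 == -0.0)             # True
print(hex(float_bits(0.0)))    # 0x0
print(hex(float_bits(-0.0)))   # 0x8000000000000000

# Two quiet NaNs with different payloads (bit patterns chosen for illustration).
nan_a = bits_to_float(0x7FF8000000000000)
nan_b = bits_to_float(0x7FF8000000000001)
print(math.isnan(nan_a), math.isnan(nan_b))
print(float_bits(nan_a) == float_bits(nan_b))
```

If "equal" is extended so that all NaNs count as equal, both pairs above are equal keys whose raw bits disagree, which is exactly the situation a bit-based hash has to account for.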
A better approach would be to fix the hash-function of the float64 hash-map. One possibility would be:
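As a rough illustration of what such a fix could look like, the following Python sketch normalizes each equivalence class to a single representative before hashing the bits. It is not the klib implementation: the names `normalized_bits` and `float64_hash`, the choice of canonical NaN pattern, and the simple 64-to-32-bit fold are all assumptions made for this example.

```python
import math
import struct

# Assumed canonical representative for every NaN (a plain quiet NaN).
CANONICAL_NAN_BITS = 0x7FF8000000000000


def normalized_bits(x: float) -> int:
    """Map every member of an equivalence class to one 64-bit pattern."""
    if x == 0.0:
        # True for both 0.0 and -0.0, so both hash like +0.0.
        return 0
    if math.isnan(x):
        # All NaNs, whatever their payload, hash like the canonical NaN.
        return CANONICAL_NAN_BITS
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def float64_hash(x: float) -> int:
    """Hypothetical 32-bit hash: fold the normalized bits (placeholder mix)."""
    bits = normalized_bits(x)
    return (bits ^ (bits >> 32)) & 0xFFFFFFFF


nan_with_payload = struct.unpack("<d", struct.pack("<Q", 0x7FF8000000000001))[0]
print(float64_hash(0.0) == float64_hash(-0.0))                       # True
print(float64_hash(float("nan")) == float64_hash(nan_with_payload))  # True
```

The point of the design is only that normalization happens before any bit mixing, so keys that compare "equal" always produce the same hash regardless of which mixing function follows.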
A little over my head but can you clarify the problem further? Is this something visible from the end user perspective?
I wouldn't really consider the items you've labeled as workarounds to actually be such, as
Because of the workarounds, I'm not aware of a way to trigger an error in the NaN case (as long as one doesn't care exactly which NaN it is). There is, however, a way to trigger inconsistent behavior for
The size of b is
I do understand that this is quite an esoteric case. My main issue with the implementation of the float64 table as it is: there is a trap which has obviously already bitten at least twice, and it will strike again in the future.
The problem is not the equal-operator (which is rightly extended with
This SO-question helped me to understand the issue, maybe it is better than my issue description.
Certainly would take an alternative hash function - as the long comment in the code indicates - we used to use python's hash for doubles, but that caused issues due to size truncation, so we're using a generic bit-shuffling one, same that ints use. As you show
But I think our approach for NaNs is fine? It's special cased, yes, but
referenced this issue
Jul 13, 2018
Added my suggestion as PR21904; it fixes both cases, NaNs and signed zero. I think both are necessary: Using directly the
PS: I don't know the right place for the whatsnew entry; I hope that, in case the changes are OK, I will be guided to the right place...