Improve `make_hash` function #20464
Conversation
The `make_hash` function is used to prevent hashes of non-empty buckets from colliding with `EMPTY_HASH = 0u64`. Ideally this function also preserves the uniform distribution of hashes and is cheap to compute. The new implementation reduces the input hash size by one bit, simply by setting the most significant bit. This obviously prevents output hashes from colliding with `EMPTY_HASH` and guarantees that the uniform distribution is preserved. Moreover, the new function is simpler (no comparisons, just an OR) and, under the same assumptions as the old function (i.e. only the least significant bits contribute to the bucket index), no additional collisions are caused.
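A sketch of the two approaches described above (the function names and the exact shape of the old branch are illustrative assumptions, not the literal PR diff):

```rust
const EMPTY_HASH: u64 = 0;

// Old approach (sketch): remap the single colliding value with a
// comparison, which typically compiles to a cmov.
fn make_hash_old(hash: u64) -> u64 {
    if hash == EMPTY_HASH { hash | 1 } else { hash }
}

// New approach (sketch): unconditionally set the most significant bit,
// shrinking the effective hash to 63 bits but guaranteeing the result
// can never equal EMPTY_HASH.
fn make_hash_new(hash: u64) -> u64 {
    hash | (1u64 << 63)
}

fn main() {
    // Neither version can return EMPTY_HASH.
    assert_ne!(make_hash_old(0), EMPTY_HASH);
    assert_ne!(make_hash_new(0), EMPTY_HASH);

    // The new version leaves the low 63 bits untouched, so any
    // power-of-two bucket index derived from the low bits is unchanged.
    let h = 0x1234_5678_9abc_def0u64;
    assert_eq!(make_hash_new(h) & ((1u64 << 63) - 1), h & ((1u64 << 63) - 1));
}
```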
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see CONTRIBUTING.md for more information.
I did some benchmarking on x86-64 (by stripping the call to the hasher and comparing minimal
On 32-bit platforms I would expect a possibly bigger improvement, because LLVM should be able to notice that the OR operation on the low word is a no-op (while to compare with
Do these performance changes actually make it into the actual HashMap benches?
Functionality-wise this seems like a reasonable change. It should have no practical effect on truncated-hash collisions, but will double the likelihood of a "true" hash collision from "very unlikely" to "still very unlikely".
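The "doubled likelihood" can be seen in miniature with a toy 4-bit hash (an illustrative assumption, not the real 64-bit code): setting the top bit folds the input space onto half as many outputs, so a full-width collision becomes twice as likely by the birthday bound.

```rust
use std::collections::HashSet;

// Toy 4-bit analogue of the new make_hash: setting the top bit maps
// 16 possible inputs onto only 8 possible outputs.
fn make_hash_4bit(hash: u8) -> u8 {
    hash | 0b1000
}

fn main() {
    let outputs: HashSet<u8> = (0u8..16).map(make_hash_4bit).collect();
    // The output space is exactly half the input space. For real 64-bit
    // hashes that is 2^63 values instead of 2^64: collisions are twice
    // as likely, but still negligible.
    assert_eq!(outputs.len(), 8);
}
```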
The old implementation was basically a single 64-bit cmov. Pros of the new approach:
I agree with @gankro, there are no other pros or cons in practice.
@gankro I do not expect these performance changes (0.1 ns every time a hash is computed?) to be measurable in practical applications. The main reason I proposed this change is to simplify the function and to make it more obvious that the distribution remains uniform after making the hash "safe".
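The uniformity claim can be checked directly on a toy 4-bit version (again an illustrative assumption): the OR is an exact 2-to-1 map, so every reachable output has the same number of preimages and a uniform input distribution stays uniform on the halved output space.

```rust
use std::collections::HashMap;

// Toy 4-bit analogue of the new make_hash.
fn make_hash_4bit(hash: u8) -> u8 {
    hash | 0b1000
}

fn main() {
    // Count preimages per output over all 16 four-bit inputs.
    let mut preimages: HashMap<u8, u32> = HashMap::new();
    for h in 0u8..16 {
        *preimages.entry(make_hash_4bit(h)).or_insert(0) += 1;
    }
    // Every reachable output has exactly 2 preimages: the map is
    // uniformly 2-to-1, so uniform inputs give uniform outputs.
    assert!(preimages.values().all(|&c| c == 2));
    assert_eq!(preimages.len(), 8);
}
```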
Improve `make_hash` function Reviewed-by: Gankro