Skip to content

Improve make_hash function #20464

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 5, 2015
Merged

Conversation

ranma42
Copy link
Contributor

@ranma42 ranma42 commented Jan 3, 2015

The make_hash function is used to prevent hashes of non-empty
buckets to collide with EMPTY_HASH = 0u64. Ideally this function
also preserve the uniform distribution of hashes and is cheap to
compute.

The new implementation reduces the input hash size by one bit, simply
by setting the most significant bit. This obviously prevent output
hashes to collide with EMPTY_HASH and guarantees that the uniform
distribution is preserved. Moreover, the new function is simpler (no
comparisons, just an OR) and (under the same assumptions as the old
function, i.e. only the least significant bit will contribute to the
bucket index) no additional collisions are caused.

The `make_hash` function is used to prevent hashes of non-empty
buckets to collide with `EMPTY_HASH = 0u64`. Ideally this function
also preserve the uniform distribution of hashes and is cheap to
compute.

The new implementation reduces the input hash size by one bit, simply
by setting the most significant bit. This obviously prevent output
hashes to collide with `EMPTY_HASH` and guarantees that the uniform
distribution is preserved. Moreover, the new function is simpler (no
comparisons, just an OR) and (under the same assumptions as the old
function, i.e. only the least significant bit will contribute to the
bucket index) no additional collisions are caused.
@rust-highfive
Copy link
Contributor

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see CONTRIBUTING.md for more information.

@ranma42
Copy link
Contributor Author

ranma42 commented Jan 3, 2015

I did some benchmarking on x86-64 (by stripping the call to the hasher and comparing minimal make_hash functions that simply use the old vs new policy). Iterating over 0..2047u64 gives:

running 2 tests
test b_make_hash_2048     ... bench:      1129 ns/iter (+/- 45)
test b_make_hash_new_2048 ... bench:       900 ns/iter (+/- 58)

On 32-bits platforms I would expect a possibly bigger improvement, because LLVM should be able to notice that the OR operation on the low word is a no-op (while to compare with EMPTY_BUCKET both words had to be examined).

@alexcrichton
Copy link
Member

cc @gankro, @pczarn, curious about your opinions on this.

@Gankra
Copy link
Contributor

Gankra commented Jan 3, 2015

Do these performance changes actually make it into the actual HashMap benches?

@Gankra
Copy link
Contributor

Gankra commented Jan 3, 2015

Functionality-wise this seems like a reasonable change. It should have no practical effect on truncated hash-collisions, but will double the likelihood of a "true" hash collision from "very unlikely" to "still very unlikely".

@pczarn
Copy link
Contributor

pczarn commented Jan 3, 2015

The old implementation was basically a single 64-bit cmov.

Pros of the new approach:

  • Definitely better on 32-bit platforms.
  • A little simpler.

I agree with @gankro, there are no other pros or cons in practice.

@ranma42
Copy link
Contributor Author

ranma42 commented Jan 4, 2015

@gankro I do not expect these performance changes (0.1 ns every time a hash is computed?) to be measurable in practical applications. The main reason I proposed this change is to simplify the function and to make it more obvious that the distribution remains uniform after making the hash "safe".

bors added a commit that referenced this pull request Jan 4, 2015
Improve `make_hash` function

Reviewed-by: Gankro, Gankro
@bors bors merged commit 28cca28 into rust-lang:master Jan 5, 2015
@ranma42 ranma42 deleted the improve-make-hash branch January 17, 2015 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants