This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Optimize hashing using ahash and multiversion (-30%) #428

Merged: 5 commits into jorgecarleitao:main on Sep 20, 2021

Dandandan
Collaborator

@Dandandan Dandandan commented Sep 19, 2021

This uses `T::get_hash`, which gives some speedup over going through the builder.

It also moves ahash to the `compute` feature and adds multiversioning to select and specialize on the necessary instructions.
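
For readers unfamiliar with function multiversioning, here is a minimal hand-written sketch of the idea using only the standard library. It is not the code from this PR, which relies on the multiversion crate and ahash. The names `hash_all`, `hash_all_impl`, `hash_all_avx2` and the choice of the `avx2` feature are assumptions made for illustration, and std's `DefaultHasher` stands in for ahash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

// Stand-in for a fixed-seed hash state (assumption: the PR uses ahash instead).
type FixedState = BuildHasherDefault<DefaultHasher>;

// Generic body; it is inlined into each specialized caller.
#[inline(always)]
fn hash_all_impl(values: &[u64], state: &FixedState) -> Vec<u64> {
    values
        .iter()
        .map(|v| {
            let mut hasher = state.build_hasher();
            v.hash(&mut hasher);
            hasher.finish()
        })
        .collect()
}

// Copy of the body that the compiler may build with AVX2 enabled.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn hash_all_avx2(values: &[u64], state: &FixedState) -> Vec<u64> {
    hash_all_impl(values, state)
}

// Public entry point: picks a version at runtime based on CPU support.
pub fn hash_all(values: &[u64]) -> Vec<u64> {
    let state = FixedState::default();
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was checked on the line above.
            return unsafe { hash_all_avx2(values, &state) };
        }
    }
    // Fallback for CPUs or architectures without the feature.
    hash_all_impl(values, &state)
}
```

Both paths compute the same hashes; only the instructions available to the compiler differ. Whether the specialized copy is actually faster depends on the hasher and the target CPU.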

@codecov

codecov bot commented Sep 19, 2021

Codecov Report

Merging #428 (b8db919) into main (55ff79c) will decrease coverage by 0.01%.
The diff coverage is 91.30%.

@@            Coverage Diff             @@
##             main     #428      +/-   ##
==========================================
- Coverage   80.80%   80.78%   -0.02%     
==========================================
  Files         353      372      +19     
  Lines       22649    22643       -6     
==========================================
- Hits        18302    18293       -9     
- Misses       4347     4350       +3     
| Impacted Files | Coverage Δ |
|---|---|
| src/array/ord.rs | 64.21% <81.81%> (ø) |
| src/compute/hash.rs | 93.33% <100.00%> (-0.67%) ⬇️ |
| src/compute/arithmetics/time.rs | 47.05% <0.00%> (-40.73%) ⬇️ |
| src/compute/arithmetics/mod.rs | 23.07% <0.00%> (-26.57%) ⬇️ |
| src/compute/contains.rs | 34.31% <0.00%> (-16.43%) ⬇️ |
| src/compute/take/mod.rs | 76.47% <0.00%> (-16.22%) ⬇️ |
| src/compute/aggregate/memory.rs | 25.00% <0.00%> (-15.00%) ⬇️ |
| src/compute/arithmetics/decimal/div.rs | 79.01% <0.00%> (-13.06%) ⬇️ |
| src/compute/arithmetics/decimal/mul.rs | 79.01% <0.00%> (-13.03%) ⬇️ |
| src/compute/aggregate/min_max.rs | 66.66% <0.00%> (-12.48%) ⬇️ |
| ... and 35 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

},
DataType::UInt64,
)
let state = new_state!();
Collaborator

@sundy-li sundy-li Sep 20, 2021

The state is initialized once per array. Will that cause the same value to hash differently in another array?

Collaborator Author

No, it uses the same seeds each time.
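
A small sketch of why fixed seeds make this safe, using only the standard library. `hash_array` is a hypothetical helper, and `BuildHasherDefault<DefaultHasher>` is used as a stand-in for the fixed-seed ahash state created by `new_state!()` (an assumption; the macro body is not shown here).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// Hashes every value of one "array", creating a new state for that array.
fn hash_array(values: &[i32]) -> Vec<u64> {
    // A fresh state per call, but always constructed with the same fixed keys.
    let state = BuildHasherDefault::<DefaultHasher>::default();
    values
        .iter()
        .map(|v| {
            let mut hasher = state.build_hasher();
            v.hash(&mut hasher);
            hasher.finish()
        })
        .collect()
}
```

Calling `hash_array(&[7, 8, 9])` and `hash_array(&[9, 7])` yields the same hash for `7` and for `9` in both results, even though each call built its own state. A randomly seeded state would not give that guarantee.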

Collaborator

OK, but `get_hash` builds a hasher internally, which may introduce an extra allocation on every call. I did not find a way to improve that.

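use std::hash::{BuildHasher, Hash, Hasher};

#[inline]
fn get_hash<H: Hash + ?Sized, B: BuildHasher>(value: &H, build_hasher: &B) -> u64 {
    // A new hasher is built from the shared builder on every call.
    let mut hasher = build_hasher.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}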
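
One possible variation, sketched here as an untested idea rather than something from this PR: build a single hasher up front and clone it for each value. It only applies when the hasher type is `Clone`. The function names and `fixed_builder` are hypothetical, std's `DefaultHasher` stands in for ahash, and no performance claim is made, since the benefit (if any) depends on the hasher.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// Stand-in for a fixed-seed builder (assumption: the PR uses ahash).
fn fixed_builder() -> BuildHasherDefault<DefaultHasher> {
    BuildHasherDefault::default()
}

/// Baseline: build a new hasher from the builder for every value.
fn hash_build_each<B: BuildHasher>(values: &[u32], builder: &B) -> Vec<u64> {
    values
        .iter()
        .map(|v| {
            let mut hasher = builder.build_hasher();
            v.hash(&mut hasher);
            hasher.finish()
        })
        .collect()
}

/// Variant: build one hasher, then clone that template for every value.
fn hash_clone_each<B>(values: &[u32], builder: &B) -> Vec<u64>
where
    B: BuildHasher,
    B::Hasher: Clone,
{
    let template = builder.build_hasher();
    values
        .iter()
        .map(|v| {
            let mut hasher = template.clone();
            v.hash(&mut hasher);
            hasher.finish()
        })
        .collect()
}
```

Both functions return identical hashes for the same builder, because a `BuildHasher` is expected to produce identical hashers on every `build_hasher` call. Only a benchmark can tell whether the clone-based variant is any cheaper.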

@Dandandan Dandandan changed the title Optimize hashing Optimize hashing using ahash and multiversioning Sep 20, 2021
@jorgecarleitao jorgecarleitao merged commit 38361d2 into jorgecarleitao:main Sep 20, 2021
@jorgecarleitao jorgecarleitao changed the title Optimize hashing using ahash and multiversioning Optimize hashing using ahash and multiversion (-30%) Sep 20, 2021
@jorgecarleitao jorgecarleitao added the enhancement An improvement to an existing feature label Sep 20, 2021
@jorgecarleitao
Owner

Awesome, thanks a lot! I updated the title to reflect the findings so that it shows up nicely in the changelog.
