
Murmur3 hash kernel cleanup #10143

Merged
merged 5 commits into rapidsai:branch-22.04 on Feb 7, 2022

Conversation


@rwlee rwlee commented Jan 27, 2022

Followup to #9919 -- kernel merging and code cleanup for Murmur3 hash.

Partial fix for #10081.

Benchmarked the compute_bytes kernel with an aligned read vs. an unaligned read and saw no difference. Looking into it further to confirm that the uint32_t construction was implicitly doing the same thing.

Due to byte alignment, string hashing will require the getblock32 function regardless. The benchmarks run with 100-, 103-, and 104-byte strings showed negligible performance differences, indicating that forced misalignment does not negatively impact hash speed.

@rwlee rwlee requested a review from bdice January 27, 2022 01:27
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jan 27, 2022

codecov bot commented Jan 27, 2022

Codecov Report

Merging #10143 (e5e4914) into branch-22.04 (a7d88cd) will decrease coverage by 0.25%.
The diff coverage is 0.00%.


@@               Coverage Diff                @@
##           branch-22.04   #10143      +/-   ##
================================================
- Coverage         10.42%   10.16%   -0.26%     
================================================
  Files               119      122       +3     
  Lines             20603    24693    +4090     
================================================
+ Hits               2148     2511     +363     
- Misses            18455    22182    +3727     
Impacted Files Coverage Δ
python/cudf/cudf/core/_base_index.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/categorical.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/column.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/datetime.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/numerical.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/string.py 0.00% <ø> (ø)
python/cudf/cudf/core/column/timedelta.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column_accessor.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/dataframe.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <ø> (ø)
... and 35 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 83accc6...e5e4914.

@rwlee rwlee added code quality non-breaking Non-breaking change labels Jan 27, 2022
@rwlee rwlee added this to PR-WIP in v22.04 Release via automation Jan 27, 2022
@rwlee rwlee added the 2 - In Progress Currently a work in progress label Jan 27, 2022

@bdice bdice left a comment


Here are some starting ideas for us to refactor. We can work through these in a pairing session.

cpp/include/cudf/detail/utilities/hash_functions.cuh (outdated review thread, resolved)
constexpr uint32_t c3 = 0xe6546b64;
constexpr uint32_t rot_c1 = 15;
constexpr uint32_t rot_c2 = 13;
auto getblock32 = [] __device__(uint32_t const* p, int i) -> uint32_t {
@bdice bdice Jan 27, 2022

Let's pull this lambda out into a separate device function (not define it as an inline lambda) - like rotl32 and fmix32.

See additional comments below about why it isn't safe for this to take uint32_t const* and why it must instead take std::byte const*. However, there's then no need for the offset parameter int i, because we can do that with pointer arithmetic at the call site.
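
A minimal sketch of that refactor, for illustration only (the function name getblock32 is kept, but the exact signature and placement are assumptions, not the code that was merged):

#include <cstddef>  // std::byte
#include <cstdint>

// Hypothetical standalone device function replacing the inline lambda.
// Taking std::byte const* avoids dereferencing a uint32_t const* that may
// not be 4-byte aligned; the 32-bit word is assembled byte by byte
// (little-endian), which is one safe way to do the unaligned read.
__device__ inline uint32_t getblock32(std::byte const* data)
{
  return static_cast<uint32_t>(data[0]) | (static_cast<uint32_t>(data[1]) << 8) |
         (static_cast<uint32_t>(data[2]) << 16) | (static_cast<uint32_t>(data[3]) << 24);
}

// At the call site the offset parameter `int i` disappears; pointer
// arithmetic selects the block instead, e.g.:
//   uint32_t k1 = getblock32(data + block_index * sizeof(uint32_t));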

k1 *= c1;
k1 = rotl32(k1, 15);
k1 = rotl32(k1, rot_c1);
@bdice bdice Jan 27, 2022

Future PR: We might define the functions and magic values shared between MurmurHash3_32 and SparkMurmurHash3_32 (like rotl32, fmix32, and getblock32) in a common base class, and only override the Spark-specific bits in a derived class. CRTP might be an even better choice, like I did for the SHA-family functions (draft #9215) -- it just needs a bit of analysis to decide which way to go.
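
A rough sketch of the CRTP direction, with entirely hypothetical class names, just to show the shape of the idea rather than a committed design:

#include <cstddef>
#include <cstdint>

// The base class owns the helpers and magic constants shared by both hash
// variants; the derived class supplies only the piece that differs.
template <typename Hasher>
struct murmur_hash_base {
  __device__ static uint32_t rotl32(uint32_t x, uint32_t r) { return (x << r) | (x >> (32 - r)); }

  __device__ static uint32_t fmix32(uint32_t h)
  {
    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
  }

  // Shared tail handling defers to the derived class for the step that
  // differs between the plain and Spark variants.
  __device__ uint32_t widen(std::byte b) const
  {
    return static_cast<Hasher const&>(*this).widen_byte(b);
  }
};

struct murmur_hash_32 : murmur_hash_base<murmur_hash_32> {
  // Plain MurmurHash3 treats tail bytes as unsigned.
  __device__ uint32_t widen_byte(std::byte b) const { return static_cast<uint32_t>(b); }
};

struct spark_murmur_hash_32 : murmur_hash_base<spark_murmur_hash_32> {
  // Spark treats bytes as signed, so sign-extend before mixing. (The
  // signed/unsigned tail handling is used here purely as an example of a
  // variant-specific override; the real set of differences may be larger.)
  __device__ uint32_t widen_byte(std::byte b) const
  {
    return static_cast<uint32_t>(static_cast<int8_t>(b));
  }
};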

cpp/include/cudf/detail/utilities/hash_functions.cuh (three outdated review threads, resolved)
@bdice bdice added the improvement Improvement / enhancement to an existing function label Jan 28, 2022
@rwlee rwlee marked this pull request as ready for review February 1, 2022 00:01
@rwlee rwlee requested a review from a team as a code owner February 1, 2022 00:01
@rwlee rwlee requested a review from cwharris February 1, 2022 00:01

rwlee commented Feb 1, 2022

original_benchmark.txt
new_benchmark.txt

There's a very small performance hit after the most recent set of changes; I don't think it's large enough to be a concern.

@bdice bdice left a comment

I think this PR is beneficial and its scope is reasonable -- I'd hesitate to make it larger since we're at a stopping point and have good benchmarks. I have some ideas for further refactors that I'd like to take on later.

I'm going to apply my own suggestions for renaming variables and will merge branch-22.04 so we can get an updated build-time metrics report.

@@ -105,6 +105,14 @@ struct MurmurHash3_32 {
return h;
}

@bdice bdice Jan 28, 2022

Note to self for a future PR: Do we need MurmurHash3_32 to be a templated class? Currently the class takes a template parameter Key and has an operator()(Key key) with no template parameters, which calls a templated compute(T key). However, the way it's called in row_operators.cuh seems to indicate that we could instead have a plain (non-template) class with a templated operator(). That's the way we typically do type dispatching, and it's reversed here for no clear reason. The calling code uses a type dispatch on element_hasher_with_seed.

(This would probably affect performance and/or compile time but I don't know if it would be better or worse.)
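
For illustration only (hypothetical names, not code from this PR), the two shapes being compared look roughly like this:

// Current shape: a class template over Key whose call operator is not a template.
template <typename Key>
struct hasher_per_key {
  __device__ uint32_t operator()(Key const& key) const;
};

// Suggested shape: a plain class with a templated operator(), which is the
// form libcudf's type dispatch normally expects from a hashing functor.
struct hasher_dispatched {
  template <typename T>
  __device__ uint32_t operator()(T const& key) const;
};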

cpp/include/cudf/detail/utilities/hash_functions.cuh (outdated review thread, resolved)
@@ -131,60 +139,69 @@ struct MurmurHash3_32 {
return combined;
}

result_type __device__ inline operator()(Key const& key) const { return compute(key); }
// TODO Do we need this operator() and/or compute? Probably not both.
@bdice bdice Feb 2, 2022

Note to self for a future PR: I would try removing the compute method and moving its definition to operator(). I think we might be able to safely remove the template <typename T> on compute(T). The operator() template parameter T has to match the class template parameter Key, from what I can see, and may be redundant. Any exceptions to this would probably be solved by removing the class template parameter Key and switching to just an operator() template parameter.
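
A hypothetical sketch of that simplification (illustrative only; compute_bytes stands in for whatever shared byte-hashing helper remains, and the signature shown is an assumption):

struct MurmurHash3_32_sketch {
  using result_type = uint32_t;

  // compute(T) folded into operator(); the template parameter now lives on
  // the call operator instead of duplicating the class parameter Key.
  template <typename T>
  result_type __device__ inline operator()(T const& key) const
  {
    // Hash the object representation of the key; assumes T is trivially
    // copyable, as the fixed-width types dispatched to this hasher are.
    return compute_bytes(reinterpret_cast<std::byte const*>(&key), sizeof(T));
  }

 private:
  result_type __device__ compute_bytes(std::byte const* data, uint32_t len) const;
};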

cpp/include/cudf/detail/utilities/hash_functions.cuh (outdated review thread, resolved)
v22.04 Release automation moved this from PR-WIP to PR-Reviewer approved Feb 2, 2022
@bdice
Copy link
Contributor

bdice commented Feb 4, 2022

rerun tests

@bdice bdice added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Feb 4, 2022
@bdice
Copy link
Contributor

bdice commented Feb 4, 2022

@cwharris This is ready for review; I just fixed the labels so it no longer says “in progress.” 😊

@bdice
Copy link
Contributor

bdice commented Feb 7, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 8014add into rapidsai:branch-22.04 Feb 7, 2022
v22.04 Release automation moved this from PR-Reviewer approved to Done Feb 7, 2022
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change