MD5 Python hash API #9390

Merged
merged 12 commits into from Oct 12, 2021

@bdice bdice commented Oct 6, 2021

This PR introduces a public API in cuDF for MD5 hashing, selected through the method parameter: DataFrame.hash_columns(..., method="md5") or Series.hash_values(..., method="md5"). The default hashing method is MurmurHash3 (method="murmur3"). I also changed the return value of Series.hash_values to be a Series rather than a CuPy array.

Related to #8641. SHA support will be added in a later PR.
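
For readers who want to try the calls named in the description, here is a minimal sketch. The cuDF calls (Series.hash_values and DataFrame.hash_columns with method="md5") are taken from the PR description. Everything else is an assumption for illustration: the helper names md5_hex_reference and hash_with_cudf are hypothetical, and the CPU reference assumes each row is hashed over the UTF-8 bytes of its string. The sketch does not assert that cuDF's output matches the reference byte for byte.

```python
import hashlib


def md5_hex_reference(values):
    """CPU reference (hypothetical helper): MD5 hex digest of each string's UTF-8 bytes."""
    return [hashlib.md5(v.encode("utf-8")).hexdigest() for v in values]


def hash_with_cudf(values):
    """Hypothetical wrapper around the API described in this PR.

    Needs a CUDA-capable GPU and a cuDF build that includes this change,
    so cudf is imported lazily and this function is not called at import time.
    """
    import cudf

    s = cudf.Series(values)
    md5_hashes = s.hash_values(method="md5")      # a Series, per the PR description
    default_hashes = s.hash_values()              # default is method="murmur3"
    df = cudf.DataFrame({"text": values})
    row_hashes = df.hash_columns(method="md5")    # one hash per row across columns
    return md5_hashes, default_hashes, row_hashes


sample = ["All work and no play makes Jack a dull boy", ""]
reference_digests = md5_hex_reference(sample)
```

On a machine with a GPU, calling hash_with_cudf(sample) and comparing its first result with reference_digests is one way to sanity-check the new method; treat any mismatch as a prompt to check how cuDF serializes each row, not as a bug by itself.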

@bdice bdice self-assigned this Oct 6, 2021
@github-actions github-actions bot added the cuDF (Python) label Oct 6, 2021
@bdice bdice added the feature request and non-breaking labels Oct 6, 2021
@bdice bdice added this to PR-WIP in v21.12 Release via automation Oct 6, 2021
@bdice bdice marked this pull request as ready for review October 6, 2021 19:22
@bdice bdice requested a review from a team as a code owner October 6, 2021 19:22

codecov bot commented Oct 6, 2021

Codecov Report

Merging #9390 (29de972) into branch-21.12 (ab4bfaa) will decrease coverage by 0.02%.
The diff coverage is 0.00%.


@@               Coverage Diff                @@
##           branch-21.12    #9390      +/-   ##
================================================
- Coverage         10.79%   10.76%   -0.03%     
================================================
  Files               116      116              
  Lines             18869    19478     +609     
================================================
+ Hits               2036     2096      +60     
- Misses            16833    17382     +549     
Impacted Files                                 Coverage Δ
python/cudf/cudf/__init__.py                   0.00% <0.00%> (ø)
python/cudf/cudf/_lib/__init__.py              0.00% <ø> (ø)
python/cudf/cudf/core/_base_index.py           0.00% <0.00%> (ø)
python/cudf/cudf/core/column/categorical.py    0.00% <0.00%> (ø)
python/cudf/cudf/core/column/column.py         0.00% <0.00%> (ø)
python/cudf/cudf/core/column/datetime.py       0.00% <0.00%> (ø)
python/cudf/cudf/core/column/lists.py          0.00% <0.00%> (ø)
python/cudf/cudf/core/column/numerical.py      0.00% <0.00%> (ø)
python/cudf/cudf/core/column/string.py         0.00% <0.00%> (ø)
python/cudf/cudf/core/column/timedelta.py      0.00% <0.00%> (ø)
... and 77 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a743ce8...29de972.

@bdice bdice requested a review from isVoid October 7, 2021 15:06
@bdice bdice added the 3 - Ready for Review label Oct 7, 2021
v21.12 Release automation moved this from PR-WIP to PR-Reviewer approved Oct 8, 2021
@isVoid isVoid left a comment

lgtm!

"a multi hash-step data point in the hash function being "
"tested. This string needed to be longer."
),
"All work and no play makes Jack a dull boy",
@shwina shwina Oct 12, 2021

<heres_johnny.gif>

bdice commented Oct 12, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 6dbea58 into rapidsai:branch-21.12 Oct 12, 2021
v21.12 Release automation moved this from PR-Reviewer approved to Done Oct 12, 2021