Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String support for jcudf row to cudf column conversion #10871

Merged

Conversation

hyperbolic2346
Copy link
Contributor

This PR adds support for string column creation from jcudf row data. It leverages the fixed-width data copy to convert the offsets and lengths stored inside the fixed-width data section and then uses that information to copy the string data itself from the jcudf row format into the cudf column.

closes #10286

@hyperbolic2346 hyperbolic2346 added feature request New feature or request 3 - Ready for Review Ready for review by team non-breaking Non-breaking change labels May 17, 2022
@hyperbolic2346 hyperbolic2346 self-assigned this May 17, 2022
@hyperbolic2346 hyperbolic2346 requested a review from a team as a code owner May 17, 2022 05:14
@github-actions github-actions bot added the Java Affects Java cuDF API. label May 17, 2022
@codecov
Copy link

codecov bot commented May 17, 2022

Codecov Report

Merging #10871 (546b4b4) into branch-22.06 (6352b4e) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@               Coverage Diff                @@
##           branch-22.06   #10871      +/-   ##
================================================
+ Coverage         86.29%   86.32%   +0.02%     
================================================
  Files               144      144              
  Lines             22656    22668      +12     
================================================
+ Hits              19552    19569      +17     
+ Misses             3104     3099       -5     
Impacted Files Coverage Δ
python/cudf/cudf/core/_base_index.py 85.71% <ø> (ø)
python/cudf/cudf/core/frame.py 93.41% <ø> (ø)
python/cudf/cudf/core/groupby/groupby.py 91.79% <ø> (+0.22%) ⬆️
python/cudf/cudf/core/index.py 92.06% <ø> (ø)
python/cudf/cudf/core/indexed_frame.py 91.70% <ø> (ø)
python/cudf/cudf/core/series.py 95.17% <ø> (ø)
python/cudf/cudf/core/column/string.py 88.78% <100.00%> (+0.12%) ⬆️
python/dask_cudf/dask_cudf/groupby.py 97.42% <100.00%> (+<0.01%) ⬆️
python/dask_cudf/dask_cudf/tests/test_groupby.py 100.00% <100.00%> (ø)
python/cudf/cudf/utils/ioutils.py 79.47% <0.00%> (-0.13%) ⬇️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ac77940...546b4b4. Read the comment docs.

@ttnghia ttnghia changed the title string support for jcudf row to cudf column conversion String support for jcudf row to cudf column conversion May 17, 2022
@ttnghia
Copy link
Contributor

ttnghia commented May 18, 2022

I don't see any unit test for verifying the correctness of the function(s)? It would be nice to have a round trip test (row => col => row and col => row => col).

hyperbolic2346 and others added 2 commits May 18, 2022 19:53
Co-authored-by: Nghia Truong <ttnghia@users.noreply.github.com>
@hyperbolic2346
Copy link
Contributor Author

I don't see any unit test for verifying the correctness of the function(s)? It would be nice to have a round trip test (row => col => row and col => row => col).

I agree, but the java directory isn't included in cudf tests. We have tests on the java side, but the strings aren't tested there yet. I have a test set that I use for testing, but I have to copy the row_conversion.cu down into cudf to run the tests. I was hoping that a solid testing framework would be a part of the spark-jni repo and this could move there, but it hasn't happened yet.

@hyperbolic2346
Copy link
Contributor Author

rerun tests

1 similar comment
@hyperbolic2346
Copy link
Contributor Author

rerun tests

@hyperbolic2346
Copy link
Contributor Author

@gpucibot merge

@raydouglass raydouglass merged commit b4674a1 into rapidsai:branch-22.06 May 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team feature request New feature or request Java Affects Java cuDF API. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JCUDF row to cuDF column for tables with strings
4 participants