Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add in java bindings for DataSource #14254

Merged
merged 12 commits into from
Oct 11, 2023

Conversation

revans2
Copy link
Contributor

@revans2 revans2 commented Oct 5, 2023

Description

This PR adds DataSource Java bindings. It also fixes a small bug in CUDF that made it so the bindings would not work for anything but CSV.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
@revans2 revans2 added 3 - Ready for Review Ready for review by team Java Affects Java cuDF API. Spark Functionality that helps Spark RAPIDS improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Oct 5, 2023
@revans2 revans2 requested a review from a team as a code owner October 5, 2023 15:52
@revans2 revans2 self-assigned this Oct 5, 2023
@revans2 revans2 requested a review from a team as a code owner October 5, 2023 15:52
@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue labels Oct 5, 2023
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review

java/src/main/java/ai/rapids/cudf/DataSource.java Outdated Show resolved Hide resolved
java/src/main/java/ai/rapids/cudf/DataSource.java Outdated Show resolved Hide resolved
java/src/main/java/ai/rapids/cudf/DataSource.java Outdated Show resolved Hide resolved
java/src/main/java/ai/rapids/cudf/DataSource.java Outdated Show resolved Hide resolved
java/src/main/java/ai/rapids/cudf/DataSource.java Outdated Show resolved Hide resolved
java/src/main/java/ai/rapids/cudf/DataSourceHelper.java Outdated Show resolved Hide resolved
java/src/main/java/ai/rapids/cudf/HostMemoryBuffer.java Outdated Show resolved Hide resolved
@revans2
Copy link
Contributor Author

revans2 commented Oct 9, 2023

@jlowe please take another look

@revans2
Copy link
Contributor Author

revans2 commented Oct 9, 2023

@davidwendt @vyasr could you please take a quick look at the C++ fix I had to put in to allow parquet/orc reads with a custom data source.

@davidwendt davidwendt requested a review from vuule October 9, 2023 20:50
@vuule
Copy link
Contributor

vuule commented Oct 10, 2023

Nice!
What is the motivating use case?

@revans2
Copy link
Contributor Author

revans2 commented Oct 10, 2023

Nice! What is the motivating use case?

I did most of it to try and see if it could help with some memory management pain points. This is one of the few places where we hold onto host memory while allocating GPU memory. It didn't work out exactly how we wanted, but the plan is to try and avoid an extra copy when stitching parts of ORC and parquet together into a final buffer we send to CUDF. It might get expanded to other things in the future.

@revans2
Copy link
Contributor Author

revans2 commented Oct 10, 2023

@vuule could you review the change I had to make to cpp/src/io/utilities/datasource.cpp? It is a C++ change to CUDF do I need a reviewer for it.

Copy link
Contributor

@davidwendt davidwendt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving libcudf C++ change.

@revans2
Copy link
Contributor Author

revans2 commented Oct 10, 2023

/merge

@rapids-bot rapids-bot bot merged commit b17904d into rapidsai:branch-23.12 Oct 11, 2023
60 checks passed
@revans2 revans2 deleted the jni_io_callback branch October 11, 2023 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

4 participants