Add in JNI for parsing JSON data and getting the metadata back too. #11431

Merged
merged 6 commits into from
Aug 3, 2022

Conversation

revans2
Contributor

@revans2 revans2 commented Aug 2, 2022

Description

Adds a new Java binding that allows reading a JSON buffer and getting back the metadata along with the table when inferring the schema.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes. (The functionality is covered by C++ tests; this just adds a new Java binding. If more is needed/wanted I am happy to add more.)
  • The documentation is up to date with these changes.

@revans2 revans2 added 3 - Ready for Review Ready for review by team Spark Functionality that helps Spark RAPIDS 4 - Needs cuDF (Java) Reviewer improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Aug 2, 2022
@revans2 revans2 added this to PR-WIP in v22.10 Release via automation Aug 2, 2022
@revans2 revans2 requested a review from a team as a code owner August 2, 2022 16:13
@revans2 revans2 self-assigned this Aug 2, 2022
@github-actions github-actions bot added the Java Affects Java cuDF API. label Aug 2, 2022
Collaborator

@jbrennan333 jbrennan333 left a comment

This looks good to me.

v22.10 Release automation moved this from PR-WIP to PR-Reviewer approved Aug 2, 2022
@codecov

codecov bot commented Aug 2, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.10@039622f). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.10   #11431   +/-   ##
===============================================
  Coverage                ?   86.47%           
===============================================
  Files                   ?      144           
  Lines                   ?    22856           
  Branches                ?        0           
===============================================
  Hits                    ?    19764           
  Misses                  ?     3092           
  Partials                ?        0           


* A table along with some metadata about the table. This is typically returned when
* reading data from an input file where the metadata can be important.
*/
public class TableWithMeta implements AutoCloseable {
Contributor

Can this be implemented as a derived class extending Table?

Contributor Author

It would not match what the C++ code is doing. If you want me to try, I can. I agree that conceptually it would be more interesting, but I wanted to preserve the C++ API in this case. It would also be a more invasive change to make it work.

Contributor

I see. So it is just some wrapper for special purposes.

Contributor Author

Yes. For all of the other APIs where we read from a file, we already know the names of the columns because we passed them into the reader. So in those cases we just return a Table with the columns in the order that was requested. Here we need the metadata to know what the names of the columns are. Long term we should probably update all of our reader APIs to match what cuDF is doing in C++ and just return the TableWithMeta. But for now, to minimize the impact of the change, we decided not to do that. I do like the idea of having TableWithMeta be a Table too. We probably will/should implement that if we do try to update all of the other APIs, just because it would minimize the impact of the change on customer code.
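
For readers following the wrapper-versus-subclass discussion, here is a minimal sketch of the composition approach described above. It does not use the real cuDF classes: `SketchTable`, `SketchTableWithMeta`, and every method on them are hypothetical stand-ins invented for illustration, and the actual `TableWithMeta` in this PR may differ. The idea shown is only that the wrapper owns a table plus the column names discovered while inferring the schema, and can hand the table over to the caller.

```java
// Hypothetical stand-in for a GPU-backed table. NOT ai.rapids.cudf.Table.
class SketchTable implements AutoCloseable {
  private final int numColumns;
  private boolean closed = false;

  SketchTable(int numColumns) {
    this.numColumns = numColumns;
  }

  int getNumberOfColumns() {
    return numColumns;
  }

  boolean isClosed() {
    return closed;
  }

  @Override
  public void close() {
    closed = true;
  }
}

// Hypothetical wrapper (composition, not inheritance): a table plus the
// column names that were only learned while reading the data.
class SketchTableWithMeta implements AutoCloseable {
  private SketchTable table;
  private final String[] columnNames;

  SketchTableWithMeta(SketchTable table, String[] columnNames) {
    if (table.getNumberOfColumns() != columnNames.length) {
      throw new IllegalArgumentException("one name is required per column");
    }
    this.table = table;
    this.columnNames = columnNames.clone();
  }

  // Returns a copy so callers cannot mutate the wrapper's metadata.
  String[] getColumnNames() {
    return columnNames.clone();
  }

  // Transfers ownership of the table to the caller. After this call the
  // wrapper no longer closes the table; returns null if already released.
  SketchTable releaseTable() {
    SketchTable released = table;
    table = null;
    return released;
  }

  // Closes the table only if the wrapper still owns it.
  @Override
  public void close() {
    if (table != null) {
      table.close();
      table = null;
    }
  }
}
```

With a wrapper like this, a caller would typically open it in a try-with-resources block, read the names, and call the release method if the table needs to outlive the wrapper. Making the wrapper extend the table type instead would let existing code that expects a plain table keep working, which is the trade-off raised in this thread.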

@ttnghia
Contributor

ttnghia commented Aug 3, 2022

Some Python tests failed. Let's try to merge upstream.

@revans2
Contributor Author

revans2 commented Aug 3, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 276b996 into rapidsai:branch-22.10 Aug 3, 2022
v22.10 Release automation moved this from PR-Reviewer approved to Done Aug 3, 2022
@revans2 revans2 deleted the json_schema_read branch August 3, 2022 18:47
@vyasr vyasr added the 4 - Needs Review Waiting for reviewer to review or respond label Feb 23, 2024
Labels
3 - Ready for Review Ready for review by team 4 - Needs Review Waiting for reviewer to review or respond improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS

4 participants