[FEA] Add python bindings in the parquet reader for num_rows/skiprows #15144

Open
2 tasks
GregoryKimball opened this issue Feb 26, 2024 · 1 comment · May be fixed by #16214
Labels: 0 - Backlog, cuIO, feature request, good first issue, libcudf, Python

Comments


GregoryKimball commented Feb 26, 2024

Is your feature request related to a problem? Please describe.
Unfortunately, there has been churn in libcudf around support for num_rows/skiprows in the Parquet and ORC readers. In 22.08 we deprecated these parameters in the Parquet reader (#11218), and then in 22.10 we removed them from C++ (#11503) and Python (#11480). We also deprecated num_rows/skiprows in the ORC reader (#11522, see issue #11519).

At this point, we realized that chunked Parquet reading (#11867) would require adding num_rows/skiprows back to the C++ implementation (#11657).
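
To illustrate why chunked reading and row selection are related, here is a minimal pure-Python sketch, assuming chunks are sized by row count. It is not libcudf's implementation and does not call cuDF; `chunk_windows` and `select_rows` are hypothetical helpers, and a plain list stands in for a table. Each chunk is described as a `(skip_rows, num_rows)` window over the full row range.

```python
def chunk_windows(total_rows, chunk_rows):
    """Yield (skip_rows, num_rows) pairs that cover total_rows in order.

    Hypothetical helper: models chunking by row count only.
    """
    if chunk_rows <= 0:
        raise ValueError("chunk_rows must be positive")
    skip_rows = 0
    while skip_rows < total_rows:
        # The last window may be shorter than chunk_rows.
        num_rows = min(chunk_rows, total_rows - skip_rows)
        yield skip_rows, num_rows
        skip_rows += num_rows


def select_rows(rows, skip_rows=0, num_rows=None):
    """Skip skip_rows rows, then return num_rows rows (None means 'to the end').

    Hypothetical stand-in for a reader's row selection, applied to a list.
    """
    end = None if num_rows is None else skip_rows + num_rows
    return rows[skip_rows:end]


rows = list(range(10))                       # stand-in for a 10-row table
windows = list(chunk_windows(len(rows), 4))  # row windows of at most 4 rows
chunks = [select_rows(rows, skip, count) for skip, count in windows]
```

Concatenating `chunks` in order gives back the original rows, which is the property a chunked reader built on row windows would need to preserve.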

Let's stabilize row selection APIs in libcudf by completing these tasks:

Additional context
We also dropped num_rows/skiprows support in the cuDF Python fuzz tests (#11505). I would prefer to keep any Python fuzz testing changes out of the scope of this issue.

GregoryKimball added the feature request, 0 - Backlog, good first issue, libcudf, and cuIO labels Feb 26, 2024
lithomas1 added the Python label Jun 4, 2024
lithomas1 self-assigned this Jun 17, 2024
@lithomas1

I'm planning to implement this as part of porting the Parquet reader to pylibcudf.

lithomas1 linked a pull request Jul 8, 2024 that will close this issue