Add support for page skipping in parquet reader #14314

Open
mkleinbort-ic opened this issue Feb 6, 2024 · 1 comment
Labels
enhancement New feature or an improvement of an existing feature

Comments

@mkleinbort-ic

Description

I am not sure whether this is already implemented, but I would be keen to see it added if it is not.

ColumnIndex Layout to Support Page Skipping

The ColumnIndex layout seems to be a way to skip data pages when reading a Parquet file.

This can be very useful when the data is clustered, for example sorted first by date and then by userId. It would let you quickly select a single date and userId pair by reading only some pages of the relevant row group, without losing the benefits of Parquet.
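To illustrate the idea, here is a minimal sketch of min/max based page pruning in plain Python. It is not tied to any reader's actual API: `PageStats`, `pages_to_read`, and the sample values are all hypothetical, and it assumes for simplicity that both columns share the same page boundaries, which real Parquet files do not guarantee.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical per-page min/max values, in the spirit of a column index."""
    date_min: str
    date_max: str
    user_min: int
    user_max: int


def pages_to_read(pages, date, user_id):
    """Return the indices of pages whose min/max ranges could contain the pair.

    A page is skipped as soon as one column's range excludes the target.
    Pruning is conservative: a kept page may still hold no matching row.
    """
    return [
        i
        for i, p in enumerate(pages)
        if p.date_min <= date <= p.date_max
        and p.user_min <= user_id <= p.user_max
    ]


# Made-up row group sorted by (date, userId) and split into four pages.
# Page 1 straddles two dates, so its userId range is wide.
stats = [
    PageStats("2024-02-01", "2024-02-01", 1, 500),
    PageStats("2024-02-01", "2024-02-02", 1, 900),
    PageStats("2024-02-02", "2024-02-02", 121, 900),
    PageStats("2024-02-03", "2024-02-03", 1, 700),
]

selected = pages_to_read(stats, "2024-02-02", 300)
print(selected)  # [1, 2]
```

In this toy case only two of the four pages would need to be decoded. Page 1 is kept even though it may contain no matching row, because min/max statistics can only rule pages out, not confirm a match.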

mkleinbort-ic added the enhancement label Feb 6, 2024
@mkleinbort-ic
Author

This is a useful reference:

https://stackoverflow.com/questions/26909543/index-in-parquet

With promising benchmarks:

[Two benchmark images, not reproduced here]
