Added support for page-level filter pushdown (column and offset indexes) #107

Merged on Mar 25, 2022 (16 commits)

@jorgecarleitao (Owner) commented Mar 20, 2022

This PR:

  • adds functions to read the column and offset indexes [1], which contain the information needed to read individual pages
  • changes the write APIs so that they write column and offset indexes [1]
  • adds a page iterator that uses the offset and column indexes to apply filter pushdown at the page level

[1] https://github.com/apache/parquet-format/blob/master/PageIndex.md
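
To illustrate the idea behind page-level filter pushdown, here is a minimal sketch. It is not this crate's API: `PageStats`, `PageLocation`, `select_pages`, and `example_indexes` are hypothetical names, the statistics are simplified to `i64` (the format stores min/max values as binary), and the data is made up. The column index supplies per-page min/max values, the offset index supplies where each page lives in the file, and a range filter keeps only the pages whose value range can overlap it.

```rust
/// Simplified stand-in for one page's entry in a column index:
/// the smallest and largest value stored in that page.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PageStats {
    min: i64,
    max: i64,
}

/// Simplified stand-in for one page's entry in an offset index.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PageLocation {
    offset: u64,          // byte offset of the page in the file
    compressed_size: u32, // number of bytes the page occupies
    first_row: u64,       // index of the page's first row within the row group
}

/// Returns the locations of the pages that may hold a value in `lo..=hi`.
/// A page is skipped only when its [min, max] range cannot overlap the filter.
fn select_pages(
    stats: &[PageStats],
    locations: &[PageLocation],
    lo: i64,
    hi: i64,
) -> Vec<PageLocation> {
    stats
        .iter()
        .zip(locations.iter())
        .filter(|(s, _)| s.max >= lo && s.min <= hi)
        .map(|(_, loc)| *loc)
        .collect()
}

/// Made-up indexes for a column chunk with three pages of 1000 rows each.
fn example_indexes() -> (Vec<PageStats>, Vec<PageLocation>) {
    let stats = vec![
        PageStats { min: 0, max: 9 },
        PageStats { min: 10, max: 19 },
        PageStats { min: 20, max: 29 },
    ];
    let locations = vec![
        PageLocation { offset: 4, compressed_size: 100, first_row: 0 },
        PageLocation { offset: 104, compressed_size: 120, first_row: 1000 },
        PageLocation { offset: 224, compressed_size: 90, first_row: 2000 },
    ];
    (stats, locations)
}

fn main() {
    let (stats, locations) = example_indexes();
    // Filter `15 <= value <= 25`: the first page (0..=9) cannot match and is skipped.
    let selected = select_pages(&stats, &locations, 15, 25);
    for page in &selected {
        println!(
            "read {} bytes at offset {} (first row {})",
            page.compressed_size, page.offset, page.first_row
        );
    }
}
```

A reader built this way only needs to seek to and decompress the selected byte ranges, and `first_row` lets it align the surviving rows with other columns in the same row group.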

@jorgecarleitao added the "feature" (A new feature) label on Mar 20, 2022

@codecov-commenter commented Mar 20, 2022

Codecov Report

Merging #107 (69cdd07) into main (6511088) will increase coverage by 1.10%.
The diff coverage is 64.73%.

@@            Coverage Diff             @@
##             main     #107      +/-   ##
==========================================
+ Coverage   61.75%   62.86%   +1.10%     
==========================================
  Files          66       72       +6     
  Lines        2782     3172     +390     
==========================================
+ Hits         1718     1994     +276     
- Misses       1064     1178     +114     

Impacted Files                        Coverage Δ
parquet-tools/src/lib/meta.rs         0.00% <ø> (ø)
src/bloom_filter/read.rs              0.00% <0.00%> (ø)
src/error.rs                          20.00% <0.00%> (-2.23%) ⬇️
src/metadata/column_descriptor.rs     100.00% <ø> (+26.66%) ⬆️
src/page/page_dict/mod.rs             78.94% <0.00%> (ø)
src/read/page/indexed_reader.rs       0.00% <0.00%> (ø)
src/read/page/stream.rs               0.00% <0.00%> (ø)
src/schema/types/basic_type.rs        100.00% <ø> (ø)
src/schema/types/converted_type.rs    35.71% <ø> (ø)
src/schema/types/physical_type.rs     91.30% <ø> (ø)
... and 31 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jorgecarleitao changed the title from "Added support to write column and offset indexes" to "Added support for page-level filter pushdown (column and offset indexes)" on Mar 25, 2022
@jorgecarleitao merged commit 7cac93e into main on Mar 25, 2022
@jorgecarleitao deleted the write_indexes branch on March 28, 2022 18:36
dantengsky pushed a commit to datafuse-extras/parquet2 that referenced this pull request Apr 1, 2022