Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Parquet v2 dictionary filtering #251

Merged
merged 2 commits into from Mar 5, 2019

Conversation

3 participants
@ryanrupp
Copy link
Member

ryanrupp commented Feb 17, 2019

Use page encoding stats if they're available to determine if all pages
are dictionary encoded enabling the use of dictionary filtering.

@ryanrupp ryanrupp force-pushed the ryanrupp:support_parquet_v2_dictionary_filter branch from 53c24d9 to 982ca8e Feb 17, 2019

@ryanrupp

This comment has been minimized.

Copy link
Member Author

ryanrupp commented Feb 18, 2019

More context in prestodb/presto#12248

@dain dain requested a review from zhenxiao Feb 20, 2019

@zhenxiao
Copy link
Member

zhenxiao left a comment

Hi @ryanrupp thanks for ur work. This is a nice catch for Parquet upgrade
I left some comments
BTW, it might be nice to have 2 commits: one for "Support dictionary encoding in Parquet V2", another one for "migrate Parquet dictionary testcases to presto-parquet"

@ryanrupp ryanrupp force-pushed the ryanrupp:support_parquet_v2_dictionary_filter branch from 982ca8e to 15878ee Feb 25, 2019

@ryanrupp

This comment has been minimized.

Copy link
Member Author

ryanrupp commented Feb 25, 2019

Rebased this into two separate commits as suggested

@zhenxiao
Copy link
Member

zhenxiao left a comment

thank you @ryanrupp the code looks good to me, just one minor thing
could you please sign the CLA?

@dain could you please take another look?

Support dictionary encoding in Parquet V2
Use page encoding stats if they're available to determine if all pages
are dictionary encoded enabling the use of dictionary filtering. Previously, it couldn't be determined if fallback to plain encoding occurred in the V2 format therefore dictionary filtering was disabled for the V2 format.

@ryanrupp ryanrupp force-pushed the ryanrupp:support_parquet_v2_dictionary_filter branch from 15878ee to 2ba8358 Mar 1, 2019

@cla-bot cla-bot bot added the cla-signed label Mar 1, 2019

@ryanrupp

This comment has been minimized.

Copy link
Member Author

ryanrupp commented Mar 4, 2019

@zhenxiao @dain This should be good to go now, I signed the CLA

@martint martint requested a review from dain Mar 4, 2019

@prestosql prestosql deleted a comment from cla-bot bot Mar 5, 2019

@prestosql prestosql deleted a comment from cla-bot bot Mar 5, 2019

@prestosql prestosql deleted a comment from cla-bot bot Mar 5, 2019

@dain

dain approved these changes Mar 5, 2019

@dain dain merged commit df7a2ba into prestosql:master Mar 5, 2019

1 check passed

verification/cla-signed
Details

@ryanrupp ryanrupp deleted the ryanrupp:support_parquet_v2_dictionary_filter branch Mar 5, 2019

@ryanrupp ryanrupp referenced this pull request Mar 7, 2019

Closed

Release notes for 305 #342

4 of 6 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.