Fix ORC Bloom Filter #921

Closed
dilipkasana opened this issue Jun 5, 2019 · 3 comments

@dilipkasana
Member

commented Jun 5, 2019

This is the same issue as prestodb/presto#12900.
It also reproduces in the community version of Presto.
No community Presto release from 300 to 313 benefits from ORC Bloom filters because of this bug.
Description:
After the StreamId changes in the readBloomFilterIndexes method of the StripeReader class, the Bloom filter no longer skips ORC row groups that cannot satisfy the predicate, because the line below always returns null.

StripeReader.java
List<HiveBloomFilter> bloomFilters = bloomFilterIndexes.get(entry.getKey());
This hurts Presto ORC read performance, because row groups that the Bloom filter would rule out are still read.
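
For readers unfamiliar with this failure mode, here is a minimal sketch of how a map lookup can compile and still always return null. It is not the Presto code. It assumes the cause is a key-type mismatch (a map keyed by column id that is queried with a stream identifier object), which this issue does not spell out. `StreamKey`, `lookupWithStreamKey`, and `lookupWithColumn` are hypothetical names, and plain strings stand in for `HiveBloomFilter`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class BloomFilterLookupSketch {
    // Hypothetical stand-in for a stream identifier (column id plus stream kind).
    static final class StreamKey {
        final int column;
        final String kind;

        StreamKey(int column, String kind) {
            this.column = column;
            this.kind = kind;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof StreamKey)) {
                return false;
            }
            StreamKey that = (StreamKey) other;
            return column == that.column && kind.equals(that.kind);
        }

        @Override
        public int hashCode() {
            return Objects.hash(column, kind);
        }
    }

    // Assumed bug pattern: the map is keyed by Integer, but the lookup passes a StreamKey.
    // Map.get accepts any Object, so this compiles, yet it can never match an Integer key.
    static List<String> lookupWithStreamKey(Map<Integer, List<String>> filtersByColumn, StreamKey key) {
        return filtersByColumn.get(key);
    }

    // Assumed fix pattern: look up with the same key type the map was built with.
    static List<String> lookupWithColumn(Map<Integer, List<String>> filtersByColumn, StreamKey key) {
        return filtersByColumn.get(key.column);
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> filtersByColumn = new HashMap<>();
        filtersByColumn.put(3, Collections.singletonList("bloom-filter-row-group-0"));
        StreamKey key = new StreamKey(3, "BLOOM_FILTER");

        // The first lookup finds nothing; the second finds the filter stored for column 3.
        System.out.println(lookupWithStreamKey(filtersByColumn, key));
        System.out.println(lookupWithColumn(filtersByColumn, key));
    }
}
```

If the lookup result is null, a reader has no Bloom filter to test the predicate against, so every row group has to be read. That would be consistent with the row counts reported later in this thread.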

@martint I have created the same issue here as well and submitted a PR for it.

@Praveen2112

Member

commented Jun 5, 2019

Hi. Would PR #914 resolve this issue?

@dilipkasana

Member Author

commented Jun 5, 2019

@Praveen2112 Yes, the PR will resolve this.

Below is the output of a query run on the current Presto community version. The query should skip the data in all splits and read 0 rows, but it reads 26.3M rows instead.

presto:default> select count(*) from msc_cdr_orc where called_calling_no='0091889' AND date_of_call=190508 AND call_type='SMT';
 _col0
-------
     0
(1 row)

Query 20190605_123713_00008_qxskm, FINISHED, 1 node
Splits: 67 total, 67 done (100.00%)
0:05 [26.3M rows, 91.1MB] [5.35M rows/s, 18.5MB/s]

This is the snapshot after the fix.

select count(*) from msc_cdr_orc where called_calling_no='0091889' AND date_of_call=190508 AND call_type='SMT';
 _col0
-------
     0
(1 row)

Query 20190605_124920_00001_ieepp, FINISHED, 1 node
Splits: 67 total, 67 done (100.00%)
0:02 [0 rows, 0B] [0 rows/s, 0B/s]

dilipkasana added a commit to dilipkasana/presto-1 that referenced this issue Jun 5, 2019

dain added a commit that referenced this issue Jun 7, 2019

@dain dain referenced this issue Jun 7, 2019

Closed

Release notes for 314 #879

2 of 6 tasks complete
@martint

Member

commented Jun 7, 2019

Fixed

@martint martint closed this Jun 7, 2019

@martint martint added this to the 314 milestone Jun 7, 2019
