[native] Add support for ORC reader #23037
base: master
Hi @majetideepak @aditi-pandit, could you please help review this PR? Thanks!
@wypb can you add some end-to-end tests? Thanks!
@wypb: It would be great to use ORC with the QueryRunners (https://github.com/prestodb/presto/blob/master/presto-native-execution/src/test/java/com/facebook/presto/nativeworker/PrestoNativeQueryRunnerUtils.java) in an e2e test. The test should highlight the differences between ORC and Parquet and demonstrate filter pushdown as well. Using ORC both with Hive and as a file format with Iceberg would be perfect.
Hi @majetideepak @aditi-pandit, I added TPCH tests for ORC, including the Iceberg data source. I did not add the TPCDS test for ORC because Velox's ORC reader does not yet implement the fast path for some types, which causes exceptions when reading the data.
@wypb: Your code looks fine. When I search for ORC in the presto-native-execution directory, I also see the following usage: https://github.com/prestodb/presto/blob/master/presto-native-execution/src/test/java/com/facebook/presto/nativeworker/AbstractTestWriter.java#L71. It needs a fix as well. Could you please check it?
Good catch, thank you @aditi-pandit. I've fixed it.
@aditi-pandit I looked at the code again and found that this should not be removed.
Description
We recently merged the Velox PRs that read ORC statistics and implement OrcReader on top of DwrfReader. Now it is time to add ORC reader support to Prestissimo.
NOTE: Presto writes ORC files with RLEv2 encoding, and the Velox ORC reader does not implement fast-path readers for some types, so Velox throws exceptions when reading those files. For that reason, end-to-end tests for ORC are not added here. Once Velox implements fast-path readers for ORC RLEv2 encoding, we need to add ORC tests.
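As a rough illustration only, the statements an ORC end-to-end test could run might look like the sketch below. It is not taken from this PR: the helper name `orc_test_statements`, the table name `lineitem_orc`, the `hive.tpch` catalog and schema, and the `l_quantity < 10` predicate are all assumptions chosen for the example. The only connector feature it relies on is the `format` table property.

```python
# Hypothetical helper (not part of this PR): builds the SQL statements that an
# ORC end-to-end test might send to a Presto coordinator.
def orc_test_statements(catalog="hive", schema="tpch", source="tpch.tiny.lineitem"):
    # Assumed table name; any unused name would do.
    table = f"{catalog}.{schema}.lineitem_orc"
    return [
        # Write the TPCH data as ORC through the connector's `format` property.
        f"CREATE TABLE {table} WITH (format = 'ORC') AS SELECT * FROM {source}",
        # A simple range predicate is a typical candidate for filter pushdown.
        f"SELECT count(*) FROM {table} WHERE l_quantity < 10",
        # Clean up so the test can be rerun.
        f"DROP TABLE IF EXISTS {table}",
    ]


if __name__ == "__main__":
    for statement in orc_test_statements():
        print(statement)
```

Calling the helper with `catalog="iceberg"` would produce the same statements against an Iceberg catalog, assuming one with that name is configured.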