Add hive configs for supported read and write formats #25147
base: master
Conversation
presto-hive/src/main/java/com/facebook/presto/hive/HiveSplitManager.java
@pramodsatya: Thanks for this code. Should we also add a check for the file formats applicable on the writer side? Native execution only supports the DWRF and Parquet writers.
Force-pushed 3f1bcac to 7f731d8
Thanks for the feedback @tdcmeehan, @aditi-pandit. Added hive configs for supported read and write formats, and validated that read/write operations fail for unsupported formats when these configs are set. Could you please take another look?
aditi-pandit
left a comment
Thanks @pramodsatya
presto-hive/src/main/java/com/facebook/presto/hive/HiveSplitManager.java
presto-hive/src/main/java/com/facebook/presto/hive/HiveSessionProperties.java
Should we have documentation for these new properties? https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/presto_cpp/properties.rst
steveburnett
left a comment
LGTM! (docs)
Pull branch, local doc build, looks good. Thanks!
``hive.read-formats``: Comma-separated list of file formats supported for reads from tables.

``hive.write-formats``: Comma-separated list of file formats supported for writes to tables.
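As an illustration, these properties could be set in the Hive catalog properties file as shown below. The property names come from this PR; the exact format values listed are an assumption based on the formats Presto C++ supports, per the description below:

```properties
# Restrict the Hive connector to formats supported by Presto C++ (assumed values).
hive.read-formats=DWRF,ORC,PARQUET
hive.write-formats=DWRF,PARQUET
```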
Can you mention that the default behavior is to allow all built-in read and write formats?
Added this to the doc, could you please take another look?
@pramodsatya: This code looks good. Though can you write e2e tests with queries? Maybe set the read/write formats to only Parquet in the queryRunner and show that reading or writing any other format fails.
Thanks @aditi-pandit, added tests for these configs with different file formats. Could you please take another look?
Suggest adding a link to the doc in the release note entry, like so:
```java
        session.getRuntimeStats());
Table table = layout.getTable(metastore, metastoreContext);

if (!readFormats.isEmpty()) {
```
Don't think we should delay checking the TableLayout for storageFormat until split scheduling. We want to avoid scheduling the fragment at all. Is it possible to do this earlier, in BasePlanFragmenter?
https://github.com/prestodb/presto/blob/master/presto-main-base/src/main/java/com/facebook/presto/sql/planner/BasePlanFragmenter.java#L249 The TableLayout is available at that point.
@tdcmeehan: wdyt?
```java
SchemaTableName tableName = ((HiveTableHandle) tableHandle).getSchemaTableName();
Table table = metastore.getTable(metastoreContext, tableName.getSchemaName(), tableName.getTableName())
        .orElseThrow(() -> new TableNotFoundException(tableName));
HiveStorageFormat tableStorageFormat = extractHiveStorageFormat(table);
```
Even this code is called during scheduling of stage executions. It's worthwhile only if the check is done sooner, in logical planning or BasePlanFragmenter. I think extractHiveStorageFormat can be called on a table much sooner. Can you investigate further?
Description
Adds hive configs `hive.read-formats` and `hive.write-formats` to configure the file formats supported by the Hive connector for read and write operations, respectively.

Motivation and Context

Presto C++ only supports reading tables in the `DWRF`, `ORC`, and `PARQUET` formats, and writing tables in the `DWRF` and `PARQUET` formats, with the Hive connector. Setting these hive configs allows queries to fail fast at the coordinator when attempting to read from or write to tables with unsupported file formats in Presto C++. Currently, attempting to read from a table with an unsupported file format in Presto C++ fails at the worker:
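The fail-fast behavior described above can be sketched as a standalone example. This is a simplified sketch, not Presto's actual code: the class name, the hard-coded format set, and the exception type are all hypothetical stand-ins for the configured `hive.read-formats` value and the connector's error handling.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.Arrays;

public class ReadFormatCheck
{
    // Hypothetical parsed value of hive.read-formats; an empty set would mean "allow all formats".
    private static final Set<String> READ_FORMATS =
            new TreeSet<>(Arrays.asList("DWRF", "ORC", "PARQUET"));

    // Throws if the table's storage format is not in the configured allow-list,
    // so the query fails at the coordinator instead of at the worker.
    public static void checkReadFormat(String tableStorageFormat)
    {
        if (!READ_FORMATS.isEmpty() && !READ_FORMATS.contains(tableStorageFormat.toUpperCase())) {
            throw new IllegalStateException(
                    "Reading from table with storage format " + tableStorageFormat + " is not supported");
        }
    }

    public static void main(String[] args)
    {
        checkReadFormat("ORC"); // allowed: no exception thrown
        try {
            checkReadFormat("TEXTFILE"); // not in the allow-list
        }
        catch (IllegalStateException e) {
            System.out.println("rejected: TEXTFILE");
        }
    }
}
```

An analogous check against `hive.write-formats` would run on the write path; the review discussion above suggests placing both checks in logical planning or BasePlanFragmenter rather than at split scheduling.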
Release Notes