
Conversation

@pramodsatya (Contributor) commented May 20, 2025

Description

Adds the Hive configuration properties hive.read-formats and hive.write-formats to configure the file formats supported by the Hive connector for read and write operations, respectively.

Motivation and Context

With the Hive connector, Presto C++ only supports reading tables in the DWRF, ORC, and PARQUET formats, and writing tables in the DWRF and PARQUET formats. These Hive configs allow the coordinator to fail fast when a query attempts to read from or write to a table whose file format is unsupported in Presto C++.
Currently, attempting to read from a table with an unsupported file format in Presto C++ fails at the worker:

it != readerFactories().end() ReaderFactory is not registered for format text
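With the new configs set in the Hive catalog's properties, the coordinator can reject such queries before they reach the worker. A minimal sketch, using the formats the description says Presto C++ supports (the file path is illustrative):

```properties
# etc/catalog/hive.properties (illustrative path)
hive.read-formats=DWRF,ORC,PARQUET
hive.write-formats=DWRF,PARQUET
```

Leaving either property unset keeps the default behavior of allowing all built-in formats for that operation.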

Release Notes

== RELEASE NOTES ==
Hive Connector Changes
* Adds :ref:`connector/hive:Hive Configuration Properties` `hive.read-formats` and `hive.write-formats` to allow users to set the file formats supported by the Hive connector for read and write operations.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label May 20, 2025
@pramodsatya pramodsatya marked this pull request as ready for review May 20, 2025 15:20
@pramodsatya pramodsatya requested a review from a team as a code owner May 20, 2025 15:20
@pramodsatya pramodsatya requested a review from jaystarshot May 20, 2025 15:20
@prestodb-ci prestodb-ci requested review from a team, pdabre12 and sh-shamsan and removed request for a team May 20, 2025 15:20
@pramodsatya pramodsatya requested review from a team, aditi-pandit, nishithakbhaskaran and tdcmeehan and removed request for a team, pdabre12 and sh-shamsan May 20, 2025 15:20
@aditi-pandit (Contributor)
@pramodsatya: Thanks for this code. Should we add a check for the applicable file formats on the writer side as well? Native execution only supports DWRF and Parquet writers.

@pramodsatya pramodsatya force-pushed the hive_rd_fmt branch 3 times, most recently from 3f1bcac to 7f731d8 Compare May 26, 2025 23:28
@pramodsatya pramodsatya changed the title [native] Fail-fast for file formats unsupported by hive connector Add hive configs for supported read and write formats May 27, 2025
@pramodsatya (Contributor, Author) commented May 27, 2025

Thanks for the feedback, @tdcmeehan and @aditi-pandit. Added hive configs for supported read and write formats, and validated that read/write operations fail for unsupported formats when these configs are set. Could you please take another look?

@aditi-pandit (Contributor) left a comment:

Thanks @pramodsatya

@steveburnett (Contributor)

Should we have documentation for these new properties? https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/presto_cpp/properties.rst

steveburnett previously approved these changes Jun 3, 2025
@steveburnett (Contributor) left a comment:

LGTM! (docs)

Pull branch, local doc build, looks good. Thanks!

@pramodsatya pramodsatya requested a review from tdcmeehan June 4, 2025 20:13
Comment on lines 212 to 216
``hive.read-formats`` Comma-separated list of file formats supported for reads
from tables.

``hive.write-formats`` Comma-separated list of file formats supported for writes
to tables.
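A comma-separated config value like this is typically parsed into a set of format names that the connector then checks against. A self-contained sketch of that parsing, assuming an empty or unset value means all built-in formats are allowed (class and method names are hypothetical, not Presto's actual code):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class FormatListParser {
    // Parse a config value like "DWRF,PARQUET" into a set of format names.
    // An empty result means the property is unset, i.e. all formats allowed.
    static Set<String> parseFormats(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptySet();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .map(s -> s.toUpperCase(Locale.ROOT))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(parseFormats("dwrf, parquet")); // normalized to upper case
        System.out.println(parseFormats(null).isEmpty());  // unset => empty set
    }
}
```

Normalizing case and trimming whitespace keeps the check tolerant of values like "dwrf, parquet" in the properties file.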
Contributor:

Can you mention that the default behavior is to allow all built-in support of read and write formats?

@pramodsatya (Contributor, Author):

Added this to the doc, could you please take another look?

@aditi-pandit (Contributor) left a comment:

@pramodsatya: This code looks good. Though can you write e2e tests with queries... maybe set read/write formats to only Parquet in the queryRunner and show that reading/writing any other format fails.

@pramodsatya (Contributor, Author)

@pramodsatya: This code looks good. Though can you write e2e tests with queries... maybe set read/write formats to only Parquet in the queryRunner and show that reading/writing any other format fails.

Thanks @aditi-pandit, added tests for these configs with different file formats. Could you please take another look?

@steveburnett (Contributor)

Suggest adding a link to the doc in the release note entry, like so:

== RELEASE NOTES ==

Hive Connector Changes
* Adds :ref:`connector/hive:Hive Configuration Properties` `hive.read-formats` and `hive.write-formats` to allow users to set file formats supported for read and write operations by hive connector.

session.getRuntimeStats());
Table table = layout.getTable(metastore, metastoreContext);

if (!readFormats.isEmpty()) {
Contributor:

I don't think we should delay checking the table layout for storageFormat until split scheduling. We want to avoid scheduling the fragment at all. Is it possible to do this earlier in BasePlanFragmenter
https://github.com/prestodb/presto/blob/master/presto-main-base/src/main/java/com/facebook/presto/sql/planner/BasePlanFragmenter.java#L249 ? The TableLayout is available at this point.

@tdcmeehan: wdyt?
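The check being discussed reduces to: if the configured format set is non-empty and the table's storage format is not in it, fail before the fragment is scheduled. A self-contained simulation of that logic, with hypothetical names and a generic exception standing in for Presto's error types:

```java
import java.util.Set;

public class ReadFormatCheck {
    // An empty set means hive.read-formats is unset, so all built-in
    // formats remain allowed (the documented default behavior).
    static boolean isAllowed(Set<String> configuredFormats, String tableStorageFormat) {
        return configuredFormats.isEmpty() || configuredFormats.contains(tableStorageFormat);
    }

    // Fail fast at the coordinator instead of erroring later at the worker.
    static void checkReadFormat(Set<String> readFormats, String tableStorageFormat) {
        if (!isAllowed(readFormats, tableStorageFormat)) {
            throw new IllegalStateException(
                    "Reading from tables with storage format " + tableStorageFormat
                            + " is not supported");
        }
    }

    public static void main(String[] args) {
        checkReadFormat(Set.of("DWRF", "PARQUET"), "PARQUET"); // allowed
        checkReadFormat(Set.of(), "TEXT");                     // allowed: config unset
        try {
            checkReadFormat(Set.of("DWRF", "PARQUET"), "TEXT");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // rejected before scheduling
        }
    }
}
```

The review point is about where this call sits: running it in BasePlanFragmenter, where the TableLayout is already available, would reject the query during planning rather than after split scheduling has begun.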

SchemaTableName tableName = ((HiveTableHandle) tableHandle).getSchemaTableName();
Table table = metastore.getTable(metastoreContext, tableName.getSchemaName(), tableName.getTableName())
.orElseThrow(() -> new TableNotFoundException(tableName));
HiveStorageFormat tableStorageFormat = extractHiveStorageFormat(table);
Contributor:

Even this code is called during scheduling of stage executions... It's worthwhile only if the check is done sooner, in logical planning or in BasePlanFragmenter.

I think extractHiveStorageFormat from a table can be called much sooner. Can you investigate further?
