[Native] Add missing plumbing for Cte support #22780

Merged: 1 commit merged from cte into master on Jul 12, 2024

Conversation

@aditi-pandit (Contributor) commented May 17, 2024

Description

CTE support was added to Presto in #20887.
This feature lives largely in the Presto optimizer logic, but it relies on the temporary table SPI to create TableWriterNodes on the workers.

The temporary table SPI was disabled in the PrestoToVelox conversion. At the worker, temporary tables are used like regular new Hive tables: the SPI creates the table nodes and writes to them in the same pipeline. Commit handling, however, differs from regular tables and is processed by the TableFinish operator at the coordinator.

Motivation and Context

Use CTE with Prestissimo

Impact

The CTE materialization properties (https://prestodb.io/docs/0.286/admin/properties.html#cte-materialization-properties) can be used with Prestissimo workers as well.

However, Prestissimo supports only the following storage and compression-codec options:

hive.temporary-table-storage-format = DWRF, PARQUET
hive.temporary-table-compression-codec = ZSTD, NONE
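
For example (a sketch, not code from this PR), a Hive catalog for Presto C++ workers could be configured with one supported value for each property, such as in a test query runner's hiveProperties map:

// Hypothetical snippet: requires java.util.Map and com.google.common.collect.ImmutableMap.
Map<String, String> hiveProperties = ImmutableMap.of(
        "hive.temporary-table-storage-format", "PARQUET",       // DWRF or PARQUET only
        "hive.temporary-table-compression-codec", "ZSTD");      // ZSTD or NONE only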

Test Plan

End-to-end tests are added in this PR. They are derived from the e2e Java CTE tests, so the CTE tests are now run with both engines.

== RELEASE NOTE ==

Add CTE materialization for Presto C++ workers with the configuration properties 
`hive.temporary-table-storage-format` (`DWRF` or `PARQUET` only) and 
`hive.temporary-table-compression-codec` (`ZSTD` or `NONE` only). 
:pr:`22780`

@aditi-pandit requested a review from a team as a code owner on May 17, 2024 23:32
@aditi-pandit marked this pull request as draft on May 17, 2024 23:33
@aditi-pandit changed the title from [Native] Add Cte support to [Do not review][Native] Add Cte support on May 17, 2024
@aditi-pandit changed the title from [Do not review][Native] Add Cte support to [Do not review][Native] Add missing plumbing for Cte support on Jun 28, 2024
@aditi-pandit changed the title from [Do not review][Native] Add missing plumbing for Cte support to [Native] Add missing plumbing for Cte support on Jun 28, 2024
@aditi-pandit marked this pull request as ready for review on June 28, 2024 22:25
"legacy",
hiveProperties,
workerCount,
Optional.of(Paths.get(dataDirectory + "/" + storageFormat)),
@yingsu00 (Contributor), Jul 9, 2024:

storageFormat should NOT be added to the path by default. If you specify storageFormat, you won't be able to see tables with different file formats in the same catalog or schema (database) when you're running any QueryRunner. Remember, the QueryRunners should allow users to have tables with different file formats in the same database, and even allow joining them together in the same query. Also, the DATA_DIR won't be visible to other query runners at all. We should gradually retire this storageFormat from the data path.

@yingsu00 (Contributor):

Suppose you set DATA_DIR='/Users/aditi'; adding the storage format would put all your metadata and data in /Users/aditi/PARQUET/hive_data/. You would only be able to create and query Parquet tables when running the QueryRunner, but not DWRF tables. We introduced a boolean addStorageFormatToPath parameter in createNativeQueryRunner() and set it to false by default, just for backward compatibility. Can you do the same? Thanks.
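
A minimal sketch of the pattern described here (the addStorageFormatToPath flag name is taken from the comment above; this is not the exact createNativeQueryRunner code):

// Hypothetical helper, using java.nio.file.Path/Paths and java.util.Optional:
// append the storage format to the data directory only when explicitly requested,
// so existing query runners keep their old data layout by default.
static Optional<Path> dataDirectoryPath(String dataDirectory, String storageFormat, boolean addStorageFormatToPath)
{
    return Optional.of(addStorageFormatToPath
            ? Paths.get(dataDirectory + "/" + storageFormat)
            : Paths.get(dataDirectory));
}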

@aditi-pandit (Contributor, Author):

My bad Ying. I do recall this discussion.

Updated the code.

public void testPersistentCteWithChar() {}

@Override
// Unsupported nested encoding in Velox Parquet
Contributor:

Add writer to be more clear?
// Unsupported nested encoding in Velox Parquet writer

@aditi-pandit (Contributor, Author):

Done.

{
if (!queryRunner.tableExists(queryRunner.getDefaultSession(), "lineitem")) {
String shipDate = castDateToVarchar ? "cast(shipdate as varchar) as shipdate" : "shipdate";
Contributor:

Is this needed by the CTE? If not, will you be able to move this change into a separate commit? Thanks.

@aditi-pandit (Contributor, Author):

Yes, this is needed by CTE.

I reused the CTE Java engine tests for the native engine as well. The CTE tests use SQL with EXTRACT calls that depend on the date columns being retained as DATE, whereas the native engine tests have been type-casting DATE columns to VARCHAR for both DWRF and Parquet in the TPC-H schema.
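
For illustration (a hypothetical query, not one of the tests in this PR), a CTE query along these lines only works when shipdate is kept as a DATE column, since EXTRACT does not accept a VARCHAR argument:

// Hypothetical example: relies on lineitem.shipdate being retained as DATE.
String cteQuery = "WITH recent AS (" +
        "  SELECT * FROM lineitem WHERE EXTRACT(YEAR FROM shipdate) = 1995" +
        ") " +
        "SELECT COUNT(*) FROM recent";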

@aditi-pandit (Contributor, Author):

@yingsu00 : Have addressed your review comments. PTAL.

@jaystarshot (Member):

We should add a new row for Presto C++ support in the documentation later here

@steveburnett (Contributor):

> We should add a new row for presto c++ support in the documentation later here

Good idea! Maybe mention it in the Presto C++ Features documentation.

@yingsu00 (Contributor):

@aditi-pandit there're some test failures, have you checked them?
btw. I think it deserves a release note. Also, it'll be nice to add the documentation in this PR as well.

@aditi-pandit (Contributor, Author) commented Jul 10, 2024

@yingsu00 @jaystarshot @steveburnett : So while this feature enables use of CTE, we support only DWRF and Parquet as file formats for the intermediate temporary tables created by it. I feel there is most potential with using pagefile format for these temp tables. We are working on the pagefile formats right now. So I preferred to add documentation once we have numbers for those.

wydt ? I'll add a RELEASE NOTE though.

@aditi-pandit (Contributor, Author):

> @aditi-pandit there're some test failures, have you checked them? btw. I think it deserves a release note. Also, it'll be nice to add the documentation in this PR as well.

@yingsu00 : Those failures were on account of the Iceberg issue that has now been reverted. I've rebased the branch and the tests pass again.

@yingsu00 (Contributor):

> CTE materialization can be used with Prestissimo workers as well.
@aditi-pandit Release notes need to follow https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines . I think this one can start with "Add". How about the following modification?

Add CTE materialization support for Presto C++ clusters. It supports only the following storage and compression-codec options (ref https://prestodb.io/docs/0.286/admin/properties.html#cte-materialization-properties)
hive.temporary-table-storage-format = DWRF, PARQUET
hive.temporary-table-compression-codec = ZSTD, NONE

@steveburnett (Contributor):

> @yingsu00 @jaystarshot @steveburnett : So while this feature enables use of CTE, we support only DWRF and Parquet as file formats for the intermediate temporary tables created by it. I feel there is most potential with using pagefile format for these temp tables. We are working on the pagefile formats right now. So I preferred to add documentation once we have numbers for those.
>
> wydt ? I'll add a RELEASE NOTE though.

I see your point as valid. Are you planning to add pagefile support in this PR, or open a new PR?

@aditi-pandit (Contributor, Author):

> @yingsu00 @jaystarshot @steveburnett : So while this feature enables use of CTE, we support only DWRF and Parquet as file formats for the intermediate temporary tables created by it. I feel there is most potential with using pagefile format for these temp tables. We are working on the pagefile formats right now. So I preferred to add documentation once we have numbers for those.
> wydt ? I'll add a RELEASE NOTE though.
>
> I see your point as valid. Are you planning to add pagefile support in this PR, or open a new PR?

@steveburnett : It will be in a new PR.

@aditi-pandit (Contributor, Author):

> CTE materialization can be used with Prestissimo workers as well.
> @aditi-pandit Release notes need to follow https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines . I think this one can start with "Add". How about the following modification?
>
> Add CTE materialization support for Presto C++ clusters. It supports only the following storage and compression-codec options (ref https://prestodb.io/docs/0.286/admin/properties.html#cte-materialization-properties) hive.temporary-table-storage-format = DWRF, PARQUET hive.temporary-table-compression-codec = ZSTD, NONE

@yingsu00 : Done. PTAL.

@steveburnett (Contributor):

> @steveburnett : It will be in a new PR.

Ideally I would suggest documenting the feature in the same PR, specifying the DWRF and Parquet file formats in the doc, then revising the doc in the new PR to add pagefile support. But a separate PR soon should be okay. Thanks!

@steveburnett (Contributor):

May I suggest a draft release note for consideration?

== RELEASE NOTE ==
Add CTE materialization for Prestissimo workers with the configuration properties `hive.temporary-table-storage-format` (`DWRF` or `PARQUET` only) and `hive.temporary-table-compression-codec` (`ZSTD` or `NONE` only). :pr:`22780`

@aditi-pandit (Contributor, Author):

> May I suggest a draft release note for consideration?
>
> == RELEASE NOTE ==
> Add CTE materialization for Prestissimo workers with the configuration properties `hive.temporary-table-storage-format` (`DWRF` or `PARQUET` only) and `hive.temporary-table-compression-codec` (`ZSTD` or `NONE` only). :pr:`22780`

@steveburnett : Sounds good. Have updated the release notes.

@yingsu00 (Contributor):

> Add CTE materialization for Prestissimo workers with the configuration properties hive.temporary-table-storage-format (DWRF or PARQUET only) and hive.temporary-table-compression-codec (ZSTD or NONE only). :pr:22780

Looks good, just change Prestissimo to Presto C++....

@aditi-pandit (Contributor, Author):

> Add CTE materialization for Prestissimo workers with the configuration properties hive.temporary-table-storage-format (DWRF or PARQUET only) and hive.temporary-table-compression-codec (ZSTD or NONE only). :pr:22780
>
> Looks good, just change Prestissimo to Presto C++....

Done. Good idea to keep this consistent across all places.

@aditi-pandit merged commit caa35d8 into master on Jul 12, 2024
59 checks passed
@aditi-pandit deleted the cte branch on July 12, 2024 16:43