Skip to content

Conversation

@wraymo
Copy link
Contributor

@wraymo wraymo commented Mar 19, 2025

Description

This PR adds ZSTD compression support for Hive Parquet.

Motivation and Context

Impact

Test Plan

  • Add hive.compression-codec=zstd to the hive catalog file
  • Start the coordinator and worker
  • Run the following queries
CREATE TABLE orders_2 ( id INT, items ARRAY<ROW(name VARCHAR, quantity INTEGER, price DOUBLE)> ) with (format='PARQUET');
INSERT INTO orders_2(id, items) VALUES (1, ARRAY[ROW('Apple', 10, 1.99), ROW('Banana', 5, 0.99)])

Before the change, an error ZSTD compression is not supported with PARQUET was reported. Now there's no error and when using parquet-tools to inspect the file, we can see compression: ZSTD
Also verified it with presto-native-execution

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

@wraymo wraymo requested a review from a team as a code owner March 19, 2025 22:49
@wraymo wraymo requested a review from presto-oss March 19, 2025 22:49
@wraymo wraymo changed the title add Hive Parquet ZSTD compression support Add Hive Parquet ZSTD compression support Mar 19, 2025
@steveburnett
Copy link
Contributor

hive.compression-codec is mentioned in Hive Configuration Properties in the documentation and GZIP is shown as the default.

Suggest adding a mention of ZSTD as an option for hive.compression-codec to the doc. If you have a better idea of where than this table, let me know. Thanks!

@wraymo
Copy link
Contributor Author

wraymo commented Mar 29, 2025

hive.compression-codec is mentioned in Hive Configuration Properties in the documentation and GZIP is shown as the default.

Suggest adding a mention of ZSTD as an option for hive.compression-codec to the doc. If you have a better idea of where than this table, let me know. Thanks!

@steveburnett Thanks for the comment. I think we can add Possible values are NONE, SNAPPY, LZ4, ZSTD, or GZIP. for that, similar to https://trino.io/docs/current/connector/hive.html

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull branch, local doc build, looks good. Thanks for adding the doc!

Copy link
Contributor

@yingsu00 yingsu00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will you please

  1. Add ZSTD to the benchmarks com/facebook/presto/hive/benchmark/HiveFileFormatBenchmark.java and put the result in the PR message
  2. Add compression codecs to test com/facebook/presto/hive/TestHivePageSink.java

@steveburnett
Copy link
Contributor

@wraymo, when you have time would you address @yingsu00's comment?

@wraymo
Copy link
Contributor Author

wraymo commented Apr 10, 2025

@steveburnett @yingsu00 Thanks for the comment! Will add it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants