Consolidate LZ4 and LZ0 decompression of Parquet to dwio::common #6105

Closed
yingsu00 opened this issue Aug 14, 2023 · 5 comments
Labels
enhancement New feature or request parquet

Comments

@yingsu00
Collaborator

Description

#5914 changed Parquet to use the common PagedInputStream, but LZ4 and LZO are still handled as special cases. We need to consolidate them into dwio::common so that the common decompressor supports them.

@yingsu00 yingsu00 added enhancement New feature or request parquet labels Aug 14, 2023
@yingsu00
Collaborator Author

cc @nmahadevuni @Yuhta

@galhai87

Please take note of the following:

Not sure if this deserves a separate issue, but Arrow is shifting its defaults towards block compression and the LZ4_RAW codec. (The previous default, LZ4_FRAME, is now being deprecated.)

See:
https://github.com/apache/parquet-format/blob/master/Compression.md
https://issues.apache.org/jira/browse/PARQUET-1996
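
For readers unfamiliar with why the older Parquet LZ4 codec needs special handling: as far as I understand it, the deprecated codec wraps each LZ4 block in a Hadoop-style prefix (a 4-byte big-endian decompressed size followed by a 4-byte big-endian compressed size), while LZ4_RAW stores the bare LZ4 block. The sketch below only splits a page into such frames and does not decompress anything. The names `HadoopLz4Frame`, `readBigEndian32`, and `splitHadoopLz4Frames` are hypothetical and are not part of Velox, dwio::common, or Arrow.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical description of one Hadoop-style LZ4 frame inside a page.
struct HadoopLz4Frame {
  uint32_t decompressedSize; // size the payload should expand to
  uint32_t compressedSize;   // number of payload bytes in this frame
  size_t payloadOffset;      // offset of the raw LZ4 block within the page
};

// Reads a 32-bit big-endian integer from 4 bytes.
inline uint32_t readBigEndian32(const uint8_t* p) {
  return (static_cast<uint32_t>(p[0]) << 24) |
      (static_cast<uint32_t>(p[1]) << 16) |
      (static_cast<uint32_t>(p[2]) << 8) | static_cast<uint32_t>(p[3]);
}

// Assumed framing: [4-byte BE decompressed size][4-byte BE compressed size]
// [payload], repeated until the page is exhausted. Returns std::nullopt when
// the bytes do not fit that layout, in which case a reader could fall back to
// treating the whole page as a single raw LZ4 block.
inline std::optional<std::vector<HadoopLz4Frame>> splitHadoopLz4Frames(
    const uint8_t* data,
    size_t size) {
  constexpr size_t kPrefixLength = 8;
  std::vector<HadoopLz4Frame> frames;
  size_t offset = 0;
  while (offset < size) {
    if (size - offset < kPrefixLength) {
      return std::nullopt; // truncated prefix
    }
    const uint32_t decompressedSize = readBigEndian32(data + offset);
    const uint32_t compressedSize = readBigEndian32(data + offset + 4);
    offset += kPrefixLength;
    if (compressedSize > size - offset) {
      return std::nullopt; // payload would run past the end of the page
    }
    frames.push_back({decompressedSize, compressedSize, offset});
    offset += compressedSize;
  }
  if (frames.empty()) {
    return std::nullopt; // empty input carries no frames
  }
  return frames;
}
```

Each returned frame's payload could then be handed to a raw LZ4 block decompressor. A prefix that happens to parse is not proof that the page is Hadoop-framed, so a real reader would also need to check that decompression actually yields the declared size.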

@yingsu00
Collaborator Author

@galhai87 Thanks for the heads up. Created #6662

@yingsu00
Collaborator Author

Ongoing PR for this issue: #6673

@yingsu00
Collaborator Author

yingsu00 commented Dec 1, 2023

Closed by #6673
