Hive file format cleanup #353

Merged
electrum merged 3 commits into prestosql:master from electrum:hiveformat on Mar 2, 2019

Conversation

3 participants
@electrum
Member

electrum commented Mar 1, 2019

No description provided.

@cla-bot cla-bot bot added the cla-signed label Mar 1, 2019

@electrum electrum requested a review from dain Mar 1, 2019

@electrum electrum force-pushed the electrum:hiveformat branch from a0f39e3 to 394f387 Mar 1, 2019

@findepi
Member

findepi left a comment

"Remove support for DWRF file format"

I reviewed everything up to, but not including, StripeReader.java. The code from that point on is unfamiliar to me.

@findepi
Member

findepi left a comment

"Remove legacy writers for ORC and RCFile"

  • We definitely need a compatibility toggle for Hive <2.3 (for ORC; I don't know about RCFile).
  • Is there no more code to remove, such as the writer factories for the legacy writers?

@electrum electrum force-pushed the electrum:hiveformat branch from 394f387 to e5d6363 Mar 1, 2019

@electrum

Member Author

electrum commented Mar 2, 2019

The legacy writers use the shared RecordFileWriter. Your comment reminded me of a couple additional things:

  • createRcFileWriter in HiveWriteUtils
  • OptimizedLazyBinaryColumnarSerde usage in RecordFileWriter (we can remove this from the shaded Hive build)

For DWRF, there might be further simplifications, such as removal of no longer needed abstractions, that can be done in the future.

@electrum electrum force-pushed the electrum:hiveformat branch from e5d6363 to 886c73a Mar 2, 2019

@electrum

Member Author

electrum commented Mar 2, 2019

I added another commit that allows writing compatible ORC files.

@dain

dain approved these changes Mar 2, 2019

Member

dain left a comment

Looks good to me.

electrum added some commits Feb 28, 2019

Remove support for DWRF file format
This format is not used and has no public specification.

Remove legacy writers for ORC and RCFile
The optimized writers have been enabled by default for many releases.
Continuing to support the legacy writers makes updating to Hive 3.1
more difficult due to backwards incompatible changes in the Hive
serialization code.

Allow writing ORC files compatible with Hive 2.x
Hive 2.x versions before Hive 2.3 incorrectly handled the writer
version number and would fail if the version was not 0-4. This was
fixed by ORC-125. This commit adds a config property that lets the
writer pretend to be an older version of Hive.
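
The compatibility behavior described in the last commit message can be sketched roughly as follows. This is only an illustration of the idea, not the code from this pull request: the class, method, and constant names are hypothetical, and the value used for the newer writer version is a placeholder. The only detail taken from the commit message is that Hive 2.x before 2.3 accepts writer versions 0 through 4.

```java
// Illustrative sketch only. All names here are hypothetical and are not
// taken from the Presto or ORC code bases.
final class OrcWriterVersionSketch {
    // Highest writer version that Hive 2.x before 2.3 accepts,
    // per the commit message (valid range is 0-4).
    static final int MAX_LEGACY_WRITER_VERSION = 4;

    // Placeholder for the newer number an optimized writer might record.
    // The concrete value is an assumption made for this example.
    static final int CURRENT_WRITER_VERSION = 6;

    private OrcWriterVersionSketch() {}

    // Writer side: pick the version number to record in the file,
    // driven by a boolean compatibility setting.
    static int writerVersion(boolean useLegacyVersionNumber) {
        return useLegacyVersionNumber ? MAX_LEGACY_WRITER_VERSION : CURRENT_WRITER_VERSION;
    }

    // Reader side: mimics the strict range check of the older Hive readers,
    // which fail on any writer version outside 0-4.
    static boolean oldHiveCanRead(int writerVersion) {
        return writerVersion >= 0 && writerVersion <= MAX_LEGACY_WRITER_VERSION;
    }
}
```

With the setting enabled, `oldHiveCanRead(writerVersion(true))` is true, so an older reader would accept the file. With it disabled, the newer number falls outside 0-4 and the strict check rejects it, which is the failure mode that ORC-125 fixed on the Hive side.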

@electrum electrum force-pushed the electrum:hiveformat branch from 5954359 to 8a98edf Mar 2, 2019

@electrum electrum merged commit 50a4720 into prestosql:master Mar 2, 2019

1 check passed

verification/cla-signed

@electrum electrum deleted the electrum:hiveformat branch Mar 2, 2019

@electrum electrum referenced this pull request Mar 7, 2019

Closed

Release notes for 305 #342

4 of 6 tasks complete