Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid redundant stat call when writing Parquet files #1746

Merged
merged 1 commit into from Jan 4, 2020

Conversation

electrum
Copy link
Member

No description provided.

@cla-bot cla-bot bot added the cla-signed label Oct 14, 2019
if (!abort) {
length = target.getFileSystem(conf).getFileStatus(target).getLen();
}
length = fileWriter.getPos();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why drop if (!abort) { ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, we were getting the size by doing a stat call, which wouldn't work after an abort, since the file would have been deleted. But now we just ask the writer how much it wrote, which works in any case. (it's probably not useful in the case of an abort, but it doesn't hurt anything, and makes the code more consistent)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[...] but it doesn't hurt anything

can this depend in impl of the writer?
can a writer still do some non O(1) call underneath?
(when aborting, you need to be defensive, especially that this code path is less frequent and less tested)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is ParquetRecordWriterUtil I would assume we only have one writer implementation to care about, so we just check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave the check because it doesn't hurt. Hopefully this will be gone soon, once we merge the native Parquet writer code.

@electrum electrum requested review from findepi and dain December 6, 2019 01:06
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd prefer to retain the "if not aborting" condition.
i dont feel strongly about this

if (!abort) {
length = target.getFileSystem(conf).getFileStatus(target).getLen();
}
length = fileWriter.getPos();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is ParquetRecordWriterUtil I would assume we only have one writer implementation to care about, so we just check.

@electrum electrum closed this Jan 4, 2020
@electrum electrum deleted the parquet-size branch January 4, 2020 08:13
@electrum electrum merged commit e6d2447 into trinodb:master Jan 4, 2020
@electrum electrum mentioned this pull request Jan 8, 2020
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

3 participants