New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid redundant stat call when writing Parquet files #1746
Conversation
if (!abort) { | ||
length = target.getFileSystem(conf).getFileStatus(target).getLen(); | ||
} | ||
length = fileWriter.getPos(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why drop if (!abort) {
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, we were getting the size by doing a stat call, which wouldn't work after an abort, since the file would have been deleted. But now we just ask the writer how much it wrote, which works in any case. (it's probably not useful in the case of an abort, but it doesn't hurt anything, and makes the code more consistent)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[...] but it doesn't hurt anything
can this depend in impl of the writer?
can a writer still do some non O(1) call underneath?
(when aborting, you need to be defensive, especially that this code path is less frequent and less tested)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is ParquetRecordWriterUtil
I would assume we only have one writer implementation to care about, so we just check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave the check because it doesn't hurt. Hopefully this will be gone soon, once we merge the native Parquet writer code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd prefer to retain the "if not aborting" condition.
i dont feel strongly about this
if (!abort) { | ||
length = target.getFileSystem(conf).getFileStatus(target).getLen(); | ||
} | ||
length = fileWriter.getPos(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is ParquetRecordWriterUtil
I would assume we only have one writer implementation to care about, so we just check.
No description provided.