New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option not to delete target directory on failure #167

Merged
merged 2 commits into from Feb 6, 2019

Conversation

4 participants
@electrum
Copy link
Member

electrum commented Feb 6, 2019

When a metastore operation fails, especially due to timeout, we do not know
whether it ultimately completes. Relying on the apparent failure to perform
cleanup operations is thus unsafe.

Cleaning up new target directories can cause data loss when Presto writes
to existing partitions and replaces them in the metastore. This occurs when
the metastore call times out on the Presto side, but eventually completes,
and the metastore call to undo the operation fails. This results in a
metastore partition pointing to the now-deleted staging directory.

This can be avoided by deferring cleanup of staging directories to an
external system that works in concert with the metastore to determine which
directories are actually referenced by partition locations.

Extracted-From: https://github.com/prestodb/presto

electrum and others added some commits Feb 6, 2019

Add option not to delete target directory on failure
When a metastore operation fails, especially due to timeout, we do
not know whether it ultimately completes. Relying on the apparent
failure to perform cleanup operations is thus unsafe.

Cleaning up new target directories can cause data loss when Presto
writes to existing partitions and replaces them in the metastore.
This occurs when the metastore call times out on the Presto side, but
eventually completes, and the metastore call to undo the operation
fails. This results in a metastore partition pointing to the
now-deleted staging directory.

This can be avoided by deferring cleanup of staging directories to an
external system that works in concert with the metastore to determine
which directories are actually referenced by partition locations.

Extracted-From: https://github.com/prestodb/presto
@cla-bot

This comment has been minimized.

Copy link

cla-bot bot commented Feb 6, 2019

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@prestosql.io. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need the Corporate CLA signed.

@dain

dain approved these changes Feb 6, 2019

@electrum electrum merged commit 0a543f6 into prestosql:master Feb 6, 2019

1 check failed

verification/cla-signed
Details

@electrum electrum deleted the electrum:staging branch Feb 6, 2019

@findepi

This comment has been minimized.

Copy link
Member

findepi commented Feb 6, 2019

Thanks @electrum for putting up a nice description for this change. Finally i understand what is the value that it brings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment