Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-22481: Fix deletion of file on formatter failure #213

Merged
merged 9 commits into from Dec 10, 2019
Merged

Conversation

timj
Copy link
Member

@timj timj commented Dec 10, 2019

Previously if the Formatter wrote a partial file that file would not be cleaned up.

This requires a force flag and allows a test to change
formatter multiple times, over-riding the configuration
values and trying different options.
Removing it can cause problems for concurrent processes trying
to ingest and already believing the directory has been created.
It is safer to let the directory persist even if a problem
occurred with the put.

This also removes the confusion of a log message stating on
error that the directory can not be removed since it contains
files.
@@ -238,6 +238,11 @@ def put(self, inMemoryDataset, ref):
Filename=tmpFile.name)
log.debug("Wrote file to %s via a temporary directory.", location.uri)

# Register a callback to try to delete the uploaded data if
# the ingest fails below
self._transaction.registerUndo("write", self.client.delete_object,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DinoBektesevic I've added this to the S3Datastore. Whether this is in the right place or not depends on whether upload_file or put_object are themselves transactional. If they fail and the object is effectively not in S3 then this is the right place. If they can partially work then instead of registerUndo we will have to do what we do in posixDatastore (use undoWith, catch the exception, reraise).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will work. S3 API docs claim never to add partial files which is what boto3 eventually calls for both object and file uploads.

In my Jira comment, I was actually concerned about the upload_file calls in the _ExtractIngestInfo functionality. That bit, in S3Datastore, has no undo transactions protections, but definitely should. However, on move actions it would require me to delete the files in the original location (whether local disk or an different S3 Bucket). For S3 Bucket then it becomes difficult to confirm file actually got deleted because the delete_object calls perpetually return HTML 204 response.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I think my line does make things better and won't trigger any problems in the ingest line because transfer=None. I can leave the transaction handling for ingest to another time.

Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Only one minor comment.

python/lsst/daf/butler/core/formatter.py Outdated Show resolved Hide resolved
This code was being used a lot and the method is generally
useful and will simplify datastore.getUri
It now works out the Formatter and asks the Formatter for the
final name that will be used.
The file removal was only previously effective if the file
was written correctly and an exception triggered later on.
Now a failure to write the file does correctly register
the removal of partial writes.

Includes tests to ensure that files are removed on failure.
If a formatter fails early there might not be a file to cleanup.
Currently the callback can not distinguish between a failure
that happens before the file was written and a failure that
happens after the file was written.
@timj timj merged commit fb95e6f into master Dec 10, 2019
@timj timj deleted the tickets/DM-22481 branch December 10, 2019 20:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants