New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-22481: Fix deletion of file on formatter failure #213
Conversation
This requires a force flag and allows a test to change formatter multiple times, over-riding the configuration values and trying different options.
Removing it can cause problems for concurrent processes trying to ingest and already believing the directory has been created. It is safer to let the directory persist even if a problem occurred with the put. This also removes the confusion of a log message stating on error that the directory can not be removed since it contains files.
dd07c55
to
37a8a5f
Compare
@@ -238,6 +238,11 @@ def put(self, inMemoryDataset, ref): | |||
Filename=tmpFile.name) | |||
log.debug("Wrote file to %s via a temporary directory.", location.uri) | |||
|
|||
# Register a callback to try to delete the uploaded data if | |||
# the ingest fails below | |||
self._transaction.registerUndo("write", self.client.delete_object, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DinoBektesevic I've added this to the S3Datastore. Whether this is in the right place or not depends on whether upload_file or put_object are themselves transactional. If they fail and the object is effectively not in S3 then this is the right place. If they can partially work then instead of registerUndo we will have to do what we do in posixDatastore (use undoWith, catch the exception, reraise).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will work. S3 API docs claim never to add partial files which is what boto3 eventually calls for both object and file uploads.
In my Jira comment, I was actually concerned about the upload_file calls in the _ExtractIngestInfo
functionality. That bit, in S3Datastore, has no undo transactions protections, but definitely should. However, on move actions it would require me to delete the files in the original location (whether local disk or an different S3 Bucket). For S3 Bucket then it becomes difficult to confirm file actually got deleted because the delete_object
calls perpetually return HTML 204 response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I think my line does make things better and won't trigger any problems in the ingest line because transfer=None. I can leave the transaction handling for ingest
to another time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Only one minor comment.
This code was being used a lot and the method is generally useful and will simplify datastore.getUri
It now works out the Formatter and asks the Formatter for the final name that will be used.
The file removal was only previously effective if the file was written correctly and an exception triggered later on. Now a failure to write the file does correctly register the removal of partial writes. Includes tests to ensure that files are removed on failure.
If a formatter fails early there might not be a file to cleanup. Currently the callback can not distinguish between a failure that happens before the file was written and a failure that happens after the file was written.
95c805b
to
9091bb2
Compare
Previously if the Formatter wrote a partial file that file would not be cleaned up.