Implement Atomic Blob Writes for HDFS Repository #37066

Merged

Conversation

original-brownbear
Member

* Implement atomic writes the same way we do for the FsBlobContainer: via a rename, which is atomic (a rough sketch of the idea follows below)
* Relates elastic#37011
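
To make the idea concrete, here is a minimal sketch of the write-to-temp-then-rename approach against the Hadoop FileContext API. This is not the PR's exact code: the class name, method shape, and temp-name prefix are illustrative, and the flush/sync behaviour is discussed further down in the review.

import java.io.IOException;
import java.io.InputStream;
import java.util.EnumSet;

import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Options;
import org.apache.hadoop.fs.Path;

class AtomicHdfsWriteSketch {

    // Write the blob under a temporary name first, then rename it into place.
    // HDFS rename is atomic, so readers either see the complete blob or nothing.
    void writeBlobAtomic(FileContext fileContext, Path containerPath, String blobName,
                         InputStream inputStream, int bufferSize) throws IOException {
        final Path tempBlob = new Path(containerPath, "pending-" + blobName); // illustrative temp naming
        final Path targetBlob = new Path(containerPath, blobName);
        try (FSDataOutputStream stream =
                 fileContext.create(tempBlob, EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE))) {
            final byte[] buffer = new byte[bufferSize];
            int bytesRead;
            while ((bytesRead = inputStream.read(buffer)) != -1) {
                stream.write(buffer, 0, bytesRead);
            }
        }
        // Atomically move the finished temp file to its final name, replacing any existing blob.
        fileContext.rename(tempBlob, targetBlob, Options.Rename.OVERWRITE);
    }
}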
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

while ((bytesRead = inputStream.read(buffer)) != -1) {
stream.write(buffer, 0, bytesRead);
}
stream.hsync();
Member Author

I made this a little faster than the approach in writeBlob and just call hsync once when done writing. I think we can likely make the same change in writeBlob too: partial files don't really hold any value, so there's no need to sync more than once per blob, at the end of the write.
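
As a rough illustration of the point (assuming, as this comment implies, that the existing writeBlob syncs more often than once per blob; the helper below is hypothetical, not this PR's diff):

import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.fs.FSDataOutputStream;

// Hypothetical helper illustrating the single-sync-at-the-end strategy.
class SyncOnceSketch {
    static void copyThenSyncOnce(InputStream inputStream, FSDataOutputStream stream, int bufferSize)
            throws IOException {
        final byte[] buffer = new byte[bufferSize];
        int bytesRead;
        while ((bytesRead = inputStream.read(buffer)) != -1) {
            stream.write(buffer, 0, bytesRead);
            // intentionally no per-chunk stream.hsync() here
        }
        // Flush to the HDFS datanodes once, after the entire blob has been written;
        // a partially written blob has no value, so syncing intermediate state buys nothing.
        stream.hsync();
    }
}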

Member Author

Created #37069 to speed up writes using the same strategy for non-atomic writes.

@original-brownbear original-brownbear removed the request for review from ywelsch January 2, 2019 12:59
@original-brownbear
Member Author

Fails tests for some reason, looking into it

@@ -116,6 +118,30 @@ public void writeBlob(String blobName, InputStream inputStream, long blobSize, b
});
}

@Override
public void writeBlobAtomic(String blobName, InputStream inputStream, long blobSize, boolean failIfAlreadyExists) throws IOException {
final String tempBlob = FsBlobContainer.tempBlobName(blobName);
Contributor

should we move this method up to the BlobContainer interface?

Member Author

I'd rather not, I think. This method is kinda specific to the two file-system-based implementations; not sure it makes sense to blow up the interface just for that?

while ((bytesRead = inputStream.read(buffer)) != -1) {
stream.write(buffer, 0, bytesRead);
}
stream.hsync();
Contributor

I think this only syncs the current block. To have syncs across blocks, you will also need to specify `CreateFlag.SYNC_BLOCK` when creating the file. I think the easier solution (after tracing through DFSOutputStream) is to just use the `CreateFlag.SYNC_BLOCK` flag and never perform a manual hsync here. Upon closing the stream, `CreateFlag.SYNC_BLOCK` will also make sure that the endBlock will be synced.
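
For reference, a sketch of what that looks like with the FileContext API (the helper and the exact flag combination shown are illustrative, not the PR's code):

import java.io.IOException;
import java.util.EnumSet;

import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

class SyncBlockCreateSketch {
    // Create the file with SYNC_BLOCK so HDFS syncs each completed block itself,
    // and the final block is synced when the stream is closed; no manual hsync() needed.
    static FSDataOutputStream createSynced(FileContext fileContext, Path blobPath) throws IOException {
        return fileContext.create(
            blobPath,
            EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE, CreateFlag.SYNC_BLOCK));
    }
}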

Member Author

Urgh, you're right ... I should've read the second paragraph in the docs of that method :)
Your solution is better, will adjust.

@original-brownbear
Member Author

@ywelsch fixed the sync method :)

Contributor

@ywelsch ywelsch left a comment

LGTM

@original-brownbear
Member Author

Jenkins run gradle build tests 1

@original-brownbear original-brownbear merged commit 10d9819 into elastic:master Jan 3, 2019
@original-brownbear original-brownbear deleted the atomic-hdfs-blob-writes branch January 3, 2019 14:51
original-brownbear added a commit that referenced this pull request Jan 4, 2019
* Implement atomic writes the same way we do for the FsBlobContainer: via a rename, which is atomic
* Relates #37011
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 6, 2019
* With elastic#37066 introducing atomic writes to HDFS repository we can enforce atomic write capabilities on this interface
* The overrides on the other three cloud implementations are ok because:
   * https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html states that "Amazon S3 never adds partial objects; if you receive a success response, Amazon S3 added the entire object to the bucket."
   * https://cloud.google.com/storage/docs/consistency states that GCS has strong read-after-write consistency
   * https://docs.microsoft.com/en-us/rest/api/storageservices/put-block#remarks Azure has the concept of committing blobs, so there's no partial content here either
* Relates elastic#37011
original-brownbear added a commit that referenced this pull request Jan 7, 2019
* With #37066 introducing atomic writes to HDFS repository we can enforce atomic write capabilities on this interface
* The overrides on the other three cloud implementations are ok (see the sketch after this list) because:
   * https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html states that "Amazon S3 never adds partial objects; if you receive a success response, Amazon S3 added the entire object to the bucket."
   * https://cloud.google.com/storage/docs/consistency states that GCS has strong read-after-write consistency
   * https://docs.microsoft.com/en-us/rest/api/storageservices/put-block#remarks Azure has the concept of committing blobs, so there's no partial content here either
* Relates #37011
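
A minimal sketch of what that could look like on the interface side (names are simplified stand-ins, not the actual BlobContainer code): the file-system based repositories implement a real temp-file-plus-rename atomic write, while an object store whose single-object upload is already all-or-nothing can simply delegate.

import java.io.IOException;
import java.io.InputStream;

// Simplified stand-in for the real BlobContainer interface: writeBlobAtomic is part of
// the contract, so every implementation must provide atomic write semantics.
interface BlobContainerSketch {

    void writeBlob(String blobName, InputStream inputStream, long blobSize,
                   boolean failIfAlreadyExists) throws IOException;

    void writeBlobAtomic(String blobName, InputStream inputStream, long blobSize,
                         boolean failIfAlreadyExists) throws IOException;
}

// Hypothetical object-store container: a successful single-object upload is already
// all-or-nothing (see the S3/GCS/Azure notes above), so delegating to writeBlob is safe.
abstract class ObjectStoreContainerSketch implements BlobContainerSketch {

    @Override
    public void writeBlobAtomic(String blobName, InputStream inputStream, long blobSize,
                                boolean failIfAlreadyExists) throws IOException {
        writeBlob(blobName, inputStream, blobSize, failIfAlreadyExists);
    }
}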
Labels
:Distributed/Snapshot/Restore >enhancement v6.7.0 v7.0.0-beta1