
Add S3FileSystem Write support #6021

Closed

Conversation

@majetideepak majetideepak commented Aug 7, 2023

S3WriteFile uses the Apache Arrow implementation as a reference.
The AWS C++ SDK allows streaming writes via the multipart upload API.
A multipart upload lets you upload a single object as a set of parts, where
each part is a contiguous portion of the object's data.
While AWS and MinIO support a different size for each part (requiring only a
minimum of 5 MB), certain object stores require that every part be exactly the
same size (except for the last part). We set the part size to 10 MiB so that,
combined with the maximum of 10,000 parts, this gives a file size limit of
100,000 MiB (about 98 GiB).
The parts can be uploaded independently and in any order. After all parts of
the object are uploaded, Amazon S3 assembles them and creates the object.
S3WriteFile is not thread-safe.
UploadPart is currently synchronous during append(). Flush is a no-op since
append() handles all the uploads.

Resolves: #4805


netlify bot commented Aug 7, 2023

Deploy Preview for meta-velox canceled.

🔨 Latest commit: 29769d5
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/650463001592f00008daf269

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 7, 2023

majetideepak commented Aug 7, 2023

CC: @akashsha1, @paul-amonson, @tigrux for early feedback on the design.
I still need to run some performance tests to wrap this up; I will share the results.

request.SetBucket(awsString(bucket_));
request.SetKey(awsString(key_));
auto objectMetadata = client_->HeadObject(request);
VELOX_CHECK(!objectMetadata.IsSuccess(), "File already exists");
Contributor

should this error message be "S3 object already exists"?

Collaborator Author
Will Fix.

This was referenced Aug 28, 2023
@majetideepak majetideepak force-pushed the support-s3-write branch 2 times, most recently from 97f2109 to 5b14e7e Compare September 12, 2023 02:58
@majetideepak majetideepak marked this pull request as ready for review September 12, 2023 02:59
Contributor
@tigrux tigrux left a comment
Very clean code.

Contributor
@paul-amonson paul-amonson left a comment

:)

VELOX_CHECK(!closed_, "File is closed");
// 'flush' API should trigger uploadPart.
// But upload part if the maximum part size is reached.
if (currentPartSize_ + data.size() > kMaxPartSize) {
Contributor
should this be >=

Collaborator Author
>= should work according to the documentation. Will do that.

@majetideepak majetideepak force-pushed the support-s3-write branch 2 times, most recently from 8978da9 to 7d8716e Compare September 14, 2023 22:02

majetideepak commented Sep 14, 2023

@tigrux @akashsha1 @paul-amonson I did some tests at scale and had to fix the semantics a bit.
I observed that the query would fail after writing a 5 GiB file. The issue was the difference
between the 5 GiB upper part limit in the code and the 5 GB limit enforced by AWS.
This made me realize that the current TableWriter only ever calls append() and never flush().
I verified this by logging the number of parts in the close() call.
So each part would have been 5 GB and was being buffered in memory before upload, which is bad.
I looked at the latest Arrow S3FS implementation and saw that it uploads fixed-size 10 MiB parts.
I changed the semantics to match this. It makes more sense since the write memory is now bounded.
The multipart upload API also does not map cleanly onto the append() + flush() APIs because of its minimum and maximum part-size limits.
Having append() manage the uploads and making flush() a no-op matches the multipart upload semantics well.
I added some comments to reflect this change.

@majetideepak

With the new semantics, I verified that we can now write files larger than 5GiB.

@majetideepak

I also looked at the gcs::ObjectWriteStream documentation and it does something similar.
https://cloud.google.com/cpp/docs/reference/storage/latest/classgoogle_1_1cloud_1_1storage_1_1ObjectWriteStream

@majetideepak

@pedroerp, @kgpai can you please help merge this? Thanks.

@majetideepak

The linux-build failure is unrelated.

@facebook-github-bot

@pedroerp has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot

@pedroerp merged this pull request in dc1c0c7.

@conbench-facebook

Conbench analyzed the 1 benchmark run on commit dc1c0c7c.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

codyschierbeck pushed a commit to codyschierbeck/velox that referenced this pull request Sep 27, 2023

Resolves: facebookincubator#4805

Pull Request resolved: facebookincubator#6021

Reviewed By: kgpai

Differential Revision: D49324662

Pulled By: pedroerp

fbshipit-source-id: f26479058f576a63f7d4fe4527b57bd0aa87ab30
ericyuliu pushed a commit to ericyuliu/velox that referenced this pull request Oct 12, 2023
@majetideepak majetideepak deleted the support-s3-write branch November 8, 2023 11:30
Successfully merging this pull request may close these issues.

Add write support in S3FileSystem
5 participants