This repository has been archived by the owner on Jul 24, 2024. It is now read-only.
storage: refactor UploadWriter and implements part size inflation #600
Open
kennytm wants to merge 8 commits into pingcap:master from kennytm:progressively-increase-uploader-capacity
Conversation
kennytm force-pushed the progressively-increase-uploader-capacity branch from 302454a to e4b8b93 on November 17, 2020 02:15
/run-all-tests
overvenus approved these changes on Dec 1, 2020
LGTM
@kennytm: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@kennytm please resolve the conflicts
What problem does this PR solve?
It was previously found that Dumpling cannot upload files larger than 50 GB to S3. This is because we used multi-part upload to S3 with each part being 5 MB, but AWS S3 only allows up to 10,000 parts, so data beyond 50 GB will fail with "Part number must be an integer between 1 and 10000, inclusive".
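The 50 GB ceiling follows directly from multiplying those two limits; a throwaway back-of-the-envelope check (not code from this PR) might look like:

```go
package main

import "fmt"

func main() {
	const partSize = 5 << 20 // fixed 5 MiB per part, as used before this PR
	const maxParts = 10000   // hard S3 limit on parts per multipart upload
	// 10,000 parts × 5 MiB ≈ 50 GB: anything beyond this cannot be uploaded.
	fmt.Printf("max upload size ≈ %d GiB\n", int64(partSize)*maxParts/(1<<30))
}
```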
What is changed and how it works?
Here we implement "part size inflation" to exponentially increase the size of each part as we write more data. Every part is larger than the previous one by 0.0654% (configurable). For small files the part size stays very close to the optimal 5 MB, while later parts gradually grow; the exponential growth ensures that by the 10,000th part the size has inflated to 688 × 5 MB, so we can support a total file size of up to 5 TB, the maximum object size allowed by S3.
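As a sketch of the arithmetic (illustrative only; the constant names below are made up, not the identifiers used in this PR), the n-th part has size 5 MiB × 1.000654^(n−1):

```go
package main

import "fmt"

const (
	basePartSize  = 5 << 20  // 5 MiB, the minimum/optimal S3 part size
	inflationRate = 1.000654 // each part is 0.0654% larger than the previous one
	maxParts      = 10000    // S3 allows at most 10,000 parts per upload
)

func main() {
	partSize := float64(basePartSize)
	total := 0.0
	for i := 1; i <= maxParts; i++ {
		total += partSize
		partSize *= inflationRate
	}
	lastPart := partSize / inflationRate // size of the 10,000th part
	// The last part ends up roughly 688 × 5 MiB, and the cumulative capacity
	// is about 5 TB, the S3 single-object limit.
	fmt.Printf("last part ≈ %.0f × 5 MiB, total ≈ %.2f TiB\n",
		lastPart/float64(basePartSize), total/(1<<40))
}
```

The 0.0654% figure appears to be tuned so that exactly 10,000 parts reach the 5 TB object limit while the early parts stay near the optimal 5 MiB.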
In this PR we also refactored the UploadWriter so that the part size can be accurately controlled:
- noCompressionBuffer is merged entirely into simpleCompressBuffer by a no-op compress writer (see the sketch after this list).
- uploadChunk is now controlled by the size of the compressed buffer rather than the data input, so every part is accurately 5 MiB on S3 (this also reduces the number of parts).
- The arguments of NewUploadWriter are collected into a struct, since we are going to have too many arguments.
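For illustration, a no-op compress writer can be as small as the sketch below (hypothetical names; the actual types in this PR differ). Exposing the same Write/Close shape as a real compressor is what lets the uncompressed path reuse the compressed path's buffering and flushing logic:

```go
package storage

import "io"

// noopCompressor is a pass-through "compressor": it has the same Write/Close
// shape as a real compressor (e.g. gzip.Writer) but forwards bytes unchanged,
// so the uncompressed code path can share the compressed path's buffer
// management.
type noopCompressor struct {
	w io.Writer
}

func (c *noopCompressor) Write(p []byte) (int, error) {
	return c.w.Write(p) // no compression, just forward to the buffer
}

func (c *noopCompressor) Close() error {
	return nil // nothing buffered inside the "compressor" itself
}
```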
Check List
Tests
Code changes
- NewUploadWriter's signature is entirely changed.
Side effects
Related changes
Release Note