Backport "Fix archival upload data corruption" #3293 #3309

Merged
merged 3 commits into from
Dec 20, 2021

@Lazin Lazin commented Dec 17, 2021

Cover letter

Backport PR #3293

The bug manifested as a truncated segment in S3. This happens only if
the cloud_storage_segment_max_upload_interval_sec option is enabled.

The cause is as follows. The code reads the size of the segment at the
beginning and uses it to set the initial content-length value. Then it
performs an asynchronous operation. During this asynchronous operation
the size of the segment might change, but the content-length value
stays the same.

After that, the code reads the segment size once again, and this time it can
get a larger value. This value is eventually used to compute the size of the tail
region of the segment that shouldn't be uploaded (file size minus the
position where the upload should stop). Because the code uses the new
file size here, it subtracts a larger value, which results in truncation
of the segment.

The fix is to just read the size of the segment once.
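
The arithmetic behind the truncation can be sketched as follows. This is a minimal model, not the Redpanda code: the function `upload_length` and all the numbers are hypothetical, and it assumes the uploaded length is the first size reading minus the tail computed from the second reading.

```python
def upload_length(size_for_length: int, size_for_tail: int, stop_pos: int) -> int:
    """Bytes uploaded once the tail region past stop_pos is trimmed off.

    size_for_length: size reading used for the content-length value
    size_for_tail:   size reading used to compute the tail region
    stop_pos:        file position where the upload should stop
    """
    tail = size_for_tail - stop_pos  # region that must not be uploaded
    return size_for_length - tail


stop_pos = 4096      # the upload should cover bytes [0, 4096)
first_size = 4096    # size read before the asynchronous operation
second_size = 4608   # size read afterwards; the segment grew by 512 bytes

# Two different readings: the tail is overestimated and data is lost.
buggy = upload_length(first_size, second_size, stop_pos)

# One reading used for both purposes: the full range is uploaded.
fixed = upload_length(first_size, first_size, stop_pos)

print(buggy, fixed)
```

In this model the number of lost bytes equals the growth of the segment between the two reads, which is why the problem disappears when the size is read only once.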

(cherry picked from commit 3a25251)

When the cloud_storage_segment_max_upload_interval_sec option is used,
the archival subsystem reads from the segment to find the right
locations of the beginning and the end of the segment. These reads didn't
use the io_priority_class that the rest of the archival subsystem uses. This
commit fixes this by propagating the right io_priority_class.

(cherry picked from commit 5285353)

If transform_stream throws, we handle the exception correctly by
closing the streams and propagating the exception. But if it returns an
error, we might just create an incorrect upload.

This commit changes this behavior by throwing an exception in case of any
error.

(cherry picked from commit e12105b)
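
The difference between a returned error and a thrown exception in the last commit can be illustrated with a small sketch. Everything here is hypothetical: `transform_stream_stub` only stands in for transform_stream, and the `(bytes_copied, error)` return shape is an assumption made for the example, not the real signature.

```python
class UploadError(Exception):
    """Raised when the stream transformation reports a failure."""


def transform_stream_stub(chunks, fail_at=None):
    """Hypothetical stand-in: returns (bytes_copied, error) instead of raising."""
    copied = 0
    for i, chunk in enumerate(chunks):
        if i == fail_at:
            return copied, "parse error"  # the error is returned, not raised
        copied += len(chunk)
    return copied, None


def copy_unchecked(chunks, fail_at=None):
    """Old behavior in this model: the returned error is ignored."""
    copied, _err = transform_stream_stub(chunks, fail_at)
    return copied  # the caller sees a short upload that looks successful


def copy_checked(chunks, fail_at=None):
    """New behavior in this model: any returned error becomes an exception."""
    copied, err = transform_stream_stub(chunks, fail_at)
    if err is not None:
        raise UploadError(err)
    return copied


chunks = [b"aaaa", b"bbbb", b"cccc"]
print(copy_unchecked(chunks, fail_at=1))  # partial byte count, no error visible
```

Converting the error into an exception routes it through the same cleanup path that already handles a throwing transform_stream, so the partial upload is not treated as complete.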
@Lazin Lazin merged commit 205b126 into redpanda-data:v21.11.x Dec 20, 2021