
Upload partial segments #6530

Merged
merged 7 commits into main from sk-upload-partial-segments on Apr 3, 2024

Conversation

@petuhovskiy (Member) commented Jan 30, 2024

Add support for backing up partial segments to remote storage. Disabled by default, can be enabled with --partial-backup-enabled.

The safekeeper timeline has a background task which is subscribed to commit_lsn and flush_lsn updates. After the partial segment is updated (flush_lsn changes), the segment will be uploaded to S3 within about 15 minutes.
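A minimal sketch of what such a background task could look like, assuming a tokio watch channel that carries flush_lsn updates; the names and the plain 15-minute sleep are illustrative, and the real task also tracks term and commit_lsn:

```rust
// Sketch only: a per-timeline task that waits for flush_lsn to advance and
// then uploads the partial segment after a debounce period.
use std::time::Duration;
use tokio::sync::watch;

async fn partial_backup_task(mut flush_lsn_rx: watch::Receiver<u64>) {
    loop {
        // Wake up when the partial segment changed on disk (flush_lsn advanced).
        if flush_lsn_rx.changed().await.is_err() {
            break; // sender dropped: timeline is gone, stop the task
        }
        // Let the segment settle before uploading (~15 minutes).
        tokio::time::sleep(Duration::from_secs(15 * 60)).await;
        // Take the latest value and mark it seen so we don't re-trigger immediately.
        let flush_lsn = *flush_lsn_rx.borrow_and_update();
        upload_partial_segment(flush_lsn).await;
    }
}

async fn upload_partial_segment(_flush_lsn: u64) {
    // placeholder: build the object name and PUT the .partial file to S3
}
```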

The filename format for partial segments is Segment_Term_Flush_Commit_skNN.partial, where:

  • Segment – the segment name, like 000000010000000000000001
  • Term – current term
  • Flush – flush_lsn in hex format {:016X}, e.g. 00000000346BC568
  • Commit – commit_lsn in the same hex format
  • NN – safekeeper_id, like 1

The full object name example: 000000010000000000000002_2_0000000002534868_0000000002534410_sk1.partial
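As an illustration of the naming scheme, a hypothetical helper (the actual formatting lives in safekeeper/src/wal_backup_partial.rs and may differ in details):

```rust
// Hypothetical helper mirroring the Segment_Term_Flush_Commit_skNN.partial format.
fn partial_segment_name(
    segment: &str,      // e.g. "000000010000000000000002"
    term: u64,          // current term, decimal
    flush_lsn: u64,     // formatted as {:016X}
    commit_lsn: u64,    // formatted as {:016X}
    safekeeper_id: u64, // e.g. 1
) -> String {
    format!("{segment}_{term}_{flush_lsn:016X}_{commit_lsn:016X}_sk{safekeeper_id}.partial")
}

// partial_segment_name("000000010000000000000002", 2, 0x2534868, 0x2534410, 1)
// == "000000010000000000000002_2_0000000002534868_0000000002534410_sk1.partial"
```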

Each safekeeper keeps info about remote partial segments in its control file. The code updates state in the control file before doing any S3 operations. This way the control file stores information about all potentially existing remote partial segments and can clean them up after a newer version has been uploaded.
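A minimal sketch of that ordering, with hypothetical types and stubbed S3 calls standing in for the real control file and remote_storage code:

```rust
// Sketch only: record the new remote segment in the control file first, then
// upload it, then garbage-collect older versions the control file remembers.
#[derive(Clone, PartialEq)]
struct PartialRemoteSegment {
    name: String,
}

#[derive(Default)]
struct ControlFile {
    remote_partials: Vec<PartialRemoteSegment>,
}

impl ControlFile {
    fn persist(&self) {
        // In the real code this serializes and fsyncs the timeline state.
    }
}

fn s3_upload(_seg: &PartialRemoteSegment) { /* PUT the new .partial object */ }
fn s3_delete(_seg: &PartialRemoteSegment) { /* DELETE an older .partial object */ }

fn upload_new_partial(cf: &mut ControlFile, new_seg: PartialRemoteSegment) {
    // 1. Persist the intent before any S3 call, so every object that might exist
    //    remotely is always known locally, even if we crash mid-upload.
    cf.remote_partials.push(new_seg.clone());
    cf.persist();

    // 2. Upload the new version.
    s3_upload(&new_seg);

    // 3. Clean up older versions and persist the shrunken list.
    cf.remote_partials.retain(|seg| {
        if *seg == new_seg {
            return true;
        }
        s3_delete(seg);
        false
    });
    cf.persist();
}
```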

Closes #6336

github-actions bot commented Jan 30, 2024

2772 tests run: 2628 passed, 0 failed, 144 skipped (full report)


Flaky tests (2)

Postgres 16

  • test_compute_pageserver_connection_stress: debug

Postgres 14

  • test_get_tenant_size_with_multiple_branches: debug

Code coverage* (full report)

  • functions: 27.9% (6387 of 22856 functions)
  • lines: 46.8% (44997 of 96051 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
803ca68 at 2024-04-03T15:35:11.778Z :recycle:

@petuhovskiy (Member, Author)

This PR is not yet polished, but I want to have a pre-review for the main upload algo (the important part) before finishing the PR. In other words, the following steps:

  1. await the review for this draft version (the important parts)
  2. fix the feedback, polish, undraft
  3. quick review, merge, deploy to staging

So, waiting for review @arssher :)
(not urgent)

@arssher (Contributor) commented Feb 9, 2024

Hmm. During yesterday's call I assumed that we always remove the previous version before uploading the next one (i.e. only ever have at most 1 uploaded); now looking at the code I see this is not the case. In this case, a name clash (different file contents under the same file name) is indeed not good, i.e. GC looks unsafe with respect to it. While it is fixable, it is safer/simpler to just always have unique names by including term + flush_lsn there.

Having multiple segments offloaded is a bit more complicated, but it gives a kind of atomicity, i.e. we are never in a state where we don't have a partial segment offloaded at all (if one has ever been offloaded), which seems to be good. Though note that upload up to flush_lsn is anyway something not reliable: 1) it might be from the wrong term history 2) in theory there might be multiple uncommitted segments.

@petuhovskiy (Member, Author)

> In this case, a name clash (different file contents under the same file name) is indeed not good, i.e. GC looks unsafe with respect to it. While it is fixable, it is safer/simpler to just always have unique names by including term + flush_lsn there.

It is simple, but the filename becomes quite long; I'm afraid we'll eventually hit the S3 filename limit. WDYT?

> Though note that upload up to flush_lsn is anyway something not reliable: 1) it might be from the wrong term history

Is this any bad? For inactive timelines, the partial segment in S3 will eventually match the partial segment on disk, even for uncommitted WAL.

> 2) in theory there might be multiple uncommitted segments.

True, but the number of such timelines should be very close to zero. I'd rather not fix this in this PR; if it becomes an issue, it can be fixed later.

Other than that, any more ideas for the first version of "partial segments upload"?

@arssher (Contributor) commented Feb 19, 2024

> It is simple, but the filename becomes quite long; I'm afraid we'll eventually hit the S3 filename limit. WDYT?

Should be ok, the limit is 1024 in both S3 and Azure Blob Storage. 2^64 in decimal is 10^19, so there is enough space. So I'd at least have term + flush_lsn there, and if we reupload on commit_lsn change, include commit_lsn as well.
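A quick sanity check of that length argument (assuming a worst case of 20 decimal digits for both the term and the safekeeper id in the format described above):

```rust
// Rough worst-case length estimate for the partial segment object name.
fn main() {
    let worst_case = format!(
        "{}_{}_{:016X}_{:016X}_sk{}.partial",
        "000000010000000000000002", // 24-char segment name
        u64::MAX,                   // 20 decimal digits for the term
        u64::MAX,
        u64::MAX,
        u64::MAX,                   // 20 decimal digits for the safekeeper id
    );
    println!("{} bytes", worst_case.len()); // ~110 bytes, far below the 1024-byte key limit
    assert!(worst_case.len() < 1024);
}
```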

> Is this any bad? For inactive timelines, the partial segment in S3 will eventually match the partial segment on disk, even for uncommitted WAL.

No, it's expected indeed.

> True, but the number of such timelines should be very close to zero. I'd rather not fix this in this PR; if it becomes an issue, it can be fixed later.

Yeah, definitely not very important. We could also simply hold off the partial upload if the full one is not done yet.

> Other than that, any more ideas for the first version of "partial segments upload"?

Approach LGTM

@petuhovskiy (Member, Author)

After looking at S3 lifecycle rules (https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html), I realized that it might be useful to distinguish full WAL segments from partial segments.

I think I'm going to add a special tag to partial segments, so we could select them in a lifecycle rule.

@petuhovskiy petuhovskiy force-pushed the sk-upload-partial-segments branch 3 times, most recently from b38a2b7 to d0e928e on March 18, 2024 21:43
@petuhovskiy petuhovskiy added the "forward compatibility breakage" label Mar 19, 2024
@petuhovskiy (Member, Author)

Ok, ready for review now.

Note that the control file upgrade is not forward compatible, i.e. we cannot roll back safekeepers after deploy. Also, I tried to write a test for this PR, but ran into problems with S3 inspection from python tests, so I decided to test this on staging instead.

@petuhovskiy petuhovskiy marked this pull request as ready for review March 19, 2024 13:17
@petuhovskiy petuhovskiy requested review from a team as code owners March 19, 2024 13:17
@petuhovskiy petuhovskiy changed the title from "WIP: Upload partial segments" to "Upload partial segments" Mar 19, 2024
Resolved review threads on safekeeper/src/wal_backup_partial.rs
@arssher (Contributor) commented Mar 25, 2024

LGTM. Let's put the PR description into the wal_backup_partial.rs head comment.

@jcsp (Contributor) left a comment

Approved for the remote_storage/ part

@petuhovskiy (Member, Author)

@bayandin do you know how to make test_forward_compatibility pass? The breakage in forward compatibility is expected

@bayandin (Member)

> @bayandin do you know how to make test_forward_compatibility pass? The breakage in forward compatibility is expected

There's no good way to do so for now. Is it possible not to break forward compatibility, maybe? 😄
Handling it via a label means the label has to be added to all other PRs until we release these changes, so we could miss some other unexpected breaking changes.

I'll think about what we can do here.

@jcsp (Contributor) commented Mar 26, 2024

> @bayandin do you know how to make test_forward_compatibility pass? The breakage in forward compatibility is expected

The general answer for forward compatibility is to merge+release a change that knows how to read the new format, and then later merge+release a change that enables writing it -- this is what we do for pageserver stuff.

@petuhovskiy (Member, Author)

A bit of context: the control file is the file where safekeepers store all the data about timelines (everything but the WAL itself). For historical reasons, this is a binary file which is serialized directly from struct TimelinePersistentState. This means that any change to this structure (and its dependants) requires an upgrade.

The issue here is that control files in safekeepers have always worked this way, and an upgrade has always broken forward compatibility. It seems that the last control file upgrade was 18 months ago, so we never hit this issue before because test_forward_compatibility was probably created after that...

I dislike that a control file upgrade currently requires a breaking change, but I see only two fixes for this:

  1. merge+release twice for a single change, and write 2x more code to handle control file rollback
  2. completely overhaul control files and use a key-value serialization format (e.g. protobuf, json...)

Both options are not simple and I don't think I want to work on them right now. Also, it seems fine to allow forward incompatibility for safekeepers – in case something goes wrong, issues can always be fixed without a rollback. Unfortunately, I see no good way to omit test_forward_compatibility in our current CI.

I'm not really sure what should be done next here...

@arssher (Contributor) commented Mar 26, 2024

The last time, in a similar situation, I just manually added an exclusion to the test_compatibility code.

@petuhovskiy (Member, Author)

@bayandin is it ok if I add @pytest.mark.skip to test_forward_compatibility for a week?

@bayandin (Member)

> @bayandin is it ok if I add @pytest.mark.skip to test_forward_compatibility for a week?

If we do this, we won't catch any forward compatibility failures from other PRs.
Let me come up with a better solution for this.

@petuhovskiy petuhovskiy enabled auto-merge (squash) April 3, 2024 13:45
@petuhovskiy (Member, Author)

Added @pytest.mark.xfail to test_forward_compatibility to unblock merge. Will remove it after the next release.

@petuhovskiy petuhovskiy merged commit 3f77f26 into main Apr 3, 2024
53 checks passed
@petuhovskiy petuhovskiy deleted the sk-upload-partial-segments branch April 3, 2024 15:20
petuhovskiy added a commit that referenced this pull request Apr 5, 2024
Found these logs on staging safekeepers:
```
INFO Partial backup{ttid=X/Y}: failed to upload 000000010000000000000000_173_0000000000000000_0000000000000000_sk56.partial: Failed to open file "/storage/safekeeper/data/X/Y/000000010000000000000000.partial" for wal backup: No such file or directory (os error 2)
INFO Partial backup{ttid=X/Y}:upload{name=000000010000000000000000_173_0000000000000000_0000000000000000_sk56.partial}: starting upload PartialRemoteSegment { status: InProgress, name: "000000010000000000000000_173_0000000000000000_0000000000000000_sk56.partial", commit_lsn: 0/0, flush_lsn: 0/0, term: 173 }
```

This is because partial backup tries to upload a zero segment when there is no data in the timeline. This PR fixes this bug introduced in #6530.
petuhovskiy added a commit that referenced this pull request Apr 11, 2024
It was disabled due to #6530 breaking forward compatibility.
Now that we have deployed it to production, we can re-enable the test.
Labels
forward compatibility breakage: Broken forward compatibility. The previous version can't handle new data. It'll be hard to roll back.

Successfully merging this pull request may close these issues:

Upload partial WAL segments to S3