
Upload partial segments #6530

Merged
merged 7 commits into main from sk-upload-partial-segments on Apr 3, 2024

Conversation

@petuhovskiy (Member) commented Jan 30, 2024

Add support for backing up partial segments to remote storage. Disabled by default, can be enabled with --partial-backup-enabled.

The safekeeper timeline has a background task which is subscribed to commit_lsn and flush_lsn updates. After the partial segment is updated (flush_lsn changes), the segment will be uploaded to S3 within about 15 minutes.
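A minimal sketch of what such a background task could look like, assuming a tokio watch channel that carries flush_lsn updates; the names and the plain 15-minute sleep are illustrative, and the real task also tracks term and commit_lsn:

```rust
// Sketch only: a per-timeline task that waits for flush_lsn to advance and
// then uploads the partial segment after a debounce period.
use std::time::Duration;
use tokio::sync::watch;

async fn partial_backup_task(mut flush_lsn_rx: watch::Receiver<u64>) {
    loop {
        // Wake up when the partial segment changed on disk (flush_lsn advanced).
        if flush_lsn_rx.changed().await.is_err() {
            break; // sender dropped: timeline is gone, stop the task
        }
        // Let the segment settle before uploading (~15 minutes).
        tokio::time::sleep(Duration::from_secs(15 * 60)).await;
        // Take the latest value and mark it seen so we don't re-trigger immediately.
        let flush_lsn = *flush_lsn_rx.borrow_and_update();
        upload_partial_segment(flush_lsn).await;
    }
}

async fn upload_partial_segment(_flush_lsn: u64) {
    // placeholder: build the object name and PUT the .partial file to S3
}
```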

The filename format for partial segments is Segment_Term_Flush_Commit_skNN.partial, where:

  • Segment – the segment name, like 000000010000000000000001
  • Term – current term
  • Flush – flush_lsn in hex format {:016X}, e.g. 00000000346BC568
  • Commit – commit_lsn in the same hex format
  • NN – safekeeper_id, like 1

The full object name example: 000000010000000000000002_2_0000000002534868_0000000002534410_sk1.partial
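As an illustration of the naming scheme, a hypothetical helper (the actual formatting lives in safekeeper/src/wal_backup_partial.rs and may differ in details):

```rust
// Hypothetical helper mirroring the Segment_Term_Flush_Commit_skNN.partial format.
fn partial_segment_name(
    segment: &str,      // e.g. "000000010000000000000002"
    term: u64,          // current term, decimal
    flush_lsn: u64,     // formatted as {:016X}
    commit_lsn: u64,    // formatted as {:016X}
    safekeeper_id: u64, // e.g. 1
) -> String {
    format!("{segment}_{term}_{flush_lsn:016X}_{commit_lsn:016X}_sk{safekeeper_id}.partial")
}

// partial_segment_name("000000010000000000000002", 2, 0x2534868, 0x2534410, 1)
// == "000000010000000000000002_2_0000000002534868_0000000002534410_sk1.partial"
```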

Each safekeeper keeps info about remote partial segments in its control file. The code updates state in the control file before doing any S3 operations. This way the control file stores information about all potentially existing remote partial segments and can clean them up after a newer version has been uploaded.
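A minimal sketch of that ordering, with hypothetical types and stubbed S3 calls standing in for the real control file and remote_storage code:

```rust
// Sketch only: record the new remote segment in the control file first, then
// upload it, then garbage-collect older versions the control file remembers.
#[derive(Clone, PartialEq)]
struct PartialRemoteSegment {
    name: String,
}

#[derive(Default)]
struct ControlFile {
    remote_partials: Vec<PartialRemoteSegment>,
}

impl ControlFile {
    fn persist(&self) {
        // In the real code this serializes and fsyncs the timeline state.
    }
}

fn s3_upload(_seg: &PartialRemoteSegment) { /* PUT the new .partial object */ }
fn s3_delete(_seg: &PartialRemoteSegment) { /* DELETE an older .partial object */ }

fn upload_new_partial(cf: &mut ControlFile, new_seg: PartialRemoteSegment) {
    // 1. Persist the intent before any S3 call, so every object that might exist
    //    remotely is always known locally, even if we crash mid-upload.
    cf.remote_partials.push(new_seg.clone());
    cf.persist();

    // 2. Upload the new version.
    s3_upload(&new_seg);

    // 3. Clean up older versions and persist the shrunken list.
    cf.remote_partials.retain(|seg| {
        if *seg == new_seg {
            return true;
        }
        s3_delete(seg);
        false
    });
    cf.persist();
}
```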

Closes #6336

github-actions bot commented Jan 30, 2024

2772 tests run: 2628 passed, 0 failed, 144 skipped (full report)


Flaky tests (2)

Postgres 16

  • test_compute_pageserver_connection_stress: debug

Postgres 14

  • test_get_tenant_size_with_multiple_branches: debug

Code coverage* (full report)

  • functions: 27.9% (6387 of 22856 functions)
  • lines: 46.8% (44997 of 96051 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
803ca68 at 2024-04-03T15:35:11.778Z :recycle:

@petuhovskiy (Member, Author)

This PR is not yet polished, but I want to have a pre-review for the main upload algo (the important part) before finishing the PR. In other words, the following steps:

  1. await the review for this draft version (the important parts)
  2. fix the feedback, polish, undraft
  3. quick review, merge, deploy to staging

So, waiting for review @arssher :)
(not urgent)

@arssher (Contributor) commented Feb 9, 2024

Hmm. During yesterday's call I assumed that we always remove the previous version before uploading the next one (i.e. only ever have at most 1 uploaded); now looking at the code I see this is not the case. In this case, a name clash (different file contents under the same file name) is indeed not good, i.e. GC looks unsafe with respect to it. While it is fixable, it is safer/simpler to just always have unique names by including term + flush_lsn there.

Having multiple segments offloaded is a bit more complicated, but it gives a kind of atomicity, i.e. we are never in a state where we don't have a partial segment offloaded at all (if one has ever been offloaded), which seems to be good. Though note that upload up to flush_lsn is anyway something not reliable: 1) it might be from the wrong term history 2) in theory there might be multiple uncommitted segments.

@petuhovskiy (Member, Author)

> In this case, a name clash (different file contents under the same file name) is indeed not good, i.e. GC looks unsafe with respect to it. While it is fixable, it is safer/simpler to just always have unique names by including term + flush_lsn there.

It is simple, but the filename becomes quite long; I'm afraid we'll eventually hit the S3 filename limit. WDYT?

> Though note that upload up to flush_lsn is anyway something not reliable: 1) it might be from the wrong term history

Is this any bad? For inactive timelines, the partial segment in S3 will eventually match the partial segment on disk, even for uncommitted WAL.

> 2) in theory there might be multiple uncommitted segments.

True, but the number of such timelines should be very close to zero. I'd rather not fix this in this PR; if it becomes an issue, it can be fixed later.

Other than that, any more ideas for the first version of "partial segments upload"?

@arssher (Contributor) commented Feb 19, 2024

> It is simple, but the filename becomes quite long; I'm afraid we'll eventually hit the S3 filename limit. WDYT?

Should be ok, the limit is 1024 in both S3 and Azure Blob Storage. 2^64 in decimal is 10^19, so there is enough space. So I'd at least have term + flush_lsn there, and if we reupload on commit_lsn change, include commit_lsn as well.
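A quick sanity check of that length argument (assuming a worst case of 20 decimal digits for both the term and the safekeeper id in the format described above):

```rust
// Rough worst-case length estimate for the partial segment object name.
fn main() {
    let worst_case = format!(
        "{}_{}_{:016X}_{:016X}_sk{}.partial",
        "000000010000000000000002", // 24-char segment name
        u64::MAX,                   // 20 decimal digits for the term
        u64::MAX,
        u64::MAX,
        u64::MAX,                   // 20 decimal digits for the safekeeper id
    );
    println!("{} bytes", worst_case.len()); // ~110 bytes, far below the 1024-byte key limit
    assert!(worst_case.len() < 1024);
}
```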

> Is this any bad? For inactive timelines, the partial segment in S3 will eventually match the partial segment on disk, even for uncommitted WAL.

No, it's expected indeed.

> True, but the number of such timelines should be very close to zero. I'd rather not fix this in this PR; if it becomes an issue, it can be fixed later.

Yeah, definitely not very important. We could also simply hold off the partial upload if the full one is not done yet.

> Other than that, any more ideas for the first version of "partial segments upload"?

Approach LGTM

@petuhovskiy (Member, Author)

After looking at S3 lifecycle rules (https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html), I realized that it might be useful to distinguish full WAL segments from partial segments.

I think I'm going to add a special tag to partial segments, so we could select them in a lifecycle rule.

@petuhovskiy petuhovskiy force-pushed the sk-upload-partial-segments branch 3 times, most recently from b38a2b7 to d0e928e on March 18, 2024 21:43
@petuhovskiy petuhovskiy added the "forward compatibility breakage" label Mar 19, 2024
@petuhovskiy (Member, Author)

Ok, ready for review now.

Note that the control file upgrade is not forward compatible, i.e. we cannot roll back safekeepers after deploy. Also, I tried to write a test for this PR, but ran into problems with S3 inspection from python tests, so I decided to test this on staging instead.

@petuhovskiy petuhovskiy marked this pull request as ready for review March 19, 2024 13:17
@petuhovskiy petuhovskiy requested review from a team as code owners March 19, 2024 13:17
@petuhovskiy petuhovskiy changed the title from "WIP: Upload partial segments" to "Upload partial segments" Mar 19, 2024
Resolved review threads on safekeeper/src/wal_backup_partial.rs
@arssher (Contributor) commented Mar 25, 2024

LGTM. Let's put the PR description into the wal_backup_partial.rs head comment.

@jcsp (Contributor) left a comment

Approved for the remote_storage/ part

@petuhovskiy (Member, Author)

@bayandin do you know how to make test_forward_compatibility pass? The breakage in forward compatibility is expected

@bayandin (Member)

> @bayandin do you know how to make test_forward_compatibility pass? The breakage in forward compatibility is expected

There's no good way to do so for now. Is it possible not to break forward compatibility, maybe? 😄
Handling it via a label means the label has to be added to all other PRs until we release these changes, so we could miss some other unexpected breaking changes.

I'll think about what we can do here.

@jcsp (Contributor) commented Mar 26, 2024

> @bayandin do you know how to make test_forward_compatibility pass? The breakage in forward compatibility is expected

The general answer for forward compatibility is to merge+release a change that knows how to read the new format, and then later merge+release a change that enables writing it -- this is what we do for pageserver stuff.

@petuhovskiy (Member, Author)

A bit of context: the control file is the file where safekeepers store all the data about timelines (everything but the WAL itself). For historical reasons, this is a binary file which is serialized directly from struct TimelinePersistentState. This means that any change to this structure (and its dependants) requires an upgrade.

The issue here is that control files in safekeepers have always worked this way, and an upgrade has always broken forward compatibility. It seems that the last control file upgrade was 18 months ago, so we never hit this issue before because test_forward_compatibility was probably created after that...

I dislike that a control file upgrade currently requires a breaking change, but I see only two fixes for this:

  1. merge+release twice for a single change, and write 2x more code to handle control file rollback
  2. completely overhaul control files and use a key-value serialization format (e.g. protobuf, json...)

Both options are not simple and I don't think I want to work on them right now. Also, it seems fine to allow forward incompatibility for safekeepers – in case something goes wrong, issues can always be fixed without a rollback. Unfortunately, I see no good way to omit test_forward_compatibility in our current CI.

I'm not really sure what should be done next here...

@arssher (Contributor) commented Mar 26, 2024

The last time, in a similar situation, I just manually added an exclusion to the test_compatibility code.

@petuhovskiy (Member, Author)

@bayandin is it ok if I add @pytest.mark.skip to test_forward_compatibility for a week?

@bayandin (Member)

> @bayandin is it ok if I add @pytest.mark.skip to test_forward_compatibility for a week?

If we do this, we won't catch any forward compatibility failures from other PRs.
Let me come up with a better solution for this.

@petuhovskiy petuhovskiy enabled auto-merge (squash) April 3, 2024 13:45
@petuhovskiy (Member, Author)

Added @pytest.mark.xfail to test_forward_compatibility to unblock merge. Will remove it after the next release.

@petuhovskiy petuhovskiy merged commit 3f77f26 into main Apr 3, 2024
53 checks passed
@petuhovskiy petuhovskiy deleted the sk-upload-partial-segments branch April 3, 2024 15:20
petuhovskiy added a commit that referenced this pull request Apr 5, 2024
Found these logs on staging safekeepers:
```
INFO Partial backup{ttid=X/Y}: failed to upload 000000010000000000000000_173_0000000000000000_0000000000000000_sk56.partial: Failed to open file "/storage/safekeeper/data/X/Y/000000010000000000000000.partial" for wal backup: No such file or directory (os error 2)
INFO Partial backup{ttid=X/Y}:upload{name=000000010000000000000000_173_0000000000000000_0000000000000000_sk56.partial}: starting upload PartialRemoteSegment { status: InProgress, name: "000000010000000000000000_173_0000000000000000_0000000000000000_sk56.partial", commit_lsn: 0/0, flush_lsn: 0/0, term: 173 }
```

This is because partial backup tries to upload a zero segment when there is no data in the timeline. This PR fixes this bug introduced in #6530.
petuhovskiy added a commit that referenced this pull request Apr 11, 2024
It was disabled due to #6530 breaking forward compatibility.
Now that we have deployed it to production, we can re-enable the test.
Labels
forward compatibility breakage: Broken forward compatibility. The previous version can't handle new data. It'll be hard to roll back.

Successfully merging this pull request may close these issues:

Upload partial WAL segments to S3