Upload partial segments #6530
Conversation
2772 tests run: 2628 passed, 0 failed, 144 skipped (full report)
Flaky tests (2): Postgres 16, Postgres 14
Code coverage* (full report)
* collected from Rust tests only. The comment gets automatically updated with the latest test results.
803ca68 at 2024-04-03T15:35:11.778Z
Force-pushed from 572274b to 7421064
This PR is not yet polished, but I want a pre-review of the main upload algorithm (the important part) before finishing the PR.
So, waiting for review @arssher :)
Hmm. During yesterday's call I assumed that we always remove the previous version before uploading the next one (i.e. have at most 1 uploaded); now, looking at the code, I see this is not the case. In that case a name clash (different file contents under the same file name) is indeed not good, i.e. gc looks unsafe with respect to it. While it is fixable, it is safer/simpler to just always have unique names by including term + flush_lsn there. Having multiple segments offloaded is a bit more complicated, but it gives a kind of atomicity: we are never in a state where we don't have a partial segment offloaded at all (if one has ever been offloaded), which seems good. Though note that upload up to flush_lsn is not fully reliable anyway: 1) it might be from the wrong term history; 2) in theory there might be multiple uncommitted segments.
It is simple, but the filename becomes quite long; I'm afraid we'll hit the S3 filename length limit eventually. WDYT?
Is this actually bad? For inactive timelines, the partial segment in S3 will eventually match the partial segment on disk, even for uncommitted WAL.
True, but the number of such timelines should be very close to zero. I'd not fix this in this PR; if it becomes an issue, it can be fixed later. Other than that, any more ideas for the first version of "partial segments upload"?
Should be ok, the key length limit is 1024 in both S3 and Azure Blob Storage. 2^64 in decimal is about 10^19, enough space. So I'd at least have term + flush_lsn there, and if we reupload on commit_lsn change, include commit_lsn as well.
No, it's expected indeed.
Yeah, definitely not very important. We could also simply hold off the partial upload if the full upload is not done yet.
Approach LGTM
After looking at S3 lifecycle rules (https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html), I realized that it might be useful to distinguish full WAL segments from partial segments. I think I'm going to add a special tag to partial segments, so we can select them in a lifecycle rule.
Force-pushed from b38a2b7 to d0e928e
Force-pushed from 1e89776 to c40b754
Ok, ready for review now. Note that the control file upgrade is not forward compatible, i.e. we cannot roll back safekeepers after deploy. Also, I tried to write a test for this PR, but ran into problems with S3 inspection from python tests. Decided to test this on staging instead.
LGTM. Let's put the PR description into the wal_backup_partial.rs head comment.
c40b754
to
37145cc
Compare
Approved for the remote_storage/ part
@bayandin do you know how to make
There's no good way to do so for now. Is it possible not to break forward compatibility, maybe? 😄 I'll think about what we can do here.
The general answer for forward compatibility is to merge+release a change that knows how to read the new format, and then later merge+release a change that enables writing it -- this is what we do for pageserver stuff.
A bit of context: the control file is the file where safekeepers store all the data about timelines (everything but the WAL itself). Due to historic reasons, this is a binary file which is serialized directly from the in-memory state. The issue here is that control files in safekeepers have always worked that way, and an upgrade was always breaking forward compatibility. It seems that the last control file upgrade was 18 months ago, so we never had this issue before. I dislike that currently a control file upgrade requires a breaking change, but I see only two fixes for this:
Both options are not simple, and I don't think I want to work on them right now. Also, it seems fine to allow forward incompatibility for safekeepers – if something goes wrong, issues can always be fixed without a rollback. Unfortunately, I see that there's no good way to omit it. I'm not really sure what should be done next here.
Last time, in a similar situation, I just manually added an exclusion to the test_compatibility code.
@bayandin is it ok if I add
If we do this, we won't catch any forward compatibility failures from other PRs. |
Force-pushed from 37145cc to 803ca68
Added
Found these logs on staging safekeepers:

```
INFO Partial backup{ttid=X/Y}: failed to upload 000000010000000000000000_173_0000000000000000_0000000000000000_sk56.partial: Failed to open file "/storage/safekeeper/data/X/Y/000000010000000000000000.partial" for wal backup: No such file or directory (os error 2)
INFO Partial backup{ttid=X/Y}:upload{name=000000010000000000000000_173_0000000000000000_0000000000000000_sk56.partial}: starting upload PartialRemoteSegment { status: InProgress, name: "000000010000000000000000_173_0000000000000000_0000000000000000_sk56.partial", commit_lsn: 0/0, flush_lsn: 0/0, term: 173 }
```

This is because partial backup tries to upload the zero segment when there is no data in the timeline. This PR fixes this bug, introduced in #6530.
It was disabled due to #6530 breaking forward compatibility. Now that we have deployed it to production, we can re-enable the test.
Add support for backing up partial segments to remote storage. Disabled by default, can be enabled with `--partial-backup-enabled`.

A safekeeper timeline has a background task which is subscribed to `commit_lsn` and `flush_lsn` updates. After the partial segment is updated (`flush_lsn` has changed), the segment will be uploaded to S3 in about 15 minutes.

The filename format for partial segments is `Segment_Term_Flush_Commit_skNN.partial`, where:
- `Segment` – the segment name, like `000000010000000000000001`
- `Term` – current term
- `Flush` – `flush_lsn` in hex format `{:016X}`, e.g. `00000000346BC568`
- `Commit` – `commit_lsn` in the same hex format
- `NN` – safekeeper_id, like `1`

Example of a full object name: `000000010000000000000002_2_0000000002534868_0000000002534410_sk1.partial`
Each safekeeper keeps info about its remote partial segments in the control file. The code updates the state in the control file before doing any S3 operations. This way the control file stores information about all potentially existing remote partial segments and can clean them up after uploading a newer version.
Closes #6336