Shard diff transfer integration #3509

Merged
merged 29 commits into from
Feb 23, 2024

timvisee
Member

@timvisee timvisee commented Feb 1, 2024

Tracked in: #3477
Depends on: #3551, #3571

This adds a new WalDelta shard transfer method to allow shard diff transfers. This variant remains unused for now and is hidden by default.

It adds a transfer_wal_delta function to drive such a transfer from a source node.

The method currently uses a queue proxy shard because it already has all the necessary bits and bobs implemented for transferring a WAL, including all new incoming updates.
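To make the new variant concrete, here is a minimal sketch of how a transfer method enum with a WalDelta variant could look. It is not the code from this PR: the type name `ShardTransferMethod`, the `"stream_records"` string, and the `FromStr` parsing are assumptions for illustration. Only the `wal_delta` name and stream records being the default come from this conversation.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the shard transfer methods discussed in this PR.
/// The real type may carry more variants and use serde for (de)serialization.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ShardTransferMethod {
    /// Stream all records to the target shard (the default in this PR).
    #[default]
    StreamRecords,
    /// Transfer only the WAL operations the target shard is missing.
    WalDelta,
}

impl FromStr for ShardTransferMethod {
    type Err = String;

    /// Parse the snake_case name a user would pass as `"method"`.
    /// `"stream_records"` is an assumed spelling; `"wal_delta"` is from the PR.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "stream_records" => Ok(Self::StreamRecords),
            "wal_delta" => Ok(Self::WalDelta),
            other => Err(format!("unknown shard transfer method: {other}")),
        }
    }
}
```

With this sketch, `"wal_delta".parse::<ShardTransferMethod>()` yields the WalDelta variant, and an unset method falls back to `ShardTransferMethod::default()`.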

Tasks

  • Resolve all TODOs
  • Prevent the physical WAL from being truncated while we diff, even if the entries have already been truncated from the logical WAL
    We can set a max_ack version now, but this only limits the logical WAL. The physical WAL probably contains important data too, which we should not truncate while we still need it for transferring the diff.
  • Merge "Allow using queue proxy on WAL in the past" (#3551)
  • Merge "Resolve WAL delta for shard diff transfer" (#3571)
  • Rebase this on dev
  • Target this PR to merge into dev
  • Undraft this PR
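The max_ack task above boils down to clamping acknowledgements so that entries still needed for a diff are not dropped. A rough sketch of that idea follows. The `WalRetention` type and its methods are hypothetical and do not model the logical versus physical WAL distinction; they only show the clamping.

```rust
/// Hypothetical WAL bookkeeping that tracks the oldest retained version.
#[derive(Debug)]
pub struct WalRetention {
    /// Oldest operation version still kept in the WAL.
    first_version: u64,
    /// Upper bound for acknowledgements while a diff transfer needs older entries.
    max_ack: Option<u64>,
}

impl WalRetention {
    pub fn new(first_version: u64) -> Self {
        Self { first_version, max_ack: None }
    }

    /// Pin (or unpin, with `None`) the highest version that may be acknowledged.
    pub fn set_max_ack(&mut self, max_ack: Option<u64>) {
        self.max_ack = max_ack;
    }

    /// Acknowledge everything below `up_to`, clamped to `max_ack` if set.
    /// Returns the oldest version that is still retained afterwards.
    pub fn ack(&mut self, up_to: u64) -> u64 {
        let limit = self.max_ack.map_or(up_to, |max| up_to.min(max));
        // Never move backwards: already truncated entries stay truncated.
        self.first_version = self.first_version.max(limit);
        self.first_version
    }
}
```

While a transfer is running, the source would call `set_max_ack(Some(version))` with the oldest version the diff still needs, and reset it to `None` once the transfer is done.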

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

@timvisee timvisee marked this pull request as draft February 1, 2024 14:41
@timvisee timvisee changed the base branch from dev to queue-proxy-history February 6, 2024 16:15
Base automatically changed from queue-proxy-history to dev February 12, 2024 11:33
@timvisee timvisee changed the base branch from dev to resolve-wal-delta February 12, 2024 11:43
@timvisee timvisee force-pushed the shard-diff-transfer-stubs branch 2 times, most recently from 24223f5 to 001a456 Compare February 12, 2024 12:25
Base automatically changed from resolve-wal-delta to dev February 12, 2024 14:29
@timvisee timvisee force-pushed the shard-diff-transfer-stubs branch 2 times, most recently from 2c315a7 to ea3caf3 Compare February 13, 2024 10:16
@timvisee timvisee changed the title from "Stubs for driving shard diff transfer" to "Shard diff transfer integration" Feb 16, 2024
@timvisee
Member Author

I'll merge this with WAL delta transfers disabled by default. I've confirmed this to work with 1.7 and 1.8 nodes in a single cluster.

For now, a user could explicitly invoke a diff transfer by setting "method": "wal_delta".
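As an illustration of where `"method": "wal_delta"` would go, the helper below builds a request body for a shard replication operation. Only the `method` field and its value are taken from this conversation. The `replicate_shard` wrapper and the `shard_id`, `from_peer_id`, and `to_peer_id` fields are assumptions about the cluster API shape and should be checked against the API reference.

```rust
/// Build a JSON body that asks for a shard replication with an explicit method.
/// All field names except `method` are assumptions for illustration.
/// `method` is inserted as-is, so it must not contain characters that need
/// JSON escaping.
pub fn replicate_shard_body(
    shard_id: u32,
    from_peer_id: u64,
    to_peer_id: u64,
    method: &str,
) -> String {
    format!(
        r#"{{"replicate_shard":{{"shard_id":{shard_id},"from_peer_id":{from_peer_id},"to_peer_id":{to_peer_id},"method":"{method}"}}}}"#
    )
}
```

Calling `replicate_shard_body(0, 1, 2, "wal_delta")` gives a body that explicitly requests the WAL delta method instead of relying on the default.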

The plan is to make automatic WAL delta transfers happen in the 1.8 release, but there are some state problems to resolve. I'd prefer to merge this first so we have something to work with. Automatic WAL delta transfers will be worked on separately.

@timvisee timvisee merged commit 44fa95f into dev Feb 23, 2024
17 checks passed
@timvisee timvisee deleted the shard-diff-transfer-stubs branch February 23, 2024 16:32
timvisee added a commit that referenced this pull request Mar 5, 2024
* Add first stubs for WAL delta shard transfer method

* Repurpose queue proxy, use it for transferring WAL diff as well

* Integrate WAL delta transfer in transfer selection logic

* Add WalDelta shard transfer type which is not exposed in public API

* Basic implementation of falling back to stream records transfer

* Share await_consensus_sync function

* Ask remote shard for recovery point

* During WAL delta transfer, resolve shard diff locally for recovery point

* Rebase on latest dev, support empty WAL diff

* Rebase on latest dev, support empty WAL diff

* Use partial snapshot state for WAL delta transfer

* Set cutoff point on remote shard after shard WAL delta transfer

* Set cutoff point on remote shard after stream records transfer

* During WAL delta transfer, set state from partial snapshot to partial

* Describe WAL delta transfer in a comment

* Allow updating cutoff point in stream records transfer to fail

* Do not set cutoff point on remote shard on WAL delta transfer

* Make await consensus sync logic easier to read and reason about

* Fix fallback to other shard transfer method on WAL delta transfer fail

* Various minor improvements

* Add TODO for just ignoring API unimplemented errors

* Only allow stream records cutoff point error if remote is older version

* Allow switching to partial to fail when falling back

* Only change shard state to partial if not in partial state already

* Change default shard transfer method back to stream records

* Add important TODO back

* Add WAL delta shard transfer method in gRPC

* Prefer configured shard transfer method as default

* Extract shard transfer fallback logic into separate function
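Several of the commits above concern falling back to a stream records transfer when a WAL delta transfer fails. The sketch below shows that control flow in isolation. The names `TransferMethod` and `transfer_with_fallback` and the closure-based design are hypothetical and not taken from the merged code.

```rust
/// Hypothetical subset of transfer methods relevant to the fallback.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransferMethod {
    WalDelta,
    StreamRecords,
}

/// Run a transfer with the preferred method. If a WAL delta transfer fails,
/// retry once with stream records. Returns the method that succeeded.
pub fn transfer_with_fallback<F>(
    preferred: TransferMethod,
    mut run: F,
) -> Result<TransferMethod, String>
where
    F: FnMut(TransferMethod) -> Result<(), String>,
{
    match run(preferred) {
        Ok(()) => Ok(preferred),
        // Only WAL delta has a fallback in this sketch.
        Err(err) if preferred == TransferMethod::WalDelta => {
            run(TransferMethod::StreamRecords)
                .map(|()| TransferMethod::StreamRecords)
                .map_err(|fallback_err| {
                    format!("WAL delta failed ({err}); fallback failed ({fallback_err})")
                })
        }
        Err(err) => Err(err),
    }
}
```

Passing the transfer itself as a closure keeps the fallback decision separate from the transfer mechanics, which is roughly the motivation for extracting the logic into its own function.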

3 participants