Replicate new shard to match replication factor in resharding driver #4381
Merged
timvisee commented Jun 3, 2024
Comment on lines 491 to 509
/// Await for a resharding shard transfer to succeed.
///
/// Yields on a successful transfer.
///
/// Returns an error if:
/// - the transfer failed or got aborted
/// - the transfer timed out
/// - no matching transfer is ongoing; it never started or went missing without a notification
///
/// Yields on a successful transfer. Returns an error if an error occurred or if the global timeout
/// is reached.
async fn await_transfer_success(
    reshard_key: &ReshardKey,
    transfer: &ShardTransfer,
    shard_holder: &Arc<LockedShardHolder>,
    collection_id: &CollectionId,
    consensus: &dyn ShardTransferConsensus,
    await_transfer_end: impl Future<Output = CollectionResult<Result<(), ()>>>,
) -> CollectionResult<()> {
Though this and its usage may seem a bit complex, there are three important things it does:
- we subscribe to shard transfer changes before we actually start transfers, to ensure we don't miss notifications
- it enforces a timeout: we don't wait longer than 24 hours for a transfer to complete
- it does a periodic sanity check to ensure the transfer is still active in consensus; if it isn't, we assume it has failed (and panic in debug mode, because this cannot happen)
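The three points above can be sketched with the standard library alone. This is a simplified, synchronous analogy and not the code from this PR: `TransferEvent`, `AwaitError`, `await_transfer`, and the `is_active` callback are hypothetical names, and an `mpsc` channel stands in for the shard transfer change subscription. The caller creates the channel before starting the transfer, so no notification can be missed.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical notification sent when a transfer ends.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransferEvent {
    Finished,
    Aborted,
}

/// Hypothetical reasons for giving up on a transfer.
#[derive(Debug, PartialEq, Eq)]
pub enum AwaitError {
    Failed,
    TimedOut,
    Missing,
}

/// Wait for a transfer to finish.
///
/// `events` must have been subscribed BEFORE the transfer was started.
/// `is_active` is an assumed callback that reports whether the transfer
/// is still registered (in the PR: still known to consensus).
pub fn await_transfer(
    events: &Receiver<TransferEvent>,
    is_active: impl Fn() -> bool,
    timeout: Duration,
    check_interval: Duration,
) -> Result<(), AwaitError> {
    let deadline = Instant::now() + timeout;
    loop {
        let now = Instant::now();
        if now >= deadline {
            // Global timeout reached.
            return Err(AwaitError::TimedOut);
        }
        // Never wait past the deadline, but wake up regularly for the sanity check.
        let wait = check_interval.min(deadline - now);
        match events.recv_timeout(wait) {
            Ok(TransferEvent::Finished) => return Ok(()),
            Ok(TransferEvent::Aborted) => return Err(AwaitError::Failed),
            // The notifier is gone, so no notification can ever arrive.
            Err(RecvTimeoutError::Disconnected) => return Err(AwaitError::Missing),
            Err(RecvTimeoutError::Timeout) => {
                // Periodic sanity check: the transfer must still be registered.
                if !is_active() {
                    return Err(AwaitError::Missing);
                }
            }
        }
    }
}
```

In this sketch a caller would create the channel, hand the sender to whatever reports transfer changes, start the transfer, and only then call `await_transfer` with a timeout of 24 hours and a much shorter check interval.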
This was referenced Jun 3, 2024
timvisee force-pushed the reshard-driver-persist-state branch from f8d80d3 to 06fd713 (June 4, 2024 09:13)
timvisee force-pushed the reshard-driver-replicate branch from e9a74b7 to e8f0585 (June 4, 2024 09:13)
timvisee force-pushed the reshard-driver-replicate branch from e8f0585 to fe4a054 (June 4, 2024 09:36)
ffuugoo approved these changes Jun 7, 2024
LGTM. Sorry for taking so long to review this. 🙈
timvisee added a commit that referenced this pull request Jun 11, 2024
…4381)
* Add replication stage for resharding
* Use consistent naming
* Simplify awaiting shard transfer success, add periodic sanity check
* Fix typos
Tracked in: #4213
Depends on: #4379
Implements the third stage in the resharding driver. It keeps replicating our new shard until we match the desired replication factor. This is dynamic: it follows the latest configured replication factor, even if that changes during the process. If there aren't enough peers to host that number of replicas, it continues early.
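The behaviour described above can be illustrated with a small loop. This is a minimal sketch under assumptions, not the driver code from this PR: `target_replicas`, `replicate_until_matched`, and both callbacks are hypothetical. The replication factor is re-read on every iteration so that configuration changes are picked up, and the target is capped by the number of available peers so the loop ends early when there are too few.

```rust
/// Number of replicas to aim for: the configured factor, capped by available peers.
pub fn target_replicas(replication_factor: usize, available_peers: usize) -> usize {
    replication_factor.min(available_peers)
}

/// Keep adding replicas until the target is reached.
///
/// `read_replication_factor` is called on every iteration, so a factor that
/// changes while replicating is followed. `replicate_once` stands in for
/// starting one shard transfer and waiting for it to succeed.
/// Returns the number of replicas once the loop is done.
pub fn replicate_until_matched(
    mut current_replicas: usize,
    read_replication_factor: impl Fn() -> usize,
    available_peers: usize,
    mut replicate_once: impl FnMut(),
) -> usize {
    loop {
        let target = target_replicas(read_replication_factor(), available_peers);
        if current_replicas >= target {
            // Either the factor is matched or there are no more peers to use.
            return current_replicas;
        }
        replicate_once();
        current_replicas += 1;
    }
}
```

For example, with one existing replica, a replication factor of 4, and only 2 peers, this sketch performs a single transfer and then stops at 2 replicas.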
This does not implement the following yet, marked as TODOs:
All Submissions:
dev branch. Did you create your branch from dev?
New Feature Submissions:
cargo +nightly fmt --all command prior to submission?
cargo clippy --all --all-features command?