
Report update error, if update was not applied to at least one active replica (#2976) #3013

Merged: 1 commit merged into dev from report-update-error-if-no-active-replica-updated on Nov 21, 2023


@ffuugoo (Contributor) commented Nov 14, 2023

There's currently a slight consistency "gap" in how we handle updates:

  • Partial nodes have to be able to accept and apply update requests (from the user and other nodes)
    • this is required for the correctness/consistency of shard transfers
  • there's a valid scenario where all available replicas of a shard are in Partial state and the last Active replica is effectively dead
    • it can be trivially reproduced:
      • cluster of 3 nodes
      • collection with replication factor 2
      • initiate shard transfer from one replica to another
      • kill the transfer-sender replica in the middle of the shard transfer
      • you now have a single available Partial node, and the only Active node is dead
    • a similar condition can happen in production
  • in this case, the user can send an update to the Partial node, which will accept and apply it
    • but because there are no available Active replicas, the Partial node can't propagate this update to the Active replica
  • then, once the Active replica is back online, it will initiate shard transfers to the Partial nodes
    • and overwrite all user updates that were accepted by the Partial nodes while the Active node was offline
    • the user data is lost and user is unhappy :(
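The reproduction steps above can be modelled with a toy Rust sketch. Everything in it is hypothetical (`Replica`, `apply_update`, `transfer`, and `user_write_survives` are illustrative names, not the project's code); it assumes a shard with two replicas (replication factor 2) and treats a shard transfer as the source's data wholesale replacing the target's, which is the overwrite that loses the user's write.

```rust
use std::collections::BTreeMap;

/// Simplified replica states (hypothetical; only the two relevant here).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ReplicaState {
    Active,
    Partial,
}

/// A toy replica: its state, whether its node is reachable, and its points.
struct Replica {
    state: ReplicaState,
    online: bool,
    points: BTreeMap<u64, String>,
}

/// Apply an update to every online replica; Partial replicas accept writes too.
/// Returns how many replicas applied it.
fn apply_update(replicas: &mut [Replica], id: u64, payload: &str) -> usize {
    let mut applied = 0;
    for r in replicas.iter_mut().filter(|r| r.online) {
        r.points.insert(id, payload.to_string());
        applied += 1;
    }
    applied
}

/// Toy shard transfer: the source's data replaces the target's data entirely.
fn transfer(replicas: &mut [Replica], from: usize, to: usize) {
    let snapshot = replicas[from].points.clone();
    replicas[to].points = snapshot;
    replicas[to].state = ReplicaState::Active;
}

/// Runs the reproduction steps and reports whether the user's write survived.
fn user_write_survives() -> bool {
    let mut replicas = vec![
        Replica { state: ReplicaState::Active, online: true, points: BTreeMap::from([(1, "old".to_string())]) },
        Replica { state: ReplicaState::Partial, online: true, points: BTreeMap::new() },
    ];
    // The transfer sender (the only Active replica) dies mid-transfer.
    replicas[0].online = false;
    // The user's update is accepted only by the Partial replica.
    let applied = apply_update(&mut replicas, 2, "user write");
    assert_eq!(applied, 1);
    assert_eq!(replicas[1].state, ReplicaState::Partial);
    // The Active replica comes back and re-transfers its stale data.
    replicas[0].online = true;
    transfer(&mut replicas, 0, 1);
    replicas[1].points.contains_key(&2)
}

fn main() {
    // prints "user write survives: false"
    println!("user write survives: {}", user_write_survives());
}
```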

This PR checks that the update was successfully applied to at least one Active replica and reports an error to the user if not.
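A minimal sketch of that check, assuming the node coordinating an update collects one result per replica after forwarding it. `ReplicaResult`, `UpdateError`, and `check_update_results` are hypothetical names, not the PR's actual implementation; the sketch only shows that success on Partial replicas alone is reported as an error.

```rust
use std::fmt;

/// Replica states as seen by the coordinating node (simplified, hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ReplicaState {
    Active,
    Partial,
}

/// Hypothetical per-replica outcome of forwarding one update.
struct ReplicaResult {
    peer_id: u64,
    state: ReplicaState,
    applied: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum UpdateError {
    /// The update reached only non-Active replicas (listed by peer id).
    NoActiveReplicaUpdated { applied_to: Vec<u64> },
}

impl fmt::Display for UpdateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UpdateError::NoActiveReplicaUpdated { applied_to } => write!(
                f,
                "update was not applied to any Active replica (applied to: {:?})",
                applied_to
            ),
        }
    }
}

/// Succeeds only if at least one Active replica applied the update.
fn check_update_results(results: &[ReplicaResult]) -> Result<(), UpdateError> {
    if results.iter().any(|r| r.applied && r.state == ReplicaState::Active) {
        return Ok(());
    }
    let applied_to: Vec<u64> = results.iter().filter(|r| r.applied).map(|r| r.peer_id).collect();
    Err(UpdateError::NoActiveReplicaUpdated { applied_to })
}

fn main() {
    // The scenario from the description: the Active replica's node is down,
    // so only the Partial replica applied the write.
    let results = [
        ReplicaResult { peer_id: 1, state: ReplicaState::Active, applied: false },
        ReplicaResult { peer_id: 2, state: ReplicaState::Partial, applied: true },
    ];
    match check_update_results(&results) {
        Ok(()) => println!("update acknowledged"),
        // prints "error: update was not applied to any Active replica (applied to: [2])"
        Err(e) => println!("error: {}", e),
    }
}
```

In the failing case the write has still been applied on the Partial replica; the check only changes what the user is told about it.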

Resolves #2976. Currently based on #3012; will be rebased on dev once #3012 is merged.

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

@ffuugoo ffuugoo force-pushed the cancel-shard-transfer-on-sync-local-state branch from aabbf29 to 5a09931 November 21, 2023 10:27
@ffuugoo ffuugoo force-pushed the report-update-error-if-no-active-replica-updated branch from 7b68576 to 6353d05 November 21, 2023 10:41
Base automatically changed from cancel-shard-transfer-on-sync-local-state to dev November 21, 2023 15:23
@ffuugoo ffuugoo force-pushed the report-update-error-if-no-active-replica-updated branch from 6353d05 to e6fcdb9 November 21, 2023 15:25
@ffuugoo ffuugoo merged commit f2e5631 into dev Nov 21, 2023
17 checks passed
@ffuugoo ffuugoo deleted the report-update-error-if-no-active-replica-updated branch November 21, 2023 15:58
3 participants