Fix rare leadership transfer failures when writes happen during transfer #581

Merged: 7 commits, Dec 4, 2023

Commits on Nov 23, 2023

  1. Reproduce leadership transfer failures that can occur when writes are happening during the transfer. Fix the problem partially: it can still occur if we dispatchLogs after the transfer message has been sent to the target, before it wins the election.
    ncabatoff committed Nov 23, 2023
    e338309
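
A minimal sketch of why late writes can plausibly sink a transfer, assuming the failure comes from Raft's standard vote rule: a voter only grants its vote to a candidate whose log is at least as up-to-date as its own. If the leader dispatches another entry after the transfer message has gone out, a follower that already holds that entry would reject the target. The `logPosition` type and `upToDate` function are hypothetical names for illustration, not part of the library.

```go
package main

import "fmt"

// logPosition is a hypothetical summary of the tail of a node's log.
type logPosition struct {
	LastTerm  uint64
	LastIndex uint64
}

// upToDate reports whether the candidate's log is at least as
// up-to-date as the voter's, per the usual Raft comparison:
// a higher last term wins; on equal terms, the longer log wins.
func upToDate(candidate, voter logPosition) bool {
	if candidate.LastTerm != voter.LastTerm {
		return candidate.LastTerm > voter.LastTerm
	}
	return candidate.LastIndex >= voter.LastIndex
}

func main() {
	// The target was caught up to index 10 when the transfer message was sent.
	target := logPosition{LastTerm: 3, LastIndex: 10}

	// Assumed scenario: the old leader dispatched one more entry afterwards,
	// and this follower has already stored it.
	follower := logPosition{LastTerm: 3, LastIndex: 11}

	fmt.Println("vote granted:", upToDate(target, follower))
}
```

Under this assumption, the target needs a majority of voters for which `upToDate` holds, so every entry dispatched during the window shrinks its chances.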

Commits on Nov 24, 2023

  1. Fix some collateral damage I introduced. Also fix a bug in the leaderTransfer changes: we weren't properly respecting stopCh.
    ncabatoff committed Nov 24, 2023
    a76a6f8
  2. Attempt at a "fix": after we send the TimeoutNow, wait for up to ElectionTimeout before resuming applies. Fix is in scare quotes because I'm not at all sure this is acceptable.
    ncabatoff committed Nov 24, 2023
    8cecc28
  3. Slow down writes a bit to make this more amenable to testing in parallel, and improve test log output.
    ncabatoff committed Nov 24, 2023
    477cf7e
  4. 2b715ac
  5. A bit more cleanup, plus now we wait for the transfer to complete or fail before responding to the transfer request.
    ncabatoff committed Nov 24, 2023
    04fdca6
  6. Make the test more robust. Part of this required modifying GetInState, which is racy: between calling pollState and setting up event monitoring, the cluster could have elected a leader. When that happens, Leader() fails, reporting 0 leaders.
    ncabatoff committed Nov 24, 2023
    cb62297