retry evict leader when upgrading TiKV if needed #5613

Merged: csuzhangxc merged 2 commits into pingcap:master from the re-evict branch on Apr 15, 2024

csuzhangxc
Member

@csuzhangxc csuzhangxc commented Apr 15, 2024

What problem does this PR solve?

close #5614

In some cases when upgrading TiKV, the evict-leader-scheduler is never added to PD (or later goes missing from it), even though the evictLeaderBeginTime annotation has already been added to the TiKV pod. Because TiDB Operator does not call the PD API to add the evict-leader-scheduler again, the upgrade is blocked.

In this PR, once a timeout (10m) is reached, we check whether the evict-leader-scheduler exists and, if it is missing, try to add it again.

What is changed and how does it work?

Code changes

  • Has Go code change
  • Has CI related scripts change

Tests

  • Unit test
  • E2E test
  • Manual test
    1. hack the TiDB Operator code to remove the controller.GetPDClient(u.deps.PDControl, tc).BeginEvictLeader(storeID) call in beginEvictLeader
    2. upgrade TiKV
    3. check that the upgrade is blocked (waiting for eviction to complete) and that the evictLeaderBeginTime annotation exists
    4. revert the hack from step 1 and redeploy TiDB Operator
    5. wait about 10m
    6. check that evictLeaderBeginTime is updated to the new time
    7. check that the upgrade continues and completes later
  • No code

Side effects

  • Breaking backward compatibility
  • Other side effects:

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Release Notes

Please refer to Release Notes Language Style Guide before writing the release note.


@csuzhangxc
Member Author

/run-all-tests

@ti-chi-bot ti-chi-bot bot added the size/S label Apr 15, 2024
@ti-chi-bot ti-chi-bot bot requested a review from howardlau1999 April 15, 2024 04:06
@codecov-commenter

codecov-commenter commented Apr 15, 2024

Codecov Report

Attention: Patch coverage is 0%, with 12 lines in your changes missing coverage. Please review.

Project coverage is 24.37%. Comparing base (72bccb5) to head (432a9a2).
Report is 2 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #5613       +/-   ##
===========================================
- Coverage   61.49%   24.37%   -37.12%     
===========================================
  Files         235      219       -16     
  Lines       30337    30208      -129     
===========================================
- Hits        18655     7364    -11291     
- Misses       9813    21818    +12005     
+ Partials     1869     1026      -843     
Flag        Coverage Δ
e2e         24.37% <0.00%> (?)
unittest    ?

@csuzhangxc
Member Author

/run-pull-e2e-kind-across-kubernetes

Contributor

@ideascf ideascf left a comment

LGTM

Contributor

ti-chi-bot bot commented Apr 15, 2024

@ideascf: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

LGTM

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

ti-chi-bot bot commented Apr 15, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ideascf
Once this PR has been reviewed and has the lgtm label, please ask for approval from csuzhangxc, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@csuzhangxc
Member Author

/run-pull-e2e-kind-across-kubernetes

@csuzhangxc csuzhangxc merged commit 5631a83 into pingcap:master Apr 15, 2024
6 of 7 checks passed
@csuzhangxc csuzhangxc deleted the re-evict branch April 15, 2024 08:28
@csuzhangxc
Member Author

/cherry-pick release-1.5

@ti-chi-bot
Member

@csuzhangxc: new pull request created to branch release-1.5: #5616.

In response to this:

/cherry-pick release-1.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

csuzhangxc added a commit that referenced this pull request Apr 16, 2024
Co-authored-by: csuzhangxc <csuzhangxc@gmail.com>
Successfully merging this pull request may close these issues.

tikv upgrade blocked
4 participants