retry evict leader when upgrading TiKV if needed (#5613) #5616

Merged


ti-chi-bot

This is an automated cherry-pick of #5613

What problem does this PR solve?

close #5614

In some cases when upgrading TiKV, the evict-leader-scheduler is never added to PD (or goes missing afterwards), even though the evictLeaderBeginTime annotation has already been set on the TiKV pod. Because TiDB Operator sees the annotation and does not call the PD API to add the evict-leader-scheduler again, the upgrade operation is blocked.

In this PR, once a timeout (10m) is reached, we check whether the evict-leader-scheduler exists and try to add it again if it's missing.

What is changed and how does it work?

Code changes

  • Has Go code change
  • Has CI related scripts change

Tests

  • Unit test
  • E2E test
  • Manual test
    1. hack the TiDB Operator code to remove the controller.GetPDClient(u.deps.PDControl, tc).BeginEvictLeader(storeID) call in beginEvictLeader
    2. upgrade TiKV
    3. check that the upgrade operation is blocked (waiting for leader eviction to complete), and check that the evictLeaderBeginTime annotation exists
    4. revert the hack from step 1 and re-deploy TiDB Operator
    5. wait about 10m
    6. check that evictLeaderBeginTime is updated to the new time
    7. check that the upgrade operation continues and completes later
  • No code

Side effects

  • Breaking backward compatibility
  • Other side effects:

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Release Notes

Please refer to Release Notes Language Style Guide before writing the release note.



ti-chi-bot bot commented Apr 15, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign grovecai for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@csuzhangxc

/run-all-tests

@codecov-commenter

codecov-commenter commented Apr 15, 2024

Codecov Report

Attention: Patch coverage is 0% with 12 lines in your changes are missing coverage. Please review.

Project coverage is 47.94%. Comparing base (d032996) to head (3357f6e).
Report is 2 commits behind head on release-1.5.

Additional details and impacted files
@@               Coverage Diff                @@
##           release-1.5    #5616       +/-   ##
================================================
- Coverage        61.53%   47.94%   -13.60%     
================================================
  Files              229      221        -8     
  Lines            29331    29858      +527     
================================================
- Hits             18050    14315     -3735     
- Misses            9499    13839     +4340     
+ Partials          1782     1704       -78     
Flag Coverage Δ
e2e 47.94% <0.00%> (?)
unittest ?

@csuzhangxc csuzhangxc merged commit c2fea5f into pingcap:release-1.5 Apr 16, 2024
12 of 13 checks passed