etcdserver: ignore old leader's request to revoke lease #12531
Force-pushed from 649990f to 219e828.
Force-pushed from 219e828 to 133ba89.
Fix #12528 (the old leader still revokes leases after it has stepped down to follower when CPU or disk I/O latency is high). @jpbetz @gyuho
Force-pushed from 133ba89 to f392f1a.
Updated, thanks. @gyuho the Semaphore CI run failed, but the failing test case does not seem to be related to this PR. My local e2e test passes.
+1. We've verified this PR, and the fix works.
I noticed this PR has been open for almost a month... Are there any more reviewers?
- PRs that are not green in the PR list have a lower chance of getting attention. I know it's due to test flakiness, but I still recommend running 'git commit --amend; git push' to trigger a re-test.
- It's hard here, but it would be good to have a test case (at least a unit test) that shows this scenario.
- I'm new to this code, but it seems unnatural that the decision to expire a lease is not simply propagated through Raft... Alternatively, the decision to revoke a lease might contain the term, and it would only be executed (accepted by the current leader in ) if the sender's term matches the current term. For my education: was this option considered and rejected for some reason?
- Nit: Please squash the commits.
Force-pushed from 6b84b7b to 714c68c.
Codecov Report
@@             Coverage Diff             @@
##           master   #12531       +/-   ##
===========================================
- Coverage   70.75%   57.80%   -12.95%
===========================================
  Files         429      426        -3
  Lines       34111    34020       -91
===========================================
- Hits        24135    19666     -4469
- Misses       8076    12542     +4466
+ Partials     1900     1812       -88
@tangcong can you push again to re-trigger Travis CI?
Force-pushed from 714c68c to 61a170d.
Procfile.gofail file
reproduce.sh
reproduce the issue, verify the pr
@ptabor it is hard to add integration and unit test cases, so I provided a simple script that injects a disk I/O delay failure into the leader. It can reproduce the issue and verify the correctness of this PR.
Adding it to the functional tests should be good enough (basically translating your shell script into a Go program).
From a user's perspective, it would be better to solve this issue first and then add a unit test later in an independent PR.
@xiang90 The failpoint test cases are not enabled by default. I guess they may be too flaky, so my new failpoint test case is also disabled. I'm not sure whether the etcd release team ran the failpoint test cases in other environments when releasing new versions of etcd. There was a bug (fixed by #12898) in them before, and they seemed unable to run.
Force-pushed from 147050f to 1499344.
Force-pushed from 1499344 to 110e0a0.
Force-pushed from 110e0a0 to 9d40c6e.
If the Raft routine also blocks for some time, isLeader might still return true while a new leader has already been elected? I think it is safer to:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.