enhance leader election doc #77585
/test pull-kubernetes-bazel-test
@andyxning
this change seems to break unit tests, see here:
https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/77585/pull-kubernetes-bazel-test/1126051762252288000
/kind bug
90fc423
to
6ee0ce4
Compare
/test pull-kubernetes-integration
@neolit123 @cheftako @jpbetz CI is green now. PTAL.
6ee0ce4
to
3ec68da
Compare
The change makes sense, but I will defer to the client-go maintainers.
// 2. Record obtained, check the Identity & Time
if !reflect.DeepEqual(le.observedRecord, *oldLeaderElectionRecord) {
	le.observedRecord = *oldLeaderElectionRecord
	le.observedTime = le.clock.Now()
}
if len(oldLeaderElectionRecord.HolderIdentity) > 0 &&
	le.observedTime.Add(le.config.LeaseDuration).After(now.Time) &&
	!(firstStart && oldLeaderElectionRecord.RenewTime.Add(le.config.LeaseDuration).Before(now.Time)) &&
This timestamp is not meaningful if it was collected on another machine. The implementation of this client only acts on locally collected timestamps and cannot rely on the accuracy of the timestamp in the record for correctness. The client needs to wait the full lease duration without observing a change to the record before it can attempt to take over. Startup is no different. This is documented here:
kubernetes/staging/src/k8s.io/client-go/tools/leaderelection/leaderelection.go
Lines 22 to 24 in f5a1ceb
// leader (a.k.a. fencing). A client observes timestamps captured locally to
// infer the state of the leader election. Thus the implementation is tolerant
// to arbitrary clock skew, but is not tolerant to arbitrary clock skew rate.
A 10 minute lease duration sounds too long. It defaults to 15 seconds in core components:
LeaseDuration: metav1.Duration{Duration: 15 * time.Second},
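The behavior described in the comment above can be illustrated with a minimal sketch. It is not the client-go implementation: the `record` and `candidate` types, the `demo` helper, and the 10 minute stale gap are hypothetical and exist only for illustration. The point is that the countdown starts at the local time of the first read, so a record whose RenewTime is long past still blocks a freshly started candidate for one full LeaseDuration.

```go
package main

import (
	"fmt"
	"time"
)

// record is a hypothetical, trimmed-down stand-in for the election record.
type record struct {
	HolderIdentity string
	RenewTime      time.Time // written by the holder's clock; not used for expiry
}

// candidate tracks what a non-leader has seen, using only its own clock.
type candidate struct {
	leaseDuration  time.Duration
	observedRecord record
	observedTime   time.Time
}

// observe restarts the local countdown whenever the record differs from the
// last one seen. The content of RenewTime is compared, never trusted.
func (c *candidate) observe(r record, localNow time.Time) {
	changed := r.HolderIdentity != c.observedRecord.HolderIdentity ||
		!r.RenewTime.Equal(c.observedRecord.RenewTime)
	if changed {
		c.observedRecord = r
		c.observedTime = localNow
	}
}

// mayAcquire reports whether a full lease duration has elapsed locally since
// the last observed change, or whether the record has no holder at all.
func (c *candidate) mayAcquire(localNow time.Time) bool {
	if c.observedRecord.HolderIdentity == "" {
		return true
	}
	return !c.observedTime.Add(c.leaseDuration).After(localNow)
}

// demo replays the scenario from this PR: Host-B starts long after Host-A
// last renewed, reads the stale record, and checks at 0s, 14s and 15s.
func demo() []bool {
	start := time.Date(2019, 5, 8, 12, 0, 0, 0, time.UTC)
	c := &candidate{leaseDuration: 15 * time.Second}
	stale := record{HolderIdentity: "Host-A", RenewTime: start.Add(-10 * time.Minute)}
	c.observe(stale, start)
	return []bool{
		c.mayAcquire(start),
		c.mayAcquire(start.Add(14 * time.Second)),
		c.mayAcquire(start.Add(15 * time.Second)),
	}
}

func main() {
	fmt.Println(demo()) // [false false true]
}
```

Because expiry is judged only against `observedTime`, the result is the same whether Host-A's clock is ahead of, behind, or in sync with Host-B's.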
@mikedanese So this is by design and not a bug?
This timestamp is not meaningful if it was collected on another machine. The implementation of this client only acts on locally collected timestamps and cannot rely on the accuracy of the timestamp in the record for correctness. The client needs to wait the full lease duration without observing a change to the record before it can attempt to take over. Startup is no different.
This absolutely needs to be added to the leader election package documentation, IMO. It is clearer because it describes an actual scenario.
Yes, this is the intended behavior:
kubernetes/staging/src/k8s.io/client-go/tools/leaderelection/leaderelection.go
Lines 105 to 108 in 3d12466
// LeaseDuration is the duration that non-leader candidates will
// wait to force acquire leadership. This is measured against time of
// last observed ack.
LeaseDuration time.Duration
This and "A client observes timestamps captured locally to infer the state of the leader election" are both documented, but an example would definitely make this clearer. We should also document reasonable values for LeaseDuration, RenewDeadline and RetryPeriod.
An ack is any change of the record observed by the client. The RenewTime could change to the previous day, and it would still reset the LeaseDuration countdown for non-leader clients waiting to become leader.
For a starting process, the countdown should start at the time that the initial record is read.
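As a rough illustration of how the three durations mentioned above relate, here is a hedged sketch. Only the 15 second LeaseDuration comes from this thread. The 10 second RenewDeadline and 2 second RetryPeriod are assumed here to match the core component defaults, and the ordering checks and the 1.2 jitter factor reflect my understanding of the validation in the leader election package rather than anything stated in this PR. The `timing` type and its methods are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// jitterFactor is an assumption: the multiplier applied to RetryPeriod.
const jitterFactor = 1.2

// timing is a hypothetical holder for the three election durations.
type timing struct {
	LeaseDuration time.Duration // how long candidates wait after the last observed ack
	RenewDeadline time.Duration // how long the leader keeps retrying a renewal before giving up
	RetryPeriod   time.Duration // how long clients wait between attempts
}

// validate checks the assumed ordering:
// LeaseDuration > RenewDeadline > jitterFactor * RetryPeriod.
func (t timing) validate() error {
	if t.LeaseDuration <= t.RenewDeadline {
		return errors.New("LeaseDuration must be greater than RenewDeadline")
	}
	if t.RenewDeadline <= time.Duration(jitterFactor*float64(t.RetryPeriod)) {
		return errors.New("RenewDeadline must be greater than RetryPeriod*jitterFactor")
	}
	return nil
}

// coreDefaults returns the values assumed above for core components.
func coreDefaults() timing {
	return timing{
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
	}
}

func main() {
	fmt.Println(coreDefaults().validate())

	// A leader that may keep renewing for as long as candidates wait leaves
	// no safety margin, so this configuration is rejected.
	bad := coreDefaults()
	bad.RenewDeadline = bad.LeaseDuration
	fmt.Println(bad.validate())
}
```

The ordering matters because a leader should give up renewing (RenewDeadline) before any candidate is allowed to take over (LeaseDuration), and it needs room for at least one retry within that deadline.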
@mikedanese In a distributed system, keeping time coherent across nodes is hard.
@mikedanese I have updated the PR title and description, and added more documentation about this behavior. PTAL.
3ec68da
to
49f60c8
Compare
87b3d05
to
7f1ddb0
Compare
7f1ddb0
to
bebb548
Compare
/test pull-kubernetes-e2e-gce-100-performance
@@ -22,6 +22,19 @@ limitations under the License.
// leader (a.k.a. fencing). A client observes timestamps captured locally to |
Maybe break the first sentence of this paragraph into the previous paragraph so that discussion on the finer points of timing is cleanly separated.
Will do.
// timestamps and cannot rely on the accuracy of timestamp in the record for
// correctness.
//
// A client needs to wait a full LeaseDuration without observing a change
This is probably better placed in the LeaseDuration field documentation.
Will do.
bebb548
to
c680306
Compare
@mikedanese Done. PTAL.
/test pull-kubernetes-kubemark-e2e-gce-big
/retest
c680306
to
95f33ce
Compare
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED.
This pull request has been approved by: andyxning, mikedanese
…n_start enhance leader election doc Kubernetes-commit: c2847e8
What type of PR is this?
/kind client-go
What this PR does / why we need it:
We run two instances of an operator app with the leader election package, using each hostname as the lease identity: Host-A and Host-B. Host-A is the leader and Host-B keeps trying to acquire the lease. After they have run for some time, we stop both instances. Ten minutes later, which is longer than the lease duration, we start the instance on Host-B first. It cannot acquire the lease during the first lease duration after startup, because the record still names Host-A as the holder.
This is a problem because no operator instance can run during that period.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: