Use actual etcd client for /healthz/etcd checks #65027
Conversation
@liggitt: GitHub didn't allow me to request PR reviews from the following users: gyuho. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
c.HealthzChecks = append(c.HealthzChecks, healthz.NamedCheck("etcd", func(r *http.Request) error {
	done, err := preflight.EtcdConnection{ServerList: s.StorageConfig.ServerList}.CheckEtcdServers()
Is the checkEtcdServers func used by anything else after this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it's used as a gate when starting cmd/kube-apiserver: the etcd endpoint must be able to establish a TCP connection, but that's it. If we want to remove that pre-check and delete the package entirely, I'd like to do it in a follow-up.
sure. thanks.
Force-pushed from f0bda45 to 825742c (compare)
Tested this with both etcd2 and etcd3. When the etcd server was unavailable, /healthz/etcd returned failure messages without hanging. When the etcd server became available again, the /healthz/etcd endpoint recovered and started returning success again.
clientValue := &atomic.Value{}
clientErrMsg := &atomic.Value{}
What's this for? I think if wait.PollUntil is not blocking, this is useful.
> I think if wait.PollUntil is not blocking, this is useful.

Good catch. In my tests, etcd was always available as the apiserver was coming up. Backgrounded the client init loop as I intended to originally.
// constructing the etcd v3 client blocks and times out if etcd is not available.
// retry in a loop until we successfully create the client, storing the client or error encountered
clientValue := &atomic.Value{}
ditto
Force-pushed from 825742c to b39cd00 (compare)
/lgtm
/approve
On Tue, Jun 12, 2018 at 10:25 PM, k8s-ci-robot wrote:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hzxuzhonghu, liggitt

To fully approve this pull request, please assign additional approvers. We suggest the following additional approver: sttts. Assign the PR to them by writing /assign @sttts in a comment when ready.

The full list of commands accepted by this bot can be found here: https://go.k8s.io/bot-commands
The pull request process is described here: https://git.k8s.io/community/contributors/guide/owners.md#the-code-review-process

Needs approval from an approver in each of these files:
- staging/src/k8s.io/apiextensions-apiserver/Godeps/OWNERS
- staging/src/k8s.io/apiserver/OWNERS [liggitt]
- staging/src/k8s.io/kube-aggregator/Godeps/OWNERS
- staging/src/k8s.io/sample-apiserver/Godeps/OWNERS [liggitt]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hzxuzhonghu, liggitt, smarterclayton

The full list of commands accepted by this bot can be found here. The pull request process is described here.
/retest

Review the full test history for this PR. Silence the bot with an
/cc @cheftako
[MILESTONENOTIFIER] Milestone Pull Request Labels Incomplete

Action required: This pull request requires label changes. If the required changes are not made within 2 days, the pull request will be moved out of the v1.12 milestone.

kind: Must specify exactly one of
/test pull-kubernetes-e2e-gce
Automatic merge from submit-queue (batch tested with PRs 64140, 64898, 65022, 65037, 65027). If you want to cherry-pick this change to another branch, please follow the instructions here.
Not necessarily... it fixes stuck TCP connections, which we have observed happening in failover cases. Open to further improvement here, but this is better than what we had.
*time passes* It does seem like a good idea to drop all watches if our etcd goes away, and a restart is an especially (excessively?) thorough way to accomplish that.

However! It doesn't seem great for the apiserver to continuously fail liveness checks and crash-loop while etcd is down / split; it should be sufficient to just not be ready. So maybe liveness shouldn't care about etcd until we observe a healthy etcd and become ready, at which point, if we lose readiness because etcd goes away, we should also lose liveness and be restarted.

We can probably come up with something better / more principled than that: AFAIK, no one has paid any holistic attention to the apiserver's liveness / readiness checks.
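The "don't fail liveness until we've been ready once" idea above could be expressed as a small latch. This is a speculative illustration of that proposal, not code from the PR:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// etcdLivenessLatch ignores etcd for liveness until etcd has been observed
// healthy once; after that, losing etcd fails liveness too, so the apiserver
// restarts (dropping stale watches). Hypothetical type, sketching the comment.
type etcdLivenessLatch struct {
	seenHealthy atomic.Bool
}

func (l *etcdLivenessLatch) Check(etcdHealthy bool) error {
	if etcdHealthy {
		l.seenHealthy.Store(true)
		return nil
	}
	if !l.seenHealthy.Load() {
		// Never been ready: report live so we don't crash-loop while
		// waiting for etcd to come up.
		return nil
	}
	return fmt.Errorf("etcd lost after becoming ready")
}

func main() {
	var latch etcdLivenessLatch
	fmt.Println(latch.Check(false)) // etcd down at startup: still live
	fmt.Println(latch.Check(true))  // etcd healthy: latch set
	fmt.Println(latch.Check(false)) // etcd lost after readiness: fail liveness
}
```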
fixes #64909