Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exercise etcd connectivity in the /healthz/ping endpoint #48215

Closed
bjhaid opened this issue Jun 28, 2017 · 19 comments · Fixed by #49412
Closed

Exercise etcd connectivity in the /healthz/ping endpoint #48215

bjhaid opened this issue Jun 28, 2017 · 19 comments · Fixed by #49412
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.

Comments

@bjhaid
Copy link
Contributor

bjhaid commented Jun 28, 2017

Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature

What happened:

We fenced off an api-server from connecting to etcd, however the /healthz/ping endpoint was returning 200

What you expected to happen:

The /healthz/ping endpoint to exercise connectivity to etcd and return a 500 type response code if it can't connect to etcd since the apiserver is useless without etcd

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 28, 2017
@k8s-github-robot
Copy link

@bjhaid There are no sig labels on this issue. Please add a sig label by:
(1) mentioning a sig: @kubernetes/sig-<team-name>-misc
e.g., @kubernetes/sig-api-machinery-* for API Machinery
(2) specifying the label manually: /sig <label>
e.g., /sig scalability for sig/scalability

Note: method (1) will trigger a notification to the team. You can find the team list here and label list here

@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 28, 2017
@xiangpengzhao
Copy link
Contributor

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jun 28, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 28, 2017
@bjhaid
Copy link
Contributor Author

bjhaid commented Jun 28, 2017

thanks @xiangpengzhao

@deads2k
Copy link
Contributor

deads2k commented Jun 29, 2017

Checking to see if you can reach etcd makes sense. I don't think it should be part of ping though. How about a separate health check which is conditional on having a storage a config which speaks to etcd?

@bjhaid
Copy link
Contributor Author

bjhaid commented Jun 29, 2017

@deads2k

I don't think it should be part of ping though.
Fair

How about a separate health check which is conditional on having a storage a config which speaks to etcd?

It would be better to have an health check that exercises whatever storage backend is in use, however if that's too complex maybe having something etcd specific would be okay.

@deads2k
Copy link
Contributor

deads2k commented Jun 29, 2017

It would be better to have an health check that exercises whatever storage backend is in use, however if that's too complex maybe having something etcd specific would be okay.

I think those would be different health checks. It is possible to have multiple backends for a single server. Consider the arguments around events we're having now.

@bjhaid
Copy link
Contributor Author

bjhaid commented Jun 29, 2017

@deads2k how about aggregating health checks for multiple backends into a single endpoint?

@deads2k
Copy link
Contributor

deads2k commented Jun 29, 2017

@deads2k how about aggregating health checks for multiple backends into a single endpoint?

/healthz is already a summary of all health checks. I'd prefer to keep separate health checks separate and have one level of aggregation.

@bjhaid
Copy link
Contributor Author

bjhaid commented Jun 29, 2017

/healthz is already a summary of all health checks. I'd prefer to keep separate health checks separate and have one level of aggregation.

/healthz returns just bare ok on 1.6.1 I can't find any summary information there, is that expected? Or maybe I do not understand what you mean by:

summary of all health checks

@deads2k
Copy link
Contributor

deads2k commented Jun 29, 2017

/healthz returns just bare ok on 1.6.1 I can't find any summary information there, is that expected? Or maybe I do not understand what you mean by:

if any constituent health check (see kubectl get --raw / for the list) fails, it shows you the failing name, but not details (avoids leaking information). Given the name, you can lookup the specific health check failures. There are maybe a half dozen or so checks on a normal apiserver.

@bjhaid
Copy link
Contributor Author

bjhaid commented Jun 29, 2017

ah makes perfect sense, so how will this be turned into a work stream? ;) I also don't mind taking a stab at implementing it with some guidance

@bjhaid
Copy link
Contributor Author

bjhaid commented Jul 16, 2017

@deads2k I got something working here:

bjhaid@3d685f1

It's doesn't have tests, but I have tested it from the command line and it works as expected, I can maybe add tests and move the logic to a separate function if you think this makes sense

@deads2k
Copy link
Contributor

deads2k commented Jul 17, 2017

bjhaid/kubernetes@3d685f1

It's doesn't have tests, but I have tested it from the command line and it works as expected, I can maybe add tests and move the logic to a separate function if you think this makes sense

Yeah, it looks close enough to be worth a pull and some tests

@bjhaid
Copy link
Contributor Author

bjhaid commented Jul 17, 2017

Thanks ❤️ I'll clean it up and send in a PR

@lavalamp
Copy link
Member

/assign @jpbetz

@k8s-ci-robot
Copy link
Contributor

@lavalamp: GitHub didn't allow me to assign the following users: jpbetz.

Note that only kubernetes members can be assigned.

In response to this:

/assign @jpbetz

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bjhaid
Copy link
Contributor Author

bjhaid commented Jul 20, 2017

@lavalamp I actually have a WIP for this and was going to PR this tomorrow

@jpbetz
Copy link
Contributor

jpbetz commented Jul 20, 2017

Loosely related to this- Etcd's own /health endpoint will start correctly reporting as unhealthy when any alarms are triggered: etcd-io/etcd#8272. This won't be available till etcd 3.3.0

@bjhaid
Copy link
Contributor Author

bjhaid commented Jul 20, 2017

@jpbetz that will not solve the motivation for this issue, the apiserver was fenced off from having a network communication with etcd and the healthz endpoints still reported healthy, even though all critical api requests were failing

bjhaid added a commit to bjhaid/kubernetes that referenced this issue Aug 1, 2017
k8s-github-robot pushed a commit that referenced this issue Aug 3, 2017
Automatic merge from submit-queue (batch tested with PRs 49989, 49806, 49649, 49412, 49512)

This adds an etcd health check endpoint to kube-apiserver

addressing #48215.

**What this PR does / why we need it**:
This ensures kube-apiserver `/healthz` endpoint fails whenever connectivity cannot be established to etcd, also ensures the etcd preflight checks works with unix sockets

**Which issue this PR fixes**: fixes #48215

**Special notes for your reviewer**:
This PR does not use the etcd client directly as the client object is wrapped behind the storage interface and not exposed directly for use, so I decided to reuse what's being done in the preflight. So this will only check fail for connectivity and not etcd auth related problems. I did not write tests for the endpoint because I couldn't find examples that I could follow for writing tests for healthz related endpoints, I'll be willing to write those tests if someone can point me at a relevant one.

**Release note**:
```release-note
Add etcd connectivity endpoint to healthz
```

@deads2k please help review, thanks!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants