Exercise etcd connectivity in the /healthz/ping endpoint #48215

bjhaid · 2017-06-28T19:30:58Z

Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature

What happened:

We fenced off an api-server from connecting to etcd, however the /healthz/ping endpoint was returning 200

What you expected to happen:

The /healthz/ping endpoint to exercise connectivity to etcd and return a 500 type response code if it can't connect to etcd since the apiserver is useless without etcd

The text was updated successfully, but these errors were encountered:

k8s-github-robot · 2017-06-28T19:31:21Z

@bjhaid There are no sig labels on this issue. Please add a sig label by:
(1) mentioning a sig: @kubernetes/sig-<team-name>-misc
e.g., @kubernetes/sig-api-machinery-* for API Machinery
(2) specifying the label manually: /sig <label>
e.g., /sig scalability for sig/scalability

Note: method (1) will trigger a notification to the team. You can find the team list here and label list here

xiangpengzhao · 2017-06-28T19:37:23Z

/sig api-machinery

bjhaid · 2017-06-28T19:38:11Z

thanks @xiangpengzhao

deads2k · 2017-06-29T12:21:16Z

Checking to see if you can reach etcd makes sense. I don't think it should be part of ping though. How about a separate health check which is conditional on having a storage a config which speaks to etcd?

bjhaid · 2017-06-29T15:03:17Z

@deads2k

I don't think it should be part of ping though.
Fair

How about a separate health check which is conditional on having a storage a config which speaks to etcd?

It would be better to have an health check that exercises whatever storage backend is in use, however if that's too complex maybe having something etcd specific would be okay.

deads2k · 2017-06-29T15:08:22Z

It would be better to have an health check that exercises whatever storage backend is in use, however if that's too complex maybe having something etcd specific would be okay.

I think those would be different health checks. It is possible to have multiple backends for a single server. Consider the arguments around events we're having now.

bjhaid · 2017-06-29T15:16:41Z

@deads2k how about aggregating health checks for multiple backends into a single endpoint?

deads2k · 2017-06-29T15:19:26Z

@deads2k how about aggregating health checks for multiple backends into a single endpoint?

/healthz is already a summary of all health checks. I'd prefer to keep separate health checks separate and have one level of aggregation.

bjhaid · 2017-06-29T15:23:52Z

/healthz is already a summary of all health checks. I'd prefer to keep separate health checks separate and have one level of aggregation.

/healthz returns just bare ok on 1.6.1 I can't find any summary information there, is that expected? Or maybe I do not understand what you mean by:

summary of all health checks

deads2k · 2017-06-29T15:35:43Z

/healthz returns just bare ok on 1.6.1 I can't find any summary information there, is that expected? Or maybe I do not understand what you mean by:

if any constituent health check (see kubectl get --raw / for the list) fails, it shows you the failing name, but not details (avoids leaking information). Given the name, you can lookup the specific health check failures. There are maybe a half dozen or so checks on a normal apiserver.

bjhaid · 2017-06-29T15:42:37Z

ah makes perfect sense, so how will this be turned into a work stream? ;) I also don't mind taking a stab at implementing it with some guidance

bjhaid · 2017-07-16T02:06:09Z

@deads2k I got something working here:

bjhaid@3d685f1

It's doesn't have tests, but I have tested it from the command line and it works as expected, I can maybe add tests and move the logic to a separate function if you think this makes sense

deads2k · 2017-07-17T14:27:22Z

bjhaid/kubernetes@3d685f1

It's doesn't have tests, but I have tested it from the command line and it works as expected, I can maybe add tests and move the logic to a separate function if you think this makes sense

Yeah, it looks close enough to be worth a pull and some tests

bjhaid · 2017-07-17T14:51:26Z

Thanks ❤️ I'll clean it up and send in a PR

lavalamp · 2017-07-20T21:20:13Z

/assign @jpbetz

k8s-ci-robot · 2017-07-20T21:20:14Z

@lavalamp: GitHub didn't allow me to assign the following users: jpbetz.

Note that only kubernetes members can be assigned.

In response to this:

/assign @jpbetz

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

bjhaid · 2017-07-20T21:28:39Z

@lavalamp I actually have a WIP for this and was going to PR this tomorrow

jpbetz · 2017-07-20T21:29:30Z

Loosely related to this- Etcd's own /health endpoint will start correctly reporting as unhealthy when any alarms are triggered: etcd-io/etcd#8272. This won't be available till etcd 3.3.0

bjhaid · 2017-07-20T21:34:35Z

@jpbetz that will not solve the motivation for this issue, the apiserver was fenced off from having a network communication with etcd and the healthz endpoints still reported healthy, even though all critical api requests were failing

addressing kubernetes#48215.

@deads2k

Automatic merge from submit-queue (batch tested with PRs 49989, 49806, 49649, 49412, 49512) This adds an etcd health check endpoint to kube-apiserver addressing #48215. **What this PR does / why we need it**: This ensures kube-apiserver `/healthz` endpoint fails whenever connectivity cannot be established to etcd, also ensures the etcd preflight checks works with unix sockets **Which issue this PR fixes**: fixes #48215 **Special notes for your reviewer**: This PR does not use the etcd client directly as the client object is wrapped behind the storage interface and not exposed directly for use, so I decided to reuse what's being done in the preflight. So this will only check fail for connectivity and not etcd auth related problems. I did not write tests for the endpoint because I couldn't find examples that I could follow for writing tests for healthz related endpoints, I'll be willing to write those tests if someone can point me at a relevant one. **Release note**: ```release-note Add etcd connectivity endpoint to healthz ``` @deads2k please help review, thanks!

k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 28, 2017

k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 28, 2017

k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jun 28, 2017

k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 28, 2017

bjhaid mentioned this issue Jul 21, 2017

This adds an etcd health check endpoint to kube-apiserver #49412

Merged

bjhaid added a commit to bjhaid/kubernetes that referenced this issue Aug 1, 2017

This adds an etcd health check endpoint to kube-apiserver

47d748c

addressing kubernetes#48215.

k8s-github-robot closed this as completed in #49412 Aug 3, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exercise etcd connectivity in the /healthz/ping endpoint #48215

Exercise etcd connectivity in the /healthz/ping endpoint #48215

bjhaid commented Jun 28, 2017

k8s-github-robot commented Jun 28, 2017

xiangpengzhao commented Jun 28, 2017

bjhaid commented Jun 28, 2017

deads2k commented Jun 29, 2017

bjhaid commented Jun 29, 2017

deads2k commented Jun 29, 2017

bjhaid commented Jun 29, 2017

deads2k commented Jun 29, 2017

bjhaid commented Jun 29, 2017

deads2k commented Jun 29, 2017

bjhaid commented Jun 29, 2017

bjhaid commented Jul 16, 2017

deads2k commented Jul 17, 2017

bjhaid commented Jul 17, 2017

lavalamp commented Jul 20, 2017

k8s-ci-robot commented Jul 20, 2017

bjhaid commented Jul 20, 2017

jpbetz commented Jul 20, 2017 •

edited

bjhaid commented Jul 20, 2017

Exercise etcd connectivity in the /healthz/ping endpoint #48215

Exercise etcd connectivity in the /healthz/ping endpoint #48215

Comments

bjhaid commented Jun 28, 2017

k8s-github-robot commented Jun 28, 2017

xiangpengzhao commented Jun 28, 2017

bjhaid commented Jun 28, 2017

deads2k commented Jun 29, 2017

bjhaid commented Jun 29, 2017

deads2k commented Jun 29, 2017

bjhaid commented Jun 29, 2017

deads2k commented Jun 29, 2017

bjhaid commented Jun 29, 2017

deads2k commented Jun 29, 2017

bjhaid commented Jun 29, 2017

bjhaid commented Jul 16, 2017

deads2k commented Jul 17, 2017

bjhaid commented Jul 17, 2017

lavalamp commented Jul 20, 2017

k8s-ci-robot commented Jul 20, 2017

bjhaid commented Jul 20, 2017

jpbetz commented Jul 20, 2017 • edited

bjhaid commented Jul 20, 2017

jpbetz commented Jul 20, 2017 •

edited