load balancer health check for kube-apiserver #3537
Conversation
@abhinavdahiya let me know if this is the right place for this doc, otherwise I will move it. /assign @abhinavdahiya
Thanks for the detailed doc @tkashem! I think the next step will be to make this doc discoverable by linking it from the code that defines these healthchecks in data/data/{aws,gcp}.
- Existing connections are not cut off hard, they are allowed to complete gracefully.

## Load Balancer Health Check Probe
`kube-apiserver` provides graceful termination support via the `/readyz` health check endpoint. When `/readyz`reports
nit: space in front of "reports"
fixed
Now let's walk through the events (in chronological order) that unfold when a `kube-apiserver` instance restarts:
* E1: `T+0s`: `kube-apiserver` receives a TERM signal.
* E2: `T+0s`: `/readyz` starts reporting `failure` to signal to the load balancer that a shut down is in progress.
everywhere in the doc: load balancers
fixed
* E3: `T+70s`: `kube-apiserver` (the http server) stops listening:
  * `/healthz` turns red.
  * Default TCP health check probe on port `6443` will fail.
  * Any new request forwarded to it will fail, most likely with a `connection refused` error.
or GOAWAY for http/2
added
* E5: `T+70s+60s`: The apiserver process exits.

An important note to consider is that today the time difference between `E3` and `E2` is `70s`. This is known as `shutdown-delay-duration` and is configurable.
be clear: not configurable by the user, but by the devs
added
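The arithmetic behind these event timestamps can be written down directly. A minimal sketch — the `70s` and `60s` values come straight from the timeline above; the constant names are made up for illustration:

```python
# Timeline constants from the doc. shutdown-delay-duration is configurable
# by the devs (not the user); the 60s request grace period is hardcoded.
SHUTDOWN_DELAY_DURATION = 70  # seconds between E2 (/readyz red) and E3 (stop listening)
REQUEST_GRACE_PERIOD = 60     # seconds allowed for in-flight requests after E3

E1 = E2 = 0                         # T+0s: TERM received, /readyz turns red
E3 = E2 + SHUTDOWN_DELAY_DURATION   # T+70s: the http server stops listening
E5 = E3 + REQUEST_GRACE_PERIOD      # T+130s: the apiserver process exits
```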
Let's plot the events as they happen when a load balancer determines that a `kube-apiserver` is unhealthy and takes it out of service. This considers a worst case scenario:
* P1: T+0s: `/readyz` starts reporting red.
* P2: T+10s: first probe initiated.
be clear that this is an example. P2 could be right at T+0s, depending on the alignment of the probe request interval.
made it clear that this is a worst case scenario to calculate at most 30s
We assume the following:
* Each health check request is independent and lasts the entire interval.
* The time it takes for the instance to respond does not affect the interval for the next health check.
do we know from aws/gcp docs that this is really the case?
This is true for AWS; I copied them verbatim from the AWS doc.
Link the docs, to make this easy to check? Seems like it's the classic-LB docs, but the installer uses network load balancers (classic LBs are `aws_elb`).
I could not find a doc that describes this for network load balancers exclusively. I think the health check mechanics should be the same for classic, application, and network LBs. Maybe we can ask this question to our AWS account rep.
On the other hand, what we stipulate above must hold true for all health checks universally. Otherwise, if we allow one interval to bleed into another, we don't have a deterministic "at most".
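To make the "at most 30s" claim checkable, here is a sketch of the worst-case arithmetic. The interval and threshold values below are assumptions chosen to reproduce the 30s figure; the formula itself relies only on the two assumptions quoted above (independent probes, each consuming a full interval):

```python
def worst_case_detection_seconds(interval: int, unhealthy_threshold: int) -> int:
    """Upper bound on how long a load balancer takes to pull a backend
    after /readyz starts failing.

    Worst-case alignment: /readyz turns red just after a probe fired, so
    the first failing probe starts up to one full interval later (P2).
    Each probe then consumes a full interval before its result is counted.
    """
    alignment_slack = interval
    consecutive_failures = unhealthy_threshold * interval
    return alignment_slack + consecutive_failures

# Assumed values that reproduce the "at most 30s" figure from the thread:
assert worst_case_detection_seconds(interval=10, unhealthy_threshold=2) == 30
```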
Force-pushed 28371c9 to b8d4bb5.
## Load Balancer Health Check Probe
`kube-apiserver` provides graceful termination support via the `/readyz` health check endpoint. When `/readyz` reports
`ok` it indicates that the apiserver is ready to serve request(s).
nit: `ok` -> `200 OK`? I expect LBs to care about HTTP status codes and not about the response body. And your `ok` is likely shorthand for the 200 status, but I think explicitly saying "200" (and possibly even "HTTP status `200 OK`") would make it harder to misunderstand.
@abhinavdahiya I linked the doc.
* E2: `T+0s`: `/readyz` starts reporting `failure` to signal to the load balancers that a shut down is in progress.
  * The apiserver will continue to accept new request(s).
  * The apiserver waits for a certain amount of time (configurable by `shutdown-delay-duration`) before it stops accepting new request(s).
* E3: `T+70s`: `kube-apiserver` (the http server) stops listening:
Elsewhere in the doc you have: "In future we will reduce `shutdown-delay-duration` to `30s`." I'd rather make this portion of the doc robust to that sort of pivot by using `T+shutdown-delay-duration` here.
As far as the user/dev is concerned, they should treat `shutdown-delay-duration` as `30s` for the purpose of designing health check probes. So I changed it to `T+30s`.
  * `/healthz` turns red.
  * Default TCP health check probe on port `6443` will fail.
  * Any new request forwarded to it will fail, most likely with a `connection refused` error or `GOAWAY` for http/2.
  * Existing request(s) in-flight are not cut off but are given up to `60s` to complete gracefully.
Does this 60s also have a config variable name that we can use to guard against future default changes?
This is hardcoded in kube-apiserver.
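The shape of that hardcoded behavior, as a toy Python asyncio analogy — nothing here mirrors kube-apiserver's actual Go implementation; the `60s` value is the only number taken from the doc:

```python
import asyncio
import contextlib


async def graceful_shutdown(server: asyncio.Server,
                            in_flight: set[asyncio.Task],
                            grace_seconds: float = 60.0) -> None:
    """Stop accepting new connections, then give in-flight requests up to
    grace_seconds to complete before cutting them off."""
    server.close()              # new connections now fail (connection refused)
    await server.wait_closed()
    if in_flight:
        done, pending = await asyncio.wait(in_flight, timeout=grace_seconds)
        for task in pending:    # grace expired: cancel whatever is left
            task.cancel()
            with contextlib.suppress(asyncio.CancelledError):
                await task
```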
lgtm
Per [0], the /readyz endpoint is how the api communicates that it is gracefully shutting down. Once /readyz starts to report failure, we want to stop sending traffic to that backend. If we wait for /healthz, it may be too late because once /healthz starts failing the api is already not accepting connections. I also moved the liveness probe for haproxy itself to use a /readyz endpoint for consistency. This isn't strictly necessary, but I think it will be less confusing if there aren't multiple health check endpoints in the config. 0: openshift/installer#3537
Force-pushed b8d4bb5 to dcd415c.
upi/gcp/02_lb_ext.py (outdated)
@@ -1,5 +1,6 @@
def GenerateConfig(context):
// Refer to docs/dev/kube-apiserver-health-check.md on how to correctly setup health check probe for kube-apiserver
this is probably not correct comment syntax in python
oops, my bad. fixed.
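For reference, the corrected line with Python's `#` comment syntax:

```python
# Refer to docs/dev/kube-apiserver-health-check.md on how to correctly setup health check probe for kube-apiserver
```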
upi/gcp/02_lb_int.py (outdated)
@@ -6,6 +6,7 @@ def GenerateConfig(context):
'group': '$(ref.' + context.properties['infra_id'] + '-master-' + zone + '-instance-group' + '.selfLink)'
})
// Refer to docs/dev/kube-apiserver-health-check.md on how to correctly setup health check probe for kube-apiserver
fixed
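For context, a hedged sketch of what a Deployment Manager health check aimed at `/readyz` could look like. The resource type and property names assume GCP's `compute.v1.healthCheck` API, and the interval/threshold numbers are the assumed values from the worst-case calculation above, not necessarily what the installer templates ship:

```python
def GenerateConfig(context):
    """Sketch: HTTPS health check against kube-apiserver's /readyz,
    sized so worst-case detection stays within the 30s budget."""
    resources = [{
        'name': context.properties['infra_id'] + '-api-health-check',
        'type': 'compute.v1.healthCheck',
        'properties': {
            'type': 'HTTPS',
            'httpsHealthCheck': {
                'port': 6443,
                'requestPath': '/readyz',
            },
            'checkIntervalSec': 10,   # probe interval
            'timeoutSec': 10,         # each probe consumes the full interval
            'unhealthyThreshold': 2,  # 10s slack + 2 * 10s = at most 30s
            'healthyThreshold': 2,
        },
    }]
    return {'resources': resources}
```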
/test e2e-gcp-upi
Force-pushed 49cb2af to 3bc71bb.
@tkashem: The following test failed, say `/retest` to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/approve
Adding valid bug since this is a docs update.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: abhinavdahiya. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/retest
Please review the full test history for this PR and help us cut down flakes.
Per [0], the /readyz endpoint is how the api communicates that it is gracefully shutting down. Once /readyz starts to report failure, we want to stop sending traffic to that backend. If we wait for /healthz, it may be too late because once /healthz starts failing the api is already not accepting connections. I also moved the liveness probe for haproxy itself to use a /readyz endpoint for consistency. This isn't strictly necessary, but I think it will be less confusing if there aren't multiple health check endpoints in the config. 0: openshift/installer#3537 (cherry picked from commit 022933c)