
Cloud load-balancers should have health checks for nodes #14661

Closed
thockin opened this issue Sep 28, 2015 · 37 comments
Labels
area/cloudprovider · lifecycle/frozen (Indicates that an issue or PR should not be auto-closed due to staleness.) · priority/important-soon (Must be staffed and worked on either currently, or very soon, ideally in time for the next release.) · sig/network (Categorizes an issue or PR as relevant to SIG Network.)

Comments

@thockin
Member

thockin commented Sep 28, 2015

Given the state of cloud-LB today, most (GCE, AWS, Openstack) LB implementations target nodes indiscriminately. We should ensure that the cloud load-balancer is only targeting healthy nodes.

We should health-check the nodes' kubelet or kube-proxy, or add a new "do nothing but answer node health" daemon.

@thockin thockin added priority/important-soon and team/cluster labels Sep 28, 2015
@thockin thockin added this to the v1.2-candidate milestone Sep 28, 2015
@bprashanth
Contributor

We should health-check the nodes' kubelet or kube-proxy, or add a new "do nothing but answer node health" daemon.

Nodes will already flip to unhealthy after ~40s of kubelet silence, or immediately when something like a Docker death is observed by the kubelet. You can probably fix it by swapping the node lister: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/service/servicecontroller.go#L77 with the conditional node lister https://github.com/kubernetes/kubernetes/blob/master/pkg/client/cache/listers.go#L119, or by diffing them.
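
For illustration, a minimal sketch of that filtering idea, assuming the current v1.Node API types (this is not the actual service controller code):

```go
package nodefilter

import (
	v1 "k8s.io/api/core/v1"
)

// readyNodes keeps only nodes whose NodeReady condition is True, which is
// roughly what a "conditional" node lister would hand the service controller
// before it syncs the cloud LB's target instance group.
func readyNodes(nodes []*v1.Node) []*v1.Node {
	var out []*v1.Node
	for _, n := range nodes {
		for _, cond := range n.Status.Conditions {
			if cond.Type == v1.NodeReady && cond.Status == v1.ConditionTrue {
				out = append(out, n)
				break
			}
		}
	}
	return out
}
```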

@bprashanth
Contributor

I mean, swap those and remove unhealthy nodes from the target instance group

@thockin
Member Author

thockin commented Sep 28, 2015

That's a lot of propagation. 40s is an eternity.

@bprashanth
Contributor

I mean, your daemon could die too, so you'll need a timeout. I'm avoiding the GCE health checks suggestion because I'd rather do this in kube and have it work cross-platform.

@roberthbailey
Contributor

Health-checking the kube-proxy seems like the best idea to me. Port 10249 is currently used for both healthz and pprof, and we might not want pprof data exposed to the internet. But it wouldn't be difficult to make healthz bind to 0.0.0.0 by default (even if we moved pprof to a different port) and then allow 10249 to be opened to health checkers as necessary.

I don't think we should be doing our own health checking and manually modifying the target instance group. That code would be GCE-specific, and it seems like it would just re-implement what the GCP cloud health checker provides for free, as long as you can expose an HTTP endpoint that is accessible to 130.211.0.0/22.
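
To make that concrete, a rough sketch of the GCE-side setup using the Go compute API client might look like the following. The project, network, and resource names are placeholders, 10249 is the healthz port mentioned above, and this only illustrates what the cloud health checker needs, not what kube itself does:

```go
package main

import (
	"context"
	"log"

	compute "google.golang.org/api/compute/v1"
)

func main() {
	ctx := context.Background()
	svc, err := compute.NewService(ctx) // uses Application Default Credentials
	if err != nil {
		log.Fatal(err)
	}

	const project = "my-project" // placeholder

	// HTTP health check against the kube-proxy healthz endpoint on each node.
	hc := &compute.HttpHealthCheck{
		Name:               "k8s-node-healthz", // placeholder name
		Port:               10249,              // kube-proxy healthz port discussed above
		RequestPath:        "/healthz",
		CheckIntervalSec:   2,
		TimeoutSec:         1,
		HealthyThreshold:   1,
		UnhealthyThreshold: 3,
	}
	if _, err := svc.HttpHealthChecks.Insert(project, hc).Do(); err != nil {
		log.Fatal(err)
	}

	// Open the healthz port to GCP's health-checking source range only.
	fw := &compute.Firewall{
		Name:         "allow-gce-healthcheckers", // placeholder name
		Network:      "global/networks/default",  // placeholder network
		SourceRanges: []string{"130.211.0.0/22"},
		Allowed: []*compute.FirewallAllowed{
			{IPProtocol: "tcp", Ports: []string{"10249"}},
		},
	}
	if _, err := svc.Firewalls.Insert(project, fw).Do(); err != nil {
		log.Fatal(err)
	}
}
```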

@bgrant0607
Member

See #8673

@thockin
Member Author

thockin commented Feb 14, 2016

I'm going to close the dup (I dup'ed myself!). This one has more info, so keeping it.

@bprashanth
Contributor

Why health check nodes instead of going straight to the network endpoint? If we do that, isn't the 40s eternity enough for node health checks?

@thockin
Member Author

thockin commented Feb 14, 2016

Today we load-balance to nodes. If a node dies, I want my cloud LB targeting a different node ASAP.

@bprashanth
Contributor

Yes, we can easily implement a short-term solution where the service/ingress controller runs a goroutine that just polls health check daemons on the nodes. "Bouncy" endpoints make that harder because kube-proxy will still think the endpoint is ready. Even if we don't go to the extent of reporting utilization information per request, I think we should come up with something that re-uses an existing health check idiom (either nodecontroller health or liveness/readiness) in the long run.

The risk of having some sort of instantaneous and binary feedback loop is oscillation or flapping. The "right" way to solve this, IMO, is to use backend weights.
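
For what it's worth, the usual way to dampen that flapping short of backend weights is consecutive-observation thresholds (the same knobs cloud health checkers expose as healthy/unhealthy thresholds). A minimal illustrative sketch, with made-up names:

```go
package healthsketch

// healthTracker only flips state after N consecutive observations in the
// other direction, so a single bounce does not eject and re-add a backend.
type healthTracker struct {
	healthy            bool
	consecutiveFails   int
	consecutiveOKs     int
	unhealthyThreshold int // e.g. 3 failed probes before ejecting a backend
	healthyThreshold   int // e.g. 2 good probes before re-adding it
}

func (t *healthTracker) observe(ok bool) {
	if ok {
		t.consecutiveOKs++
		t.consecutiveFails = 0
		if !t.healthy && t.consecutiveOKs >= t.healthyThreshold {
			t.healthy = true
		}
	} else {
		t.consecutiveFails++
		t.consecutiveOKs = 0
		if t.healthy && t.consecutiveFails >= t.unhealthyThreshold {
			t.healthy = false
		}
	}
}
```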

@glerchundi

@bprashanth @thockin what is the status of this issue? I would like to add GCE LB health checking. Did you solve it by making the kubelet port available, or some other way?

@thockin thockin closed this as completed Oct 19, 2016
@thockin
Member Author

thockin commented Oct 19, 2016

We're actually using cloud healthchecks for something different now, so we should close this.

@glerchundi

OK @thockin, can you point me to the documentation/issue/whatever to follow for the new direction? I'd appreciate any hint that helps me dig into this topic.

Thanks

@thockin
Member Author

thockin commented Oct 19, 2016

#29409

@glerchundi

thanks!

@bprashanth
Contributor

This is still an issue, of course: if a node disappears due to a network partition, we will take ~40s to kill its endpoints. If we had a health check endpoint, it would take O(5s). There are different ways to tackle it.

One way is to make kube-proxy health checking much smarter (i.e. use IPVS). We need to confirm that IPVS will auto-failover if it notices that, e.g., a SYN got blackholed once. I wasn't able to confirm this with some preliminary tinkering: #30134 (comment)

Another is to make our own health checking smarter. The problem right now is that the kubelet is responsible for reporting both endpoint health and node status. We could come up with a system that reports endpoint health much more frequently and doesn't actually run on the node: #28442

And yet another is to do what this issue actually proposes and add a secondary health check endpoint to nodes. Or find some clever way to leverage the same "healthcheck-nodeport" logic in a way that works for both "OnlyLocal" and "Global" services.

@thockin
Member Author

thockin commented Oct 25, 2016

One way is to make kube-proxy health checking much smarter (i.e. use IPVS). We need to confirm that IPVS will auto-failover if it notices that, e.g., a SYN got blackholed once. I wasn't able to confirm this with some preliminary tinkering: #30134 (comment)

I convinced myself that it DOES NOT do what we want. It does exactly what we do today.

@thockin
Member Author

thockin commented Jan 10, 2017

Re-opening. I do think we should have a node-level healthcheck for LBs that are not using the OnlyLocal annotation.

@thockin thockin reopened this Jan 10, 2017
@thockin thockin added sig/network and area/platform/gce labels Jan 10, 2017
k8s-github-robot pushed a commit that referenced this issue May 27, 2017
Automatic merge from submit-queue (batch tested with PRs 46252, 45524, 46236, 46277, 46522)

Make GCE load-balancers create health checks for nodes

From #14661. Proposal on kubernetes/community#552. Fixes #46313.

Bullet points:
- Create a nodes health check and firewall rule (for health checking) for non-OnlyLocal services.
- Create a local-traffic health check and firewall rule (for health checking) for OnlyLocal services.
- Version skew:
   - Don't create the nodes health check if any node has a version < 1.7.0.
   - Don't backfill the nodes health check on existing LBs unless users explicitly trigger it.

**Release note**:

```release-note
GCE Cloud Provider: Newly created LoadBalancer-type Services now have health checks for nodes by default.
An existing LoadBalancer will have a health check attached to it when:
- Service.Spec.Type is changed from LoadBalancer to another type and flipped back.
- There is any effective change to Service.Spec.ExternalTrafficPolicy.
```
@MrHohn
Member

MrHohn commented Nov 30, 2017

Unassigning as the GCE part was done.
/unassign
/remove-area platform/gce
/area cloudprovider

@thockin
Member Author

thockin commented Nov 30, 2017 via email

@MrHohn
Member

MrHohn commented Nov 30, 2017

Can you update what you think still needs to be done, and who should be doing it?

Sure, the real work for each cloud provider is to attach (or configure) a health check pointing at node:10256/healthz (kube-proxy's healthz port is 10256 by default) while provisioning load balancers.
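
In other words, the per-node check each provider wires up is just an HTTP GET against kube-proxy's healthz endpoint. A minimal sketch of what the LB's health checker effectively does (the address, timeout, and function name are illustrative):

```go
package probe

import (
	"fmt"
	"net/http"
	"time"
)

// nodeHealthy mimics the cloud LB health check: GET http://<node>:10256/healthz
// and treat a 200 response as "this node may receive load-balanced traffic".
func nodeHealthy(nodeIP string) bool {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(fmt.Sprintf("http://%s:10256/healthz", nodeIP))
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}
```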

I'd expect in-tree cloud provider owners to follow up on this. Looping in the ones I found in OWNERS files (omitting those that don't have a real load balancer implementation).
@justinsb for AWS
@colemickens for Azure
@FengyunPan for Openstack

@MrHohn
Member

MrHohn commented Nov 30, 2017

@ngtuna for CloudStack

@jhorwit2
Contributor

jhorwit2 commented Dec 1, 2017

@MrHohn This requires at least v1.7.2, correct?

@MrHohn
Member

MrHohn commented Dec 1, 2017

MrHohn This requires at least v1.7.2, correct?

@jhorwit2 Thanks for mentioning, that is correct. v1.7.2 is the earliest k8s version in which kube-proxy properly serves the healthz port. We should probably make it a global const somewhere.

@thockin
Member Author

thockin commented Dec 1, 2017 via email

@colemickens
Contributor

I think they're sort of the same thing: the SC would just be calling into the cloud provider and asking for this health check, or it would be part of the normal behavior of the SC calling the CP to create/update a load balancer.

It seems like it would be helpful if Kubernetes were prescriptive here, one way or the other, and that issues per-cloud-provider were opened to track them adding this health check.

Questions:

  1. Are there backward compatibility concerns? Based on how I'd implement this in Azure on a first-pass, it would retroactively add health checks which seems potentially risky.

  2. How is the cloudprovider supposed to know what port kubelet is listening on for /healthz? I don't think it's appropriate to hardcode the default port. This will lead to broken configurations for anyone that chooses to change the default port (maybe for security, not sure what all's exposed on /healthz...)

@MrHohn
Member

MrHohn commented Dec 11, 2017

I think they're sort of the same thing, the SC would just be calling into the cloudprovider and asking for this health check, or it would be part of the normal behavior of the SC calling the CP to create/update a load balancer.

I'd prefer the latter; making "ensure health check" another interface may place more constraints on cloud provider implementations --- given that how a health check is attached to an LB can vary quite a bit between providers and could be coupled with LB management.

  1. Are there backward compatibility concerns? Based on how I'd implement this in Azure on a first-pass, it would retroactively add health checks which seems potentially risky.

Adding health checks retroactively seems risky if attaching a health check is itself service-disruptive. Another concern is mixed node versions --- some nodes respond to the health check while others (on older versions) don't.

  2. How is the cloudprovider supposed to know what port kubelet is listening on for /healthz? I don't think it's appropriate to hardcode the default port. This will lead to broken configurations for anyone that chooses to change the default port (maybe for security, not sure what all's exposed on /healthz...)

To clarify, this "/healthz" is served by kube-proxy but not kubelet. The proxy healthz port is defined in the ports package:

ProxyHealthzPort = 10256

If someone chooses to use a different healthz port for kube-proxy (via flag or config), they will need to configure the same port for the service controller (or cloud controller manager).

@colemickens
Contributor

Adding health checks retroactively seems risky if attaching a health check is itself service-disruptive. Another concern is mixed node versions --- some nodes respond to the health check while others (on older versions) don't.

I don't recall the stance on version skew, but if it has been there for at least a few revisions, then I'm not worried about the presence of /healthz presuming this feature wouldn't be backported. I hadn't considered that adding the health-check would be disruptive. Do we know if this is really the case in any of the supported clouds?

To clarify, this "/healthz" is served by kube-proxy but not kubelet. The proxy healthz port is defined in the ports package:

Got it, thanks.

If someone chooses to use a different healthz port for kube-proxy (via flag or config), they will need to configure the same port for the service controller (or cloud controller manager).

Hm, to my knowledge, most of the cloud-providers don't really even take configuration today (or if they do, it's inferred configuration discovered via their metadata servers).

I guess it wouldn't be the worst thing in the world if it defaulted to the default kube-proxy port and was overridable inside the cloud provider, but there's still going to be the case where:

  1. User is running with non-default healthz port.
  2. User upgrades cluster, switches to CCM.
  3. None of their (new) load-balancers work because they're now health checking a path on the wrong port.

The user doesn't really know why any of this is happening until they find this issue, find the corresponding PR for their cloud provider and find where the new flag/config field is.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Mar 12, 2018
@thockin
Member Author

thockin commented Mar 20, 2018

/lifecycle frozen
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen and removed lifecycle/stale labels Mar 20, 2018
@alok87
Contributor

alok87 commented May 15, 2018

@justinsb Do we have health checks for AWS ELBs? We want to keep adding/removing nodes in the ELB based on health. What is the right way to do it?

We are not using a k8s-provisioned ELB; it is an ELB we created ourselves. We want to keep only healthy Kubernetes nodes under it.

@jhorwit2
Contributor

@alok87 You'd want to emulate what Kubernetes does in that scenario: either use the kube-proxy healthz endpoint (port 10256 by default), or use the health check node port on the service if its traffic policy is Local.
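
A small sketch of that port-selection rule, assuming the v1 Service API fields (illustrative only, not code from any cloud provider):

```go
package lbhealth

import v1 "k8s.io/api/core/v1"

// healthCheckPort picks the node port an external LB should probe: the
// per-service health check node port for Local-traffic services, otherwise
// kube-proxy's cluster-wide healthz port (10256, assuming the default was
// not overridden).
func healthCheckPort(svc *v1.Service) int32 {
	if svc.Spec.ExternalTrafficPolicy == v1.ServiceExternalTrafficPolicyTypeLocal &&
		svc.Spec.HealthCheckNodePort != 0 {
		return svc.Spec.HealthCheckNodePort
	}
	return 10256
}
```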

@alok87
Contributor

alok87 commented May 15, 2018

@jhorwit2 What exactly does Kubernetes do? On what basis does it mark a node healthy or unhealthy? We have two system pods - weave and kube-proxy. We want to check that both are good, and also that node_ip:port is working, before marking a node healthy and attaching it to the LB. Do I need to write a custom controller to do this?

@krmayankk

Curious, what is remaining here? Also, do all health checks (for HTTPS, network, and container-native) go to kube-proxy's port 10256? Why do they not go to the kubelet?

@thockin
Member Author

thockin commented Feb 20, 2019

kube-proxy exposes "healthiness" as "was able to update the heartbeat recently". We may want to go further with "readiness" vs "liveness" (I have an idea brewing), but it's unlikely that it will be totally user-defined. Instead, I am focusing on things like node schedulability. If the node is unschedulable, it should probably not be considered for new LB traffic.

That would allow users to define arbitrary rules for what makes a node unschedulable, which is something we want anyway.
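
As a sketch, that node-eligibility rule might look like this (illustrative only; it combines readiness with schedulability):

```go
package lbhealth

import v1 "k8s.io/api/core/v1"

// eligibleForLB: a node should receive new LB traffic only if it is Ready
// and not cordoned (marked unschedulable).
func eligibleForLB(node *v1.Node) bool {
	if node.Spec.Unschedulable {
		return false
	}
	for _, cond := range node.Status.Conditions {
		if cond.Type == v1.NodeReady {
			return cond.Status == v1.ConditionTrue
		}
	}
	return false
}
```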

Closing this. Thanks for the reminder :)
