
kubelet apiserver: be gentle closing connections on heartbeat failures #108107

Merged
merged 2 commits into kubernetes:master
Mar 9, 2022

Conversation

aojea
Member

@aojea aojea commented Feb 14, 2022

Follow-up to #104844
Alternative to #107879

Kubelet was forcefully closing all connections (idle and active) on heartbeat failures (#63492).
However, since #95981, all clients using HTTP2 enable a health check by default that detects stale connections without any additional logic.

If users disable HTTP2 by setting the environment variable DISABLE_HTTP2, the previous behavior is maintained.
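
For context, a minimal sketch of how the Go HTTP2 transport health check works (the timeout values below are illustrative, not necessarily what client-go configures):

package main

import (
    "net/http"
    "time"

    "golang.org/x/net/http2"
)

// newPingingTransport enables the HTTP/2 ping-based health check: if no frame
// is received on a connection for ReadIdleTimeout, the transport sends a PING;
// if the PING is not answered within PingTimeout, the connection is closed and
// the client dials a fresh one.
func newPingingTransport() (*http.Transport, error) {
    t := &http.Transport{}
    t2, err := http2.ConfigureTransports(t)
    if err != nil {
        return nil, err
    }
    t2.ReadIdleTimeout = 30 * time.Second // send a PING after 30s of silence
    t2.PingTimeout = 15 * time.Second     // close the conn if the PING goes unanswered
    return t, nil
}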

/kind bug

kubelet no longer forcefully closes active connections on heartbeat failures, relying instead on the http2 health check mechanism to detect broken connections. Users can force the previous behavior of the kubelet by setting the environment variable DISABLE_HTTP2.



@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 14, 2022
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 14, 2022
@aojea
Member Author

aojea commented Feb 14, 2022

/assign @liggitt @wojtek-t
/cc @JohnRusk

I think that this is simpler than #107879

@k8s-ci-robot
Contributor

@aojea: GitHub didn't allow me to request PR reviews from the following users: johnRusk.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/assign @liggitt @wojtek-t
/cc @JohnRusk

I think that this is simpler than #107879

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@aojea
Member Author

aojea commented Feb 14, 2022

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Feb 14, 2022
@wojtek-t
Member

This LGTM, but I would also like to hear from @liggitt

@JohnRusk

JohnRusk commented Feb 14, 2022

Might be nice to have a comment in the code to explain why the code has to close idle connections on heartbeat failure. I.e., why does failure of a heartbeat (which is presumably happening on a connection that is not idle) signal to us that we need to close the idle ones?

BTW, I like the idea of relying on Pings rather than heartbeats for monitoring the health of the one "live" HTTP2 connection. That looks nice.

The bit that seems confusing to me, and which I suggest may need an explanatory comment, is the fact that closeAllConnections gets wired up to a method that closes idle connections. It's not obvious to readers of the code (at least, not to me) why that is necessary or correct.

@aojea
Member Author

aojea commented Feb 15, 2022

The bit that seems confusing to me, and which I suggest may need an explanatory comment, is the fact that closeAllConnections gets wired up to a method that closes idle connections. It's not obvious to readers of the code (at least, not to me) why that is necessary or correct.

yeah, let's discuss the whole problem:

The thing is that the function is passed as kubeDeps.OnHeartbeatFailure:

case kubeDeps.KubeClient == nil, kubeDeps.EventClient == nil, kubeDeps.HeartbeatClient == nil:
    clientConfig, closeAllConns, err := buildKubeletClientConfig(ctx, s, nodeName)
    if err != nil {
        return err
    }
    if closeAllConns == nil {
        return errors.New("closeAllConns must be a valid function other than nil")
    }
    kubeDeps.OnHeartbeatFailure = closeAllConns

and then plumbed to the lease controller #107879

So the real semantics should be "a function that we call to do things on heartbeat failures": previously that was "closeAllConnections" and now it is "closeAllIdleConnections", but it can still mean "closeAllConnections" if HTTP2 is explicitly disabled.
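
A hypothetical sketch of that wiring (the helper name and structure are illustrative, not the actual code in buildKubeletClientConfig):

package main

import "os"

// chooseOnHeartbeatFailure picks the heartbeat-failure remedy. With HTTP/2,
// the ping-based health check already detects broken connections, so closing
// idle connections is enough; with HTTP/2 disabled we fall back to the old
// forceful behavior of closing everything.
func chooseOnHeartbeatFailure(closeAllConns, closeAllIdleConns func()) func() {
    if len(os.Getenv("DISABLE_HTTP2")) > 0 {
        return closeAllConns // HTTP/1: force-close all connections
    }
    return closeAllIdleConns
}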

The current situation is that we already have retry logic in client-go for network errors (defaulting to 10 retries):

retryAfter, retry = r.retry.NextRetry(req, resp, err, func(req *http.Request, err error) bool {
    // "Connection reset by peer" or "apiserver is shutting down" are usually a transient errors.
    // Thus in case of "GET" operations, we simply retry it.
    // We are not automatically retrying "write" operations, as they are not idempotent.
    if r.verb != "GET" {
        return false
    }
    // For connection errors and apiserver shutdown errors retry.
    if net.IsConnectionReset(err) || net.IsProbableEOF(err) {
        return true
    }
    return false
})

so some of the loops are really not needed, since they multiply the number of retries; for example, the node status update:

// updateNodeStatus updates node status to master with retries if there is any
// change or enough time passed from the last sync.
func (kl *Kubelet) updateNodeStatus() error {
    klog.V(5).InfoS("Updating node status")
    for i := 0; i < nodeStatusUpdateRetry; i++ {
        if err := kl.tryUpdateNodeStatus(i); err != nil {
            if i > 0 && kl.onRepeatedHeartbeatFailure != nil {
                kl.onRepeatedHeartbeatFailure()
            }
            klog.ErrorS(err, "Error updating node status, will retry")
        } else {
            return nil
        }
    }
    return fmt.Errorf("update node status exceeds retry count")
}

that means that an apiserver that is not replying can generate a maximum of 50 connection attempts per kubelet, because the status update itself is retried 5 times and each of those goes through the client-go retries:

pkg/kubelet/kubelet.go: // nodeStatusUpdateRetry specifies how many times kubelet retries when posting node status failed.
pkg/kubelet/kubelet.go: nodeStatusUpdateRetry = 5

TCP sockets are expensive, but you should only notice at relatively high scale; that is why I think small or relatively idle clusters don't notice this problem.
With HTTP2 (#95981) the client automatically detects stale connections: it has embedded heartbeat detection (pings), so we should not really need any heartbeat logic at all; the stdlib does it for us.
With HTTP1 you can only have one connection in the pool, so it can only be idle or active. If it is active, the client always dials a new connection; if it is idle, we close it to force the client to dial a new connection (avoiding reuse of a stale connection).
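
As a sketch of what closing idle connections looks like in practice (names here are illustrative, in the spirit of the helper this PR relies on): the client's RoundTripper may be wrapped several layers deep, so the helper walks the chain until it finds a transport that can close its idle connections.

package main

import "net/http"

// closeIdleConnectionsFor walks a possibly wrapped RoundTripper and closes
// idle connections on the first layer that supports it; active connections
// are left alone, and the client simply dials fresh connections for new
// requests.
func closeIdleConnectionsFor(rt http.RoundTripper) {
    type closeIdler interface{ CloseIdleConnections() }
    type roundTripperWrapper interface{ WrappedRoundTripper() http.RoundTripper }

    switch t := rt.(type) {
    case closeIdler:
        t.CloseIdleConnections() // e.g. *http.Transport
    case roundTripperWrapper:
        closeIdleConnectionsFor(t.WrappedRoundTripper())
    }
}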

This PR is the easy solution. I really don't know how to document this better, but in order to keep compatibility with old systems that still use HTTP1, I feel this is the least risky approach.

@fedebongio
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 15, 2022
@JohnRusk

Thanks for the explanation @aojea!

I reckon there's no need to add any additional docs to the code, since if anyone is curious they can find this PR from the commit history in the future, and read what you wrote above.

@JohnRusk

JohnRusk commented Feb 15, 2022

This PR supersedes #107781, so I should close #107781 now, right?

(Note to future readers, background discussion can be found in #107781)

@matthyx
Contributor

matthyx commented Feb 16, 2022

This PR supersedes #107781, so I should close #107781 now, right?

yes please :)

Contributor

@matthyx matthyx left a comment


/lgtm

@aojea
Member Author

aojea commented Feb 20, 2022

b9d865a

they show that the new function is able to recover from a situation where the TCP connection is broken but the endpoint is not aware of it, without closing the whole connection, just forcing the client to try a new one

@ryanzhang-oss

b9d865a

they show that the new function is able to recover from a situation where the TCP connection is broken but the endpoint is not aware of it, without closing the whole connection, just forcing the client to try a new one

I think the rule of thumb is that we need at least one test that fails before the fix and passes after. I am not sure if we can create a test case like that here.

@JohnRusk

JohnRusk commented Mar 8, 2022

Any status updates on this? I'm finding I have conversations with colleagues about this bug almost every day. Is the fix going ahead?

@wojtek-t
Member

wojtek-t commented Mar 9, 2022

/lgtm
/approve

Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 9, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, matthyx, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 9, 2022
@k8s-ci-robot k8s-ci-robot merged commit a41f9e9 into kubernetes:master Mar 9, 2022
SIG Node PR Triage automation moved this from Triage to Done Mar 9, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.24 milestone Mar 9, 2022
@JohnRusk

JohnRusk commented Mar 9, 2022

@aojea Awesome to see this merged. Thank you! Are there any plans to create cherry pick PRs to get this into 1.23 (and maybe 1.22)? That could be helpful for users currently suffering from the issue but not ready for a major version upgrade.

@liggitt
Member

liggitt commented Mar 9, 2022

https://github.com/kubernetes/community/blob/master/contributors/devel/sig-release/cherry-picks.md#what-kind-of-prs-are-good-for-cherry-picks

since this isn't fixing a regression from those versions, and I'm pretty skeptical it rises to the level of a critical bug fix, I wouldn't really expect a backport of this

@jackfrancis
Contributor

@liggitt The side-effect of certain cluster behaviors at scale seems to meet the "Panic, crash, hang" criterion in the cherry-pick definition. This change will ameliorate those apiserver degradation scenarios. Are we in disagreement about that? Or is the actual cherry-pick process itself non-trivial (composing a change as multiple PRs out-of-sequence, stuff like that)?

@liggitt
Member

liggitt commented Mar 9, 2022

While we don't anticipate issues with this fix (otherwise we wouldn't have merged it), backporting exposes release branches to unanticipated issues. Since this is a historically fragile area, I'd be extremely cautious about taking back a fix here for anything other than a regression in one of those releases

@JohnRusk

JohnRusk commented Mar 9, 2022

anything other than a regression in one of those releases

FYI, my understanding is that this issue is a regression, but that the regression happened several releases ago (e.g. in 1.18 or something; I haven't checked exactly). If that's correct, does it change anything in your reply @liggitt? (I imagine not, but I'm just checking :-))

@jackfrancis
Contributor

Understand the practical realities at play here, thx for clarifying @liggitt

@liggitt
Member

liggitt commented Mar 9, 2022

FYI, my understanding is that this issue is a regression, but that the regression happened several releases ago.

#63492 merged in 1.11 and was picked back to 1.8.x, so this behavior has existed since then. Choosing between an edge case that can result in a crash at scale and an edge case that results in a silent and ~unrecoverable hang of nodes is not a super clear choice. I'd still lean against backporting this.

@aojea
Member Author

aojea commented Mar 9, 2022

I agree with Jordan's judgement; however, in case of a backport, this can only go back as far as 1.23, since it depends on b9d865a and that is clearly not backportable

@JohnRusk

JohnRusk commented Mar 9, 2022

Thanks for the clarification guys. I understand your reasoning.

@djsly
Contributor

djsly commented Mar 23, 2022

So this will be in 1.25 only? We are currently on 1.21 and are affected every week or two in one of our clusters, at random.

@ehashman
Member

@djsly this is in 1.24 and will be backported to 1.23.

@ehashman
Member

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 12, 2022
k8s-ci-robot added a commit that referenced this pull request May 9, 2022
…8107-upstream-release-1.23

Automated cherry pick of #108107: kubelet apiserver: be gentle closing connections on