Exec timeout does not kick users out on kubectl client 1.21 #102569

Open
Joseph-Goergen opened this issue Jun 3, 2021 · 29 comments
Labels
kind/bug · priority/important-longterm · sig/node · triage/accepted

Comments

@Joseph-Goergen
Contributor

What happened:

Exec-ing into a pod does not kick you out.

What you expected to happen:

To get kicked out of the pod after the timeout configured in the container runtime is reached.

How to reproduce it (as minimally and precisely as possible):

  • use a kubectl client at version 1.21
  • kubectl exec -it <any pod> -- sh
  • wait for the container runtime's configured idle timeout and observe that you are not kicked out.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
$ k version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1+IKS", GitCommit:"04e16b9d6c7af21e462f7ac675c34b667cd6149e", GitTreeState:"clean", BuildDate:"2021-05-24T08:19:13Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
    We're using containerd 1.5.2 and konnectivity 0.19

/sig cli

Joseph-Goergen added the kind/bug label on Jun 3, 2021
k8s-ci-robot added the sig/cli and needs-triage labels on Jun 3, 2021

@rtheis

rtheis commented Jun 4, 2021

Related to #97083.

@jmcmeek
Contributor

jmcmeek commented Jun 7, 2021

One perspective on this. We use the containerd stream idle timeout to implement a requirement that kubectl exec sessions be killed after being idle for 15 minutes. On the surface it seems like this issue exposes a problem on the server side (containerd or Kubernetes) as anyone with this version of kubectl is unwittingly circumventing the timeout. Or maybe we are misusing the stream idle timeout?
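For context, the timeout being described here is the stream_idle_timeout setting of containerd's CRI plugin. A minimal, illustrative excerpt of such a configuration (the 15m value matches the requirement above but is an assumption, not the reporter's actual config):

# /etc/containerd/config.toml (excerpt)
[plugins."io.containerd.grpc.v1.cri"]
  # Close idle exec/attach/port-forward streams after 15 minutes of inactivity
  # (containerd's default is 4h).
  stream_idle_timeout = "15m"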

@aojea
Member

aojea commented Jun 14, 2021

/assign

@eddiezane
Member

/sig node

k8s-ci-robot added the sig/node label on Jun 23, 2021
soltysh removed the sig/cli label on Jun 23, 2021
@ffromani
Contributor

/priority important-longterm
/triage accepted

k8s-ci-robot added the priority/important-longterm and triage/accepted labels and removed the needs-triage label on Jun 25, 2021
@aojea
Member

aojea commented Jun 25, 2021

> One perspective on this. We use the containerd stream idle timeout to implement a requirement that kubectl exec sessions be killed after being idle for 15 minutes. On the surface it seems like this issue exposes a problem on the server side (containerd or Kubernetes) as anyone with this version of kubectl is unwittingly circumventing the timeout. Or maybe we are misusing the stream idle timeout?

containerd/containerd#5563 (comment)

n4j added this to Triaged in SIG Node Bugs on Jul 9, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Sep 23, 2021
@rtheis

rtheis commented Sep 23, 2021

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Sep 23, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Dec 22, 2021
@rtheis

rtheis commented Dec 22, 2021

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Dec 22, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Mar 22, 2022
@rtheis

rtheis commented Mar 22, 2022

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Mar 22, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jun 20, 2022
@rtheis

rtheis commented Jun 20, 2022

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Jun 20, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Sep 18, 2022
@rtheis

rtheis commented Sep 19, 2022

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Sep 19, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Dec 18, 2022
@rtheis

rtheis commented Dec 20, 2022

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Dec 20, 2022
@saschagrunert
Member

Kubernetes v1.21 has been out of support for quite some time. Do we need to keep this issue open?

@rtheis

rtheis commented Nov 29, 2023

@Joseph-Goergen does this problem still exist?

@Joseph-Goergen
Contributor Author

Yes, we expected to get kicked out after 15 minutes.

root@lima-rancher-desktop:/home/test# date; kubectl exec -it -n kube-system   public-crcligg6n106oup9la3opg-alb1-84647bbdc6-9jbxv -- sh; date
Wed Nov 29 09:05:29 CST 2023
Defaulted container "nginx-ingress" out of: nginx-ingress, sysctl (init)
/etc/nginx $ exit
Wed Nov 29 09:34:07 CST 2023
root@lima-rancher-desktop:/home/test# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.1", GitCommit:"e251b5ebd2466adde7b395bf3ab515b8f6cd0c84", GitTreeState:"clean", BuildDate:"2023-07-28T20:59:48Z", GoVersion:"go1.19.10 X:strictfipsruntime", Compiler:"gc", Platform:"linux/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"29+", GitVersion:"v1.29.0-rc.0+IKS", GitCommit:"72e05a14a44f21af99ba99a39391620d6495a7a9", GitTreeState:"clean", BuildDate:"2023-11-22T17:43:16Z", GoVersion:"go1.21.4", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.26) and server (1.29) exceeds the supported minor version skew of +/-1

@liggitt
Member

liggitt commented Nov 29, 2023

I don't think this is a bug. I think it's a misapplication of stream_idle_timeout to mean "no user activity": the keepalive ping (needed to keep slow-moving log and port-forward connections stable) makes the connection "not idle".

@saschagrunert
Member

The StreamingConnectionIdleTimeout option is no longer used in the kubelet; I think it went away with dockershim. I tested the change here: #122104

@kwilczynski
Contributor

To add to what @saschagrunert mentioned above.

There are no remaining users of this field in the underlying type; it is not used anywhere any more. I found this out while looking into idle stream timeouts used together with CRI-O.

That said, I also believe that streaming was moved out of the kubelet some time ago, and the CRI-compatible runtimes are now expected to handle this functionality (CRI-O already does).

@Joseph-Goergen, I believe you can set this for containerd as well.

@rtheis

rtheis commented Nov 30, 2023

@liggitt @kwilczynski we do set stream_idle_timeout and expected to be booted after that timeout, but as you noted there is a "keepalive ping", so the connection is never really idle. We see #115493 opened to provide an option, along with cri-o/cri-o#4995 and containerd/containerd#5563 opened against the container runtimes. How do you recommend we proceed to at least support an option for an exec timeout?

@saschagrunert
Member

@rtheis we already have --stream-idle-timeout in CRI-O. Handling this at the container runtime level seems right to me.
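For reference, a minimal sketch of how this could look for CRI-O (the --stream-idle-timeout flag is the one mentioned above; the crio.conf placement under [crio.api] and the 15m value are assumptions):

# /etc/crio/crio.conf (excerpt)
[crio.api]
  # Close idle streaming connections (exec/attach/port-forward) after 15 minutes.
  stream_idle_timeout = "15m"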

@kwilczynski
Contributor

kwilczynski commented Nov 30, 2023

@rtheis, we are troubleshooting a similar issue in OpenShift at the moment, albeit affecting our oc command-line utility, which is kubectl compatible.

That said, there was a dependency change in the CLI, for both oc and kubectl (oc follows upstream dependencies, so it makes sense that both are affected), around the Kubernetes 1.21 release.

If you grab kubectl 1.20, it works fine, and the timeout set in the CRI is obeyed and consistent.

Any newer release will, at least in my testing, behave inconsistently in one of the following ways:

  • Failure due to an SPDY PING response failure
  • Never times out; it eventually reaches some default timeout, or the hardcoded 4-hour one
  • Times out on point; this happens very rarely
  • Times out somewhere past the allotted deadline, often with a lot of added slack...

I am not sure what is causing this. However, I am positive that it's not the CRI, so it's neither CRI-O nor containerd. I am also not convinced that the SPDY PING being sent back and forth across the multiplexed connection is the culprit here, since I see these pings also taking place when an older client is used, and things work fine with it.

Thoughts?

Update: This does appear to be related to SPDY PING, contrary to what I initially thought: when I reverted the change from #97083 (part of the 1.21 release), the timeout was no longer affected.

Perhaps the solution suggested in #115493 would be a way forward.

That said, another person reports that disabling SPDY PING didn't help them much, per #115493 (comment).

There is also a matter of expectations: many of our users expect idle connections to be closed, preferably without having to configure anything on the client side; making the client control the behaviour of this feature might not be desirable.

@aojea
Member

aojea commented Dec 1, 2023

It is the SPDY ping: containerd/containerd#5563 (comment)

This is a case of a bug becoming a feature: the spdy library conflated ping control frames with data frames, so pings count as session activity, renewing the session so that the timeout is never reached. Ping control frames should not be counted toward the stream timeout; however, it is not possible to fix that in a backwards-compatible way, and my suggestion to add a new timeout based on session activity was discarded... Maybe now that we are moving from SPDY to WebSockets we can fix it, but as of today there is no option other than disabling SPDY pings.
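To make the mechanism concrete, here is a minimal, self-contained Go sketch (not the spdystream code; names and durations are illustrative) of why an idle timer that is reset by every frame, ping control frames included, never fires while keepalive pings keep arriving:

package main

import (
	"fmt"
	"time"
)

func main() {
	idleTimeout := 3 * time.Second  // stands in for stream_idle_timeout
	pingInterval := 1 * time.Second // stands in for the SPDY keepalive period

	idle := time.NewTimer(idleTimeout)
	pings := time.NewTicker(pingInterval)
	defer pings.Stop()

	giveUp := time.After(10 * time.Second)
	for {
		select {
		case <-pings.C:
			// The conflation described above: a ping control frame is treated
			// like data, so the idle timer is reset and never expires.
			idle.Reset(idleTimeout)
		case <-idle.C:
			fmt.Println("idle timeout fired; the session would be torn down")
			return
		case <-giveUp:
			fmt.Println("pings kept the session 'active'; the idle timeout never fired")
			return
		}
	}
}

Counting only data frames toward the idle timer (that is, not resetting it in the ping branch) is the behaviour a stream idle timeout arguably needs here, which is what the activity-based timeout suggestion above was aiming at.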
