Exec timeout does not kick users out on kubectl client 1.21 #102569

Open
Joseph-Goergen opened this issue Jun 3, 2021 · 29 comments
Labels
kind/bug · priority/important-longterm · sig/node · triage/accepted

Comments

@Joseph-Goergen
Contributor

What happened:

Exec-ing into a pod does not kick you out.

What you expected to happen:

To get kicked out of the pod after the timeout configured in the container runtime is reached.

How to reproduce it (as minimally and precisely as possible):

  • use a kubectl client at version 1.21
  • kubectl exec -it <any pod> -- sh
  • wait for the container runtime's configured idle timeout and observe that you are not kicked out.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
$ k version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1+IKS", GitCommit:"04e16b9d6c7af21e462f7ac675c34b667cd6149e", GitTreeState:"clean", BuildDate:"2021-05-24T08:19:13Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
    We're using containerd 1.5.2 and konnectivity 0.19

/sig cli

Joseph-Goergen added the kind/bug label on Jun 3, 2021
k8s-ci-robot added the sig/cli and needs-triage labels on Jun 3, 2021

@rtheis

rtheis commented Jun 4, 2021

Related to #97083.

@jmcmeek
Contributor

jmcmeek commented Jun 7, 2021

One perspective on this. We use the containerd stream idle timeout to implement a requirement that kubectl exec sessions be killed after being idle for 15 minutes. On the surface it seems like this issue exposes a problem on the server side (containerd or Kubernetes) as anyone with this version of kubectl is unwittingly circumventing the timeout. Or maybe we are misusing the stream idle timeout?
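For context, the timeout being described here is the stream_idle_timeout setting of containerd's CRI plugin. A minimal, illustrative excerpt of such a configuration (the 15m value matches the requirement above but is an assumption, not the reporter's actual config):

# /etc/containerd/config.toml (excerpt)
[plugins."io.containerd.grpc.v1.cri"]
  # Close idle exec/attach/port-forward streams after 15 minutes of inactivity
  # (containerd's default is 4h).
  stream_idle_timeout = "15m"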

@aojea
Member

aojea commented Jun 14, 2021

/assign

@eddiezane
Member

/sig node

k8s-ci-robot added the sig/node label on Jun 23, 2021
soltysh removed the sig/cli label on Jun 23, 2021
@ffromani
Contributor

/priority important-longterm
/triage accepted

k8s-ci-robot added the priority/important-longterm and triage/accepted labels and removed the needs-triage label on Jun 25, 2021
@aojea
Member

aojea commented Jun 25, 2021

> One perspective on this. We use the containerd stream idle timeout to implement a requirement that kubectl exec sessions be killed after being idle for 15 minutes. On the surface it seems like this issue exposes a problem on the server side (containerd or Kubernetes) as anyone with this version of kubectl is unwittingly circumventing the timeout. Or maybe we are misusing the stream idle timeout?

containerd/containerd#5563 (comment)

n4j added this to Triaged in SIG Node Bugs on Jul 9, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Sep 23, 2021
@rtheis

rtheis commented Sep 23, 2021

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Sep 23, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Dec 22, 2021
@rtheis

rtheis commented Dec 22, 2021

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Dec 22, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Mar 22, 2022
@rtheis

rtheis commented Mar 22, 2022

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Mar 22, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jun 20, 2022
@rtheis

rtheis commented Jun 20, 2022

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Jun 20, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Sep 18, 2022
@rtheis

rtheis commented Sep 19, 2022

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Sep 19, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Dec 18, 2022
@rtheis

rtheis commented Dec 20, 2022

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Dec 20, 2022
@saschagrunert
Member

Kubernetes v1.21 has been out of support for quite some time. Do we need to keep this issue open?

@rtheis

rtheis commented Nov 29, 2023

@Joseph-Goergen does this problem still exist?

@Joseph-Goergen
Contributor Author

Yes, we expected to get kicked out after 15 minutes.

root@lima-rancher-desktop:/home/test# date; kubectl exec -it -n kube-system   public-crcligg6n106oup9la3opg-alb1-84647bbdc6-9jbxv -- sh; date
Wed Nov 29 09:05:29 CST 2023
Defaulted container "nginx-ingress" out of: nginx-ingress, sysctl (init)
/etc/nginx $ exit
Wed Nov 29 09:34:07 CST 2023
root@lima-rancher-desktop:/home/test# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.1", GitCommit:"e251b5ebd2466adde7b395bf3ab515b8f6cd0c84", GitTreeState:"clean", BuildDate:"2023-07-28T20:59:48Z", GoVersion:"go1.19.10 X:strictfipsruntime", Compiler:"gc", Platform:"linux/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"29+", GitVersion:"v1.29.0-rc.0+IKS", GitCommit:"72e05a14a44f21af99ba99a39391620d6495a7a9", GitTreeState:"clean", BuildDate:"2023-11-22T17:43:16Z", GoVersion:"go1.21.4", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.26) and server (1.29) exceeds the supported minor version skew of +/-1

@liggitt
Member

liggitt commented Nov 29, 2023

I don't think this is a bug. I think it's a misapplication of stream_idle_timeout to mean "no user activity": the keepalive ping (needed to keep slow-moving log and port-forward connections stable) makes the connection "not idle".

@saschagrunert
Member

The StreamingConnectionIdleTimeout option is no longer used in the kubelet; I think it went away with dockershim. I tested the change here: #122104

@kwilczynski
Contributor

To add to what @saschagrunert mentioned above.

There are no remaining users of this field in the underlying type; it is not used anywhere any more. I found this out while looking into idle stream timeouts used together with CRI-O.

That said, I also believe that streaming was moved out of the kubelet some time ago, and the CRI-compatible runtimes are now expected to handle this functionality (CRI-O already does).

@Joseph-Goergen, I believe you can set this for containerd as well.

@rtheis

rtheis commented Nov 30, 2023

@liggitt @kwilczynski we do set stream_idle_timeout and expected to be booted after that timeout, but as you noted there is a "keepalive ping", so the connection is never really idle. We see #115493 opened to provide an option, along with cri-o/cri-o#4995 and containerd/containerd#5563 opened against the container runtimes. How do you recommend we proceed to at least support an option for an exec timeout?

@saschagrunert
Member

@rtheis we already have --stream-idle-timeout in CRI-O. Handling this at the container runtime level seems right to me.
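For reference, a minimal sketch of how this could look for CRI-O (the --stream-idle-timeout flag is the one mentioned above; the crio.conf placement under [crio.api] and the 15m value are assumptions):

# /etc/crio/crio.conf (excerpt)
[crio.api]
  # Close idle streaming connections (exec/attach/port-forward) after 15 minutes.
  stream_idle_timeout = "15m"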

@kwilczynski
Contributor

kwilczynski commented Nov 30, 2023

@rtheis, we are troubleshooting a similar issue in OpenShift at the moment, albeit affecting our oc command-line utility, which is kubectl compatible.

That said, there was a dependency change in the CLI, for both oc and kubectl (oc follows upstream dependencies, so it makes sense that both are affected), around the Kubernetes 1.21 release.

If you grab kubectl 1.20, it works fine, and the timeout set in the CRI is obeyed and consistent.

Any newer release will, at least in my testing, behave inconsistently in one of the following ways:

  • Failure due to an SPDY PING response failure
  • Never times out; it eventually reaches some default timeout, or the hardcoded 4-hour one
  • Times out on point; this happens very rarely
  • Times out somewhere past the allotted deadline, often with a lot of added slack...

I am not sure what is causing this. However, I am positive that it's not the CRI, so it's neither CRI-O nor containerd. I am also not convinced that the SPDY PING being sent back and forth across the multiplexed connection is the culprit here, since I see these pings also taking place when an older client is used, and things work fine with it.

Thoughts?

Update: This does appear to be related to SPDY PING, contrary to what I initially thought: when I reverted the change from #97083 (part of the 1.21 release), the timeout was no longer affected.

Perhaps the solution suggested in #115493 would be a way forward.

That said, another person reports that disabling SPDY PING didn't help them much, per #115493 (comment).

There is also a matter of expectations: many of our users expect idle connections to be closed, preferably without having to configure anything on the client side; making the client control the behaviour of this feature might not be desirable.

@aojea
Member

aojea commented Dec 1, 2023

It is the SPDY ping: containerd/containerd#5563 (comment)

This is a case of a bug becoming a feature: the spdy library conflated ping control frames with data frames, so pings count as session activity, renewing the session so that the timeout is never reached. Ping control frames should not be counted toward the stream timeout; however, it is not possible to fix that in a backwards-compatible way, and my suggestion to add a new timeout based on session activity was discarded... Maybe now that we are moving from SPDY to WebSockets we can fix it, but as of today there is no option other than disabling SPDY pings.
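To make the mechanism concrete, here is a minimal, self-contained Go sketch (not the spdystream code; names and durations are illustrative) of why an idle timer that is reset by every frame, ping control frames included, never fires while keepalive pings keep arriving:

package main

import (
	"fmt"
	"time"
)

func main() {
	idleTimeout := 3 * time.Second  // stands in for stream_idle_timeout
	pingInterval := 1 * time.Second // stands in for the SPDY keepalive period

	idle := time.NewTimer(idleTimeout)
	pings := time.NewTicker(pingInterval)
	defer pings.Stop()

	giveUp := time.After(10 * time.Second)
	for {
		select {
		case <-pings.C:
			// The conflation described above: a ping control frame is treated
			// like data, so the idle timer is reset and never expires.
			idle.Reset(idleTimeout)
		case <-idle.C:
			fmt.Println("idle timeout fired; the session would be torn down")
			return
		case <-giveUp:
			fmt.Println("pings kept the session 'active'; the idle timeout never fired")
			return
		}
	}
}

Counting only data frames toward the idle timer (that is, not resetting it in the ping branch) is the behaviour a stream idle timeout arguably needs here, which is what the activity-based timeout suggestion above was aiming at.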
