Pods communicating via websockets and API hanging #67457
Comments
/sig network
Hit same issue in my lab.
Found a workaround: add a request timeout. Not sure if it is a proper solution.
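The timeout workaround above can be sketched as follows. `_request_timeout` is a real parameter accepted by the generated API methods in the Kubernetes Python client; the helper name `exec_kwargs` and the default value are my own assumptions, not the commenter's code:

```python
# Sketch of the request-timeout workaround for hanging exec calls.
DEFAULT_EXEC_TIMEOUT = 60  # seconds; tune to your longest expected command


def exec_kwargs(command, container, timeout=DEFAULT_EXEC_TIMEOUT):
    """Build kwargs for stream(api.connect_get_namespaced_pod_exec, ...)."""
    return {
        'command': command,
        'container': container,
        'stderr': True,
        'stdin': False,
        'stdout': True,
        'tty': False,
        # Without a client-side timeout, a dead connection can block the
        # read forever; with it, the call raises after `timeout` seconds.
        '_request_timeout': timeout,
    }


# Usage against a live cluster (illustration only):
# resp = stream(api.connect_get_namespaced_pod_exec, pod_name, namespace,
#               **exec_kwargs(['ls', '/'], 'web'))
```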
We are hitting the same issue with Kubernetes v1.11.0 and Python client 7.0.0 (same behavior with the 8.0.0 client). Is there any update on this case?
Same issue here.
Same issue here as well.
Are you connecting via direct pod IPs, or via service names/IPs? Is it possible that the destination pod is dying and being restarted at any point? (e.g., run "kubectl get pods" and see if RESTARTS is non-zero.)
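The programmatic equivalent of checking the RESTARTS column is summing `restart_count` over a pod's container statuses. A minimal sketch, assuming a V1Pod-shaped object; the helper name is mine, and with the real client you would pass items from `v1.list_namespaced_pod(namespace).items`:

```python
def total_restarts(pod):
    """Sum restart_count across a pod's container statuses.

    `pod` is expected to be shaped like a V1Pod from the Kubernetes
    Python client; container_statuses can be None for pending pods.
    """
    statuses = pod.status.container_statuses or []
    return sum(s.restart_count for s in statuses)
```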
@danwinship If you aren't able to handle this issue, consider unassigning yourself and/or adding the […]
🤖 I am a bot run by vllry. 👩🔬
/remove-triage unresolved
In my case, the connection is created directly to the pod IP. I am fairly sure the target pod is not being restarted during the lifetime of the connection.
If you're connecting directly to the pod IP, then this is most likely a problem with your network plugin, not with Kubernetes itself.
@danwinship: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
We have Python tasks running in Kubernetes pods that need to communicate with each other. We run commands in one pod from another pod using connect_get_namespaced_pod_exec [1] via the Kubernetes Python API which, if we understand correctly, is a wrapper for 'kubectl exec'.
Once in a while, the tasks seem to hang after the stream call that sends the command to the other task/pod.
The call never returns, so the task hangs forever while the underlying pod appears to be running without any error.
We don't have a proper fix; we have to delete the pod and re-create it completely, after which the task usually terminates successfully.
Is it a problematic practice to regularly run commands inside a pod with kubectl exec as part of an application's logic? Can we expect this interface to be stable?
[1] https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CoreV1Api.md#connect_get_namespaced_pod_exec
What you expected to happen:
Most of the time, the websocket calls return, which is what we expect. Currently, even if the underlying socket is being terminated or the connection is timing out (I don't know whether that is the case), the call itself never returns.
How to reproduce it (as minimally and precisely as possible):
We don’t have a systematic way to reproduce the issue. We have Python tasks that do the following:
from kubernetes.client.apis import core_v1_api
from kubernetes.stream import stream

def some_method_in_some_class(self, cmd: str):
    self.api = core_v1_api.CoreV1Api()
    […]
    # The call below hangs sometimes
    return_value = stream(
        self.api.connect_get_namespaced_pod_exec,
        self.web_pod_name,
        self.namespace,
        # cmd is a string, so split it into argv tokens
        command=['airflow'] + cmd.split(),
        container='web',
        # stdin/stdout take booleans, not the strings 'True'
        stdin=True,
        stdout=True)
    […]
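One way to guard against the hang is to ask stream() for the raw websocket client (`_preload_content=False`) and pump it with bounded reads. `is_open()`, `update(timeout=)`, `peek_stdout()`, `read_stdout()`, and `close()` are real WSClient methods in the Python client; the `drain_exec_output` helper and its idle-poll cutoff are my own sketch, not the client's API:

```python
def drain_exec_output(resp, poll_timeout=1.0, max_idle_polls=60):
    """Read stdout from an exec websocket without blocking forever.

    `resp` is what stream(..., _preload_content=False) returns: a
    WSClient exposing is_open(), update(timeout=), peek_stdout(),
    read_stdout(), and close(). If max_idle_polls consecutive polls
    yield no data, give up and close instead of hanging.
    """
    chunks = []
    idle = 0
    while resp.is_open() and idle < max_idle_polls:
        resp.update(timeout=poll_timeout)  # pump the websocket once
        if resp.peek_stdout():
            chunks.append(resp.read_stdout())
            idle = 0
        else:
            idle += 1
    resp.close()
    return ''.join(chunks)
```

The idle cutoff trades a silent hang for a bounded wait: a stalled connection terminates after at most `max_idle_polls * poll_timeout` seconds.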
Environment:
Kubernetes version (use kubectl version): 1.9.7-gke.5