api request hang and stuck #1740

Closed
geotransformer opened this issue Mar 11, 2022 · 11 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@geotransformer

geotransformer commented Mar 11, 2022

What happened (please include outputs or screenshots):
ubuntu@xxxx-control-plane-1:~ kubectl logs -n foo xxx-controller-bc4db5d46-h594h
2022-03-10 23:43:49.065 INFO main: Check Apiserver connection
2022-03-10 23:43:51.946 WARNING urllib3.connectionpool: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f4d5a8d3630>: Failed to establish a new connection: [Errno 113] No route to host',)': /apis/apiextensions.k8s.io/v1/
ubuntu@xxxx-control-plane-1:~

What you expected to happen:
ubuntu@xxxx-control-plane: kubectl logs xxx-controller-5696b686fb-c5tb5 --previous
2022-02-27 19:37:30.000 INFO main: Check Apiserver connection
2022-02-27 19:37:32.251 WARNING urllib3.connectionpool: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7faacf65b6d8>: Failed to establish a new connection: [Errno 113] No route to host',)': /apis/apiextensions.k8s.io/v1/
2022-02-27 19:37:35.323 WARNING urllib3.connectionpool: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7faacf65b7f0>: Failed to establish a new connection: [Errno 113] No route to host',)': /apis/apiextensions.k8s.io/v1/
2022-02-27 19:37:38.395 WARNING urllib3.connectionpool: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7faacf65b898>: Failed to establish a new connection: [Errno 113] No route to host',)': /apis/apiextensions.k8s.io/v1/
2022-02-27 19:37:41.467 ERROR main: Exception when calling ApiextensionsV1Api->get_api_resources: HTTPSConnectionPool(host='10.96.0.1', port=443): Max retries exceeded with url: /apis/apiextensions.k8s.io/v1/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7faacf65b978>: Failed to establish a new connection: [Errno 113] No route to host',))

How to reproduce it (as minimally and precisely as possible):
It happens intermittently and cannot be reproduced manually; it only occurs during automated testing in a multi-node k8s cluster.

Anything else we need to know?:

#!/usr/bin/env python3

import logging

try:
    from kubernetes import client, config, watch
    from kubernetes.client.configuration import Configuration
    from kubernetes.config import kube_config
    import os
    import time
    import sys
except ImportError as e:
    raise ImportError(str(e))

# Module-level logger used by the controller below.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class xxx_controller:
    def __init__(self):
        # One shared ApiClient backing all API groups the controller talks to.
        self.api_client = client.ApiClient()
        self.v1 = client.ApiextensionsV1Api(self.api_client)
        self.api_instance = client.CoreV1Api(self.api_client)
        self.crds = client.CustomObjectsApi(self.api_client)

    def check_apiserver_conn(self):
        try:
            logger.info("Check Apiserver connection")
            api_response = self.v1.get_api_resources()
        except Exception as e:
            logger.error("Exception when calling ApiextensionsV1Api->get_api_resources: %s\n" % e)

def main():
    config.load_incluster_config()
    cObj = xxx_controller()
    cObj.check_apiserver_conn()

if __name__ == "__main__":
    main()
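
Since the call that hangs is self.v1.get_api_resources(), a minimal sketch of one possible mitigation (not part of the original report) is to bound the probe with the generated client's standard _request_timeout kwarg so a silently dropped connection fails fast instead of blocking forever; the timeout values below are illustrative only:

# Hedged sketch: pass (connect, read) timeouts to the probe call so it cannot
# hang indefinitely. Reuses the module-level logger from the snippet above.
def check_apiserver_conn_with_timeout(v1_api, connect_timeout=5, read_timeout=30):
    try:
        logger.info("Check Apiserver connection")
        return v1_api.get_api_resources(
            _request_timeout=(connect_timeout, read_timeout))
    except Exception as e:
        logger.error(
            "Exception when calling ApiextensionsV1Api->get_api_resources: %s", e)
        return None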

Environment:

  • Kubernetes version (kubectl version): 1.21
  • OS (e.g., MacOS 10.13.6): Ubuntu 18.04
  • Python version (python --version) 3.6
  • Python client version (pip list | grep kubernetes)
    root@xxx-bc4db5d46-h594h:/opt/run/server# pip list | grep kube
    kubernetes 21.7.0
    WARNING: You are using pip version 21.0.1; however, version 21.3.1 is available.
    You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
@geotransformer geotransformer added the kind/bug Categorizes issue or PR as related to a bug. label Mar 11, 2022
@roycaihw
Member

It's hard to tell what went wrong from the error message alone; it looks like a network connectivity issue. If you could reproduce the issue manually or provide more details, we may be able to help more.

@aviresonai

We experience a similar (and reproducible) problem with read_namespaced_stateful_set.
This is a regression in release 23.3.0 (22.6.0 works fine).
Our code runs on GKE version 1.20.15-gke.1000 inside a pod and looks like:


k8s.config.load_incluster_config()
api_client = k8s.client.ApiClient()
self.k8sappsclient = k8s.client.AppsV1Api(api_client)
self.k8sappsclient.read_namespaced_stateful_set("mysetname", "my_namepsace")

The call to read_namespaced_stateful_set never returns.

@roycaihw
Member

Does the server respond if you call a different API, or if you use kubectl inside the pod? It would be useful to capture the HTTP requests (using debug logging for this client, and -v=9 for kubectl).
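
For reference, a minimal sketch of one way to turn on HTTP-level debug logging in the Python client (not taken from this thread; it assumes in-cluster config, the generated Configuration's debug flag, and placeholder resource names):

# Sketch: enable request/response logging for the client. The debug flag logs
# via the standard logging module, so a DEBUG-level root logger is needed.
import logging
from kubernetes import client, config

logging.basicConfig(level=logging.DEBUG)

config.load_incluster_config()
cfg = client.Configuration.get_default_copy()
cfg.debug = True
apps = client.AppsV1Api(client.ApiClient(cfg))
apps.read_namespaced_stateful_set("mysetname", "my-namespace")  # placeholder names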

@aviresonai

Does the server respond if you call a different API, or if you use kubectl inside the pod? It would be useful to capture the HTTP requests (using debug logging for this client, and -v=9 for kubectl).

Tried the debug option and it helped me understand that:

  1. the hang was related to our own logging and not to the kubernetes client
  2. the actual issue is that 23.3.0 throws an exception on this API (which we haven't seen on previous versions) - the actual exception stack trace is below

2022-03-31 05:20:40,915 - yowza.yapi.k8s.k8s_cluster_facade(140490894980928) - k8s_cluster_facade.py:544 - INFO - call read_namespaced_stateful_set
Traceback (most recent call last):
File "irocket/pepper/pepper.py", line 320, in
main()
File "irocket/pepper/pepper.py", line 314, in main
pepper = Pepper()
File "irocket/pepper/pepper.py", line 71, in init
self.k8s_cluster_facade.query_stateful_set_replicas(self.mesh_stateful_set_name)
File "/usr/src/app/yapi/k8s/k8s_cluster_facade.py", line 545, in query_stateful_set_replicas
api_response = self.k8sappsclient.read_namespaced_stateful_set(stateful_set, self.namespace)
File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api/apps_v1_api.py", line 7223, in read_namespaced_stateful_set
return self.read_namespaced_stateful_set_with_http_info(name, namespace, **kwargs) # noqa: E501
File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api/apps_v1_api.py", line 7324, in read_namespaced_stateful_set_with_http_info
collection_formats=collection_formats)
File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api_client.py", line 353, in call_api
_preload_content, _request_timeout, _host)
File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api_client.py", line 192, in __call_api
return_data = self.deserialize(response_data, response_type)
File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api_client.py", line 264, in deserialize
return self.__deserialize(data, response_type)
File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api_client.py", line 303, in __deserialize
return self.__deserialize_model(data, klass)
File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api_client.py", line 639, in __deserialize_model
kwargs[attr] = self.__deserialize(value, attr_type)
File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api_client.py", line 303, in __deserialize
return self.__deserialize_model(data, klass)
File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/api_client.py", line 641, in __deserialize_model
instance = klass(**kwargs)
File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/models/v1_stateful_set_status.py", line 79, in init
self.available_replicas = available_replicas
File "/usr/local/lib/python3.7/dist-packages/kubernetes/client/models/v1_stateful_set_status.py", line 119, in available_replicas
raise ValueError("Invalid value for available_replicas, must not be None") # noqa: E501
ValueError: Invalid value for available_replicas, must not be None

@roycaihw
Member

roycaihw commented Apr 1, 2022

Thanks @aviresonai! Interesting, not sure if @geotransformer hit the same issue.

Invalid value for available_replicas, must not be None

That is almost certainly a similar issue to kubernetes-client/gen#52. Reading the k8s API, it looks like the field used to be optional, but in kubernetes/kubernetes#104045 it was changed to be a required field. However the API server may still return a statefulset with this field missing, which fails the openapi-generated client-side validation.

Typically we fix this kind of issue by marking the field optional in the k8s API. Would you mind opening an issue/PR in k8s?
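
Until that spec fix lands, one possible client-side workaround, sketched below with placeholder names and not confirmed in this thread, is to request the raw response with the generated client's _preload_content=False kwarg so the failing model validation is skipped and the JSON is parsed directly:

# Sketch of a possible workaround: fetch the raw HTTP response and parse the
# JSON yourself, bypassing the generated model's client-side validation.
import json
from kubernetes import client, config

config.load_incluster_config()
apps = client.AppsV1Api()

resp = apps.read_namespaced_stateful_set(
    "mysetname", "my-namespace", _preload_content=False)  # placeholder names
status = json.loads(resp.data).get("status", {})
available = status.get("availableReplicas")  # may legitimately be None here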

@aviresonai

Thanks @aviresonai! Interesting, not sure if @geotransformer hit the same issue.

Invalid value for available_replicas, must not be None

That is almost certainly a similar issue to kubernetes-client/gen#52. Reading the k8s API, it looks like the field used to be optional, but in kubernetes/kubernetes#104045 it was changed to be a required field. However the API server may still return a statefulset with this field missing, which fails the openapi-generated client-side validation.

Typically we fix this kind of issue by marking the field optional in the k8s API. Would you mind opening an issue/PR in k8s?

Thanks @roycaihw. I am not sure how to describe this as an API error (the API version we use is older than the client, and at that time this field was not mandatory). Working with older versions of the API seems like a requirement for the client library, not for the API.

@roycaihw
Member

roycaihw commented Apr 4, 2022

The fix will be in the upstream openapi spec, and eventually land in this python client. You're correct that we won't change the server's behavior. Here is an example: kubernetes/kubernetes#64996

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 3, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 2, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
