What happened:
We have a large-scale k8s 1.7 cluster with etcd 3.1.9, and we have been hitting apiserver performance issues recently.
The symptom is that pod operations, including get/list/update, become very slow.
We have many applications running in the cluster that periodically list pods using a REST client with empty ListOptions or GetOptions.
I am not sure why the k8s apiserver is designed so that a List call without resourceVersion bypasses its cache. Even `kubectl get po -v 10` sends no resourceVersion in the query string, which means it does not use the apiserver cache.
Can anyone explain the reason for this design?
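To make the two request shapes concrete, here is a minimal sketch of how the query string differs. The helper name `podListURL` and the host `apiserver.example:6443` are hypothetical. That `resourceVersion=0` may be answered from the apiserver cache is my reading of the cacher code, not something confirmed in this report.

```go
package main

import (
	"fmt"
	"net/url"
)

// podListURL (hypothetical helper) builds the REST URL for listing pods in a namespace.
// An empty resourceVersion omits the parameter, which is the case this issue describes.
// A non-empty value such as "0" is added to the query string; the assumption here is
// that the apiserver may then serve the list from its cache.
func podListURL(host, namespace, resourceVersion string) string {
	u := url.URL{
		Scheme: "https",
		Host:   host,
		Path:   "/api/v1/namespaces/" + namespace + "/pods",
	}
	if resourceVersion != "" {
		q := url.Values{}
		q.Set("resourceVersion", resourceVersion)
		u.RawQuery = q.Encode()
	}
	return u.String()
}

func main() {
	// Request without resourceVersion, as sent by kubectl and by our clients.
	fmt.Println(podListURL("apiserver.example:6443", "default", ""))
	// Request with resourceVersion=0.
	fmt.Println(podListURL("apiserver.example:6443", "default", "0"))
}
```

The only difference on the wire is the `?resourceVersion=0` suffix, so clients that can tolerate slightly stale data could opt in without other changes.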
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
k8s 1.7 + etcd 3.1.9 + 10000 pods
Issue a large number of concurrent pod list requests without resourceVersion specified.
test 1:
kubemark nodes: ~900
total pods in cluster: ~10000
run `500` threads to list pods (7000+ pods per request, `without resourceVersion`)
result:
time to exec `kubectl get pod -n default`: *20s*
test 2:
kubemark nodes: ~900
total pods in cluster: ~10000
run `2000` threads to list pods (7 pods per request, `without resourceVersion`)
result:
time to exec `kubectl get pod -n default`: *0.2s*
summary:
If there are too many heavy list queries (for example `api/v1/pods`, even with a fieldSelector added), apiserver/etcd cannot tolerate the load.
Anything else we need to know?:
Environment:
Kubernetes version (use kubectl version): 1.7
Cloud provider or hardware configuration:
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others: etcd 3.1.9
@xinxiaogang: There are no sig labels on this issue. Please add a sig label.
A sig label can be added by either:
mentioning a sig: @kubernetes/sig-<group-name>-<group-suffix>
e.g., @kubernetes/sig-contributor-experience-<group-suffix> to notify the contributor experience sig, OR
specifying the label manually: /sig <group-name>
e.g., /sig scalability to apply the sig/scalability label
Note: Method 1 will trigger an email to the group. See the group list.
The <group-suffix> in method 1 has to be replaced with one of these: bugs, feature-requests, pr-reviews, test-failures, proposals
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
From the apiserver code, I see that if the List request contains no `resourceVersion`, it goes directly to etcd storage instead of hitting the cache in apiserver: https://github.com/kubernetes/kubernetes/blob/release-1.7/staging/src/k8s.io/apiserver/pkg/storage/cacher.go#L461