
Default get/list pod request does not hit apiserver cache #61343

Closed
xinxiaogang opened this issue Mar 19, 2018 · 2 comments
Labels: kind/bug, needs-sig

xinxiaogang commented Mar 19, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
We have a large-scale Kubernetes 1.7 cluster with etcd 3.1.9, and we have been hitting an apiserver performance problem recently.
The symptom is that pod operations, including get/list/update, become very slow.
We have many applications running in the cluster that periodically list pods using the REST client with an empty ListOptions or GetOptions.

From the apiserver code, I see that if a List request contains no resourceVersion, it goes directly to etcd storage instead of hitting the cache in the apiserver:
https://github.com/kubernetes/kubernetes/blob/release-1.7/staging/src/k8s.io/apiserver/pkg/storage/cacher.go#L461

I am not sure why the apiserver designed the List call this way. Even with `kubectl get po -v 10`, the request has no resourceVersion in the query string, which means it does not use the apiserver cache.

Can anyone explain the reason for this design?
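
For reference, here is a minimal client-go sketch (not from the issue) contrasting the two kinds of list request; it assumes an in-cluster config and the 1.7-era client-go API, where `List` takes `metav1.ListOptions` directly (newer releases also take a context):

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Empty ListOptions: no resourceVersion on the request, so the apiserver
	// performs a quorum read straight from etcd -- the slow path above.
	pods, err := clientset.CoreV1().Pods("default").List(metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("from etcd:", len(pods.Items))

	// ResourceVersion "0": the apiserver may answer from its watch cache
	// instead of etcd, at the cost of possibly stale results.
	cached, err := clientset.CoreV1().Pods("default").List(metav1.ListOptions{ResourceVersion: "0"})
	if err != nil {
		panic(err)
	}
	fmt.Println("from cache:", len(cached.Items))
}
```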

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):
k8s 1.7 + etcd 3.1.9 + 10000 pods
Issue a large volume of pod list requests without a resourceVersion specified.

test 1:
kubemark nodes: ~900
total pods in cluster: ~10000
run 500 threads listing pods (7000+ pods per request, without a resourceVersion)
result:
time to run `kubectl get pod -n default`: *20s*

test 2:
kubemark nodes: ~900
total pods in cluster: ~10000
run 2000 threads listing pods (7 pods per request, without a resourceVersion)
result:
time to run `kubectl get pod -n default`: *0.2s*

summary:
If there are too many heavy list queries (e.g. `api/v1/pods`, even with a fieldSelector), the apiserver and etcd cannot tolerate the load.
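
A rough load-generator sketch in the spirit of the tests above (`hammerList` is a hypothetical helper; it reuses the clientset and imports from the earlier sketch, plus `sync`):

```go
// Hypothetical load generator: `threads` goroutines each looping over pod
// lists with no resourceVersion, so every request is a quorum read from etcd.
func hammerList(clientset *kubernetes.Clientset, threads int) {
	var wg sync.WaitGroup
	for i := 0; i < threads; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				// Test 1 listed everything; test 2 narrowed each response to
				// a few pods (e.g. via a namespace or fieldSelector).
				if _, err := clientset.CoreV1().Pods("").List(metav1.ListOptions{}); err != nil {
					fmt.Println("list failed:", err)
				}
			}
		}()
	}
	wg.Wait()
}
```

Calling `hammerList(clientset, 500)` versus `hammerList(clientset, 2000)` mirrors the two thread counts above; while it runs, time `kubectl get pod -n default` from another shell.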

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.7
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others: etcd 3.1.9
k8s-ci-robot commented Mar 19, 2018

@xinxiaogang: There are no sig labels on this issue. Please add a sig label.

A sig label can be added by either:

  1. mentioning a sig: @kubernetes/sig-<group-name>-<group-suffix>
    e.g., @kubernetes/sig-contributor-experience-<group-suffix> to notify the contributor experience sig, OR

  2. specifying the label manually: /sig <group-name>
    e.g., /sig scalability to apply the sig/scalability label

Note: Method 1 will trigger an email to the group. See the group list.
The <group-suffix> in method 1 has to be replaced with one of these: bugs, feature-requests, pr-reviews, test-failures, proposals

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the needs-sig and kind/bug labels on Mar 19, 2018

liggitt commented Mar 19, 2018

See #59848 for an example of why this is not safe to do by default

Serving from the cache loses quorum safety, and means you can get stale, in some cases very stale, data back

/close
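
For completeness: a client that can tolerate stale data can opt into the cache explicitly by sending `resourceVersion=0` on the request. A minimal sketch at the REST level, reusing the clientset from the earlier sketch (1.7-era `DoRaw` signature; newer client-go takes a context):

```go
// Explicit cache opt-in at the HTTP level: resourceVersion=0 in the query
// string is what permits the apiserver to serve the list from its watch
// cache, with the staleness caveat described above.
raw, err := clientset.CoreV1().RESTClient().
	Get().
	Namespace("default").
	Resource("pods").
	Param("resourceVersion", "0").
	DoRaw()
if err != nil {
	panic(err)
}
fmt.Println(len(raw), "bytes served from the watch cache")
```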
