
Default get/list pod request does not hit apiserver cache #61343

Closed
xinxiaogang opened this issue Mar 19, 2018 · 2 comments
Labels: kind/bug, needs-sig

xinxiaogang commented Mar 19, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
We have a large-scale Kubernetes 1.7 cluster with etcd 3.1.9, and we have been hitting an apiserver performance problem recently.
The symptom is that pod operations, including get/list/update, become very slow.
We have many applications running in the cluster that periodically list pods using the REST client with an empty ListOptions or GetOptions.

From the apiserver code, I see that if a List request contains no resourceVersion, it goes directly to etcd storage instead of hitting the cache in the apiserver:
https://github.com/kubernetes/kubernetes/blob/release-1.7/staging/src/k8s.io/apiserver/pkg/storage/cacher.go#L461

I am not sure why the apiserver designed the List call this way. Even with `kubectl get po -v 10`, the request has no resourceVersion in the query string, which means it does not use the apiserver cache.

Can anyone explain the reason for this design?
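
For reference, here is a minimal client-go sketch (not from the issue) contrasting the two kinds of list request; it assumes an in-cluster config and the 1.7-era client-go API, where `List` takes `metav1.ListOptions` directly (newer releases also take a context):

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Empty ListOptions: no resourceVersion on the request, so the apiserver
	// performs a quorum read straight from etcd -- the slow path above.
	pods, err := clientset.CoreV1().Pods("default").List(metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("from etcd:", len(pods.Items))

	// ResourceVersion "0": the apiserver may answer from its watch cache
	// instead of etcd, at the cost of possibly stale results.
	cached, err := clientset.CoreV1().Pods("default").List(metav1.ListOptions{ResourceVersion: "0"})
	if err != nil {
		panic(err)
	}
	fmt.Println("from cache:", len(cached.Items))
}
```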

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):
k8s 1.7 + etcd 3.1.9 + 10000 pods
Issue a large volume of pod list requests without a resourceVersion specified.

test 1:
kubemark nodes: ~900
total pods in cluster: ~10000
run 500 threads listing pods (7000+ pods per request, without a resourceVersion)
result:
time to run `kubectl get pod -n default`: *20s*

test 2:
kubemark nodes: ~900
total pods in cluster: ~10000
run 2000 threads listing pods (7 pods per request, without a resourceVersion)
result:
time to run `kubectl get pod -n default`: *0.2s*

summary:
If there are too many heavy list queries (e.g. `api/v1/pods`, even with a fieldSelector), the apiserver and etcd cannot tolerate the load.
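
A rough load-generator sketch in the spirit of the tests above (`hammerList` is a hypothetical helper; it reuses the clientset and imports from the earlier sketch, plus `sync`):

```go
// Hypothetical load generator: `threads` goroutines each looping over pod
// lists with no resourceVersion, so every request is a quorum read from etcd.
func hammerList(clientset *kubernetes.Clientset, threads int) {
	var wg sync.WaitGroup
	for i := 0; i < threads; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				// Test 1 listed everything; test 2 narrowed each response to
				// a few pods (e.g. via a namespace or fieldSelector).
				if _, err := clientset.CoreV1().Pods("").List(metav1.ListOptions{}); err != nil {
					fmt.Println("list failed:", err)
				}
			}
		}()
	}
	wg.Wait()
}
```

Calling `hammerList(clientset, 500)` versus `hammerList(clientset, 2000)` mirrors the two thread counts above; while it runs, time `kubectl get pod -n default` from another shell.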

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.7
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others: etcd 3.1.9
k8s-ci-robot commented Mar 19, 2018

@xinxiaogang: There are no sig labels on this issue. Please add a sig label.

A sig label can be added by either:

  1. mentioning a sig: @kubernetes/sig-<group-name>-<group-suffix>
    e.g., @kubernetes/sig-contributor-experience-<group-suffix> to notify the contributor experience sig, OR

  2. specifying the label manually: /sig <group-name>
    e.g., /sig scalability to apply the sig/scalability label

Note: Method 1 will trigger an email to the group. See the group list.
The <group-suffix> in method 1 has to be replaced with one of these: bugs, feature-requests, pr-reviews, test-failures, proposals

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the needs-sig and kind/bug labels on Mar 19, 2018

liggitt commented Mar 19, 2018

See #59848 for an example of why this is not safe to do by default

Serving from the cache loses quorum safety, and means you can get stale, in some cases very stale, data back

/close
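
For completeness: a client that can tolerate stale data can opt into the cache explicitly by sending `resourceVersion=0` on the request. A minimal sketch at the REST level, reusing the clientset from the earlier sketch (1.7-era `DoRaw` signature; newer client-go takes a context):

```go
// Explicit cache opt-in at the HTTP level: resourceVersion=0 in the query
// string is what permits the apiserver to serve the list from its watch
// cache, with the staleness caveat described above.
raw, err := clientset.CoreV1().RESTClient().
	Get().
	Namespace("default").
	Resource("pods").
	Param("resourceVersion", "0").
	DoRaw()
if err != nil {
	panic(err)
}
fmt.Println(len(raw), "bytes served from the watch cache")
```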
