[k8s.io] Load capacity [Feature:Performance] should be able to handle 30 pods per node {Kubernetes e2e suite} #31589

Closed
k8s-github-robot opened this issue Aug 28, 2016 · 10 comments
Labels
kind/flake Categorizes issue or PR as related to a flaky test. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@k8s-github-robot

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-kubemark-500-gce/5336/

Failed: [k8s.io] Load capacity [Feature:Performance] should be able to handle 30 pods per node {Kubernetes e2e suite}

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/load.go:197
scaling rc load-medium-rc-7 for the first time
Expected error:
    <*errors.errorString | 0xc837b9b590>: {
        s: "error while scaling RC load-medium-rc-7 to 38 replicas: timed out waiting for \"load-medium-rc-7\" to be synced",
    }
    error while scaling RC load-medium-rc-7 to 38 replicas: timed out waiting for "load-medium-rc-7" to be synced
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/load.go:318

Previous issues for this test: #26544 #26938 #27595 #30146 #30469 #31374 #31427 #31433

@k8s-github-robot k8s-github-robot added priority/backlog Higher priority than priority/awaiting-more-evidence. kind/flake Categorizes issue or PR as related to a flaky test. labels Aug 28, 2016
@k8s-github-robot
Author

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-kubemark-500-gce/5357/

Failed: [k8s.io] Load capacity [Feature:Performance] should be able to handle 30 pods per node {Kubernetes e2e suite}

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/load.go:197
scaling rc load-small-rc-1490 for the first time
Expected error:
    <*errors.errorString | 0xc83499f5f0>: {
        s: "error while scaling RC load-small-rc-1490 to 2 replicas: timed out waiting for the condition",
    }
    error while scaling RC load-small-rc-1490 to 2 replicas: timed out waiting for the condition
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/load.go:318

@wojtek-t
Member

I think there are two different issues here - I think I already debugged the second case.
Basically, looking into the logs, it seems that there are pretty big pauses between the test's calls to the apiserver (my very strong hypothesis is that this is because of an overloaded Jenkins machine where the test is running).
These are the logs from the offending RC update:

I0828 13:16:58.801043    3174 handlers.go:162] GET /api/v1/namespaces/e2e-tests-load-30-nodepods-3-i64c1/replicationcontrollers/load-medium-rc-7: (1.621528ms) 200 [[e2e.test/v1.4.0 (linux/amd64) kubernetes/72fbb51] 104.154.21.165:43674]
I0828 13:17:02.461486    3174 handlers.go:162] PUT /api/v1/namespaces/e2e-tests-load-30-nodepods-3-i64c1/replicationcontrollers/load-medium-rc-7: (16.951018ms) 200 [[e2e.test/v1.4.0 (linux/amd64) kubernetes/72fbb51] 104.154.21.165:43713]
I0828 13:17:06.900362    3174 handlers.go:162] GET /api/v1/namespaces/e2e-tests-load-30-nodepods-3-i64c1/replicationcontrollers/load-medium-rc-7: (1.07016ms) 200 [[e2e.test/v1.4.0 (linux/amd64) kubernetes/72fbb51] 104.154.21.165:43749]
I0828 13:17:06.902427    3174 handlers.go:162] GET /api/v1/watch/namespaces/e2e-tests-load-30-nodepods-3-i64c1/replicationcontrollers?fieldSelector=metadata.name%3Dload-medium-rc-7&resourceVersion=258064: (677.848µs) 200 [[e2e.test/v1.4.0 (linux/amd64) kubernetes/72fbb51] 104.154.21.165:43749]

As a result, we end up with a "too old resource version" error coming from the watch (since it was issued ~5 seconds later than the corresponding get).

However, this test is pretty big, so the solution for this problem is to make the sliding window (the apiserver's watch cache for RCs) larger - we can afford it in large clusters. Will send a PR for it.
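
For illustration, here is a minimal sketch of the get-then-watch pattern that hits this error, written against a current client-go clientset rather than the 2016 e2e code; the namespace and RC name are taken from the logs above, and the kubeconfig loading is an assumption:

package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumed setup: build a clientset from the default kubeconfig.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	ns, name := "e2e-tests-load-30-nodepods-3-i64c1", "load-medium-rc-7"

	// Step 1: GET the RC and remember its resourceVersion.
	rc, err := client.CoreV1().ReplicationControllers(ns).Get(context.TODO(), name, metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}

	// Step 2: WATCH from that resourceVersion. If the client stalls for several
	// seconds between steps 1 and 2 (overloaded machine, throttling), the version
	// can fall out of the apiserver's watch window and the apiserver reports
	// "too old resource version" (typically delivered as an Error event on the watch).
	w, err := client.CoreV1().ReplicationControllers(ns).Watch(context.TODO(), metav1.ListOptions{
		FieldSelector:   fields.OneTermEqualSelector("metadata.name", name).String(),
		ResourceVersion: rc.ResourceVersion,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer w.Stop()
	fmt.Println("watch established at resourceVersion", rc.ResourceVersion)
}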

@wojtek-t
Member

@gmarek ^^

k8s-github-robot pushed a commit that referenced this issue Aug 29, 2016
Automatic merge from submit-queue

Increase cache size for RCs

Ref #31589

[This should also help with failures of kubemark-scale.]
@k8s-github-robot
Author

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-kubemark-500-gce/5384/

Failed: [k8s.io] Load capacity [Feature:Performance] should be able to handle 30 pods per node {Kubernetes e2e suite}

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/load.go:197
scaling rc load-big-rc-10 for the first time
Expected error:
    <*errors.errorString | 0xc83655c6c0>: {
        s: "error while scaling RC load-big-rc-10 to 128 replicas: timed out waiting for the condition",
    }
    error while scaling RC load-big-rc-10 to 128 replicas: timed out waiting for the condition
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/load.go:318

@k8s-github-robot k8s-github-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels Aug 30, 2016
@wojtek-t
Member

We should double-check, but my hypothesis is that it may (similarly to our kubemark-scale failures) be a consequence of an overloaded Jenkins machine.

@wojtek-t
Member

wojtek-t commented Aug 30, 2016

I checked the last failure and it's pretty obvious that the machine where the test is running is overloaded.
We are calling this code:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubectl/scale.go#L164

Get() and Update() are the methods visible in the apiserver logs. Here are the logs (the corresponding get and put operations for one of the RCs):

I0830 02:40:15.818176    3176 handlers.go:162] GET /api/v1/namespaces/e2e-tests-load-30-nodepods-1-w11cf/replicationcontrollers/load-big-rc-10: (806.784µs) 200 [[e2e.test/v1.4.0 (linux/amd64) kubernetes/956501b] 104.154.21.165:51865]
I0830 02:40:29.498850    3176 handlers.go:162] PUT /api/v1/namespaces/e2e-tests-load-30-nodepods-1-w11cf/replicationcontrollers/load-big-rc-10: (1.109854ms) 409 [[e2e.test/v1.4.0 (linux/amd64) kubernetes/956501b] 104.154.21.165:51981]
...
I0830 02:40:47.138619    3176 handlers.go:162] GET /api/v1/namespaces/e2e-tests-load-30-nodepods-1-w11cf/replicationcontrollers/load-big-rc-10: (1.013773ms) 200 [[e2e.test/v1.4.0 (linux/amd64) kubernetes/956501b] 104.154.21.165:52145]
I0830 02:41:08.338827    3176 handlers.go:162] PUT /api/v1/namespaces/e2e-tests-load-30-nodepods-1-w11cf/replicationcontrollers/load-big-rc-10: (1.155109ms) 409 [[e2e.test/v1.4.0 (linux/amd64) kubernetes/956501b] 104.154.21.165:52388]
...
I0830 02:41:30.058385    3176 handlers.go:162] GET /api/v1/namespaces/e2e-tests-load-30-nodepods-1-w11cf/replicationcontrollers/load-big-rc-10: (750.595µs) 200 [[e2e.test/v1.4.0 (linux/amd64) kubernetes/956501b] 104.154.21.165:52605]
I0830 02:41:49.539039    3176 handlers.go:162] PUT /api/v1/namespaces/e2e-tests-load-30-nodepods-1-w11cf/replicationcontrollers/load-big-rc-10: (1.312908ms) 409 [[e2e.test/v1.4.0 (linux/amd64) kubernetes/956501b] 104.154.21.165:52865]

As you can see, there are breaks of up to ~20s between consecutive calls, even though each request itself completes in about a millisecond.
@fejta @ixdy - FYI
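
For context, a simplified sketch (not the exact pkg/kubectl/scale.go code) of the kind of get/update retry loop that produces the GET / PUT 409 pairs in the logs above; it is written against a current client-go clientset, and the interval and timeout values are illustrative:

package scalesketch

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// scaleRC retries GET -> mutate spec.replicas -> PUT until the update succeeds.
// A 409 Conflict (as seen in the logs) just means the resourceVersion was stale,
// so the loop takes another pass with a fresh GET. If the loop cannot finish
// within the timeout, wait returns "timed out waiting for the condition",
// which is the error shown in the failures above.
func scaleRC(client kubernetes.Interface, ns, name string, replicas int32) error {
	return wait.PollImmediate(100*time.Millisecond, time.Minute, func() (bool, error) {
		rc, err := client.CoreV1().ReplicationControllers(ns).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		rc.Spec.Replicas = &replicas
		_, err = client.CoreV1().ReplicationControllers(ns).Update(context.TODO(), rc, metav1.UpdateOptions{})
		if apierrors.IsConflict(err) {
			return false, nil // stale resourceVersion: retry with a fresh GET
		}
		return err == nil, err
	})
}

With that loop in mind, the failure mode described here is not apiserver latency but the gaps between the client's calls eating up the overall timeout.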

@gmarek
Contributor

gmarek commented Aug 30, 2016

FYI @fejta @ixdy

@k8s-github-robot
Author

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-kubemark-500-gce/5412/

Failed: [k8s.io] Load capacity [Feature:Performance] should be able to handle 30 pods per node {Kubernetes e2e suite}

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/load.go:197
scaling rc load-big-rc-7 for the first time
Expected error:
    <*errors.errorString | 0xc8213e1cd0>: {
        s: "error while scaling RC load-big-rc-7 to 294 replicas: timed out waiting for the condition",
    }
    error while scaling RC load-big-rc-7 to 294 replicas: timed out waiting for the condition
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/load.go:318

@k8s-github-robot k8s-github-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Aug 31, 2016
@wojtek-t
Member

wojtek-t commented Sep 1, 2016

Hmm - we are now running tests on exclusive machines, and the symptoms are pretty much the same.

The new hypothesis is that maybe the client (in the test) is being throttled?

@wojtek-t
Member

wojtek-t commented Sep 1, 2016

Yeah - by running a large kubemark on my own cluster, I confirmed that the problem is actually throttling in the e2e test client.
I will send out a PR to fix this.
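
The throttling in question is client-side rate limiting in the Kubernetes client. As a minimal sketch, assuming a client-go clientset, this is where those limits live; the numbers are illustrative, not the values from the eventual fix:

package clientsketch

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// newLessThrottledClient raises the client-side rate limits so the test client
// itself does not delay requests. client-go's defaults (QPS 5, Burst 10) are
// easily exhausted by a load test scaling hundreds of RCs, which shows up as
// multi-second gaps between calls in the apiserver logs.
func newLessThrottledClient(kubeconfig string) (*kubernetes.Clientset, error) {
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		return nil, err
	}
	config.QPS = 100   // illustrative value
	config.Burst = 200 // illustrative value
	return kubernetes.NewForConfig(config)
}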
