
SchedulerPredicates [Serial] validates MaxPods limit number of pods that are allowed to run [Slow] {Kubernetes e2e suite} #27529

Closed
k8s-github-robot opened this issue Jun 16, 2016 · 18 comments
Labels
kind/flake Categorizes issue or PR as related to a flaky test. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@k8s-github-robot

https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gke-subnet/3111/

Failed: SchedulerPredicates [Serial] validates MaxPods limit number of pods that are allowed to run [Slow] {Kubernetes e2e suite}

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/scheduler_predicates.go:265
Error waiting for 320 pods to be running - probably a timeout
Expected error:
    <*errors.errorString | 0xc2093ae0d0>: {
        s: "Timeout while waiting for pods with labels \"startPodsID=b242ed80-3391-11e6-9d7b-0242ac110005\" to be running",
    }
    Timeout while waiting for pods with labels "startPodsID=b242ed80-3391-11e6-9d7b-0242ac110005" to be running
not to have occurred
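
For context, the assertion above comes from a helper that waits for every pod carrying the startPodsID label to reach Running before a timeout. A minimal sketch of that kind of wait, written against current client-go rather than the 2016 test code; the helper name and poll interval here are illustrative, not the actual e2e implementation:

```go
package e2eutil

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForLabeledPodsRunning polls until `want` pods matching the label
// selector report phase Running, or the timeout expires with an error
// analogous to the one in this failure.
func waitForLabeledPodsRunning(c kubernetes.Interface, ns, selector string, want int, timeout time.Duration) error {
	return wait.PollImmediate(5*time.Second, timeout, func() (bool, error) {
		pods, err := c.CoreV1().Pods(ns).List(context.TODO(), metav1.ListOptions{LabelSelector: selector})
		if err != nil {
			return false, err // give up on API errors
		}
		running := 0
		for _, p := range pods.Items {
			if p.Status.Phase == corev1.PodRunning {
				running++
			}
		}
		return running >= want, nil
	})
}
```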
@k8s-github-robot k8s-github-robot added priority/backlog Higher priority than priority/awaiting-more-evidence. kind/flake Categorizes issue or PR as related to a flaky test. labels Jun 16, 2016
@wojtek-t
Member

@gmarek

@gmarek
Contributor

gmarek commented Jun 16, 2016

It's something we haven't seen for some time. Multiple errors like this one:

Jun 16 00:22:48.037: INFO: At 2016-06-16 00:13:09 -0700 PDT - event for maxp-270: {kubelet gke-auto-subnet-default-pool-6ba261dc-ge68} FailedSync: Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container 5fa941c9f87bc749cd4bea068b4f316fc91d2791e16da94e553f0887aaf19b79: no available IPv4 addresses on this network's address pools: bridge (b09d44b32c266a5c3d8f91e45c1296e80407a4fe6dbe9c3951b0d1e45f49cfe1)\n"

cc @yujuhong
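
The event quoted above is Docker's bridge IPAM running out of addresses in its default pool, so the kubelet's StartContainer call fails and the pod never reaches Running. A hedged sketch of how a triage helper might grep the event stream for that signature; the function name and matching substring are assumptions for illustration:

```go
package e2eutil

import (
	"context"
	"strings"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ipv4ExhaustionEvents returns the events in a namespace whose message
// carries the "no available IPv4 addresses" signature seen in this flake.
func ipv4ExhaustionEvents(c kubernetes.Interface, ns string) ([]corev1.Event, error) {
	events, err := c.CoreV1().Events(ns).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	var hits []corev1.Event
	for _, ev := range events.Items {
		if strings.Contains(ev.Message, "no available IPv4 addresses") {
			hits = append(hits, ev)
		}
	}
	return hits, nil
}
```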

@yujuhong
Contributor

@gmarek what's the version of docker used in this suite?
If it's v1.9, based on the conclusion last time, it's a known issue and we are unlikely to dedicate more time to fix this.
If it's v1.11 (k8s v1.3+), that's the first failure we've seen. It's worth investigating.

@gmarek
Contributor

gmarek commented Jun 16, 2016

Honestly - I have no idea. @spxtr @roberthbailey

@yujuhong
Contributor

The upstream project of the suite is kubernetes-build-1.2, so I assume Docker is v1.9.

@yujuhong
Contributor

/cc @dchen1107 just FYI.

I am closing this issue since this happens rarely even in v1.9, and we've not seen any failure for v1.11.

@gmarek
Contributor

gmarek commented Jun 16, 2016

@fejta @lavalamp - is there a way to prevent merge bot from reopening this issue?

@lavalamp
Member

Yes. Make it not flaky.


@gmarek
Contributor

gmarek commented Jun 17, 2016

Well - I can't really fix Docker v1.9, so the only way I can deflake it is to disable it.

@lavalamp
Member

Feel free to disable the test if it sees that it's running docker 1.9.
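
The runtime version is visible in each node's status as ContainerRuntimeVersion (e.g. "docker://1.9.1"), so a version-gated skip is straightforward. A minimal sketch, assuming a plain testing.T harness rather than the e2e framework's own skip helpers:

```go
package e2eutil

import (
	"context"
	"strings"
	"testing"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// skipIfDocker19 skips the calling test when any node reports a Docker
// 1.9.x runtime, since the bridge IPAM bug makes pod-dense tests flaky.
func skipIfDocker19(t *testing.T, c kubernetes.Interface) {
	nodes, err := c.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		t.Fatalf("listing nodes: %v", err)
	}
	for _, n := range nodes.Items {
		// NodeInfo reports the runtime as e.g. "docker://1.9.1".
		if rv := n.Status.NodeInfo.ContainerRuntimeVersion; strings.HasPrefix(rv, "docker://1.9") {
			t.Skipf("node %s runs %s; skipping due to the bridge IPAM bug", n.Name, rv)
		}
	}
}
```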


@gmarek
Contributor

gmarek commented Jun 17, 2016

The problem is that this test is just more sensitive to this error - it can appear in any test.

@yujuhong
Contributor

From my past experience, tests that create a huge number of pods on a single node are prone to hit this bug. However, the workaround (i.e. always moving /lib/docker/network aside before restarting docker) has reduced the frequency a lot. How often do we see this bug (@gmarek, maybe you know more)? Is it rare enough that we can ignore it for now?
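
A hedged sketch of that workaround as a node-side helper; the state directory (typically /var/lib/docker/network on stock installs) and the use of systemctl are assumptions, not taken from this thread:

```go
package nodeutil

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

// resetDockerNetworkState moves Docker's persisted network state aside
// and restarts the daemon so the bridge IPAM pool is rebuilt from scratch.
func resetDockerNetworkState() error {
	const stateDir = "/var/lib/docker/network" // assumed path; the comment above writes /lib/docker/network
	backup := fmt.Sprintf("%s.bak-%d", stateDir, time.Now().Unix())
	if err := os.Rename(stateDir, backup); err != nil && !os.IsNotExist(err) {
		return fmt.Errorf("moving %s aside: %v", stateDir, err)
	}
	// Restart Docker so it recreates the bridge network with a fresh pool.
	if out, err := exec.Command("systemctl", "restart", "docker").CombinedOutput(); err != nil {
		return fmt.Errorf("restarting docker: %v (%s)", err, out)
	}
	return nil
}
```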

@lavalamp
Member

That impacts our submit queue. Turning off the issue filing wouldn't solve the problem. The whole point is for the issues to annoy humans into fixing things... If there's a docker version that fixes this, then why aren't we running with it?


@yujuhong
Contributor

yujuhong commented Jun 17, 2016

We do run a newer version of docker in v1.3 and HEAD. This suite uses 1.2 builds, so it shouldn't affect the submit queue at all.

@lavalamp
Member

ah, it's from a different suite. We can take the suite out of the list.

@lavalamp
Member

But then why do we even have that suite?

@yujuhong
Contributor

There are quite a few GKE suites which are triggered by kubernetes-build-1.2, but their names don't have "1.2" in them. I have no idea what their purpose is, or why this test even runs in the "subnet" suite.

@roberthbailey
Contributor

Many GKE suites are running against the 1.2 release branch since that's the stable branch that is currently supported in production. Why are gke-test and gke-staging tests in the non-blocking section of the submit queue? They shouldn't have any relation to whether or not we should be merging changes to head, since they are pinned at a specific k8s version.
