[e2e test failure] [sig-api-machinery] Aggregator Should be able to support the 1.7 Sample API Server using the current Aggregator #50945
@k8s-merge-robot
Seeing this failure on the release-master-blocking e2e dashboard: https://k8s-testgrid.appspot.com/release-master-blocking#gke (example failure). Looks like an RBAC issue. cc @kubernetes/sig-api-machinery-bugs |
@kubernetes/sig-api-machinery-bugs this is currently blocking the alpha.3 release |
Automatic merge from submit-queue Fixed gke auth update wait condition. Lookup whoami on gke using gcloud auth list. Make sure we do not run the test on any cluster older than 1.7. **What this PR does / why we need it**: Fixes issue with aggregator e2e test on GKE **Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #50945 **Special notes for your reviewer**: There is a TODO, follow up will be provided when the immediate problem is resolved. **Release note**: ```release-note NONE ```
This has started failing again on our GKE test suite https://k8s-testgrid.appspot.com/release-master-blocking#gke
cc @kubernetes/sig-api-machinery-test-failures |
[MILESTONENOTIFIER] Milestone Labels Complete. @k8s-merge-robot Issue label settings: |
/assign @cheftako |
Seems to be flaking now with the error:
|
@cheftako @ericchiang - we need to determine (today if possible) if this is truly release blocking. If so, please add the release-blocker label. And, if not, how do we best continue work on this for 1.8.x/1.9.0? |
This test has two passes and four failures on the same commit. I'm seeing gke-specific authz grants in that test that are incorrect: 3b9485b#diff-c944d1288edcaf37beebab811603bfd8L164 That commit removed the wait for the authz grant to become effective (which can lead to flakes), and granted superuser permissions to all users, which is incorrect and invalidates any other authz-related tests run in parallel with this test. |
cc @kubernetes/sig-auth-test-failures |
Aggregator e2e test is intermittently failing on GKE but not GCE. Adding the following debugging to help trace the issue. Make sure we always use the same REST client. Randomly generate the flunder resource name to detect parallel tests. Print endpoints for sample-system in case there are multiple instances. Print the original and new pods in case the pod has been restarted. Fixed import list. Removed rand seed.
I can add the wait back. Edit: and fix the test granting admin to all authenticated users. |
Actually, after staring at this test for about half an hour, I can't figure out what different users exist or what permissions they're being granted. ClientSet, InternalClientset, and AggregatorClient are all initialized from the same config, so I don't see how one would be able to create an RBAC binding but another would fail later. kubernetes/test/e2e/framework/framework.go Lines 156 to 161 in 6808e80
@cheftako any thoughts here? |
I honestly think the gke-specific BindClusterRole is a red herring. It is needed so the client has permission to perform one of the setup steps (I think it was either to create the wardler cluster role or to bind that role to the anonymous user). Once that setup step is complete we no longer need that cluster role bound, so I don't think it's related. |
The gke authorizer allows the “bind” verb, so the client can create a binding to cluster-admin. It cannot create a role directly unless it has permissions via RBAC. Since we don’t have a way to determine the username associated with
I agree that the point at which the tests are failing indicates that the previous authorization issues are not the cause. |
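For context on the "bind" verb discussion above: the kind of grant involved is a ClusterRoleBinding to an existing ClusterRole such as cluster-admin, which the "bind" verb permits even when the client cannot create new roles directly. A hypothetical sketch of such a binding (the binding name and subject are illustrative, not taken from the test):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: wardler-admin              # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: test-user@example.com     # hypothetical subject
```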
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Debug for issue #50945 Aggregator e2e test is intermittently failing on GKE but not GCE. Adding the following debugging to help trace the issue. Make sure we always use the same REST client. Randomly generate the flunder resource name to detect parallel tests. Print endpoints for sample-system in case there are multiple instances. Print the original and new pods in case the pod has been restarted. **What this PR does / why we need it**: Adds debugging for aggregator e2e test to track down GKE flakiness. **Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #50945 **Special notes for your reviewer**: This is primarily additional debugging information. **Release note**: ```release-note NONE ```
/open |
/reopen |
So, a lot more information to work with now, but the error is still occurring. I am still looking into this. |
@cheftako any update on the investigation? |
https://storage.googleapis.com/k8s-gubernator/triage/index.html?test=aggregator Friendly v1.8 release team ping. This failure still seems to be happening; is this actively being worked on? Does this need to be in the v1.8 milestone? |
Automatic merge from submit-queue (batch tested with PRs 51648, 53030, 53009). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Fixed intermittent e2e aggregator test on GKE. **What this PR does / why we need it**: The issue was caused by another test cleaning up its namespace. This caused the namespace controller to try to clean up that namespace, which involves deleting all flunders under it. However, the sample-apiserver was not honoring the namespace filter, so the flunders for this test would randomly disappear. **Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #50945 **Special notes for your reviewer**: Requires the container image to be rebuilt with this fix in order to work. **Release note**: ```release-note NONE ```
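The root cause described above, a server not honoring the namespace filter on list/delete, can be illustrated with a minimal sketch. This is a hypothetical stand-in, not the actual sample-apiserver storage code: the correct behavior restricts results to the requested namespace, so one namespace's cleanup cannot delete another test's objects.

```go
package main

import "fmt"

// flunder is a stand-in for the sample-apiserver's stored objects.
type flunder struct {
	Namespace, Name string
}

// listFlunders returns only the flunders in the requested namespace; an empty
// namespace means "all namespaces". The bug described above was effectively
// the opposite: the server ignored the namespace, so deleting one test's
// namespace swept away flunders belonging to other tests.
func listFlunders(store []flunder, namespace string) []flunder {
	if namespace == "" {
		return store
	}
	var out []flunder
	for _, f := range store {
		if f.Namespace == namespace {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	store := []flunder{
		{"e2e-test-a", "f1"}, // hypothetical test namespaces
		{"e2e-test-b", "f2"},
	}
	// Cleanup of e2e-test-b must only see f2, leaving f1 untouched.
	fmt.Println(len(listFlunders(store, "e2e-test-b")))
}
```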
…3030-upstream-release-1.8 Automatic merge from submit-queue. Automated cherry pick of #53030. Cherry pick of #53030 on release-1.8. #53030: Fixed intermittent e2e aggregator test on GKE. **What this PR does / why we need it**: The issue was caused by another test cleaning up its namespace. This caused the namespace controller to try to clean up that namespace, which involves deleting all flunders under it. However, the sample-apiserver was not honoring the namespace filter, so the flunders for this test would randomly disappear. Relates to issue #50945 **Special notes for your reviewer**: Requires the container image to be rebuilt with this fix in order to work.
Failure cluster 42229f8b33f735ea0213
Error text:
Failure cluster statistics:
1 tests failed, 11 jobs failed, 241 builds failed.
Failure stats cover the 1-day time range 17 Aug 2017 22:57 UTC to 18 Aug 2017 22:57 UTC.
Top failed tests by jobs failed:
Top failed jobs by builds failed:
Current Status