deflake kube-root-ca.crt e2e test #96274

Closed
wants to merge 1 commit into from

Conversation

Contributor

@zshihang zshihang commented Nov 5, 2020

What type of PR is this?

/kind flake

What this PR does / why we need it:

The kube-root-ca.crt e2e test is flaky in https://k8s-testgrid.appspot.com/sig-auth-gce#gce

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot added labels release-note-none, kind/flake, size/S, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, needs-priority Nov 5, 2020
@zshihang
Contributor Author

zshihang commented Nov 5, 2020

/cc @liggitt

/sig auth
/priority important-soon
/triage accepted

@k8s-ci-robot added labels sig/auth, priority/important-soon, triage/accepted, area/test, sig/testing and removed labels do-not-merge/needs-sig, needs-priority, needs-triage Nov 5, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zshihang
To complete the pull request process, please assign tallclair after the PR has been reviewed.
You can assign the PR to them by writing /assign @tallclair in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@liggitt
Member

liggitt commented Nov 5, 2020

The controller is logging many create attempts that are getting 404 errors:

{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"9bbb8140-e6bd-4912-857d-031edc9e1cbe","stage":"ResponseComplete","requestURI":"/api/v1/namespaces/persistent-local-volumes-test-6923/configmaps","verb":"create","user":{"username":"system:serviceaccount:kube-system:root-ca-cert-publisher","uid":"64d531b6-6fc8-4255-9b0a-069317c8a0d0","groups":["system:serviceaccounts","system:serviceaccounts:kube-system","system:authenticated"]},"sourceIPs":["::1"],"userAgent":"kube-controller-manager/v1.20.0 (linux/amd64) kubernetes/ac62c47/system:serviceaccount:kube-system:root-ca-cert-publisher","objectRef":{"resource":"configmaps","namespace":"persistent-local-volumes-test-6923","name":"kube-root-ca.crt","apiVersion":"v1"},"responseStatus":{"metadata":{},"status":"Failure","reason":"NotFound","code":404},"requestReceivedTimestamp":"2020-11-05T16:29:16.609213Z","stageTimestamp":"2020-11-05T16:29:16.663037Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"system:controller:root-ca-cert-publisher\" of ClusterRole \"system:controller:root-ca-cert-publisher\" to ServiceAccount \"root-ca-cert-publisher/kube-system\""}}

The controller does not properly handle the case where a namespace has been deleted, so the failed creates are retried and back up the queue. I think we need this:

diff --git a/pkg/controller/certificates/rootcacertpublisher/publisher.go b/pkg/controller/certificates/rootcacertpublisher/publisher.go
index 34fd3127a5d..ee8925cb3f9 100644
--- a/pkg/controller/certificates/rootcacertpublisher/publisher.go
+++ b/pkg/controller/certificates/rootcacertpublisher/publisher.go
@@ -187,6 +187,10 @@ func (c *Publisher) syncNamespace(ns string) error {
 				"ca.crt": string(c.rootCA),
 			},
 		}, metav1.CreateOptions{})
+		// don't retry a create if the namespace doesn't exist or is terminating
+		if apierrors.IsNotFound(err) || apierrors.HasStatusCause(err, v1.NamespaceTerminatingCause) {
+			return nil
+		}
 		return err
 	case err != nil:
 		return err
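
To illustrate why swallowing these errors keeps the queue from backing up, here is a minimal, self-contained sketch. It does not use the real Kubernetes packages: the sentinel errors, `syncNamespace`, and `drain` below are hypothetical stand-ins (the real code checks `apierrors.IsNotFound` and `apierrors.HasStatusCause`), and the plain slice only approximates a retrying work queue.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for API errors. The real controller would check
// apierrors.IsNotFound(err) and
// apierrors.HasStatusCause(err, v1.NamespaceTerminatingCause) instead.
var (
	errNotFound    = errors.New("namespace not found")
	errTerminating = errors.New("namespace is terminating")
	errTransient   = errors.New("transient server error")
)

// syncNamespace mirrors the shape of the patched create path: errors that
// cannot succeed on retry are swallowed, everything else is returned.
func syncNamespace(create func(ns string) error, ns string) error {
	err := create(ns)
	if errors.Is(err, errNotFound) || errors.Is(err, errTerminating) {
		return nil // nothing to publish into; do not requeue
	}
	return err
}

// drain processes each key, requeueing a key on error up to maxRetries
// times, and returns the total number of create attempts made.
func drain(keys []string, create func(ns string) error, maxRetries int) int {
	attempts := 0
	retries := map[string]int{}
	queue := append([]string(nil), keys...)
	for len(queue) > 0 {
		ns := queue[0]
		queue = queue[1:]
		attempts++
		if err := syncNamespace(create, ns); err != nil && retries[ns] < maxRetries {
			retries[ns]++
			queue = append(queue, ns)
		}
	}
	return attempts
}

func main() {
	deleted := func(string) error { return errNotFound }
	flaky := func(string) error { return errTransient }

	// Deleted namespaces are attempted once each and then dropped.
	fmt.Println("attempts for deleted namespaces:", drain([]string{"a", "b"}, deleted, 3))
	// Transient failures are still retried until the retry budget runs out.
	fmt.Println("attempts for transient failures:", drain([]string{"a", "b"}, flaky, 3))
}
```

In this sketch, keys for deleted namespaces leave the queue after a single attempt, while other errors still go through the normal retry path, which is the behavior the diff above aims for.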

@liggitt
Member

liggitt commented Nov 5, 2020

I don't think this will be needed after #96277, but keep an eye on the testgrid to be sure

@zshihang
Contributor Author

zshihang commented Nov 6, 2020

passing after #96277

@zshihang zshihang closed this Nov 6, 2020
@zshihang zshihang deleted the token branch November 6, 2020 21:32
Labels: area/test, cncf-cla: yes, kind/flake, priority/important-soon, release-note-none, sig/auth, sig/testing, size/S, triage/accepted