Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release-4.11] OCPBUGS-853: certificate-publisher: Don't publish extraneous certificates #824

Conversation

Miciah
Copy link
Contributor

@Miciah Miciah commented Sep 6, 2022

certificate-publisher: Delete secretIsInUse

Delete the secretIsInUse predicate for the watch on secrets. The predicate is superfluous because the map func maps the event to an empty slice anyway if no ingresscontroller uses the secret.

  • pkg/operator/controller/certificate-publisher/controller.go (New): Delete the predicates on the watch on secrets.
    (secretIsInUse): Delete function.

certificate-publisher: Simplify the map func

Refactor the secretToIngressController map func and the ingressControllersWithSecret helper. The helper is no longer used by any other functions besides secretToIngressController, so the helper can be inlined into secretToIngressController.

Remove selflink from log messages since OpenShift 4.8 turned off selflinks.

  • pkg/operator/controller/certificate-publisher/controller.go (secretToIngressController): Inline ingressControllersWithSecret.
    (ingressControllersWithSecret): Delete function.

certificate: Log reconcile requests.

  • pkg/operator/controller/certificate/controller.go (Reconcile): Log the reconcile request.

Move "default-ingress-cert" configmap publishing

Move the reconciliation of the "default-ingress-cert" configmap from the "certificate" controller to the "certificate-publisher" controller in order to ensure that the configmap gets updated when the default-certificate secret is updated.

Before this change, the certificate-publisher controller already published the "router-certs" secret in the "openshift-config-managed" namespace. The router-certs secret is supposed to have the certificate and key of the ingresscontroller that has the cluster ingress domain, so the controller watches secrets as well as ingresscontrollers in order to update the router-certs secret when the ingresscontroller is updated to reference a different secret or when the content of the referenced secret is updated.

In contrast, the certificate controller was originally written to generate a self-signed CA and generate default certificates for ingresscontrollers using this CA, and for these purposes, the controller only needs to watch ingresscontrollers and not the secrets themselves.

#331 added reconciliation of the default-ingress-cert configmap to the certificate controller. This configmap has the default certificate of the "default" ingresscontroller. The configmap should be updated when the secret reference or content is updated. Moreover, it makes sense conceptually for the certificate-publisher controller to handle both publication tasks: both the router-certs secret as well as the default-ingress-certs configmap.

  • pkg/operator/controller/certificate/controller.go (Reconcile): Move the call to ensureDefaultIngressCertConfigMap from here...
  • pkg/operator/controller/certificate-publisher/controller.go (Reconcile): ...to here.
  • pkg/operator/controller/certificate/publish_ca.go: Rename...
  • pkg/operator/controller/certificate-publisher/publish_ca.go: ...to this.

certificate-publisher: Don't publish extra certs

Publish the default certificate and key of only whichever ingresscontroller has the cluster ingress domain.

Before this change, the certificate-publisher controller published the certificates and keys of all ingresscontrollers, in the "router-certs" secret. However, the only component that uses the router-certs secret is the authentication operator, which only needs the certificate and key for the cluster ingress domain. Moreover, collecting the certificates and keys for all ingresscontrollers can produce a result that exceeds the maximum secret size of 1 mebibyte, causing the certificate-publisher controller to fail to create or update the router-certs secret. This PR changes the certificate-publisher controller not to publish the extraneous certificates and keys.

  • pkg/operator/controller/certificate-publisher/controller.go: Update comments to reflect that the controller only publishes the certificate and key of the ingresscontroller for the cluster ingress domain in the "router-certs" secret.
    (New): Add a new predicate to the watch on ingresscontrollers, using the new hasClusterIngressDomain method and isDefaultIngressController function.
    (secretToIngressController): Skip ingresscontrollers for domains other than the cluster ingress domain.
    (hasClusterIngressDomain): New method. Get the cluster ingress config and return true iff the given ingresscontroller has the same domain.
    (isDefaultIngressController): New function. Return true iff the given ingresscontroller is the "default" ingresscontroller.
    (Reconcile): Get the cluster ingress config, and pass it to ensureRouterCertsGlobalSecret.
  • pkg/operator/controller/certificate-publisher/publish_certs.go (ensureRouterCertsGlobalSecret): Add a parameter for the ingress config, and pass the domain from the ingress config to desiredRouterCertsGlobalSecret.
    (desiredRouterCertsGlobalSecret): Add a parameter for the cluster ingress domain, use it to filter out extraneous ingresscontrollers, and publish the certificate and key for only the ingresscontroller that has the cluster ingress domain.
    (getDefaultCertificateSecretForIngressController): New function used by desiredRouterCertsGlobalSecret.
  • pkg/operator/controller/certificate-publisher/publish_certs_test.go (newSecret): Delete the example PEM data and put the secret's name in the "tls.crt" and "tls.key" data fields of the returned secret so that the data fields differ for different secrets.
    (TestDesiredRouterCertsGlobalSecret): Modify test cases so that they verify that the router-cert secret's data fields have the expected values. Add a "default certificate, explicit" test case. Add "custom certificate" and "missing custom certificate" test cases for the default ingresscontroller with a custom default certificate. Add a "custom ingresscontroller with the cluster ingress domain" test case. Change the "no secrets" test case to use the default ingresscontroller so that the test case would expect a router-cert secret but for the missing secret for the ingresscontroller. Update the "missing secret", "extra secret", and "perfect match" test cases not to expect the router-cert secret to include data for other ingresscontrollers.
  • test/e2e/all_test.go (TestAll): Update the lists of tests.
  • test/e2e/certificate_publisher_test.go: Delete file. This deletes the TestCreateIngressControllerThenSecret and TestCreateSecretThenIngressController tests.
  • test/e2e/operator_test.go (TestUpdateDefaultIngressController): Rename...
    (TestUpdateDefaultIngressControllerSecret): ...to this. Expand the test to verify that the operator updates both the router-certs secret as well as the default-ingress-cert configmap correctly, as well as verifying that the operator does not update the router-certs secret and the configmap if the ingresscontroller is updated to reference a non-existent secret.

TestDesiredRouterCertsGlobalSecret: Use t.Run

  • pkg/operator/controller/certificate-publisher/publish_certs_test.go (TestDesiredRouterCertsGlobalSecret): Use t.Run.

Move IsAdmitted to new util package

Move the isAdmitted function from the ingress controller package to a new util package that can be imported into other controllers.

  • pkg/operator/controller/ingress/controller.go (isAdmitted): Move from here...
  • pkg/util/ingresscontroller/ingresscontroller.go (IsAdmitted): ...to here. New file for IngressController-related util functions.

certificate-publisher: Ignore not-admitted ingresscontrollers

When determining the desired "router-certs" secret, ignore ingresscontrollers that haven't been admitted. Multiple ingresscontrollers can all have the cluster ingress domain, but only one can be admitted at a given time. Any ingresscontrollers that haven't been admitted should be ignored so that the default certificate of any one that has been admitted is published.

  • pkg/operator/controller/certificate-publisher/publish_certs.go (desiredRouterCertsGlobalSecret): Ignore ingresscontrollers that haven't been admitted.
  • pkg/operator/controller/certificate-publisher/publish_certs_test.go (newIngressController): Add an "admitted" parameter for specifying the status of the "Admitted" status condition.
    (TestDesiredRouterCertsGlobalSecret): Add a test case for "default certificate and custom ingresscontroller that conflicts on domain".

@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Sep 6, 2022
@openshift-ci-robot
Copy link
Contributor

@Miciah: This pull request references Jira Issue OCPBUGS-853, which is invalid:

  • expected the bug to target the "4.12.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

certificate-publisher: Delete secretIsInUse

Delete the secretIsInUse predicate for the watch on secrets. The predicate is superfluous because the map func maps the event to an empty slice anyway if no ingresscontroller uses the secret.

  • pkg/operator/controller/certificate-publisher/controller.go (New): Delete the predicates on the watch on secrets.
    (secretIsInUse): Delete function.

certificate-publisher: Simplify the map func

Refactor the secretToIngressController map func and the ingressControllersWithSecret helper. The helper is no longer used by any other functions besides secretToIngressController, so the helper can be inlined into secretToIngressController.

Remove selflink from log messages since OpenShift 4.8 turned off selflinks.

  • pkg/operator/controller/certificate-publisher/controller.go (secretToIngressController): Inline ingressControllersWithSecret.
    (ingressControllersWithSecret): Delete function.

certificate-publisher: Don't publish extraneous certificates

Publish the default certificate and key of only whichever ingresscontroller has the cluster ingress domain

Before this change, the certificate-publisher controller published the certificates and keys of all ingresscontrollers, in the "router-certs" secret. However, the only component that uses the router-certs secret is the authentication operator, which only needs the certificate and key for the cluster ingress domain. Moreover, collecting the certificates and keys for all ingresscontrollers can produce a result that exceeds the maximum secret size of 1 mebibyte, causing the certificate-publisher controller to fail to create or update the router-certs secret. This PR changes the certificate-publisher controller not to publish the extraneous certificates and keys.

  • pkg/operator/controller/certificate-publisher/controller.go: Update comments to reflect that the controller only publishes the certificate and key of the ingresscontroller for the cluster ingress domain.
    (New): Add the new hasClusterIngressDomain predicate to the watch on ingresscontrollers.
    (secretToIngressController): Skip ingresscontrollers for domains other than the cluster ingress domain.
    (hasClusterIngressDomain): New predicate method. Get the cluster ingress config and return true iff the given ingresscontroller has the same domain.
    (Reconcile): Get the cluster ingress config, and pass it to ensureRouterCertsGlobalSecret.
  • pkg/operator/controller/certificate-publisher/publish_certs.go (ensureRouterCertsGlobalSecret): Add a parameter for the ingress config, and pass the domain from the ingress config to desiredRouterCertsGlobalSecret.
    (desiredRouterCertsGlobalSecret): Add a parameter for the cluster ingress domain, use it to filter out extraneous ingresscontrollers, and publish the certificate and key for only the ingresscontroller that has the cluster ingress domain.
  • pkg/operator/controller/certificate-publisher/publish_certs_test.go (newSecret): Delete the example PEM data and put the secret's name in the "tls.crt" and "tls.key" data fields of the returned secret so that the data fields differ for different secrets.
    (TestDesiredRouterCertsGlobalSecret): Modify test cases so that they verify that the router-cert secret's data fields have the expected values. Add a "default certificate, explicit" test case. Add "custom certificate" and "missing custom certificate" test cases for the default ingresscontroller with a custom default certificate. Add a "custom ingresscontroller with the cluster ingress domain" test case. Change the "no secrets" test case to use the default ingresscontroller so that the test case would expect a router-cert secret but for the missing secret for the ingresscontroller. Update the "missing secret", "extra secret", and "perfect match" test cases not to expect the router-cert secret to include data for other ingresscontrollers.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Miciah
Copy link
Contributor Author

Miciah commented Sep 6, 2022

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Sep 6, 2022
@openshift-ci-robot
Copy link
Contributor

@Miciah: This pull request references Jira Issue OCPBUGS-853, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.12.0) matches configured target version for branch (4.12.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Miciah Miciah force-pushed the OCPBUGS-853-certificate-publisher-do-not-publish-extra-certs branch from 7dd8e0f to 1b88f04 Compare September 8, 2022 01:41
@openshift-ci-robot
Copy link
Contributor

@Miciah: This pull request references Jira Issue OCPBUGS-853, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.12.0) matches configured target version for branch (4.12.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

In response to this:

certificate-publisher: Delete secretIsInUse

Delete the secretIsInUse predicate for the watch on secrets. The predicate is superfluous because the map func maps the event to an empty slice anyway if no ingresscontroller uses the secret.

  • pkg/operator/controller/certificate-publisher/controller.go (New): Delete the predicates on the watch on secrets.
    (secretIsInUse): Delete function.

certificate-publisher: Simplify the map func

Refactor the secretToIngressController map func and the ingressControllersWithSecret helper. The helper is no longer used by any other functions besides secretToIngressController, so the helper can be inlined into secretToIngressController.

Remove selflink from log messages since OpenShift 4.8 turned off selflinks.

  • pkg/operator/controller/certificate-publisher/controller.go (secretToIngressController): Inline ingressControllersWithSecret.
    (ingressControllersWithSecret): Delete function.

certificate: Log reconcile requests.

  • pkg/operator/controller/certificate/controller.go (Reconcile): Log the reconcile request.

Move "default-ingress-cert" configmap publishing

Move the reconciliation of the "default-ingress-cert" configmap from the "certificate" controller to the "certificate-publisher" controller in order to ensure that the configmap gets updated when the default-certificate secret is updated.

Before this change, the certificate-publisher controller already published the "router-certs" secret in the "openshift-config-managed" namespace. The router-certs secret is supposed to have the certificate and key of the ingresscontroller that has the cluster ingress domain, so the controller watches secrets as well as ingresscontrollers in order to update the router-certs secret when the ingresscontroller is updated to reference a different secret or when the content of the referenced secret is updated.

In contrast, the certificate controller was originally written to generate a self-signed CA and generate default certificates for ingresscontrollers using this CA, and for these purposes, the controller only needs to watch ingresscontrollers and not the secrets themselves.

#331 added reconciliation of the default-ingress-cert configmap to the certificate controller. This configmap has the default certificate of the "default" ingresscontroller. The configmap should be updated when the secret reference or content is updated. Moreover, it makes sense conceptually for the certificate-publisher controller to handle both publication tasks: both the router-certs secret as well as the default-ingress-certs configmap.

  • pkg/operator/controller/certificate/controller.go (Reconcile): Move the call to ensureDefaultIngressCertConfigMap from here...
  • pkg/operator/controller/certificate-publisher/controller.go (Reconcile): ...to here.
  • pkg/operator/controller/certificate/publish_ca.go: Rename...
  • pkg/operator/controller/certificate-publisher/publish_ca.go: ...to this.

certificate-publisher: Don't publish extra certs

Publish the default certificate and key of only whichever ingresscontroller has the cluster ingress domain.

Before this change, the certificate-publisher controller published the certificates and keys of all ingresscontrollers, in the "router-certs" secret. However, the only component that uses the router-certs secret is the authentication operator, which only needs the certificate and key for the cluster ingress domain. Moreover, collecting the certificates and keys for all ingresscontrollers can produce a result that exceeds the maximum secret size of 1 mebibyte, causing the certificate-publisher controller to fail to create or update the router-certs secret. This PR changes the certificate-publisher controller not to publish the extraneous certificates and keys.

  • pkg/operator/controller/certificate-publisher/controller.go: Update comments to reflect that the controller only publishes the certificate and key of the ingresscontroller for the cluster ingress domain in the "router-certs" secret.
    (New): Add a new predicate to the watch on ingresscontrollers, using the new hasClusterIngressDomain method and isDefaultIngressController function.
    (secretToIngressController): Skip ingresscontrollers for domains other than the cluster ingress domain.
    (hasClusterIngressDomain): New method. Get the cluster ingress config and return true iff the given ingresscontroller has the same domain.
    (isDefaultIngressController): New function. Return true iff the given ingresscontroller is the "default" ingresscontroller.
    (Reconcile): Get the cluster ingress config, and pass it to ensureRouterCertsGlobalSecret.
  • pkg/operator/controller/certificate-publisher/publish_certs.go (ensureRouterCertsGlobalSecret): Add a parameter for the ingress config, and pass the domain from the ingress config to desiredRouterCertsGlobalSecret.
    (desiredRouterCertsGlobalSecret): Add a parameter for the cluster ingress domain, use it to filter out extraneous ingresscontrollers, and publish the certificate and key for only the ingresscontroller that has the cluster ingress domain.
  • pkg/operator/controller/certificate-publisher/publish_certs_test.go (newSecret): Delete the example PEM data and put the secret's name in the "tls.crt" and "tls.key" data fields of the returned secret so that the data fields differ for different secrets.
    (TestDesiredRouterCertsGlobalSecret): Modify test cases so that they verify that the router-cert secret's data fields have the expected values. Add a "default certificate, explicit" test case. Add "custom certificate" and "missing custom certificate" test cases for the default ingresscontroller with a custom default certificate. Add a "custom ingresscontroller with the cluster ingress domain" test case. Change the "no secrets" test case to use the default ingresscontroller so that the test case would expect a router-cert secret but for the missing secret for the ingresscontroller. Update the "missing secret", "extra secret", and "perfect match" test cases not to expect the router-cert secret to include data for other ingresscontrollers.
  • test/e2e/all_test.go (TestAll): Update the lists of tests.
  • test/e2e/certificate_publisher_test.go: Delete file. This deletes the TestCreateIngressControllerThenSecret and TestCreateSecretThenIngressController tests.
  • test/e2e/operator_test.go (TestUpdateDefaultIngressController): Rename...
    (TestUpdateDefaultIngressControllerSecret): ...to this. Expand the test to verify that the operator updates both the router-certs secret as well as the default-ingress-cert configmap correctly, as well as verifying that the operator does not update the router-certs secret and the configmap if the ingresscontroller is updated to reference a non-existent secret.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Miciah
Copy link
Contributor Author

Miciah commented Sep 8, 2022

TestCanaryRoute failed due to pod security. I filed https://issues.redhat.com/browse/OCPBUGS-1049 to track the issue and posted #826 to fix it.

@Miciah
Copy link
Contributor Author

Miciah commented Sep 11, 2022

#826 merged.
/test e2e-aws-operator

@Miciah Miciah force-pushed the OCPBUGS-853-certificate-publisher-do-not-publish-extra-certs branch 2 times, most recently from 87fe114 to 8fd69d3 Compare September 15, 2022 23:51
@candita
Copy link
Contributor

candita commented Sep 28, 2022

/assign @suleymanakbas91
/assign @rfredette

data = bytes.Join([][]byte{
s1.Data["tls.crt"],
s1.Data["tls.key"],
defaultCIExplicitDefaultDefaultCert = newIngressController("default", "router-certs-default", "apps.my.devcluster.openshift.com")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is a typo.

Suggested change
defaultCIExplicitDefaultDefaultCert = newIngressController("default", "router-certs-default", "apps.my.devcluster.openshift.com")
defaultCIExplicitDefaultCert = newIngressController("default", "router-certs-default", "apps.my.devcluster.openshift.com")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually isn't; it's just confusing. * grin *. I've added a comment to explain:

		// defaultCIExplicitDefaultDefaultCert is an ingresscontroller
		// named "default" that specifies an explicit reference for
		// default certificate secret, where that secret is the same
		// default one that the operator generates when none is
		// specified (the "default default certificate").

Is the comment sufficient, or would it still be less confusing if I renamed the variable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is clear now with the comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe renaming the variable would further aid clarity because I first wondered whether the run-on of DefaultDefault was intentional. And CI is Custom Ingress? Or Cluster Ingress? Maybe we could choose names like "production", or "staging", or "dev"...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed the variables: defaultCIWithDefaultCertUnspecified, defaultCIWithDefaultCertSetToDefault, and defaultCIWithDefaultCertSetToCustom.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I suppose while I'm renaming variables, I can get rid of the archaic "CI" initialism... Done.


// Verify that the router-certs secret gets updated.
previousRouterCertsSecret = routerCertsSecret.DeepCopy()
wait.PollImmediate(1*time.Second, 20*time.Second, func() (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timeout values look arbitrary (e.g. 10s, 15s, 20s). Would it make sense to use the same timeout values for consistency or to write a brief explanation of why a particular value is preferred?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was probably some off-the-cuff reasoning about how long it might take the operator to update a resource, or assumptions that if resource A has been updated, then resource B should have been updated too. However, in lieu of detailed explanations, I've updated the polling loops all to use timeout for consistency. This increases the timeouts for some polling loops; at worst, it will take a little longer for the test to fail, and at best, this can make the test less flaky.

Delete the secretIsInUse predicate for the watch on secrets.  The predicate
is superfluous because the map func maps the event to an empty slice anyway
if no ingresscontroller uses the secret.

* pkg/operator/controller/certificate-publisher/controller.go (New): Delete
the predicates on the watch on secrets.
(secretIsInUse): Delete function.
Refactor the secretToIngressController map func and the
ingressControllersWithSecret helper.  The helper is no longer used by any
other functions besides secretToIngressController, so the helper can be
inlined into secretToIngressController.

Remove selflink from log messages since OpenShift 4.8 turned off selflinks.

* pkg/operator/controller/certificate-publisher/controller.go
(secretToIngressController): Inline ingressControllersWithSecret.
(ingressControllersWithSecret): Delete function.
* pkg/operator/controller/certificate/controller.go (Reconcile): Log the
reconcile request.
@Miciah Miciah force-pushed the OCPBUGS-853-certificate-publisher-do-not-publish-extra-certs branch from 8fd69d3 to 6a905c2 Compare October 25, 2022 20:09
@Miciah
Copy link
Contributor Author

Miciah commented Oct 25, 2022

Rebased and addressed Suleyman's comments.

@suleymanakbas91
Copy link
Contributor

/lgtm

Failures look unrelated..
/retest

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 26, 2022
Copy link
Contributor

@gcs278 gcs278 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me

return true
}
ic := o.(*operatorv1.IngressController)
return ic.Status.Domain == ingressConfig.Spec.Domain
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a case insensitive comparison?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When cluster-authentication-operator reads the secret, it uses (effectively) secret.Data[ingressConfig.Spec.Domain]: https://github.com/openshift/cluster-authentication-operator/blob/686e2897e0d6c4a0b2d68ef0cb8e28e56a6d190d/pkg/controllers/routercerts/controller.go#L137-L159
When cluster-ingress-operator publishes the secret, the old logic uses ic.Status.Domain as the key in the secret's data:


So we could normalize the key in the new logic, but not doing so doesn't break anything that would have worked before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, it's fine the way it is. Thanks for looking into this.

for _, ic := range controllers {
log.Info("queueing ingresscontroller", "name", ic.Name, "related", o.GetSelfLink())
for _, ic := range list.Items {
if ic.Status.Domain != ingressConfig.Spec.Domain {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a case insensitive comparison?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The analysis in #824 (comment) applies here. If the ingresscontroller's domain doesn't match the cluster ingress config's domain in a case-sensitive match, the end result is the same with either the old logic or the new logic (namely that cluster-authentication-operator won't find the certificate in the secret).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, sounds good.

var defaultIngressController *operatorv1.IngressController
for i := range controllers.Items {
ic := &controllers.Items[i]
if ic.Namespace == controller.DefaultOperatorNamespace && ic.Name == manifests.DefaultIngressControllerName {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you just reuse isDefaultIngressController(ic) here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out I can! ic satisfies client.Object. I'll update that.

// ingresscontrollers could cause the secret's size to exceed
// the maximum secret size of 1 mebibyte. See
// <https://issues.redhat.com/browse/OCPBUGS-853>.
if ingresses[i].Status.Domain != clusterIngressDomain {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be case insensitive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The analysis in #824 (comment) applies here.

// defaultCertName can be the same as customCertName, so
// both of the following conditions can be true; see
// <https://bugzilla.redhat.com/show_bug.cgi?id=1912922>.
if defaultCertName.Name == secrets[i].Name {
Copy link
Contributor

@gcs278 gcs278 Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've stared at this for a while and still it's not really making sense. We didn't fix https://bugzilla.redhat.com/show_bug.cgi?id=1912922 in this PR right? It was already fixed? How'd did the old code handle this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, 00f17ef fixed https://bugzilla.redhat.com/show_bug.cgi?id=1912922 but didn't require changes in this code to do so. In this code, the old logic was fine, but it is a little subtle, so I figured it would be helpful to add a comment to explain why I wrote the logic this way.

The old logic went something like this:

  1. Build nameToSecret.
  2. For each ingresscontroller,
    a. If ingress.Spec.DefaultCertificate is set, add the secret named ingress.Spec.DefaultCertificate to globalSecret.
    b. Otherwise add the secret named controller.RouterOperatorGeneratedDefaultCertificateSecretName(&ingress, operandNamespace) to globalSecret.
  3. Publish globalSecret.

The new logic, because it only cares about the default ingresscontroller, inverts this logic:

  1. For each secret,
    a. If the secret's name matches controller.RouterOperatorGeneratedDefaultCertificateSecretName(&ingress[i], operandNamespace) where &ingresses[i] is the default ingresscontroller, set cert to the secret.
    b. If the secret's name matches ingresses[i].Spec.DefaultCertificate where &ingresses[i] is the default ingresscontroller, set cert to the secret`.
  2. If cert is set, publish it in globalSecret.

Now suppose controller.RouterOperatorGeneratedDefaultCertificateSecretName(&ingresses[i], operandNamespace) is "router-certs-default" and ingresses[i].Spec.DefaultCertificate is "foo" and that both the "router-certs-default" secret and the "foo" secret exist. Then the following logic could do the wrong thing:

		for i := range secrets {
			if defaultCertName.Name == secrets[i].Name {
				cert = &secrets[i]
			}
			if customCertName != nil && customCertName.Name == secrets[i].Name {
				cert = &secrets[i]
			}
		}

The following test case (added to TestDesiredRouterCertsGlobalSecret) fails with the faulty logic:

		{
			description: "custom certificate, with secrets having the custom one and then default one",
			inputs: testInputs{
				[]operatorv1.IngressController{defaultCICustomDefaultCert},
				[]corev1.Secret{customDefaultCert, defaultCert},
			},
			output: testOutputs{
				&corev1.Secret{
					Data: map[string][]byte{"apps.my.devcluster.openshift.com": customDefaultCertData},
				},
			},
		},

If secrets has "router-certs-default" after "foo", then cert would get assigned the wrong value.

Having worked through this example, I'll add the aforementioned test case to the PR. Does my explanation make the logic clearer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea thanks this helps - it was definitely subtle in the existing code. I think your code is a bit more verbose, which is good.

Test case makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and refactored the tricky logic into a new function, getDefaultCertificateSecretForIngressController, with more comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new variables names makes it MUCH more readable in my opinion, thanks!

Move the reconciliation of the "default-ingress-cert" configmap from the
"certificate" controller to the "certificate-publisher" controller in order
to ensure that the configmap gets updated when the default-certificate
secret is updated.

Before this commit, the certificate-publisher controller already published
the "router-certs" secret in the "openshift-config-managed" namespace.  The
router-certs secret is supposed to have the certificate and key of the
ingresscontroller that has the cluster ingress domain, so the controller
watches secrets as well as ingresscontrollers in order to update the
router-certs secret when the ingresscontroller is updated to reference a
different secret or when the content of the referenced secret is updated.

In contrast, the certificate controller was originally written to generate
a self-signed CA and generate default certificates for ingresscontrollers
using this CA, and for these purposes, the controller only needs to watch
ingresscontrollers and not the secrets themselves.

Commit 9640767 added reconciliation of the
default-ingress-cert configmap to the certificate controller.  This
configmap has the default certificate of the "default" ingresscontroller.
The configmap should be updated when the secret reference or content is
updated.  Moreover, it makes sense conceptually for the
certificate-publisher controller to handle both publication tasks: both the
router-certs secret as well as the default-ingress-certs configmap.

* pkg/operator/controller/certificate/controller.go (Reconcile): Move the
call to ensureDefaultIngressCertConfigMap from here...
* pkg/operator/controller/certificate-publisher/controller.go (Reconcile):
...to here.
* pkg/operator/controller/certificate/publish_ca.go: Rename...
* pkg/operator/controller/certificate-publisher/publish_ca.go: ...to this.
@Miciah Miciah force-pushed the OCPBUGS-853-certificate-publisher-do-not-publish-extra-certs branch from 6a905c2 to 43e9469 Compare October 26, 2022 22:29
@candita
Copy link
Contributor

candita commented Oct 27, 2022

/assign @gcs278

Miciah and others added 2 commits October 27, 2022 10:58
Publish the default certificate and key of only whichever ingresscontroller
has the cluster ingress domain.

Before this commit, the certificate-publisher controller published the
certificates and keys of all ingresscontrollers, in the "router-certs"
secret.  However, the only component that uses the router-certs secret is
the authentication operator, which only needs the certificate and key for
the cluster ingress domain.  Moreover, collecting the certificates and keys
for all ingresscontrollers can produce a result that exceeds the maximum
secret size of 1 mebibyte, causing the certificate-publisher controller to
fail to create or update the router-certs secret.  This commit changes the
certificate-publisher controller not to publish the extraneous certificates
and keys.

This commit fixes OCPBUGS-853.

https://issues.redhat.com/browse/OCPBUGS-853

* pkg/operator/controller/certificate-publisher/controller.go: Update
comments to reflect that the controller only publishes the certificate and
key of the ingresscontroller for the cluster ingress domain in the
"router-certs" secret.
(New): Add a new predicate to the watch on ingresscontrollers, using the
new hasClusterIngressDomain method and isDefaultIngressController function.
(secretToIngressController): Skip ingresscontrollers for domains other than
the cluster ingress domain.
(hasClusterIngressDomain): New method.  Get the cluster ingress config and
return true iff the given ingresscontroller has the same domain.
(isDefaultIngressController): New function.  Return true iff the given
ingresscontroller is the "default" ingresscontroller.
(Reconcile): Get the cluster ingress config, and pass it to
ensureRouterCertsGlobalSecret.
* pkg/operator/controller/certificate-publisher/publish_certs.go
(ensureRouterCertsGlobalSecret): Add a parameter for the ingress config,
and pass the domain from the ingress config to
desiredRouterCertsGlobalSecret.
(desiredRouterCertsGlobalSecret): Add a parameter for the cluster ingress
domain, use it to filter out extraneous ingresscontrollers, and publish the
certificate and key for only the ingresscontroller that has the cluster
ingress domain.
(getDefaultCertificateSecretForIngressController): New function used by
desiredRouterCertsGlobalSecret.
* pkg/operator/controller/certificate-publisher/publish_certs_test.go
(newSecret): Delete the example PEM data and put the secret's name in the
"tls.crt" and "tls.key" data fields of the returned secret so that the data
fields differ for different secrets.
(TestDesiredRouterCertsGlobalSecret): Modify test cases so that they verify
that the router-cert secret's data fields have the expected values.  Add a
"default certificate, explicit" test case.  Add "custom certificate" and
"missing custom certificate" test cases for the default ingresscontroller
with a custom default certificate.  Add a "custom ingresscontroller with
the cluster ingress domain" test case.  Change the "no secrets" test case
to use the default ingresscontroller so that the test case would expect a
router-cert secret but for the missing secret for the ingresscontroller.
Update the "missing secret", "extra secret", and "perfect match" test cases
not to expect the router-cert secret to include data for other
ingresscontrollers.
* test/e2e/all_test.go (TestAll): Update the lists of tests.
* test/e2e/certificate_publisher_test.go: Delete file.  This deletes the
TestCreateIngressControllerThenSecret and
TestCreateSecretThenIngressController tests.
* test/e2e/operator_test.go (TestUpdateDefaultIngressController): Rename...
(TestUpdateDefaultIngressControllerSecret): ...to this.  Expand the test to
verify that the operator updates both the router-certs secret as well as
the default-ingress-cert configmap correctly, as well as verifying that the
operator *does not* update the router-certs secret and the configmap if the
ingresscontroller is updated to reference a non-existent secret.  Update
polling loops to use the same timeout value for consistency.
* pkg/operator/controller/certificate-publisher/publish_certs_test.go
(TestDesiredRouterCertsGlobalSecret): Use t.Run.

Co-authored-by: Andrew McDermott <aim@frobware.com>
@Miciah Miciah force-pushed the OCPBUGS-853-certificate-publisher-do-not-publish-extra-certs branch from 26cc90a to c07c9bb Compare October 27, 2022 14:58
@gcs278
Copy link
Contributor

gcs278 commented Oct 27, 2022

variable naming changes and test updates looks good to me
/lgtm

@Miciah
Copy link
Contributor Author

Miciah commented Oct 27, 2022

/test e2e-azure-operator
/test e2e-gcp-operator

@frobware
Copy link
Contributor

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 27, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: frobware

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 27, 2022
@Miciah
Copy link
Contributor Author

Miciah commented Oct 27, 2022

e2e-azure-operator failed because a bunch of image pulls failed; pods.json shows that the following pods fails to pull their images:

% jq < pods.json -r '.items|.[]|select(.status|.containerStatuses,.initContainerStatuses|.[]?|select(.state.waiting.reason=="ImagePullBackOff"))|.metadata.name'
oauth-openshift-59db778d9c-vt5rl
oauth-openshift-59db778d9c-z4sn6
console-operator-599b69fbd-fdt7v
console-operator-599b69fbd-fdt7v
node-ca-wcnff
certified-operators-5pxv8
community-operators-fpptm
redhat-operators-5d8f7
node-exporter-dml6g
node-exporter-t5cvq

/test e2e-azure-operator

@Miciah
Copy link
Contributor Author

Miciah commented Oct 27, 2022

/test all
because #823 merged.

Move the isAdmitted function from the ingress controller package to a new
util package that can be imported into other controllers.

* pkg/operator/controller/ingress/controller.go (isAdmitted): Move from
here...
* pkg/util/ingresscontroller/ingresscontroller.go (IsAdmitted): ...to here.
New file for IngressController-related util functions.
When determining the desired "router-certs" secret, ignore
ingresscontrollers that haven't been admitted.  Multiple ingresscontrollers
can all have the cluster ingress domain, but only one can be admitted at a
given time.  Any ingresscontrollers that haven't been admitted should be
ignored so that the default certificate of any one that has been admitted
is published.

* pkg/operator/controller/certificate-publisher/publish_certs.go
(desiredRouterCertsGlobalSecret): Ignore ingresscontrollers that haven't
been admitted.
* pkg/operator/controller/certificate-publisher/publish_certs_test.go
(newIngressController): Add an "admitted" parameter for specifying the
status of the "Admitted" status condition.
(TestDesiredRouterCertsGlobalSecret): Add a test case for "default
certificate and custom ingresscontroller that conflicts on domain".
@Miciah
Copy link
Contributor Author

Miciah commented Oct 27, 2022

/test e2e-azure-operator
/test e2e-gcp-operator

@openshift-ci-robot
Copy link
Contributor

@Miciah: This pull request references Jira Issue OCPBUGS-853, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.12.0) matches configured target version for branch (4.12.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ShudiLi

In response to this:

certificate-publisher: Delete secretIsInUse

Delete the secretIsInUse predicate for the watch on secrets. The predicate is superfluous because the map func maps the event to an empty slice anyway if no ingresscontroller uses the secret.

  • pkg/operator/controller/certificate-publisher/controller.go (New): Delete the predicates on the watch on secrets.
    (secretIsInUse): Delete function.

certificate-publisher: Simplify the map func

Refactor the secretToIngressController map func and the ingressControllersWithSecret helper. The helper is no longer used by any other functions besides secretToIngressController, so the helper can be inlined into secretToIngressController.

Remove selflink from log messages since OpenShift 4.8 turned off selflinks.

  • pkg/operator/controller/certificate-publisher/controller.go (secretToIngressController): Inline ingressControllersWithSecret.
    (ingressControllersWithSecret): Delete function.

certificate: Log reconcile requests.

  • pkg/operator/controller/certificate/controller.go (Reconcile): Log the reconcile request.

Move "default-ingress-cert" configmap publishing

Move the reconciliation of the "default-ingress-cert" configmap from the "certificate" controller to the "certificate-publisher" controller in order to ensure that the configmap gets updated when the default-certificate secret is updated.

Before this change, the certificate-publisher controller already published the "router-certs" secret in the "openshift-config-managed" namespace. The router-certs secret is supposed to have the certificate and key of the ingresscontroller that has the cluster ingress domain, so the controller watches secrets as well as ingresscontrollers in order to update the router-certs secret when the ingresscontroller is updated to reference a different secret or when the content of the referenced secret is updated.

In contrast, the certificate controller was originally written to generate a self-signed CA and generate default certificates for ingresscontrollers using this CA, and for these purposes, the controller only needs to watch ingresscontrollers and not the secrets themselves.

#331 added reconciliation of the default-ingress-cert configmap to the certificate controller. This configmap has the default certificate of the "default" ingresscontroller. The configmap should be updated when the secret reference or content is updated. Moreover, it makes sense conceptually for the certificate-publisher controller to handle both publication tasks: both the router-certs secret as well as the default-ingress-certs configmap.

  • pkg/operator/controller/certificate/controller.go (Reconcile): Move the call to ensureDefaultIngressCertConfigMap from here...
  • pkg/operator/controller/certificate-publisher/controller.go (Reconcile): ...to here.
  • pkg/operator/controller/certificate/publish_ca.go: Rename...
  • pkg/operator/controller/certificate-publisher/publish_ca.go: ...to this.

certificate-publisher: Don't publish extra certs

Publish the default certificate and key of only whichever ingresscontroller has the cluster ingress domain.

Before this change, the certificate-publisher controller published the certificates and keys of all ingresscontrollers, in the "router-certs" secret. However, the only component that uses the router-certs secret is the authentication operator, which only needs the certificate and key for the cluster ingress domain. Moreover, collecting the certificates and keys for all ingresscontrollers can produce a result that exceeds the maximum secret size of 1 mebibyte, causing the certificate-publisher controller to fail to create or update the router-certs secret. This PR changes the certificate-publisher controller not to publish the extraneous certificates and keys.

  • pkg/operator/controller/certificate-publisher/controller.go: Update comments to reflect that the controller only publishes the certificate and key of the ingresscontroller for the cluster ingress domain in the "router-certs" secret.
    (New): Add a new predicate to the watch on ingresscontrollers, using the new hasClusterIngressDomain method and isDefaultIngressController function.
    (secretToIngressController): Skip ingresscontrollers for domains other than the cluster ingress domain.
    (hasClusterIngressDomain): New method. Get the cluster ingress config and return true iff the given ingresscontroller has the same domain.
    (isDefaultIngressController): New function. Return true iff the given ingresscontroller is the "default" ingresscontroller.
    (Reconcile): Get the cluster ingress config, and pass it to ensureRouterCertsGlobalSecret.
  • pkg/operator/controller/certificate-publisher/publish_certs.go (ensureRouterCertsGlobalSecret): Add a parameter for the ingress config, and pass the domain from the ingress config to desiredRouterCertsGlobalSecret.
    (desiredRouterCertsGlobalSecret): Add a parameter for the cluster ingress domain, use it to filter out extraneous ingresscontrollers, and publish the certificate and key for only the ingresscontroller that has the cluster ingress domain.
    (getDefaultCertificateSecretForIngressController): New function used by desiredRouterCertsGlobalSecret.
  • pkg/operator/controller/certificate-publisher/publish_certs_test.go (newSecret): Delete the example PEM data and put the secret's name in the "tls.crt" and "tls.key" data fields of the returned secret so that the data fields differ for different secrets.
    (TestDesiredRouterCertsGlobalSecret): Modify test cases so that they verify that the router-cert secret's data fields have the expected values. Add a "default certificate, explicit" test case. Add "custom certificate" and "missing custom certificate" test cases for the default ingresscontroller with a custom default certificate. Add a "custom ingresscontroller with the cluster ingress domain" test case. Change the "no secrets" test case to use the default ingresscontroller so that the test case would expect a router-cert secret but for the missing secret for the ingresscontroller. Update the "missing secret", "extra secret", and "perfect match" test cases not to expect the router-cert secret to include data for other ingresscontrollers.
  • test/e2e/all_test.go (TestAll): Update the lists of tests.
  • test/e2e/certificate_publisher_test.go: Delete file. This deletes the TestCreateIngressControllerThenSecret and TestCreateSecretThenIngressController tests.
  • test/e2e/operator_test.go (TestUpdateDefaultIngressController): Rename...
    (TestUpdateDefaultIngressControllerSecret): ...to this. Expand the test to verify that the operator updates both the router-certs secret as well as the default-ingress-cert configmap correctly, as well as verifying that the operator does not update the router-certs secret and the configmap if the ingresscontroller is updated to reference a non-existent secret.

TestDesiredRouterCertsGlobalSecret: Use t.Run

  • pkg/operator/controller/certificate-publisher/publish_certs_test.go (TestDesiredRouterCertsGlobalSecret): Use t.Run.

Move IsAdmitted to new util package

Move the isAdmitted function from the ingress controller package to a new util package that can be imported into other controllers.

  • pkg/operator/controller/ingress/controller.go (isAdmitted): Move from here...
  • pkg/util/ingresscontroller/ingresscontroller.go (IsAdmitted): ...to here. New file for IngressController-related util functions.

certificate-publisher: Ignore not-admitted ingresscontrollers

When determining the desired "router-certs" secret, ignore ingresscontrollers that haven't been admitted. Multiple ingresscontrollers can all have the cluster ingress domain, but only one can be admitted at a given time. Any ingresscontrollers that haven't been admitted should be ignored so that the default certificate of any one that has been admitted is published.

  • pkg/operator/controller/certificate-publisher/publish_certs.go (desiredRouterCertsGlobalSecret): Ignore ingresscontrollers that haven't been admitted.
  • pkg/operator/controller/certificate-publisher/publish_certs_test.go (newIngressController): Add an "admitted" parameter for specifying the status of the "Admitted" status condition.
    (TestDesiredRouterCertsGlobalSecret): Add a test case for "default certificate and custom ingresscontroller that conflicts on domain".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@gcs278
Copy link
Contributor

gcs278 commented Oct 27, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 27, 2022
@Miciah
Copy link
Contributor Author

Miciah commented Oct 27, 2022

e2e-aws-ovn-single-node failed because [sig-arch][Late] operators should not create watch channels very often [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel] failed:

{  fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:446]: Expected
    <[]string | len:1, cap:1>: [
        "Operator \"kube-controller-manager-operator\" produces more watch requests than expected: watchrequestcount=294, upperbound=290, ratio=1.013793103448276",
    ]
to be empty

@Miciah
Copy link
Contributor Author

Miciah commented Oct 27, 2022

e2e-aws-operator failed because kube-apiserver reported NodeInstallerProgressing, must-gather failed, and deprovisioning failed. However, TestUpdateDefaultIngressControllerSecret passed in e2e-aws-operator, e2e-azure-operator, and e2e-gcp-operator.

@Miciah
Copy link
Contributor Author

Miciah commented Oct 27, 2022

/override ci/prow/e2e-aws-operator
because e2e-azure-operator and e2e-gcp-operator passed and e2e-aws-operator failed on known issues unrelated to the changes in this PR.
/override ci/prow/e2e-aws-ovn-single-node
because it failed on issues with kube-controller-manager-operator that are unrelated to the changes in this PR.
/override ci/prow/e2e-gcp-ovn-serial
because it failed on disruption issues that are unrelated to the changes in this PR.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 27, 2022

@Miciah: Overrode contexts on behalf of Miciah: ci/prow/e2e-aws-operator, ci/prow/e2e-aws-ovn-single-node, ci/prow/e2e-gcp-ovn-serial

In response to this:

/override ci/prow/e2e-aws-operator
because e2e-azure-operator and e2e-gcp-operator passed and e2e-aws-operator failed on known issues unrelated to the changes in this PR.
/override ci/prow/e2e-aws-ovn-single-node
because it failed on issues with kube-controller-manager-operator that are unrelated to the changes in this PR.
/override ci/prow/e2e-gcp-ovn-serial
because it failed on disruption issues that are unrelated to the changes in this PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 27, 2022

@Miciah: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@Miciah
Copy link
Contributor Author

Miciah commented Oct 27, 2022

/tide refresh

@openshift-merge-robot openshift-merge-robot merged commit fb717d8 into openshift:master Oct 27, 2022
@openshift-ci-robot
Copy link
Contributor

@Miciah: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-853 has been moved to the MODIFIED state.

In response to this:

certificate-publisher: Delete secretIsInUse

Delete the secretIsInUse predicate for the watch on secrets. The predicate is superfluous because the map func maps the event to an empty slice anyway if no ingresscontroller uses the secret.

  • pkg/operator/controller/certificate-publisher/controller.go (New): Delete the predicates on the watch on secrets.
    (secretIsInUse): Delete function.

certificate-publisher: Simplify the map func

Refactor the secretToIngressController map func and the ingressControllersWithSecret helper. The helper is no longer used by any other functions besides secretToIngressController, so the helper can be inlined into secretToIngressController.

Remove selflink from log messages since OpenShift 4.8 turned off selflinks.

  • pkg/operator/controller/certificate-publisher/controller.go (secretToIngressController): Inline ingressControllersWithSecret.
    (ingressControllersWithSecret): Delete function.

certificate: Log reconcile requests.

  • pkg/operator/controller/certificate/controller.go (Reconcile): Log the reconcile request.

Move "default-ingress-cert" configmap publishing

Move the reconciliation of the "default-ingress-cert" configmap from the "certificate" controller to the "certificate-publisher" controller in order to ensure that the configmap gets updated when the default-certificate secret is updated.

Before this change, the certificate-publisher controller already published the "router-certs" secret in the "openshift-config-managed" namespace. The router-certs secret is supposed to have the certificate and key of the ingresscontroller that has the cluster ingress domain, so the controller watches secrets as well as ingresscontrollers in order to update the router-certs secret when the ingresscontroller is updated to reference a different secret or when the content of the referenced secret is updated.

In contrast, the certificate controller was originally written to generate a self-signed CA and generate default certificates for ingresscontrollers using this CA, and for these purposes, the controller only needs to watch ingresscontrollers and not the secrets themselves.

#331 added reconciliation of the default-ingress-cert configmap to the certificate controller. This configmap has the default certificate of the "default" ingresscontroller. The configmap should be updated when the secret reference or content is updated. Moreover, it makes sense conceptually for the certificate-publisher controller to handle both publication tasks: both the router-certs secret as well as the default-ingress-certs configmap.

  • pkg/operator/controller/certificate/controller.go (Reconcile): Move the call to ensureDefaultIngressCertConfigMap from here...
  • pkg/operator/controller/certificate-publisher/controller.go (Reconcile): ...to here.
  • pkg/operator/controller/certificate/publish_ca.go: Rename...
  • pkg/operator/controller/certificate-publisher/publish_ca.go: ...to this.

certificate-publisher: Don't publish extra certs

Publish the default certificate and key of only whichever ingresscontroller has the cluster ingress domain.

Before this change, the certificate-publisher controller published the certificates and keys of all ingresscontrollers, in the "router-certs" secret. However, the only component that uses the router-certs secret is the authentication operator, which only needs the certificate and key for the cluster ingress domain. Moreover, collecting the certificates and keys for all ingresscontrollers can produce a result that exceeds the maximum secret size of 1 mebibyte, causing the certificate-publisher controller to fail to create or update the router-certs secret. This PR changes the certificate-publisher controller not to publish the extraneous certificates and keys.

  • pkg/operator/controller/certificate-publisher/controller.go: Update comments to reflect that the controller only publishes the certificate and key of the ingresscontroller for the cluster ingress domain in the "router-certs" secret.
    (New): Add a new predicate to the watch on ingresscontrollers, using the new hasClusterIngressDomain method and isDefaultIngressController function.
    (secretToIngressController): Skip ingresscontrollers for domains other than the cluster ingress domain.
    (hasClusterIngressDomain): New method. Get the cluster ingress config and return true iff the given ingresscontroller has the same domain.
    (isDefaultIngressController): New function. Return true iff the given ingresscontroller is the "default" ingresscontroller.
    (Reconcile): Get the cluster ingress config, and pass it to ensureRouterCertsGlobalSecret.
  • pkg/operator/controller/certificate-publisher/publish_certs.go (ensureRouterCertsGlobalSecret): Add a parameter for the ingress config, and pass the domain from the ingress config to desiredRouterCertsGlobalSecret.
    (desiredRouterCertsGlobalSecret): Add a parameter for the cluster ingress domain, use it to filter out extraneous ingresscontrollers, and publish the certificate and key for only the ingresscontroller that has the cluster ingress domain.
    (getDefaultCertificateSecretForIngressController): New function used by desiredRouterCertsGlobalSecret.
  • pkg/operator/controller/certificate-publisher/publish_certs_test.go (newSecret): Delete the example PEM data and put the secret's name in the "tls.crt" and "tls.key" data fields of the returned secret so that the data fields differ for different secrets.
    (TestDesiredRouterCertsGlobalSecret): Modify test cases so that they verify that the router-cert secret's data fields have the expected values. Add a "default certificate, explicit" test case. Add "custom certificate" and "missing custom certificate" test cases for the default ingresscontroller with a custom default certificate. Add a "custom ingresscontroller with the cluster ingress domain" test case. Change the "no secrets" test case to use the default ingresscontroller so that the test case would expect a router-cert secret but for the missing secret for the ingresscontroller. Update the "missing secret", "extra secret", and "perfect match" test cases not to expect the router-cert secret to include data for other ingresscontrollers.
  • test/e2e/all_test.go (TestAll): Update the lists of tests.
  • test/e2e/certificate_publisher_test.go: Delete file. This deletes the TestCreateIngressControllerThenSecret and TestCreateSecretThenIngressController tests.
  • test/e2e/operator_test.go (TestUpdateDefaultIngressController): Rename...
    (TestUpdateDefaultIngressControllerSecret): ...to this. Expand the test to verify that the operator updates both the router-certs secret as well as the default-ingress-cert configmap correctly, as well as verifying that the operator does not update the router-certs secret and the configmap if the ingresscontroller is updated to reference a non-existent secret.

TestDesiredRouterCertsGlobalSecret: Use t.Run

  • pkg/operator/controller/certificate-publisher/publish_certs_test.go (TestDesiredRouterCertsGlobalSecret): Use t.Run.

Move IsAdmitted to new util package

Move the isAdmitted function from the ingress controller package to a new util package that can be imported into other controllers.

  • pkg/operator/controller/ingress/controller.go (isAdmitted): Move from here...
  • pkg/util/ingresscontroller/ingresscontroller.go (IsAdmitted): ...to here. New file for IngressController-related util functions.

certificate-publisher: Ignore not-admitted ingresscontrollers

When determining the desired "router-certs" secret, ignore ingresscontrollers that haven't been admitted. Multiple ingresscontrollers can all have the cluster ingress domain, but only one can be admitted at a given time. Any ingresscontrollers that haven't been admitted should be ignored so that the default certificate of any one that has been admitted is published.

  • pkg/operator/controller/certificate-publisher/publish_certs.go (desiredRouterCertsGlobalSecret): Ignore ingresscontrollers that haven't been admitted.
  • pkg/operator/controller/certificate-publisher/publish_certs_test.go (newIngressController): Add an "admitted" parameter for specifying the status of the "Admitted" status condition.
    (TestDesiredRouterCertsGlobalSecret): Add a test case for "default certificate and custom ingresscontroller that conflicts on domain".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Miciah
Copy link
Contributor Author

Miciah commented Feb 27, 2023

/cherry-pick release-4.11

@openshift-cherrypick-robot

@Miciah: #824 failed to apply on top of branch "release-4.11":

Applying: certificate-publisher: Delete secretIsInUse
Applying: certificate-publisher: Simplify the map func
Applying: certificate: Log reconcile requests
Applying: Move "default-ingress-cert" configmap publishing
Using index info to reconstruct a base tree...
M	pkg/operator/controller/certificate/controller.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/operator/controller/certificate/controller.go
CONFLICT (content): Merge conflict in pkg/operator/controller/certificate/controller.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0004 Move "default-ingress-cert" configmap publishing
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-4.11

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Miciah Miciah changed the title OCPBUGS-853: certificate-publisher: Don't publish extraneous certificates [release-4.11] OCPBUGS-853: certificate-publisher: Don't publish extraneous certificates Feb 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants