Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Put router-ca configmap in openshift-config-managed iff needed #110

Conversation

Miciah
Copy link
Contributor

@Miciah Miciah commented Jan 31, 2019

Put router-ca configmap in openshift-config-managed

  • manifests/00-cluster-role.yaml: Grant the operator create, get, list, watch, and delete access to configmaps.
  • pkg/operator/controller/controller.go (globalMachineSpecifiedConfigNamespace): New constant for the openshift-config-managed namespace.
    (ensureRouterCACertificate): Create the router-ca configmap in openshift-config-managed.
  • test/e2e/operator_test.go (TestRouterCACertificate): Get the router-ca configmap from the openshift-config-managed namespace.

Avoid shadowing the errors package

  • pkg/operator/controller/controller.go (reconcile): Rename errors variable to errs to avoid shadowing the errors package.

Delete unnecessary caching of the CA certificate

  • pkg/operator/controller/controller.go (CACert): Delete.
    (Config): Delete CACert.
    (ensureRouterCACertificate): Delete caching.

Delete router-ca configmap if it is not needed

If there exists no clusteringress that uses the generated default certificate, delete the configmap for the CA certificate.

This change makes it straightforward for other components to incorporate cluster-ingress-operator's CA certificate into their trust bundles exactly when some router has a default certificate signed by the CA certificate: The router-ca configmap will exist in the openshift-config-managed namespace iff the certificate CA should be trusted.

This is a follow-up to #109.

Related to NE-139.

  • pkg/operator/controller/controller.go (reconcile): Use the new shouldPublishRouterCA function to check whether the CA certificate should be published in a configmap, and use the new ensureRouterCAIsPublished method create or update it and the new ensureRouterCAIsUnpublished method to delete it as appropriate.
    (ensureRouterForIngress): Use the new ensureDefaultCertificateDeleted method to delete the operator-generated default certificate if .spec.defaultCertificateSecret is set on the clusteringress.
    (ensureDefaultCertificateForIngress): Delete unnecessary comment. Replace ensureRouterCACertificate with getRouterCA.
    (ensureDefaultCertificateDeleted): New function that deletes the operator-generated default certificate.
    (getRouterCA): New function to get the CA, or create it if it does not already exist, using ensureRouterCACertificate.
    (ensureRouterCACertificate): Rename from this...
    (ensureRouterCACertificateSecret): ...to this. Change from returning the CA itself to returning the secret. Delete unnecessary comment. Do not create the configmap, which is now the responsibility of the reconcile function.
    (shouldPublishRouterCA): New function that returns a Boolean value indicating whether there exists any clusteringress that uses the generated default certificate (in which case the CA certificate that was used to generate them should be published).
    (ensureRouterCAIsPublished): New function that creates or updates the configmap as needed.
    (ensureRouterCAIsUnpublished): New function that deletes the configmap.
  • pkg/operator/controller/controller_test.go (TestShouldPublishRouterCA): New test for shouldPublishRouterCA.
  • test/e2e/operator_test.go (TestClusterIngressUpdate): Make sure that the configmap exists for the default clusteringress, is deleted when the clusteringress has a custom default certificate set, and is recreated when the clusteringress is changed back to using a default certificate generated by the operator.

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 31, 2019
@Miciah Miciah changed the title Delete router-ca configmap if not needed Delete router-ca configmap if not needed Jan 31, 2019
@Miciah
Copy link
Contributor Author

Miciah commented Feb 1, 2019

We got the following failure:

=== RUN   TestRouterCACertificate
--- FAIL: TestRouterCACertificate (1.23s)
	operator_test.go:671: failed to connect to router at abdc9f8d725a111e9860f0e4a9b0e5d5-1964219959.us-east-1.elb.amazonaws.com:443: x509: certificate signed by unknown authority

Reviewing the events and operator logs, I believe the following is the sequence of events that results in the failure:

  1. The operator is running, and it has created the default clusteringress and the default router deployment, which has a volume referencing the router-certs-default secret with the generated default certificate as the volume source.
  2. TestClusterIngressUpdate updates the clusteringress by changing .spec.defaultCertificateSecret from its initial setting.
  3. The operator updates the deployment to use the new secret name as its volume source. Then the operator deletes the router-certs-default secret and the router-ca configmap.
  4. The deployment controller spawns a new pod based on the updated deployment.
  5. TestClusterIngressUpdate updates the clusteringress's .spec.defaultCertificateSecret back to its initial setting.
  6. The operator updates the deployment to use the original secret name, router-certs-default, as its volume source. Then the operator generates a new CA certificate, creates a new router-ca configmap, generates a new default certificate, and creates the router-certs-default secret containing the new certificate.
  7. At this point, the old pod is still present (the deployment controller hasn't deleted it yet), and now the old pod again matches the deployment (it again specifies router-certs-default as the volume source), so the deployment controller deletes the new pod and leaves the old pod running. However, the old pod is still using the old certificate that it read from the old router-certs-default secret.
  8. TestRouterCACertificate reads the router-ca configmap.
  9. TestRouterCACertificate attempts to connect to the router using the CA certificate it reads from router-ca. This fails with "certificate signed by unknown authority" because TestRouterCACertificate is using the new CA certificate, but the old router pod is using a certificate generated using the old CA certificate.

Questions:

  1. Is the deployment controller's behavior correct?
  2. For expediency, should we work around the problem in the test? The problem should only occur when .spec.defaultCertificateSecret is rapidly changed to a new value and then reverted back to its previous value.
  3. How can we eliminate the race condition?
    a. Put a timestamp or fingerprint in the secret's name to ensure that we don't use the same name for two different certificates? The operator would need to generate the certificate before creating the deployment and update the clusteringress to refer to the generated secret name, and we would need to devise some way to determine during reconciliation whether a default certificate had been generated by the operator or by the administrator.
    b. Put a timestamp or fingerprint in an annotation in the pod spec of the deployment to ensure the deployment controller doesn't re-use an old pod when the certificate changes? Similar problems to the previous idea.
    c. Something elegant?

@Miciah Miciah force-pushed the NE-139-delete-router-ca-configmap-if-not-needed branch 11 times, most recently from efe48f9 to 831c67f Compare February 2, 2019 09:08
@Miciah
Copy link
Contributor Author

Miciah commented Feb 2, 2019

/test e2e-aws-operator

@Miciah Miciah force-pushed the NE-139-delete-router-ca-configmap-if-not-needed branch 2 times, most recently from 9da50d0 to 6543496 Compare February 4, 2019 23:55
@Miciah Miciah changed the title Delete router-ca configmap if not needed Delete router-ca configmap if it is not needed Feb 5, 2019
@Miciah
Copy link
Contributor Author

Miciah commented Feb 5, 2019

/test e2e-aws

Copy link
Contributor

@ironcladlou ironcladlou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some stuff I found here that needs looked at... another major thing which I expected to see here is a change to https://github.com/openshift/cluster-ingress-operator/blob/master/pkg/manifests/manifests.go#L216 — as we had discussed, my understanding is that we should not update the clusteringress spec field for the default. Instead, if the clusteringress secret reference is nil, render a default name into the deployment. That way, we can distinguish between our defaulting and user's specification of a secret.

pkg/operator/controller/controller.go Outdated Show resolved Hide resolved
pkg/operator/controller/controller.go Outdated Show resolved Hide resolved
pkg/operator/controller/controller.go Outdated Show resolved Hide resolved
pkg/operator/controller/controller.go Outdated Show resolved Hide resolved
@Miciah Miciah force-pushed the NE-139-delete-router-ca-configmap-if-not-needed branch from 6543496 to f343a2c Compare February 5, 2019 16:52
@Miciah Miciah changed the title Delete router-ca configmap if it is not needed Put router-ca configmap in openshift-config-managed iff needed Feb 5, 2019
@Miciah Miciah force-pushed the NE-139-delete-router-ca-configmap-if-not-needed branch from f343a2c to 840b2ad Compare February 5, 2019 16:59
@Miciah Miciah force-pushed the NE-139-delete-router-ca-configmap-if-not-needed branch from 840b2ad to 544b8dd Compare February 5, 2019 18:40
@Miciah
Copy link
Contributor Author

Miciah commented Feb 5, 2019

Hm, should I move GlobalMachineSpecifiedConfigNamespace to perhaps pkg/operator/config/config.go?

@ironcladlou
Copy link
Contributor

@Miciah

Hm, should I move GlobalMachineSpecifiedConfigNamespace to perhaps pkg/operator/config/config.go?

What about rendering it from ManifestFactory like all the other assets?

@Miciah
Copy link
Contributor Author

Miciah commented Feb 5, 2019

What about rendering it from ManifestFactory like all the other assets?

We're not using the manifest factory to create the configmap or the namespace, so that seems like a strange place. Also, manifests currently just exports constants that no other package uses. It's weird.

@ironcladlou
Copy link
Contributor

/lgtm

@ironcladlou
Copy link
Contributor

/lgtm cancel

@ironcladlou
Copy link
Contributor

/retest

@ironcladlou
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 5, 2019
@Miciah Miciah force-pushed the NE-139-delete-router-ca-configmap-if-not-needed branch from 544b8dd to 7382f5d Compare February 5, 2019 20:21
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 5, 2019
@ironcladlou
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 5, 2019
@ironcladlou
Copy link
Contributor

/retest

* manifests/00-cluster-role.yaml: Grant the operator create, get, list,
watch, and delete access to configmaps.
* pkg/operator/controller/controller.go (GlobalMachineSpecifiedConfigNamespace):
New constant for the "openshift-config-managed" namespace.
(ensureRouterCACertificate): Create the router-ca configmap in
openshift-config-managed.
* test/e2e/operator_test.go (TestRouterCACertificate): Get the router-ca
configmap from the openshift-config-managed namespace.
* pkg/operator/controller/controller.go (reconcile): Rename errors variable
to errs to avoid shadowing the errors package.
* pkg/operator/controller/controller.go (CACert): Delete.
(Config): Delete CACert.
(ensureRouterCACertificate): Delete caching.
If there exists no clusteringress that uses the generated default
certificate, delete the configmap for the CA certificate.

This change makes it straightforward for other components to incorporate
cluster-ingress-operator's CA certificate into their trust bundles exactly
when some router has a default certificate signed by the CA certificate:
The router-ca configmap will exist in the openshift-config-managed
namespace iff the certificate CA should be trusted.

This is a follow-up to commit 228c9b3.

Related to NE-139.

https://jira.coreos.com/browse/NE-139

* pkg/operator/controller/controller.go (reconcile): Use the new
shouldPublishRouterCA function to check whether the CA certificate should
be published in a configmap, and use the new ensureRouterCAIsPublished
method create or update it and the new ensureRouterCAIsUnpublished method
to delete it as appropriate.
(ensureRouterForIngress): Use the new ensureDefaultCertificateDeleted
method to delete the operator-generated default certificate if
.spec.defaultCertificateSecret is set on the clusteringress.
(ensureDefaultCertificateForIngress): Delete unnecessary comment.  Replace
ensureRouterCACertificate with getRouterCA.
(ensureDefaultCertificateDeleted): New function that deletes the
operator-generated default certificate.
(getRouterCA): New function to get the CA, or create it if it does not
already exist, using ensureRouterCACertificate.
(ensureRouterCACertificate): Rename from this...
(ensureRouterCACertificateSecret): ...to this.  Change from returning the
CA itself to returning the secret.  Delete unnecessary comment.  Do not
create the configmap, which is now the responsibility of the reconcile
function.
(shouldPublishRouterCA): New function that returns a Boolean value
indicating whether there exists any clusteringress that uses the generated
default certificate (in which case the CA certificate that was used to
generate them should be published).
(ensureRouterCAIsPublished): New function that creates or updates the
configmap as needed.
(ensureRouterCAIsUnpublished): New function that deletes the configmap.
* pkg/operator/controller/controller_test.go (TestShouldPublishRouterCA):
New test for shouldPublishRouterCA.
* test/e2e/operator_test.go (TestClusterIngressUpdate): Make sure that the
configmap exists for the default clusteringress, is deleted when the
clusteringress has a custom default certificate set, and is recreated when
the clusteringress is changed back to using a default certificate generated
by the operator.
@Miciah Miciah force-pushed the NE-139-delete-router-ca-configmap-if-not-needed branch from 7382f5d to c8d7b5b Compare February 6, 2019 01:12
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 6, 2019
@ironcladlou
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 6, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ironcladlou, Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@enj
Copy link

enj commented Feb 6, 2019

/refresh

@enj
Copy link

enj commented Feb 6, 2019

/test all

@openshift-merge-robot
Copy link
Contributor

/retest

1 similar comment
@openshift-merge-robot
Copy link
Contributor

/retest

@openshift-merge-robot openshift-merge-robot merged commit e751702 into openshift:master Feb 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants