Set owner reference to secrets created by webhook cert manager #530

Merged
merged 1 commit into from
Jun 11, 2021

Conversation

thisisnotashwin
Contributor

@thisisnotashwin thisisnotashwin commented Jun 10, 2021

Changes proposed in this PR:

When the certificate secret is created or updated, set an OwnerReference on the secret that points to the webhook-cert-manager deployment. This ensures that deleting the deployment also deletes the secrets. This fixes the race condition we sometimes see when re-installing consul on a cluster that previously had a consul installation deleted from it. The cause was that helm delete did not delete the existing secrets containing the certificates. When the controller was created by a new installation, it mounted the existing (stale) secret, and the secret on disk was rotated before the cert watcher started. The controller then used certificates signed by a CA different from the CA bundle on the MutatingWebhookConfiguration (MWC), which led to x509 errors.

With this change the secrets are deleted every time, so a new secret is always created during a helm install. It also ensures that an existing secret, when updated, gets the owner reference, so helm upgrades or installs onto a cluster with an existing secret behave as desired as well.
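
To illustrate why the owner reference matters, here is a toy model of cascading deletion. It is a sketch under stated assumptions, not how the Kubernetes garbage collector is implemented: `OwnerReference` and `Secret` are reduced stand-ins for the real API types, and `survivors` is a hypothetical helper. It shows that a secret with no owner survives the removal of the deployment (the stale-secret case described above), while an owned secret goes away once its owner is gone.

```go
package main

// OwnerReference is a reduced stand-in for metav1.OwnerReference
// (assumption: only the fields needed for this sketch are modeled).
type OwnerReference struct {
	Kind string
	Name string
	UID  string
}

// Secret is a reduced, hypothetical view of the certificate secret.
type Secret struct {
	Name            string
	OwnerReferences []OwnerReference
}

// survivors returns the secrets that remain when only the owners whose
// UIDs appear in liveUIDs still exist. Secrets without any owner
// reference are never collected; owned secrets are kept only while at
// least one of their owners is still alive.
func survivors(secrets []Secret, liveUIDs map[string]bool) []Secret {
	var kept []Secret
	for _, s := range secrets {
		if len(s.OwnerReferences) == 0 {
			// Unowned: deleting the deployment does not touch it.
			kept = append(kept, s)
			continue
		}
		for _, ref := range s.OwnerReferences {
			if liveUIDs[ref.UID] {
				kept = append(kept, s)
				break
			}
		}
	}
	return kept
}
```

Calling `survivors` with an empty `liveUIDs` map models the state after the deployment has been deleted: only secrets that never had an owner reference are left behind.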

How I've tested this PR: hashicorp/consul-helm#987

How I expect reviewers to test this PR: Code review

Checklist:

  • Tests added
  • CHANGELOG entry added (HashiCorp engineers only, community PRs should not add a changelog entry)

@@ -59,6 +62,10 @@ func (c *Command) init() {
	c.flagSet = flag.NewFlagSet("", flag.ContinueOnError)
	c.flagSet.StringVar(&c.flagConfigFile, "config-file", "",
		"Path to a config file to read webhook configs from. This file must be in JSON format.")
	c.flagSet.StringVar(&c.flagDeploymentName, "deployment-name", "",
		"Name of deployment that is the owner of the secret")
Member

Not clear what "the secret" is. I think we can just say "name of the deployment this pod is running in". What we do with that is an implementation detail.

Contributor Author

done
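
For readers following this thread, a minimal sketch of how a `deployment-name` flag with the suggested help text could be parsed using the standard `flag` package. `parseDeploymentName` is a hypothetical helper, and the empty-value check is an assumption; the diff shown here does not say whether the flag is required.

```go
package main

import (
	"errors"
	"flag"
)

// parseDeploymentName is a hypothetical helper that parses only the
// -deployment-name flag from args.
func parseDeploymentName(args []string) (string, error) {
	fs := flag.NewFlagSet("", flag.ContinueOnError)
	var name string
	fs.StringVar(&name, "deployment-name", "",
		"Name of the deployment this pod is running in.")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	// Assumption: treat a missing value as an error. The PR does not
	// show whether the real command enforces this.
	if name == "" {
		return "", errors.New("-deployment-name must be set")
	}
	return name, nil
}
```

For example, `parseDeploymentName([]string{"-deployment-name", "webhook-cert-manager"})` returns the given name and a nil error, while an empty argument list returns an error.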

@@ -251,6 +281,14 @@ func (c *Command) reconcileCertificates(ctx context.Context, clientset kubernete

	certSecret.Data[corev1.TLSCertKey] = bundle.Cert
	certSecret.Data[corev1.TLSPrivateKeyKey] = bundle.Key
	certSecret.OwnerReferences = []metav1.OwnerReference{
Member

add a comment that this is here to update existing secrets that were created before we added ownerReference?

Contributor Author

done
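
The point raised in this thread, that the update path must also set the owner reference so that secrets created before this change pick one up, can be sketched with an in-memory store. This is an illustrative model only: `reconcileSecret`, the `store` map, and the reduced `Secret` and `OwnerReference` types are hypothetical stand-ins, not the code in this PR. The keys `tls.crt` and `tls.key` are the values of `corev1.TLSCertKey` and `corev1.TLSPrivateKeyKey`.

```go
package main

// OwnerReference is a reduced stand-in for metav1.OwnerReference.
type OwnerReference struct {
	Kind string
	Name string
	UID  string
}

// Secret is a reduced, hypothetical view of the certificate secret.
type Secret struct {
	Name            string
	Data            map[string][]byte
	OwnerReferences []OwnerReference
}

// reconcileSecret creates the secret if it is absent and otherwise
// updates it in place. Both paths set the owner reference, so a secret
// that predates owner references gains one on its next update.
// It reports whether a new secret was created.
func reconcileSecret(store map[string]*Secret, name string, cert, key []byte, owner OwnerReference) bool {
	created := false
	s, ok := store[name]
	if !ok {
		s = &Secret{Name: name}
		store[name] = s
		created = true
	}
	if s.Data == nil {
		s.Data = map[string][]byte{}
	}
	s.Data["tls.crt"] = cert
	s.Data["tls.key"] = key
	// Set on update as well as create, to cover pre-existing secrets.
	s.OwnerReferences = []OwnerReference{owner}
	return created
}
```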

Member

@lkysow lkysow left a comment

LGTM. Did you find out what happens if the cert-manager deployment is deleted?

@thisisnotashwin
Contributor Author

LGTM. Did you find out what happens if the cert-manager deployment is deleted?

Oh, I did! I forgot to put it in here. You were right: the pods just keep the certs in memory and continue to work. It only becomes a problem if the cert expires or the pod has to restart.

Contributor

@kschoche kschoche left a comment

🧀

3 participants