Allow CA issuer secret rotation #2478

Open
jroper opened this issue Dec 18, 2019 · 47 comments
Labels
area/ca Indicates a PR directly modifies the CA Issuer code kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Milestone

Comments

@jroper

jroper commented Dec 18, 2019

Is your feature request related to a problem? Please describe.
The CA issuer will automatically rotate the secrets it generates, but what about rotating its own secret and certificate? What happens when its certificate expires? Anything trusting the secrets it generates will break. And if you give it a new secret? Well, now you have to instantly renew all certificates and make everything pick up those certificates at the same time. There has to be a better way.

Describe the solution you'd like
Provide the ability to configure multiple secrets (or at least one secret plus additional certs). The additional certs are configured in k8s secrets as additional certs that a consumer should trust. Consumers can pick up these additional certs and add them to the list of root CAs they trust. Now it is possible to rotate CA certificates without ever having something not trust a certificate that should be trusted, using the following procedure:

  • Generate the new private key/cert for the CA, and create a new secret for it.
  • Add the secret to the CA issuer's list of additional secrets. The CA issuer should update all secrets it has generated to include this additional CA.
  • Wait long enough for all consumers to pick up the additional CA, so that everything now trusts two different root CA certificates.
  • Update the CA issuer to use the new secret as its primary secret, and move the old secret to its list of additional secrets.
  • Wait for all existing issued certificates to expire or be rotated.
  • Remove the old root CA certificate from the list of additional CA certificates in the issuer.
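
To make the procedure above concrete, here is a sketch of what the Issuer could look like at step 4, when the new CA is primary and the old CA is still trusted. The `additionalSecretNames` field, the issuer name, and the secret names `ca-new` and `ca-old` are all hypothetical: no such field exists in cert-manager, and the snippet only illustrates the shape of the proposal.

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: ca-issuer          # hypothetical name
spec:
  ca:
    # Primary secret: new certificates are signed with this key pair.
    secretName: ca-new     # hypothetical secret name
    # HYPOTHETICAL field, not part of the cert-manager API.
    # The CA certificates in these secrets would not be used for signing;
    # they would only be published as additional trust anchors in the
    # ca.crt of every secret this issuer generates.
    additionalSecretNames:
    - ca-old
```

Step 6 would then amount to removing `ca-old` from that list once every certificate it signed has been rotated.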

Describe alternatives you've considered
I don't know if there are any alternatives - as discussed above, there's no other way to rotate root certificates without risking an outage.

/kind feature

@jetstack-bot jetstack-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 18, 2019
@munnerz
Member

munnerz commented Jan 17, 2020

Instead of the CA issuer generating its own certificate, you should use the self signed issuer type to automatically handle renewing the CA certificate.

That said, the CA issuer should have a mechanism for rotating all leafs under the root, as this is a generally useful thing.

The steps you've suggested do make sense. However, I'm concerned that if cert-manager is too opinionated here, we'll build an API that works for nobody. I'd rather cert-manager provide API primitives that allow different rotation strategies to be built, and then provide a sane default implementation, perhaps like the one you've described.

Right now, trust distribution isn't handled by cert-manager - in some cases, reading the ca.crt from the generated secret is sufficient, but in many other cases the CA may be distributed through some other manual mechanism to end-user systems. If we rush ahead and make assumptions then I fear we'll build something that could cause more harm than good.

@retest-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle stale

@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 16, 2020
@meyskens
Contributor

/remove-lifecycle stale

@jetstack-bot jetstack-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 24, 2020
@meyskens
Contributor

/area ca
/priority important-longterm

@jetstack-bot jetstack-bot added area/ca Indicates a PR directly modifies the CA Issuer code priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Apr 24, 2020
@munnerz munnerz added this to the v0.16 milestone May 11, 2020
@munnerz munnerz modified the milestones: v0.16, Next Jul 8, 2020
@stefansedich

stefansedich commented Nov 23, 2020

We suffer from the same issue. We are using kiam and have set up an issuer with a self-signed CA cert, but when the CA cert renews, the old server and agent certs are no longer valid. This often results in an outage that forces us to remove the agent and server cert secrets so that new ones are generated.

It would be great if there could be a dependency chain or something similar, where we could link the agent/server certs to the CA cert and have them renew when the CA cert does.

@munnerz is that the kind of thing you were thinking? How are people handling this today? We basically have a kiam outage every 3 months, resulting in some downtime and manual intervention, which is becoming painful. Often the CA cert will renew, leaving the agent/server certs invalid until they renew, which may or may not be shortly after the CA cert.

@meyskens
Contributor

This is something we've been planning to do for a while, but it isn't easy to execute properly. It is on our longer-term roadmap; however, PRs are always welcome.

@anguslees

anguslees commented Dec 29, 2020

I think the solution here requires preserving both the old and new CA key for some overlapping period during the renewal. This also means publishing a "caBundle" rather than just a single CA in secret/ca.crt, etc.

  1. t=0: We have only the original CA. Life is simple.
  2. t=$renewalTime: Generate new CA key. We now have two CA keys (old and new), and both are valid. Clients should be given both so they can validate that remote key satisfies either CA.
  3. All new cert signatures should be performed with the "newest" key (or key with latest notAfter?)
  4. Publish both CA keys to all clients. In cert-manager, I think this means including both CA certs in Secret.data ca.crt.
  5. All (valid) certificates signed by original CA key need to be regenerated/re-signed by new CA key. I think this is best handled by making the CA rotation time longer than the leaf cert renewal time and let natural cert rotation do its thing - but it could be handled by explicitly re-signing everything earlier than expected.
  6. t=$notAfter: Original CA key has expired and can be deleted. Remove from ca.crt caBundle. Clients should ignore it anyway, because we're outside notAfter.
  7. Goto 1.

For cert-manager, I think this means we need a way to keep both keys around somehow.
This implies we need a "secret selector" rather than just a single "secretName". Alternatively, I think we only need the old CA public key, so we could just preserve the old CA tls.crt somewhere in the ca/self-sign Issuer itself as a special case, or preserve any existing ca.crts in the target Secret as part of leaf Cert re-generation...

(cainjector should obviously also be updated to inject multiple CA certs.)
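
As an illustration of step 4, a leaf Secret during the overlap period might look roughly like the sketch below, with both CA certificates concatenated in ca.crt. This shows the proposed behaviour, not what cert-manager writes today; the Secret name is hypothetical and the PEM bodies are elided placeholders.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-tls            # hypothetical name
type: kubernetes.io/tls
stringData:
  tls.key: |
    -----BEGIN PRIVATE KEY-----
    (leaf private key, elided)
    -----END PRIVATE KEY-----
  tls.crt: |
    -----BEGIN CERTIFICATE-----
    (leaf certificate, signed with the newest CA key; elided)
    -----END CERTIFICATE-----
  # Proposed: a bundle rather than a single CA certificate.
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    (new CA certificate, elided)
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
    (old CA certificate, kept until its notAfter; elided)
    -----END CERTIFICATE-----
```

A client that loads every certificate in ca.crt as a trust anchor would then accept peers signed by either CA until step 6 removes the old one.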

@dobesv
Contributor

dobesv commented May 11, 2021

For the specific case of setting up an internal CA to use for kubernetes webhooks and API services, tell the CA injector to inject from the issued certificate instead of referencing the certificate issued to the CA.

This works because:

  1. The issued certificate has a copy of the CA certificate from the time the certificate was issued, so it always matches the one referenced in the certificate.
  2. CA injector actually injects the CA certificate (ca.crt) stored with the specified certificate, not the issued certificate itself (contrary to what I had previously thought).

A similar approach may be suitable for some other use cases.
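
A minimal sketch of that setup, assuming a webhook served with a Certificate named `app1-webhook-cert` in namespace `my-namespace` (these names and all the webhook details are placeholders). The `cert-manager.io/inject-ca-from` annotation takes `<namespace>/<Certificate name>`, and cainjector fills in `caBundle` from the CA stored alongside that Certificate.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: app1-webhook
  annotations:
    # Point at the serving (leaf) Certificate, not the CA's own Certificate.
    cert-manager.io/inject-ca-from: my-namespace/app1-webhook-cert
webhooks:
- name: validate.app1.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  clientConfig:
    # caBundle is left out on purpose; cainjector populates it.
    service:
      name: app1-webhook
      namespace: my-namespace
      path: /validate
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["configmaps"]
```

Because the injected CA comes from the same Secret as the serving certificate, the two should change together when the leaf is re-issued.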

@dobesv
Contributor

dobesv commented May 11, 2021

Given the realization shared in my prior comment, you only need to re-issue certificates if clients cannot be told to trust the CA cert in the issued certificates instead of trusting the current CA cert that would be used for issuance.

Not sure if that helps with kiam; I don't know how that is set up.


@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 16, 2021
@jetstack-bot
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle rotten
/remove-lifecycle stale

@jetstack-bot jetstack-bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 16, 2021
@mohag

mohag commented Nov 4, 2021

We have an issue where we have these:

NAME                                               READY SECRET                 AGE
certificate.cert-manager.io/app1-auth-cert         True  app1-auth-tls          7d21h
certificate.cert-manager.io/namespace-root-ca-cert True  namespace-root-ca-cert 7d21h

NAME                                               READY   AGE
issuer.cert-manager.io/namespace-issuer           True    7d21h
issuer.cert-manager.io/namespace-root-ca-issuer   True    7d21h

Details on the resources:

---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: namespace-root-ca-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: namespace-root-ca-cert
spec:
  commonName: namespace-root-ca
  isCA: true
  issuerRef:
    name: namespace-root-ca-issuer
  secretName: namespace-root-ca-cert
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: namespace-issuer
spec:
  ca:
    secretName: namespace-root-ca-cert
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app1-auth-cert
spec:
  commonName: app1.namespace.svc.cluster.local
  dnsNames:
  - app1
  issuerRef:
    kind: Issuer
    name: namespace-issuer
  secretName: app1-auth-tls

The problem is that if namespace-root-ca-cert expires, app1-auth-cert still contains the old, expired ca.crt. Our workaround is to delete the app1-auth-tls secret, which forces it to be regenerated (probably reissued). The certificate in app1-auth-cert is valid, but the CA in its ca.crt is not anymore (it has expired). We had this with cert-manager 0.9.1 and still have it with 1.3.1 (I haven't tried newer versions yet).

Automatically reissuing the certificates issued by a CA when its certificate changes seems like the most obvious solution. I'm not exactly sure how our app validates the cert; the microservices might be using the ca.crt from their own certificate secret. (I only left one app cert in the examples above to keep it shorter.)

Edit: I'll need to double-check exactly where in the app1-auth-tls secret the (expired) CA is included when the cert is still valid but the CA was renewed after the cert. The same goes for the validation. I troubleshot this months ago and hoped the problem would go away when we finally moved away from 0.9.1, but it didn't. (The certificates are somewhat manipulated by the app in an init container; a PFX conversion, IIRC.)

@jetstack-bot jetstack-bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Nov 4, 2021
@diranged

So... I understand the issue, and that it has not yet been resolved. Quick question, though: if we need to make sure that our applications mount the actual certificate authority's ca.crt file, rather than the bundled copy that comes in our application-tls-secret, how are we supposed to get that data safely without granting access to the CA's Secret resource, which also includes the CA private key?

@dobesv
Contributor

dobesv commented Jun 16, 2022

if the renewing CA certificate is re-using the same private key

I guess you're not really rotating the CA key here, though, you're just issuing a new certificate with new expiry dates.

@illeatmyhat

illeatmyhat commented Aug 9, 2022

That said, I do think cert-manager should address this issue so that the certs cannot outlive the CA cert, it seems like a bug to me.

Chiming in to say that exactly this happened to me 3 months after setting up the documented self-signed > CA Issuer > certificate chain, and the solution to this problem is absolutely not obvious.
I'm just gonna kick the can down the road; that seems to be the advertised solution.


@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 7, 2022
@jetstack-bot jetstack-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 30, 2022

@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 28, 2023
@seh
Contributor

seh commented Feb 28, 2023

/remove-lifecycle stale

@jetstack-bot jetstack-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 28, 2023
@Jamstah

Jamstah commented May 18, 2023

I feel like these are related:

Although the suggestion there is to manually copy issued CA certs into the bundle. I feel that an automated solution to maintain previous CA certs would still be valuable, maybe a new CRD called RotatingIssuer or something.

@kfox1111

kfox1111 commented Jun 5, 2023

We hit this on a webhook today.

@Jiawei0227

Just thinking out loud about the implementation: would it be possible to add a new optional field to the Issuer and ClusterIssuer, called "autoRotateLeaf: true"?

In that case, if spec.ca.secretName gets changed somehow, it would automatically delete all the certs it issued and re-issue new ones.

The implementation could be a reconciler that watches the secrets and deletes the secrets that were issued by this issuer.
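
For illustration only, the suggested option could look something like the sketch below. `autoRotateLeaf` is the name proposed in this comment; it is not an existing cert-manager field, and its placement under `spec.ca` is an assumption. The issuer and secret names are reused from an example earlier in this thread.

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: namespace-issuer
spec:
  ca:
    secretName: namespace-root-ca-cert
    # HYPOTHETICAL field, not part of the cert-manager API.
    # When the referenced CA secret changes, every certificate issued
    # by this issuer would be re-issued automatically.
    autoRotateLeaf: true
```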


@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 5, 2023
@LevN0

LevN0 commented Nov 5, 2023

/remove-lifecycle stale

@jetstack-bot jetstack-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 5, 2023
@jeremyharisch

Any updates on this issue? Are there any plans to add a feature such as autoRotateLeaf: true? It would be awesome to have it, since we are facing exactly this issue in our project.

Our current solution is to watch the CA for rotation in our operator and then manually delete the corresponding leaf certs.

@thomasmRavn

thomasmRavn commented Nov 22, 2023

Hey JeremyHarisch,
I'm watching this issue because I also really want the autoRotateLeaf option, but I wanted to say that while we wait, you don't need to manually delete the leaf certs.

Borrowing from anguslees' comment
#2478 (comment)

Have your CA last at least twice as long as the child cert.
Then renewBefore kicks in at 2/3 of the CA's lifetime, but the old CA is still valid for the lifetime of the child cert.

For example:
The CA lasts 1 year.
The cert lasts 6 months.
Day 1: new CA, new cert, everything is working.
Month 6: the cert rotates with the old CA, everything works.
Month 8: the CA renews, but that's fine; the current CA is still valid for another 4 months, so everything works.
Month 12: the cert renews and takes the new CA; the CA is only 4 months old at this point, so it does nothing.
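
The lifetimes in this example could be expressed roughly as follows, using the standard `duration` and `renewBefore` fields on the Certificate resources. The resource names are borrowed from an example earlier in this thread, and the hour values are approximations of "1 year", "6 months" and "4 months". Whether this avoids an outage depends on how clients obtain the CA certificate.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: namespace-root-ca-cert
spec:
  isCA: true
  commonName: namespace-root-ca
  secretName: namespace-root-ca-cert
  duration: 8760h      # about 1 year
  renewBefore: 2920h   # renew 4 months before expiry, i.e. around month 8
  issuerRef:
    name: namespace-root-ca-issuer
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app1-auth-cert
spec:
  commonName: app1.namespace.svc.cluster.local
  dnsNames:
  - app1
  secretName: app1-auth-tls
  duration: 4380h      # about 6 months, half the CA lifetime
  issuerRef:
    kind: Issuer
    name: namespace-issuer
```

Note that the leaf has no `renewBefore` here, so cert-manager would apply its default and renew somewhat before the 6-month mark rather than exactly at it; the timeline above simplifies this.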

@dobesv
Contributor

dobesv commented Nov 27, 2023

month 8 CA renews, but that's fine, the current CA is still valid for another 4 months so everything works

This isn't true the way things are now. Currently if the CA secret is re-issued the old CA certificate is not retained. If there's a process that fetches the CA directly and tries to use it to validate a cert issued by the old CA secret, it fails.

The only way to gracefully transition is if you have two CA secrets active at once, one to validate previously issued certificates and one to validate newly issued certificates. You have to keep the old CA cert until all the certificates it signed have been rotated.

Note that there's an important difference between rotating the CA certificate (the public part) and the CA secret (the private part). You can re-issue the public certificate for the CA without invalidating the existing certificates; the public and private key in this case don't change, only the metadata like issue and expiry dates, so the old signatures are still valid. If you rotate the secret part (the private key) you also have to change the public key, so the old signatures will no longer match.
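
As a sketch of the first case (re-issuing the CA certificate while keeping the key pair), the CA's Certificate can pin its private key with `privateKey.rotationPolicy`. The resource names are reused from an example earlier in this thread. The default for this field has differed between cert-manager versions, so setting it explicitly is the safer assumption.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: namespace-root-ca-cert
spec:
  isCA: true
  commonName: namespace-root-ca
  secretName: namespace-root-ca-cert
  privateKey:
    # Never: reuse the private key already stored in the Secret on renewal,
    # so only the certificate (dates, serial number) changes and existing
    # leaf signatures still verify against the new CA certificate.
    # Always: generate a new key pair on every re-issuance, which is the
    # "rotate the secret part" case and breaks old signatures.
    rotationPolicy: Never
  issuerRef:
    name: namespace-root-ca-issuer
```

Clients that hold a copy of the old CA certificate still need the renewed one before the old one expires; keeping the key only removes the need to re-sign the leaf certificates.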

The request here is to support the case where the CA secret can be rotated, not just re-issuing a new certificate for the same key pair.

In this case we want the old CA public key to be retained and used to validate certificates until the old certificates are reissued using the new CA key pair.

If that's too difficult, cert-manager could at least automatically re-issue all the previously issued certificates using the new CA key pair as quickly as possible to reduce downtime.

@tspearconquest

tspearconquest commented Dec 5, 2023

We ran into this today. Our go-forward solution probably includes no longer creating ClusterIssuers. IIUC, we can get the same effect without this problem by leveraging trust-manager to make a CA bundle available for any local namespace-level Issuer, so we will probably switch to namespace-specific issuers if the above works as we hope.
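
One possible reading of this approach, sketched with a trust-manager `Bundle`. The bundle name, the Secret name `root-ca`, and the target key are placeholders, and the PEM body is elided. With trust-manager's defaults the source Secret has to live in its trust namespace, so this is a sketch under that assumption rather than a description of the commenter's actual setup.

```yaml
apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: internal-ca-bundle     # placeholder; also the name of the target ConfigMap
spec:
  sources:
  # Current CA certificate, read from the CA's Secret in the trust namespace.
  - secret:
      name: root-ca            # placeholder Secret name
      key: tls.crt
  # Previous CA certificate, pasted in by hand so it stays trusted
  # until the certificates it signed have been rotated.
  - inLine: |
      -----BEGIN CERTIFICATE-----
      (old CA certificate, elided)
      -----END CERTIFICATE-----
  target:
    configMap:
      key: ca-bundle.crt
```

Workloads would then mount the resulting ConfigMap for trust instead of reading ca.crt from their own certificate Secret, which also means they never need access to the Secret holding the CA private key.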


@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 4, 2024
@holyspectral

/remove-lifecycle stale

@jetstack-bot jetstack-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 5, 2024