Configurable pkiCert rendering interval #1646

Closed
fitz123 opened this issue Sep 26, 2022 · 4 comments · Fixed by #1908
Labels
enhancement vault Related to the Vault integration
Projects

Comments

@fitz123
fitz123 commented Sep 26, 2022

Consul Template version

Vault v1.11.3 (17250b25303c6418c283c95b1d5a9c9f16174fe8), built 2022-08-26T10:27:10Z

Configuration

template {
  source = "/etc/vault.d/templates/dynamic-cert-chained.tpl"
  destination = "/run/vault-agent/approle/fullchain.pem"
  command = "sudo /usr/sbin/nginx -s reload || sudo /bin/systemctl start nginx"
  perms = 0640
}
{{ with pkiCert "pki/issue/approle" "common_name=approle.domain.com" "ttl=1176h" -}}
{{ .Cert -}}
{{ with secret "pki/cert/ca_chain" -}}
{{ .Data.certificate }}
{{- end }}{{ if .Key -}}
{{ .Key | writeToFile "/run/vault-agent/approle/privkey.pem" "vault" "bin" "0640" -}}
{{ end -}}
{{ end -}}

Expected behavior

Ability to set the rendering interval for PKI certificates, similar to static_secret_render_interval but for the pkiCert templating function.
We want to renew the certificate when 15% of the TTL is reached.

Actual behavior

Currently vault-agent (consul-template) re-issues certificates when 85% of the secret's time-to-live (TTL) is reached, and this is not configurable.

References

@eikenb
Contributor

eikenb commented Sep 26, 2022

Hey @fitz123, thanks for taking the time to file this.

If you have a moment, would you mind explaining the use case for renewing at 15% of the TTL? Not that I doubt your need or anything, it just helps me to understand the use cases so I can take them into account going forward. Thanks!

@cipherboy
cipherboy commented Sep 26, 2022

@eikenb I think they elaborated more on the Vault issue, but they have a 7-day internal SLA on resolving Vault outages right now; Vault may be down for 6.999 days or so, but no more (theoretically).

They want to drop the certificate lifetime to be shorter, but in order to satisfy that SLA with the default 85% window, they need the cert lifetime to be at least 7/(1-0.85) ≈ 46.7 days, so that the time remaining after renewal at 85% is greater than 7 days (and thus does not risk expiry while Vault is down). Dropping to 50% would allow, say, a 15-day cert to be issued (while still having that window be greater than 7 days), and 15% would allow, say, an 8-10 day cert (I believe).
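
To make the arithmetic above easy to check, here is a minimal sketch. The function name `min_cert_ttl_days` and its parameters are made up for illustration and are not part of Vault or consul-template. It only assumes that renewal happens once a fixed fraction of the TTL has elapsed and that the remaining lifetime must cover the outage window.

```python
def min_cert_ttl_days(outage_days: float, renew_fraction: float) -> float:
    """Smallest TTL (in days) whose remaining lifetime at renewal time
    still covers an outage of `outage_days`.

    Hypothetical helper: `renew_fraction` is the share of the TTL that has
    elapsed when renewal is attempted (0.85 for the current default).
    """
    if not 0 < renew_fraction < 1:
        raise ValueError("renew_fraction must be strictly between 0 and 1")
    # Remaining lifetime at renewal is ttl * (1 - renew_fraction);
    # require that to be at least outage_days and solve for ttl.
    return outage_days / (1 - renew_fraction)


if __name__ == "__main__":
    for fraction in (0.85, 0.50, 0.15):
        ttl = min_cert_ttl_days(7, fraction)
        print(f"renew at {fraction:.0%}: TTL >= {ttl:.1f} days")
    # renew at 85%: TTL >= 46.7 days
    # renew at 50%: TTL >= 14.0 days
    # renew at 15%: TTL >= 8.2 days
```

These lower bounds line up with the 15-day and 8-10 day examples mentioned above.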

@eikenb
Contributor

eikenb commented Sep 26, 2022

Thanks for the explanation! I should have read the Vault ticket, as it does lay out their use case. Basically, they need certs to always have 7+ days left on them so they can cope with Vault's 7-day SLA. With the 85% default, they would need about 47-day TTLs to have that 7-day buffer.

One wrinkle that will probably come up: the TTL-checking code currently enforces a minimum duration of 10% of the TTL. It compares 90% of the time left on the TTL to 10% of the lifetime TTL and gets a new one if that is less than 10% of the lifetime. This keeps it from using TTLs with only a very short time left. This logic would obviously need to change, and I'm looking for feedback. It could be something like 10% of the configured percentage, or maybe some fixed amount (e.g. <1 min). I don't want 0, as you can't manage jitter with a very low duration and we need to avoid thundering herd problems.

Any thoughts on this would be helpful when we get to implementation.

Thanks again for the explanation.

@Malshtur
Malshtur commented Aug 4, 2023

There are two possibilities that make sense to me: a fixed amount of time, or a % of the lifetime TTL.

  1. A fixed amount has to be bounded:

    • a lower bound to handle jitter, network nightmares and thundering herd, as you said: 30 seconds or 1 minute seem reasonable to me as the lowest possible values, maybe a bit longer for scaling. 5 minutes would be fair too if scaling issues arise.
    • an upper bound to catch a configuration that exceeds the lifetime TTL, to avoid a service outage. I can't find a case where exceeding it would be pertinent. I don't know if it is possible to ensure the upper bound does not exceed 90% of the -max-ttl configured on the issuer; I wouldn't bet on this ability.
  2. Relative to the lifetime TTL would be my preferred choice, but that's personal:

    • providing a valid range from 10% to 90% would cover all real problems, as long as the lifetime TTL isn't too far-fetched (e.g. a 30-second lifetime for a cert is funny but not very usable, in my opinion)

So if we group all of this, I think we have a decent rule: the longer of a fixed floor (30 sec, 1 min or 5 min) and a configurable % of the lifetime TTL.
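
A rough sketch of how that rule could look, purely as an illustration and not consul-template's actual code. The names `renewal_wait` and `MIN_WAIT` are hypothetical. It also assumes the configurable percentage means the share of the lifetime TTL to wait before re-issuing, which the thread does not pin down.

```python
from datetime import timedelta

# Hypothetical floor; the thread floats 30 seconds, 1 minute or 5 minutes.
MIN_WAIT = timedelta(minutes=1)


def renewal_wait(ttl: timedelta, renew_fraction: float,
                 floor: timedelta = MIN_WAIT) -> timedelta:
    """Time to wait after issuance before re-issuing the certificate.

    Assumption: `renew_fraction` is the share of the lifetime TTL that
    should elapse before renewal, restricted to the 10%-90% range
    suggested above.
    """
    if not 0.10 <= renew_fraction <= 0.90:
        raise ValueError("renew_fraction must be between 0.10 and 0.90")
    # "The longer of" the fixed floor and the percentage of the lifetime.
    # For very short TTLs the floor wins and can even exceed the TTL,
    # which is the "30-second cert" edge case mentioned above.
    return max(floor, ttl * renew_fraction)


if __name__ == "__main__":
    # The 1176h TTL from the template in this issue, renewed at 15%.
    print(renewal_wait(timedelta(hours=1176), 0.15))
```

Jitter is left out on purpose to keep the sketch short; a real implementation would presumably spread renewals around the computed wait to avoid the thundering herd problem.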

Projects
Roadmap
v0.31.0
4 participants