-
-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
Token renewal failure time bomb 馃挘 #48
Comments
Hi @moertel, thank you for the detailed issue description. I would prefer your second solution, to keep the code simpler and that I haven't to maintain two logics (one with the Maybe we can use an exponential backoff algorithm for the sleep time in case of an error. So that we retry after 10, 20, 40, 80, etc. seconds. When the renewal succeeds we can reset this. Other solution would be to try it every 30 seconds or so. What do you think? |
Both are fine with me. Personally, I always lean on the "making-it-configurable" side of things. One idea regarding backwards-compatibility: letting Exponential backoff sounds neat but as we can assume that we still have a valid token for quite some time it's probably overkill to try to optimise the time it takes to renew it. Let's just go with a 30-second interval. It won't hammer the Vault instance but still leave plenty of time to renew the token. I'll make PRs for both things, if you don't mind, and if you feel the introduction of |
By default, token renewals take place after 50% of the token TTL have passed. In certain scenarios this would lead to only having a single chance to renew the token successfully before it would expire. This commit decreases the time until the next renewal attempt in case the last renewal failed. Fixes ricoberger#48
Thanks for your contribution. The idea for the backwards compatibility sounds good. If you want to come up with another PR for this, you are welcome. |
For particular Vault setups, the default values for token renewals might not be a good fit. This commit adds a parametrisation, so that the time between a successful or failed token renewal and the next renewal attempt can be controlled via optional environment variables. VAULT_TOKEN_RENEWAL_INTERVAL: The time (in seconds) between a successful token renewal and the next renewal attempt. Defaults to 50% of the token lease duration. VAULT_TOKEN_RENEWAL_RETRY_INTERVAL: The time (in seconds) between a failed token renewal and the next renewal attempt. Defaults to 30 seconds. Refs ricoberger#48
Hi, your two PRs are included in the 1.8.0 release. |
Perfect, thanks a lot for the quick feedback and turnaround @ricoberger 馃憤 |
I'm using token-based auth and am wondering whether there's a design issue with how tokens are renewed.
Whatever the value of
VAULT_TOKEN_LEASE_DURATION
, with each failure to renew the token, the initial "buffer" (of the recommended 24h) will get smaller and smaller until it hits a lower limit where there's only one chance (two if you're lucky) for renewal until the token expires:vault-secrets-operator/pkg/vault/vault.go
Lines 178 to 189 in 59712f0
Example for a token with initial TTL of 24h and
VAULT_TOKEN_LEASE_DURATION
set to 6h:ttl=24h
, renew failsttl=21h
, renew failsttl=18h
, renew failsttl=15h
, renew failsttl=12h
, renew failsttl=9h
, renew failsttl=6h
, renew failsttl=3h
, renew failsttl=0h
, renew succeeds --> new TTL is 6httl=3h
, renew failsttl=0h
(let's hope we have some headroom here)So in a sense, the initial validity of the token buys us an error budget. In this case, it would be 9 or 10 failures until the token could expire (depending on whether we run into a race condition if the token is "almost expired").
While it is an unlikely event, especially in a HA Vault setup, it's definitely possible (and more likely as time progresses) and actually pretty tragic that the
sleep
is that long even in case of an error. 馃槄 I'm wondering whether you'd be open to adding aVAULT_TOKEN_RENEWAL_INTERVAL
variable that makes this configurable, or by reducing thesleep
in case of errors.The text was updated successfully, but these errors were encountered: