OIDC key rotations happen up to 60 seconds after they are scheduled #11438
Comments
@kalafut sorry to ping you, but after initially filing this issue I did some more investigation, and it changed from what I thought was a curiosity (you can't rotate JWT signing keys more frequently than every 2m) into more of an issue, due to the gap of up to 60 seconds. As a result, I heavily rewrote the initial issue to include the increased impact.
I've put together a prototype for advertising new keys 1 The branch additionally updates the It appears to work when I run it locally: with a dev server:
and a sample of cache control values:
But the code could probably use more tests and some cleanup. Let me know if this is an approach y'all are interested in, and if so I can add tests, clean up the code, and open that branch as a pull request.
I saw the OIDC description of a rotation strategy ( That strategy addresses the cases in this issue and #9201 where the Cache-Control header would be 0 or no-store, but still requires all servers verifying JWTs from an Vault issuer to successfully read the JWKS endpoint on the Vault cluster at nearly the same time (when a new KID appears), which is not ideal. Additionally, using the user provided For those reasons, I think it is still advantageous for clients to utilize a more evenly spread rotation process with new keys being published for some period prior to being used. |
@ianferguson Thanks for reporting this issue! If you would be willing to put up a PR for your branch, I would be happy to work with you to get it merged.
@fairclothjm excellent! That branch was a very rough proof of concept. I should have time to clean it up next week and get it opened as a pull request. Please do let me know if y'all have any preferences around the design or configurable options; I'm happy to incorporate them.
@ianferguson Sounds good, thanks!
Sorry I wasn't able to clean my branch up for a PR, and thank you so much for taking this across the line!
Thanks for reporting and providing the initial groundwork!
Describe the bug
The identity backend's JWT signing keys are rotated in the PeriodicFunc of the identity backend, which is triggered every 60 seconds by the RollbackManager. The PeriodicFunc only rotates keys that are overdue for rotation, which means that in practice there is a gap of up to 60 seconds between when a key is scheduled to be rotated and when it actually is.
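The gap described above can be illustrated with a minimal Go sketch. It is not Vault's code: it assumes ticks fall on exact multiples of 60 seconds, that times are whole seconds measured from a tick at t=0, and that a key is rotated at the first tick at or after its scheduled time. The names tickSeconds and rotationDelay are hypothetical.

```go
package main

import "fmt"

// tickSeconds is the assumed fixed interval between PeriodicFunc runs.
const tickSeconds = 60

// rotationDelay returns how many seconds pass between a key's scheduled
// rotation time and the first tick at or after it. Times are whole seconds
// measured from a tick boundary at t=0 (an assumption of this sketch).
func rotationDelay(scheduled int) int {
	remainder := scheduled % tickSeconds
	if remainder == 0 {
		return 0 // scheduled exactly on a tick
	}
	return tickSeconds - remainder
}

func main() {
	// 86401 is one second past a 24h schedule that started on a tick.
	for _, scheduled := range []int{61, 90, 119, 86401} {
		fmt.Printf("scheduled at %ds, rotated %ds late\n", scheduled, rotationDelay(scheduled))
	}
}
```

Under these assumptions the delay depends only on where the scheduled time falls relative to the tick, not on how long the rotation period is.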
This becomes an issue because the Cache-Control header returned on calls to identity/oidc/.well-known/keys appears to be calculated based on the scheduled rotation time, and will return Cache-Control: no-store for the 0 to 60 seconds between the scheduled rotation time and when the RollbackManager calls the identity backend's PeriodicFunc and the rotation happens.
When more typical, longer signing key rotation periods such as 24h or 7d are used, there is still a gap between the scheduled key rotation time and the actual rotation. This means that for up to 60 seconds, every service that follows the Cache-Control directives to manage its active JWKS for verification will potentially call Vault to fetch the JWKS for every single inbound call made to the service.
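The header behavior described above could be modeled roughly as follows. This is a sketch based only on the behavior observed in this issue, not on Vault's source; the function name cacheControl and its integer-seconds parameter are hypothetical.

```go
package main

import "fmt"

// cacheControl returns a Cache-Control value for a JWKS response, given the
// number of seconds until the next scheduled rotation. A zero or negative
// value means the rotation is overdue but has not happened yet.
func cacheControl(secondsUntilRotation int) string {
	if secondsUntilRotation <= 0 {
		// The key is overdue, so there is no safe caching window to advertise.
		return "no-store"
	}
	return fmt.Sprintf("max-age=%d", secondsUntilRotation)
}

func main() {
	for _, remaining := range []int{3600, 25, 0, -30} {
		fmt.Printf("%5ds until scheduled rotation: Cache-Control: %s\n", remaining, cacheControl(remaining))
	}
}
```

In this model every request that arrives while the rotation is overdue gets no-store, which is the window in which well-behaved clients would refetch the JWKS on every inbound call.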
The service has to choose between caching the JWKS received with Cache-Control: no-store for 1 second or 1 minute and potentially rejecting some valid requests signed with a new key ID not in the old (improperly cached) JWKS, or potentially slamming Vault with thousands of requests, one per request received by the service.
Original Description
note: I originally filed this issue as a quirk of sub 2 minute rotation configurations, but on further investigation it has broader consequences. The original description is below.
The identity backend allows users to configure signing key rotations as frequently as once a minute, but in practice it cannot rotate keys more frequently than once a minute, and a key with a 1m rotation period is only rotated every 2 minutes.
I believe this is due to the key rotation being handled by the PeriodicFunc for the identity backend, which is in turn managed/scheduled by the RollbackManager, which is fixed at a once-a-minute tick speed:
vault/vault/rollback.go
Line 19 in a24653c
When the rotation period is set to 60-119 seconds (< 120) and the RollbackManager runs at time + 60 seconds, it is not yet time to rotate the keys. It isn't until the RollbackManager ticks a second time (time + 120 seconds total) that the key becomes eligible for rotation.
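A small simulation makes the 2 minute effective period easier to see. This is a sketch under stated assumptions, not Vault's implementation: ticks fire every 60 seconds, a key rotates only when the tick time is strictly past its scheduled time (standing in for the small processing delay after each tick), and the next rotation is scheduled relative to the actual rotation. The names tick and rotationTimes are hypothetical.

```go
package main

import "fmt"

// tick is the assumed fixed number of seconds between periodic runs.
const tick = 60

// rotationTimes simulates ticks up to horizon seconds and returns the tick
// times at which a key with the given rotation period would be rotated.
func rotationTimes(period, horizon int) []int {
	var rotated []int
	next := period // first rotation is scheduled one period after t=0
	for now := tick; now <= horizon; now += tick {
		if now > next { // only overdue keys are rotated
			rotated = append(rotated, now)
			next = now + period // reschedule from the actual rotation time
		}
	}
	return rotated
}

func main() {
	for _, period := range []int{60, 90, 119} {
		fmt.Printf("period=%ds rotations at %v\n", period, rotationTimes(period, 480))
	}
}
```

With these assumptions, each of the three periods yields rotations only at every second tick, matching the 2 minute cadence described above.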
To Reproduce
Steps to reproduce the behavior:
Run a Vault dev server with debug log level set:
In another terminal/tab, configure a signing key with a rotation period of 1m (or any value from 60-119s):
If you watch the dev server logs at that point, it will print rotation/deletion secrets every 2 minutes, rather than every 1 minute:
Additionally, when polling the server's JWKS endpoint with curl, key IDs will be visible for longer than the configured (verification_ttl + rotation_period) time (1m each, for a total of 2m) that they should be visible for.
The cache control behavior can be observed by running a command like the following:
Which will output something like
Expected behavior
Vault would ideally rotate the signing keys no later than the specified frequency. I think if the rotation frequency is 24 hours, it is better to rotate at 23 hours, 59 minutes and 59 seconds than at 24 hours and 1 second since the last rotation, all things considered.
This would both ensure that people could assert that their keys are rotated within a certain time, which may be important for compliance reasons, and more importantly would remove the period of time where services verifying JWTs issued by Vault would potentially make a sideband network call to Vault's identity/oidc/.well-known/keys endpoint per incoming call.
The crossed out suggestion above would not work, because a server verifying JWTs could read the JWKS, get a Cache-Control: max-age=25 header back, only to have Vault rotate and add a new signing key the next second. During those 25 seconds that server would continue using its cached copy of the old JWKS and reject JWTs signed with the new key.
Based on that, I think the only way to both remove the Cache-Control: no-store state and jitter/spread out when services retrieve new JWKS from Vault is to publish new JWKS keys well ahead of starting to sign JWTs with them, as documented by @evanj in #9201.
Environment:
Vault server version (vault status): v1.6.2
Vault CLI version (vault version): v1.6.2
Vault server configuration file(s):
none, can be done using vault server -dev
Additional context
This issue partially duplicates the narrower half of #9201, but it focuses only on the potential bug where actual key rotation happens after the scheduled rotation, and doesn't include the feature request for pre-publishing future keys that #9201 includes.
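For reference, the pre-publishing idea mentioned under Expected behavior could look something like the sketch below. It is purely illustrative and not taken from Vault or from #9201: the keySet type, its fields, and the key ID format are hypothetical, and expiry of retired keys (verification_ttl) is left out.

```go
package main

import "fmt"

// keySet is a hypothetical model of an issuer's keys.
type keySet struct {
	retired []string // no longer signing, still published for verification
	signing string   // currently used to sign JWTs
	next    string   // already published, not yet used for signing
	issued  int      // counter used to generate illustrative key IDs
}

// newKID generates a placeholder key ID such as "key-1".
func (ks *keySet) newKID() string {
	ks.issued++
	return fmt.Sprintf("key-%d", ks.issued)
}

// newKeySet starts with one signing key and one pre-published next key.
func newKeySet() *keySet {
	ks := &keySet{}
	ks.signing = ks.newKID()
	ks.next = ks.newKID()
	return ks
}

// published lists every key ID that would appear in the JWKS.
func (ks *keySet) published() []string {
	ids := append([]string{}, ks.retired...)
	return append(ids, ks.signing, ks.next)
}

// rotate promotes the pre-published key to signing and publishes a new one.
func (ks *keySet) rotate() {
	ks.retired = append(ks.retired, ks.signing)
	ks.signing = ks.next
	ks.next = ks.newKID()
}

// contains reports whether id is present in ids.
func contains(ids []string, id string) bool {
	for _, candidate := range ids {
		if candidate == id {
			return true
		}
	}
	return false
}

func main() {
	ks := newKeySet()
	before := ks.published()
	ks.rotate()
	fmt.Println("published before rotation:", before)
	fmt.Println("signing after rotation:", ks.signing)
	fmt.Println("already known to verifiers:", contains(before, ks.signing))
}
```

Because the new signing key was already in the JWKS before the rotation, a verifier holding a slightly stale cached copy would still recognize it, so the exact moment of rotation matters much less.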