Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[question] are vault credentials refreshed? #65

Closed
raffaelespazzoli opened this issue Aug 16, 2022 · 44 comments
Closed

[question] are vault credentials refreshed? #65

raffaelespazzoli opened this issue Aug 16, 2022 · 44 comments

Comments

@raffaelespazzoli
Copy link

one of the most powerful feature of Vault is the idea of secret engines: component that generate new credentials for a subsystem that needs authentication. Normally these credentials are short lived. the kv secret engine is somewhat of an exception to this rule as the key value store have unlimited lifetime.
When reading a secret from a path backed by a secret engine, vault returns the associated TTL (lease duration).
does the quarkus-vault library honors this behavior?
The expectation would be that as the TTL deadline approaches, the quarkus-vault library re-reads the secret (obtaining a new credential) and refreshes all of the beans using that property. This may involve some complicated logic in cases when for example the property are used by a datasource.

@vsevel
Copy link
Contributor

vsevel commented Aug 17, 2022

yes there is. see secret-config-cache-period
does it fit your need?

@raffaelespazzoli
Copy link
Author

no that property does not work. First, it's applied to only to a kv store (which arguably does not have the problem of short-lived credentials), secondly, it's secret store wide, while each secret can have a different TTL.
The best way to look at Vault is to consider that people will use the appropriate secret engine when possible and fall back to kv when in case the middleware they are interacting with is not covered by the available secret engine.

Normally non-kv secret engines produce short lived credentials. As an example let's take the database/postgresql engine as featured in the quarkus guide for [dynamic credentials](https://quarkiverse.github.io/quarkiverse-docs/quarkus-vault/dev/vault-datasource.html#_dynamic_database_credentials.
Once everything is correctly configured, this is what Vault returns when reading a credential:

$ vault read database/creds/my-role
Key                Value
---                -----
lease_id           database/creds/my-role/2f6a614c-4aa2-7b19-24b9-ad944a8d4de6
lease_duration     1h
lease_renewable    true
password           SsnoaA-8Tv4t34f41baD
username           v-vaultuse-my-role-x

the lease duration is 1h. The quarkus-vault library should be aware of that and schedule a refresh of this property in a bit less 1h. Beyond that time the credential will be revoked and attempting to connect/or run sql statements with a stale credentials will result in an error.
Ideally the quarkus-vault library would refresh that property and every bean that depends on it, in this case the datasource and all the connections with its connection pool.

@raffaelespazzoli
Copy link
Author

the way I'd design this feature is:

  1. if a TTL is present in the Vault response, use that TTL to refresh the property
  2. if a TTL is not present and a secret-engine wide refresh property has been defined, refresh with that cadence
  3. otherwise, never refresh that property.
    refreshing should mean reloading all the dependent beans. It's unclear to me (because of my ignorance) if this is capability that quarkus has today.

@kdubb
Copy link
Contributor

kdubb commented Aug 17, 2022

refreshing should mean reloading all the dependent beans

This is not possible in Quarkus and would be an extremely complicated thing to attempt. Even in dev mode Quarkus is just restart, using the static build time config as the beginning state.

Using your database example, Quarkus does expire the database credentials in the appropriate time but waits until the database driver requests the credentials again to perform the actual refresh. It's the database's job to disconnect the driver when credentials are stale, and they'll be refreshed if/when that happens.

Given that... using a secret-config-cache-period smaller than your smallest KV temporary credential lease time will achieve nearly the exact same expiration functionality as we provide for "proper" temporary credentials.

@raffaelespazzoli
Copy link
Author

@kdubb I'm a bit confused by your answer. I suggest to try to break the conversation into two pieces:

  1. when should Vault credentials be refreshed?
  2. what is supposed to happen when properties are refreshed.

On the 1. Vault exposes a contract (it tells you when you should refresh) and I think the quarkus-vault plugin should honor that contract.
On 2. I get that it might be very complicated, so you SMEs tell me what is possible. I'm just saying that as a developer, I'd expect to be able to change a property (this is true for any property, not just those sourced from Vault) and have all of the dependent beans be refreshed. It makes sense and my understanding is that spring works that way, so it's technically possible.

@kdubb
Copy link
Contributor

kdubb commented Aug 17, 2022

1 - With regard to propery temporary credentials, Vault tells you when the credentials expire not when to refresh. If a set of credentials expire, but are never needed again, what would be the use of refreshing them over and over as they expire each time?

This Vault extension honors that contract properly. The extension will not allow the use of expired temporary credentials. Once they expire, any request for that set of credentials will trigger a refresh before the credentials are returned to the requestor.

1b - Vault's KV secrets are not expired based on a time-to-live or some equivalent lease duration. The KV2 engine doesn't report a lease duration at all for its entries see here and while the KV1 engine allowed setting a time-to-live and reported it as lease_duration, it did not expire keys see here and the team purposefully didn't transfer this functionality to KV2 see here.

So, we are offering the best we can with regard to KV secrets and we are refreshing them periodically; which is all we can really do for them.

2 - Reloading a bean because a configuration value changed is not a Quarkus feature that I know of. Either way, support would be through Quarkus main not through this extension.

@vsevel
Copy link
Contributor

vsevel commented Aug 17, 2022

for what it is worth, here is a conversation we had with Luis with respect to caching credentials for the db use case.
And the associated issue.
we have never had any issue being reported on this particular use case. Do you think something is missing there?

it is worth mentioning the renew-grace-period property as well.

do you think something is still missing?

@raffaelespazzoli
Copy link
Author

@vsevel I think the conversation link you share goes exactly in the same direction as the question I am posing.
I read the renew-grace-period docs a couple of times, it's not very clear to me, but I infer it's talking about the vault token renew period, not about the credentials retrieved via that token.

we have never had any issue being reported on this particular use case. Do you think something is missing there? Well it's unclear if we preemptively honor the credential lease or we wait for something to fail and the try to reload the credentials. The latter strategy might not always work and it's now how Vault is intended to be used.

@kdubb 1 - With regard to propery temporary credentials, Vault tells you when the credentials expire not when to refresh. If a set of credentials expire, but are never needed again, what would be the use of refreshing them over and over as they expire each time? I don't think you want to make assumptions on how the code will use the credentials (just once, or repeatably over time). so it would be just better to refresh when Vault tells you to. Or perhaps add a property to whether that you need refreshing and it true, still refresh when Vault tells you to.

The extension will not allow the use of expired temporary credentials. what do you mean by that? If I have a db connection open with those credentials...

1b, I don't have an issue with how kv is implemented today. My problem is with all the other secret engines which do generate short-lived credentials. That said, I think one could have a design where you don't need to distinguish between secret engines (after all it's just a read operation to a give path) and perhaps you can just look at the returned stricture and use the algorithm I suggested to determine whether/when to refresh.

2. ok and agreed. Still very relevant to this discussion to understand all the ramifications. For example what would be the behavior when using the quarkus-rabbitmq connector and the vault rabbitmq secret engine?

@vsevel
Copy link
Contributor

vsevel commented Aug 17, 2022

it's talking about the vault token renew period, not about the credentials retrieved via that token.

correct. it is being used in the login use case, and dynamic credentials

@kdubb
Copy link
Contributor

kdubb commented Aug 17, 2022

Vault itself operates exactly like quarkus-vault with regard to refreshing temporary credentials. Once a lease has expired the credentials are not proactively refreshed by Vault. Vault waits until the next request to refresh ,or create new, credentials before handing them back to the requestor.

It is your job to ensure your database connections are reconnecting often enough that your temporary credentials are refreshed or they will be forcefully terminated (if the database enforces that) and upon the reconnection they will receive new credentials.

Many connection pools have a max connection lifetime to ensure your connections are reconnected at a regular interval. For example, we use pgBouncer and its server_lifetime defaults to 1 hour at which time the connections are preemptively reconnected. This saves us from any forceful disconnects because the connections are cycled at a much higher rate than the credentials.

With regard to RabbitMQ, which we also use (FYI, I made the Quarkus RabbitMQ connector and added Vault's temporary credential supports for it), it's the same situation. We cycle through connections quite regularly, way more often than the credentials.

To my eyes, this is how it is supposed to work because that's how Vault works.

@kdubb
Copy link
Contributor

kdubb commented Aug 17, 2022

I am ignoring manual lease renewals here. A request for credentials from Vault automatically renews the lease on your current credentials, if they exist and are not expired, instead of creating new credentials. So essentially a request for renewed credentials is either a manual lease renewal or new credential creation in one request.

@kdubb
Copy link
Contributor

kdubb commented Aug 17, 2022

@raffaelespazzoli Would it be correct to say that what you are looking for is for the extension to allow you to request/create a set of credentials and have the extension keep renewing the leases proactively until either the application is terminated or the credentials are deleted from the extension. Is that a correct assessment?

@raffaelespazzoli
Copy link
Author

@kdubb yes, but I'm open to options. There seems to be two approaches:

  1. the developer knows that credentials will expire and configures the dependent beans (database, rabbitmq connection) to refresh the credentials more often than the credentials TTL.
  2. the system understands the credential TTL and self-refreshes the credentials propagating the change to the dependent beans.

I think that ultimately both would work and if today 1. is what is supported so be it. Perhaps let's include some considerations in the docs about this issue so everyone is aware.

Option 2. seems like a better experience for developers though. So, if technically possible, it would be perhaps worth investing in it.

@kdubb
Copy link
Contributor

kdubb commented Aug 17, 2022

I can see the appeal of (2) but it may require changes to the way other extensions work.

With (1) you request credentials whenever you need them and they are always valid for that attempt. Nothing else needs to be done to ensure credentials expire; which is kind of how things work in general.

For (2) some extensions would need to add calls to manually expire the credentials, lest they be refreshed for as long as the application is alive.

@kdubb
Copy link
Contributor

kdubb commented Aug 17, 2022

The appeal of (2) that I see currently is specifically for other users of RabbitMQ. Our app cycles connections fairly often, but that's a function of our app.

RabbitMQ encourages a small number of very long lived connections. RabbitMQ doesn't generally use connection pools and doesn't have a connection lifetime to automatically cause stable reconnection. This use case would particularly benefit from an automatic lease renewal.

@kdubb
Copy link
Contributor

kdubb commented Aug 17, 2022

The confusion I was having initially is that none of this helps the KV case.

@vsevel
Copy link
Contributor

vsevel commented Aug 19, 2022

apparently this is not fully working: #66

@kdubb
Copy link
Contributor

kdubb commented Aug 19, 2022

I'm gonna play with this today. I think renewing the lease automatically should be relatively simple.

My only issue is that we might need a flag to determine if auto-renewal should happen or somehow base it on the type of credentials. I'm concerned about a situations where a consumer might request a "one off" set of credentials (e.g. migration?) that shouldn't be auto-renewed until the app exits.

@kdubb
Copy link
Contributor

kdubb commented Aug 19, 2022

As it stands, the "one-off" scenario might not be a huge deal because Quarkus's dynamic credentials support is based around long-lived credentials.

@vsevel
Copy link
Contributor

vsevel commented Aug 19, 2022

@raffaelespazzoli may be to get out of the generalities, may be we could focus on a particular use case that is useful to you. what would it be?

@raffaelespazzoli
Copy link
Author

@vsevel I think it would be enabling the use case with the data base / datasource credentials obtained via Vault. So understating when Vault is suggesting to refresh the credentials and having a way to roll out a new set of connections within a connection pool which use the new set of credentials. Ideally the rollout is incremental, that is: a new connection is created then an old one that is not being used is removed. and so on until no more old connection exists. This would minimize the impact to the code.

@vsevel
Copy link
Contributor

vsevel commented Aug 22, 2022

@raffaelespazzoli I am just wondering how much this discussion is relevant to the vault extension vs agroal.
it should be the responsibility of the connection pool to provide a SPI interface for expiring credentials.
I see @kdubb you had added an expires-at property in the quarkus credentials provider.
so may be the information is already there for db, and that would be a matter of agroal using it?

@kdubb
Copy link
Contributor

kdubb commented Aug 22, 2022

What do you see as the issue with renewing expiring credentials automatically in the vault extension? If we think this is a worthy change over current behavior, then we're either going to implement it once in the extension or have every client implement it themselves.

@vsevel
Copy link
Contributor

vsevel commented Aug 22, 2022

don't we need the connection pool to expire the connection and recreate it? or may be we can just extend the lease? is this what we are talking about?
we would still need the connection to be recycled by the pool before the max-ttl of the lease. ideally this would be triggered by the pool also, as opposed to the user having to set it in his configuration for the pool.

@vsevel
Copy link
Contributor

vsevel commented Aug 22, 2022

cc @barreiro

@kdubb
Copy link
Contributor

kdubb commented Aug 22, 2022

I actually forgot the reason I added expires-at was because the RabbitMQ library supports expiring credentials itself. For that implementation it needs to know when the vended credentials expire. It recreates the connection exactly as we are talking about.

don't we need the connection pool to expire the connection and recreate it? or may be we can just extend the lease? is this what we are talking about?

I'm talking about extending the lease. The current implementation already extends the lease falling back to creating new credentials when it cannot do that.

The nice thing about this there is no connection expiration required. Vault won't update/delete the credentials as long as we extend the leases.

@kdubb
Copy link
Contributor

kdubb commented Aug 22, 2022

Additionally, with auto-renewal in the extension, we could possibly treat all requests for credentials as a "create". E.g. Any configured connection expiration in connections pools would ask for new credentials on each new connection. This would ensure fresh credentials for correctly configured pools but allow other non-expiring configurations to work without changes.

@kdubb
Copy link
Contributor

kdubb commented Aug 22, 2022

we would still need the connection to be recycled by the pool before the max-ttl of the lease

The only case this would be true is if lease renewal fails. Which I guess we have to think about.

@vsevel
Copy link
Contributor

vsevel commented Aug 22, 2022

I guess I do not see the flow entirely. can we create a connection, add it to the pool and use it forever (e.g. for a longer time than the max-ttl on vault)?

@kdubb
Copy link
Contributor

kdubb commented Aug 22, 2022

According to the docs, leases can be renewed forever.

All dynamic secrets in Vault are required to have a lease. Even if the data is meant to be valid for eternity, a lease is required to force the consumer to check in routinely.

So we should be able to extend the leases indefinitely.

Also, reading through that made me realize we should be revoking our authentication token on shutdown, which will in turn revoke all leases issues with it immediately.

@raffaelespazzoli
Copy link
Author

So we should be able to extend the leases indefinitely. the secret engine configuration defines a max TTL (which may be infinite) and a default TTL (which may be infinite). the requester can request a lease with a given TTL (lesser than the max TTL) or get the default TTL. Here infinite, I think, means maxINT seconds in go-lang. All to say that you cannot always extend a lease to infinite. Security departments really want short lived credentials so in many cases secret engines will be configured with relative small maxTTLs.

@kdubb
Copy link
Contributor

kdubb commented Aug 22, 2022

Well the docs specifically use the phrase "valid for eternity" 🤷‍♂️... It's pretty easy to test though.

If leases cannot be extended indefinitely then this discussion is moot.

If credentials are required to be invalidated then the connection pools have to check in periodically to get new credentials, which is the current behavior.

@vsevel
Copy link
Contributor

vsevel commented Aug 22, 2022

If credentials are required to be invalidated then the connection pools have to check in periodically to get new credentials, which is the current behavior.

exactly.

my understanding was that you get an object valid for some time (lease ttl), that you can extend up to a maximum (max ttl). at that point, you will not be able to extend it further, and you will be forced to go through the re-acquisition process.
and that is what has been implemented for the authentication: every time we get the token, we check if is renewable and how much is left on the current lease. if we can we extend it, we do, else we go though a login again. we could do it asynch (in a renewal thread) so that if the app has no activity for some time we can avoid its retrieved token to expire, saving it the burden to call the vault extension on a regular basis, but once we reach the max ttl, no matter what, the app cannot use this token anymore.

the same logic applies to the dynamic credentials.

it does require the app to ask for credentials on a regular basis, and may be that is something we can do in a background thread. but if the purpose is to deactivate credentials after some max amount of time, it must be the responsibility for the app to know when to stop using the credentials and re-ask for them.

so the pool has to honor the "expires-at" property that we provide through the quarkus credentials provider (the cheap solution is to configure a fixed recycle period smaller than the lease ttl on the pool, to make sure a call will be made in time to extend the lease).

with the auto-extend, let's say that the lease ttl is 1 hour, and the max ttl is 1 day, we could extend automatically the credentials every hour, and let the pool expire its connections once a day, as opposed to every hour.

I do agree with you @raffaelespazzoli that security teams want several extensions within a limited max ttl to force apps to go through re-acquisition.

@kdubb
Copy link
Contributor

kdubb commented Aug 22, 2022

I was under the impression max_ttl was the maximum amount of time a single lease period can be but as long as you keep refreshing, a lease can be held indefinitely.

I also agree that credentials should be short lived but I also see the value in refreshing the credentials automatically for clients that do not have that capability. This just may not be possible though.

@raffaelespazzoli
Copy link
Author

raffaelespazzoli commented Aug 22, 2022

@kdubb that depends on the secret engine. The database secret engine will mint new credentials every time a credentials are read (there is no way for it to know if it's a refresh to a new request). secret engines that create bearer tokens credentials such a jwt tokens also cannot be extended they have to be re-minted.

@vsevel
Copy link
Contributor

vsevel commented Aug 22, 2022

I was under the impression max_ttl was the maximum amount of time a single lease period can be but as long as you keep refreshing, a lease can be held indefinitely.

from what I remember and what I see in the code, no.
you get a lease, with its expiration time, and a renewable flag, like in this example.

if you approach the expiration time, you can renew/extend the lease. this will give a new expiration time, and again the flag renewable.

after some time (when you approach the max ttl), this will give a new expiration time and renewable=false. if you keep renewing, the expiration time will not change anymore, and renewable will stay false.

that is why I think we could renew/extend on behalf of the app, but not go beyond the max ttl.
we need to do some tests on the dynamic credentials engine to check on the behavior to see if that makes sense or not.

@kdubb
Copy link
Contributor

kdubb commented Aug 22, 2022

@raffaelespazzoli

The database secret engine will mint new credentials every time a credentials are read (there is no way for it to know if it's a refresh to a new request).

I'm fairly sure this is not true. Renewing a lease is done via sys/leases/renew, it has nothing to do with the secret engine.

@kdubb
Copy link
Contributor

kdubb commented Aug 22, 2022

Credentials are not affected by lease renewals... verified in Vault.

Generating new credentials with a read:

$ vault read database/creds/readonly
Key                Value
---                -----
lease_id           database/creds/readonly/ZIVsoSmpqVBDz6NOtJpIiJb8
lease_duration     1h
lease_renewable    true
password           GgKqzANRdZDC64-s5xZv
username           v-token-readonly-lhMTr5CtiqFeMnlm9n9u-1661200044

The role verified in PG:

$ docker exec learn-postgres psql -c "SELECT rolname FROM pg_roles;"
                     rolname                      
--------------------------------------------------
 ...
 v-token-readonly-lhMTr5CtiqFeMnlm9n9u-1661200044
(14 rows)

Renew the lease (notice it doesn't provide any new credentials):

$ vault lease renew database/creds/readonly/$LEASE_ID
Key                Value
---                -----
lease_id           database/creds/readonly/ZIVsoSmpqVBDz6NOtJpIiJb8
lease_duration     1h
lease_renewable    true

Same role still exists in PG:

docker exec learn-postgres psql -c "SELECT rolname FROM pg_roles;"
                     rolname                      
--------------------------------------------------
 v-token-readonly-lhMTr5CtiqFeMnlm9n9u-1661200044
 ...
(14 rows)

@kdubb
Copy link
Contributor

kdubb commented Aug 22, 2022

Max TTL is confirmed to be the maximum a lease can exist.

Generate new credentials with a read (1m ttl and 2m max_ttl):

vault read database/creds/readonly
Key                Value
---                -----
lease_id           database/creds/readonly/kMFToBm8g8lI3jRpaeUBM6RR
lease_duration     1m
lease_renewable    true
password           BGAl4VTpBmU3gQeU8-Aj
username           v-token-readonly-trWona2cFe2AgLMzevLB-1661201565

Renew lease after 1m 19s:

vault lease renew database/creds/readonly/kMFToBm8g8lI3jRpaeUBM6RR
WARNING! The following warnings were returned from Vault:

  * TTL of "1m" exceeded the effective max_ttl of "41s"; TTL value is capped
  accordingly

Key                Value
---                -----
lease_id           database/creds/readonly/kMFToBm8g8lI3jRpaeUBM6RR
lease_duration     41s
lease_renewable    true

@kdubb
Copy link
Contributor

kdubb commented Aug 22, 2022

I don't think there's much to do in the quarkus-vault extension. Implementing auto-renew in the extension itself would still require changes to clients.

We should document this better and maybe propose changes to Agroal & the Vert.x reactive clients.

@kdubb
Copy link
Contributor

kdubb commented Aug 22, 2022

@vsevel One thing to notice is that you can't trust the lease_renewable. I'm not even sure what it is for. If you attempt to renew after the lease has expired, you receive an HTTP error with lease not found.

@barreiro
Copy link

@vsevel setting max-lifetime to rotate the JDBC connections from the pool may not be the best approach in this case. I would recommend looking into ((AgroalDatasource) datasource).flush(GRACEFULL). If you can call that from the vault extension that would be ideal.

Agroal also has a way to flush connections that see an exception that is considered to be non-recoverable. As far as I can't tell, that is not triggered at the moment because a permission issue is not considered to be in that category.

@vsevel
Copy link
Contributor

vsevel commented Aug 24, 2022

@barreiro thanks for your input. the easiest solution (for the vault extension and quarkus) would be that the agroal credentials carry the expiration date, then the pool would be autonomous to handle the lifecycle of the connections.
specifically, it would a great addition if io.agroal.api.security.SimplePassword had an expiration date that we can set from the quarkus credentials provider.

@vsevel
Copy link
Contributor

vsevel commented Aug 24, 2022

@raffaelespazzoli in summary are you suggesting that the vault extension auto-extends automatically on behalf of the app any lease associated with all objects that were fetched by the app, without the app needing to interact the vault extension.
that would provide for the app a sense that the only ttl it needs to know is the max ttl.
this would mean keeping a cache of al leases in the extension, and getting an "lease" thread to handle their lifecycles (e.g. extending the leases in a timely fashion, notifying through a cdi event when they are about to expire, clearing the cache of expired leases).

@quarkiverse quarkiverse locked and limited conversation to collaborators Aug 31, 2022
@kdubb kdubb converted this issue into discussion #72 Aug 31, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants