Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add automatic tidy of expired issuers #17823

Merged
merged 6 commits into from
Nov 10, 2022

Conversation

cipherboy
Copy link
Contributor

@cipherboy cipherboy commented Nov 4, 2022

To aid PKI users like Consul, which periodically rotate intermediates, and provided a little more consistency with older versions of Vault which would silently (and dangerously!) replace the configured CA on root/intermediate generation, we introduce an automatic tidy of expired issuers.

This includes a longer safety buffer (1 year) and logging of the relevant issuer information prior to deletion (certificate contents, key ID, and issuer ID/name) to allow admins to recover this value if desired, or perform further cleanup of keys.

From my PoV, removal of the issuer is thus a relatively safe operation compared to keys (which I do not feel comfortable removing) as they can always be re-imported if desired. Additionally, this is an opt-in tidy operation, not enabled by default. Lastly, most major performance penalties comes with lots of issuers within the mount, not as much large numbers of keys (as only new issuer creation/import operations are affected, unlike LIST /issuers which is a public, unauthenticated endpoint).

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>


This obviously needs:

  • Tests
  • Changelog
  • Docs

But I'm curious to get people's initial opinions on this change.

Copy link
Contributor

@kitography kitography left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting somewhere!

fields["tidy_expired_issuers"] = &framework.FieldSchema{
Type: framework.TypeBool,
Description: `Set to true to automatically remove expired issuers
past the issuer_safety_buffer. No keys will be removed as part of this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that we need to add something to remove unused keys eventually?

Copy link
Contributor Author

@cipherboy cipherboy Nov 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, I'm punting on that particular problem. Issuers make the mount slow (chain building, CRL building, ...), keys are always fetched by reference and don't make the mount slow. Obviously they'll be some nominal storage costs, but operators should be able to sort this out themselves eventually, if they care about the storage costs and want to remove old keys.

But even then, the number of stored leaves should exceed the stored issuer keys by several orders of magnitude.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the logging provide enough information for an operator to delete the key after an auto tidy if they want to?

Copy link
Contributor Author

@cipherboy cipherboy Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, see the log message below:

https://github.com/hashicorp/vault/pull/17823/files#diff-4d3f13dea8cac8ec87e6f15d17a6bbbe798129d171059e492303070ce3903d9cR518

It contains the key ID from the issuer, allowing them to make a call from the log messages as to whether to remove that key or not (and the system will prevent that if its in use by other still-present issuers).

In the general case of tidying keys, we don't necessarily presently store when/how a key was created, so we don't know if there's an outstanding, say, ICA issuance going on that we don't know about, or if it was from a deleted issuer.

We could later remove now-unused keys from issuers if we wanted to, but it makes recovery much harder -- you can't presently re-import deleted keys if you didn't have an external backup, and we wouldn't want to log them (unlike issuers) for obvious reasons...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't want to automatically remove keys, but I don't want operators to get in a case where they'd like to remove an old key but are confused which issuer it once applied to.

Copy link
Contributor Author

@cipherboy cipherboy Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you recommend something else? Another migration to associate issuers on keys, and/or a tidy step to associate a since-deleted issuer's identifier(s) back to the key?

Two way association is only beneficial in this case; in general nothing in the PKI code (presently) goes from key->issuer, except the key deletion code.

The problem is a number of issuers won't necessarily have meaningful names, so logging the full certificate is perhaps the best approach here -- unless we do want to retain the issuer's certificate in the system and associate removed issuers' certificates with the key entry. Then they could read the key and see the list of since-deleted issuers which used that key... It wouldn't impact perf anymore but still wouldn't be ideal to interact with (as they'd likely have to parse the certificate to understand which one it was).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'd just be happy if the key_id was in the output of the issuer tidy, so they could grep it out and delete it by hand.

builtin/logical/pki/fields.go Outdated Show resolved Hide resolved
builtin/logical/pki/path_tidy.go Show resolved Hide resolved
builtin/logical/pki/path_tidy_test.go Show resolved Hide resolved
To aid PKI users like Consul, which periodically rotate intermediates,
and provided a little more consistency with older versions of Vault
which would silently (and dangerously!) replace the configured CA on
root/intermediate generation, we introduce an automatic tidy of expired
issuers.

This includes a longer safety buffer (1 year) and logging of the
relevant issuer information prior to deletion (certificate contents, key
ID, and issuer ID/name) to allow admins to recover this value if
desired, or perform further cleanup of keys.

From my PoV, removal of the issuer is thus a relatively safe operation
compared to keys (which I do not feel comfortable removing) as they can
always be re-imported if desired. Additionally, this is an opt-in tidy
operation, not enabled by default. Lastly, most major performance
penalties comes with lots of issuers within the mount, not as much
large numbers of keys (as only new issuer creation/import operations are
affected, unlike LIST /issuers which is a public, unauthenticated
endpoint).

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Copy link
Collaborator

@sgmiller sgmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small feedback

fields["tidy_expired_issuers"] = &framework.FieldSchema{
Type: framework.TypeBool,
Description: `Set to true to automatically remove expired issuers
past the issuer_safety_buffer. No keys will be removed as part of this
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the logging provide enough information for an operator to delete the key after an auto tidy if they want to?

@@ -463,6 +470,15 @@ Defaults to 72 hours.`,
Default: int(defaultTidyConfig.SafetyBuffer / time.Second), // TypeDurationSecond currently requires defaults to be int
}

fields["issuer_safety_buffer"] = &framework.FieldSchema{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about issuer_tidy_grace_period? This doesn't as directly indicate safety from what, and that buffer is time related.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mirrors naming of the present field, safety_buffer above -- do you recommend breaking that pattern?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no, didn't notice the existing field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like something named _duration ideally, but I think it'd require a soft-migration for the other field (and use say expired_issuer_retention_duration and expired_leaf_retention_duration+safety_buffer) -- this'd let the CLI parse it correctly as a duration-typed field.

We have time to revisit doing that before 1.13 GAs as this won't be backported.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just introduce the two new fields and switch the docs. Add a note about the previous field still existing and being deprecated. But that can wait for another PR.

builtin/logical/pki/path_tidy.go Outdated Show resolved Hide resolved
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants