Enable deleting API objects even when storage-level decryption is not working properly #86489

immutableT · 2019-12-20T18:42:33Z

What happened:
Users are unable to delete secrets when kms provider (which originally encrypted such secrets) can no longer decrypt them. There may be several reasons why kms provider would fail to decrypt secrets, the most common one is that users deleted/disabled the version of the key that was used to originally encrypt secrets.

What you expected to happen:
Secrets to be deleted.

How to reproduce it (as minimally and precisely as possible):

Setup a cluster with a kms provider of your choice.
Create a secret, validate that the secret is encrypted
Reboot the cluster (this is required to clear the cache of Key Encryption Keys).
Disable the key or key version that was used by the provider to encrypt the key in step 2.
Attempt to delete the secret. You should get an internal error that wraps the kms-plugin's specific error (the error will vary based on the plugin).

Note: the issue is probably not unique to kms provider, but will manifest itself in any provider when the key that was used to encrypt the secret is no longer available.
Anything else we need to know?:
I believe that the cause of this behaviour is that fact that objects' metadata needs to be updated prior to deletion, which implies the need to transform from storage. However, such transformation is not possible due to the unavailability of the KEK.
To address this issue we would need to read the metadata of the object (while processing a delete) even if the KEK is not available - after all, during a delete, we don't care about the payload.
Therefore, to enable this scenario we would need to move away from encrypting the whole object. Concretely, parts of the metadata should remain in cleartext.
I realize that this opens-up a lot of questions, and I could follow this issue up with a KEP.

Environment:

Kubernetes version (use kubectl version):
1.14

/cc @liggitt @mikedanese @enj
/sig auth

The text was updated successfully, but these errors were encountered:

liggitt · 2019-12-20T18:49:53Z

It is expected that inability to read from storage (including any required decryption from storage) will block access to the object (including read, write, and delete requests).

The deletion path requires checking things like finalizers, and finalizing controllers can take arbitrary actions via the API (reading and updating the object).

Adding support for partial object encryption would be an enhancement, not really a bug fix, and would definitely need a proposal.

immutableT · 2019-12-20T18:54:20Z

I will work on a KEP for this.

fejta-bot · 2020-03-19T21:03:10Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2020-04-18T21:45:20Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot · 2020-05-18T22:30:24Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot · 2020-05-18T22:30:38Z

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2023-03-15T12:45:22Z

@enj: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

enj · 2023-03-15T15:37:02Z

/triage accepted
/assign @stlaz @ibihim

This was discussed at a high level in one of the last SIG auth meetings. The general idea is to propose a KEP (Standa and Krzysztof have been volunteered 😛) that would add two capabilities:

Have the API server return which resource is failing to decrypt/decode in a structured way as part of the returned API status error
Create a new field in delete options that allows the expression of "I want to delete this IFF there is a decrypt/decode error" (maybe with some form of dry run support?)

This would enable an external tool (kubectl plugin?) to be created (which could also be a SIG Auth sub-project) that allows an end user with sufficient access to the Kubernetes API (likely a cluster admin) to recover a cluster in which some subset of items cannot be decrypted/decoded (though maybe the tool could attempt partial recovery by asking for data from the watch cache?). An important aspect is that the determination of "is the bad state permanent/terminal" would be up to the end user (instead of the API server making decisions on behalf of the user).

Currently, direct access to etcd is required to delete items in this state.

k8s-triage-robot · 2024-03-14T16:05:52Z

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

Confirm that this issue is still relevant with /triage accepted (org members only)
Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

immutableT added the kind/bug Categorizes issue or PR as related to a bug. label Dec 20, 2019

k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Dec 20, 2019

liggitt added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Dec 20, 2019

liggitt changed the title ~~Unable to delete a secret when kms provider fails to decrypt it.~~ Enable deleting API objects even when storage-level decryption is not working properly Dec 20, 2019

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 19, 2020

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 18, 2020

k8s-ci-robot closed this as completed May 18, 2020

k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Mar 15, 2023

k8s-ci-robot reopened this Mar 15, 2023

k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Mar 15, 2023

k8s-ci-robot assigned ibihim Mar 15, 2023

k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Mar 15, 2023

k8s-ci-robot assigned stlaz Mar 15, 2023

k8s-ci-robot removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Mar 15, 2023

stlaz mentioned this issue Mar 27, 2023

KEP-3926: handling undecryptable resources kubernetes/enhancements#3927

Merged

k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Mar 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable deleting API objects even when storage-level decryption is not working properly #86489

Enable deleting API objects even when storage-level decryption is not working properly #86489

immutableT commented Dec 20, 2019

liggitt commented Dec 20, 2019

immutableT commented Dec 20, 2019

fejta-bot commented Mar 19, 2020

fejta-bot commented Apr 18, 2020

fejta-bot commented May 18, 2020

k8s-ci-robot commented May 18, 2020

k8s-ci-robot commented Mar 15, 2023

enj commented Mar 15, 2023

k8s-triage-robot commented Mar 14, 2024

Enable deleting API objects even when storage-level decryption is not working properly #86489

Enable deleting API objects even when storage-level decryption is not working properly #86489

Comments

immutableT commented Dec 20, 2019

liggitt commented Dec 20, 2019

immutableT commented Dec 20, 2019

fejta-bot commented Mar 19, 2020

fejta-bot commented Apr 18, 2020

fejta-bot commented May 18, 2020

k8s-ci-robot commented May 18, 2020

k8s-ci-robot commented Mar 15, 2023

enj commented Mar 15, 2023

k8s-triage-robot commented Mar 14, 2024