Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable deleting API objects even when storage-level decryption is not working properly #86489

Open
immutableT opened this issue Dec 20, 2019 · 9 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/auth Categorizes an issue or PR as relevant to SIG Auth.

Comments

@immutableT
Copy link
Contributor

What happened:
Users are unable to delete secrets when kms provider (which originally encrypted such secrets) can no longer decrypt them. There may be several reasons why kms provider would fail to decrypt secrets, the most common one is that users deleted/disabled the version of the key that was used to originally encrypt secrets.

What you expected to happen:
Secrets to be deleted.

How to reproduce it (as minimally and precisely as possible):

  1. Setup a cluster with a kms provider of your choice.
  2. Create a secret, validate that the secret is encrypted
  3. Reboot the cluster (this is required to clear the cache of Key Encryption Keys).
  4. Disable the key or key version that was used by the provider to encrypt the key in step 2.
  5. Attempt to delete the secret. You should get an internal error that wraps the kms-plugin's specific error (the error will vary based on the plugin).

Note: the issue is probably not unique to kms provider, but will manifest itself in any provider when the key that was used to encrypt the secret is no longer available.
Anything else we need to know?:
I believe that the cause of this behaviour is that fact that objects' metadata needs to be updated prior to deletion, which implies the need to transform from storage. However, such transformation is not possible due to the unavailability of the KEK.
To address this issue we would need to read the metadata of the object (while processing a delete) even if the KEK is not available - after all, during a delete, we don't care about the payload.
Therefore, to enable this scenario we would need to move away from encrypting the whole object. Concretely, parts of the metadata should remain in cleartext.
I realize that this opens-up a lot of questions, and I could follow this issue up with a KEP.

Environment:

  • Kubernetes version (use kubectl version):
    1.14

/cc @liggitt @mikedanese @enj
/sig auth

@immutableT immutableT added the kind/bug Categorizes issue or PR as related to a bug. label Dec 20, 2019
@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Dec 20, 2019
@liggitt liggitt added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Dec 20, 2019
@liggitt
Copy link
Member

liggitt commented Dec 20, 2019

It is expected that inability to read from storage (including any required decryption from storage) will block access to the object (including read, write, and delete requests).

The deletion path requires checking things like finalizers, and finalizing controllers can take arbitrary actions via the API (reading and updating the object).

Adding support for partial object encryption would be an enhancement, not really a bug fix, and would definitely need a proposal.

@immutableT
Copy link
Contributor Author

I will work on a KEP for this.

@liggitt liggitt changed the title Unable to delete a secret when kms provider fails to decrypt it. Enable deleting API objects even when storage-level decryption is not working properly Dec 20, 2019
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 19, 2020
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 18, 2020
@fejta-bot
Copy link

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Copy link
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Mar 15, 2023
@k8s-ci-robot k8s-ci-robot reopened this Mar 15, 2023
@k8s-ci-robot
Copy link
Contributor

@enj: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Mar 15, 2023
@enj
Copy link
Member

enj commented Mar 15, 2023

/triage accepted
/assign @stlaz @ibihim

This was discussed at a high level in one of the last SIG auth meetings. The general idea is to propose a KEP (Standa and Krzysztof have been volunteered 😛) that would add two capabilities:

  1. Have the API server return which resource is failing to decrypt/decode in a structured way as part of the returned API status error
  2. Create a new field in delete options that allows the expression of "I want to delete this IFF there is a decrypt/decode error" (maybe with some form of dry run support?)

This would enable an external tool (kubectl plugin?) to be created (which could also be a SIG Auth sub-project) that allows an end user with sufficient access to the Kubernetes API (likely a cluster admin) to recover a cluster in which some subset of items cannot be decrypted/decoded (though maybe the tool could attempt partial recovery by asking for data from the watch cache?). An important aspect is that the determination of "is the bad state permanent/terminal" would be up to the end user (instead of the API server making decisions on behalf of the user).

Currently, direct access to etcd is required to delete items in this state.

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Mar 15, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Mar 15, 2023
@k8s-triage-robot
Copy link

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Mar 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/auth Categorizes an issue or PR as relevant to SIG Auth.
Projects
Status: In Progress
Status: New KEP
Development

No branches or pull requests

8 participants