
Migrate from Kubernetes External Secrets to ~External Secrets Operator~ CSI Driver #24869

Open
chaodaiG opened this issue Jan 13, 2022 · 29 comments
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. sig/testing Categorizes an issue or PR as relevant to SIG Testing.

Comments

@chaodaiG
Contributor

What would you like to be added:

Why is this needed:

As announced in external-secrets/kubernetes-external-secrets#864, Kubernetes External Secrets is now in maintenance mode, and the new recommendation is to migrate to External Secrets Operator.

There is no announced plan to shut down Kubernetes External Secrets, so we might be fine for a while, until it either becomes incompatible with upcoming Kubernetes versions or newer features and bug fixes are only available in External Secrets Operator.

@chaodaiG chaodaiG added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 13, 2022
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 13, 2022
@chaodaiG
Contributor Author

/sig testing

@k8s-ci-robot k8s-ci-robot added sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 13, 2022
@howardjohn
Contributor

Any thoughts on SealedSecret as an alternative? Seems more gitops friendly

@chaodaiG
Contributor Author

Any thoughts on SealedSecret as an alternative? Seems more gitops friendly

I can see that https://github.com/bitnami-labs/sealed-secrets is similar to KES (Kubernetes External Secrets) in that it generates Kubernetes Secrets from a more secure custom resource, but that is not the only purpose of KES.

KES was originally introduced to solve these problems:

  • Kubernetes Secrets were manually applied to the cluster by kubectl apply from dev machine(s)
  • A secret could be lost if someone accidentally updated or deleted its value, or if the cluster itself was accidentally deleted

KES syncs secrets from major secret manager providers into the Kubernetes cluster, so recovering a Kubernetes Secret is as simple as re-applying the ExternalSecret CR to the cluster, for example:

apiVersion: kubernetes-client.io/v1
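# The remaining fields, sketched here assuming the GCP Secret Manager backend of
# kubernetes-external-secrets (all names and the project ID are placeholders, not
# the real test-infra configuration):
kind: ExternalSecret
metadata:
  name: my-external-secret
spec:
  backendType: gcpSecretsManager
  projectId: my-gcp-project        # placeholder GCP project
  data:
    - key: my-gsm-secret           # Secret Manager secret to sync (placeholder)
      name: token                  # key in the generated Kubernetes Secret
      version: latest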

In short, SealedSecret is probably not the best replacement for KES

@howardjohn
Contributor

howardjohn commented Feb 22, 2022 via email

@chaodaiG
Contributor Author

Agree with you that both need a manual operation, either kubectl apply or gcloud secrets create. The GitOps side is pretty similar though: one is a SealedSecret CR and the other is an ExternalSecret CR, and both can live in git. However, SealedSecret cannot solve the problem of a user accidentally modifying the secret at the source (the last-applied configuration from k8s somewhat helps in this case, but cannot recover from kubectl delete SealedSecret, or from the cluster being accidentally deleted). Using KES reduces this risk because GCP Secret Manager version-controls secrets, so:

  • if someone accidentally changed the value in GCP, the secret values can still be recovered
  • if the cluster was accidentally deleted, secrets can still be recovered by applying the git source controlled KES CR

@howardjohn
Contributor

I feel like you could say the same about SealedSecret though...

  • if someone accidentally changed the value in ~GCP~ K8s, the secret values can still be recovered (from git)
  • if the cluster was accidentally deleted, secrets can still be recovered by applying the git source controlled ~KES~ SealedSecret CR

Except for "cluster deleted" I guess you would need to keep the sealed secret keys (to decrypt if cluster is deleted) somewhere, so at some point you need to bootstrap...

Anyhow I have no strong agenda either way, just wanted to throw the idea out there

@chaodaiG
Contributor Author

Thank you @howardjohn, this is a really great discussion!

I think I had misunderstood SealedSecret to a certain extent; with your explanation it's a bit clearer now. So SealedSecret (sketched below):

  • Stores secrets in encrypted form in a SealedSecret CR, which can live as "plain text" in git
  • The private key for decrypting the secrets is only available in the k8s cluster
  • A secret can only be created by a user running kubeseal, which uses the public key from the k8s cluster
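For context, a rough sketch of what such a SealedSecret CR committed to git might look like (the name, namespace, and ciphertext are placeholders):

apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: my-secret            # placeholder name
  namespace: my-namespace    # placeholder namespace
spec:
  encryptedData:
    token: AgB3Vk9n...       # ciphertext produced by kubeseal; only the in-cluster controller can decrypt it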

This sounds pretty good, and other than the cluster-deletion scenario it seems pretty reliable. One thing that isn't super clear from the documentation: when a user runs kubeseal <mysecret.json >mysealedsecret.json per https://github.com/bitnami-labs/sealed-secrets#usage, kubeseal needs to fetch the public key from the cluster. Do you happen to know whether that is true, @howardjohn?

@howardjohn
Contributor

@chaodaiG I think you can fetch the pubkey once and store it in git. Then the dev experience to add or update a secret would be kubeseal mysecret --cert pubkey.crt > mysealedsecret.json. Then a postsubmit job kubectl applys it to the cluster; the dev never needs access to the cluster.

But one concern would be that the docs say the key expires after 30d... so that may not work. I don't have much practical experience with sealed secrets, so I'm not 100% sure.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 23, 2022
@chaodaiG
Contributor Author

chaodaiG commented Jun 6, 2022

https://kubernetes.slack.com/archives/C09QZ4DQB/p1654433983124889 is one of the reasons why this should be prioritized. TL;DR: syncing build cluster tokens into prow is now a crucial piece of prow working with build clusters; KES flakiness would break this and cause prow to stop working with the build cluster.

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 6, 2022
chaodaiG added a commit to chaodaiG/test-infra that referenced this issue Jun 6, 2022
Prow now authenticates with build clusters using tokens that are valid for 2 days. The token is refreshed by a prow job (https://prow.k8s.io/?type=periodic&job=ci-test-infra-gencred-refresh-kubeconfig) and stored in GCP Secret Manager; KES is responsible for syncing the secrets into prow. KES has been observed to be flaky from time to time, generally more than 10 days after the KES pods started running. See kubernetes#24869 (comment)

This is a temporary solution aimed at mitigating the issue of long-running KES pods
liurupeng pushed a commit to liurupeng/test-infra that referenced this issue Jun 7, 2022
kaalams pushed a commit to kaalams/test-infra that referenced this issue Jul 14, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 4, 2022
@chaodaiG
Contributor Author

chaodaiG commented Sep 4, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 4, 2022
@chaodaiG
Contributor Author

chaodaiG commented Nov 8, 2022

The build cluster token failed-to-sync issue happened again (https://kubernetes.slack.com/archives/C7J9RP96G/p1667877096344719); this is not good.

/assign

@dims
Member

dims commented Nov 8, 2022

uh-oh. thanks @chaodaiG

@jimangel
Member

jimangel commented Nov 9, 2022

I don't want to derail / delay efforts going on in #27932, but has something like https://secrets-store-csi-driver.sigs.k8s.io/ been considered? We could use that with the GCP provider today and there's support for AWS, Azure, and Vault providers if we need to change.

Using the CSI driver + Google Secrets Manager (provider) would allow us to leverage Workload Identity for IAM secret access. I believe we'd also have better insight into access / auditing.

I know GCP costs are a concern; the pricing page indicates it would be $0.06 per secret per location, $0.03 per 10,000 access operations, and $0.05 per rotation. I don't think the costs would be astronomical, but it would be worth a closer inspection if we decided to pivot towards this solution.

I'm happy to demo / help move forward if we want to go that direction; however, I understand the urgency and value the progress already made.

Edit: It looks like External Secrets Operator also allows us to use Secret Manager + WID if we'd like: https://external-secrets.io/v0.6.1/provider/google-secrets-manager/. I think it would come down to whichever solution is easier to maintain and more active (future-proof-ish?).
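For comparison, a rough sketch of an External Secrets Operator ExternalSecret for that setup (all names are placeholders, and it assumes a SecretStore named gcp-store already configured for the gcpsm provider with Workload Identity):

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: my-external-secret        # placeholder name
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: SecretStore
    name: gcp-store               # assumed SecretStore using the gcpsm provider
  target:
    name: my-k8s-secret           # Kubernetes Secret that ESO will create
  data:
    - secretKey: token            # key in the created Secret
      remoteRef:
        key: my-gsm-secret        # Secret Manager secret name (placeholder)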

@chaodaiG
Contributor Author

chaodaiG commented Nov 9, 2022

Hi @jimangel, it's not a derail at all. IIRC the CSI driver for GCP was in a very early release cycle when we decided to adopt Kubernetes External Secrets. The proposal to transition from Kubernetes External Secrets to External Secrets Operator was pretty much a lazy action based on the recommendation from Kubernetes External Secrets.

In terms of cost, we don't have that many secrets or that many access operations, so I wouldn't be too worried about it.

I would be glad to take another look at the CSI driver for GCP since it's ready now. I'll do a quick evaluation from an operational and maintenance perspective, and will get your thoughts if any questions come up.

@chaodaiG
Contributor Author

Had an extensive and wonderful offline discussion with @jimangel, and here is what we agreed on:

  • External Secrets Operator works as a central proxy service. It uses a dedicated k8s cluster SA that is Workload Identity (WI) bound to a GCP SA; this GCP SA is given GCP Secret Manager permissions for all secrets used in the k8s cluster. These secrets are synced one way into the k8s cluster, and all pods in the cluster are allowed to use any of these secrets as long as they are in the same namespace.
  • The CSI driver works by using the authentication of the pods that need to mount the secrets. For GCP this means the Workload Identity bound cluster SA on the pod is used to authenticate with GCP Secret Manager (see the sketch after this list).
  • Technically speaking the CSI driver is more secure than External Secrets Operator, as a prowjob pod will only be able to use secrets that its SA is allowed to access (we don't yet have fine-grained separation of different teams using different SAs, so this is more of a future-proofing benefit).
  • Beyond security boundaries, one benefit of the CSI driver is that it avoids granting a GCP SA from the Prow service cluster GCP Secret Manager permissions in other projects; as a result, migrating or recovering would be much easier (no IAM changes required in users' projects).
  • One "downside" is that, in terms of authentication, it only supports Workload Identity for GCP, so jobs that are not using Workload Identity will not be able to use this feature.
  • The other "downside/WAI" is that a pod will fail to start when a secret is not available. This is expected for a prowjob, and IMO is even better than failing due to stale secrets that were synced 7 days ago. For Prow services we will need to make sure that all the kubeconfig secrets are stored in the GCP project where Prow lives, to avoid the case where a user-provided kubeconfig secret being deleted in GCP causes Prow downtime.
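A rough sketch of what the CSI driver approach looks like, following the secrets-store-csi-driver-provider-gcp docs (all names, projects, and secret paths below are placeholders, not the real Prow configuration):

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: my-secrets                # placeholder name
spec:
  provider: gcp
  parameters:
    secrets: |
      - resourceName: "projects/my-gcp-project/secrets/my-secret/versions/latest"
        path: "token"             # file name under the mount path

and the pod (or prowjob) mounts it via the CSI volume, authenticating with its own Workload Identity bound SA:

  volumes:
    - name: secrets
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: my-secrets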

With all that being said, I'm convinced that the CSI driver is better suited for our use case. Kudos to @jimangel, thank you so much for the discussion; I feel I've learned a lot!

@BenTheElder @spiffxp @dims @ameukam @cjwagner, WDYT?

@jimangel
Member

Awesome write up @chaodaiG! Agreed, it was fun chatting.

One "downside", is that In terms of authentication it only supports workload identity for GCP, so jobs that are not using workload identity will not be able to use this feature

There are alternatives for authentication outlined here: https://github.com/GoogleCloudPlatform/secrets-store-csi-driver-provider-gcp/blob/main/docs/authentication.md but the general consensus is to use WI if at all possible.

@dims
Member

dims commented Nov 10, 2022

@chaodaiG @jimangel Nice! +100

@cjwagner
Member

Sounds like a nice improvement to me!

@ameukam
Member

ameukam commented Nov 11, 2022

@chaodaiG @jimangel Nice idea!

Let's try it.

@ameukam
Member

ameukam commented Nov 11, 2022

@jimangel So if the secret is mounted as a volume in the pod, how is this isolated from the other pods running on the same node?

@jimangel
Member

jimangel commented Nov 11, 2022

So if the secret is mounted as a volume in the pod, how is this isolated from the other pods running on the same node?

I believe the threat model is the same as before (or more secure). Access today is segmented by namespace (k8s "built-in" secrets). With the CSI driver, access is only permitted when all conditions are met:

  1. A namespace-scoped SecretProviderClass (CRD) defining access exists (this directs the mount to the appropriate GCP project / secret).
  2. GCP IAM bindings exist for a Service Account / Workload Identity in GCP to access the specific secret resource(s).

NOTE: Any workload/job/pod in a shared k8s namespace could use the same service account to access the permitted secret(s)/SecretProviderClass. That should be no different from any pod in the same namespace accessing the same secret today.

As far as what access pods on the same node have (isolation)... if any pod/actor can mount/escape a pod to access node-level storage layers, you are as screwed as you'd be if you were using Kubernetes "built-in" secrets. 😅

Let me know if I misunderstood what you're asking @ameukam!

Edit: There are a couple "security considerations" called out in the repo itself.

@jimangel
Member

Hey all! Checking in here, what would be the next steps @chaodaiG? Should we try a small-scale demo or is there somewhere to test?

@chaodaiG
Contributor Author

@cjwagner could you please take a look?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 30, 2023
@BenTheElder BenTheElder added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 1, 2023
@michelle192837

This comment was marked as off-topic.

@michelle192837 michelle192837 added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Sep 12, 2023
@michelle192837
Contributor

This is open for contribution if anyone's willing to do so. (We do keep seeing infrequent errors or flakes that require KES to be restarted, so while it's not urgent it'd be helpful!).

@ameukam
Member

ameukam commented Oct 3, 2023

@michelle192837 Assuming this needs to be deployed on a Google-owned GKE cluster, one action would be to create an SA with Workload Identity so we can use it to retrieve the secrets from Secret Manager. I think this can only be done by EngProd.

@cjwagner cjwagner changed the title from "Migrate from Kubernetes External Secrets to External Secrets Operator" to "Migrate from Kubernetes External Secrets to ~External Secrets Operator~ CSI Driver" Oct 30, 2023