
Unable to deploy driver to k8s cluster: error listing CSINodes: the server could not find the requested resource #420

Closed
MeghanaSrinath opened this issue Oct 25, 2019 · 8 comments

@MeghanaSrinath

I'm trying to set up CMEK in my cluster as per the details mentioned here:
https://cloud.google.com/kubernetes-engine/docs/how-to/dynamic-provisioning-cmek#dynamically_provision_an_encrypted

I have deployed the Compute Engine Persistent Disk CSI Driver to my cluster as per the steps mentioned in:
https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/master/docs/kubernetes/development.md

When I run the deploy-driver.sh script, it ends with the following output:

+ kubectl apply -v=2 -f /tmp/gcp-compute-persistent-disk-csi-driver-specs-generated.yaml
serviceaccount/csi-gce-pd-controller-sa created
serviceaccount/csi-gce-pd-node-sa created
clusterrole.rbac.authorization.k8s.io/csi-gce-pd-attacher-role created
clusterrole.rbac.authorization.k8s.io/csi-gce-pd-provisioner-role created
clusterrole.rbac.authorization.k8s.io/csi-gce-pd-resizer-role created
clusterrolebinding.rbac.authorization.k8s.io/csi-gce-pd-controller-attacher-binding created
clusterrolebinding.rbac.authorization.k8s.io/csi-gce-pd-controller-provisioner-binding created
clusterrolebinding.rbac.authorization.k8s.io/csi-gce-pd-resizer-binding created
statefulset.apps/csi-gce-pd-controller created
daemonset.apps/csi-gce-pd-node created
F1024 06:41:32.589473   27307 helpers.go:114] unable to recognize "/tmp/gcp-compute-persistent-disk-csi-driver-specs-generated.yaml": no matches for kind "PriorityClass" in version "scheduling.k8s.io/v1"
unable to recognize "/tmp/gcp-compute-persistent-disk-csi-driver-specs-generated.yaml": no matches for kind "PriorityClass" in version "scheduling.k8s.io/v1"

Please find the gcp-compute-persistent-disk-csi-driver-specs-generated.yaml file attached.

gcp-compute-persistent-disk-csi-driver-specs-generated.txt

I changed the apiVersion from scheduling.k8s.io/v1 to scheduling.k8s.io/v1beta1 and re-ran the deploy-driver.sh script. This time it ran successfully, and both priorityclass.scheduling.k8s.io/csi-gce-pd-controller and priorityclass.scheduling.k8s.io/csi-gce-pd-node were created.
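The workaround above can be sketched as a small patch step before re-applying the spec. In practice the file would be the /tmp/gcp-compute-persistent-disk-csi-driver-specs-generated.yaml produced by deploy-driver.sh; a stand-in file is used here so the sketch is self-contained:

```shell
# Sketch of the workaround: rewrite scheduling.k8s.io/v1 to v1beta1 so the
# PriorityClass objects are accepted by an older (pre-1.14) API server.
# A stand-in file is used here; in practice SPEC would be the generated spec path.
SPEC=$(mktemp)
cat > "$SPEC" <<'EOF'
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: csi-gce-pd-node
value: 900000000
EOF
sed -i 's|scheduling.k8s.io/v1|scheduling.k8s.io/v1beta1|' "$SPEC"
grep apiVersion "$SPEC"   # apiVersion: scheduling.k8s.io/v1beta1
```

Upgrading the cluster to 1.14+ removes the need for this patch, since scheduling.k8s.io/v1 is served from that version onward.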

I then created the key ring and key, and created the storage class below:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-gce-pd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-standard
  disk-encryption-kms-key: "projects/xx/locations/us-central1/keyRings/xx/cryptoKeys/xx"

Below is the YAML for the PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: encrypt-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: csi-gce-pd
  resources:
    requests:
      storage: 5Gi

However, when I apply the PVC YAML, provisioning fails with the error below and the PVC stays in Pending:

Name:          encrypted-pvc
Namespace:     ethan
StorageClass:  csi-gce-pd
Status:        Pending
Volume:
Labels:        <none>
Annotations:   kubectl.kubernetes.io/last-applied-configuration:
                 {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{"volume.beta.kubernetes.io/storage-class":"csi-gce-pd"},"nam...
               volume.beta.kubernetes.io/storage-class: csi-gce-pd
               volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Mounted By:    <none>
Events:
  Type     Reason                Age                From                                                                                Message
  ----     ------                ----               ----                                                                                -------
  Normal   Provisioning          15s (x5 over 30s)  pd.csi.storage.gke.io_csi-gce-pd-controller-0_5c51fedd-8092-4c71-aca9-5a13b566bb8a  External provisioner is provisioning volume for claim "ethan/encrypted-pvc"
  Warning  ProvisioningFailed    15s (x5 over 30s)  pd.csi.storage.gke.io_csi-gce-pd-controller-0_5c51fedd-8092-4c71-aca9-5a13b566bb8a  failed to provision volume with StorageClass "csi-gce-pd": error generating accessibility requirements: error listing CSINodes: the server could not find the requested resource
  Normal   ExternalProvisioning  3s (x4 over 30s)   persistentvolume-controller                                                         waiting for a volume to be created, either by external provisioner "pd.csi.storage.gke.io" or manually created by system administrator

Can someone please explain why this is happening and what can be done to resolve it?

Kubectl version:

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:18:23Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.10-gke.0", GitCommit:"569511c9540f78a94cc6a41d895c382d0946c11a", GitTreeState:"clean", BuildDate:"2019-08-21T23:28:44Z", GoVersion:"go1.11.13b4", Compiler:"gc", Platform:"linux/amd64"}
@msau42 (Contributor) commented Oct 25, 2019

Hi, thanks for reporting the issue. Version 0.5.0 of the CSI driver requires Kubernetes 1.14+. Can you try upgrading your cluster to 1.14?

@MeghanaSrinath (Author) commented Oct 25, 2019

I upgraded the cluster version to 1.14.7-gke.10.
kubectl version:

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:18:23Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.7-gke.10", GitCommit:"8cea5f8ae165065f0d35e5de5dfa2f73617f02d1", GitTreeState:"clean", BuildDate:"2019-10-05T00:08:10Z", GoVersion:"go1.12.9b4", Compiler:"gc", Platform:"linux/amd64"}

However, when I re-apply the storage class and PVC YAML files, I get a different error:

Name:          encrypted-pvc
Namespace:     gce-pd-csi-driver
StorageClass:  csi-gce-pd
Status:        Pending
Volume:
Labels:        <none>
Annotations:   kubectl.kubernetes.io/last-applied-configuration:
                 {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{"volume.beta.kubernetes.io/storage-class":"csi-gce-pd"},"nam...
               volume.beta.kubernetes.io/storage-class: csi-gce-pd
               volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Mounted By:    <none>
Events:
  Type     Reason                Age               From                                                                                Message
  ----     ------                ----              ----                                                                                -------
  Normal   Provisioning          4s (x3 over 15s)  pd.csi.storage.gke.io_csi-gce-pd-controller-0_5c51fedd-8092-4c71-aca9-5a13b566bb8a  External provisioner is provisioning volume for claim "gce-pd-csi-driver/encrypted-pvc"
  Normal   ExternalProvisioning  2s (x2 over 15s)  persistentvolume-controller                                                         waiting for a volume to be created, either by external provisioner "pd.csi.storage.gke.io" or manually created by system administrator
  Warning  ProvisioningFailed    0s (x3 over 11s)  pd.csi.storage.gke.io_csi-gce-pd-controller-0_5c51fedd-8092-4c71-aca9-5a13b566bb8a  failed to provision volume with StorageClass "csi-gce-pd": rpc error: code = Internal desc = CreateVolume failed to create single zonal disk "pvc-1524bf19-f6f1-11e9-a706-4201ac100007": failed to insert zonal disk: unkown Insert disk error: googleapi: Error 400: Invalid resource usage: 'Cloud KMS error when using key projects/acn-devopsgcp/locations/us-central1/keyRings/testkeyring1/cryptoKeys/testkey1: Permission 'cloudkms.cryptoKeyVersions.useToEncrypt' denied on resource 'projects/acn-devopsgcp/locations/us-central1/keyRings/testkeyring1/cryptoKeys/testkey1' (or it may not exist).'., invalidResourceUsage

I have granted the following roles to the service account, and the key's resource identifier is also correct:
Cloud KMS CryptoKey Encrypter/Decrypter
Cloud KMS CryptoKey Encrypter
Cloud KMS CryptoKey Decrypter

@davidz627 (Contributor)

@MeghanaSrinath, thanks for trying this out and sorry it has not gone smoothly for you so far.

Which service account did you give those roles to? The role roles/cloudkms.cryptoKeyEncrypterDecrypter should be granted to the "Compute Engine Service Agent" service-[PROJECT_NUMBER]@compute-system.iam.gserviceaccount.com on the project that your disks are being provisioned on.

I tried a couple different things and the only way I could reproduce the error you got was by removing the role from my compute service agent. The error was resolved when I added the role back.
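The fix described above can be sketched as follows. The project ID and project number are placeholders, the binding must be made on the project where the disks are provisioned, and the command is echoed rather than executed since it modifies live IAM policy:

```shell
# Sketch (placeholder values): grant the KMS encrypt/decrypt role to the
# Compute Engine Service Agent, which performs the actual disk encryption.
PROJECT_ID=my-project            # placeholder: project where disks are provisioned
PROJECT_NUMBER=123456789012      # placeholder: numeric project number
AGENT="service-${PROJECT_NUMBER}@compute-system.iam.gserviceaccount.com"
# Echoed rather than run, since it changes live IAM policy:
echo gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member "serviceAccount:${AGENT}" \
  --role roles/cloudkms.cryptoKeyEncrypterDecrypter
```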

@MeghanaSrinath (Author)

@davidz627 Thank you for looking into this issue.
I have created a service account, platform@acn-devopsgcp.iam.gserviceaccount.com, and granted the roles to it as well.

gcloud projects get-iam-policy acn-devopsgcp  \
--flatten="bindings[].members" \
--format='table(bindings.role)' \
--filter="bindings.members:platform@acn-devopsgcp.iam.gserviceaccount.com"

This returns the roles below, which include 'roles/cloudkms.cryptoKeyEncrypterDecrypter':

ROLE
projects/acn-devopsgcp/roles/gcp_compute_persistent_disk_csi_driver_custom_role
roles/cloudkms.cryptoKeyDecrypter
roles/cloudkms.cryptoKeyEncrypter
roles/cloudkms.cryptoKeyEncrypterDecrypter
roles/compute.admin
roles/compute.storageAdmin
roles/container.admin
roles/iam.roleAdmin
roles/iam.securityAdmin
roles/iam.serviceAccountUser
roles/storage.admin

Is there a restriction that only the Compute Engine default service account (service-[PROJECT_NUMBER]@compute-system.iam.gserviceaccount.com) can be used for these scenarios? I have used a different service account.

@davidz627 (Contributor)

@MeghanaSrinath thanks for the extra info; this really helps.

The service account you've created (platform@) doesn't actually need the roles/cloudkms roles to function, assuming that account is just the one given to the PD driver.

Yes, the Compute Engine default service account is a special service account that is used to encrypt/decrypt the disks as they are created or attached, and so it is the one that requires the cloudkms roles.

You can think of it like this: the PD driver never holds your key or performs any encryption or decryption on your behalf, so it doesn't need those permissions. You just give it a key reference, which it passes on to GCE; GCE then retrieves your key and performs the crypto operations on your behalf.
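The same pattern is visible when creating a disk directly with gcloud: the caller passes only a key reference via the --kms-key flag, and the encryption itself is done by GCE. All resource names below are placeholders, and the command is echoed rather than run since it would create a real disk:

```shell
# Illustrative sketch (placeholder names): CMEK disk creation passes only a
# key *reference*; the Compute Engine Service Agent performs the encryption.
KMS_KEY="projects/my-project/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key"
echo gcloud compute disks create encrypted-disk \
  --size 5GB --zone us-central1-a \
  --kms-key "$KMS_KEY"
```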

@meghanabsrinath

@davidz627 so does this mean that the default service account should have the above role, instead of my 'platform@' service account? I have a cluster that was provisioned with the 'platform@' service account, and its PVCs were also dynamically provisioned during cluster creation, so I wanted to use the same service account for the KMS encryption of the PVCs.
From your explanation, that isn't possible, since the default service account is the one that does the encryption/decryption. Please correct me if I'm wrong.
So if I connect to my cluster (created with the platform SA) and create an encrypted PVC, the default service account is the one that comes into the picture, not the other SA. Is this right?

@davidz627 (Contributor)

The compute service account does the encryption/decryption.
[Screenshot attached, 2019-10-28]

@MeghanaSrinath
Copy link
Author

Thank you @davidz627 ! This worked and my PVC is now bound.
