[BUG] Volume encryption doesn't work on Amazon Linux 2 #5944

Closed
v-starodubov opened this issue May 16, 2023 · 7 comments
Assignees
Labels
area/environment-issue User-specific related issues, ex: network, DNS, host packages, etc. area/volume-encryption Volume encryption related backport/1.4.5 backport/1.5.4 kind/bug priority/0 Must be fixed in this release (managed by PO) require/doc Require updating the longhorn.io documentation
Milestone

Comments

@v-starodubov

v-starodubov commented May 16, 2023

Describe the bug (🐛 if you encounter this issue)

While using Longhorn on worker nodes with the Amazon Linux 2 image, I encountered an error stating that the Longhorn CSI plugin cannot perform LUKS-related actions on volumes. For example, when attempting to mount a volume created from a PVC manifest for the first time, the pod fails to mount it due to NodeStageVolume returning an exit code 1.

To Reproduce

Steps to reproduce the behavior:

  1. Install cryptsetup and load the dm_crypt module on the worker nodes.
sudo yum update -y && sudo yum install -y cryptsetup && sudo modprobe dm_crypt
  2. Create a Secret with CRYPTO_KEY_VALUE. Create a StorageClass with encrypted: true and CSI storage parameters that reference the previously created Secret. I took both from this documentation page.
  3. Create a PVC that uses the newly created encrypted StorageClass:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: alpine-pod-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
  storageClassName: longhorn-crypto-global
  4. Create an example pod that uses the PVC:
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: alpine
      image: alpine/curl
      command: ['sh', '-c', 'echo "Hello from alpine container $(date)" >> /data/message.txt && sleep 3600']
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: alpine-pod-pvc
  5. The pod will be stuck in the Pending state. Check the log of your pod or of the longhorn-csi-plugin pod that performs the luksFormat action for this volume.

Expected behavior

longhorn-csi-plugin should perform luksFormat and then luksOpen successfully, as it does on Ubuntu.
I tested these steps on Ubuntu 20.04 and everything works fine.

Log or Support bundle

2023-05-12T16:22:33+03:00 level=error msg="NodeStageVolume: err: rpc error: code = Internal desc = failed to encrypt device /dev/longhorn/pvc-09e8177a-fbef-4260-b9dc-dd2d5268c923 with LUKS: failed to run cryptsetup args: [-q luksFormat --type luks2 --cipher aes-xts-plain64 --hash sha256 --key-size 256 --pbkdf argon2i /dev/longhorn/pvc-09e8177a-fbef-4260-b9dc-dd2d5268c923 -d /dev/stdin] output:  error: exit status 1"
2023-05-12T16:24:35+03:00 level=info msg="NodeStageVolume: req: {\"secrets\":\"***stripped***\",\"staging_target_path\":\"/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/04bf18a9c0642f4e84640e8df708cad4512094c6e54b143ddeb37457cb01d0e2/globalmount\",\"volume_capability\":{\"AccessType\":{\"Mount\":{\"fs_type\":\"ext4\"}},\"access_mode\":{\"mode\":1}},\"volume_context\":{\"encrypted\":\"true\",\"fromBackup\":\"\",\"numberOfReplicas\":\"2\",\"staleReplicaTimeout\":\"2880\",\"storage.kubernetes.io/csiProvisionerIdentity\":\"1683889693166-8081-driver.longhorn.io\"},\"volume_id\":\"pvc-09e8177a-fbef-4260-b9dc-dd2d5268c923\"}"

Environment

  • Longhorn version: 1.4.2
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: EKS 1.26
    • Number of management node in the cluster: 0
    • Number of worker node in the cluster: 2
  • Node config
    • OS type and version: AL2_x86_64 1.26.2-20230509 (ami-0ebd4e6356d0557a5)
    • CPU per node: 2 vCPU
    • Memory per node: 8 GiB
    • Disk type(e.g. SSD/NVMe): SSD GP2
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): AWS EKS
  • Number of Longhorn volumes in the cluster: 1

Additional context

I think this may be related to how the passphrase is passed during the encryption process. If I pass an empty CRYPTO_KEY_VALUE or remove it from the Secret, the CSI plugin correctly reports that the key value is missing.

While the pod is in the Pending state, I can exec into the corresponding longhorn-csi-plugin container and perform these steps manually, without -d /dev/stdin:

longhorn-csi-plugin-c5nd5:/ # cryptsetup -q luksFormat --type luks2 --cipher aes-xts-plain64 --hash sha256 --key-size 256 --pbkdf argon2i /dev/longhorn/pvc-8e239295-00e6-47a2-a33a-804c9f2fba1c
Enter passphrase for /dev/longhorn/pvc-8e239295-00e6-47a2-a33a-804c9f2fba1c:  <some passphrase>
longhorn-csi-plugin-c5nd5:/ # echo $?
0

This is required only the first time. The next step is to perform luksOpen:

longhorn-csi-plugin-c5nd5:/ # cryptsetup luksOpen /dev/longhorn/pvc-8e239295-00e6-47a2-a33a-804c9f2fba1c pvc-8e239295-00e6-47a2-a33a-804c9f2fba1c
Enter passphrase for /dev/longhorn/pvc-8e239295-00e6-47a2-a33a-804c9f2fba1c: <some passphrase>
longhorn-csi-plugin-c5nd5:/ # echo $?
0

After these steps, the pod that requested the volume should proceed to the Running state.

@ChanYiLin
Contributor

cc @longhorn/qa

@ChanYiLin
Contributor

Hi @v-starodubov
Could you provide the support bundle so that we can check more logs and the cluster state?
The NodeStageVolume request is sent to the Longhorn CSI plugin, which performs the mounting and encryption.
Thanks

@v-starodubov
Author

@ChanYiLin sure.
supportbundle_20885674-c230-4455-a3dc-b809b32b6c60_2023-05-17T08-04-21Z.zip

@ChanYiLin
Contributor

cc @derekbit

@derekbit derekbit added this to the v1.6.0 milestone Nov 21, 2023
@derekbit derekbit added area/volume-encryption Volume encryption related investigation-needed Need to identify the case before estimating and starting the development priority/0 Must be fixed in this release (managed by PO) labels Nov 21, 2023
@derekbit
Member

derekbit commented Nov 21, 2023

@v-starodubov
Could you check the version of cryptsetup on the host?
I think the cause is that the --pbkdf option is not recognized by cryptsetup v1.x.

We will

  • improve the error message in the CSI plugin
  • add a check to the environment check script and the official documentation.
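
A version check of this kind could look roughly like the sketch below. It is not the actual Longhorn environment check script: the function name supports_pbkdf is hypothetical, and the sketch assumes that the output of cryptsetup --version starts with "cryptsetup <major>.<minor>..." (as in the "cryptsetup 1.7.4" output seen on Amazon Linux 2) and that any 2.x release accepts --pbkdf.

```shell
#!/bin/sh
# Hypothetical helper (not part of Longhorn): decide from a
# "cryptsetup --version" line whether the --pbkdf option is expected to work.
# Assumption: the option is available from major version 2 onwards.
supports_pbkdf() {
    # $1: version line, for example "cryptsetup 1.7.4"
    version=${1#cryptsetup }   # strip the leading program name
    major=${version%%.*}       # keep everything before the first dot
    case "$major" in
        ''|*[!0-9]*) return 2 ;;   # could not parse a numeric major version
    esac
    [ "$major" -ge 2 ]
}

# Example use on a worker node (run on the host, not inside a container):
#   if ! supports_pbkdf "$(cryptsetup --version)"; then
#       echo "cryptsetup is too old for --pbkdf; version 2.x or later is needed" >&2
#   fi
```

The function returns 0 for a 2.x or later version, 1 for a 1.x version, and 2 when the version line cannot be parsed, so a calling script can tell "too old" apart from "unknown".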

@derekbit derekbit added area/environment-issue User-specific related issues, ex: network, DNS, host packages, etc. require/doc Require updating the longhorn.io documentation and removed investigation-needed Need to identify the case before estimating and starting the development labels Nov 21, 2023
@derekbit derekbit self-assigned this Nov 21, 2023
@longhorn-io-github-bot

longhorn-io-github-bot commented Nov 21, 2023

Pre Ready-For-Testing Checklist

  • Where are the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:

#5944 (comment)

  • Does the PR include the explanation for the fix or the feature?

Improve the error message

  • Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
    The PR is at

longhorn/longhorn-manager#2308

  • Which areas/issues might this PR have potential impacts on?
    Area: encrypted volume
    Issues

@chriscchien
Contributor

Verified as passing on Longhorn master (longhorn-manager 23a995) with the following steps:

  1. The cryptsetup version on Amazon Linux 2 is 1.7.4.
  2. The issue can be reproduced on Amazon Linux 2 with cryptsetup 1.7.4 and on Ubuntu 22.04 with cryptsetup 1.7.4; the pod gets stuck at ContainerCreating.
Events:
  Type     Reason                  Age               From                     Message
  ----     ------                  ----              ----                     -------
  Normal   Scheduled               25s               default-scheduler        Successfully assigned default/my-pod to ip-172-31-43-90
  Normal   SuccessfulAttachVolume  15s               attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-03c6da27-bd09-49d2-b60b-d117de538bed"
  Warning  FailedMount             7s (x5 over 15s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-03c6da27-bd09-49d2-b60b-d117de538bed" : rpc error: code = Internal desc = failed to encrypt device /dev/longhorn/pvc-03c6da27-bd09-49d2-b60b-d117de538bed with LUKS: failed to run cryptsetup, args: [-q luksFormat --type luks2 --cipher aes-xts-plain64 --hash sha256 --key-size 256 --pbkdf argon2i /dev/longhorn/pvc-03c6da27-bd09-49d2-b60b-d117de538bed -d /dev/stdin], stdout: , stderr: Usage: cryptsetup [-?vyrq] [-?|--help] [--usage] [--version] [-v|--verbose]
        [--debug] [-c|--cipher=STRING] [-h|--hash=STRING]
        [-y|--verify-passphrase] [-d|--key-file=STRING]
        [--master-key-file=STRING] [--dump-master-key] [-s|--key-size=BITS]
        [-l|--keyfile-size=bytes] [--keyfile-offset=bytes]
        [--new-keyfile-size=bytes] [--new-keyfile-offset=bytes]
        [-S|--key-slot=INT] [-b|--size=SECTORS] [-o|--offset=SECTORS]
        [-p|--skip=SECTORS] [-r|--readonly] [-i|--iter-time=msecs]
        [-q|--batch-mode] [-t|--timeout=secs] [-T|--tries=INT]
        [--align-payload=SECTORS] [--header-backup-file=STRING]
        [--use-random] [--use-urandom] [--shared] [--uuid=STRING]
        [--allow-discards] [--header=STRING] [--test-passphrase]
        [--tcrypt-hidden] [--tcrypt-system] [--tcrypt-backup] [--veracrypt]
        [-M|--type=STRING] [--force-password] [--perf-same_cpu_crypt]
        [--perf-submit_from_crypt_cpus] [OPTION...] <action> <action-specific>
--pbkdf: unknown option
: exit status 1
root@ip-172-31-37-145:/home/ubuntu# cryptsetup --version
cryptsetup 1.7.4
  3. On Ubuntu 22.04 with cryptsetup 2.4.3, the test steps do not hit any problem, so this is an environment issue.

Projects
Status: Resolved/Scheduled

5 participants