OSD failing to start if encrypted and also have metadata device #13737

Closed
SlyngDK opened this issue Feb 9, 2024 · 6 comments · Fixed by #13830

SlyngDK commented Feb 9, 2024

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
The metadata device is not decrypted on first pod start, because decrypting the data device deletes the key file before the metadata device is decrypted.

Expected behavior:
Both the data and metadata devices are decrypted on the first pod start.
How to reproduce it (minimal and precise):

Set up a cluster on PVCs with encryption enabled and a metadata device.
Use Vault as the KMS.

Start an OSD with both the data and metadata devices encrypted, and wait for them to be decrypted.

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary
---
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.1
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
  dashboard:
    enabled: true
    ssl: false
  mgr:
    count: 2
  storage:
    config:
      encryptedDevice: "true"
    useAllNodes: true
    useAllDevices: false
    storageClassDeviceSets:
      - name: ceph-set-2
        count: 3
        portable: false
        encrypted: true
        volumeClaimTemplates:
          - metadata:
              name: data
            spec:
              resources:
                requests:
                  storage: 1000Gi
              storageClassName: local-device
              volumeMode: Block
              accessModes:
                - ReadWriteOnce
          - metadata:
              name: metadata
            spec:
              resources:
                requests:
                  storage: 200Gi
              storageClassName: local-device-ssd
              volumeMode: Block
              accessModes:
                - ReadWriteOnce
  security:
    keyRotation:
      enabled: false
    kms:
      connectionDetails:
        KMS_PROVIDER: vault
        VAULT_ADDR: https://xxxx.xxx
        VAULT_AUTH_KUBERNETES_ROLE: xxxxxx
        VAULT_AUTH_METHOD: kubernetes
        VAULT_AUTH_MOUNT_PATH: xxxxxxx
        VAULT_BACKEND_PATH: xxxxxx
        VAULT_SECRET_ENGINE: kv

Logs to submit:

  • Crashing pod(s) logs, if necessary
➜  ~ kubectl logs -n rook rook-ceph-osd-1-5856d8cfd8-zl5nj -c encryption-open-metadata
+ CEPH_FSID=d76efac7-13da-4b9a-88ac-6d33623561db
+ PVC_NAME=ceph-set-2-metadata-06kdkr
+ KEY_FILE_PATH=/etc/ceph/luks_key
+ BLOCK_PATH=/var/lib/ceph/osd/ceph-1/block.db-tmp
+ DM_NAME=ceph-set-2-metadata-06kdkr-db-dmcrypt
+ DM_PATH=/dev/mapper/ceph-set-2-metadata-06kdkr-db-dmcrypt
+ dmsetup version
Library version:   1.02.181-RHEL8 (2021-10-20)
Driver version:    4.48.0
+ '[' -b /dev/mapper/ceph-set-2-metadata-06kdkr-db-dmcrypt ']'
+ open_encrypted_block
+ echo 'Opening encrypted device /var/lib/ceph/osd/ceph-1/block.db-tmp at /dev/mapper/ceph-set-2-metadata-06kdkr-db-dmcrypt'
Opening encrypted device /var/lib/ceph/osd/ceph-1/block.db-tmp at /dev/mapper/ceph-set-2-metadata-06kdkr-db-dmcrypt
+ cryptsetup luksOpen --verbose --disable-keyring --allow-discards --key-file /etc/ceph/luks_key /var/lib/ceph/osd/ceph-1/block.db-tmp ceph-set-2-metadata-06kdkr-db-dmcrypt
Failed to open key file.
Command failed with code -1 (wrong or missing parameters).

Cluster Status to submit:
N/A

Environment:

  • OS (e.g. from /etc/os-release): Ubuntu 22.04
  • Kernel (e.g. uname -a): 6.5.0-17-generic
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod): 1.13.3
  • Storage backend version (e.g. for ceph do ceph -v): 18.2.1
  • Kubernetes version (use kubectl version): 1.26.4
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): RKE2
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
Contributor

cupnes commented Feb 13, 2024

I am trying to reproduce your situation.

Contributor

cupnes commented Feb 14, 2024

I deleted spec.security from your cluster.yaml and the issue did not reproduce. Perhaps it reproduces only when spec.security is set.

Author

SlyngDK commented Feb 14, 2024

> I deleted spec.security from your cluster.yaml and the issue did not reproduce. Perhaps it reproduces only when spec.security is set.

I think it is only a problem when encryption is used, and .spec.security is part of the encryption configuration.

cupnes mentioned this issue Feb 15, 2024

Contributor

cupnes commented Feb 15, 2024

I succeeded in reproducing your situation in #13773.

Contributor

cupnes commented Feb 16, 2024

I am busy with another matter and will not be able to investigate until the end of next week.

Contributor

cupnes commented Feb 22, 2024

The key file is deleted in the shell script shared by the encryption-open and encryption-open-metadata initContainers. encryption-open-metadata reports "Failed to open key file." because the key file has already been deleted when encryption-open runs.

The cause has been identified, and I will create a PR. Instead of deleting the key file in this script, I will add a container whose only job is to delete the key file at the end of the sequence of containers that use it. (I will start working on it around next Tuesday because of the holiday.)
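
The failure mode described above can be sketched in a few lines of shell. This is a simulation only, not Rook's actual script: no LUKS device or cryptsetup is involved, the function body is a hypothetical stand-in for the shared script, and it assumes the key file sits in a directory that both init containers can see (a temporary directory here instead of /etc/ceph).

```shell
#!/usr/bin/env bash
# Simulation only: a temp directory stands in for the key file location,
# assumed to be visible to every init container.
KEY_DIR="$(mktemp -d)"
KEY_FILE_PATH="$KEY_DIR/luks_key"
printf 'dummy-passphrase' > "$KEY_FILE_PATH"

# Hypothetical stand-in for the shared script: "open" a device by checking
# that the key file is readable, then delete the key file.
open_encrypted_block() {
  local dm_name="$1"
  if [ ! -r "$KEY_FILE_PATH" ]; then
    echo "Failed to open key file. ($dm_name)"
    return 1
  fi
  echo "Opened $dm_name"
  rm -f "$KEY_FILE_PATH"   # the deletion that breaks the next container
}

# encryption-open runs first.
if open_encrypted_block "data-dmcrypt"; then data_rc=0; else data_rc=$?; fi

# encryption-open-metadata runs second and finds no key file.
if open_encrypted_block "metadata-db-dmcrypt"; then metadata_rc=0; else metadata_rc=$?; fi

echo "data_rc=$data_rc metadata_rc=$metadata_rc"
rm -rf "$KEY_DIR"
```

The first call succeeds and removes the key file, so the second call fails in the same way as the encryption-open-metadata log in the report.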

cupnes added a commit to cybozu-go/rook that referenced this issue Feb 28, 2024
The key file is deleted in the shell script shared by the
encryption-open and encryption-open-metadata initContainers.
encryption-open-metadata reports "Failed to open key file." because
the key file has already been deleted when encryption-open runs.

Instead of deleting the key file in that script, a container whose
only job is to delete the key file will be added at the end of the
sequence of containers that use the key file.

Fixes: rook#13737

Signed-off-by: Yuma Ogami <yuma-ogami@cybozu.co.jp>
cupnes added a commit to cybozu-go/rook that referenced this issue Feb 29, 2024
cupnes added a commit to cybozu-go/rook that referenced this issue Mar 6, 2024
cupnes added a commit to cybozu-go/rook that referenced this issue Mar 19, 2024
The key file deletion step is in the shell script shared by the
encryption-open, encryption-open-metadata, and encryption-open-wal
init containers. The encryption-open init container deletes the key
file, so the encryption-open-metadata and encryption-open-wal init
containers fail to open it.

The key file is in the /etc/ceph folder. Unless that folder is shared,
the key file is not available to the other init containers anyway,
even if these init containers do not delete it. It is also deleted
naturally once the init containers have completed. So the key file
deletion step in the shell script is unnecessary.

Fixes: rook#13737

Signed-off-by: Yuma Ogami <yuma-ogami@cybozu.co.jp>
mergify bot pushed a commit that referenced this issue Mar 26, 2024
Fixes: #13737

Signed-off-by: Yuma Ogami <yuma-ogami@cybozu.co.jp>
(cherry picked from commit cdd655e)
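
The approach in the merged commit message, dropping the deletion from the shared script, can be sketched the same way. This is a simulation under stated assumptions, not the actual patch: the function is a hypothetical stand-in for the script, a temporary directory stands in for /etc/ceph, and the final `rm -rf` only imitates the key file disappearing once the init containers are done.

```shell
#!/usr/bin/env bash
# Simulation only: a temp directory stands in for the key file location.
KEY_DIR="$(mktemp -d)"
KEY_FILE_PATH="$KEY_DIR/luks_key"
printf 'dummy-passphrase' > "$KEY_FILE_PATH"

# Hypothetical stand-in for the shared script after the fix: it checks the
# key file but no longer deletes it.
open_encrypted_block() {
  local dm_name="$1"
  if [ ! -r "$KEY_FILE_PATH" ]; then
    echo "Failed to open key file. ($dm_name)"
    return 1
  fi
  echo "Opened $dm_name"
}

# encryption-open, encryption-open-metadata, and encryption-open-wal in order.
opened=0
for dm_name in data-dmcrypt metadata-db-dmcrypt wal-dmcrypt; do
  if open_encrypted_block "$dm_name"; then
    opened=$((opened + 1))
  fi
done
echo "opened=$opened"

# Stand-in for the key file going away after the init containers complete;
# the script itself does not perform this step.
rm -rf "$KEY_DIR"
if [ -e "$KEY_FILE_PATH" ]; then key_left=1; else key_left=0; fi
echo "key_left=$key_left"
```

All three opens succeed because none of them removes the key file, and the key file is still gone at the end without any explicit deletion in the function.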