[BACKPORT][v1.5.3][BUG] Failing to mount encrypted volumes v1.5.2 #7048

Closed
github-actions bot opened this issue Nov 6, 2023 · 8 comments
Labels
area/volume-encryption Volume encryption related area/volume-rwx Volume RWX related kind/backport Backport request kind/bug kind/regression Regression which has worked before priority/0 Must be fixed in this release (managed by PO) regression/1.5.2 Regression in <version> require/backport Require backport. Only used when the specific versions to backport have not been defined. require/qa-review-coverage Require QA to review coverage

Comments

@github-actions

github-actions bot commented Nov 6, 2023

backport #7045

@github-actions github-actions bot added area/volume-encryption Volume encryption related area/volume-rwx Volume RWX related kind/backport Backport request kind/bug kind/regression Regression which has worked before priority/0 Must be fixed in this release (managed by PO) require/backport Require backport. Only used when the specific versions to backport have not been defined. require/qa-review-coverage Require QA to review coverage labels Nov 6, 2023
@github-actions github-actions bot added this to the v1.5.3 milestone Nov 6, 2023
@longhorn-io-github-bot

longhorn-io-github-bot commented Nov 6, 2023

Pre Ready-For-Testing Checklist

  • Where are the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:

#7045 (comment)

  • Is there a workaround for the issue? If so, where is it documented?
    The workaround is at:

  • Does the PR include the explanation for the fix or the feature?

  • Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore, etc.), including backport-needed/*?
    The PR is at:

longhorn/longhorn-manager#2282

  • Which areas/issues might this PR potentially impact?
    Area: encrypted RWX volume
    Issues

@innobead innobead added the regression/1.5.2 Regression in <version> label Nov 6, 2023
@roger-ryao

Verified on v1.5.x-head 20231113

Test steps:

#7045 (comment)

  1. Add the storage class: kubectl apply -f https://raw.githubusercontent.com/clemenko/k8s_yaml/master/longhorn_encryption.yml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-crypto-per-volume
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "80" # 2880 - 48 hours in minutes
  fromBackup: ""
  encrypted: "true"
  # per volume secret which utilizes the `pvc.name` and `pvc.namespace` template parameters
  csi.storage.k8s.io/provisioner-secret-name: ${pvc.name}
  csi.storage.k8s.io/provisioner-secret-namespace: ${pvc.namespace}
  csi.storage.k8s.io/node-publish-secret-name: ${pvc.name}
  csi.storage.k8s.io/node-publish-secret-namespace: ${pvc.namespace}
  csi.storage.k8s.io/node-stage-secret-name: ${pvc.name}
  csi.storage.k8s.io/node-stage-secret-namespace: ${pvc.namespace}
  2. Deploy an app that requires encryption: kubectl apply -f https://raw.githubusercontent.com/clemenko/fleet/main/flask/flask.yaml
# clemenko
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask
  labels:
    app: flask
spec:
  replicas: 8
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
  selector:
    matchLabels:
      app: flask
  template:
    metadata:
      labels:
        app: flask
    spec:
      containers:
      - name: flask
        securityContext:
          allowPrivilegeEscalation: false
        image: clemenko/flask_simple
        #command: [ "/bin/sh", "-c", "sleep 3003003240204242" ]
        ports:
        - containerPort: 5000
        imagePullPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  labels:
    app: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis
        args: ["--appendonly", "yes"]
        securityContext:
          allowPrivilegeEscalation: false
          seLinuxOptions:
            level: "s0:c123,c456"
        ports:
        - containerPort: 6379
        volumeMounts:
        - name: redis-data
          mountPath: /data
          subPath: 
      volumes:
      - name: redis-data
        persistentVolumeClaim:
          claimName: redis
---

apiVersion: v1
kind: Secret
metadata:
  name: redis
stringData:
  CRYPTO_KEY_VALUE: "flaskisthebestdemoapplication"
  CRYPTO_KEY_PROVIDER: "secret"

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: redis
  labels:
    app: redis
spec:
  storageClassName: "longhorn-crypto-per-volume"
  accessModes: 
    - ReadWriteMany
  resources:
    requests:
      storage: 500Mi
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: flask
    kubernetes.io/name: "flask"
  name: flask
spec:
  selector:
    app: flask
  ports:
  - name: flask
    protocol: TCP
    port: 5000
    targetPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: redis
    kubernetes.io/name: "redis"
  name: redis
spec:
  selector:
    app: redis
  ports:
  - name: redis
    protocol: TCP
    port: 6379
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flask
spec:
  rules:
  - host: flask.rfed.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: flask
            port:
              number: 5000
# ---
#apiVersion: ui.cattle.io/v1
#kind: NavLink
#metadata:
#  name: flask
#spec:
#  label: Flask
#  target: _blank
#  toService:
#    name: flask
#    namespace: flask
#    port: '5000'
#    scheme: http
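The StorageClass above uses the `${pvc.name}` and `${pvc.namespace}` templates for all six secret parameters, so each encrypted PVC needs a Secret with the same name in its own namespace. That is why the manifest ships a Secret named `redis` next to the `redis` PVC. The sketch below only mimics that substitution locally to make the mapping explicit. `resolve_secret_template` and `print_secret_refs` are hypothetical helpers; the real resolution is done by the CSI external-provisioner, not by this code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mimics how the CSI secret templates in the
# longhorn-crypto-per-volume StorageClass map onto a concrete PVC.
# The real substitution is performed by the CSI external-provisioner.
resolve_secret_template() {
  local tmpl=$1 pvc_name=$2 pvc_namespace=$3
  # Keep the placeholders in variables so they are matched literally.
  local name_placeholder='${pvc.name}'
  local ns_placeholder='${pvc.namespace}'
  local out
  out=${tmpl//"$name_placeholder"/"$pvc_name"}
  out=${out//"$ns_placeholder"/"$pvc_namespace"}
  printf '%s\n' "$out"
}

# Hypothetical helper: print the namespace/name of the Secret that each
# secret stage (provisioner, node-publish, node-stage) resolves to.
print_secret_refs() {
  local pvc_name=$1 pvc_namespace=$2 stage
  for stage in provisioner node-publish node-stage; do
    printf '%s: %s/%s\n' "$stage" \
      "$(resolve_secret_template '${pvc.namespace}' "$pvc_name" "$pvc_namespace")" \
      "$(resolve_secret_template '${pvc.name}' "$pvc_name" "$pvc_namespace")"
  done
}
```

For example, `print_secret_refs redis default` (assuming the manifest was applied to the `default` namespace) prints `provisioner: default/redis`, `node-publish: default/redis`, and `node-stage: default/redis`, i.e. all three stages read the single `redis` Secret.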

Result: Passed

  1. The deployments and services are running successfully.
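A minimal way to script this check, assuming the manifest was applied to the current kubectl namespace and using an arbitrary 300s timeout: wait for both Deployments to roll out, then confirm the encrypted RWX claim is `Bound`. `verify_encrypted_rwx_app` is a hypothetical name; `kubectl rollout status --timeout` and `-o jsonpath` are standard kubectl options.

```shell
#!/usr/bin/env bash
# Hypothetical post-deploy check for the flask/redis manifest.
# Assumptions: current kubectl namespace, default timeout of 300s.
verify_encrypted_rwx_app() {
  local timeout=${1:-300s} phase deploy

  # Both Deployments should finish rolling out.
  for deploy in redis flask; do
    kubectl rollout status "deployment/$deploy" --timeout="$timeout" || return 1
  done

  # The encrypted RWX claim must be bound for redis to have started.
  phase=$(kubectl get pvc redis -o jsonpath='{.status.phase}') || return 1
  if [ "$phase" != "Bound" ]; then
    echo "pvc/redis is $phase, expected Bound" >&2
    return 1
  fi
  echo "redis and flask are running; pvc/redis is Bound"
}
```

Checking the PVC phase separately from the rollout makes it easier to tell a volume attach/mount failure (the regression tracked here) apart from an unrelated image or scheduling problem.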

@khushboo-rancher
Contributor

@roger-ryao Could you please test a few more cases?

  1. Expansion of an encrypted volume.
  2. Attaching/detaching an encrypted volume, e.g., killing the IM pod or rebooting the attached node.
  3. Restoring encrypted volumes and crashing the IM pod while the restore is in progress.

Note: Leverage the automation scripts with encrypted volumes if possible.
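As a starting point for automating case 1, the sketch below grows `pvc/redis` with a standard `kubectl patch` merge patch and polls `.status.capacity.storage` until it matches. The function name, the 1Gi target, and the retry/interval defaults are assumptions for illustration. Whether an RWX volume can be expanded while attached depends on the Longhorn version, so check the Longhorn volume expansion documentation before relying on an online expansion.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of test case 1 (expansion of an encrypted volume).
# Assumptions: pvc/redis from the manifest, a 1Gi target size, and
# polling every 5s for up to 60 attempts.
expand_encrypted_pvc() {
  local pvc=${1:-redis} size=${2:-1Gi} tries=${3:-60} interval=${4:-5}
  local capacity="" i
  kubectl patch pvc "$pvc" --type=merge \
    -p "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"$size\"}}}}" || return 1

  # Wait until the reported capacity matches the requested size.
  # Note: this is a literal string comparison, so pass the size in the
  # same unit format Kubernetes reports (e.g. 1Gi).
  for ((i = 0; i < tries; i++)); do
    capacity=$(kubectl get pvc "$pvc" -o jsonpath='{.status.capacity.storage}')
    if [ "$capacity" = "$size" ]; then
      echo "pvc/$pvc expanded to $capacity"
      return 0
    fi
    sleep "$interval"
  done
  echo "pvc/$pvc still reports ${capacity:-unknown} (wanted $size)" >&2
  return 1
}
```

After a successful run, a follow-up check would be to confirm from inside the redis pod that the filesystem on the encrypted device also reports the new size.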

@roger-ryao roger-ryao reopened this Nov 14, 2023
@roger-ryao
Reopening the ticket to test a few more cases.

@derekbit
Member

@roger-ryao
Can you create e2e tests for your manual tests? We can implement them later. Thank you.

@roger-ryao

roger-ryao commented Nov 14, 2023

Verified on v1.5.3-rc1 20231114

Result: Passed

  • 1. Expansion of an encrypted volume.

  • 2. Attaching/detaching an encrypted volume, e.g., rebooting the attached node 5 times.

  • 3. Restoring encrypted volumes and crashing the IM pod while the restore is in progress.

    1. Restoring encrypted volumes and deleting all Instance Manager pods while the restore is in progress: on v1.5.x, deleting all Instance Manager pods with the following command during a restore leaves the restored volume's replicas in the "Failed" state, with robustness "faulted" and state "Detached".
      kubectl -n longhorn-system delete pods -l longhorn.io/component=instance-manager --wait
    2. Restoring encrypted volumes and deleting one Instance Manager pod: one of the restored volume's replicas transitions to the Failed state, with robustness Unknown and state Detached. Despite this, pods can still mount the restored volume. The failed replica is deleted automatically, a new replica is rebuilt, and the data remains consistent.
      kubectl get pod -n longhorn-system -l longhorn.io/component -o wide | grep instance-manager | grep -E 'w1' | awk '{print $1}' | xargs kubectl -n longhorn-system delete pod
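The single-node case can also be scripted without grepping the wide output: `grep -E 'w1'` matches any line containing `w1`, including pod names, whereas kubectl's `--field-selector spec.nodeName=...` filters on the node the pod is scheduled to. The sketch assumes `w1` is a node name in the tester's cluster; `delete_instance_managers_on_node` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete the instance-manager pod(s) running on one
# node, selecting by spec.nodeName instead of a substring match.
delete_instance_managers_on_node() {
  local node=$1 pods
  [ -n "$node" ] || { echo "usage: delete_instance_managers_on_node NODE" >&2; return 2; }
  pods=$(kubectl -n longhorn-system get pods \
    -l longhorn.io/component=instance-manager \
    --field-selector "spec.nodeName=$node" -o name) || return 1
  if [ -z "$pods" ]; then
    echo "no instance-manager pods on $node" >&2
    return 1
  fi
  # $pods holds "pod/<name>" entries; word splitting is intended here.
  kubectl -n longhorn-system delete $pods
}
```

Usage, assuming the node is named `w1`: `delete_instance_managers_on_node w1` while the restore is still in progress.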


Hi @derekbit, we can track the e2e tests in #7055.

@derekbit
Member


Thank you. Can you help add encrypted volume restore?

@roger-ryao

Hi @derekbit, I created ticket #7097 to track it.

5 participants