Ceph-CSI: Data is retained if PV is deleted #4651

Closed

ckotzbauer opened this issue Jan 10, 2020 · 13 comments

@ckotzbauer

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

  • The data in the Ceph storage is retained even if I delete the PV.

Expected behavior:

  • If I delete a PV which was provisioned by the Ceph-CSI driver, I expect the data to be deleted from the Ceph cluster.

How to reproduce it (minimal and precise):

  • Mount the Ceph cluster to a directory to watch the files stored by CSI.
  • Provision a PV with the Ceph-CSI driver.
  • Delete the PV.
  • The corresponding folder in the mounted directory is still there (see the sketch below).
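
A minimal sketch of these steps, assuming the rook-cephfs StorageClass below; the PVC name test-pvc is hypothetical:

# Provision a PVC; dynamic provisioning creates the PV behind it.
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc          # hypothetical name for illustration
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: rook-cephfs
  resources:
    requests:
      storage: 1Gi
EOF

# Find the bound PV, then delete the PVC and the PV.
PV=$(kubectl get pvc test-pvc -o jsonpath='{.spec.volumeName}')
kubectl delete pvc test-pvc
kubectl delete pv "$PV"
# The csi-vol-* directory is still visible in the mounted CephFS tree.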

File(s) to submit:

  • cluster.yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v14.2.5-20191210
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
    volumeClaimTemplate:
      spec:
        storageClassName: standard
        resources:
          requests:
            storage: 10Gi
  mgr:
    modules:
    - name: pg_autoscaler
      enabled: true
  dashboard:
    enabled: true
  monitoring:
    enabled: true
  priorityClassNames:
    all: infra-priority
  storage:
    topologyAware: true
    storageClassDeviceSets:
    - name: set1
      count: 1
      portable: false
      placement:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: storage-node
                operator: In
                values:
                - storage01
            topologyKey: kubernetes.io/hostname
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          resources:
            requests:
              storage: 100Gi
          storageClassName: standard
          volumeMode: Block
          accessModes:
            - ReadWriteOnce
    - name: set2
      count: 1
      portable: false
      placement:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: storage-node
                operator: In
                values:
                - storage02
            topologyKey: kubernetes.io/hostname
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          resources:
            requests:
              storage: 100Gi
          storageClassName: standard
          volumeMode: Block
          accessModes:
            - ReadWriteOnce
    - name: set3
      count: 1
      portable: false
      placement:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: storage-node
                operator: In
                values:
                - storage03
            topologyKey: kubernetes.io/hostname
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          resources:
            requests:
              storage: 100Gi
          storageClassName: standard
          volumeMode: Block
          accessModes:
            - ReadWriteOnce
---
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: ceph-fs
  namespace: rook-ceph
spec:
  metadataPool:
    failureDomain: host
    replicated:
      size: 3
  dataPools:
    - replicated:
        size: 3
      failureDomain: host
  preservePoolsOnDelete: true
  metadataServer:
    activeCount: 2
    activeStandby: true
    placement:
       podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - rook-ceph-mds
              # topologyKey: failure-domain.beta.kubernetes.io/zone can be used to spread MDS across different AZ
              # topologyKey: kubernetes.io/hostname will place MDS across different hosts
              topologyKey: kubernetes.io/hostname

  • rook-helm-chart.yaml
image:
  prefix: rook
  repository: rook/ceph
  tag: v1.2.1
  pullPolicy: Always

resources:
  limits:
    cpu: 500m
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 256Mi

rbacEnable: true
pspEnable: true

csi:
  enableRbdDriver: false
  enableCephfsDriver: true
  enableGrpcMetrics: false
  enableSnapshotter: false
  cephFSPluginUpdateStrategy: OnDelete
  forceCephFSKernelClient: true

enableFlexDriver: false
enableDiscoveryDaemon: true

  • storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: ceph-fs
  pool: ceph-fs-data0
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Retain

Environment:

  • OS (e.g. from /etc/os-release): Ubuntu 18.04 LTS
  • Kernel (e.g. uname -a): 4.15.0-1044-gke
  • Cloud provider or hardware configuration: GKE
  • Rook version (use rook version inside of a Rook Pod): 1.2.1
  • Storage backend version (e.g. for ceph do ceph -v): ceph/ceph:v14.2.5-20191210
  • Kubernetes version (use kubectl version): v1.14.8-gke.17
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): GKE
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTHY
@ckotzbauer ckotzbauer added the bug label Jan 10, 2020
@Madhu-1
Member

Madhu-1 commented Jan 10, 2020

@code-chris have you created and deleted the PV or the PVC?

@ckotzbauer
Author

Both created and both deleted

@Madhu-1
Member

Madhu-1 commented Jan 10, 2020

@code-chris did you create the PV or the PVC?

@ckotzbauer
Author

  • I created a Deployment which triggered dynamic provisioning of a PV and a corresponding PVC.
  • I deleted the Deployment, the PVC, and the PV, in this order.

I did not create the PV or PVC manually; the CSI driver did.

@Madhu-1
Member

Madhu-1 commented Jan 10, 2020

Have you deleted the PV manually? If yes, it's not an issue; if no, this is a CephFS issue.

@ckotzbauer
Author

ckotzbauer commented Jan 10, 2020

I deleted the PV manually, as the reclaimPolicy of the StorageClass is Retain.
Why is this not an issue? The cluster stores data which can never be accessed again...

@Madhu-1
Member

Madhu-1 commented Jan 10, 2020

The user should not delete the PV object (the provisioner has to delete the PV object after deleting the backend image); check the provisioner logs.
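
A sketch of that check, assuming the Rook-created deployment name csi-cephfsplugin-provisioner and the csi-provisioner sidecar container (names may differ between Rook versions):

# Inspect the external provisioner sidecar of the CephFS CSI provisioner.
kubectl -n rook-ceph logs deploy/csi-cephfsplugin-provisioner -c csi-provisioner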

@ckotzbauer
Author

Yes, there are logs which indicate that the provisioner tries to sync the folders with PVs. I will test that.
When would the provisioner delete the PV and the stored data? When the PVC is deleted and the PV changes to the "Released" state? That would only be correct for the "Delete" reclaimPolicy, not for "Retain"...

@Madhu-1
Member

Madhu-1 commented Jan 10, 2020

Yeah, sorry, I didn't notice the reclaim policy. In that case, even if you delete the PV and the PVC, the admin needs to manually clean up the backend storage. This is not a bug; it is working as expected, see https://kubernetes.io/docs/concepts/storage/persistent-volumes/#retain
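
For comparison, a sketch of a Delete-policy variant of the storageclass.yaml above; with reclaimPolicy: Delete the provisioner removes the backend subvolume once the PVC is deleted and the PV is released (the name rook-cephfs-delete is hypothetical):

kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs-delete   # hypothetical name for illustration
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: ceph-fs
  pool: ceph-fs-data0
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete        # the only change vs. the Retain class above
EOF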

@ckotzbauer
Author

Ah ok, it seems I missed that. Thanks.
I thought that the PV and the backend storage are always deleted at the same time, and that the policy only decides whether this happens automatically or manually.
Then this is not an issue.

@ehassan1312

I made the same mistake and deleted the PV manually.

How can I clean up Ceph?

@Madhu-1
Member

Madhu-1 commented Oct 19, 2022

@ehassan1312 try https://www.mrajanna.com/tracking-pv-rados-omap-in-cephcsi/; it shows a way to track down the mapping between an RBD image in the pool and the PV. If the PV is not present, delete the RBD image and the RADOS object.
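
A sketch of that manual cleanup from the Rook toolbox, assuming an RBD pool named replicapool and placeholder <uuid>/<pv-name> values; per the linked post, Ceph-CSI keeps one omap key per volume in the csi.volumes.default object plus a per-volume csi.volume.<uuid> object:

rbd ls replicapool                                     # list the csi-vol-* images
rbd rm replicapool/csi-vol-<uuid>                      # delete the orphaned image
rados -p replicapool listomapvals csi.volumes.default  # inspect the PV <-> volume uuid mapping
rados -p replicapool rmomapkey csi.volumes.default csi.volume.<pv-name>
rados -p replicapool rm csi.volume.<uuid>              # per-volume metadata object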

@reefland

reefland commented Mar 3, 2024

I've written a script that cross-references your existing Rook-Ceph PVs with Ceph RBD images and lists out images that are stale/orphaned and can be removed:

https://github.com/reefland/find-orphaned-rbd-images

Such as:

--[ RBD Image has no Persistent Volume (PV) ]----------------------------
NAME                                          PROVISIONED  USED
csi-vol-cbaa0262-461e-4fe8-a8bb-07f655bb423f       50 GiB  872 MiB
size 50 GiB in 12800 objects
snapshot_count: 0
create_timestamp: Tue Feb 27 15:42:34 2024
access_timestamp: Tue Feb 27 15:42:34 2024
modify_timestamp: Tue Feb 27 15:42:34 2024
-------------------------------------------------------------------------
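
The core idea of such a cross-reference fits in a few lines; a sketch, assuming kubectl plus rbd access, a pool named replicapool, and PVs that carry the image name in spec.csi.volumeAttributes.imageName (adapt names to your cluster):

# Image names referenced by existing PVs...
kubectl get pv -o jsonpath='{range .items[*]}{.spec.csi.volumeAttributes.imageName}{"\n"}{end}' | sort > pv-images.txt
# ...versus the images that actually exist in the pool.
rbd ls replicapool | sort > pool-images.txt
comm -13 pv-images.txt pool-images.txt   # images with no PV: candidates for cleanup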
