Upgrade V11.10 to V.12 - CephFS mounting issues #12843

Closed · voarsh2 opened this issue Sep 4, 2023 · 23 comments

voarsh2 commented Sep 4, 2023

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
Volume mount should work but does not

Expected behavior:
Mounting should work
How to reproduce it (minimal and precise):
RKE2 (Kubernetes v1.24) on Ubuntu 22.04 VMs; upgrade Rook from v1.11.10 to v1.12

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary

Logs to submit:
Pod logs after upgrading and trying to mount volumes:

(combined from similar events): MountVolume.MountDevice failed for volume "pvc-fbef040a-92ec-4cf7-950b-9371a3526916" : 
rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: 
[-t ceph 10.43.90.103:6789,10.43.82.101:6789,10.43.210.12:6789:/volumes/csi/csi-vol-a5fd4116-242a-4b73-a9e3-0d34ca593c80/5554c604-03b5-4a70-b528-d98266f541bf 
/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com/e41d352f102de3bd0dd3aed109e40cd37cf57d4066394d540f142338c4776a89/
globalmount -o name=csi-cephfs-node,secretfile=/tmp/csi/keys/keyfile-105079948,mds_namespace=ceph-filesystem,discard,_netdev] 
stderr: unable to get monitor info from DNS SRV with service name: ceph-mon 
2023-09-01T15:53:28.990+0000 7f42103e80c0 -1 failed for service _ceph-mon._tcp
mount error 22 = Invalid argument
  • Operator's logs, if necessary:
  • Crashing pod(s) logs, if necessary:

Cluster Status to submit:

  • Output of krew commands, if necessary

    To get the health of the cluster, use kubectl rook-ceph health
    To get the status of the cluster, use kubectl rook-ceph ceph status
    For more details, see the Rook Krew Plugin

Environment:

  • OS (e.g. from /etc/os-release): Ubuntu 22.04
  • Kernel (e.g. uname -a):
  • Linux worker2 6.2.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Jul 13 16:27:29 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod): v1.12
  • Storage backend version (e.g. for ceph do ceph -v): 17.2.6
  • Kubernetes version (use kubectl version): v1.24
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): RKE2
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
voarsh2 added the bug label Sep 4, 2023

Madhu-1 commented Sep 4, 2023

@voarsh2 what is the cephcsi version in the cluster?


voarsh2 commented Sep 4, 2023

@voarsh2 what is the cephcsi version in the cluster?

I did not customise the cephcsi images on v1.11.10, nor when trying to upgrade to v1.12.

csi-provisioner:v3.4.0

cephcsi:v3.8.0
(On v11.10, right now)

On the Rook Ceph Cluster Helm chart, I did add mount options for RBD and CephFS (the discard mount option). I am not sure if this might be causing issues with the upgrade?
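
For reference, a quick way to check whether mountOptions actually ended up in the deployed chart values (a sketch only; it assumes the cluster chart release is named rook-ceph-cluster in the rook-ceph namespace):

# show the values currently applied to the release and look for mountOptions
helm get values rook-ceph-cluster -n rook-ceph | grep -B2 -A3 mountOptions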


Madhu-1 commented Sep 5, 2023

@voarsh2 see the breaking change here for cephfs PVC https://github.com/rook/rook/blob/release-1.12/Documentation/Upgrade/rook-upgrade.md#breaking-changes-in-v112

psavva commented Sep 11, 2023

I'm seeing the exact same issue. It existed in version 1.11, and I've upgraded to 1.12 to see if it was fixed, but it's still there.

  Warning  FailedMount             13s   kubelet                  MountVolume.MountDevice failed for volume "pvc-fcbd7c07-c7b3-48c2-af6b-1dafe7c86d44" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph 10.43.222.172:3300,10.43.216.60:3300,10.43.232.125:3300:/volumes/csi/csi-vol-7a2fc08d-5596-4986-a946-9804dc248dca/ffeaf279-7157-45ef-8d04-df971b85a2f9 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com/733472829f2976f36a5125e295e87c98bf696c00cddaed3b542ee6b74ca7ed14/globalmount -o name=csi-cephfs-node,secretfile=/tmp/csi/keys/keyfile-1319642594,mds_namespace=myfs,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon
2023-09-11T11:28:54.789+0000 7fc47cdc70c0 -1 failed for service _ceph-mon._tcp
mount error: no mds server is up or the cluster is laggy

CephFS:

apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"ceph.rook.io/v1","kind":"CephFilesystem","metadata":{"annotations":{},"name":"myfs","namespace":"rook-ceph"},"spec":{"dataPools":[{"failureDomain":"host","name":"replicated","parameters":{"compression_mode":"none"},"replicated":{"requireSafeReplicaSize":true,"size":3}}],"metadataPool":{"parameters":{"compression_mode":"none"},"replicated":{"requireSafeReplicaSize":true,"size":3}},"metadataServer":{"activeCount":1,"activeStandby":true,"livenessProbe":{"disabled":false},"placement":{"podAntiAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"app","operator":"In","values":["rook-ceph-mds"]}]},"topologyKey":"topology.kubernetes.io/zone"},"weight":100}],"requiredDuringSchedulingIgnoredDuringExecution":[{"labelSelector":{"matchExpressions":[{"key":"app","operator":"In","values":["rook-ceph-mds"]}]},"topologyKey":"kubernetes.io/hostname"}]}},"priorityClassName":"system-cluster-critical","startupProbe":{"disabled":false}},"preserveFilesystemOnDelete":true}}
  creationTimestamp: "2023-09-11T10:34:12Z"
  finalizers:
  - cephfilesystem.ceph.rook.io
  generation: 2
  name: myfs
  namespace: rook-ceph
  resourceVersion: "44132234"
  uid: 868dbb92-6598-4d49-8cad-80962f0d132d
spec:
  dataPools:
  - erasureCoded:
      codingChunks: 0
      dataChunks: 0
    failureDomain: host
    mirroring: {}
    name: replicated
    parameters:
      compression_mode: none
    quotas: {}
    replicated:
      requireSafeReplicaSize: true
      size: 3
    statusCheck:
      mirror: {}
  metadataPool:
    erasureCoded:
      codingChunks: 0
      dataChunks: 0
    mirroring: {}
    parameters:
      compression_mode: none
    quotas: {}
    replicated:
      requireSafeReplicaSize: true
      size: 3
    statusCheck:
      mirror: {}
  metadataServer:
    activeCount: 1
    activeStandby: true
    livenessProbe: {}
    placement:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - rook-ceph-mds
            topologyKey: topology.kubernetes.io/zone
          weight: 100
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - rook-ceph-mds
          topologyKey: kubernetes.io/hostname
    priorityClassName: system-cluster-critical
    resources: {}
    startupProbe: {}
  preserveFilesystemOnDelete: true
  statusCheck:
    mirror: {}
status:
  observedGeneration: 2
  phase: Ready

StorageClass:

kubectl get sc -o yaml rook-cephfs
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"allowVolumeExpansion":true,"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"name":"rook-cephfs"},"mountOptions":null,"parameters":{"clusterID":"rook-ceph","csi.storage.k8s.io/controller-expand-secret-name":"rook-csi-cephfs-provisioner","csi.storage.k8s.io/controller-expand-secret-namespace":"rook-ceph","csi.storage.k8s.io/node-stage-secret-name":"rook-csi-cephfs-node","csi.storage.k8s.io/node-stage-secret-namespace":"rook-ceph","csi.storage.k8s.io/provisioner-secret-name":"rook-csi-cephfs-provisioner","csi.storage.k8s.io/provisioner-secret-namespace":"rook-ceph","fsName":"myfs","pool":"myfs-replicated"},"provisioner":"rook-ceph.cephfs.csi.ceph.com","reclaimPolicy":"Delete"}
  creationTimestamp: "2023-09-11T10:35:00Z"
  name: rook-cephfs
  resourceVersion: "44124794"
  uid: 2f1d5b42-de86-4883-a385-97deb81616f2
parameters:
  clusterID: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  fsName: myfs
  pool: myfs-replicated
provisioner: rook-ceph.cephfs.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

ceph status:

ceph status
  cluster:
    id:     fea9fd9e-843b-4637-beb7-bf5a6295c201
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 6d)
    mgr: b(active, since 49s), standbys: a
    mds: 2/2 daemons up, 2 hot standby
    osd: 3 osds: 3 up (since 25m), 3 in (since 4d)

  data:
    volumes: 2/2 healthy
    pools:   6 pools, 145 pgs
    objects: 13.44k objects, 50 GiB
    usage:   98 GiB used, 122 GiB / 220 GiB avail
    pgs:     145 active+clean

  io:
    client:   3.5 KiB/s rd, 5 op/s rd, 0 op/s wr

bash-4.4$ ceph status

  cluster:
    id:     fea9fd9e-843b-4637-beb7-bf5a6295c201
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 6d)
    mgr: b(active, since 10m), standbys: a
    mds: 2/2 daemons up, 2 hot standby
    osd: 3 osds: 3 up (since 40m), 3 in (since 4d)

  data:
    volumes: 2/2 healthy
    pools:   6 pools, 145 pgs
    objects: 13.44k objects, 50 GiB
    usage:   98 GiB used, 122 GiB / 220 GiB avail
    pgs:     145 active+clean

  io:
    client:   2.2 KiB/s rd, 170 B/s wr, 4 op/s rd, 0 op/s wr

Versions:

rook-ceph-crashcollector-autoscaled-nbg1-cpx31-610e0274160b819b         req/upd/avl: 1/1/1      rook-version=v1.12.3
rook-ceph-crashcollector-dev3ahealthcloud-cpx21-master1         req/upd/avl: 1/1/1      rook-version=v1.12.3
rook-ceph-crashcollector-dev3ahealthcloud-cpx21-master2         req/upd/avl: 1/1/1      rook-version=v1.12.3
rook-ceph-crashcollector-dev3ahealthcloud-cpx21-master3         req/upd/avl: 1/1/1      rook-version=v1.12.3
rook-ceph-mds-cephfs-a          req/upd/avl: 1/1/1      rook-version=v1.12.3
rook-ceph-mds-cephfs-b          req/upd/avl: 1/1/1      rook-version=v1.12.3
rook-ceph-mds-myfs-a    req/upd/avl: 1/1/1      rook-version=v1.12.3
rook-ceph-mds-myfs-b    req/upd/avl: 1/1/1      rook-version=v1.12.3
rook-ceph-mgr-a         req/upd/avl: 1/1/1      rook-version=v1.12.3
rook-ceph-mgr-b         req/upd/avl: 1/1/1      rook-version=v1.12.3
rook-ceph-mon-a         req/upd/avl: 1/1/1      rook-version=v1.12.3
rook-ceph-mon-b         req/upd/avl: 1/1/1      rook-version=v1.12.3
rook-ceph-mon-c         req/upd/avl: 1/1/1      rook-version=v1.12.3
rook-ceph-osd-0         req/upd/avl: 1/1/1      rook-version=v1.12.3
rook-ceph-osd-3         req/upd/avl: 1/1/1      rook-version=v1.12.3
rook-ceph-osd-4         req/upd/avl: 1/1/1      rook-version=v1.12.3

myfs status:

RANK      STATE        MDS       ACTIVITY     DNS    INOS   DIRS   CAPS
 0        active      myfs-a  Reqs:    0 /s    28     22     19      6
0-s   standby-replay  myfs-b  Evts:    0 /s    19     13     10      0
      POOL         TYPE     USED  AVAIL
 myfs-metadata   metadata   605k  35.9G
myfs-replicated    data    12.0k  35.9G
MDS version: ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
bash-4.4$

@Madhu-1
Copy link
Member

Madhu-1 commented Sep 12, 2023

Warning FailedMount 13s kubelet MountVolume.MountDevice failed for volume "pvc-fcbd7c07-c7b3-48c2-af6b-1dafe7c86d44" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph 10.43.222.172:3300,10.43.216.60:3300,10.43.232.125:3300:/volumes/csi/csi-vol-7a2fc08d-5596-4986-a946-9804dc248dca/ffeaf279-7157-45ef-8d04-df971b85a2f9 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com/733472829f2976f36a5125e295e87c98bf696c00cddaed3b542ee6b74ca7ed14/globalmount -o name=csi-cephfs-node,secretfile=/tmp/csi/keys/keyfile-1319642594,mds_namespace=myfs,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon
2023-09-11T11:28:54.789+0000 7fc47cdc70c0 -1 failed for service _ceph-mon._tcp
mount error: no mds server is up or the cluster is laggy

@psavva you don't have the exact error (invalid argument) here; it looks to be some other problem. Please check https://rook.io/docs/rook/latest/Troubleshooting/ceph-csi-common-issues/, which may help you.
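
For example, one quick thing to verify (my own sketch, not necessarily from that guide; the addresses are the mon endpoints from the error above) is whether the node can reach the mons at all:

# run from the affected node; -z just tests the TCP connection, -w5 is a 5s timeout
nc -zvw5 10.43.222.172 3300
nc -zvw5 10.43.216.60 3300
nc -zvw5 10.43.232.125 3300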


Madhu-1 commented Sep 12, 2023

@psavva what is the kernel version on the node?


psavva commented Sep 12, 2023

Hi @Madhu-1

Thank you very much for getting back to me.
The kernel version is:

$ uname -a
Linux  5.15.0-79-generic #86-Ubuntu SMP Mon Jul 10 16:07:21 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux


Madhu-1 commented Sep 12, 2023

@psavva can you run the ceph mount command manually in the cephfsplugin container, like mount -t ceph 10.43.222.172:3300,10.43.216.60:3300,10.43.232.125:3300:/volumes/csi/csi-vol-7a2fc08d-5596-4986-a946-9804dc248dca/ffeaf279-7157-45ef-8d04-df971b85a2f9 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com/733472829f2976f36a5125e295e87c98bf696c00cddaed3b542ee6b74ca7ed14/globalmount -o name=csi-cephfs-node,secretfile=/tmp/csi/keys/keyfile-1319642594,mds_namespace=myfs,_netdev (replace the keyfile and mount point) and see if it works?
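
For example, something along these lines (a sketch only; the pod name is a placeholder, the app label and container name are the usual Rook defaults, and the monitor addresses, subvolume path and keyfile are copied from the error above, so they need to be replaced with values that are still valid when you run it):

# find the csi-cephfsplugin pod running on the node where the mount failed
kubectl -n rook-ceph get pods -l app=csi-cephfsplugin -o wide

# exec into it and retry the exact mount the CSI driver attempted
kubectl -n rook-ceph exec -it csi-cephfsplugin-xxxxx -c csi-cephfsplugin -- mkdir -p /mnt/test
kubectl -n rook-ceph exec -it csi-cephfsplugin-xxxxx -c csi-cephfsplugin -- \
  mount -t ceph 10.43.222.172:3300,10.43.216.60:3300,10.43.232.125:3300:/volumes/csi/csi-vol-7a2fc08d-5596-4986-a946-9804dc248dca/ffeaf279-7157-45ef-8d04-df971b85a2f9 \
  /mnt/test -o name=csi-cephfs-node,secretfile=/tmp/csi/keys/keyfile-1319642594,mds_namespace=myfs,_netdev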


psavva commented Sep 12, 2023

Hi @Madhu-1

I've deployed the direct-mount pod and did the following:

I created a CephFS volume using the rook-cephfs StorageClass:

NAME                              PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
hcloud-volumes (default)          csi.hetzner.cloud               Delete          WaitForFirstConsumer   true                   95d
hcloud-volumes-retain (default)   csi.hetzner.cloud               Retain          WaitForFirstConsumer   true                   95d
rook-ceph-block                   rook-ceph.rbd.csi.ceph.com      Retain          Immediate              true                   69d
rook-cephfs                       rook-ceph.cephfs.csi.ceph.com   Delete          Immediate              true                   25h
rook-cephfs-retain                rook-ceph.cephfs.csi.ceph.com   Retain          Immediate              true                   26h

Definition of the StorageClass rook-cephfs

kubectl describe sc rook-cephfs
Name:            rook-cephfs
IsDefaultClass:  No
Annotations:     kubectl.kubernetes.io/last-applied-configuration={"allowVolumeExpansion":true,"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"name":"rook-cephfs"},"mountOptions":null,"parameters":{"clusterID":"rook-ceph","csi.storage.k8s.io/controller-expand-secret-name":"rook-csi-cephfs-provisioner","csi.storage.k8s.io/controller-expand-secret-namespace":"rook-ceph","csi.storage.k8s.io/node-stage-secret-name":"rook-csi-cephfs-node","csi.storage.k8s.io/node-stage-secret-namespace":"rook-ceph","csi.storage.k8s.io/provisioner-secret-name":"rook-csi-cephfs-provisioner","csi.storage.k8s.io/provisioner-secret-namespace":"rook-ceph","fsName":"myfs","pool":"myfs-replicated"},"provisioner":"rook-ceph.cephfs.csi.ceph.com","reclaimPolicy":"Delete"}

Provisioner:           rook-ceph.cephfs.csi.ceph.com
Parameters:            clusterID=rook-ceph,csi.storage.k8s.io/controller-expand-secret-name=rook-csi-cephfs-provisioner,csi.storage.k8s.io/controller-expand-secret-namespace=rook-ceph,csi.storage.k8s.io/node-stage-secret-name=rook-csi-cephfs-node,csi.storage.k8s.io/node-stage-secret-namespace=rook-ceph,csi.storage.k8s.io/provisioner-secret-name=rook-csi-cephfs-provisioner,csi.storage.k8s.io/provisioner-secret-namespace=rook-ceph,fsName=myfs,pool=myfs-replicated
AllowVolumeExpansion:  True
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>

I've created a PVC, with the resultant PV created:

kubectl describe pv pvc-39a4627a-386e-4dfb-b013-9e577de5a2a3
Name:            pvc-39a4627a-386e-4dfb-b013-9e577de5a2a3
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: rook-ceph.cephfs.csi.ceph.com
                 volume.kubernetes.io/provisioner-deletion-secret-name: rook-csi-cephfs-provisioner
                 volume.kubernetes.io/provisioner-deletion-secret-namespace: rook-ceph
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    rook-cephfs
Status:          Bound
Claim:           default/cephfs-pvc
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        1Gi
Node Affinity:   <none>
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            rook-ceph.cephfs.csi.ceph.com
    FSType:
    VolumeHandle:      0001-0009-rook-ceph-0000000000000002-0ec37091-9359-4085-86e8-87fa86d961e3
    ReadOnly:          false
    VolumeAttributes:      clusterID=rook-ceph
                           fsName=myfs
                           pool=myfs-replicated
                           storage.kubernetes.io/csiProvisionerIdentity=1694432157002-8278-rook-ceph.cephfs.csi.ceph.com
                           subvolumeName=csi-vol-0ec37091-9359-4085-86e8-87fa86d961e3
                           subvolumePath=/volumes/csi/csi-vol-0ec37091-9359-4085-86e8-87fa86d961e3/150a7371-55d7-43fe-a214-8dbf8637f580
Events:                <none>

Results of creating the PVC and the CSI creating the PV:

kubectl get events
LAST SEEN   TYPE     REASON                  OBJECT                             MESSAGE
6m35s       Normal   Provisioning            persistentvolumeclaim/cephfs-pvc   External provisioner is provisioning volume for claim "default/cephfs-pvc"
6m35s       Normal   ExternalProvisioning    persistentvolumeclaim/cephfs-pvc   waiting for a volume to be created, either by external provisioner "rook-ceph.cephfs.csi.ceph.com" or manually created by system administrator

I've tried mounting the volume as follows, using the direct-mount pod.

mon_endpoints=$(grep mon_host /etc/ceph/ceph.conf | awk '{print $3}')
my_secret=$(grep key /etc/ceph/keyring | awk '{print $3}')
mkdir /tmp/rookmount
mount -t ceph $mon_endpoints:/volumes/csi/csi-vol-0ec37091-9359-4085-86e8-87fa86d961e3/150a7371-55d7-43fe-a214-8dbf8637f580 /tmp/rookmount/ -o name=csi-cephfs-node,secret=$my_secret,mds_namespace=myfs,_netdev

and the result is:

mount error: no mds server is up or the cluster is laggy
mount error: no mds server is up or the cluster is laggy

and finally, these are the running pods.

Please take note of the rook-ceph-mds-cephfs-a and b pods running:

kubectl get pods -n rook-ceph
NAME                                                              READY   STATUS      RESTARTS        AGE
csi-cephfsplugin-cknl7                                            2/2     Running     0               23h
csi-cephfsplugin-ctfvh                                            2/2     Running     0               24h
csi-cephfsplugin-kkhrr                                            2/2     Running     0               24h
csi-cephfsplugin-provisioner-668dfcf95b-2dwll                     5/5     Running     0               23h
csi-cephfsplugin-provisioner-668dfcf95b-q5fhs                     5/5     Running     0               24h
csi-cephfsplugin-z7rl9                                            2/2     Running     0               24h
csi-rbdplugin-6dz64                                               2/2     Running     0               24h
csi-rbdplugin-79zdx                                               2/2     Running     0               24h
csi-rbdplugin-provisioner-5b78f67bbb-mzqqf                        5/5     Running     0               24h
csi-rbdplugin-provisioner-5b78f67bbb-nlxcx                        5/5     Running     0               24h
csi-rbdplugin-pxhqx                                               2/2     Running     0               23h
csi-rbdplugin-tvlls                                               2/2     Running     0               24h
rook-ceph-crashcollector-autoscaled-nbg1-cpx31-7cb1f4a37657k69b   1/1     Running     0               23h
rook-ceph-crashcollector-dev3ahealthcloud-cpx21-master1-5cwgfbv   1/1     Running     0               26h
rook-ceph-crashcollector-dev3ahealthcloud-cpx21-master2-9bxxjcm   1/1     Running     0               26h
rook-ceph-crashcollector-dev3ahealthcloud-cpx21-master3-d6tbhn4   1/1     Running     0               25h
rook-ceph-mds-cephfs-a-64f5d7f945-s2hjp                           2/2     Running     0               26h
rook-ceph-mds-cephfs-b-57b6698754-kkqsl                           2/2     Running     0               26h
rook-ceph-mds-myfs-a-848c648f75-zntjn                             2/2     Running     0               23h
rook-ceph-mds-myfs-b-b45868db4-tttkq                              2/2     Running     0               24h
rook-ceph-mgr-a-78f5f77c4b-8gdcl                                  3/3     Running     0               24h
rook-ceph-mgr-b-5d7fdd94cc-j92zl                                  3/3     Running     0               24h
rook-ceph-mon-a-6f44c979c5-wsjgp                                  2/2     Running     8 (7d2h ago)    14d
rook-ceph-mon-b-bd57597f-m2w7f                                    2/2     Running     8 (7d2h ago)    14d
rook-ceph-mon-c-6c6b9b8d5-nr288                                   2/2     Running     10 (7d2h ago)   14d
rook-ceph-operator-99d76446-ks7t7                                 1/1     Running     0               24h
rook-ceph-osd-0-556cc5d4cd-b5p6b                                  2/2     Running     0               24h
rook-ceph-osd-3-5769486d67-64mq6                                  2/2     Running     0               24h
rook-ceph-osd-4-f99786c79-g4ml2                                   2/2     Running     0               24h
rook-ceph-osd-prepare-dev3ahealthcloud-cpx21-master1-rvhh5        0/1     Completed   0               3h48m
rook-ceph-osd-prepare-dev3ahealthcloud-cpx21-master2-fphlt        0/1     Completed   0               3h48m
rook-ceph-osd-prepare-dev3ahealthcloud-cpx21-master3-wvf82        0/1     Completed   0               3h48m
rook-ceph-tools-7cd4cd9c9c-8q55s                                  1/1     Running     0               38m
rook-direct-mount-594d8479fc-dnn2f                                1/1     Running     0               38m


psavva commented Sep 12, 2023

Results of ceph status

kubectl rook-ceph ceph status
Info: running 'ceph' command with args: [status]
  cluster:
    id:     fea9fd9e-843b-4637-beb7-bf5a6295c201
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 7d)
    mgr: b(active, since 3h), standbys: a
    mds: 2/2 daemons up, 2 hot standby
    osd: 3 osds: 3 up (since 24h), 3 in (since 5d)

  data:
    volumes: 2/2 healthy
    pools:   6 pools, 145 pgs
    objects: 13.50k objects, 50 GiB
    usage:   98 GiB used, 122 GiB / 220 GiB avail
    pgs:     145 active+clean

  io:
    client:   2.5 KiB/s rd, 4 op/s rd, 0 op/s wr


Madhu-1 commented Sep 12, 2023

mds: 2/2 daemons up, 2 hot standby

it says you have both mds in standby mode. tagging @travisn for help


Madhu-1 commented Sep 12, 2023

rook-ceph-mds-cephfs-a-64f5d7f945-s2hjp 2/2 Running 0 26h
rook-ceph-mds-cephfs-b-57b6698754-kkqsl

Are these 2 mds getting used by anyone?


psavva commented Sep 12, 2023

How can I check?

Here are the logs:

root@dev3ahealthcloud-cpx21-master1:/home/deployment/infrastructure/rook/deploy/examples/csi/cephfs# kubectl logs rook-ceph-mds-cephfs-a-64f5d7f945-s2hjp -n rook-ceph
Defaulted container "mds" out of: mds, log-collector, chown-container-data-dir (init)
debug 2023-09-11T09:27:34.676+0000 7f5d6566da80  0 set uid:gid to 167:167 (ceph:ceph)
debug 2023-09-11T09:27:34.676+0000 7f5d6566da80  0 ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable), process ceph-mds, pid 13
debug 2023-09-11T09:27:34.676+0000 7f5d6566da80  1 main not setting numa affinity
debug 2023-09-11T09:27:34.676+0000 7f5d6566da80  0 pidfile_write: ignore empty --pid-file
starting mds.cephfs-a at
debug 2023-09-11T09:27:34.680+0000 7f5d5b8f5700  1 mds.cephfs-a Updating MDS map to version 149 from mon.1
debug 2023-09-11T09:27:35.607+0000 7f5d5b8f5700  1 mds.cephfs-a Updating MDS map to version 150 from mon.1
debug 2023-09-11T09:27:35.607+0000 7f5d5b8f5700  1 mds.cephfs-a Monitors have assigned me to become a standby.
debug 2023-09-11T09:27:35.623+0000 7f5d5b8f5700  1 mds.cephfs-a Updating MDS map to version 151 from mon.1
debug 2023-09-11T09:27:35.627+0000 7f5d5b8f5700  1 mds.0.151 handle_mds_map i am now mds.0.151
debug 2023-09-11T09:27:35.627+0000 7f5d5b8f5700  1 mds.0.151 handle_mds_map state change up:standby --> up:replay
debug 2023-09-11T09:27:35.627+0000 7f5d5b8f5700  1 mds.0.151 replay_start
debug 2023-09-11T09:27:35.635+0000 7f5d550e8700  0 mds.0.cache creating system inode with ino:0x100
debug 2023-09-11T09:27:35.635+0000 7f5d550e8700  0 mds.0.cache creating system inode with ino:0x1
debug 2023-09-11T09:27:35.651+0000 7f5d540e6700  1 mds.0.151 Finished replaying journal
debug 2023-09-11T09:27:35.655+0000 7f5d540e6700  1 mds.0.151 making mds journal writeable
debug 2023-09-11T09:27:36.623+0000 7f5d5b8f5700  1 mds.cephfs-a Updating MDS map to version 152 from mon.1
debug 2023-09-11T09:27:36.623+0000 7f5d5b8f5700  1 mds.0.151 handle_mds_map i am now mds.0.151
debug 2023-09-11T09:27:36.623+0000 7f5d5b8f5700  1 mds.0.151 handle_mds_map state change up:replay --> up:reconnect
debug 2023-09-11T09:27:36.623+0000 7f5d5b8f5700  1 mds.0.151 reconnect_start
debug 2023-09-11T09:27:36.623+0000 7f5d5b8f5700  1 mds.0.151 reopen_log
debug 2023-09-11T09:27:36.623+0000 7f5d5b8f5700  1 mds.0.151 reconnect_done
debug 2023-09-11T09:27:37.631+0000 7f5d5b8f5700  1 mds.cephfs-a Updating MDS map to version 153 from mon.1
debug 2023-09-11T09:27:37.631+0000 7f5d5b8f5700  1 mds.0.151 handle_mds_map i am now mds.0.151
debug 2023-09-11T09:27:37.631+0000 7f5d5b8f5700  1 mds.0.151 handle_mds_map state change up:reconnect --> up:rejoin
debug 2023-09-11T09:27:37.631+0000 7f5d5b8f5700  1 mds.0.151 rejoin_start
debug 2023-09-11T09:27:37.631+0000 7f5d5b8f5700  1 mds.0.151 rejoin_joint_start
debug 2023-09-11T09:27:37.631+0000 7f5d5b8f5700  1 mds.0.151 rejoin_done
debug 2023-09-11T09:27:38.635+0000 7f5d5b8f5700  1 mds.cephfs-a Updating MDS map to version 154 from mon.1
debug 2023-09-11T09:27:38.635+0000 7f5d5b8f5700  1 mds.0.151 handle_mds_map i am now mds.0.151
debug 2023-09-11T09:27:38.635+0000 7f5d5b8f5700  1 mds.0.151 handle_mds_map state change up:rejoin --> up:active
debug 2023-09-11T09:27:38.635+0000 7f5d5b8f5700  1 mds.0.151 recovery_done -- successful recovery!
debug 2023-09-11T09:27:38.635+0000 7f5d5b8f5700  1 mds.0.151 active_start
debug 2023-09-11T09:27:38.635+0000 7f5d5b8f5700  1 mds.0.151 cluster recovered.
debug 2023-09-11T09:27:40.667+0000 7f5d5b8f5700  1 mds.cephfs-a Updating MDS map to version 156 from mon.1
debug 2023-09-11T09:27:41.623+0000 7f5d590f0700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
debug 2023-09-11T09:27:53.947+0000 7f5d5d8f9700  1 mds.cephfs-a asok_command: status {prefix=status} (starting...)
.
.
.
.
debug 2023-09-12T11:53:53.944+0000 7f5d5d8f9700  1 mds.cephfs-a asok_command: status {prefix=status} (starting...)
debug 2023-09-12T11:54:04.004+0000 7f5d5d8f9700  1 mds.cephfs-a asok_command: status {prefix=status} (starting...)
debug 2023-09-12T11:54:13.947+0000 7f5d5d8f9700  1 mds.cephfs-a asok_command: status {prefix=status} (starting...)


and

kubectl logs rook-ceph-mds-cephfs-b-57b6698754-kkqsl -n rook-ceph
Defaulted container "mds" out of: mds, log-collector, chown-container-data-dir (init)
starting mds.cephfs-b at
debug 2023-09-11T09:27:39.978+0000 7f2b632b2a80  0 set uid:gid to 167:167 (ceph:ceph)
debug 2023-09-11T09:27:39.978+0000 7f2b632b2a80  0 ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable), process ceph-mds, pid 13
debug 2023-09-11T09:27:39.978+0000 7f2b632b2a80  1 main not setting numa affinity
debug 2023-09-11T09:27:39.978+0000 7f2b632b2a80  0 pidfile_write: ignore empty --pid-file
debug 2023-09-11T09:27:39.986+0000 7f2b5953a700  1 mds.cephfs-b Updating MDS map to version 154 from mon.0
debug 2023-09-11T09:27:40.662+0000 7f2b5953a700  1 mds.cephfs-b Updating MDS map to version 155 from mon.0
debug 2023-09-11T09:27:40.662+0000 7f2b5953a700  1 mds.cephfs-b Monitors have assigned me to become a standby.
debug 2023-09-11T09:27:40.674+0000 7f2b5953a700  1 mds.cephfs-b Updating MDS map to version 156 from mon.0
debug 2023-09-11T09:27:40.674+0000 7f2b5953a700  1 mds.0.0 handle_mds_map i am now mds.6033387.0 replaying mds.0.0
debug 2023-09-11T09:27:40.674+0000 7f2b5953a700  1 mds.0.0 handle_mds_map state change up:standby --> up:standby-replay
debug 2023-09-11T09:27:40.674+0000 7f2b5953a700  1 mds.0.0 replay_start
debug 2023-09-11T09:27:40.682+0000 7f2b52d2d700  0 mds.0.cache creating system inode with ino:0x100
debug 2023-09-11T09:27:40.686+0000 7f2b52d2d700  0 mds.0.cache creating system inode with ino:0x1
debug 2023-09-11T09:27:59.014+0000 7f2b5b53e700  1 mds.cephfs-b asok_command: status {prefix=status} (starting...)
debug 2023-09-11T09:28:08.925+0000 7f2b5b53e700  1 mds.cephfs-b asok_command: status {prefix=status} (starting...)
.
.
.
debug 2023-09-12T11:57:39.092+0000 7f2b5b53e700  1 mds.cephfs-b asok_command: status {prefix=status} (starting...)
debug 2023-09-12T11:57:48.979+0000 7f2b5b53e700  1 mds.cephfs-b asok_command: status {prefix=status} (starting...)
debug 2023-09-12T11:57:59.075+0000 7f2b5b53e700  1 mds.cephfs-b asok_command: status {prefix=status} (starting...)


travisn commented Sep 12, 2023

rook-ceph-mds-cephfs-a-64f5d7f945-s2hjp 2/2 Running 0 26h
rook-ceph-mds-cephfs-b-57b6698754-kkqsl

Are these 2 mds getting used by anyone?

The MDS status looks valid. There are two filesystems, each with one active and one standby, for a total of 4 mds pods. So the ceph status at least looks perfectly healthy.


psavva commented Sep 12, 2023

Any ideas why it doesn't mount?
Bug?


travisn commented Sep 12, 2023

Could anything in your cluster have changed outside of rook? Network? Kernel? Mounting issues are usually some environmental issue like that, but @Madhu-1 can speak more to those issues.


psavva commented Sep 12, 2023

This is a new cluster setup on Hetzner.
Ceph block works perfectly; the issue is just with CephFS.

It's literally a few weeks old.

I think there is a bug here, as I'm not the only one facing this.


psavva commented Sep 12, 2023

Warning FailedMount 13s kubelet MountVolume.MountDevice failed for volume "pvc-fcbd7c07-c7b3-48c2-af6b-1dafe7c86d44" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph 10.43.222.172:3300,10.43.216.60:3300,10.43.232.125:3300:/volumes/csi/csi-vol-7a2fc08d-5596-4986-a946-9804dc248dca/ffeaf279-7157-45ef-8d04-df971b85a2f9 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com/733472829f2976f36a5125e295e87c98bf696c00cddaed3b542ee6b74ca7ed14/globalmount -o name=csi-cephfs-node,secretfile=/tmp/csi/keys/keyfile-1319642594,mds_namespace=myfs,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon
2023-09-11T11:28:54.789+0000 7fc47cdc70c0 -1 failed for service _ceph-mon._tcp
mount error: no mds server is up or the cluster is laggy

@psavva you don't have the exact error (invalid argument) here; it looks to be some other problem. Please check https://rook.io/docs/rook/latest/Troubleshooting/ceph-csi-common-issues/, which may help you.

I'll give this a try tomorrow morning and report back here


ivanovpavel1983 commented Sep 27, 2023

Same issue with rook 1.12.4 + hostnetwork: true ((

  Type     Reason                  Age   From                     Message
  ----     ------                  ----  ----                     -------
  Normal   Scheduled               79s   default-scheduler        Successfully assigned rook-ceph/ceph-fs-test to test-node-01lp
  Normal   SuccessfulAttachVolume  78s   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-5daef63b-1a62-4a83-91d9-e2808afd2336"
  Warning  FailedMount             13s   kubelet                  MountVolume.MountDevice failed for volume "pvc-5daef63b-1a62-4a83-91d9-e2808afd2336" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph 10.1.36.46:3300,10.1.32.39:3300,10.1.36.43:3300:/volumes/csi/csi-vol-dbd0e48c-1d80-40ca-90ef-58ed66499b08/0aba01d1-e9c7-4831-b241-ae9a7ea10116 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com/f0b24e16977818f3e82d4d862ce95e130150cbcadd4234e4dc61119a3ebac744/globalmount -o name=csi-cephfs-node,secretfile=/tmp/csi/keys/keyfile-643654074,mds_namespace=ceph-filesystem,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon
2023-09-27T12:19:59.992+0000 7f6f3f27a0c0 -1 failed for service _ceph-mon._tcp


voarsh2 commented Oct 30, 2023

@voarsh2 see the breaking change here for cephfs PVC https://github.com/rook/rook/blob/release-1.12/Documentation/Upgrade/rook-upgrade.md#breaking-changes-in-v112

I haven't gotten around to looking at this in more detail.
But, I have hundreds of PVC's - a quick skim thru the link, all the PVC's need to have the mount option removed?
Or can I just remove the mount option in the cephfs section of the rook-ceph-cluster helm chart install?
I am assuming that all the pods and related resources are updated via helm, so I just need to tweak the PV's mountoptions?

TL;DR: I need to edit all CephFS PV's?
https://github.com/ceph/ceph-csi/blob/v3.9.0/docs/ceph-csi-upgrade.md/#24-modifying-mountoptions-in-storageclass-and-persistentvolumes

Here's a sample CephFS PV I have:

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: rook-ceph.cephfs.csi.ceph.com
    volume.kubernetes.io/provisioner-deletion-secret-name: rook-csi-cephfs-provisioner
    volume.kubernetes.io/provisioner-deletion-secret-namespace: rook-ceph
  name: pvc-0b5e0887-b565-4db2-bf34-a97e60a7ba32
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 10Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: app-php-conf-cephfs
    namespace: default
    resourceVersion: "99198012"
    uid: 0b5e0887-b565-4db2-bf34-a97e60a7ba32
  csi:
    controllerExpandSecretRef:
      name: rook-csi-cephfs-provisioner
      namespace: rook-ceph
    driver: rook-ceph.cephfs.csi.ceph.com
    fsType: ext4
    nodeStageSecretRef:
      name: rook-csi-cephfs-node
      namespace: rook-ceph
    volumeAttributes:
      clusterID: rook-ceph
      fsName: ceph-filesystem
      pool: ceph-filesystem-data0
      storage.kubernetes.io/csiProvisionerIdentity: 1684756060063-8081-rook-ceph.cephfs.csi.ceph.com
      subvolumeName: csi-vol-837a1886-7c94-49d8-bd3d-a142a46a15f4
      subvolumePath: /volumes/csi/csi-vol-837a1886-7c94-49d8-bd3d-a142a46a15f4/6060d98a-8827-4345-aa8b-4f806b6c7066
    volumeHandle: 0001-0009-rook-ceph-0000000000000001-837a1886-7c94-49d8-bd3d-a142a46a15f4
  persistentVolumeReclaimPolicy: Delete
  storageClassName: ceph-filesystem

Here's a PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: rook-ceph.cephfs.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: rook-ceph.cephfs.csi.ceph.com
  name: bamboo-docker-dir
  namespace: confluence
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 500Gi
  storageClassName: ceph-filesystem
  volumeName: pvc-b85877da-9356-4da7-85dc-cba2ec5b4544

Not seeing any mount options to remove, so I am not sure what to make of it.
MountOptions are only in my storage class. https://github.com/ceph/ceph-csi/blob/v3.9.0/docs/ceph-csi-upgrade.md/#24-modifying-mountoptions-in-storageclass-and-persistentvolumes. My CephFS PVC's don't have a mountoption field, as you see above.......

With that in mind, I go back to the original point, which was that all I need to do is remove and recreate the storage class and remove the mountoption in the helm chart (I use discard)? If that's true, can I add back the discard option after the upgrade? I kind of need discard on.......


Madhu-1 commented Oct 30, 2023

@voarsh2 see the breaking change here for cephfs PVC https://github.com/rook/rook/blob/release-1.12/Documentation/Upgrade/rook-upgrade.md#breaking-changes-in-v112

I haven't gotten around to looking at this in more detail. But, I have hundreds of PVC's - a quick skim thru the link, all the PVC's need to have the mount option removed? Or can I just remove the mount option in the cephfs section of the rook-ceph-cluster helm chart install? I am assuming that all the pods and related resources are updated via helm, so I just need to tweak the PV's mountoptions?

TL;DR: I need to edit all CephFS PV's? https://github.com/ceph/ceph-csi/blob/v3.9.0/docs/ceph-csi-upgrade.md/#24-modifying-mountoptions-in-storageclass-and-persistentvolumes

Yes, you need to remove it from the already-created PVs, and you also need to recreate the storageclass (or remove it from the Helm values and do a helm upgrade); that only updates the storageclass, not the existing PVs.
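
For a single PV, something like this should work (a sketch only; it assumes the PV actually carries a spec.mountOptions field and that you want the whole field removed):

# remove the mountOptions field from one PV (replace <pv-name>)
kubectl patch pv <pv-name> --type=json \
  -p='[{"op": "remove", "path": "/spec/mountOptions"}]'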

(the sample CephFS PV and PVC manifests quoted from the previous comment)

Not seeing any mount options to remove, so I am not sure what to make of it. MountOptions are only in my storage class. https://github.com/ceph/ceph-csi/blob/v3.9.0/docs/ceph-csi-upgrade.md/#24-modifying-mountoptions-in-storageclass-and-persistentvolumes. My CephFS PVC's don't have a mountoption field, as you see above.......

With that in mind, I go back to the original point, which was that all I need to do is remove and recreate the storage class and remove the mountoption in the helm chart (I use discard)? If that's true, can I add back the discard option after the upgrade? I kind of need discard on.......

discard is not required for the CephFS storageclass; it is only required for the RBD storageclasses.


voarsh2 commented Nov 7, 2023

@voarsh2 see the breaking change here for cephfs PVC https://github.com/rook/rook/blob/release-1.12/Documentation/Upgrade/rook-upgrade.md#breaking-changes-in-v112

I haven't gotten around to looking at this in more detail. But, I have hundreds of PVC's - a quick skim thru the link, all the PVC's need to have the mount option removed? Or can I just remove the mount option in the cephfs section of the rook-ceph-cluster helm chart install? I am assuming that all the pods and related resources are updated via helm, so I just need to tweak the PV's mountoptions?
TL;DR: I need to edit all CephFS PV's? https://github.com/ceph/ceph-csi/blob/v3.9.0/docs/ceph-csi-upgrade.md/#24-modifying-mountoptions-in-storageclass-and-persistentvolumes

Yes, you need to remove it from the already-created PVs, and you also need to recreate the storageclass (or remove it from the Helm values and do a helm upgrade); that only updates the storageclass, not the existing PVs.

(the sample CephFS PV and PVC manifests quoted again from the earlier comment)

There is no mount option on the CephFS PVC or PV?

So... nothing to do?

Not seeing any mount options to remove, so I am not sure what to make of it. MountOptions are only in my storage class. https://github.com/ceph/ceph-csi/blob/v3.9.0/docs/ceph-csi-upgrade.md/#24-modifying-mountoptions-in-storageclass-and-persistentvolumes. My CephFS PVC's don't have a mountoption field, as you see above.......
With that in mind, I go back to the original point, which was that all I need to do is remove and recreate the storage class and remove the mountoption in the helm chart (I use discard)? If that's true, can I add back the discard option after the upgrade? I kind of need discard on.......

discard is not required for the CephFS storageclass; it is only required for the RBD storageclasses.

Noted. I've removed it from the helm chart (rook-ceph-cluster).


voarsh2 commented Nov 16, 2023

I'm going to mark this issue as resolved.
TL;DR:
Make sure the CephFS storageclass doesn't have mountOptions (they are not needed, unlike RBD), and remove mountOptions from the existing Persistent Volumes. I hadn't had time to script extracting the YAMLs, removing the option programmatically, and kubectl applying the updates, so I had to update quite a few by hand... painful.
And update the Helm charts (or the CRDs with kubectl apply, if not via Helm).
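
A rough sketch of doing the PV part programmatically rather than by hand (untested; it assumes jq is available and that every PV provisioned by rook-ceph.cephfs.csi.ceph.com should have its mountOptions dropped entirely):

# list CephFS PVs that still carry mountOptions
kubectl get pv -o json | jq -r '.items[]
  | select(.spec.csi.driver == "rook-ceph.cephfs.csi.ceph.com" and .spec.mountOptions != null)
  | .metadata.name'

# drop the mountOptions field from each of them
for pv in $(kubectl get pv -o json | jq -r '.items[]
    | select(.spec.csi.driver == "rook-ceph.cephfs.csi.ceph.com" and .spec.mountOptions != null)
    | .metadata.name'); do
  kubectl patch pv "$pv" --type=json -p='[{"op":"remove","path":"/spec/mountOptions"}]'
done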

voarsh2 closed this as completed Nov 16, 2023