Using a test cluster of 3 nodes with 4 OSDs configured per node, I am able to create at least 4 working CephFilesystem resources, all of which show as healthy in the dashboard, with their data/metadata pools created successfully. All filesystems are created from the same basic template:
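(The template itself did not survive this copy; the following is a minimal sketch of a CephFilesystem of that shape, inferred from the pool names <name>-metadata/<name>-data0 and the six mds daemons a-f visible in the operator logs below. The exact spec values are assumptions.)
---
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: scratch          # also created with other names, e.g. "project"
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    - replicated:
        size: 3
  metadataServer:
    activeCount: 3       # logs show mds.<name>-a through mds.<name>-f
    activeStandby: true  # logs show allow_standby_replay being set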
However, when I attempt to create one CephNFS resource per CephFilesystem, only the first two created work. The third and subsequent resources always fail, and no ganesha pods are created for those CephNFS instances. If I reduce the number of CephNFS resources to 0 or 1, I am again able to create new, working CephNFS instances regardless of the instance name. The template I'm using for CephNFS resources is:
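(As above, the CephNFS template was lost in this copy; a plausible sketch, assuming the server count of 3 implied by the controller's "scaling down ... from 6 to 3" messages and the <name>-data0 RADOS pool it looks up. The rados.namespace value is hypothetical.)
---
apiVersion: ceph.rook.io/v1
kind: CephNFS
metadata:
  name: scratch          # one CephNFS per filesystem: scratch, project, ...
  namespace: rook-ceph
spec:
  rados:
    pool: scratch-data0  # the pool the nfs controller queries in the logs
    namespace: nfs-ns    # hypothetical RADOS namespace
  server:
    active: 3            # ganesha servers a, b, c in the working instances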
Attempting to debug a failed instance, in this case project, the operator logs show the same messages over and over again about trying to scale down the deployment:
2021-07-01 16:33:12.181582 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:33:13.404965 I | ceph-nfs-controller: scaling down ceph nfs "project" from 6 to 3
2021-07-01 16:33:13.404984 I | ceph-nfs-controller: removing deployment "rook-ceph-nfs-project-f"
2021-07-01 16:33:13.412457 E | ceph-nfs-controller: failed to reconcile failed to create ceph nfs deployments: failed to scale down ceph nfs "project": failed to delete ceph nfs deployment: deployments.apps "rook-ceph-nfs-project-f" not found
2021-07-01 16:44:08.777902 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:44:10.085936 I | ceph-nfs-controller: scaling down ceph nfs "project" from 6 to 3
2021-07-01 16:44:10.085952 I | ceph-nfs-controller: removing deployment "rook-ceph-nfs-project-f"
2021-07-01 16:44:10.092222 E | ceph-nfs-controller: failed to reconcile failed to create ceph nfs deployments: failed to scale down ceph nfs "project": failed to delete ceph nfs deployment: deployments.apps "rook-ceph-nfs-project-f" not found
2021-07-01 17:00:50.102039 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 17:00:51.297503 I | ceph-nfs-controller: scaling down ceph nfs "project" from 6 to 3
2021-07-01 17:00:51.297521 I | ceph-nfs-controller: removing deployment "rook-ceph-nfs-project-f"
2021-07-01 17:00:51.302562 E | ceph-nfs-controller: failed to reconcile failed to create ceph nfs deployments: failed to scale down ceph nfs "project": failed to delete ceph nfs deployment: deployments.apps "rook-ceph-nfs-project-f" not found
What jumps out at me is that the deployments for the NFS pods aren't even being created. I suspect this is something to do with the operator and the CephNFS CRD rather than a Ceph limitation.
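A quick way to confirm that from the cluster (a sketch; app=rook-ceph-nfs is, as far as I know, the label Rook applies to its ganesha deployments):
# Working instances show deployments rook-ceph-nfs-<name>-a/-b/-c;
# failed instances like "project" have none at all
kubectl -n rook-ceph get deployments -l app=rook-ceph-nfs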
How to reproduce it (minimal and precise):
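Condensed from the description above (file names are illustrative; the CephFilesystem/CephNFS manifests are the hypothetical templates sketched earlier):
# 1. Deploy the cluster CR (cluster.yaml, below)
kubectl apply -f cluster.yaml
# 2. Apply a CephFilesystem and a matching CephNFS for three or more names
for name in scratch project third; do
  kubectl apply -f fs-$name.yaml -f nfs-$name.yaml
done
# 3. Observe: ganesha pods appear for the first two CephNFS instances only;
#    the third CephNFS creates no deployments and the operator loops as logged below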
File(s) to submit:
Cluster CR (custom resource), typically called cluster.yaml, if necessary
---
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v16.2.4
    allowUnsupported: false
  dataDirHostPath: /var/lib/rook
  skipUpgradeChecks: false
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  mon:
    count: 3
    allowMultiplePerNode: false
  dashboard:
    enabled: true
    ssl: true # tls between ingress and svc... doesn't seem to work unless enabled
  crashCollector:
    disable: false
  storage:
    useAllNodes: false
    useAllDevices: false
    config:
      osdsPerDevice: "4"
    nodes:
      - name: pillan01
        devices:
          - name: /dev/disk/by-id/nvme-Samsung_SSD_983_DCT_1.92TB_S48BNG0MB01685F
      - name: pillan02
        devices:
          - name: /dev/disk/by-id/nvme-Samsung_SSD_983_DCT_1.92TB_S48BNG0MB01695D
      - name: pillan03
        devices:
          - name: /dev/disk/by-id/nvme-Samsung_SSD_983_DCT_1.92TB_S48BNG0MB01690H
  placement:
    all:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: role
                  operator: In
                  values:
                    - storage-node
      tolerations:
        - key: role
          operator: Equal
          value: storage-node
          effect: NoSchedule
  disruptionManagement:
    managePodBudgets: true
  cleanupPolicy:
    # unset only when all data should be destroyed
    #confirmation: "yes-really-destroy-data"
    method: quick
    dataSource: zero
    iteration: 1
  monitoring:
    enabled: true
    rulesNamespace: rook-ceph
Operator's logs, if necessary
2021-07-01 15:10:30.382173 I | ceph-cluster-controller: done reconciling ceph cluster in namespace "rook-ceph"
2021-07-01 15:10:30.401054 I | op-k8sutil: Reporting Event rook-ceph:rook-ceph Normal:ReconcileSucceeded:cluster has been configured successfully
2021-07-01 16:19:23.918083 I | ceph-spec: adding finalizer "cephfilesystem.ceph.rook.io" on "scratch"
2021-07-01 16:19:23.927815 I | ceph-spec: adding finalizer "cephnfs.ceph.rook.io" on "scratch"
2021-07-01 16:19:23.950401 E | ceph-nfs-controller: failed to set nfs "scratch" status to "Created". failed to update object "scratch" status: Operation cannot be fulfilled on cephnfses.ceph.rook.io "scratch": the object has been modified; please apply your changes to the latest version and try again
2021-07-01 16:19:23.952567 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:19:23.954336 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:19:27.577730 E | ceph-nfs-controller: failed to reconcile invalid ceph nfs "scratch" arguments: pool "scratch-data0" not found: failed to get pool scratch-data0 details. Error ENOENT: unrecognized pool 'scratch-data0'
. : Error ENOENT: unrecognized pool 'scratch-data0'
.
2021-07-01 16:19:27.674091 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:19:30.082778 E | ceph-nfs-controller: failed to reconcile invalid ceph nfs "scratch" arguments: pool "scratch-data0" not found: failed to get pool scratch-data0 details. Error ENOENT: unrecognized pool 'scratch-data0'
. : Error ENOENT: unrecognized pool 'scratch-data0'
.
2021-07-01 16:19:30.100016 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:19:32.294382 I | ceph-file-controller: creating filesystem "scratch"
2021-07-01 16:19:32.490629 E | ceph-nfs-controller: failed to reconcile invalid ceph nfs "scratch" arguments: pool "scratch-data0" not found: failed to get pool scratch-data0 details. Error ENOENT: unrecognized pool 'scratch-data0'
. : Error ENOENT: unrecognized pool 'scratch-data0'
.
2021-07-01 16:19:32.516123 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:19:34.776982 E | ceph-nfs-controller: failed to reconcile invalid ceph nfs "scratch" arguments: pool "scratch-data0" not found: failed to get pool scratch-data0 details. Error ENOENT: unrecognized pool 'scratch-data0'
. : Error ENOENT: unrecognized pool 'scratch-data0'
.
2021-07-01 16:19:34.820994 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:19:37.822670 E | ceph-nfs-controller: failed to reconcile invalid ceph nfs "scratch" arguments: pool "scratch-data0" not found: failed to get pool scratch-data0 details. Error ENOENT: unrecognized pool 'scratch-data0'
. : Error ENOENT: unrecognized pool 'scratch-data0'
.
2021-07-01 16:19:37.909326 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:19:38.177961 I | cephclient: setting pool property "compression_mode" to "none" on pool "scratch-metadata"
2021-07-01 16:19:39.590722 E | ceph-nfs-controller: failed to reconcile invalid ceph nfs "scratch" arguments: pool "scratch-data0" not found: failed to get pool scratch-data0 details. Error ENOENT: unrecognized pool 'scratch-data0'
. : Error ENOENT: unrecognized pool 'scratch-data0'
.
2021-07-01 16:19:39.756775 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:19:40.282484 I | cephclient: creating replicated pool scratch-metadata succeeded
2021-07-01 16:19:41.482554 E | ceph-nfs-controller: failed to reconcile invalid ceph nfs "scratch" arguments: pool "scratch-data0" not found: failed to get pool scratch-data0 details. Error ENOENT: unrecognized pool 'scratch-data0'
. : Error ENOENT: unrecognized pool 'scratch-data0'
.
2021-07-01 16:19:41.807457 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:19:43.512449 E | ceph-nfs-controller: failed to reconcile invalid ceph nfs "scratch" arguments: pool "scratch-data0" not found: failed to get pool scratch-data0 details. Error ENOENT: unrecognized pool 'scratch-data0'
. : Error ENOENT: unrecognized pool 'scratch-data0'
.
2021-07-01 16:19:44.157856 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:19:45.901887 I | ceph-nfs-controller: updating ceph nfs "scratch"
2021-07-01 16:19:46.005777 I | cephclient: getting or creating ceph auth key "client.nfs-ganesha.scratch.a"
2021-07-01 16:19:46.485502 I | cephclient: setting pool property "compression_mode" to "none" on pool "scratch-data0"
2021-07-01 16:19:46.905553 I | ceph-nfs-controller: ceph nfs deployment "rook-ceph-nfs-scratch-a" started
2021-07-01 16:19:46.920610 I | ceph-nfs-controller: ceph nfs service running at 10.43.3.201:2049
2021-07-01 16:19:46.920627 I | ceph-nfs-controller: adding ganesha "a" to grace db
2021-07-01 16:19:47.192146 I | cephclient: getting or creating ceph auth key "client.nfs-ganesha.scratch.b"
2021-07-01 16:19:48.094911 I | ceph-nfs-controller: ceph nfs deployment "rook-ceph-nfs-scratch-b" started
2021-07-01 16:19:48.111108 I | ceph-nfs-controller: ceph nfs service running at 10.43.236.90:2049
2021-07-01 16:19:48.111123 I | ceph-nfs-controller: adding ganesha "b" to grace db
2021-07-01 16:19:48.192148 I | cephclient: getting or creating ceph auth key "client.nfs-ganesha.scratch.c"
2021-07-01 16:19:48.585099 I | cephclient: creating replicated pool scratch-data0 succeeded
2021-07-01 16:19:48.585126 I | cephclient: creating filesystem "scratch" with metadata pool "scratch-metadata" and data pools [scratch-data0]
2021-07-01 16:19:49.400413 I | ceph-nfs-controller: ceph nfs deployment "rook-ceph-nfs-scratch-c" started
2021-07-01 16:19:49.419312 I | ceph-nfs-controller: ceph nfs service running at 10.43.146.137:2049
2021-07-01 16:19:49.419332 I | ceph-nfs-controller: adding ganesha "c" to grace db
2021-07-01 16:19:51.645272 I | ceph-file-controller: created filesystem "scratch" on 1 data pool(s) and metadata pool "scratch-metadata"
2021-07-01 16:19:52.278289 I | cephclient: setting allow_standby_replay for filesystem "scratch"
2021-07-01 16:19:54.722207 I | ceph-file-controller: start running mdses for filesystem "scratch"
2021-07-01 16:19:55.375638 I | cephclient: getting or creating ceph auth key "mds.scratch-a"
2021-07-01 16:19:56.065960 I | op-mds: setting mds config flags
2021-07-01 16:19:56.065983 I | op-config: setting "mds.scratch-a"="mds_join_fs"="scratch" option to the mon configuration database
2021-07-01 16:19:56.624498 I | op-config: successfully set "mds.scratch-a"="mds_join_fs"="scratch" option to the mon configuration database
2021-07-01 16:19:56.635807 I | cephclient: getting or creating ceph auth key "mds.scratch-b"
2021-07-01 16:19:56.826201 E | ceph-crashcollector-controller: node reconcile failed on op "unchanged": Operation cannot be fulfilled on deployments.apps "rook-ceph-crashcollector-pillan02": the object has been modified; please apply your changes to the latest version and try again
2021-07-01 16:19:57.355305 I | op-mds: setting mds config flags
2021-07-01 16:19:57.355326 I | op-config: setting "mds.scratch-b"="mds_join_fs"="scratch" option to the mon configuration database
2021-07-01 16:19:57.925724 I | op-config: successfully set "mds.scratch-b"="mds_join_fs"="scratch" option to the mon configuration database
2021-07-01 16:19:57.936219 I | cephclient: getting or creating ceph auth key "mds.scratch-c"
2021-07-01 16:19:58.648489 I | op-mds: setting mds config flags
2021-07-01 16:19:58.648506 I | op-config: setting "mds.scratch-c"="mds_join_fs"="scratch" option to the mon configuration database
2021-07-01 16:19:59.180176 I | op-config: successfully set "mds.scratch-c"="mds_join_fs"="scratch" option to the mon configuration database
2021-07-01 16:19:59.190779 I | cephclient: getting or creating ceph auth key "mds.scratch-d"
2021-07-01 16:19:59.890816 I | op-mds: setting mds config flags
2021-07-01 16:19:59.890835 I | op-config: setting "mds.scratch-d"="mds_join_fs"="scratch" option to the mon configuration database
2021-07-01 16:20:00.441548 I | op-config: successfully set "mds.scratch-d"="mds_join_fs"="scratch" option to the mon configuration database
2021-07-01 16:20:00.459033 I | cephclient: getting or creating ceph auth key "mds.scratch-e"
2021-07-01 16:20:01.149203 I | op-mds: setting mds config flags
2021-07-01 16:20:01.149225 I | op-config: setting "mds.scratch-e"="mds_join_fs"="scratch" option to the mon configuration database
2021-07-01 16:20:01.728626 I | op-config: successfully set "mds.scratch-e"="mds_join_fs"="scratch" option to the mon configuration database
2021-07-01 16:20:01.739705 I | cephclient: getting or creating ceph auth key "mds.scratch-f"
2021-07-01 16:20:02.443900 I | op-mds: setting mds config flags
2021-07-01 16:20:02.443917 I | op-config: setting "mds.scratch-f"="mds_join_fs"="scratch" option to the mon configuration database
2021-07-01 16:20:02.948964 I | op-config: successfully set "mds.scratch-f"="mds_join_fs"="scratch" option to the mon configuration database
2021-07-01 16:21:42.794997 I | ceph-spec: adding finalizer "cephfilesystem.ceph.rook.io" on "project"
2021-07-01 16:21:42.805085 I | ceph-spec: adding finalizer "cephnfs.ceph.rook.io" on "project"
2021-07-01 16:21:42.816769 E | ceph-file-controller: failed to set filesystem "project" status to "Created". failed to update object "project" status: Operation cannot be fulfilled on cephfilesystems.ceph.rook.io "project": the object has been modified; please apply your changes to the latest version and try again
2021-07-01 16:21:42.824713 E | ceph-nfs-controller: failed to set nfs "project" status to "Created". failed to update object "project" status: Operation cannot be fulfilled on cephnfses.ceph.rook.io "project": the object has been modified; please apply your changes to the latest version and try again
2021-07-01 16:21:42.827122 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:21:42.829200 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:21:46.474410 E | ceph-nfs-controller: failed to reconcile invalid ceph nfs "project" arguments: pool "project-data0" not found: failed to get pool project-data0 details. Error ENOENT: unrecognized pool 'project-data0'
. : Error ENOENT: unrecognized pool 'project-data0'
.
2021-07-01 16:21:46.495833 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:21:48.987998 E | ceph-nfs-controller: failed to reconcile invalid ceph nfs "project" arguments: pool "project-data0" not found: failed to get pool project-data0 details. Error ENOENT: unrecognized pool 'project-data0'
. : Error ENOENT: unrecognized pool 'project-data0'
.
2021-07-01 16:21:49.004540 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:21:51.177055 I | ceph-file-controller: creating filesystem "project"
2021-07-01 16:21:51.477715 E | ceph-nfs-controller: failed to reconcile invalid ceph nfs "project" arguments: pool "project-data0" not found: failed to get pool project-data0 details. Error ENOENT: unrecognized pool 'project-data0'
. : Error ENOENT: unrecognized pool 'project-data0'
.
2021-07-01 16:21:51.503041 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:21:53.081321 E | ceph-nfs-controller: failed to reconcile invalid ceph nfs "project" arguments: pool "project-data0" not found: failed to get pool project-data0 details. Error ENOENT: unrecognized pool 'project-data0'
. : Error ENOENT: unrecognized pool 'project-data0'
.
2021-07-01 16:21:53.127035 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:21:56.387574 E | ceph-nfs-controller: failed to reconcile invalid ceph nfs "project" arguments: pool "project-data0" not found: failed to get pool project-data0 details. Error ENOENT: unrecognized pool 'project-data0'
. : Error ENOENT: unrecognized pool 'project-data0'
.
2021-07-01 16:21:56.472899 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:21:57.186868 I | cephclient: setting pool property "compression_mode" to "none" on pool "project-metadata"
2021-07-01 16:21:58.194773 E | ceph-nfs-controller: failed to reconcile invalid ceph nfs "project" arguments: pool "project-data0" not found: failed to get pool project-data0 details. Error ENOENT: unrecognized pool 'project-data0'
. : Error ENOENT: unrecognized pool 'project-data0'
.
2021-07-01 16:21:58.359896 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:21:59.282441 I | cephclient: creating replicated pool project-metadata succeeded
2021-07-01 16:21:59.782500 E | ceph-nfs-controller: failed to reconcile invalid ceph nfs "project" arguments: pool "project-data0" not found: failed to get pool project-data0 details. Error ENOENT: unrecognized pool 'project-data0'
. : Error ENOENT: unrecognized pool 'project-data0'
.
2021-07-01 16:22:00.108563 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:22:01.876485 E | ceph-nfs-controller: failed to reconcile invalid ceph nfs "project" arguments: pool "project-data0" not found: failed to get pool project-data0 details. Error ENOENT: unrecognized pool 'project-data0'
. : Error ENOENT: unrecognized pool 'project-data0'
.
2021-07-01 16:22:02.521964 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:22:03.486347 I | cephclient: setting pool property "compression_mode" to "none" on pool "project-data0"
2021-07-01 16:22:04.791651 I | ceph-nfs-controller: scaling down ceph nfs "project" from 6 to 3
2021-07-01 16:22:04.791667 I | ceph-nfs-controller: removing deployment "rook-ceph-nfs-project-f"
2021-07-01 16:22:04.805309 E | ceph-nfs-controller: failed to reconcile failed to create ceph nfs deployments: failed to scale down ceph nfs "project": failed to delete ceph nfs deployment: deployments.apps "rook-ceph-nfs-project-f" not found
2021-07-01 16:22:05.442609 I | cephclient: creating replicated pool project-data0 succeeded
2021-07-01 16:22:05.442637 I | cephclient: creating filesystem "project" with metadata pool "project-metadata" and data pools [project-data0]
2021-07-01 16:22:06.090141 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:22:07.786211 I | ceph-file-controller: created filesystem "project" on 1 data pool(s) and metadata pool "project-metadata"
2021-07-01 16:22:08.000733 I | ceph-nfs-controller: scaling down ceph nfs "project" from 6 to 3
2021-07-01 16:22:08.000750 I | ceph-nfs-controller: removing deployment "rook-ceph-nfs-project-f"
2021-07-01 16:22:08.006479 E | ceph-nfs-controller: failed to reconcile failed to create ceph nfs deployments: failed to scale down ceph nfs "project": failed to delete ceph nfs deployment: deployments.apps "rook-ceph-nfs-project-f" not found
2021-07-01 16:22:08.576699 I | cephclient: setting allow_standby_replay for filesystem "project"
2021-07-01 16:22:10.573451 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:22:10.682985 I | ceph-file-controller: start running mdses for filesystem "project"
2021-07-01 16:22:11.991039 I | cephclient: getting or creating ceph auth key "mds.project-a"
2021-07-01 16:22:12.900225 I | ceph-nfs-controller: scaling down ceph nfs "project" from 6 to 3
2021-07-01 16:22:12.900241 I | ceph-nfs-controller: removing deployment "rook-ceph-nfs-project-f"
2021-07-01 16:22:12.906539 E | ceph-nfs-controller: failed to reconcile failed to create ceph nfs deployments: failed to scale down ceph nfs "project": failed to delete ceph nfs deployment: deployments.apps "rook-ceph-nfs-project-f" not found
2021-07-01 16:22:13.199323 I | op-mds: setting mds config flags
2021-07-01 16:22:13.199345 I | op-config: setting "mds.project-a"="mds_join_fs"="project" option to the mon configuration database
2021-07-01 16:22:13.734971 I | op-config: successfully set "mds.project-a"="mds_join_fs"="project" option to the mon configuration database
2021-07-01 16:22:13.745552 I | cephclient: getting or creating ceph auth key "mds.project-b"
2021-07-01 16:22:14.445336 I | op-mds: setting mds config flags
2021-07-01 16:22:14.445353 I | op-config: setting "mds.project-b"="mds_join_fs"="project" option to the mon configuration database
2021-07-01 16:22:14.986207 I | op-config: successfully set "mds.project-b"="mds_join_fs"="project" option to the mon configuration database
2021-07-01 16:22:14.996729 I | cephclient: getting or creating ceph auth key "mds.project-c"
2021-07-01 16:22:15.736658 I | op-mds: setting mds config flags
2021-07-01 16:22:15.736682 I | op-config: setting "mds.project-c"="mds_join_fs"="project" option to the mon configuration database
2021-07-01 16:22:16.244163 I | op-config: successfully set "mds.project-c"="mds_join_fs"="project" option to the mon configuration database
2021-07-01 16:22:16.255065 I | cephclient: getting or creating ceph auth key "mds.project-d"
2021-07-01 16:22:16.944503 I | op-mds: setting mds config flags
2021-07-01 16:22:16.944526 I | op-config: setting "mds.project-d"="mds_join_fs"="project" option to the mon configuration database
2021-07-01 16:22:17.489055 I | op-config: successfully set "mds.project-d"="mds_join_fs"="project" option to the mon configuration database
2021-07-01 16:22:17.499472 I | cephclient: getting or creating ceph auth key "mds.project-e"
2021-07-01 16:22:18.030727 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:22:18.294463 I | op-mds: setting mds config flags
2021-07-01 16:22:18.294481 I | op-config: setting "mds.project-e"="mds_join_fs"="project" option to the mon configuration database
2021-07-01 16:22:19.485315 I | op-config: successfully set "mds.project-e"="mds_join_fs"="project" option to the mon configuration database
2021-07-01 16:22:19.497330 I | cephclient: getting or creating ceph auth key "mds.project-f"
2021-07-01 16:22:20.393645 I | ceph-nfs-controller: scaling down ceph nfs "project" from 6 to 3
2021-07-01 16:22:20.393661 I | ceph-nfs-controller: removing deployment "rook-ceph-nfs-project-f"
2021-07-01 16:22:20.399315 E | ceph-nfs-controller: failed to reconcile failed to create ceph nfs deployments: failed to scale down ceph nfs "project": failed to delete ceph nfs deployment: deployments.apps "rook-ceph-nfs-project-f" not found
2021-07-01 16:22:20.687353 I | op-mds: setting mds config flags
2021-07-01 16:22:20.687372 I | op-config: setting "mds.project-f"="mds_join_fs"="project" option to the mon configuration database
2021-07-01 16:22:21.232191 I | op-config: successfully set "mds.project-f"="mds_join_fs"="project" option to the mon configuration database
2021-07-01 16:22:30.646500 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:22:31.826078 I | ceph-nfs-controller: scaling down ceph nfs "project" from 6 to 3
2021-07-01 16:22:31.826098 I | ceph-nfs-controller: removing deployment "rook-ceph-nfs-project-f"
2021-07-01 16:22:31.831111 E | ceph-nfs-controller: failed to reconcile failed to create ceph nfs deployments: failed to scale down ceph nfs "project": failed to delete ceph nfs deployment: deployments.apps "rook-ceph-nfs-project-f" not found
2021-07-01 16:22:52.316351 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:22:53.494495 I | ceph-nfs-controller: scaling down ceph nfs "project" from 6 to 3
2021-07-01 16:22:53.494514 I | ceph-nfs-controller: removing deployment "rook-ceph-nfs-project-f"
2021-07-01 16:22:53.502906 E | ceph-nfs-controller: failed to reconcile failed to create ceph nfs deployments: failed to scale down ceph nfs "project": failed to delete ceph nfs deployment: deployments.apps "rook-ceph-nfs-project-f" not found
2021-07-01 16:23:34.468132 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:23:35.693922 I | ceph-nfs-controller: scaling down ceph nfs "project" from 6 to 3
2021-07-01 16:23:35.693938 I | ceph-nfs-controller: removing deployment "rook-ceph-nfs-project-f"
2021-07-01 16:23:35.701007 E | ceph-nfs-controller: failed to reconcile failed to create ceph nfs deployments: failed to scale down ceph nfs "project": failed to delete ceph nfs deployment: deployments.apps "rook-ceph-nfs-project-f" not found
2021-07-01 16:24:57.627424 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:24:59.376441 I | ceph-nfs-controller: scaling down ceph nfs "project" from 6 to 3
2021-07-01 16:24:59.376456 I | ceph-nfs-controller: removing deployment "rook-ceph-nfs-project-f"
2021-07-01 16:24:59.382778 E | ceph-nfs-controller: failed to reconcile failed to create ceph nfs deployments: failed to scale down ceph nfs "project": failed to delete ceph nfs deployment: deployments.apps "rook-ceph-nfs-project-f" not found
2021-07-01 16:27:43.227217 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:27:44.488701 I | ceph-nfs-controller: scaling down ceph nfs "project" from 6 to 3
2021-07-01 16:27:44.488715 I | ceph-nfs-controller: removing deployment "rook-ceph-nfs-project-f"
2021-07-01 16:27:44.494993 E | ceph-nfs-controller: failed to reconcile failed to create ceph nfs deployments: failed to scale down ceph nfs "project": failed to delete ceph nfs deployment: deployments.apps "rook-ceph-nfs-project-f" not found
2021-07-01 16:33:12.181582 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:33:13.404965 I | ceph-nfs-controller: scaling down ceph nfs "project" from 6 to 3
2021-07-01 16:33:13.404984 I | ceph-nfs-controller: removing deployment "rook-ceph-nfs-project-f"
2021-07-01 16:33:13.412457 E | ceph-nfs-controller: failed to reconcile failed to create ceph nfs deployments: failed to scale down ceph nfs "project": failed to delete ceph nfs deployment: deployments.apps "rook-ceph-nfs-project-f" not found
2021-07-01 16:44:08.777902 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 16:44:10.085936 I | ceph-nfs-controller: scaling down ceph nfs "project" from 6 to 3
2021-07-01 16:44:10.085952 I | ceph-nfs-controller: removing deployment "rook-ceph-nfs-project-f"
2021-07-01 16:44:10.092222 E | ceph-nfs-controller: failed to reconcile failed to create ceph nfs deployments: failed to scale down ceph nfs "project": failed to delete ceph nfs deployment: deployments.apps "rook-ceph-nfs-project-f" not found
2021-07-01 17:00:50.102039 I | op-mon: parsing mon endpoints: b=10.43.115.151:6789,c=10.43.174.94:6789,a=10.43.141.110:6789
2021-07-01 17:00:51.297503 I | ceph-nfs-controller: scaling down ceph nfs "project" from 6 to 3
2021-07-01 17:00:51.297521 I | ceph-nfs-controller: removing deployment "rook-ceph-nfs-project-f"
2021-07-01 17:00:51.302562 E | ceph-nfs-controller: failed to reconcile failed to create ceph nfs deployments: failed to scale down ceph nfs "project": failed to delete ceph nfs deployment: deployments.apps "rook-ceph-nfs-project-f" not found
Crashing pod(s) logs, if necessary
N/A
Environment:
OS (e.g. from /etc/os-release):
-bash-4.2$ cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
Kernel (e.g. uname -a): Linux pillan01.xxx 3.10.0-1160.25.1.el7.x86_64 #1 SMP Wed Apr 28 21:49:45 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Cloud provider or hardware configuration:
# dmidecode | grep "Product Name"
        Product Name: AS -1114S-WN10RT
        Product Name: H12SSW-NTR
Rook version (use rook version inside of a Rook Pod):
-bash-4.2$ kubectl exec -ti -n rook-ceph rook-ceph-operator-68f5994dd4-td75v -- rook version
rook: v1.6.5
go: go1.16.3
Storage backend version (e.g. for ceph do ceph -v):
-bash-4.2$ kubectl exec -ti -n rook-ceph rook-ceph-operator-68f5994dd4-td75v -- ceph -v
ceph version 16.2.2 (e8f22dde28889481f4dda2beb8a07788204821d3) pacific (stable)
Kubernetes version (use kubectl version): rke 1.2.8
Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
-bash-4.2$ kubectl exec -ti -n rook-ceph rook-ceph-tools-7f4fcd8448-kknp7 -- ceph health
HEALTH_OK