When setting dashboard.ssl: false, the value is left out of the CRD and the MGR is set up with ssl as true for the dashboard #13577

Closed
ADustyOldMuffin opened this issue Jan 16, 2024 · 21 comments · Fixed by #13604

@ADustyOldMuffin

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
When I set dashboard.ssl: false but enable the dashboard in the cluster CRD, the MGR still expects SSL, and inspecting the Ceph config shows the dashboard ssl value set to true.

Expected behavior:
SSL should not be configured, and the value should be false in the Ceph config.

How to reproduce it (minimal and precise):

Create a new CephCluster with the dashboard.ssl: false field set in the CRD. Load the cluster and check the MGR logs to see it complaining; you can also run ceph config get mgr mgr/dashboard/ssl and see that it resolves to true.

If you output the YAML for the CephCluster, the dashboard.ssl field will also be missing, but if you edit the object and add it back, it stays.
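
For reference, a minimal sketch of the checks described above, assuming the rendered manifest is saved as cluster.yaml and the Rook toolbox is deployed under its default name rook-ceph-tools:

$ kubectl apply -f cluster.yaml
$ kubectl -n rook-ceph get cephcluster dev-ceph -o yaml | grep -A 3 dashboard    # ssl: false is missing from the stored object
$ kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph config get mgr mgr/dashboard/ssl
true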

I am also creating the CRD via a custom helm chart.

File(s) to submit:

The cluster CRD can use any values, as long as the field above is set.

Logs to submit:
None

@travisn
Member

travisn commented Jan 16, 2024

Please share your cluster CR (cluster.yaml) and the rook operator log. If you change this setting to false, it's not expected that the dashboard will be configured for ssl.
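
For completeness, these are typically collected with the following commands (the operator deployment name is the Rook default and may differ in a customized install):

$ kubectl -n rook-ceph get cephcluster dev-ceph -o yaml > cluster.yaml
$ kubectl -n rook-ceph logs deploy/rook-ceph-operator > operator.log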

@ADustyOldMuffin
Author

cluster.yaml in file

---
# Source: cw-ceph-cluster/templates/ceph-cluster.yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: dev-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v17.2.7-20231114
    allowUnsupported: false
  dataDirHostPath: /var/lib/rook
  skipUpgradeChecks: false
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  waitTimeoutForHealthyOSDInMinutes: 10
  mon:
    count: 3
    allowMultiplePerNode: false
  mgr:
    count: 2
    allowMultiplePerNode: false
    modules:
      - name: nfs
        enabled: false
      - name: pg_autoscaler
        enabled: false
  dashboard:
    enabled: true
    ssl: false
  monitoring:
    enabled: false
    metricsDisabled: false
  network:
    connections:
      encryption:
        enabled: false
      compression:
        enabled: false
      requireMsgr2: true
    ipFamily: IPv4
    dualStack: true
  crashCollector:
    disable: false
    daysToRetain: 10
  logCollector:
    enabled: false
    periodicity: daily
    maxLogSize: 500M
  cleanupPolicy:
    confirmation: ""
    sanitizeDisks:
      method: quick
      dataSource: zero
      iteration: 1
    allowUninstallWithVolumes: false
  placement:
  annotations:
  labels:
  resources:
  removeOSDsIfOutAndSafeToRemove: false
  priorityClassNames:
    mon: system-node-critical
    osd: system-node-critical
    mgr: system-cluster-critical
  storage:
    useAllNodes: true
    useAllDevices: true
    config:
      osdsPerDevice: "1"
      encryptedDevice: "true"
    onlyApplyOSDPlacement: false
    flappingRestartIntervalHours: 24
  disruptionManagement:
    managePodBudgets: true
    osdMaintenanceTimeout: 30
    pgHealthCheckTimeout: 0
  healthCheck:
    daemonHealth:
      mon:
        disabled: false
        interval: 45s
      osd:
        disabled: false
        interval: 60s
      status:
        disabled: false
        interval: 60s
    livenessProbe:
      mgr:
        disabled: false
      mon:
        disabled: false
      osd:
        disabled: false
    startupProbe:
      mgr:
        disabled: false
      mon:
        disabled: false
      osd:
        disabled: false

cluster.yaml in K8s

❯ k get cephcluster dev-ceph -o yaml            
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: dev-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v17.2.7-20231114
  cleanupPolicy:
    sanitizeDisks:
      dataSource: zero
      iteration: 1
      method: quick
  crashCollector:
    daysToRetain: 10
  dashboard:
    enabled: true
  dataDirHostPath: /var/lib/rook
  disruptionManagement:
    managePodBudgets: true
    osdMaintenanceTimeout: 30
  external: {}
  healthCheck:
    daemonHealth:
      mon:
        interval: 45s
      osd:
        interval: 1m0s
      status:
        interval: 1m0s
    livenessProbe:
      mgr: {}
      mon: {}
      osd: {}
    startupProbe:
      mgr: {}
      mon: {}
      osd: {}
  logCollector:
    maxLogSize: 500M
    periodicity: daily
  mgr:
    count: 2
    modules:
    - name: nfs
    - name: pg_autoscaler
  mon:
    count: 3
  monitoring: {}
  network:
    connections:
      compression: {}
      encryption: {}
      requireMsgr2: true
    dualStack: true
    ipFamily: IPv4
    multiClusterService: {}
  priorityClassNames:
    mgr: system-cluster-critical
    mon: system-node-critical
    osd: system-node-critical
  security:
    keyRotation:
      enabled: false
    kms: {}
  storage:
    config:
      encryptedDevice: "true"
      osdsPerDevice: "1"
    flappingRestartIntervalHours: 24
    store: {}
    useAllDevices: true
    useAllNodes: true
  waitTimeoutForHealthyOSDInMinutes: 10

@ADustyOldMuffin
Author

logs from MGR

debug 2024-01-16T23:32:55.068+0000 7fa41f943700  0 [volumes INFO mgr_util] scanning for idle connections..
debug 2024-01-16T23:32:55.068+0000 7fa41f943700  0 [volumes INFO mgr_util] cleaning up connections: []
debug 2024-01-16T23:32:55.076+0000 7fa417132700  0 [volumes INFO mgr_util] scanning for idle connections..
debug 2024-01-16T23:32:55.076+0000 7fa417132700  0 [volumes INFO mgr_util] cleaning up connections: []
debug 2024-01-16T23:32:55.076+0000 7fa41412c700  0 [volumes INFO mgr_util] scanning for idle connections..
debug 2024-01-16T23:32:55.076+0000 7fa41412c700  0 [volumes INFO mgr_util] cleaning up connections: []
debug 2024-01-16T23:32:57.036+0000 7fa4389f5700  0 log_channel(cluster) log [DBG] : pgmap v129: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
debug 2024-01-16T23:32:59.036+0000 7fa4389f5700  0 log_channel(cluster) log [DBG] : pgmap v130: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
debug 2024-01-16T23:33:00.060+0000 7fa4321e8700  0 [dashboard INFO root] server: ssl=yes host=:: port=8443
debug 2024-01-16T23:33:00.060+0000 7fa4321e8700  0 [dashboard INFO root] Config not ready to serve, waiting: no certificate configured
debug 2024-01-16T23:33:01.036+0000 7fa4389f5700  0 log_channel(cluster) log [DBG] : pgmap v131: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
debug 2024-01-16T23:33:03.036+0000 7fa4389f5700  0 log_channel(cluster) log [DBG] : pgmap v132: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
debug 2024-01-16T23:33:05.036+0000 7fa4389f5700  0 log_channel(cluster) log [DBG] : pgmap v133: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
debug 2024-01-16T23:33:05.060+0000 7fa4321e8700  0 [dashboard INFO root] server: ssl=yes host=:: port=8443
debug 2024-01-16T23:33:05.060+0000 7fa4321e8700  0 [dashboard INFO root] Config not ready to serve, waiting: no certificate configured
debug 2024-01-16T23:33:07.036+0000 7fa4389f5700  0 log_channel(cluster) log [DBG] : pgmap v134: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
debug 2024-01-16T23:33:09.036+0000 7fa4389f5700  0 log_channel(cluster) log [DBG] : pgmap v135: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail
debug 2024-01-16T23:33:10.060+0000 7fa4321e8700  0 [dashboard INFO root] server: ssl=yes host=:: port=8443
debug 2024-01-16T23:33:10.064+0000 7fa4321e8700  0 [dashboard INFO root] Config not ready to serve, waiting: no certificate configured
debug 2024-01-16T23:33:11.037+0000 7fa4389f5700  0 log_channel(cluster) log [DBG] : pgmap v136: 0 pgs: ; 0 B data, 0 B used, 0 B / 0 B avail

@ADustyOldMuffin
Author

I tried a fresh install with Rook v1.12.10, and these are the logs from the operator:

2024-01-16 23:17:12.298881 I | rookcmd: starting Rook v1.13.0-alpha.0.164.g591ddf2c3 with arguments '/usr/local/bin/rook ceph operator'
2024-01-16 23:17:12.299115 I | rookcmd: flag values: --enable-machine-disruption-budget=false, --help=false, --kubeconfig=, --log-level=INFO
2024-01-16 23:17:12.299122 I | cephcmd: starting Rook-Ceph operator
2024-01-16 23:17:12.687931 I | cephcmd: base ceph version inside the rook operator image is "ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)"
2024-01-16 23:17:12.696251 I | op-k8sutil: ROOK_CURRENT_NAMESPACE_ONLY="false" (env var)
2024-01-16 23:17:12.696274 I | operator: watching all namespaces for Ceph CRs
2024-01-16 23:17:12.696332 I | operator: setting up schemes
2024-01-16 23:17:12.699408 I | operator: setting up the controller-runtime manager
2024-01-16 23:17:12.699928 I | ceph-cluster-controller: successfully started
2024-01-16 23:17:12.703158 I | op-k8sutil: ROOK_DISABLE_DEVICE_HOTPLUG="false" (env var)
2024-01-16 23:17:12.703176 I | ceph-cluster-controller: enabling hotplug orchestration
2024-01-16 23:17:12.703210 I | ceph-nodedaemon-controller: successfully started
2024-01-16 23:17:12.703232 I | ceph-block-pool-controller: successfully started
2024-01-16 23:17:12.703269 I | ceph-object-store-user-controller: successfully started
2024-01-16 23:17:12.703324 I | ceph-object-realm-controller: successfully started
2024-01-16 23:17:12.703337 I | ceph-object-zonegroup-controller: successfully started
2024-01-16 23:17:12.703347 I | ceph-object-zone-controller: successfully started
2024-01-16 23:17:12.703474 I | ceph-object-controller: successfully started
2024-01-16 23:17:12.703508 I | ceph-file-controller: successfully started
2024-01-16 23:17:12.703539 I | ceph-nfs-controller: successfully started
2024-01-16 23:17:12.703563 I | ceph-rbd-mirror-controller: successfully started
2024-01-16 23:17:12.703585 I | ceph-client-controller: successfully started
2024-01-16 23:17:12.703600 I | ceph-filesystem-mirror-controller: successfully started
2024-01-16 23:17:12.703626 I | operator: rook-ceph-operator-config-controller successfully started
2024-01-16 23:17:12.703641 I | ceph-csi: rook-ceph-operator-csi-controller successfully started
2024-01-16 23:17:12.703719 I | op-bucket-prov: rook-ceph-operator-bucket-controller successfully started
2024-01-16 23:17:12.703743 I | ceph-bucket-topic: successfully started
2024-01-16 23:17:12.703754 I | ceph-bucket-notification: successfully started
2024-01-16 23:17:12.703765 I | ceph-bucket-notification: successfully started
2024-01-16 23:17:12.703780 I | ceph-fs-subvolumegroup-controller: successfully started
2024-01-16 23:17:12.703797 I | blockpool-rados-namespace-controller: successfully started
2024-01-16 23:17:12.703811 I | ceph-cosi-controller: successfully started
2024-01-16 23:17:12.704824 I | operator: starting the controller-runtime manager
2024-01-16 23:17:12.794276 I | op-k8sutil: ROOK_WATCH_FOR_NODE_FAILURE="true" (default)
2024-01-16 23:17:12.808481 I | op-k8sutil: ROOK_CEPH_COMMANDS_TIMEOUT_SECONDS="15" (configmap)
2024-01-16 23:17:12.808500 I | op-k8sutil: ROOK_LOG_LEVEL="INFO" (configmap)
2024-01-16 23:17:12.808511 I | op-k8sutil: ROOK_ENABLE_DISCOVERY_DAEMON="true" (configmap)
2024-01-16 23:17:12.808516 I | op-k8sutil: ROOK_DISCOVER_DEVICES_INTERVAL="60m" (env var)
2024-01-16 23:17:12.808521 I | op-k8sutil: DISCOVER_DAEMON_RESOURCES="" (default)
2024-01-16 23:17:12.808537 I | op-k8sutil: DISCOVER_PRIORITY_CLASS_NAME="" (default)
2024-01-16 23:17:12.812245 I | op-k8sutil: DISCOVER_TOLERATION="" (default)
2024-01-16 23:17:12.812269 I | op-k8sutil: DISCOVER_TOLERATIONS="" (default)
2024-01-16 23:17:12.812281 I | op-discover: tolerations: []
2024-01-16 23:17:12.812289 I | op-k8sutil: DISCOVER_AGENT_NODE_AFFINITY="" (default)
2024-01-16 23:17:12.812300 I | op-k8sutil: DISCOVER_AGENT_POD_LABELS="" (default)
2024-01-16 23:17:12.897445 I | ceph-csi: CSI Ceph RBD driver disabled
2024-01-16 23:17:12.897471 I | op-k8sutil: removing daemonset csi-rbdplugin if it exists
2024-01-16 23:17:12.898354 I | op-discover: rook-discover daemonset started
2024-01-16 23:17:12.898377 I | op-k8sutil: ROOK_CEPH_ALLOW_LOOP_DEVICES="false" (configmap)
2024-01-16 23:17:12.898382 I | operator: rook-ceph-operator-config-controller done reconciling
2024-01-16 23:17:12.900343 I | op-k8sutil: removing deployment csi-rbdplugin-provisioner if it exists
2024-01-16 23:17:12.910961 I | ceph-csi: successfully removed CSI Ceph RBD driver
2024-01-16 23:17:12.910977 I | ceph-csi: CSI CephFS driver disabled
2024-01-16 23:17:12.910982 I | op-k8sutil: removing daemonset csi-cephfsplugin if it exists
2024-01-16 23:17:12.917961 I | op-k8sutil: removing deployment csi-cephfsplugin-provisioner if it exists
2024-01-16 23:17:12.941082 I | ceph-csi: successfully removed CSI CephFS driver
2024-01-16 23:17:12.941097 I | ceph-csi: CSI NFS driver disabled
2024-01-16 23:17:12.941102 I | op-k8sutil: removing daemonset csi-nfsplugin if it exists
2024-01-16 23:17:12.949337 I | op-k8sutil: removing deployment csi-nfsplugin-provisioner if it exists
2024-01-16 23:17:13.107046 I | ceph-csi: successfully removed CSI NFS driver
2024-01-16 23:25:58.859117 I | ceph-spec: adding finalizer "cephcluster.ceph.rook.io" on "dev-ceph"
2024-01-16 23:25:58.860317 I | op-k8sutil: CSI_ENABLE_HOST_NETWORK="true" (configmap)
2024-01-16 23:25:58.863977 I | clusterdisruption-controller: deleted all legacy node drain canary pods
2024-01-16 23:25:58.869122 I | ceph-cluster-controller: reconciling ceph cluster in namespace "rook-ceph"
2024-01-16 23:25:58.871042 I | ceph-csi: successfully created csi config map "rook-ceph-csi-config"
2024-01-16 23:25:58.872391 I | ceph-cluster-controller: clusterInfo not yet found, must be a new cluster.
2024-01-16 23:25:58.882296 I | op-k8sutil: ROOK_CSI_ENABLE_RBD="true" (configmap)
2024-01-16 23:25:58.882314 I | op-k8sutil: ROOK_CSI_ENABLE_CEPHFS="true" (configmap)
2024-01-16 23:25:58.882320 I | op-k8sutil: ROOK_CSI_ENABLE_NFS="false" (configmap)
2024-01-16 23:25:58.882325 I | op-k8sutil: ROOK_CSI_ALLOW_UNSUPPORTED_VERSION="false" (default)
2024-01-16 23:25:58.882331 I | op-k8sutil: CSI_ENABLE_READ_AFFINITY="false" (configmap)
2024-01-16 23:25:58.882351 I | op-k8sutil: CSI_CRUSH_LOCATION_LABELS="kubernetes.io/hostname,topology.kubernetes.io/region,topology.kubernetes.io/zone,topology.rook.io/chassis,topology.rook.io/rack,topology.rook.io/row,topology.rook.io/pdu,topology.rook.io/pod,topology.rook.io/room,topology.rook.io/datacenter" (default)
2024-01-16 23:25:58.882360 I | op-k8sutil: CSI_FORCE_CEPHFS_KERNEL_CLIENT="true" (configmap)
2024-01-16 23:25:58.882365 I | op-k8sutil: CSI_GRPC_TIMEOUT_SECONDS="150" (configmap)
2024-01-16 23:25:58.882372 I | op-k8sutil: CSI_CEPHFS_LIVENESS_METRICS_PORT="9081" (default)
2024-01-16 23:25:58.882380 I | op-k8sutil: CSIADDONS_PORT="9070" (default)
2024-01-16 23:25:58.882391 I | op-k8sutil: CSI_RBD_LIVENESS_METRICS_PORT="9080" (default)
2024-01-16 23:25:58.882396 I | op-k8sutil: CSI_ENABLE_LIVENESS="false" (default)
2024-01-16 23:25:58.882400 I | op-k8sutil: CSI_PLUGIN_PRIORITY_CLASSNAME="system-node-critical" (configmap)
2024-01-16 23:25:58.882407 I | op-k8sutil: CSI_PROVISIONER_PRIORITY_CLASSNAME="system-cluster-critical" (configmap)
2024-01-16 23:25:58.882412 I | op-k8sutil: CSI_ENABLE_OMAP_GENERATOR="false" (configmap)
2024-01-16 23:25:58.882416 I | op-k8sutil: CSI_ENABLE_RBD_SNAPSHOTTER="true" (configmap)
2024-01-16 23:25:58.882420 I | op-k8sutil: CSI_ENABLE_CEPHFS_SNAPSHOTTER="true" (configmap)
2024-01-16 23:25:58.882424 I | op-k8sutil: CSI_ENABLE_NFS_SNAPSHOTTER="true" (configmap)
2024-01-16 23:25:58.884814 I | op-k8sutil: CSI_ENABLE_CSIADDONS="false" (configmap)
2024-01-16 23:25:58.884833 I | op-k8sutil: CSI_ENABLE_TOPOLOGY="false" (configmap)
2024-01-16 23:25:58.884838 I | op-k8sutil: CSI_ENABLE_ENCRYPTION="false" (configmap)
2024-01-16 23:25:58.884842 I | op-k8sutil: CSI_ENABLE_METADATA="false" (configmap)
2024-01-16 23:25:58.884847 I | op-k8sutil: CSI_CEPHFS_PLUGIN_UPDATE_STRATEGY="RollingUpdate" (default)
2024-01-16 23:25:58.884852 I | op-k8sutil: CSI_CEPHFS_PLUGIN_UPDATE_STRATEGY_MAX_UNAVAILABLE="1" (default)
2024-01-16 23:25:58.884857 I | op-k8sutil: CSI_NFS_PLUGIN_UPDATE_STRATEGY="RollingUpdate" (default)
2024-01-16 23:25:58.884862 I | op-k8sutil: CSI_RBD_PLUGIN_UPDATE_STRATEGY="RollingUpdate" (default)
2024-01-16 23:25:58.884867 I | op-k8sutil: CSI_RBD_PLUGIN_UPDATE_STRATEGY_MAX_UNAVAILABLE="1" (default)
2024-01-16 23:25:58.884871 I | op-k8sutil: CSI_PLUGIN_ENABLE_SELINUX_HOST_MOUNT="false" (configmap)
2024-01-16 23:25:58.884874 I | ceph-csi: Kubernetes version is 1.28
2024-01-16 23:25:58.884880 I | op-k8sutil: CSI_LOG_LEVEL="" (default)
2024-01-16 23:25:58.884884 I | op-k8sutil: CSI_SIDECAR_LOG_LEVEL="" (default)
2024-01-16 23:25:58.888980 I | ceph-spec: detecting the ceph image version for image quay.io/ceph/ceph:v17.2.7-20231114...
2024-01-16 23:25:58.890855 I | op-k8sutil: CSI_PROVISIONER_REPLICAS="2" (configmap)
2024-01-16 23:25:58.890881 I | op-k8sutil: ROOK_CSI_CEPH_IMAGE="quay.io/cephcsi/cephcsi:v3.10.1" (default)
2024-01-16 23:25:58.890888 I | op-k8sutil: ROOK_CSI_REGISTRAR_IMAGE="registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.9.1" (default)
2024-01-16 23:25:58.890899 I | op-k8sutil: ROOK_CSI_PROVISIONER_IMAGE="registry.k8s.io/sig-storage/csi-provisioner:v3.6.2" (default)
2024-01-16 23:25:58.890907 I | op-k8sutil: ROOK_CSI_ATTACHER_IMAGE="registry.k8s.io/sig-storage/csi-attacher:v4.4.2" (default)
2024-01-16 23:25:58.890913 I | op-k8sutil: ROOK_CSI_SNAPSHOTTER_IMAGE="registry.k8s.io/sig-storage/csi-snapshotter:v6.3.2" (default)
2024-01-16 23:25:58.890919 I | op-k8sutil: ROOK_CSI_RESIZER_IMAGE="registry.k8s.io/sig-storage/csi-resizer:v1.9.2" (default)
2024-01-16 23:25:58.890925 I | op-k8sutil: ROOK_CSI_KUBELET_DIR_PATH="/var/lib/kubelet" (default)
2024-01-16 23:25:58.890932 I | op-k8sutil: ROOK_CSIADDONS_IMAGE="quay.io/csiaddons/k8s-sidecar:v0.7.0" (configmap)
2024-01-16 23:25:58.890956 I | op-k8sutil: CSI_TOPOLOGY_DOMAIN_LABELS="" (default)
2024-01-16 23:25:58.890960 I | op-k8sutil: ROOK_CSI_CEPHFS_POD_LABELS="" (default)
2024-01-16 23:25:58.890965 I | op-k8sutil: ROOK_CSI_NFS_POD_LABELS="" (default)
2024-01-16 23:25:58.890970 I | op-k8sutil: ROOK_CSI_RBD_POD_LABELS="" (default)
2024-01-16 23:25:58.890974 I | op-k8sutil: CSI_CLUSTER_NAME="" (default)
2024-01-16 23:25:58.890980 I | op-k8sutil: ROOK_CSI_IMAGE_PULL_POLICY="IfNotPresent" (configmap)
2024-01-16 23:25:58.890984 I | op-k8sutil: CSI_CEPHFS_KERNEL_MOUNT_OPTIONS="" (default)
2024-01-16 23:25:58.890989 I | op-k8sutil: CSI_CEPHFS_ATTACH_REQUIRED="true" (configmap)
2024-01-16 23:25:58.891033 I | op-k8sutil: CSI_RBD_ATTACH_REQUIRED="true" (configmap)
2024-01-16 23:25:58.891037 I | op-k8sutil: CSI_NFS_ATTACH_REQUIRED="true" (configmap)
2024-01-16 23:25:58.891047 I | ceph-csi: detecting the ceph csi image version for image "quay.io/cephcsi/cephcsi:v3.10.1"
2024-01-16 23:25:58.891100 I | op-k8sutil: CSI_PROVISIONER_TOLERATIONS="" (default)
2024-01-16 23:25:58.891114 I | op-k8sutil: CSI_PROVISIONER_NODE_AFFINITY="" (default)
2024-01-16 23:26:01.623404 I | ceph-spec: detected ceph image version: "17.2.7-0 quincy"
2024-01-16 23:26:01.623430 I | ceph-cluster-controller: validating ceph version from provided image
2024-01-16 23:26:01.627337 I | ceph-cluster-controller: cluster "rook-ceph": version "17.2.7-0 quincy" detected for image "quay.io/ceph/ceph:v17.2.7-20231114"
2024-01-16 23:26:01.651682 E | ceph-spec: failed to update cluster condition to {Type:Progressing Status:True Reason:ClusterProgressing Message:Configuring the Ceph cluster LastHeartbeatTime:2024-01-16 23:26:01.644874854 +0000 UTC m=+529.392364902 LastTransitionTime:2024-01-16 23:26:01.644874746 +0000 UTC m=+529.392364805}. failed to update object "rook-ceph/dev-ceph" status: Operation cannot be fulfilled on cephclusters.ceph.rook.io "dev-ceph": the object has been modified; please apply your changes to the latest version and try again
2024-01-16 23:26:01.721402 I | ceph-cluster-controller: created placeholder configmap for ceph overrides "rook-config-override"
2024-01-16 23:26:01.757333 I | op-mon: start running mons
2024-01-16 23:26:01.896717 I | ceph-spec: creating mon secrets for a new cluster
2024-01-16 23:26:01.917164 I | op-mon: existing maxMonID not found or failed to load. configmaps "rook-ceph-mon-endpoints" not found
2024-01-16 23:26:02.015817 I | op-mon: saved mon endpoints to config map map[csi-cluster-config-json:[{"clusterID":"rook-ceph","monitors":[],"cephFS":{"netNamespaceFilePath":"","subvolumeGroup":"","kernelMountOptions":"","fuseMountOptions":""},"rbd":{"netNamespaceFilePath":"","radosNamespace":""},"nfs":{"netNamespaceFilePath":""},"readAffinity":{"enabled":false,"crushLocationLabels":null},"namespace":""}] data: mapping:{"node":{}} maxMonId:-1 outOfQuorum:]
2024-01-16 23:26:02.814056 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2024-01-16 23:26:02.814244 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2024-01-16 23:26:04.414445 I | op-mon: targeting the mon count 3
2024-01-16 23:26:04.423809 I | op-mon: created canary deployment rook-ceph-mon-a-canary
2024-01-16 23:26:04.429583 I | op-mon: created canary deployment rook-ceph-mon-b-canary
2024-01-16 23:26:04.437234 I | op-mon: created canary deployment rook-ceph-mon-c-canary
2024-01-16 23:26:05.215363 I | op-mon: canary monitor deployment rook-ceph-mon-a-canary scheduled to k3s-server-3
2024-01-16 23:26:05.215396 I | op-mon: mon a assigned to node k3s-server-3
2024-01-16 23:26:05.415124 I | op-mon: canary monitor deployment rook-ceph-mon-b-canary scheduled to k3s-server-1
2024-01-16 23:26:05.415148 I | op-mon: mon b assigned to node k3s-server-1
2024-01-16 23:26:05.614948 I | op-mon: canary monitor deployment rook-ceph-mon-c-canary scheduled to k3s-server-4
2024-01-16 23:26:05.614972 I | op-mon: mon c assigned to node k3s-server-4
2024-01-16 23:26:05.620563 I | op-mon: cleaning up canary monitor deployment "rook-ceph-mon-b-canary"
2024-01-16 23:26:05.626368 I | op-mon: cleaning up canary monitor deployment "rook-ceph-mon-a-canary"
2024-01-16 23:26:05.637506 I | op-mon: cleaning up canary monitor deployment "rook-ceph-mon-c-canary"
2024-01-16 23:26:05.645327 I | op-mon: creating mon a
2024-01-16 23:26:05.819870 I | op-mon: mon "a" cluster IP is 10.43.175.16
2024-01-16 23:26:06.415519 I | op-mon: saved mon endpoints to config map map[csi-cluster-config-json:[{"clusterID":"rook-ceph","monitors":["10.43.175.16:3300"],"cephFS":{"netNamespaceFilePath":"","subvolumeGroup":"","kernelMountOptions":"","fuseMountOptions":""},"rbd":{"netNamespaceFilePath":"","radosNamespace":""},"nfs":{"netNamespaceFilePath":""},"readAffinity":{"enabled":false,"crushLocationLabels":null},"namespace":""}] data:a=10.43.175.16:3300 mapping:{"node":{"a":{"Name":"k3s-server-3","Hostname":"k3s-server-3","Address":"10.135.210.140"},"b":{"Name":"k3s-server-1","Hostname":"k3s-server-1","Address":"10.135.210.150"},"c":{"Name":"k3s-server-4","Hostname":"k3s-server-4","Address":"10.135.208.221"}}} maxMonId:-1 outOfQuorum:]
2024-01-16 23:26:06.415642 I | op-mon: monitor endpoints changed, updating the bootstrap peer token
2024-01-16 23:26:06.415700 I | op-mon: monitor endpoints changed, updating the bootstrap peer token
2024-01-16 23:26:07.015097 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2024-01-16 23:26:07.015346 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2024-01-16 23:26:07.421614 I | op-mon: 0 of 1 expected mons are ready. creating or updating deployments without checking quorum in attempt to achieve a healthy mon cluster
2024-01-16 23:26:07.614858 I | op-mon: updating maxMonID from -1 to 0
2024-01-16 23:26:08.415817 I | op-mon: saved mon endpoints to config map map[csi-cluster-config-json:[{"clusterID":"rook-ceph","monitors":["10.43.175.16:3300"],"cephFS":{"netNamespaceFilePath":"","subvolumeGroup":"","kernelMountOptions":"","fuseMountOptions":""},"rbd":{"netNamespaceFilePath":"","radosNamespace":""},"nfs":{"netNamespaceFilePath":""},"readAffinity":{"enabled":false,"crushLocationLabels":null},"namespace":""}] data:a=10.43.175.16:3300 mapping:{"node":{"a":{"Name":"k3s-server-3","Hostname":"k3s-server-3","Address":"10.135.210.140"},"b":{"Name":"k3s-server-1","Hostname":"k3s-server-1","Address":"10.135.210.150"},"c":{"Name":"k3s-server-4","Hostname":"k3s-server-4","Address":"10.135.208.221"}}} maxMonId:0 outOfQuorum:]
2024-01-16 23:26:08.415854 I | op-mon: waiting for mon quorum with [a]
2024-01-16 23:26:08.619143 I | op-mon: mons running: [a]
2024-01-16 23:26:09.014728 I | ceph-spec: parsing mon endpoints: a=10.43.175.16:3300
2024-01-16 23:26:09.014801 I | op-k8sutil: ROOK_OBC_WATCH_OPERATOR_NAMESPACE="true" (configmap)
2024-01-16 23:26:09.014809 I | op-bucket-prov: ceph bucket provisioner launched watching for provisioner "rook-ceph.ceph.rook.io/bucket"
2024-01-16 23:26:09.015830 I | op-bucket-prov: successfully reconciled bucket provisioner
I0116 23:26:09.015957       1 manager.go:135] "msg"="starting provisioner" "logger"="objectbucket.io/provisioner-manager" "name"="rook-ceph.ceph.rook.io/bucket"
2024-01-16 23:26:26.963071 I | ceph-csi: Detected ceph CSI image version: "v3.10.1"
2024-01-16 23:26:26.968294 I | op-k8sutil: CSI_PLUGIN_TOLERATIONS="" (default)
2024-01-16 23:26:26.968314 I | op-k8sutil: CSI_PLUGIN_NODE_AFFINITY="" (default)
2024-01-16 23:26:26.968320 I | op-k8sutil: CSI_RBD_PLUGIN_TOLERATIONS="" (default)
2024-01-16 23:26:26.968325 I | op-k8sutil: CSI_RBD_PLUGIN_NODE_AFFINITY="" (default)
2024-01-16 23:26:26.968338 I | op-k8sutil: CSI_RBD_PLUGIN_RESOURCE="- name : driver-registrar\n  resource:\n    requests:\n      memory: 128Mi\n      cpu: 50m\n    limits:\n      memory: 256Mi\n      cpu: 100m\n- name : csi-rbdplugin\n  resource:\n    requests:\n      memory: 512Mi\n      cpu: 250m\n    limits:\n      memory: 1Gi\n      cpu: 500m\n- name : liveness-prometheus\n  resource:\n    requests:\n      memory: 128Mi\n      cpu: 50m\n    limits:\n      memory: 256Mi\n      cpu: 100m\n" (configmap)
2024-01-16 23:26:26.968597 I | op-k8sutil: CSI_RBD_PLUGIN_VOLUME="" (default)
2024-01-16 23:26:26.968611 I | op-k8sutil: CSI_RBD_PLUGIN_VOLUME_MOUNT="" (default)
2024-01-16 23:26:27.358354 I | op-k8sutil: CSI_RBD_PROVISIONER_TOLERATIONS="" (default)
2024-01-16 23:26:27.358381 I | op-k8sutil: CSI_RBD_PROVISIONER_NODE_AFFINITY="" (default)
2024-01-16 23:26:27.358404 I | op-k8sutil: CSI_RBD_PROVISIONER_RESOURCE="- name : csi-provisioner\n  resource:\n    requests:\n      memory: 128Mi\n      cpu: 100m\n    limits:\n      memory: 256Mi\n      cpu: 200m\n- name : csi-resizer\n  resource:\n    requests:\n      memory: 128Mi\n      cpu: 100m\n    limits:\n      memory: 256Mi\n      cpu: 200m\n- name : csi-attacher\n  resource:\n    requests:\n      memory: 128Mi\n      cpu: 100m\n    limits:\n      memory: 256Mi\n      cpu: 200m\n- name : csi-snapshotter\n  resource:\n    requests:\n      memory: 128Mi\n      cpu: 100m\n    limits:\n      memory: 256Mi\n      cpu: 200m\n- name : csi-rbdplugin\n  resource:\n    requests:\n      memory: 512Mi\n      cpu: 250m\n    limits:\n      memory: 1Gi\n      cpu: 500m\n- name : csi-omap-generator\n  resource:\n    requests:\n      memory: 512Mi\n      cpu: 250m\n    limits:\n      memory: 1Gi\n      cpu: 500m\n- name : liveness-prometheus\n  resource:\n    requests:\n      memory: 128Mi\n      cpu: 50m\n    limits:\n      memory: 256Mi\n      cpu: 100m\n" (configmap)
2024-01-16 23:26:27.531291 I | ceph-csi: successfully started CSI Ceph RBD driver
2024-01-16 23:26:27.531323 I | op-k8sutil: CSI_CEPHFS_PLUGIN_TOLERATIONS="" (default)
2024-01-16 23:26:27.531329 I | op-k8sutil: CSI_CEPHFS_PLUGIN_NODE_AFFINITY="" (default)
2024-01-16 23:26:27.531346 I | op-k8sutil: CSI_CEPHFS_PLUGIN_RESOURCE="- name : driver-registrar\n  resource:\n    requests:\n      memory: 128Mi\n      cpu: 50m\n    limits:\n      memory: 256Mi\n      cpu: 100m\n- name : csi-cephfsplugin\n  resource:\n    requests:\n      memory: 512Mi\n      cpu: 250m\n    limits:\n      memory: 1Gi\n      cpu: 500m\n- name : liveness-prometheus\n  resource:\n    requests:\n      memory: 128Mi\n      cpu: 50m\n    limits:\n      memory: 256Mi\n      cpu: 100m\n" (configmap)
2024-01-16 23:26:27.531584 I | op-k8sutil: CSI_CEPHFS_PLUGIN_VOLUME="" (default)
2024-01-16 23:26:27.531599 I | op-k8sutil: CSI_CEPHFS_PLUGIN_VOLUME_MOUNT="" (default)
2024-01-16 23:26:27.554101 I | op-k8sutil: CSI_CEPHFS_PROVISIONER_TOLERATIONS="" (default)
2024-01-16 23:26:27.554125 I | op-k8sutil: CSI_CEPHFS_PROVISIONER_NODE_AFFINITY="" (default)
2024-01-16 23:26:27.554148 I | op-k8sutil: CSI_CEPHFS_PROVISIONER_RESOURCE="- name : csi-provisioner\n  resource:\n    requests:\n      memory: 128Mi\n      cpu: 100m\n    limits:\n      memory: 256Mi\n      cpu: 200m\n- name : csi-resizer\n  resource:\n    requests:\n      memory: 128Mi\n      cpu: 100m\n    limits:\n      memory: 256Mi\n      cpu: 200m\n- name : csi-attacher\n  resource:\n    requests:\n      memory: 128Mi\n      cpu: 100m\n    limits:\n      memory: 256Mi\n      cpu: 200m\n- name : csi-snapshotter\n  resource:\n    requests:\n      memory: 128Mi\n      cpu: 100m\n    limits:\n      memory: 256Mi\n      cpu: 200m\n- name : csi-cephfsplugin\n  resource:\n    requests:\n      memory: 512Mi\n      cpu: 250m\n    limits:\n      memory: 1Gi\n      cpu: 500m\n- name : liveness-prometheus\n  resource:\n    requests:\n      memory: 128Mi\n      cpu: 50m\n    limits:\n      memory: 256Mi\n      cpu: 100m\n" (configmap)
2024-01-16 23:26:27.569094 I | ceph-csi: successfully started CSI CephFS driver
2024-01-16 23:26:27.569124 I | op-k8sutil: CSI_RBD_FSGROUPPOLICY="File" (configmap)
2024-01-16 23:26:27.625262 I | ceph-csi: CSIDriver object created for driver "rook-ceph.rbd.csi.ceph.com"
2024-01-16 23:26:27.625297 I | op-k8sutil: CSI_CEPHFS_FSGROUPPOLICY="File" (configmap)
2024-01-16 23:26:27.717257 I | ceph-csi: CSIDriver object created for driver "rook-ceph.cephfs.csi.ceph.com"
2024-01-16 23:26:27.717281 I | ceph-csi: CSI NFS driver disabled
2024-01-16 23:26:27.717287 I | op-k8sutil: removing daemonset csi-nfsplugin if it exists
2024-01-16 23:26:27.767088 I | op-k8sutil: removing deployment csi-nfsplugin-provisioner if it exists
2024-01-16 23:26:27.810837 I | ceph-csi: successfully removed CSI NFS driver
2024-01-16 23:26:28.848525 I | op-mon: mons running: [a]
2024-01-16 23:26:49.107311 I | op-mon: mons running: [a]
2024-01-16 23:27:04.229092 I | op-mon: Monitors in quorum: [a]
2024-01-16 23:27:04.229121 I | op-mon: mons created: 1
2024-01-16 23:27:04.810087 I | op-mon: waiting for mon quorum with [a]
2024-01-16 23:27:04.825748 I | op-mon: mons running: [a]
2024-01-16 23:27:05.399057 I | op-mon: Monitors in quorum: [a]
2024-01-16 23:27:05.399400 I | op-config: applying ceph settings:
[global]
mon allow pool size one = true
mon allow pool delete   = true
mon cluster log file    = 
2024-01-16 23:27:05.937050 I | op-config: successfully applied settings to the mon configuration database
2024-01-16 23:27:05.937397 I | op-config: applying ceph settings:
[global]
log to file = false
2024-01-16 23:27:06.536802 I | op-config: successfully applied settings to the mon configuration database
2024-01-16 23:27:06.536924 I | op-config: deleting "log file" option from the mon configuration database
2024-01-16 23:27:07.092489 I | op-config: successfully deleted "log file" option from the mon configuration database
2024-01-16 23:27:07.092520 I | op-mon: creating mon b
2024-01-16 23:27:07.120755 I | op-mon: mon "a" cluster IP is 10.43.175.16
2024-01-16 23:27:07.128631 I | op-mon: mon "b" cluster IP is 10.43.110.154
2024-01-16 23:27:07.149072 I | op-mon: saved mon endpoints to config map map[csi-cluster-config-json:[{"clusterID":"rook-ceph","monitors":["10.43.175.16:3300","10.43.110.154:3300"],"cephFS":{"netNamespaceFilePath":"","subvolumeGroup":"","kernelMountOptions":"","fuseMountOptions":""},"rbd":{"netNamespaceFilePath":"","radosNamespace":""},"nfs":{"netNamespaceFilePath":""},"readAffinity":{"enabled":false,"crushLocationLabels":null},"namespace":""}] data:a=10.43.175.16:3300,b=10.43.110.154:3300 mapping:{"node":{"a":{"Name":"k3s-server-3","Hostname":"k3s-server-3","Address":"10.135.210.140"},"b":{"Name":"k3s-server-1","Hostname":"k3s-server-1","Address":"10.135.210.150"},"c":{"Name":"k3s-server-4","Hostname":"k3s-server-4","Address":"10.135.208.221"}}} maxMonId:0 outOfQuorum:]
2024-01-16 23:27:07.297156 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2024-01-16 23:27:07.297458 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2024-01-16 23:27:07.703729 I | op-mon: 1 of 2 expected mon deployments exist. creating new deployment(s).
2024-01-16 23:27:07.710257 I | op-mon: deployment for mon rook-ceph-mon-a already exists. updating if needed
2024-01-16 23:27:07.720674 I | op-k8sutil: deployment "rook-ceph-mon-a" did not change, nothing to update
2024-01-16 23:27:07.896917 I | op-mon: updating maxMonID from 0 to 1
2024-01-16 23:27:08.698540 I | op-mon: saved mon endpoints to config map map[csi-cluster-config-json:[{"clusterID":"rook-ceph","monitors":["10.43.175.16:3300","10.43.110.154:3300"],"cephFS":{"netNamespaceFilePath":"","subvolumeGroup":"","kernelMountOptions":"","fuseMountOptions":""},"rbd":{"netNamespaceFilePath":"","radosNamespace":""},"nfs":{"netNamespaceFilePath":""},"readAffinity":{"enabled":false,"crushLocationLabels":null},"namespace":""}] data:a=10.43.175.16:3300,b=10.43.110.154:3300 mapping:{"node":{"a":{"Name":"k3s-server-3","Hostname":"k3s-server-3","Address":"10.135.210.140"},"b":{"Name":"k3s-server-1","Hostname":"k3s-server-1","Address":"10.135.210.150"},"c":{"Name":"k3s-server-4","Hostname":"k3s-server-4","Address":"10.135.208.221"}}} maxMonId:1 outOfQuorum:]
2024-01-16 23:27:08.698566 I | op-mon: waiting for mon quorum with [a b]
2024-01-16 23:27:09.101930 I | op-mon: mon b is not yet running
2024-01-16 23:27:09.101955 I | op-mon: mons running: [a]
2024-01-16 23:27:09.623237 I | op-mon: Monitors in quorum: [a]
2024-01-16 23:27:09.623261 I | op-mon: mons created: 2
2024-01-16 23:27:10.198481 I | op-mon: waiting for mon quorum with [a b]
2024-01-16 23:27:10.216099 I | op-mon: mon b is not yet running
2024-01-16 23:27:10.216119 I | op-mon: mons running: [a]
2024-01-16 23:27:15.233532 I | op-mon: mons running: [a b]
2024-01-16 23:27:17.190444 I | op-mon: Monitors in quorum: [a b]
2024-01-16 23:27:17.191080 I | op-config: applying ceph settings:
[global]
mon allow pool delete   = true
mon cluster log file    = 
mon allow pool size one = true
2024-01-16 23:27:17.727676 I | op-config: successfully applied settings to the mon configuration database
2024-01-16 23:27:17.727987 I | op-config: applying ceph settings:
[global]
log to file = false
2024-01-16 23:27:18.306640 I | op-config: successfully applied settings to the mon configuration database
2024-01-16 23:27:18.306718 I | op-config: deleting "log file" option from the mon configuration database
2024-01-16 23:27:18.888538 I | op-config: successfully deleted "log file" option from the mon configuration database
2024-01-16 23:27:18.888567 I | op-mon: creating mon c
2024-01-16 23:27:18.913132 I | op-mon: mon "a" cluster IP is 10.43.175.16
2024-01-16 23:27:18.935996 I | op-mon: mon "b" cluster IP is 10.43.110.154
2024-01-16 23:27:18.943777 I | op-mon: mon "c" cluster IP is 10.43.12.65
2024-01-16 23:27:19.294541 I | op-mon: saved mon endpoints to config map map[csi-cluster-config-json:[{"clusterID":"rook-ceph","monitors":["10.43.175.16:3300","10.43.110.154:3300","10.43.12.65:3300"],"cephFS":{"netNamespaceFilePath":"","subvolumeGroup":"","kernelMountOptions":"","fuseMountOptions":""},"rbd":{"netNamespaceFilePath":"","radosNamespace":""},"nfs":{"netNamespaceFilePath":""},"readAffinity":{"enabled":false,"crushLocationLabels":null},"namespace":""}] data:c=10.43.12.65:3300,a=10.43.175.16:3300,b=10.43.110.154:3300 mapping:{"node":{"a":{"Name":"k3s-server-3","Hostname":"k3s-server-3","Address":"10.135.210.140"},"b":{"Name":"k3s-server-1","Hostname":"k3s-server-1","Address":"10.135.210.150"},"c":{"Name":"k3s-server-4","Hostname":"k3s-server-4","Address":"10.135.208.221"}}} maxMonId:1 outOfQuorum:]
2024-01-16 23:27:19.892631 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2024-01-16 23:27:19.892830 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2024-01-16 23:27:20.300491 I | op-mon: 2 of 3 expected mon deployments exist. creating new deployment(s).
2024-01-16 23:27:20.306215 I | op-mon: deployment for mon rook-ceph-mon-a already exists. updating if needed
2024-01-16 23:27:20.313994 I | op-k8sutil: deployment "rook-ceph-mon-a" did not change, nothing to update
2024-01-16 23:27:20.319383 I | op-mon: deployment for mon rook-ceph-mon-b already exists. updating if needed
2024-01-16 23:27:20.327939 I | op-k8sutil: deployment "rook-ceph-mon-b" did not change, nothing to update
2024-01-16 23:27:20.492679 I | op-mon: updating maxMonID from 1 to 2
2024-01-16 23:27:21.493581 I | op-mon: saved mon endpoints to config map map[csi-cluster-config-json:[{"clusterID":"rook-ceph","monitors":["10.43.175.16:3300","10.43.110.154:3300","10.43.12.65:3300"],"cephFS":{"netNamespaceFilePath":"","subvolumeGroup":"","kernelMountOptions":"","fuseMountOptions":""},"rbd":{"netNamespaceFilePath":"","radosNamespace":""},"nfs":{"netNamespaceFilePath":""},"readAffinity":{"enabled":false,"crushLocationLabels":null},"namespace":""}] data:c=10.43.12.65:3300,a=10.43.175.16:3300,b=10.43.110.154:3300 mapping:{"node":{"a":{"Name":"k3s-server-3","Hostname":"k3s-server-3","Address":"10.135.210.140"},"b":{"Name":"k3s-server-1","Hostname":"k3s-server-1","Address":"10.135.210.150"},"c":{"Name":"k3s-server-4","Hostname":"k3s-server-4","Address":"10.135.208.221"}}} maxMonId:2 outOfQuorum:]
2024-01-16 23:27:21.493608 I | op-mon: waiting for mon quorum with [a b c]
2024-01-16 23:27:22.098099 I | op-mon: mon c is not yet running
2024-01-16 23:27:22.098127 I | op-mon: mons running: [a b]
2024-01-16 23:27:22.634094 I | op-mon: Monitors in quorum: [a b]
2024-01-16 23:27:22.634157 I | op-mon: mons created: 3
2024-01-16 23:27:23.212045 I | op-mon: waiting for mon quorum with [a b c]
2024-01-16 23:27:23.239944 I | op-mon: mon c is not yet running
2024-01-16 23:27:23.239964 I | op-mon: mons running: [a b]
2024-01-16 23:27:28.265995 I | op-mon: mons running: [a b c]
2024-01-16 23:27:29.405587 I | op-mon: Monitors in quorum: [a b c]
2024-01-16 23:27:29.405932 I | op-config: applying ceph settings:
[global]
mon allow pool delete   = true
mon cluster log file    = 
mon allow pool size one = true
2024-01-16 23:27:29.937015 I | op-config: successfully applied settings to the mon configuration database
2024-01-16 23:27:29.937311 I | op-config: applying ceph settings:
[global]
log to file = false
2024-01-16 23:27:30.531495 I | op-config: successfully applied settings to the mon configuration database
2024-01-16 23:27:30.531567 I | op-config: deleting "log file" option from the mon configuration database
2024-01-16 23:27:31.135989 I | op-config: successfully deleted "log file" option from the mon configuration database
2024-01-16 23:27:31.136034 I | ceph-spec: not applying network settings for cluster "rook-ceph" ceph networks
2024-01-16 23:27:31.138364 I | cephclient: getting or creating ceph auth key "client.csi-rbd-provisioner"
2024-01-16 23:27:31.751663 I | cephclient: getting or creating ceph auth key "client.csi-rbd-node"
2024-01-16 23:27:32.351052 I | cephclient: getting or creating ceph auth key "client.csi-cephfs-provisioner"
2024-01-16 23:27:32.983954 I | cephclient: getting or creating ceph auth key "client.csi-cephfs-node"
2024-01-16 23:27:33.619536 I | ceph-csi: created kubernetes csi secrets for cluster "rook-ceph"
2024-01-16 23:27:33.619566 I | cephclient: getting or creating ceph auth key "client.crash"
2024-01-16 23:27:34.198160 I | ceph-nodedaemon-controller: created kubernetes crash collector secret for cluster "rook-ceph"
2024-01-16 23:27:34.198191 I | cephclient: getting or creating ceph auth key "client.ceph-exporter"
2024-01-16 23:27:34.769244 I | ceph-nodedaemon-controller: created kubernetes exporter secret for cluster "rook-ceph"
2024-01-16 23:27:34.769275 I | op-config: deleting "ms_cluster_mode" option from the mon configuration database
2024-01-16 23:27:35.315072 I | op-config: successfully deleted "ms_cluster_mode" option from the mon configuration database
2024-01-16 23:27:35.315098 I | op-config: deleting "ms_service_mode" option from the mon configuration database
2024-01-16 23:27:35.834944 I | op-config: successfully deleted "ms_service_mode" option from the mon configuration database
2024-01-16 23:27:35.834976 I | op-config: deleting "ms_client_mode" option from the mon configuration database
2024-01-16 23:27:36.415492 I | op-config: successfully deleted "ms_client_mode" option from the mon configuration database
2024-01-16 23:27:36.415528 I | op-config: deleting "rbd_default_map_options" option from the mon configuration database
2024-01-16 23:27:36.992547 I | op-config: successfully deleted "rbd_default_map_options" option from the mon configuration database
2024-01-16 23:27:36.992905 I | op-config: applying ceph settings:
[global]
rbd_default_map_options = ms_mode=prefer-crc
2024-01-16 23:27:37.547967 I | op-config: successfully applied settings to the mon configuration database
2024-01-16 23:27:37.548043 I | op-config: deleting "ms_osd_compress_mode" option from the mon configuration database
2024-01-16 23:27:38.127274 I | op-config: successfully deleted "ms_osd_compress_mode" option from the mon configuration database
2024-01-16 23:27:38.127300 I | cephclient: create rbd-mirror bootstrap peer token "client.rbd-mirror-peer"
2024-01-16 23:27:38.127305 I | cephclient: getting or creating ceph auth key "client.rbd-mirror-peer"
2024-01-16 23:27:38.771034 I | cephclient: successfully created rbd-mirror bootstrap peer token for cluster "dev-ceph"
2024-01-16 23:27:38.784677 I | op-mgr: start running mgr
2024-01-16 23:27:38.789677 I | cephclient: getting or creating ceph auth key "mgr.a"
2024-01-16 23:27:39.406035 I | cephclient: getting or creating ceph auth key "mgr.b"
2024-01-16 23:27:40.124593 I | op-config: setting "mon"="auth_allow_insecure_global_id_reclaim"="false" option to the mon configuration database
2024-01-16 23:27:40.921045 I | op-config: successfully set "mon"="auth_allow_insecure_global_id_reclaim"="false" option to the mon configuration database
2024-01-16 23:27:40.921068 I | op-config: insecure global ID is now disabled
2024-01-16 23:28:01.969333 I | op-k8sutil: finished waiting for updated deployment "rook-ceph-mgr-a"
2024-01-16 23:28:01.973876 I | op-k8sutil: finished waiting for updated deployment "rook-ceph-mgr-b"
2024-01-16 23:28:02.024032 I | op-mgr: successful modules: balancer
2024-01-16 23:28:02.033189 I | op-osd: start running osds in namespace "rook-ceph"
2024-01-16 23:28:02.033389 I | op-osd: wait timeout for healthy OSDs during upgrade or restart is "10m0s"
2024-01-16 23:28:02.087651 I | op-osd: start provisioning the OSDs on PVCs, if needed
2024-01-16 23:28:02.092341 I | op-osd: no storageClassDeviceSets defined to configure OSDs on PVCs
2024-01-16 23:28:02.092429 I | op-osd: start provisioning the OSDs on nodes, if needed
2024-01-16 23:28:02.189971 I | op-osd: 5 of the 5 storage nodes are valid
2024-01-16 23:28:02.401493 I | op-osd: started OSD provisioning job for node "k3s-server-0"
2024-01-16 23:28:03.003840 I | op-osd: started OSD provisioning job for node "k3s-server-3"
2024-01-16 23:28:03.688745 I | op-osd: started OSD provisioning job for node "k3s-server-2"
2024-01-16 23:28:04.400988 I | op-osd: started OSD provisioning job for node "k3s-server-1"
2024-01-16 23:28:05.001338 I | op-osd: started OSD provisioning job for node "k3s-server-4"
2024-01-16 23:28:05.193681 I | op-osd: OSD orchestration status for node k3s-server-0 is "starting"
2024-01-16 23:28:05.193733 I | op-osd: OSD orchestration status for node k3s-server-1 is "starting"
2024-01-16 23:28:05.193743 I | op-osd: OSD orchestration status for node k3s-server-4 is "starting"
2024-01-16 23:28:05.193877 I | op-osd: OSD orchestration status for node k3s-server-3 is "orchestrating"
2024-01-16 23:28:05.193894 I | op-osd: OSD orchestration status for node k3s-server-2 is "orchestrating"
2024-01-16 23:28:05.388067 I | op-osd: OSD orchestration status for node k3s-server-0 is "orchestrating"
2024-01-16 23:28:06.188096 I | op-osd: OSD orchestration status for node k3s-server-1 is "orchestrating"
2024-01-16 23:28:06.546610 I | op-mgr: successful modules: prometheus
2024-01-16 23:28:06.889104 I | op-osd: OSD orchestration status for node k3s-server-4 is "orchestrating"
2024-01-16 23:28:07.225792 I | op-osd: OSD orchestration status for node k3s-server-3 is "completed"
2024-01-16 23:28:07.365776 I | op-osd: OSD orchestration status for node k3s-server-2 is "completed"
2024-01-16 23:28:07.549921 I | op-osd: OSD orchestration status for node k3s-server-0 is "completed"
2024-01-16 23:28:07.617714 E | op-mgr: failed modules: "mgr module(s) from the spec". failed to disable mgr module "pg_autoscaler": failed to enable mgr module "pg_autoscaler": exit status 22
2024-01-16 23:28:08.415008 I | op-osd: OSD orchestration status for node k3s-server-1 is "completed"
2024-01-16 23:28:08.983829 I | op-osd: OSD orchestration status for node k3s-server-4 is "completed"
2024-01-16 23:28:10.552676 I | op-mgr: setting ceph dashboard "admin" login creds
2024-01-16 23:28:11.800434 I | op-osd: finished running OSDs in namespace "rook-ceph"
2024-01-16 23:28:11.800459 I | ceph-cluster-controller: done reconciling ceph cluster in namespace "rook-ceph"
2024-01-16 23:28:11.808239 E | ceph-spec: failed to update cluster condition to {Type:Ready Status:True Reason:ClusterCreated Message:Cluster created successfully LastHeartbeatTime:2024-01-16 23:28:11.800493659 +0000 UTC m=+659.547983709 LastTransitionTime:2024-01-16 23:28:11.80049354 +0000 UTC m=+659.547983626}. failed to update object "rook-ceph/dev-ceph" status: Operation cannot be fulfilled on cephclusters.ceph.rook.io "dev-ceph": the object has been modified; please apply your changes to the latest version and try again
2024-01-16 23:28:11.808363 I | ceph-cluster-controller: reporting cluster telemetry
2024-01-16 23:28:11.815804 I | ceph-cluster-controller: enabling ceph mon monitoring goroutine for cluster "rook-ceph"
2024-01-16 23:28:11.815832 I | op-osd: ceph osd status in namespace "rook-ceph" check interval "1m0s"
2024-01-16 23:28:11.815837 I | ceph-cluster-controller: enabling ceph osd monitoring goroutine for cluster "rook-ceph"
2024-01-16 23:28:11.815844 I | ceph-cluster-controller: ceph status check interval is 1m0s
2024-01-16 23:28:11.815849 I | ceph-cluster-controller: enabling ceph status monitoring goroutine for cluster "rook-ceph"
2024-01-16 23:28:25.588494 I | exec: exec timeout waiting for process ceph to return. Sending interrupt signal to the process
2024-01-16 23:28:25.606893 I | cephclient: command failed for set dashboard creds. trying again...
2024-01-16 23:28:31.507407 I | ceph-cluster-controller: reporting node telemetry
2024-01-16 23:28:45.609384 I | exec: exec timeout waiting for process ceph to return. Sending interrupt signal to the process
2024-01-16 23:28:45.627610 I | cephclient: command failed for set dashboard creds. trying again...
2024-01-16 23:28:57.997310 I | op-mon: checking if multiple mons are on the same node
2024-01-16 23:29:05.628981 I | exec: exec timeout waiting for process ceph to return. Sending interrupt signal to the process
2024-01-16 23:29:05.647720 I | cephclient: command failed for set dashboard creds. trying again...
2024-01-16 23:29:25.649124 I | exec: exec timeout waiting for process ceph to return. Sending interrupt signal to the process
2024-01-16 23:29:25.668178 I | cephclient: command failed for set dashboard creds. trying again...
2024-01-16 23:29:45.669905 I | exec: exec timeout waiting for process ceph to return. Sending interrupt signal to the process
2024-01-16 23:29:45.687845 I | cephclient: command failed for set dashboard creds. trying again...
2024-01-16 23:29:50.688985 E | op-mgr: failed modules: "dashboard". failed to initialize dashboard: failed to set login credentials for the ceph dashboard: failed to set login creds on mgr: max command retries exceeded

@reefland

reefland commented Jan 17, 2024

I'm troubleshooting a different, unrelated issue, but I noticed this one as well when reviewing the Ceph dashboard docs and was surprised to see the mismatch.

In my cluster values.yaml:

  dashboard:
    enabled: true
    # serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
    urlPrefix: /ceph-dashboard
    ssl: false

Yet, when I check with Ceph:

$ ceph config get mgr mgr/dashboard/ssl
true

Regardless, my dashboard is up and available.

@rkachach rkachach self-assigned this Jan 17, 2024
@ADustyOldMuffin
Author

So it was noted that the operator is setting the mgr/dashboard/ssl config option on each MGR individually instead of globally. I'm double-checking this, but maybe we should just set it globally and allow others to set it per daemon if they really care?
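
For anyone following along, a quick sketch of the two config scopes being discussed (the daemon name mgr.a is just an example):

$ ceph config get mgr mgr/dashboard/ssl      # global mgr section
$ ceph config get mgr.a mgr/dashboard/ssl    # per-daemon section for mgr.a
$ ceph config dump | grep dashboard/ssl      # the WHO column shows which section each value is stored in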

@travisn
Member

travisn commented Jan 17, 2024

Note that the pg_autoscaler cannot be disabled anymore, so you can remove it from your config to avoid this error:

2024-01-16 23:28:07.617714 E | op-mgr: failed modules: "mgr module(s) from the spec". 
failed to disable mgr module "pg_autoscaler": failed to enable mgr module "pg_autoscaler": exit status 22
$ ceph mgr module ls
MODULE                              
...
pg_autoscaler         on (always on)

@travisn
Member

travisn commented Jan 17, 2024

So it was noted that the operator is setting the mgr/dashboard/ssl config option on each MGR individually instead of globally. I'm double-checking this, but maybe we should just set it globally and allow others to set it per daemon if they really care?

If the ssl setting is working as expected, you can just change that setting, and there is no need to override it for mgr.a.

Back to the original issue: so you're not seeing the correct mgr/dashboard/ssl setting on mgr.a? Something has gone awry in that cluster if all the false values are being treated as true.

@reefland

Note that the pg_autoscaler cannot be disabled anymore, so you can remove it from your config to avoid this error:

2024-01-16 23:28:07.617714 E | op-mgr: failed modules: "mgr module(s) from the spec". 
failed to disable mgr module "pg_autoscaler": failed to enable mgr module "pg_autoscaler": exit status 22
$ ceph mgr module ls
MODULE                              
...
pg_autoscaler         on (always on)

It's still listed in the Rook values.yaml example; it should probably be removed if it is no longer optional:

    modules:
      # Several modules should not need to be included in this list. The "dashboard" and "monitoring" modules
      # are already enabled by other settings in the cluster CR.
      - name: pg_autoscaler
        enabled: true

@rkachach
Contributor

rkachach commented Jan 18, 2024

So it was noted that the operator is setting the mgr/dashboard/ssl config option on each MGR individually instead of globally. I'm double-checking this, but maybe we should just set it globally and allow others to set it per daemon if they really care?

I did some testing on my local cluster and was not able to reproduce the issue you are describing: the dashboard is configured according to what the user states in the cluster YAML file (either true or false):

  dashboard:
    enabled: true
    ssl: false

Besides, I think you were seeing a mismatch because you were not consulting the correct mgr instance configuration. Internally, the configuration is stored per mgr instance, and at any moment the dashboard picks up the configuration from the active mgr. For example, if your active mgr is mgr.a, you will see the current value by running:

bash-4.4$ ceph config get mgr.a mgr/dashboard/ssl
true

In summary: to avoid confusion, please adjust the value in cluster.yaml, apply the new spec, and see whether the dashboard reacts as expected. If it doesn't, then it's a bug. The way Ceph and the dashboard store their configuration internally is a little tricky and can lead to confusion, so I don't recommend changing the config manually, especially on Rook-based systems.
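
To see the value the active dashboard is actually using, one approach is to look up the active mgr first and then query that daemon's section (output abbreviated; the daemon name will vary):

$ ceph mgr stat
{ "active_name": "a", "available": true, ... }
$ ceph config get mgr.a mgr/dashboard/ssl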

@ADustyOldMuffin
Author

@rkachach I was checking the cluster-wide setting, not any specific mgr. What's the benefit of setting it on the active manager rather than cluster-wide? If the mgr fails over, Rook then has to apply all settings to the new mgr instead of it simply already having the config.

@rkachach
Contributor

@rkachach I was checking the cluster-wide setting, not any specific mgr. What's the benefit of setting it on the active manager rather than cluster-wide? If the mgr fails over, Rook then has to apply all settings to the new mgr instead of it simply already having the config.

@ADustyOldMuffin I need to look into it in more detail, but that would take some time.

@ADustyOldMuffin
Author

👍🏻 I checked, and Rook detected the failover fast enough that I didn't notice anything, but it does have to re-apply all settings.

I can keep this open to look into applying it at a higher level than the individual daemons, or I think we can close this.

@travisn
Member

travisn commented Jan 18, 2024

Still listed in Rook values.yaml example, should probably be removed if no longer optional:

Ah yes, that's an old example we need to remove. The autoscaler configuration is now a per-pool Ceph config.
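
For reference, the per-pool equivalent is set with the standard pool commands (the pool name here is just an example):

$ ceph osd pool set my-pool pg_autoscale_mode on    # or "warn" / "off"
$ ceph osd pool autoscale-status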

@makiv2

makiv2 commented Jan 21, 2024

You should set a port. After hours of debugging I found that the ssl: false configuration is broken if you do not provide a port.

Values file that results in correct ssl: false behavior:

cephClusterSpec:
  dashboard:
    enabled: true
    port: 8080
    ssl: false

Results:

  • Service port 8080 open
  • mgr handling requests on port 8080

Values file that resulted in a misconfiguration in the mgr:

cephClusterSpec:
  dashboard:
    enabled: true
    ssl: false

Results:

  • Service port 7000 open (the default when ssl is disabled)
  • mgr handling requests on port 8443 (with ssl enabled)

Expected results:

  • Service port 7000 open (the default when ssl is disabled)
  • mgr handling requests on port 7000 (with ssl disabled)

@chenlein

I encountered the same issue on version 1.13.2. Regardless of how I modify the spec.dashboard.ssl and spec.dashboard.port in the CephCluster, the logs of the MGR container consistently show server: ssl=yes host=:: port=8443, and the rook-ceph-mgr-dashboard service always displays 7000.

@chenlein

I believe this issue may be related to #10110. After creating and applying /usr/lib/systemd/system/containerd.service.d/LimitNOFILE.conf, I redeployed Rook Ceph, and everything is working fine now.
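
For context, the drop-in referenced in that issue caps the file-descriptor limit that containerd passes to containers; a commonly used version looks roughly like this (the exact limit value is a site-specific choice, not something prescribed by Rook):

# /usr/lib/systemd/system/containerd.service.d/LimitNOFILE.conf
[Service]
LimitNOFILE=1048576

$ systemctl daemon-reload && systemctl restart containerd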

rkachach added a commit to rkachach/rook that referenced this issue Jan 22, 2024
previously, the dashboard parameters supported by Rook were stored in the
daemon configuration section (mgr.X, for example). This differs from
Cephadm-based deployments, where all configurations are stored in the
global mgr configuration section. This variance could result in
configuration mismatches between the active and standby dashboards.
Furthermore, all Ceph dashboard documentation exclusively points to
the global mgr configuration section and makes no use of individual
daemons sections.

Fixes: rook#13577

Signed-off-by: Redouane Kachach <rkachach@redhat.com>
@rkachach
Copy link
Contributor

👍🏻 I checked and Rook detected the fail over fast enough that I didn't notice anything, but it does have to re-apply all settings.

I can keep this open for looking into possibly applying it at a higher level than the individual daemons or I think we can close this.

@ADustyOldMuffin Your findings were key to narrowing down the issue. In fact, it has to do with the way Rook was handling dashboard configuration: it used per-daemon config, which differs from the behavior of the dashboard on cephadm deployments, where the global mgr config section is used. This also explains some old, odd bugs related to dashboard configuration inconsistencies.

I've opened a PR with a candidate fix 👍
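
To illustrate the difference (a sketch of the underlying Ceph commands, not the literal code change in the PR): a per-daemon setting only follows one mgr, while the mgr-wide section applies to active and standby alike, which matches cephadm's behavior:

# per-daemon: only mgr.a sees this value
ceph config set mgr.a mgr/dashboard/ssl false

# mgr-wide section: every mgr daemon (active and standby) sees it
ceph config set mgr mgr/dashboard/ssl false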

mergify bot pushed a commit that referenced this issue Jan 26, 2024
previously, the dashboard parameters supported by Rook were stored in the
daemon configuration section (mgr.X, for example). This differs from
Cephadm-based deployments, where all configurations are stored in the
global mgr configuration section. This variance could result in
configuration mismatches between the active and standby dashboards.
Furthermore, all Ceph dashboard documentation exclusively points to
the global mgr configuration section and makes no use of individual
daemons sections.

Fixes: #13577

Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 57e3937)
@Ramblurr
Copy link

Something is still not working here.

I am making a fresh rook-ceph install using the rook-ceph-cluster helm chart, version v1.13.4 (which the above PR should be in, AFAICT).

My mgrs are complaining:

 rook-ceph-mgr-b-667bbcc7cc-p4jnd mgr debug 2024-02-14T11:47:34.919+0000 7f84f2752700  0 [dashboard INFO root] server: ssl=yes host=:: port=8443
rook-ceph-mgr-b-667bbcc7cc-p4jnd mgr debug 2024-02-14T11:47:34.920+0000 7f84f2752700  0 [dashboard INFO root] Config not ready to serve, waiting: no certificate configured            

My helm chart values file has:

      dashboard:
        enabled: true
        ssl: false
        port: 8080 # ref: https://github.com/rook/rook/issues/13577#issuecomment-1902466353
        urlPrefix: /

However, the CephCluster was created with ssl missing. I edited the cluster resource to add ssl: false, but despite that value sticking, the mgrs still expect ssl to be enabled.
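
One thing that may be worth double-checking in a case like this (a sketch, assuming the default namespace and deployment name): the change ships in the operator, so the operator image needs to be on a fixed version too, independent of the cluster chart version:

kubectl -n rook-ceph get deploy rook-ceph-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}'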

@rkachach
Copy link
Contributor

@Ramblurr I'll try to reproduce the issue using your yaml values

@rkachach
Copy link
Contributor

rkachach commented Feb 14, 2024

@Ramblurr was this a temporary circumstance, or did the dashboard never get configured correctly?

I tried to simulate the same setup in my environment, and I can see that on the first startup the certificates are not there yet, but this gets fixed on the following mgr restart and the dashboard eventually ends up correctly configured:

Logs from my mgr:

k logs rook-ceph-mgr-a-5b5b779647-8n26p | grep -e "Config not ready" -e ssl
Defaulted container "mgr" out of: mgr, watch-active, log-collector, chown-container-data-dir (init)
debug 2024-02-14T13:15:58.999+0000 7f5b7e9e4700  0 [dashboard INFO root] server: ssl=yes host=0.0.0.0 port=8443
debug 2024-02-14T13:15:59.002+0000 7f5b7e9e4700  0 [dashboard INFO root] Config not ready to serve, waiting: no certificate configured
debug 2024-02-14T13:16:24.207+0000 7f02a338c700  0 [dashboard INFO root] server: ssl=yes host=0.0.0.0 port=8443
debug 2024-02-14T13:18:33.862+0000 7f1244e95700  0 [dashboard INFO root] server: ssl=yes host=0.0.0.0 port=8443
debug 2024-02-14T13:19:29.048+0000 7fddcb60f700  0 [dashboard INFO root] server: ssl=yes host=0.0.0.0 port=8443
debug 2024-02-14T13:24:57.345+0000 7f0ab520c700  0 [dashboard INFO root] server: ssl=yes host=0.0.0.0 port=8443

I also tested switching off SSL on the cluster by setting ssl: false, and it is working correctly:

k logs rook-ceph-mgr-a-6477b5d4b5-ck7gw | grep -e "Config not ready" -e ssl
Defaulted container "mgr" out of: mgr, watch-active, log-collector, chown-container-data-dir (init)
debug 2024-02-14T13:34:58.797+0000 7fb29783a700  0 [dashboard INFO root] server: ssl=yes host=0.0.0.0 port=8443
debug 2024-02-14T13:37:02.775+0000 7f0f4a07e700  0 [dashboard INFO root] server: ssl=no host=0.0.0.0 port=7000
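
If a mgr appears to hold on to the old SSL state after the config change, restarting the dashboard module from the toolbox is a simple way to make it re-read the configuration (a generic Ceph step, not something specific to this fix):

ceph mgr module disable dashboard
ceph mgr module enable dashboard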
