NOTE: This behavior started after I upgraded Kubernetes from 1.23 to 1.24
As soon as any pod with an RBD PVC is assigned to one of my nodes, that node starts spamming the following lines in /var/log/syslog:
Jan 21 09:19:17 am5-k8s-node-01 systemd[3013626]: message repeated 8 times: [ Failed to set up mount unit: Invalid argument]
Jan 21 09:19:17 am5-k8s-node-01 systemd[1]: Failed to set up mount unit: Invalid argument
In December I did some investigation, and it looks like it all starts with the following logs:
...
Dec 31 13:15:14 am5-k8s-node-01 kernel: [ 876.509387] libceph: mon0 (1)10.152.183.102:6789 session established
Dec 31 13:15:14 am5-k8s-node-01 kernel: [ 876.510483] libceph: client56622929 fsid b6f6bb22-151b-4bc1-bf92-1b2b68eee1d3
Dec 31 13:15:15 am5-k8s-node-01 kernel: [ 876.610420] rbd: rbd1: capacity 8589934592 features 0x1
Dec 31 13:15:15 am5-k8s-node-01 kernel: [ 876.614436] rbd: rbd4: capacity 268435456000 features 0x1
Dec 31 13:15:15 am5-k8s-node-01 kernel: [ 876.618442] rbd: rbd2: capacity 8589934592 features 0x1
Dec 31 13:15:15 am5-k8s-node-01 kernel: [ 876.642357] rbd: rbd0: capacity 8589934592 features 0x1
Dec 31 13:15:15 am5-k8s-node-01 kernel: [ 876.642362] rbd: rbd3: capacity 107374182400 features 0x1
Dec 31 13:15:15 am5-k8s-node-01 multipathd[1146]: rbd1: HDIO_GETGEO failed with 25
Dec 31 13:15:15 am5-k8s-node-01 multipathd[1146]: rbd1: failed to get udev uid: Invalid argument
Dec 31 13:15:15 am5-k8s-node-01 multipathd[1146]: rbd1: failed to get unknown uid: Invalid argument
Dec 31 13:15:15 am5-k8s-node-01 microk8s.daemon-kubelite[1400]: I1231 13:15:15.165343 1400 operation_generator.go:1555] "Controller attach succeeded for volume \"pvc-f3482fb2-a883-46ad-b47b-07d3cc6973cf\" (UniqueName: \"kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000002-23e9562c-6bfd-11ed-aa40-c67613b008e6\") pod \"redis-replicas-0\" (UID: \"9cff0ba0-b4a2-4ba6-8c76-e95d8c800658\") device path: \"\"" pod="<redacted>/redis-replicas-0"
Dec 31 13:15:15 am5-k8s-node-01 kernel: [ 876.756571] EXT4-fs (rbd1): mounted filesystem with ordered data mode. Opts: (null)
Dec 31 13:15:15 am5-k8s-node-01 systemd[2443]: Failed to set up mount unit: Invalid argument
Dec 31 13:15:15 am5-k8s-node-01 kernel: [ 876.791752] EXT4-fs (rbd0): mounted filesystem with ordered data mode. Opts: (null)
Dec 31 13:15:15 am5-k8s-node-01 kernel: [ 876.831471] EXT4-fs (rbd4): mounted filesystem with ordered data mode. Opts: (null)
Dec 31 13:15:15 am5-k8s-node-01 kernel: [ 876.872417] EXT4-fs (rbd3): mounted filesystem with ordered data mode. Opts: (null)
Dec 31 13:15:15 am5-k8s-node-01 kernel: [ 876.894296] EXT4-fs (rbd2): mounted filesystem with ordered data mode. Opts: (null)
Dec 31 13:15:15 am5-k8s-node-01 systemd[1]: Failed to set up mount unit: Invalid argument
...
However, all PVCs appear to be mounted correctly in the pods.
I suspect the mount path might be too long. For example, /dev/rbd0 is mounted as follows (mount | grep rbd):
/dev/rbd0 on /var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/32030ca958fe47c1e0da9fd8df2c7727edec6f59a544669ed4e4835bc15e1566/globalmount/0001-0009-rook-ceph-0000000000000002-0251e137-0a3e-11ec-99db-724c03a610e7 type ext4 (rw,relatime,stripe=16,_netdev)
/dev/rbd0 on /var/snap/microk8s/common/var/lib/kubelet/pods/deaf89a8-e96a-46d6-bb17-7a21702c049a/volumes/kubernetes.io~csi/pvc-2759dd8c-8a9c-455d-8b80-95bb5c40ecf6/mount type ext4 (rw,relatime,stripe=16,_netdev)
If I'm correct, the maximum length is 255, but for the first mount the following command reports 276: systemd-escape /var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/32030ca958fe47c1e0da9fd8df2c7727edec6f59a544669ed4e4835bc15e1566/globalmount/0001-0009-rook-ceph-0000000000000002-0251e137-0a3e-11ec-99db-724c03a610e7 | wc -c
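A small check along these lines could flag the affected mount points. It is a sketch, not a verified diagnostic: it assumes the 255-character limit mentioned above, and it assumes that on this systemd 245 node the mount-unit name is what `systemd-escape --path --suffix=mount` prints for the mount point (the command above omits `--path`, so its output differs slightly). It measures the string with `${#unit}` rather than `wc -c`, because `wc -c` also counts the trailing newline that `systemd-escape` prints. The function names `unit_name_too_long` and `check_mount_points` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: flag mount points whose systemd mount-unit name would exceed
# the 255-character limit (limit assumed from the discussion above).
set -u

UNIT_NAME_MAX=255

# Succeeds (returns 0) when the given unit name is longer than UNIT_NAME_MAX.
unit_name_too_long() {
  local name=$1
  (( ${#name} > UNIT_NAME_MAX ))
}

# For every mount point passed as an argument, print "<length> <mount point>"
# when the derived unit name is too long. Assumes systemd-escape is installed.
check_mount_points() {
  local mp unit
  for mp in "$@"; do
    # --path applies path escaping ("/" -> "-", "-" -> "\x2d");
    # --suffix=mount appends ".mount" to form the unit name.
    unit=$(systemd-escape --path --suffix=mount "$mp")
    if unit_name_too_long "$unit"; then
      printf '%d %s\n' "${#unit}" "$mp"
    fi
  done
}
```

On the node it could be invoked as `check_mount_points $(mount | grep rbd | awk '{print $3}')`, reusing the same `mount | grep rbd` listing as above.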
Expected behavior:
Syslog should not be spammed with the above-mentioned errors.
How to reproduce it (minimal and precise):
1. Start with a node that has no workloads mounting RBD PVCs (drain the node). /var/log/syslog is clean and 'mount | grep rbd' shows nothing.
2. Schedule a workload with an RBD PVC on the node. /var/log/syslog gets spammed with "Failed to set up mount unit: Invalid argument".
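When reproducing, counting the error lines with `grep -c` would undercount, because rsyslog folds duplicates into a single "message repeated N times: [ ... ]" line, as in the first log excerpt above. A minimal sketch that adds those repeat counts back, assuming that rsyslog line format (the function name `count_mount_unit_errors` is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: count "Failed to set up mount unit: Invalid argument" occurrences in
# a syslog file, including duplicates folded by rsyslog into
# "message repeated N times: [ ... ]" lines (format assumed from the log above).
set -u

count_mount_unit_errors() {
  awk '
    /Failed to set up mount unit: Invalid argument/ {
      if (match($0, /message repeated [0-9]+ times/)) {
        # A folded line stands for N further copies of the message.
        split(substr($0, RSTART, RLENGTH), words, " ")
        total += words[3]
      } else {
        total += 1
      }
    }
    END { print total + 0 }
  ' "$1"
}
```

Running `count_mount_unit_errors /var/log/syslog` before and after scheduling the RBD workload should show whether the count grows once the volume is mounted on the node.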
File(s) to submit:
Cluster CR (custom resource), typically called cluster.yaml, if necessary
#################################################################################################################
# Define the settings for the rook-ceph cluster with common settings for a production cluster.
# All nodes with available raw devices will be used for the Ceph cluster. At least three nodes are required
# in this example. See the documentation for more details on storage settings available.
# For example, to create the cluster:
#   kubectl create -f crds.yaml -f common.yaml -f operator.yaml
#   kubectl create -f cluster.yaml
#################################################################################################################
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph # namespace:cluster
spec:
  cephVersion:
    # The container image used to launch the Ceph daemon pods (mon, mgr, osd, mds, rgw).
    # v16 is Pacific, and v17 is Quincy.
    # RECOMMENDATION: In production, use a specific version tag instead of the general v17 flag, which pulls the latest release and could result in different
    # versions running within the cluster. See tags available at https://hub.docker.com/r/ceph/ceph/tags/.
    # If you want to be more precise, you can always use a timestamp tag such quay.io/ceph/ceph:v17.2.3-20220805
    # This tag might not contain a new Ceph version, just security fixes from the underlying operating system, which will reduce vulnerabilities
    image: quay.io/ceph/ceph:v17.2.5
    # Whether to allow unsupported versions of Ceph. Currently `pacific` and `quincy` are supported.
    # Future versions such as `reef` (v18) would require this to be set to `true`.
    # Do not set to true in production.
    allowUnsupported: false
  # The path on the host where configuration files will be persisted. Must be specified.
  # Important: if you reinstall the cluster, make sure you delete this directory from each host or else the mons will fail to start on the new cluster.
  # In Minikube, the '/data' directory is configured to persist across reboots. Use "/data/rook" in Minikube environment.
  dataDirHostPath: /var/lib/rook
  # Whether or not upgrade should continue even if a check fails
  # This means Ceph's status could be degraded and we don't recommend upgrading but you might decide otherwise
  # Use at your OWN risk
  # To understand Rook's upgrade process of Ceph, read https://rook.io/docs/rook/latest/ceph-upgrade.html#ceph-version-upgrades
  skipUpgradeChecks: false
  # Whether or not continue if PGs are not clean during an upgrade
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  # WaitTimeoutForHealthyOSDInMinutes defines the time (in minutes) the operator would wait before an OSD can be stopped for upgrade or restart.
  # If the timeout exceeds and OSD is not ok to stop, then the operator would skip upgrade for the current OSD and proceed with the next one
  # if `continueUpgradeAfterChecksEvenIfNotHealthy` is `false`. If `continueUpgradeAfterChecksEvenIfNotHealthy` is `true`, then operator would
  # continue with the upgrade of an OSD even if its not ok to stop after the timeout. This timeout won't be applied if `skipUpgradeChecks` is `true`.
  # The default wait timeout is 10 minutes.
  waitTimeoutForHealthyOSDInMinutes: 10
  mon:
    # Set the number of mons to be started. Generally recommended to be 3.
    # For highest availability, an odd number of mons should be specified.
    count: 3
    # The mons should be on unique nodes. For production, at least 3 nodes are recommended for this reason.
    # Mons should only be allowed on the same node for test environments where data loss is acceptable.
    allowMultiplePerNode: false
  mgr:
    # When higher availability of the mgr is needed, increase the count to 2.
    # In that case, one mgr will be active and one in standby. When Ceph updates which
    # mgr is active, Rook will update the mgr services to match the active mgr.
    count: 2
    allowMultiplePerNode: false
    modules:
      # Several modules should not need to be included in this list. The "dashboard" and "monitoring" modules
      # are already enabled by other settings in the cluster CR.
      - name: pg_autoscaler
        enabled: true
      - name: rook
        enabled: true
  # enable the ceph dashboard for viewing cluster status
  dashboard:
    enabled: true
    # serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
    # urlPrefix: /ceph-dashboard
    # serve the dashboard at the given port.
    # port: 8443
    # serve the dashboard using SSL
    ssl: true
  # enable prometheus alerting for cluster
  monitoring:
    # requires Prometheus to be pre-installed
    enabled: true
  network:
    connections:
      # Whether to encrypt the data in transit across the wire to prevent eavesdropping the data on the network.
      # The default is false. When encryption is enabled, all communication between clients and Ceph daemons, or between Ceph daemons will be encrypted.
      # When encryption is not enabled, clients still establish a strong initial authentication and data integrity is still validated with a crc check.
      # IMPORTANT: Encryption requires the 5.11 kernel for the latest nbd and cephfs drivers. Alternatively for testing only,
      # you can set the "mounter: rbd-nbd" in the rbd storage class, or "mounter: fuse" in the cephfs storage class.
      # The nbd and fuse drivers are *not* recommended in production since restarting the csi driver pod will disconnect the volumes.
      encryption:
        enabled: false
      # Whether to compress the data in transit across the wire. The default is false.
      # Requires Ceph Quincy (v17) or newer. Also see the kernel requirements above for encryption.
      compression:
        enabled: false
    # enable host networking
    #provider: host
    # enable the Multus network provider
    #provider: multus
    #selectors:
      # The selector keys are required to be `public` and `cluster`.
      # Based on the configuration, the operator will do the following:
      #   1. if only the `public` selector key is specified both public_network and cluster_network Ceph settings will listen on that interface
      #   2. if both `public` and `cluster` selector keys are specified the first one will point to 'public_network' flag and the second one to 'cluster_network'
      #
      # In order to work, each selector value must match a NetworkAttachmentDefinition object in Multus
      #
      #public: public-conf --> NetworkAttachmentDefinition object name in Multus
      #cluster: cluster-conf --> NetworkAttachmentDefinition object name in Multus
    # Provide internet protocol version. IPv6, IPv4 or empty string are valid options. Empty string would mean IPv4
    #ipFamily: "IPv6"
    # Ceph daemons to listen on both IPv4 and Ipv6 networks
    #dualStack: false
  # enable the crash collector for ceph daemon crash collection
  crashCollector:
    disable: false
    # Uncomment daysToRetain to prune ceph crash entries older than the
    # specified number of days.
    #daysToRetain: 30
  # enable log collector, daemons will log on files and rotate
  logCollector:
    enabled: true
    periodicity: daily # one of: hourly, daily, weekly, monthly
    maxLogSize: 500M # SUFFIX may be 'M' or 'G'. Must be at least 1M.
  # automate [data cleanup process](https://github.com/rook/rook/blob/master/Documentation/Storage-Configuration/ceph-teardown.md#delete-the-data-on-hosts) in cluster destruction.
  cleanupPolicy:
    # Since cluster cleanup is destructive to data, confirmation is required.
    # To destroy all Rook data on hosts during uninstall, confirmation must be set to "yes-really-destroy-data".
    # This value should only be set when the cluster is about to be deleted. After the confirmation is set,
    # Rook will immediately stop configuring the cluster and only wait for the delete command.
    # If the empty string is set, Rook will not destroy any data on hosts during uninstall.
    confirmation: ""
    # sanitizeDisks represents settings for sanitizing OSD disks on cluster deletion
    sanitizeDisks:
      # method indicates if the entire disk should be sanitized or simply ceph's metadata
      # in both case, re-install is possible
      # possible choices are 'complete' or 'quick' (default)
      method: quick
      # dataSource indicate where to get random bytes from to write on the disk
      # possible choices are 'zero' (default) or 'random'
      # using random sources will consume entropy from the system and will take much more time then the zero source
      dataSource: zero
      # iteration overwrite N times instead of the default (1)
      # takes an integer value
      iteration: 1
    # allowUninstallWithVolumes defines how the uninstall should be performed
    # If set to true, cephCluster deletion does not wait for the PVs to be deleted.
    allowUninstallWithVolumes: false
  # To control where various services will be scheduled by kubernetes, use the placement configuration sections below.
  # The example under 'all' would have all services scheduled on kubernetes nodes labeled with 'role=storage-node' and
  # tolerate taints with a key of 'storage-node'.
  # placement:
  #   all:
  #     nodeAffinity:
  #       requiredDuringSchedulingIgnoredDuringExecution:
  #         nodeSelectorTerms:
  #         - matchExpressions:
  #           - key: role
  #             operator: In
  #             values:
  #             - storage-node
  #     podAffinity:
  #     podAntiAffinity:
  #     topologySpreadConstraints:
  #     tolerations:
  #     - key: storage-node
  #       operator: Exists
  # The above placement information can also be specified for mon, osd, and mgr components
  #   mon:
  # Monitor deployments may contain an anti-affinity rule for avoiding monitor
  # collocation on the same node. This is a required rule when host network is used
  # or when AllowMultiplePerNode is false. Otherwise this anti-affinity rule is a
  # preferred rule with weight: 50.
  #   osd:
  #   prepareosd:
  #   mgr:
  #   cleanup:
  annotations:
    # all:
    # mon:
    # osd:
    # cleanup:
    # prepareosd:
    # clusterMetadata annotations will be applied to only `rook-ceph-mon-endpoints` configmap and the `rook-ceph-mon` and `rook-ceph-admin-keyring` secrets.
    # And clusterMetadata annotations will not be merged with `all` annotations.
    # clusterMetadata:
    #   kubed.appscode.com/sync: "true"
    # If no mgr annotations are set, prometheus scrape annotations will be set by default.
    # mgr:
  labels:
    # all:
    # mon:
    # osd:
    # cleanup:
    # mgr:
    # prepareosd:
    # monitoring is a list of key-value pairs. It is injected into all the monitoring resources created by operator.
    # These labels can be passed as LabelSelector to Prometheus
    # monitoring:
    # crashcollector:
  resources:
    # The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
    # mgr:
    #   limits:
    #     cpu: "500m"
    #     memory: "1024Mi"
    #   requests:
    #     cpu: "500m"
    #     memory: "1024Mi"
    # The above example requests/limits can also be added to the other components
    # mon:
    # osd:
    # For OSD it also is a possible to specify requests/limits based on device class
    # osd-hdd:
    # osd-ssd:
    # osd-nvme:
    # prepareosd:
    # mgr-sidecar:
    # crashcollector:
    # logcollector:
    # cleanup:
  # The option to automatically remove OSDs that are out and are safe to destroy.
  removeOSDsIfOutAndSafeToRemove: false
  priorityClassNames:
    #all: rook-ceph-default-priority-class
    mon: system-node-critical
    osd: system-node-critical
    mgr: system-cluster-critical
    #crashcollector: rook-ceph-crashcollector-priority-class
  storage: # cluster level storage configuration and selection
    useAllNodes: true
    useAllDevices: true
    #deviceFilter:
    config:
      # crushRoot: "custom-root" # specify a non-default root label for the CRUSH map
      # metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
      # databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
      # journalSizeMB: "1024" # uncomment if the disks are 20 GB or smaller
      # osdsPerDevice: "1" # this value can be overridden at the node or device level
      # encryptedDevice: "true" # the default value for this option is "false"
    # Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
    # nodes below will be used as storage resources. Each node's 'name' field should match their 'kubernetes.io/hostname' label.
    # nodes:
    #   - name: "172.17.4.201"
    #     devices: # specific devices to use for storage can be specified for each node
    #       - name: "sdb"
    #       - name: "nvme01" # multiple osds can be created on high performance devices
    #         config:
    #           osdsPerDevice: "5"
    #       - name: "/dev/disk/by-id/ata-ST4000DM004-XXXX" # devices can be specified using full udev paths
    #     config: # configuration can be specified at the node level which overrides the cluster level config
    #   - name: "172.17.4.301"
    #     deviceFilter: "^sd."
    # when onlyApplyOSDPlacement is false, will merge both placement.All() and placement.osd
    onlyApplyOSDPlacement: false
  # The section for configuring management of daemon disruptions during upgrade or fencing.
  disruptionManagement:
    # If true, the operator will create and manage PodDisruptionBudgets for OSD, Mon, RGW, and MDS daemons. OSD PDBs are managed dynamically
    # via the strategy outlined in the [design](https://github.com/rook/rook/blob/master/design/ceph/ceph-managed-disruptionbudgets.md). The operator will
    # block eviction of OSDs by default and unblock them safely when drains are detected.
    managePodBudgets: true
    # A duration in minutes that determines how long an entire failureDomain like `region/zone/host` will be held in `noout` (in addition to the
    # default DOWN/OUT interval) when it is draining. This is only relevant when `managePodBudgets` is `true`. The default value is `30` minutes.
    osdMaintenanceTimeout: 30
    # A duration in minutes that the operator will wait for the placement groups to become healthy (active+clean) after a drain was completed and OSDs came back up.
    # Operator will continue with the next drain if the timeout exceeds. It only works if `managePodBudgets` is `true`.
    # No values or 0 means that the operator will wait until the placement groups are healthy before unblocking the next drain.
    pgHealthCheckTimeout: 0
    # If true, the operator will create and manage MachineDisruptionBudgets to ensure OSDs are only fenced when the cluster is healthy.
    # Only available on OpenShift.
    manageMachineDisruptionBudgets: false
    # Namespace in which to watch for the MachineDisruptionBudgets.
    machineDisruptionBudgetNamespace: openshift-machine-api
  # healthChecks
  # Valid values for daemons are 'mon', 'osd', 'status'
  healthCheck:
    daemonHealth:
      mon:
        disabled: false
        interval: 45s
      osd:
        disabled: false
        interval: 60s
      status:
        disabled: false
        interval: 60s
    # Change pod liveness probe timing or threshold values. Works for all mon,mgr,osd daemons.
    livenessProbe:
      mon:
        disabled: false
      mgr:
        disabled: false
      osd:
        disabled: false
    # Change pod startup probe timing or threshold values. Works for all mon,mgr,osd daemons.
    startupProbe:
      mon:
        disabled: false
      mgr:
        disabled: false
      osd:
        disabled: false
Cluster Status to submit:
Output of krew commands, if necessary
kubectl rook-ceph health:
Info: Checking if at least three mon pods are running on different nodes
rook-ceph-mon-m-7765746b58-vfgvh 2/2 Running 0 20d
rook-ceph-mon-n-7769cf6bc8-2h6dr 2/2 Running 0 20d
rook-ceph-mon-k-585754cb5c-hssmx 2/2 Running 0 20d
Info: Checking mon quorum and ceph health details
HEALTH_OK
Info: Checking if at least three osd pods are running on different nodes
rook-ceph-osd-0-5989dc5cc9-6ffcw 2/2 Running 0 16h
rook-ceph-osd-5-866f59d556-kmc4v 2/2 Running 0 16h
rook-ceph-osd-6-7f749856d5-bsr9x 2/2 Running 0 16h
rook-ceph-osd-2-7c6fc4cdd9-r4qh4 2/2 Running 0 16h
rook-ceph-osd-7-585765569d-spztf 2/2 Running 0 16h
rook-ceph-osd-4-6dcc586fd8-tdgcn 2/2 Running 0 16h
rook-ceph-osd-3-864b745f5f-n5c42 2/2 Running 0 16h
rook-ceph-osd-8-66879b79dd-vh49l 2/2 Running 0 16h
Info: Pods that are in 'Running' status
NAME READY STATUS RESTARTS AGE
rook-ceph-mon-m-7765746b58-vfgvh 2/2 Running 0 20d
rook-ceph-mds-myfs-b-6fb7c564c5-9dp82 2/2 Running 0 20d
rook-ceph-mon-n-7769cf6bc8-2h6dr 2/2 Running 0 20d
rook-ceph-rgw-my-store-a-745657c48c-5gvf5 2/2 Running 0 20d
rook-ceph-mds-myfs-a-d8c857cd-vfgzh 2/2 Running 1 (20d ago) 20d
rook-ceph-mon-k-585754cb5c-hssmx 2/2 Running 0 20d
rook-ceph-operator-5dcccd4b4c-x62xm 1/1 Running 0 16h
rook-ceph-crashcollector-am5-k8s-node-04-6f4d5d6746-czw4f 1/1 Running 0 16h
rook-ceph-crashcollector-am5-k8s-node-01-56f5c86cd7-54sq4 1/1 Running 0 16h
rook-ceph-crashcollector-am5-k8s-node-03-bccd7fbd9-sttxq 1/1 Running 0 16h
rook-ceph-crashcollector-am5-k8s-node-02-6f44584ddb-jqdf4 1/1 Running 0 16h
rook-discover-plr4p 1/1 Running 0 16h
csi-cephfsplugin-vpkhf 2/2 Running 0 16h
csi-rbdplugin-jgz6g 2/2 Running 0 16h
rook-discover-zgbf8 1/1 Running 0 16h
csi-cephfsplugin-6f2pw 2/2 Running 0 16h
csi-rbdplugin-provisioner-99dd6c4c6-f627t 5/5 Running 0 16h
csi-cephfsplugin-provisioner-7c594f8cf-m5z5s 5/5 Running 0 16h
csi-cephfsplugin-provisioner-7c594f8cf-vtvnr 5/5 Running 0 16h
csi-rbdplugin-provisioner-99dd6c4c6-8w7gd 5/5 Running 0 16h
csi-cephfsplugin-p8f5l 2/2 Running 0 16h
csi-rbdplugin-clvvz 2/2 Running 0 16h
csi-rbdplugin-p7l5h 2/2 Running 0 16h
rook-discover-rl5js 1/1 Running 0 16h
csi-cephfsplugin-szm79 2/2 Running 0 16h
csi-rbdplugin-2wkfl 2/2 Running 0 16h
rook-discover-7nw6t 1/1 Running 0 16h
rook-ceph-mgr-a-584c6c6647-wtbbk 3/3 Running 0 16h
rook-ceph-mgr-b-677bd6944-2tjnb 3/3 Running 0 16h
rook-ceph-osd-0-5989dc5cc9-6ffcw 2/2 Running 0 16h
rook-ceph-osd-5-866f59d556-kmc4v 2/2 Running 0 16h
rook-ceph-osd-6-7f749856d5-bsr9x 2/2 Running 0 16h
rook-ceph-osd-2-7c6fc4cdd9-r4qh4 2/2 Running 0 16h
rook-ceph-osd-7-585765569d-spztf 2/2 Running 0 16h
rook-ceph-osd-4-6dcc586fd8-tdgcn 2/2 Running 0 16h
rook-ceph-osd-3-864b745f5f-n5c42 2/2 Running 0 16h
rook-ceph-osd-8-66879b79dd-vh49l 2/2 Running 0 16h
Warning: Pods that are 'Not' in 'Running' status
NAME READY STATUS RESTARTS AGE
Info: checking placement group status
Info: 169 pgs: 169 active+clean; 229 GiB data, 691 GiB used, 4.6 TiB / 5.3 TiB avail; 9.6 KiB/s rd, 640 KiB/s wr, 18 op/s
Info: checking if at least one mgr pod is running
rook-ceph-mgr-a-584c6c6647-wtbbk Running am5-k8s-node-01
rook-ceph-mgr-b-677bd6944-2tjnb Running am5-k8s-node-02
If I'm correct the max length is 255, but length of the first mount is 276 according to the following command: systemd-escape /var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/32030ca958fe47c1e0da9fd8df2c7727edec6f59a544669ed4e4835bc15e1566/globalmount/0001-0009-rook-ceph-0000000000000002-0251e137-0a3e-11ec-99db-724c03a610e7 | wc -c
256
@DjVinnii yes, I think you are correct. This happens with most clusters that use a custom kubelet path, e.g. microk8s. In a standard cluster it's /var/lib/kubelet, and in microk8s it's /var/snap/microk8s/common/var/lib/kubelet/. This is more a systemd issue than a Rook issue, and it was fixed in systemd/systemd#18077; AFAIK nothing can be fixed in Rook for this one.
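To see how much the kubelet root alone contributes, the sketch below computes the mount-unit name length for the same CSI staging path (taken from the mount output above) under two kubelet roots. It assumes `systemd-escape --path --suffix=mount` reproduces systemd's mount-unit naming on the affected systemd version; `staging_unit_len` and `CSI_STAGING_SUFFIX` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: compare the mount-unit name length of one CSI staging path under
# different kubelet roots. Assumes systemd-escape is installed.
set -u

# Volume-specific part of the staging path, copied from the mount output above
# (hypothetical variable name).
CSI_STAGING_SUFFIX='plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/32030ca958fe47c1e0da9fd8df2c7727edec6f59a544669ed4e4835bc15e1566/globalmount/0001-0009-rook-ceph-0000000000000002-0251e137-0a3e-11ec-99db-724c03a610e7'

# Print the unit-name length for <kubelet-root>/<CSI_STAGING_SUFFIX>.
staging_unit_len() {
  local root=${1%/}  # strip one trailing slash so the join does not produce "//"
  local unit
  unit=$(systemd-escape --path --suffix=mount "${root}/${CSI_STAGING_SUFFIX}")
  printf '%d\n' "${#unit}"
}

# Example:
#   staging_unit_len /var/lib/kubelet
#   staging_unit_len /var/snap/microk8s/common/var/lib/kubelet/
```

Because `var/snap/microk8s/common/` contains no characters that need escaping, the two results should differ by exactly the length of that extra prefix.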
@Madhu-1 First of all, thanks for the quick reply.
In a standard cluster it's /var/lib/kubelet, and in microk8s it's /var/snap/microk8s/common/var/lib/kubelet/. This is more a systemd issue than a Rook issue, and it was fixed in systemd/systemd#18077; AFAIK nothing can be fixed in Rook for this one.
I already suspected this wasn't something Rook could fix. After further investigation of the issue you mentioned, I found that it should be fixed in systemd 249 and higher. Ubuntu 20.04 LTS ships systemd 245, so it looks like I need to upgrade my nodes to Ubuntu 22.04 LTS, which ships systemd 249 according to Ubuntu Packages.
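Before and after the upgrade, the running systemd version could be checked with something like the following. It is a sketch that takes 249 as the threshold, as concluded above, and assumes the first line of `systemctl --version` has the form `systemd <major> (...)`; `systemd_major` and `has_long_mount_fix` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: decide whether the node's systemd is at least version 249, the
# version assumed (per the comment above) to contain the fix.
set -u

MIN_SYSTEMD=249

# Read `systemctl --version` output on stdin and print the major version,
# assuming a first line that starts with "systemd <major>".
systemd_major() {
  awk 'NR == 1 { print $2; exit }'
}

# Return 0 if the given major version is a number >= MIN_SYSTEMD.
has_long_mount_fix() {
  local major=$1
  [[ $major =~ ^[0-9]+$ ]] && (( 10#$major >= MIN_SYSTEMD ))
}

# Example on a node:
#   v=$(systemctl --version | systemd_major)
#   if has_long_mount_fix "$v"; then echo "systemd $v: OK"; else echo "systemd $v: upgrade needed"; fi
```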
Environment:
Kernel (uname -a): 5.4.0-135-generic
Rook version (rook version inside of a Rook Pod): v1.10.10
Ceph version (ceph -v): v17.2.5
Kubernetes version (kubectl version): v1.24.8
Ceph health (ceph health in the Rook Ceph toolbox): HEALTH_OK