fails to format #135

Closed
davidkarlsen opened this issue Aug 23, 2021 · 25 comments · Fixed by kubernetes/kubernetes#104923
Labels: documentation, enhancement
Milestone: v0.9

@davidkarlsen

What steps did you take and what happened:

  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  6m2s              default-scheduler  running PreBind plugin "VolumeBinding": binding volumes: selectedNode annotation reset for PVC "elasticsearch-elasticsearch-cdm-4qo1qel7-1"
  Normal   Scheduled         16s               default-scheduler  Successfully assigned openshift-logging/elasticsearch-cdm-4qo1qel7-1-6db94d4d88-lwtv7 to alp-dts-g-c01oco09
  Warning  FailedMount       5s (x5 over 13s)  kubelet            MountVolume.SetUp failed for volume "pvc-c9073859-fd54-4890-b444-b96e6f46dea1" : rpc error: code = Internal desc = failed to format and mount the volume error: mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/datavg/pvc-c9073859-fd54-4890-b444-b96e6f46dea1 /var/lib/kubelet/pods/0c34d38c-88f0-4a1c-bf6f-02e6b3ab05cd/volumes/kubernetes.io~csi/pvc-c9073859-fd54-4890-b444-b96e6f46dea1/mount
Output: mount: /var/lib/kubelet/pods/0c34d38c-88f0-4a1c-bf6f-02e6b3ab05cd/volumes/kubernetes.io~csi/pvc-c9073859-fd54-4890-b444-b96e6f46dea1/mount: wrong fs type, bad option, bad superblock on /dev/mapper/datavg-pvc--c9073859--fd54--4890--b444--b96e6f46dea1, missing codepage or helper program, or other error.

because:

 mkfs.xfs /dev/datavg/pvc-c9073859-fd54-4890-b444-b96e6f46dea1
mkfs.xfs: /dev/datavg/pvc-c9073859-fd54-4890-b444-b96e6f46dea1 appears to contain an existing filesystem (xfs).
mkfs.xfs: Use the -f option to force overwrite.

Maybe it should force by default, or a note should be added to the docs.
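
For anyone hitting this before a fix lands, a manual workaround is to clear the stale signature on the node so the next mount attempt can format cleanly (a sketch, assuming the same device path as above):

  # destroys any old data on the LV: wipefs -a erases all filesystem signatures
  wipefs -a /dev/datavg/pvc-c9073859-fd54-4890-b444-b96e6f46dea1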

What did you expect to happen:
Formatting should happen.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • LVM Driver version
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
@w3aman
Contributor

w3aman commented Aug 23, 2021

Hi @davidkarlsen
Can you please tell us the environment details like LVM-driver version, k8s version, and OS?

@davidkarlsen
Author

Hi @davidkarlsen
Can you please tell us the environment details like LVM-driver version, k8s version, and OS?

lvm version
  LVM version:     2.02.187(2)-RHEL7 (2020-03-24)
  Library version: 1.02.170-RHEL7 (2020-03-24)
  Driver version:  4.37.1
  Configuration:   ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm --with-default-pid-dir=/run --with-default-locking-dir=/run/lock/lvm --with-usrlibdir=/usr/lib64 --enable-lvm1_fallback --enable-fsadm --with-pool=internal --enable-write_install --with-user= --with-group= --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --enable-pkgconfig --enable-applib --enable-cmdlib --enable-dmeventd --enable-blkid_wiping --enable-python2-bindings --with-cluster=internal --with-clvmd=corosync --enable-cmirrord --with-udevdir=/usr/lib/udev/rules.d --enable-udev_sync --with-thin=internal --enable-lvmetad --with-cache=internal --enable-lvmpolld --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-dmfilemapd
uname -a
Linux alp-dts-g-c01oco07 3.10.0-1160.36.2.el7.x86_64 #1 SMP Thu Jul 8 02:53:40 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.9 (Maipo)

openebs helm chart 2.12.0

@kmova
Member

kmova commented Aug 24, 2021

@davidkarlsen -- this looks related to #75.

Would it be possible to try this on RHEL 8?

@davidkarlsen
Author

davidkarlsen commented Aug 24, 2021

It looks the same.
Unfortunately I can't run on RHEL 8, as it's not supported for OCP. In my case I had just deleted some LVs and then created a new one, which probably landed at the same offset; without clearing the old volume, mkfs will probably find a stale superblock and refuse to format without force. So to compare RHEL 7 and 8, that underlying setting is what should be checked.
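
A quick way to check whether a recreated LV inherited a stale signature (a sketch; the LV name is a placeholder):

  # blkid probes the device directly; a TYPE="xfs" result on a freshly
  # created, never-formatted LV means old on-disk metadata survived
  blkid /dev/datavg/<new-lv-name>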

@davidkarlsen
Author

Can wiping and zeroing be controlled when the volumes are created? I'd recommend having both enabled by default.

@pawanpraka1
Contributor

@davidkarlsen that was a planned item for LVM LocalPV. We already wipe the lvm partition when we delete the volume. From the error it looks like you already had some partition there before, and the new volume landed at the same offset. We need to clear the fs at creation time as well; we had planned this and somehow missed implementing it. Will take care of adding this enhancement.
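
In shell terms, the planned enhancement amounts to something like this at volume-creation time (a sketch of the idea, not the driver's actual Go code; names are placeholders):

  # create the LV, then erase any filesystem signature left over from a
  # previous LV that occupied the same extents, before kubelet sees the device
  lvcreate -n "$LV_NAME" -L "$SIZE" "$VG_NAME"
  wipefs -a "/dev/$VG_NAME/$LV_NAME"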

@pawanpraka1 pawanpraka1 added the enhancement (New feature or request) label Aug 25, 2021
@pawanpraka1 pawanpraka1 added this to the v0.9 milestone Aug 25, 2021
@davidkarlsen
Author

Note that the safest approach is to wipe at create time too.

@pawanpraka1
Contributor

pawanpraka1 commented Aug 25, 2021

@davidkarlsen I have raised PR #138 to fix it. Can you try the image pawanpraka1/lvm-driver:vp and see if it works?

@pawanpraka1
Contributor

@davidkarlsen can you confirm the lvm driver version you are using? It should be at the beginning of the openebs-lvm-plugin container log in the openebs-lvm-node-xxxx daemonset.

@mittachaitu

mittachaitu commented Aug 26, 2021

This behavior is due to compatibility issues between the container and the host operating system. Since version 0.6.0, openebs/lvm-localpv already erases the fs signatures on the LVM volume before creating the volume; the fix was merged via #88. This issue can be reproduced by performing the following steps:

  • Create a volume (PVC) with ext4 fs and launch a pod.
  • Delete the pod and the volume (PVC).
  • Create a volume with XFS fs and launch a pod; the issue is then reproducible.
    Note: if we create the volume again with the same fs as the previous one, the application is able to access it.
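
A kubectl-level sketch of those steps (the manifest names and contents are illustrative, not from this repo):

  kubectl apply -f pvc-ext4.yaml -f app-pod.yaml    # 1. ext4 PVC + pod
  kubectl delete -f app-pod.yaml -f pvc-ext4.yaml   # 2. delete pod and PVC
  kubectl apply -f pvc-xfs.yaml -f app-pod.yaml     # 3. xfs PVC + pod: mount fails
  kubectl describe pod app-pod                      # shows the FailedMount event above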

@davidkarlsen
Author

@davidkarlsen can you confirm the lvm driver version you are using? It should be at the beginning of the openebs-lvm-plugin container log in the openebs-lvm-node-xxxx daemonset.

LVM Driver Version :- 0.8.0 - commit :- 929ae44

@davidkarlsen
Author

This behavior is due to compatibility issues between the container and the host operating system. Since version 0.6.0, openebs/lvm-localpv already erases the fs signatures on the LVM volume before creating the volume; the fix was merged via #88. This issue can be reproduced by performing the following steps:

Hmm, then how come I experience this problem with 0.8.0?
BTW, when you format, do you pass the -f (force) option?

@mittachaitu

This behavior is due to compatibility issues between the container and the host operating system. Since version 0.6.0, openebs/lvm-localpv already erases the fs signatures on the LVM volume before creating the volume; the fix was merged via #88. This issue can be reproduced by performing the following steps:

Hmm, then how come I experience this problem with 0.8.0?
BTW, when you format, do you pass the -f (force) option?

Yes, we have been passing the -f (force) option since version 0.6.0.

@davidkarlsen
Author

This behavior is due to compatibility issues between the container and the host operating system. Since version 0.6.0, openebs/lvm-localpv already erases the fs signatures on the LVM volume before creating the volume; the fix was merged via #88. This issue can be reproduced by performing the following steps:

Hmm, then how come I experience this problem with 0.8.0?
BTW, when you format, do you pass the -f (force) option?

Yes, we have been passing the -f (force) option since version 0.6.0.

Then it's a bit surprising to meet this in the current release, for two reasons:

  1. If volumes are wiped at creation, the superblock should be gone in the first place and the bug should not surface.
  2. If formatting is forced, mkfs should ignore the stale superblock and proceed anyway.

I'll try to provoke this in a third cluster when I have time.

@davidkarlsen
Author

Tried now with the 2.12.2 chart; still the same:

                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  3m20s               default-scheduler  0/18 nodes are available: 3 Insufficient memory, 3 node(s) had taint {fsapplog: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 node(s) didn't match Pod's node affinity/selector, 5 node(s) had taint {fss.tietoevry.com/finods-group: }, that the pod didn't tolerate.
  Warning  FailedScheduling  3m18s               default-scheduler  0/18 nodes are available: 3 Insufficient memory, 3 node(s) had taint {fsapplog: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 node(s) didn't match Pod's node affinity/selector, 5 node(s) had taint {fss.tietoevry.com/finods-group: }, that the pod didn't tolerate.
  Normal   Scheduled         3m4s                default-scheduler  Successfully assigned openshift-logging/elasticsearch-cdm-cqg8zvqd-1-5596fc5479-7lmtg to alp-ksx-c01oco05
  Warning  FailedMount       62s                 kubelet            Unable to attach or mount volumes: unmounted volumes=[elasticsearch-storage], unattached volumes=[kube-api-access-29pgd elasticsearch-metrics elasticsearch-storage elasticsearch-config certificates]: timed out waiting for the condition
  Warning  FailedMount       57s (x9 over 3m5s)  kubelet            MountVolume.SetUp failed for volume "pvc-5128b42c-a7c1-403b-b599-2cadf8984328" : rpc error: code = Internal desc = failed to format and mount the volume error: mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/datavg/pvc-5128b42c-a7c1-403b-b599-2cadf8984328 /var/lib/kubelet/pods/b7c50bae-72a1-4ae5-9c0d-e23b8e84a5b3/volumes/kubernetes.io~csi/pvc-5128b42c-a7c1-403b-b599-2cadf8984328/mount
Output: mount: /var/lib/kubelet/pods/b7c50bae-72a1-4ae5-9c0d-e23b8e84a5b3/volumes/kubernetes.io~csi/pvc-5128b42c-a7c1-403b-b599-2cadf8984328/mount: wrong fs type, bad option, bad superblock on /dev/mapper/datavg-pvc--5128b42c--a7c1--403b--b599--2cadf8984328, missing codepage or helper program, or other error.

@davidkarlsen
Author

Same problem on 2.12.5.

@davidkarlsen
Author

From the logs:

I0909 20:35:40.848768       1 grpc.go:72] GRPC call: /csi.v1.Node/NodePublishVolume requests {"target_path":"/var/lib/kubelet/pods/179a5e86-43a5-43f7-b78e-b11af4368674/volumes/kubernetes.io~csi/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937/mount","volume_capability":{"AccessType":{"Mount":{"fs_type":"xfs"}},"access_mode":{"mode":1}},"volume_context":{"csi.storage.k8s.io/ephemeral":"false","csi.storage.k8s.io/pod.name":"prometheus-k8s-1","csi.storage.k8s.io/pod.namespace":"openshift-monitoring","csi.storage.k8s.io/pod.uid":"179a5e86-43a5-43f7-b78e-b11af4368674","csi.storage.k8s.io/serviceAccount.name":"prometheus-k8s","openebs.io/cas-type":"localpv-lvm","openebs.io/volgroup":"datavg","storage.kubernetes.io/csiProvisionerIdentity":"1631215660348-8081-local.csi.openebs.io"},"volume_id":"pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937"}
I0909 20:35:40.864001       1 mount_linux.go:366] Disk "/dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937" appears to be unformatted, attempting to format as type: "xfs" with options: [/dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937]
I0909 20:35:41.646181       1 mount_linux.go:376] Disk successfully formatted (mkfs): xfs - /dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937 /var/lib/kubelet/pods/179a5e86-43a5-43f7-b78e-b11af4368674/volumes/kubernetes.io~csi/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937/mount
E0909 20:35:41.648622       1 mount_linux.go:150] Mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937 /var/lib/kubelet/pods/179a5e86-43a5-43f7-b78e-b11af4368674/volumes/kubernetes.io~csi/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937/mount
Output: mount: /var/lib/kubelet/pods/179a5e86-43a5-43f7-b78e-b11af4368674/volumes/kubernetes.io~csi/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937/mount: wrong fs type, bad option, bad superblock on /dev/mapper/datavg-pvc--d5be05a4--f5f8--4b7e--83b3--b53eaaff8937, missing codepage or helper program, or other error.

Note that there is no -f in:
attempting to format as type: "xfs" with options: [/dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937]
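
For comparison, this is the difference the missing flag makes when a stale signature is present (a sketch, reusing the device path from the log above):

  mkfs.xfs /dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937     # refuses: "appears to contain an existing filesystem"
  mkfs.xfs -f /dev/datavg/pvc-d5be05a4-f5f8-4b7e-83b3-b53eaaff8937  # -f forces overwrite of the old superblock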

@davidkarlsen
Author

The issue lies here: kubernetes/mount-utils#5

@mittachaitu

Looks like even with the above force flag, the issue is still the same... When this issue occurred, the following system logs were seen:

Sep 12 18:51:53 centos-master kernel: XFS (dm-0): Superblock has unknown read-only compatible features (0x4) enabled.
Sep 12 18:51:53 centos-master kernel: XFS (dm-0): Attempted to mount read-only compatible filesystem read-write.
Sep 12 18:51:53 centos-master kernel: XFS (dm-0): Filesystem can only be safely mounted read only.
Sep 12 18:51:53 centos-master kernel: XFS (dm-0): SB validate failed with error -22.

The above error -22 maps to EINVAL, which means Invalid Argument (as I understand it, the kernel does not yet support the feature)... and some googling around the error took me to this page.

mkfs.xfs version on centos 7: 4.5.0
mkfs.xfs version on container: 5.6.0
Looks like some incompatibility, as mentioned in the issue...

To resolve the issue we have to format the xfs filesystem with the following option: mkfs.xfs -m reflink=0 /dev/lvm/manual1

Attempt 1: formatted with xfs without using any flags:

bash-5.0# lvcreate -n manual1 -L 1G lvm
  Logical volume "manual1" created.
bash-5.0# mkfs.xfs /dev/lvm/manual1 
meta-data=/dev/lvm/manual1       isize=512    agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
bash-5.0# mount /dev/lvm/manual1 /var/lib/kubelet/mnt/store1
mount: /var/lib/kubelet/mnt/store1: wrong fs type, bad option, bad superblock on /dev/mapper/lvm-manual1, missing codepage or helper program, or other error.

Attempt 2: formatted with xfs using the -m reflink=0 flag:

bash-5.0# lsblk -fa
NAME          FSTYPE      FSVER LABEL UUID                                   FSAVAIL FSUSE% MOUNTPOINT
fd0                                                                                         
loop0         squashfs                                                                      
loop1         squashfs                                                                      
loop2         squashfs                                                                      
sda                                                                                         
├─sda1        xfs                     8808cf9e-0900-4d7a-af19-36bf061d7a24                  
└─sda2        xfs                     72d0dc49-d80f-4aa8-a51f-51e237deb23e     10.9G    62% /var/lib/kubelet
sdb           LVM2_member             IvJ3Z4-PaLm-zZ5j-4oxK-H6dS-pkBk-KjcJSG                
└─lvm-manual1                                                                               
sr0                                                                                         
bash-5.0# lvs
  LV      VG  Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  manual1 lvm -wi-a----- 1.00g                                                    
bash-5.0# mkfs.xfs -m reflink=0 /dev/lvm/manual1 
meta-data=/dev/lvm/manual1       isize=512    agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
bash-5.0# mount /dev/lvm/manual1 /var/lib/kubelet/mnt/store1
bash-5.0# 
bash-5.0# df -h
Filesystem               Size  Used Avail Use% Mounted on
overlay                   29G   19G   11G  63% /
tmpfs                    1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/sda2                 29G   19G   11G  63% /plugin
devtmpfs                 1.9G     0  1.9G   0% /dev
shm                       64M     0   64M   0% /dev/shm
tmpfs                    1.9G   12K  1.9G   1% /var/lib/kubelet/pods/32966bd7-fd41-4f49-b572-8a25a1dc802d/volumes/kubernetes.io~secret/kube-proxy-token-tmnfs
tmpfs                    1.9G   12K  1.9G   1% /var/lib/kubelet/pods/8e82a39d-d592-4051-83f2-bb372f568246/volumes/kubernetes.io~secret/flannel-token-fpwlc
tmpfs                    1.9G   12K  1.9G   1% /var/lib/kubelet/pods/be4ddd06-bed9-4d18-bb54-26e67c77eb74/volumes/kubernetes.io~secret/openebs-maya-operator-token-sj7w5
tmpfs                    1.9G   12K  1.9G   1% /run/secrets/kubernetes.io/serviceaccount
/dev/mapper/lvm-manual1 1014M   33M  982M   4% /var/lib/kubelet/mnt/store1
bash-5.0# 
  • Able to mount when the -m reflink=0 flag is used, which tells mkfs.xfs to disable the shared copy-on-write (reflink) feature that is not supported by CentOS 7 (AFAIK).

Red Hat document which says to pass the reflink option
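
To confirm the mismatch on an affected node (a sketch; the version numbers are the ones reported in this thread):

  uname -r       # 3.10.0-... on RHEL/CentOS 7: no xfs reflink support in this kernel
  mkfs.xfs -V    # host: 4.5.0 vs. container: 5.6.0; xfsprogs >= 5.1 enables reflink=1 by default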

@davidkarlsen
Author

@mittachaitu I believe that's another problem (it has a different error message) - please create a separate issue for that.

@mittachaitu

mount: /var/lib/kubelet/mnt/store1: wrong fs type, bad option, bad superblock on /dev/mapper/lvm-manual1, missing codepage or helper program, or other error.

The above is the error I got when trying to mount the xfs-formatted lvm volume, and the issue description has a similar error... so I believe both are the same:

Mounting command: mount
Mounting arguments: -t xfs -o defaults /dev/datavg/pvc-c9073859-fd54-4890-b444-b96e6f46dea1 /var/lib/kubelet/pods/0c34d38c-88f0-4a1c-bf6f-02e6b3ab05cd/volumes/kubernetes.io~csi/pvc-c9073859-fd54-4890-b444-b96e6f46dea1/mount
Output: mount: /var/lib/kubelet/pods/0c34d38c-88f0-4a1c-bf6f-02e6b3ab05cd/volumes/kubernetes.io~csi/pvc-c9073859-fd54-4890-b444-b96e6f46dea1/mount: wrong fs type, bad option, bad superblock on /dev/mapper/datavg-pvc--c9073859--fd54--4890--b444--b96e6f46dea1, missing codepage or helper program, or other error.

The above is from the issue description.

@davidkarlsen
Author

@w3aman could you by any chance pull in my hack on mount_utils? Merging it into Kubernetes and waiting for a release will take forever.

@dsharma-dc
Contributor

A reasonable update for the moment is to mention in our documentation that the combination of xfs and an older kernel (< 5.10) may run into this issue, and that it can be mitigated by updating the host node's kernel version.
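
In command form, the documented mitigation (a sketch with placeholder names) is to either update the node kernel or format with reflink disabled:

  mkfs.xfs -m reflink=0 /dev/<vg>/<lv>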

@dsharma-dc dsharma-dc added the documentation (Improvements or additions to documentation) label Jun 5, 2024
@balaharish7

Documented in PR #448 & PR #451
