write_protected_candidate_device called for '/sys/block/nvme0c0n1' but '/dev/nvme0c0n1' is no block device #3085

Closed
hitrikrtek opened this issue Nov 17, 2023 · 11 comments

hitrikrtek commented Nov 17, 2023

  • ReaR version ("/usr/sbin/rear -V"):
    Relax-and-Recover 2.7 / 2022-07-13

  • If your ReaR version is not the current version, explain why you can't upgrade:
    Latest version with distribution

  • OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):

NAME="SLES"
VERSION="15-SP5"
VERSION_ID="15.5"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP5"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp5"
  • ReaR configuration files ("cat /etc/rear/local.conf"):
BACKUP=NETFS
OUTPUT=ISO
BACKUP_URL=nfs://172.17.3.1/rear
BACKUP_OPTIONS=
NETFS_KEEP_OLD_BACKUP_COPY=yes
USE_DHCLIENT=
MODULES_LOAD=( )
#BACKUP_PROG_INCLUDE=("${BACKUP_PROG_INCLUDE[@]}" '/boot/grub2/i386-pc' '/boot/grub2/x86_64-efi' '/home' '/opt' '/root' '/srv' '/hana/shared' '/usr' '/var')
BACKUP_PROG_EXCLUDE=("${BACKUP_PROG_EXCLUDE[@]}" '/hana/backup' '/hana/data' '/hana/log' '/hana/shared/HSP/HDB05/backup' '/hana/shared/NSP/HDB50/backup' '/opt/dpsapps/agentsvc/dbs' '/opt/dpsapps/agentsvc/logs' '/tmp')
POST_RECOVERY_SCRIPT=('if snapper --no-dbus -r $TARGET_FS_ROOT get-config | grep -q "^QGROUP.*[0-9]/[0-9]" ; then snapper --no-dbus -r $TARGET_FS_ROOT set-config QGROUP= ; snapper --no-dbus -r $TARGET_FS_ROOT setup-quota && echo snapper setup-quota done || echo snapper setup-quota failed ; else echo snapper setup-quota not used ; fi')
REQUIRED_PROGS=(snapper chattr lsattr ${REQUIRED_PROGS[@]})
COPY_AS_IS=(/usr/lib/snapper/installation-helper /etc/snapper/config-templates/default ${COPY_AS_IS[@]})
EXCLUDE_MOUNTPOINTS=("/mnt")
ISO_MKISOFS_BIN=/usr/bin/ebiso
  • Hardware vendor/product (PC or PowerNV BareMetal or ARM) or VM (KVM guest or PowerVM LPAR):
    DELL PowerEdge R860

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):
    x86_64

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):
    BIOS Version 1.5.6
    grub2-x86_64-efi-2.06-150500.29.8.1

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):
    NVMe in RAID configuration

Backplane 1 on Connector 0 of RAID Controller in SL 1
DeviceDescription	Backplane 1 on Connector 0 of RAID Controller in SL 1
DeviceType	PCIeSSDBackPlane
FirmwareVersion	7.10
FQDD	Enclosure.Internal.0-1:RAID.SL.1-1
InstanceID	Enclosure.Internal.0-1:RAID.SL.1-1
MediaType	Solid State Drive
PCIExpressGeneration	Gen 4
ProductName	BP_PSV 0:1
RollupStatus	OK
SlotCount	8
WiredOrder	1

Backplane 2 on Connector 0 of RAID Controller in SL 4
DeviceDescription	Backplane 2 on Connector 0 of RAID Controller in SL 4
DeviceType	PCIeSSDBackPlane
FirmwareVersion	7.10
FQDD	Enclosure.Internal.0-2:RAID.SL.4-1
InstanceID	Enclosure.Internal.0-2:RAID.SL.4-1
MediaType	Solid State Drive
PCIExpressGeneration	Gen 4
ProductName	BP_PSV 0:2
RollupStatus	OK
SlotCount	8
WiredOrder	2
  • Storage layout ("lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,LABEL,SIZE,MOUNTPOINT"):
NAME                          KNAME          PKNAME         TRAN   TYPE FSTYPE      LABEL   SIZE MOUNTPOINT
/dev/sda                      /dev/sda                             disk LVM2_member        23.3T
|-/dev/mapper/vghana-lvusrsap /dev/dm-2      /dev/sda              lvm  xfs                 150G /usr/sap
|-/dev/mapper/vghana-lvdata   /dev/dm-3      /dev/sda              lvm  xfs                   6T /hana/data
|-/dev/mapper/vghana-lvshared /dev/dm-4      /dev/sda              lvm  xfs                   2T /hana/shared
|-/dev/mapper/vghana-lvlog    /dev/dm-5      /dev/sda              lvm  xfs                   4T /hana/log
'-/dev/mapper/vghana-lvbackup /dev/dm-6      /dev/sda              lvm  xfs                   2T /hana/backup
/dev/sdb                      /dev/sdb                      iscsi  disk                     100M
/dev/sdc                      /dev/sdc                      iscsi  disk                     100M
/dev/sdd                      /dev/sdd                      iscsi  disk                       1G
/dev/nvme0n1                  /dev/nvme0n1                  nvme   disk                   447.1G
|-/dev/nvme0n1p1              /dev/nvme0n1p1 /dev/nvme0n1   nvme   part vfat        ESP     250M /boot/efi
|-/dev/nvme0n1p2              /dev/nvme0n1p2 /dev/nvme0n1   nvme   part vfat        OS        2G /boot
'-/dev/nvme0n1p3              /dev/nvme0n1p3 /dev/nvme0n1   nvme   part LVM2_member       444.8G
  |-/dev/mapper/vgroot-root   /dev/dm-0      /dev/nvme0n1p3        lvm  btrfs             442.8G /var
  '-/dev/mapper/vgroot-swap   /dev/dm-1      /dev/nvme0n1p3        lvm  swap                  2G [SWAP]
  • Description of the issue (ideally so that others can reproduce it):

When running rear recover we get this error:
[screenshot of the error message attached]

  • Workaround, if any:
    None

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):
    See attached debug log: rear-srv-lx2013.log

jsmeix commented Nov 23, 2023

@hitrikrtek
what is the output of each of the following commands

# echo /sys/block/*

# ls -l /dev/nvme0c0n1

# ls -l /sys/block/nvme0c0n1

# ls -l $( readlink -e /sys/block/nvme0c0n1 )

?

Details:

The following excerpt from your
https://github.com/rear/rear/files/13389016/rear-srv-lx2013.log
shows how it fails:

++ for current_device_path in /sys/block/*
++ current_disk_name=nvme0c0n1
++ is_multipath_path nvme0c0n1
++ test nvme0c0n1
++ is_multipath_used
++ type multipath
++ return 1
++ return 1
++ test -d /sys/block/nvme0c0n1/queue
++ test 0 = 1
++ is_write_protected /sys/block/nvme0c0n1
+++ write_protected_candidate_device /sys/block/nvme0c0n1
+++ local device=/sys/block/nvme0c0n1
+++ [[ /sys/block/nvme0c0n1 == /sys/block/* ]]
++++ get_device_name /sys/block/nvme0c0n1
++++ local name=/sys/block/nvme0c0n1
++++ name=nvme0c0n1
++++ contains_visible_char nvme0c0n1
+++++ tr -d -c '[:graph:]'
++++ test nvme0c0n1
++++ [[ nvme0c0n1 =~ ^mapper/ ]]
++++ [[ -L /dev/nvme0c0n1 ]]
++++ [[ nvme0c0n1 =~ ^dm- ]]
++++ name=nvme0c0n1
++++ echo /dev/nvme0c0n1
++++ [[ -r /dev/nvme0c0n1 ]]
++++ return 1
+++ device=/dev/nvme0c0n1
+++ test -b /dev/nvme0c0n1
+++ BugError 'write_protected_candidate_device called for '\''/sys/block/nvme0c0n1'\'' but '\''/dev/nvme0c0n1'\'' is no block device'

This looks strange because it seems in your case
there is "nvme0c0n1" in /sys/block/ but

test -b /dev/nvme0c0n1

fails, so "nvme0c0n1" appears to be a block device
(because "nvme0c0n1" is listed in /sys/block/) that
either has no matching /dev/nvme0c0n1 device node
or has a matching /dev/nvme0c0n1 device node
which itself is no block device.

By "googling" for 'nvme0c0n1' I found
google/cadvisor#3340

Not all /sys/block devices will have a "dev" file

which seems to explain the root cause.
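
A quick way to check that on an affected system is to look for
the "dev" file of the /sys/block entry (a minimal sketch, assuming
the nvme0c0n1 name from this report; it may differ elsewhere):

# The "dev" file holds the major:minor numbers that udev uses to create
# the matching /dev node; when it is missing, no /dev/nvme0c0n1 device node
# is expected to exist:
if test -e /sys/block/nvme0c0n1/dev ; then
    cat /sys/block/nvme0c0n1/dev
else
    echo "no 'dev' file, so no matching device node expected"
fi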

For comparison:
On my laptop with a NVMe disk I have /sys/block/nvme0n1
and I have a matching device node /dev/nvme0n1
which is a block device.

The matching code is in
usr/share/rear/layout/prepare/default/250_compare_disks.sh
which is online for ReaR 2.7 starting at
https://github.com/rear/rear/blob/rear-2.7/usr/share/rear/layout/prepare/default/250_compare_disks.sh#L94

The is_write_protected and write_protected_candidate_device
functions are in
usr/share/rear/lib/write-protect-functions.sh
which is online for ReaR 2.7
https://github.com/rear/rear/blob/rear-2.7/usr/share/rear/lib/write-protect-functions.sh

So it seems we have to enhance the
write_protected_candidate_device function
so that it skips a device for which
no matching /dev/... device node exists.
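
The enhancement could look roughly like this inside
write_protected_candidate_device (an untested sketch of the idea only,
not the actual ReaR code; DebugPrint is ReaR's existing debug message function):

# Untested sketch: when the candidate from /sys/block/* has no matching
# block device node in /dev/, skip it so that it is simply treated
# as "not write-protected" instead of aborting with a BugError:
device="/dev/${1##*/}"
if ! test -b "$device" ; then
    DebugPrint "Skipping write protection check for '$1' ('$device' is no block device)"
    return 1
fi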

jsmeix self-assigned this Nov 23, 2023
jsmeix added this to the ReaR v2.8 milestone Nov 23, 2023
hitrikrtek commented Nov 23, 2023

Hi @jsmeix
First of all thank you for your time looking into this.
I've run the commands you requested and here's the output:

# echo /sys/block/*
/sys/block/dm-0 /sys/block/dm-1 /sys/block/dm-2 /sys/block/dm-3 /sys/block/dm-4 /sys/block/dm-5 /sys/block/dm-6 /sys/block/nvme0c0n1 /sys/block/nvme0n1 /sys/block/sda /sys/block/sdb /sys/block/sdc /sys/block/sdd

# ls -l /dev/nvme0c0n1
ls: cannot access '/dev/nvme0c0n1': No such file or directory

# ls -l /sys/block/nvme0c0n1
lrwxrwxrwx 1 root root 0 Nov 21 13:38 /sys/block/nvme0c0n1 -> ../devices/pci0000:00/0000:00:0a.0/0000:01:00.0/nvme/nvme0/nvme0c0n1

# ls -l $( readlink -e /sys/block/nvme0c0n1 )
total 0
-r--r--r-- 1 root root 4096 Nov 23 09:26 alignment_offset
-r--r--r-- 1 root root 4096 Nov 23 09:26 capability
lrwxrwxrwx 1 root root    0 Nov 21 13:38 device -> ../../nvme0
-r--r--r-- 1 root root 4096 Nov 23 09:26 discard_alignment
-r--r--r-- 1 root root 4096 Nov 23 09:26 diskseq
-r--r--r-- 1 root root 4096 Nov 23 09:26 eui
-r--r--r-- 1 root root 4096 Nov 23 09:26 events
-r--r--r-- 1 root root 4096 Nov 23 09:26 events_async
-rw-r--r-- 1 root root 4096 Nov 23 09:26 events_poll_msecs
-r--r--r-- 1 root root 4096 Nov 23 09:26 ext_range
-r--r--r-- 1 root root 4096 Nov 23 09:26 hidden
drwxr-xr-x 2 root root    0 Nov 23 09:26 holders
-r--r--r-- 1 root root 4096 Nov 23 09:26 inflight
drwxr-xr-x 2 root root    0 Nov 23 09:26 integrity
-rw-r--r-- 1 root root 4096 Nov 23 09:26 make-it-fail
drwxr-xr-x 5 root root    0 Nov 23 09:26 mq
-r--r--r-- 1 root root 4096 Nov 23 09:26 nsid
drwxr-xr-x 2 root root    0 Nov 23 09:26 power
drwxr-xr-x 2 root root    0 Nov 23 09:26 queue
-r--r--r-- 1 root root 4096 Nov 23 09:26 range
-r--r--r-- 1 root root 4096 Nov 23 09:26 removable
-r--r--r-- 1 root root 4096 Nov 23 09:26 ro
-r--r--r-- 1 root root 4096 Nov 23 09:26 size
drwxr-xr-x 2 root root    0 Nov 23 09:26 slaves
-r--r--r-- 1 root root 4096 Nov 23 09:26 stat
lrwxrwxrwx 1 root root    0 Nov 21 13:38 subsystem -> ../../../../../../../class/block
drwxr-xr-x 2 root root    0 Nov 23 09:26 trace
-rw-r--r-- 1 root root 4096 Nov 21 13:38 uevent
-r--r--r-- 1 root root 4096 Nov 23 09:26 wwid

jsmeix commented Nov 23, 2023

# echo /sys/block/*
... /sys/block/nvme0c0n1 /sys/block/nvme0n1 ...

# ls -l /dev/nvme0c0n1
ls: cannot access '/dev/nvme0c0n1': No such file or directory

proves that the root cause is that

Not all /sys/block devices will have a "dev" file

so we have to enhance the
write_protected_candidate_device function
so that it skips a device for which
no matching /dev/... device node exists.

jsmeix added a commit that referenced this issue Nov 23, 2023
Let the is_write_protected function
report candidate devices without device node
as not write protected
because not all /sys/block devices have a "dev" file
e.g. /sys/block/nvme0c0n1 has no /dev/nvme0c0n1 device node
see #3085
Because the is_write_protected function is meant
to identify write-protected disks and partitions
only candidate devices with a device node
are considered for write protection
jsmeix added the bug label and removed the waiting for info and support / question labels Nov 23, 2023
jsmeix commented Nov 23, 2023

@hitrikrtek

I cannot reproduce your issue
because I don't have a system with /sys/block/nvme0c0n1
or something similar - i.e. where a /sys/block/device
does not have a matching /dev/device.

So as an offhand, untested workaround for now
you may try out how things behave when you replace in
usr/share/rear/lib/write-protect-functions.sh

test -b "$device" || BugError "write_protected_candidate_device called for '$1' but '$device' is no block device"

with

test -b "$device" || DebugPrint "write_protected_candidate_device called for '$1' but '$device' is no block device"

Perhaps with that you get other errors in other functions
in usr/share/rear/lib/write-protect-functions.sh

Alternatively (and preferred by me if you can) you could test
if my overhauled usr/share/rear/lib/write-protect-functions.sh
https://raw.githubusercontent.com/rear/rear/a403b5fe1c2c58420ba1b77db52283c041e4f7d4/usr/share/rear/lib/write-protect-functions.sh
in
#3091
makes things work well for your case.
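
For example, testing it could look like this (a sketch, assuming ReaR's
scripts are installed under /usr/share/rear and curl is available;
adapt paths as needed):

# Keep a copy of the installed file, replace it with the overhauled one
# from the raw URL above, then re-create the rescue media so that
# the recovery system contains the new file:
cp /usr/share/rear/lib/write-protect-functions.sh /usr/share/rear/lib/write-protect-functions.sh.orig
curl -o /usr/share/rear/lib/write-protect-functions.sh https://raw.githubusercontent.com/rear/rear/a403b5fe1c2c58420ba1b77db52283c041e4f7d4/usr/share/rear/lib/write-protect-functions.sh
rear -D mkrescue    # or "rear -D mkbackup" with BACKUP=NETFS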

hitrikrtek (Author) commented:

Alright! I've grabbed your updated function and am creating a backup now. Will report back the result.
Thank you

jsmeix commented Nov 23, 2023

@hitrikrtek
with

OUTPUT=ISO

you should get nothing automatically set for
WRITE_PROTECTED_IDS and/or WRITE_PROTECTED_FS_LABEL_PATTERNS
after "rear mkrescue/mkbackup" in your
/var/tmp/rear.XXX/rootfs/etc/rear/rescue.conf
When you also have
neither WRITE_PROTECTED_IDS
nor WRITE_PROTECTED_FS_LABEL_PATTERNS
specified in your etc/rear/local.conf file,
then all is_write_protected function calls
would always result in nothing being write-protected,
so the is_write_protected function could be
simplified and made more robust against errors
by checking whether WRITE_PROTECTED_IDS
and WRITE_PROTECTED_FS_LABEL_PATTERNS are both empty
and in that case directly returning 1 (i.e. "not write-protected").
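
Such an early return could look roughly like this (an untested sketch
of the idea only, not the actual ReaR code):

function is_write_protected () {
    # Untested sketch: when neither WRITE_PROTECTED_IDS nor
    # WRITE_PROTECTED_FS_LABEL_PATTERNS is specified, nothing can be
    # write-protected, so directly report "not write-protected":
    if ! test "${WRITE_PROTECTED_IDS[*]:-}" && ! test "${WRITE_PROTECTED_FS_LABEL_PATTERNS[*]:-}" ; then
        return 1
    fi
    # ... the actual checks against IDs and filesystem label patterns would follow here ...
}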

hitrikrtek (Author) commented:

Regarding the updated function: this seems to work now, we were able to do the restore. But now we're experiencing some other issues - after the restore our network broke somehow... All disks and configs seem just fine, we're investigating, but I don't think it's related to this device. I'll post my findings here if we figure it out.

jsmeix commented Nov 23, 2023

@hitrikrtek
please do not post things that do not belong to this issue
here, but instead create a new separate issue
for each separate problem that you have.
Otherwise everything would get messed up.

schlomo commented Jan 13, 2024

@hitrikrtek just going over some stuff, a quick question: Your setup has NVMe multipath, correct? I'm asking because https://lore.kernel.org/all/3c725e5deaabaaf145f48f2f6fcfdae9f6d41e2e.camel@suse.de/t/ suggests that the specific sysfs path you encounter is related to the multipath setup, where indeed I understand that there are more sysfs block devices listed than actual /dev block devices, because multipathing means that there are multiple "backend" block devices for what is actually the same disk.

I'm asking this because I'm wondering whether ReaR actually correctly recognizes NVMe multipath setups as multipath setups, or whether it tries to handle them as "regular" disks.

Of course, a proper fix would start from access to such an NVMe multipath device...

hitrikrtek commented Jan 15, 2024

@schlomo it is NVMe in a RAID1 configuration (two disks, but it is a HW RAID controller) so I guess it is multipath, yes. We've also tried a restore and it works, so it seems the current fix works OK for this configuration. If you want me to send you some more info from the system, just let me know what to send.

jsmeix commented Jan 15, 2024

I need more explanatory information
to be able to correctly imagine things
because I am not a multipath user at all -
neither of "traditional" multipath with non-NVMe disks
nor of current multipath with NVMe.

Therefore I have some questions:

(1)
First about the wording
"NVMe native multipathing" versus "NVMe multipathing":

https://lore.kernel.org/all/3c725e5deaabaaf145f48f2f6fcfdae9f6d41e2e.camel@suse.de/t/
and
https://documentation.suse.com/de-de/sles/15-SP1/html/SLES-all/cha-nvmeof.html
talk about "NVMe native multipathing".
But here we talk about "NVMe multipathing" in particular
#3091 (comment)
Are "NVMe native multipathing" and "NVMe multipathing"
the same thing or is there a difference?

(2)
Second about kernel device nodes
versus "other devices" like "device" smylinks to "whatever":

As far as I know, in traditional multipath with a non-NVMe disk
that is accessible via two separate ways (called "paths")
all involved devices have kernel device nodes, for example
the path devices /dev/sda and /dev/sdb
for the two paths to the disk,
plus, via device mapper, the multipath map device
/dev/dm-1 for the disk (reachable via either of the two paths).

What does this look like with
"NVMe native multipathing" and/or "NVMe multipathing"?

https://lore.kernel.org/all/3c725e5deaabaaf145f48f2f6fcfdae9f6d41e2e.camel@suse.de/t/
only talks about "devices" in /sys/
but I would like to understand the relationship
between kernel device nodes (i.e. things in /dev/)
and "other devices" like things in /sys/ for
"NVMe native multipathing" and/or "NVMe multipathing"
(see also the probe commands sketched after these questions).

(3)
Third about multipathing versus RAID1:

As far as I know multipathing is about accessing
one and the same disk via several separate ways/paths.
RAID1 is about mirroring data on two different disks,
so each of the two RAID1 disks is accessible via
its own specific way of access, so RAID1 implies
multipath-like behaviour.
But (as far as I know) "RAID1 implicit multipath"
is not what is normally meant when one talks about multipath.

(4)
Fourth about hardware RAID1 versus software RAID1:

When "traditional" non-NVMe hardware RAID1 is used
there is only one disk device node (e.g. /dev/sda)
which behaves the same as a normal single disk.

When "traditional" non-NVMe software RAID1 is used
two normal single disks (e.g. /dev/sda and /dev/sdb)
are combined by the Linux kernel into one
software RAID1 kernel device node (e.g. /dev/md127)
where the RAID1 kernel device node behaves
the same as a normal single disk.

Here it seems that what is called "NVMe hardware RAID1"
does not behave the same as a normal single NVMe disk.

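One way to probe how NVMe native multipathing presents devices on a
given system, which may help with questions (1) and (2) above, is a
quick look at the kernel's NVMe sysfs entries (a sketch, assuming a
reasonably recent kernel with the nvme_core module and, for the last
command, the nvme-cli package):

# 'Y' means the kernel's built-in ("native") NVMe multipathing is enabled:
cat /sys/module/nvme_core/parameters/multipath
# With native multipathing the per-controller path devices (like nvme0c0n1)
# are grouped under an NVMe subsystem, while I/O goes to /dev/nvmeXnY:
ls -l /sys/class/nvme-subsystem/
nvme list-subsys
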
jsmeix added a commit that referenced this issue Jan 22, 2024
…ite protected (#3091)

In lib/write-protect-functions.sh 
let the is_write_protected function report candidate devices
without device node as not write protected
because not all /sys/block devices have a "dev" file
e.g. /sys/block/nvme0c0n1 has no /dev/nvme0c0n1 device node
see #3085
Because the is_write_protected function is meant to identify
write-protected disks and partitions, only candidate devices
with a device node are considered for write protection.
Additionally completely overhauled lib/write-protect-functions.sh
which now contains only the is_write_protected function
(the only one intended to be called by other scripts).
The former helper functions are not needed as functions because
all that happens is a single straightforward sequence of actions.