
Better is_multipath_path function (issue #2298) plus some other stuff #2299

Conversation

jsmeix (Member) commented Dec 11, 2019

  • Type: Bug Fix / Cleanup

  • Impact: High
    In my case disklayout.conf became totally useless.
    If the '/dev/sda' entries were only commented out,
    the bad disklayout.conf could still be manually adapted,
    but without any '/dev/sda' entries it is useless in practice
    (I cannot guess during "rear recover" what the right values could be).

  • Reference to related issue (URL):
    Normal /dev/sda by "multipath -c /dev/sda" falsely recognized as multipath device (DM_MULTIPATH_DEVICE_PATH="1") #2298

  • How was this pull request tested?
    Currently an untested preliminary initial state.
    "rear mkrescue" looks good to me with these changes,
    but I still need to test "rear recover" (hopefully tomorrow).

  • Brief description of the changes in this pull request:

Better (i.e. hopefully more fail-safe) is_multipath_path function, cf.
#2298 (comment)

Have lsblk output as disklayout.conf header comments
so that it is easier to make sense of the values in the subsequent entries.

Some TODO comments added in layout/prepare/default/010_prepare_files.sh
because I wonder about how some things therein are meant to work.

Added xxd (part of vim) to the PROGS array because
a tool to display binary files is also needed in the recovery system.

… as disklayout.conf header plus TODO comments in layout/prepare/default/010_prepare_files.sh
@jsmeix jsmeix added enhancement Adaptions and new features bug The code does not do what it is meant to do cleanup labels Dec 11, 2019
@jsmeix jsmeix added this to the ReaR v2.6 milestone Dec 11, 2019
@jsmeix jsmeix requested review from schabrolles and a team December 11, 2019 15:55
@jsmeix jsmeix self-assigned this Dec 11, 2019
jsmeix (Member, Author) commented Dec 11, 2019

With the lsblk output as disklayout.conf header comments
I get this disklayout.conf:

# Disk layout dated 20191211160659 (YYYYmmddHHMMSS)
# NAME        KNAME     PKNAME   TRAN   TYPE FSTYPE   SIZE MOUNTPOINT
# /dev/sda    /dev/sda           sata   disk        931.5G 
# |-/dev/sda1 /dev/sda1 /dev/sda        part vfat     500M 
# |-/dev/sda2 /dev/sda2 /dev/sda        part          100M 
# |-/dev/sda3 /dev/sda3 /dev/sda        part swap      20G [SWAP]
# |-/dev/sda4 /dev/sda4 /dev/sda        part ext4     400G /
# `-/dev/sda5 /dev/sda5 /dev/sda        part ext4     400G /other
# /dev/sdb    /dev/sdb           usb    disk        465.8G 
# `-/dev/sdb1 /dev/sdb1 /dev/sdb        part ext3    46.6G 
# /dev/sr0    /dev/sr0           sata   rom          1024M 
# Disk /dev/sda
# Format: disk <devname> <size(bytes)> <partition label type>
disk /dev/sda 1000204886016 gpt
# Partitions on /dev/sda
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
part /dev/sda 524288000 1048576 rear-noname boot,legacy_boot,esp /dev/sda1
part /dev/sda 104857600 525336576 rear-noname bios_grub /dev/sda2
part /dev/sda 21474836480 630194176 rear-noname swap /dev/sda3
part /dev/sda 429496729600 22105030656 rear-noname none /dev/sda4
part /dev/sda 429496729600 451601760256 rear-noname none /dev/sda5
# Disk /dev/sdb
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdb 500107859968 msdos
# Partitions on /dev/sdb
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
#part /dev/sdb 50002395136 8388608 primary boot /dev/sdb1
# Filesystems (only ext2,ext3,ext4,vfat,xfs,reiserfs,btrfs are supported).
# Format: fs <device> <mountpoint> <fstype> [uuid=<uuid>] [label=<label>] [<attributes>]
fs /dev/sda4 / ext4 uuid=cff8eaf4-2369-439b-8ef2-620dd515d767 label= blocksize=4096 reserved_blocks=5% max_mounts=-1 check_interval=0d bytes_per_inode=16384 default_mount_options=user_xattr,acl options=rw,relatime,data=ordered
fs /dev/sda5 /other ext4 uuid=2aa61137-5fc3-4fca-8b8e-959bc4c3676d label= blocksize=4096 reserved_blocks=5% max_mounts=-1 check_interval=0d bytes_per_inode=16384 default_mount_options=user_xattr,acl options=rw,relatime,data=ordered
# Swap partitions or swap files
# Format: swap <filename> uuid=<uuid> label=<label>
swap /dev/sda3 uuid=6d4a07be-f9d7-4e56-a141-75758754e822 label=

cf. #2298 (comment)

As far as I see the initial comment header in disklayout.conf

# Disk layout dated 20191211160659  ...

that changes each time "rear mkrescue" is run
cannot cause "rear checklayout" to falsely detect a changed layout,
because layout/compare/default/500_compare_layout.sh
compares only the non-comment lines in disklayout.conf.
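That comment-stripping comparison can be sketched as a standalone illustration (hypothetical file names under /tmp, not the actual 500_compare_layout.sh code):

```shell
# Sketch (not the actual ReaR script): two disklayout.conf files that differ
# only in their dated comment header compare as identical once comment and
# blank lines are stripped, which is why the changing header cannot trigger
# a falsely detected layout change.
cat > /tmp/layout1.conf <<'EOF'
# Disk layout dated 20191211160659 (YYYYmmddHHMMSS)
disk /dev/sda 1000204886016 gpt
EOF
cat > /tmp/layout2.conf <<'EOF'
# Disk layout dated 20191212093000 (YYYYmmddHHMMSS)
disk /dev/sda 1000204886016 gpt
EOF
# Strip comment and blank lines, then diff; identical layouts produce no output:
if diff <(grep -Ev '^[[:space:]]*(#|$)' /tmp/layout1.conf) \
        <(grep -Ev '^[[:space:]]*(#|$)' /tmp/layout2.conf) ; then
    echo "layouts match"
fi
```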

jsmeix (Member, Author) commented Dec 11, 2019

I did a "rear recover" test on my laptop
and from what one can see on the terminal all looks OK
(excerpts):

Welcome to Relax-and-Recover. Run "rear recover" to restore your system !

RESCUE linux-88cr:~ # lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 931.5G  0 disk 
|-sda1   8:1    0   500M  0 part 
|-sda2   8:2    0   100M  0 part 
|-sda3   8:3    0    20G  0 part 
|-sda4   8:4    0   400G  0 part 
`-sda5   8:5    0   400G  0 part 
sdb      8:16   0 465.8G  0 disk 
`-sdb1   8:17   0  46.6G  0 part 
sr0     11:0    1  1024M  0 rom  

RESCUE linux-88cr:~ # cat ./rear.github.master/var/lib/rear/layout/disklayout.conf
# Disk layout dated 20191211170123 (YYYYmmddHHMMSS)
# NAME        KNAME     PKNAME   TRAN   TYPE FSTYPE   SIZE MOUNTPOINT
# /dev/sda    /dev/sda           sata   disk        931.5G 
# |-/dev/sda1 /dev/sda1 /dev/sda        part vfat     500M 
# |-/dev/sda2 /dev/sda2 /dev/sda        part          100M 
# |-/dev/sda3 /dev/sda3 /dev/sda        part swap      20G [SWAP]
# |-/dev/sda4 /dev/sda4 /dev/sda        part ext4     400G /
# `-/dev/sda5 /dev/sda5 /dev/sda        part ext4     400G /other
# /dev/sdb    /dev/sdb           usb    disk        465.8G 
# `-/dev/sdb1 /dev/sdb1 /dev/sdb        part ext3    46.6G 
# /dev/sr0    /dev/sr0           sata   rom          1024M 
# Disk /dev/sda
# Format: disk <devname> <size(bytes)> <partition label type>
disk /dev/sda 1000204886016 gpt
# Partitions on /dev/sda
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
part /dev/sda 524288000 1048576 rear-noname boot,legacy_boot,esp /dev/sda1
part /dev/sda 104857600 525336576 rear-noname bios_grub /dev/sda2
part /dev/sda 21474836480 630194176 rear-noname swap /dev/sda3
part /dev/sda 429496729600 22105030656 rear-noname none /dev/sda4
part /dev/sda 429496729600 451601760256 rear-noname none /dev/sda5
# Disk /dev/sdb
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdb 500107859968 msdos
# Partitions on /dev/sdb
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
#part /dev/sdb 50002395136 8388608 primary boot /dev/sdb1
# Filesystems (only ext2,ext3,ext4,vfat,xfs,reiserfs,btrfs are supported).
# Format: fs <device> <mountpoint> <fstype> [uuid=<uuid>] [label=<label>] [<attributes>]
fs /dev/sda4 / ext4 uuid=cff8eaf4-2369-439b-8ef2-620dd515d767 label= blocksize=4096 reserved_blocks=5% max_mounts=-1 check_interval=0d bytes_per_inode=16384 default_mount_options=user_xattr,acl options=rw,relatime,data=ordered
fs /dev/sda5 /other ext4 uuid=2aa61137-5fc3-4fca-8b8e-959bc4c3676d label= blocksize=4096 reserved_blocks=5% max_mounts=-1 check_interval=0d bytes_per_inode=16384 default_mount_options=user_xattr,acl options=rw,relatime,data=ordered
# Swap partitions or swap files
# Format: swap <filename> uuid=<uuid> label=<label>
swap /dev/sda3 uuid=6d4a07be-f9d7-4e56-a141-75758754e822 label=

RESCUE linux-88cr:~ # rear -D recover
Relax-and-Recover 2.5 / Git
Running rear recover (PID 839)
Using log file: /var/log/rear/rear-linux-88cr.log
Running workflow recover within the ReaR rescue/recovery system
Using backup archive '/tmp/rear.UktcIFtnmwX1eU0/outputfs/rear/linux-88cr/ReaRbackup/backup.tar.gz'
Will do driver migration (recreating initramfs/initrd)
Calculating backup archive size
Backup archive size is 2.5G     /tmp/rear.UktcIFtnmwX1eU0/outputfs/rear/linux-88cr/ReaRbackup/backup.tar.gz (compressed)
Comparing disks
Device sda has expected (same) size 1000204886016 (will be used for recovery)
Disk configuration looks identical
UserInput -I DISK_LAYOUT_PROCEED_RECOVERY needed in /usr/share/rear/layout/prepare/default/250_compare_disks.sh line 148
Proceed with recovery (yes) otherwise manual disk layout configuration is enforced
(default 'yes' timeout 30 seconds)

UserInput: No real user input (empty or only spaces) - using default input
UserInput: No choices - result is 'yes'
User confirmed to proceed with recovery
Start system layout restoration.
Disk '/dev/sda': creating 'gpt' partition table
Disk '/dev/sda': creating partition number 1 with name ''sda1''
Disk '/dev/sda': creating partition number 2 with name ''sda2''
Disk '/dev/sda': creating partition number 3 with name ''sda3''
Disk '/dev/sda': creating partition number 4 with name ''sda4''
Disk '/dev/sda': creating partition number 5 with name ''sda5''
Creating filesystem of type ext4 with mount point / on /dev/sda4.
Mounting filesystem /
Creating filesystem of type ext4 with mount point /other on /dev/sda5.
Mounting filesystem /other
Creating swap on /dev/sda3
Disk layout created.
Restoring from '/tmp/rear.UktcIFtnmwX1eU0/outputfs/rear/linux-88cr/ReaRbackup/backup.tar.gz' (restore log in /var/lib/rear/restore/recover.backup.tar.gz.839.restore.log) ...
Backup restore program 'tar' started in subshell (PID=2760)
Restored 336 MiB [avg. 114846 KiB/sec] 
...
Restored 6123 MiB [avg. 87085 KiB/sec] 
OK
Restored 6176 MiB in 75 seconds [avg. 84327 KiB/sec]
Restoring finished (verify backup restore log messages in /var/lib/rear/restore/recover.backup.tar.gz.839.restore.log)
Recreating directories (with permissions) from /var/lib/rear/recovery/directories_permissions_owner_group
Migrating disk-by-id mappings in certain restored files in /mnt/local to current disk-by-id mappings ...
Migrating network configuration files according to the mapping files ...
Running mkinitrd...
Recreated initrd (/sbin/mkinitrd).
Installing GRUB2 boot loader...
Determining where to install GRUB2 (no GRUB2_INSTALL_DEVICES specified)
Found possible boot disk /dev/sda - installing GRUB2 there
Finished recovering your system. You can explore it under '/mnt/local'.
Exiting rear recover (PID 839) and its descendant processes ...
Running exit tasks
You should also rm -Rf /tmp/rear.UktcIFtnmwX1eU0

RESCUE linux-88cr:~ # reboot

BUT:
the recreated system can no longer boot from its built-in /dev/sda
(the BIOS reports that an operating system needs to be installed on the disk),
but it can be booted from the USB disk's ReaR recovery system bootloader
by selecting "Boot Local disk (hd1)" in its syslinux boot menu,
cf. #2276 (comment)
so something that is needed by that laptop's firmware (UEFI-capable
firmware that I use in legacy BIOS mode) to boot got lost or damaged
on the built-in /dev/sda by "rear recover".

jsmeix (Member, Author) commented Dec 12, 2019

With the essential help of a colleague the
"no longer boot from built-in /dev/sda after rear recover"
problem was solved:

The missing piece on /dev/sda that got destroyed by "rear recover"
was the enabled boot flag on the GPT’s protective MBR partition, cf.
https://www.gnu.org/software/parted/manual/html_node/disk_005fset.html
which reads:

Command: disk_set flag state

Changes a flag on the disk.
A flag can be either “on” or “off”.
Some or all of these flags will be available,
depending on what disk label you are using:

‘pmbr_boot’
(GPT) - this flag enables the boot flag
on the GPT’s protective MBR partition.

The disk’s flags are displayed by the print command
on the "Disk Flags:" line. They are also output as
the last field of the disk information in machine mode.

(parted) disk_set pmbr_boot on

Set the PMBR’s boot flag. 
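Scripted non-interactively, the fix boils down to a single parted call. The helper below is a hypothetical sketch (the function name and the verification grep are my assumptions, not ReaR code) and must only be run against the right device:

```shell
# Hypothetical helper (not ReaR code): re-enable the boot flag on the GPT's
# protective MBR with a non-interactive parted call and verify the result.
fix_pmbr_boot() {
    local disk="$1"    # e.g. /dev/sda - double-check before running!
    # 'disk_set pmbr_boot on' sets the boot flag on the protective MBR (GPT only):
    parted -s "$disk" disk_set pmbr_boot on || return 1
    # Afterwards the "Disk Flags:" line of 'parted print' should list pmbr_boot:
    parted -s "$disk" print | grep -q 'Disk Flags:.*pmbr_boot'
}
# Usage on the affected system: fix_pmbr_boot /dev/sda
```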

How things looked in the ReaR recovery system
where we fixed that:

Before:

RESCUE linux-88cr:~ # parted -s /dev/sda unit MiB print

Model: ATA HGST HTS541010A9 (scsi)
Disk /dev/sda: 953870MiB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 
Number  Start      End        Size       File system     Name  Flags
 1      1.00MiB    501MiB     500MiB     fat16           sda1  boot, legacy_boot, esp
 2      501MiB     601MiB     100MiB                     sda2  bios_grub
 3      601MiB     21081MiB   20480MiB   linux-swap(v1)  sda3  swap
 4      21081MiB   430681MiB  409600MiB  ext4            sda4
 5      430681MiB  840281MiB  409600MiB  ext4            sda5

After:

RESCUE linux-88cr:~ # parted -s /dev/sda unit GiB print

Model: ATA HGST HTS541010A9 (scsi)
Disk /dev/sda: 932GiB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: pmbr_boot
Number  Start    End      Size     File system     Name  Flags
 1      0.00GiB  0.49GiB  0.49GiB  fat16           sda1  boot, legacy_boot, esp
 2      0.49GiB  0.59GiB  0.10GiB                  sda2  bios_grub
 3      0.59GiB  20.6GiB  20.0GiB  linux-swap(v1)  sda3  swap
 4      20.6GiB  421GiB   400GiB   ext4            sda4
 5      421GiB   821GiB   400GiB   ext4            sda5

The difference is an empty "Disk Flags:" line, which does not work,
versus "Disk Flags: pmbr_boot", which is needed
(at least by my UEFI firmware that I use in legacy BIOS mode)
to boot from a GPT disk with GRUB2 installed "in the MBR",
as this (shortened) xxd output shows:

RESCUE linux-88cr:~ # dd if=/dev/sda of=/sda.mbr.bin bs=512 count=1

RESCUE linux-88cr:~ # xxd -g 32 -c 32 /sda.mbr.bin
00000000: eb...00  .c..............................
00000020: 00...00  ................................
00000040: 00...00  ................................
00000060: 00...bc  ...........t...pt....y|..1......
00000080: 00...72  . ..d|<.t...R..}.....|.A..U..ZRr
000000a0: 3d...5c  =..U.u7...t21..D.@.D..D.....f..\
000000c0: 7c...08  |f.\.f..`|f.\..D..p.B..r...p.v..
000000e0: cd..d1  ..s.Z........}...f....d.@f.D....
00000100: c1...5c  .......@.D.......f..f.`|f..uNf.\
00000120: 7c...88  |f1.f.4..1.f.t.;D.}7....0.......
00000140: d0...80  .Z....p..1......r...`......1....
00000160: 8e...fe  ......a.&Z|..}....}.4...}.......
00000180: 47...72  GRUB .Geom.Hard Disk.Read. Error
000001a0: 0d...00  ...........<.u..................
000001c0: 02...00  ...........mpt..................
000001e0: 00...aa  ..............................U.

By default neither xxd nor hexdump is in the ReaR recovery system.
I learned hereby that a tool to display binary files is needed.

I prefer xxd over hexdump in the recovery system
because xxd is more convenient to use,
it is part of vim which we have in the recovery system anyway,
/usr/bin/xxd is smaller than /usr/bin/hexdump
(19K versus 51K on my openSUSE Leap 15.0 system), and
/usr/bin/xxd needs fewer libraries than /usr/bin/hexdump:

# ldd /usr/bin/xxd
        linux-vdso.so.1 (0x00007fff367ec000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f55ccf72000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f55cd532000)

# ldd /usr/bin/hexdump
        linux-vdso.so.1 (0x00007fffff9fb000)
        libtinfo.so.6 => /lib64/libtinfo.so.6 (0x00007f038d077000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f038ccbd000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f038d4b2000)

so I will add xxd to the PROGS array.

…play binary files is needed in the recovery system
jsmeix (Member, Author) commented Dec 12, 2019

The fix for the
"no longer boot from built-in /dev/sda after rear recover"
problem will be done separately via
#2300

parted "disk_set pmbr_boot on" needed in "rear recover"
to boot from GPT disk on BIOS system

jsmeix (Member, Author) commented Dec 12, 2019

@rear/contributors
I would like to merge this pull request soon (ideally tomorrow)
unless there are objections.

jsmeix (Member, Author) commented Dec 13, 2019

If there are no objections I will merge it this afternoon.

jsmeix (Member, Author) commented Dec 13, 2019

@schabrolles
could you have a look here (provided you find a bit of time)?

In particular at my change of the is_multipath_path function:
https://github.com/rear/rear/pull/2299/files#diff-cc2bbe1c50edffdb8abd2b4ab63944a3

Perhaps you see something that is obviously wrong.

I would like to merge it this afternoon if there are no objections.

@jsmeix jsmeix merged commit d8946bc into rear:master Dec 13, 2019
@jsmeix jsmeix deleted the better_is_multipath_path_function_plus_lsblk_output_as_disklayout_conf_header branch December 13, 2019 14:24
jsmeix (Member, Author) commented Dec 13, 2019

@schabrolles
thank you for your prompt review!

I appreciate it because I know you currently
have almost no time for ReaR.

# so that no "multipath -l" output could clutter the log (the "multipath -l" output is irrelevant here)
# in contrast to e.g. test "$( multipath -l )" that would falsely succeed with blank output
# and the output would appear in the log in 'set -x' debugscript mode:
multipath -l | grep -q '[[:alnum:]]' || return 1
pcahyna (Member) commented on this diff:

This reintroduces a performance problem that was fixed in PR #2034. multipath -l scans all devices and the time it takes is proportional to their number, so the total time spent on all this is quadratic in the number of devices. For 1600 devices it used to take 300 s, now it takes 4445 s. See #2597 (comment), #2597 (comment).

jsmeix (Member, Author) replied:

First and foremost:
I am not at all a multipath expert
but nevertheless somehow it is mostly me
who has to deal with bad multipath code
and I get step by step more and more annoyed about that.

I neither see how I re-introduced the performance problem
nor do I see how the multipath -l performance problem
was fixed by #2034,
because as far as I can see its changes do not contain multipath -l,
but as a multipath noob I can easily miss important things here.

Furthermore we use multipath -l at other places in our code,
and as far as I know it is the only simple and reliably working
command to find out in general whether or not multipath is used at all.
So I introduced the multipath -l test to avoid that the subsequent
multipath -c, which works wrongly when multipath is not used,
can mess up everything on "normal" (i.e. non-multipath) systems.

This pull request was even approved by @schabrolles,
who was our multipath expert and maintained it so well in ReaR.
But unfortunately his employer IBM no longer gives him time
to further maintain ReaR, which suggests things are OK as is for IBM.
I mean:
it is those big systems with multipath that IBM makes money with,
but when IBM no longer cares about ReaR,
then IBM gets what they pay for.

pcahyna (Member) replied:

Hi @jsmeix, sorry for the wrong formulation. The problem is new, but the problem fixed in #2034 was also due to quadratic time complexity in the multipath code, so the result is indistinguishable (an inordinate amount of time spent creating the rescue image when there are lots of multipath devices) and for the end user (or a quality engineer) it looks like a regression.

Concerning

I am not at all a multipath expert
but nevertheless somehow it is mostly me
who has to deal with bad multipath code
and I get step by step more and more annoyed about that.

I am also not a multipath expert, but I may slowly become one, because I also have to deal with multipath issues in ReaR from time to time. So, feel free to assign multipath issues to me, or tag me for review of changes.

Concerning this particular change: do you have a test case for the problem that it fixed (#2298)? I will need to verify that any fix I am going to make does not reintroduce the problem.

schabrolles (Member) commented:

Well, I'm still on vacation ;) but I had a quick look at it. If I understand well, the main issue here is that multipath -l lists all the devices and can take some time when a huge number of devices are connected to the server. And because this function is executed for each device, it's even worse.

My 2 cents:

  1. @pcahyna, could you try on your system with a lot of devices whether the following command is quicker than multipath -l:
    dmsetup ls --target multipath

  2. The other option would be to remove this test from "the loop" over each device.
    Maybe create a first test that sets a variable like multipath_device_count=$(dmsetup ls --target multipath | wc -l).
    This variable could be used to enter specific code that treats multipath devices.
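Option 2. might look roughly like this (my untested sketch; note that dmsetup may print "No devices found" when nothing matches, which a plain `wc -l` would also count, so the sketch counts device entries instead):

```shell
# Sketch of option 2. (assumption, untested): query dmsetup once, remember
# the count, and let per-device code branch on that variable instead of
# running 'multipath -l' for every single device.
# 'dmsetup ls' may print "No devices found" when nothing matches, so count
# only lines that contain a "(major:minor)" device entry:
multipath_device_count=$(dmsetup ls --target multipath 2>/dev/null | grep -c '(' || true)
if [ "$multipath_device_count" -gt 0 ] ; then
    echo "multipath in use ($multipath_device_count devices)"
else
    echo "no multipath devices"
fi
```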

And sorry again if my contribution to ReaR was really minimal these days. I was never paid by IBM to do it; I'm a presales engineer and my main activity today is around OpenShift on Power. So it is more difficult for me to take care of ReaR. It happens in my very reduced personal time, on top of my job activity, but I also need to take care of my family and myself. Thanks for your understanding.

jsmeix (Member, Author) commented Sep 16, 2021

@pcahyna
thank you for your explanatory reply.

I have no (automated) test cases.
I only test manually (I know, I know ... ;-)
In particular #2298
never happened on my virtual machines.
For me it only happened on one specific laptop.
Also #2321
was on real hardware ("ThinkSystem SR630").
So an automated test that is run on virtual machines
would likely never have found that particular issue.

But I have an idea how to generically solve
the underlying generic issue:

The underlying generic issue is that
no multipath code should be run when multipath is not used.
I.e. on non-multipath systems all multipath code should be skipped.

To do that it is sufficient to run a test only once
to find out whether or not multipath is used
and remember the result (in a global variable)
and then only use that global variable.

So we could have a new function like
(just off the top of my head to show the idea - not at all tested):

function is_multipath_used {
    # 'multipath -l' is the only simple and reliably working command
    # to find out in general whether or not multipath is used at all.
    # But 'multipath -l' scans all devices and the time it takes is proportional
    # to their number so that time would become rather long (seconds up to minutes)
    # if 'multipath -l' was called for each one of hundreds or thousands of devices.
    # So we call 'multipath -l' only once and remember the result
    # in a global variable and then only use that global variable
    # so that is_multipath_used can be called as often as needed.
    is_true $MULTIPATH_IS_USED && return 0
    is_false $MULTIPATH_IS_USED && return 1
    # When MULTIPATH_IS_USED has neither a true nor false value set it and return accordingly.
    # Because "multipath -l" always returns zero exit code we check if it has real output via grep -q '[[:alnum:]]'
    # so that no "multipath -l" output could clutter the log (the "multipath -l" output is irrelevant here)
    # in contrast to e.g. test "$( multipath -l )" that would falsely succeed with blank output
    # and the output would appear in the log in 'set -x' debugscript mode:
    if multipath -l | grep -q '[[:alnum:]]' ; then
        MULTIPATH_IS_USED='yes'
        return 0
    else
        MULTIPATH_IS_USED='no'
        return 1
    fi
}

plus

MULTIPATH_IS_USED=''

in default.conf together with an explanatory comment
to provide "final power to the user".

Caution!
I think during "rear recover" multipath -l may not report the truth
unless layout/prepare/GNU/Linux/210_load_multipath.sh was run,
so during "rear recover" the function is_multipath_used must not be
called before layout/prepare/GNU/Linux/210_load_multipath.sh
was run, which does not happen at the very beginning of "rear recover",
so all scripts before layout/prepare/GNU/Linux/210_load_multipath.sh
must not contain multipath code, because all multipath code should
be skipped on non-multipath systems by using is_multipath_used.

pcahyna (Member) replied:

Hi @schabrolles , my ideas for making the test quicker and more reliable are described in #2597 (comment) (test whether lsblk --nodeps -o fstype /dev/sd... shows mpath_member ) and #2597 (comment) (test whether there is a /sys/block/*/holders/* symlink pointing to the multipath device). I suppose dmsetup would work as well.
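The two per-device tests described there could be sketched together in one function (my untested sketch; the holders check is a simplification, since /sys/block/<dev>/holders can also be non-empty for LVM or crypt stacks):

```shell
# Untested sketch of the two quicker per-device tests described above:
is_multipath_path_sketch() {
    local dev="$1"    # e.g. /dev/sda
    local name="${dev##*/}"
    # Test 1: lsblk reports fstype 'mpath_member' for multipath path devices:
    lsblk --nodeps --noheadings -o FSTYPE "$dev" 2>/dev/null \
        | grep -q 'mpath_member' && return 0
    # Test 2 (simplified): a non-empty /sys/block/<dev>/holders directory means
    # some device-mapper device sits on top (could also be LVM/crypt, though):
    [ -n "$(ls -A "/sys/block/$name/holders" 2>/dev/null)" ] && return 0
    return 1
}
```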

jsmeix (Member, Author) replied:

@schabrolles
happy to hear from you again!

No need at all to say sorry that you can no longer maintain ReaR.
It is IBM who should say sorry that they do not support ReaR.
SUSE supports ReaR.
Red Hat supports ReaR.
But IBM itself does not (directly) support it.
Same for HP.
Same for all other "big server" manufacturers.

I didn't see your posting because I was busy writing mine.
Your proposal "2." is essentially what I suggest as well.

jsmeix (Member, Author) replied:

@schabrolles
of course you and your family have a much higher priority
than any piece of software could ever have
(living humans versus "dead" programs).
Of course I don't expect any further work on ReaR from you.
I very much appreciate all that you contributed.
So just enjoy your vacation - i.e. relax and recover!
I wish you and your family all the very best.

jsmeix (Member, Author) commented Sep 16, 2021

@pcahyna
if you think something like an is_multipath_used function is useful,
it could of course implement any test that is appropriate;
perhaps even several tests could be needed
to make it work also on old systems where e.g.
lsblk --nodeps -o fstype /dev/sd... may not yet work,
so e.g. multipath -l might be used as a final fallback
(or not at all when other tests work better in general).

jsmeix (Member, Author) commented:

@pcahyna
please have a look at #2708

In particular if you know a better method than multipath -l
which we can use in the new function is_multipath_used.

pcahyna added a commit to pcahyna/rear that referenced this pull request Feb 17, 2023
multipath -l is very slow with many multipath devices. As it will be
called for every multipath device, it leads to quadratic time complexity
in the number of multipath devices. For thousands of devices, ReaR can
take hours to scan and exclude them. We therefore have to comment
multipath -l out, as it is a huge performance regression, and find
another solution to bug rear#2298.