
Broken 'part' entries in disklayout.conf in case of 'unknown' partition table #2801

Closed
tpatel80 opened this issue May 9, 2022 · 19 comments

@tpatel80

tpatel80 commented May 9, 2022

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version ("/usr/sbin/rear -V"):
    Relax-and-Recover 2.6 / Git

  • OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):

OS_VENDOR=RedHatEnterpriseServer
OS_VERSION=8.2

  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):
KEEP_BUILD_DIR=""
OUTPUT=ISO
BACKUP=NETFS
BACKUP_PROG=tar
BACKUP_TYPE=incremental
BACKUP_PROG_OPTIONS+=( --anchored --xattrs --xattrs-include=security.capability )
FULLBACKUPDAY="Mon"
BACKUP_PROG_EXCLUDE=('/var/log/*'  '/fs/site*' '/opt/IBM/zimon/data*' '/var/log/messages*' '/var/crash/*' '/var/tmp/*')
BACKUP_OPTIONS="nfsvers=4,nolock"
BACKUP_URL=nfs://192.168.1.101/backup/
USE_STATIC_NETWORKING=Y
REAR_INITRD_COMPRESSION="lzma"
REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" parted sfdisk )
PROGS=( "${PROGS[@]}" partprobe fdisk cfdisk mkofboot ofpath ybin yabootconfig bootlist pseries_platform nvram ofpathname bc agetty )

test "${FIRMWARE_FILES[*]}" || FIRMWARE_FILES=( 'no' )

  • Hardware vendor/product (PC or PowerNV BareMetal or ARM) or VM (KVM guest or PowerVM LPAR):
    IBM

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):
    PPC64LE

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):
    Petitboot

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):
    local disk

  • Storage layout ("lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,LABEL,SIZE,MOUNTPOINT"):

NAME                               KNAME     PKNAME    TRAN TYPE FSTYPE      LABEL   SIZE MOUNTPOINT
/dev/sda                           /dev/sda                 disk                   893.8G 
|-/dev/sda1                        /dev/sda1 /dev/sda       part                       4M 
|-/dev/sda2                        /dev/sda2 /dev/sda       part ext4                500M /boot
`-/dev/sda3                        /dev/sda3 /dev/sda       part LVM2_member       893.3G 
  |-/dev/mapper/VolGroup00-lv_root /dev/dm-0 /dev/sda3      lvm  ext4                 50G /
  `-/dev/mapper/VolGroup00-lv_swap /dev/dm-1 /dev/sda3      lvm  swap                1.5G [SWAP]

  • Description of the issue (ideally so that others can reproduce it):

ReaR fails with the following error:

ERROR: 
====================
BUG in /usr/local/src/rear/usr/share/rear/layout/save/default/950_verify_disklayout_file.sh line 254:
'Entries in /usr/local/src/rear/var/lib/rear/layout/disklayout.conf are broken ('rear recover' would fail)'
--------------------
  • Workaround, if any:

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):

@tpatel80
Author

tpatel80 commented May 9, 2022

rear.log

@jsmeix
Member

jsmeix commented May 10, 2022

@tpatel80
your https://github.com/rear/rear/files/8654468/rear.log
contains (excerpts):

2022-05-09 18:31:50.466037669 Verifying that the 'part' entries for /dev/sda in /usr/local/src/rear/var/lib/rear/layout/disklayout.conf are correct
++ partitions=()
++ read keyword dummy part_size part_start part_name part_flags part_dev junk
+++ grep '^part /dev/sda ' /usr/local/src/rear/var/lib/rear/layout/disklayout.conf
++ test -b ''
++ broken_part_errors+=("$part_dev is not a block device")
...
++ read keyword dummy part_size part_start part_name part_flags part_dev junk
++ test -b ''
++ broken_part_errors+=("$part_dev is not a block device")
...
++ read keyword dummy part_size part_start part_name part_flags part_dev junk
++ test -b ''
++ broken_part_errors+=("$part_dev is not a block device")

The matching code in
usr/share/rear/layout/save/default/950_verify_disklayout_file.sh
is (excerpts):

    while read keyword dummy part_size part_start part_name part_flags part_dev junk ; do
        test -b "$part_dev" || broken_part_errors+=( "$part_dev is not a block device" )

This shows that

read keyword dummy part_size part_start part_name part_flags part_dev junk

does not get any value for part_dev from the

grep '^part /dev/sda ' /usr/local/src/rear/var/lib/rear/layout/disklayout.conf

output.

So the error is correct because, according to the 'lsblk' output
in your #2801 (comment)
there should be

part /dev/sda <size> <start> <type|name> <flags> /dev/sda1
part /dev/sda <size> <start> <type|name> <flags> /dev/sda2
part /dev/sda <size> <start> <type|name> <flags> /dev/sda3

entries in your
/usr/local/src/rear/var/lib/rear/layout/disklayout.conf

Next step is to find out why read does not get part_dev values:

Your https://github.com/rear/rear/files/8654468/rear.log
contains (excerpts):

+ source /usr/local/src/rear/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh
...
+++ parted -s /dev/sda print
...
Error: Can't have a partition outside the disk!
++ disktype=unknown
...
++ parted -m -s /dev/sda print
Error: Can't have a partition outside the disk!
...
++ echo 'part /dev/sda 4194304 1048576 none  /dev/sda1'
...
++ echo 'part /dev/sda 524288000 5242880 none  /dev/sda2'
...
++ echo 'part /dev/sda 959127224320 529530880 none  /dev/sda3'
.
.
.
+ source /usr/local/src/rear/usr/share/rear/layout/save/GNU/Linux/220_lvm_layout.sh
...
++ lvm pvdisplay -c
...
  WARNING: Device /dev/sda3 has size of 1873295360 sectors which is smaller than corresponding PV size of 1874350080 sectors. Was device resized?
  WARNING: One or more devices used as PVs in VG VolGroup00 have changed sizes.

The

++ echo 'part /dev/sda 4194304 1048576 none  /dev/sda1'
++ echo 'part /dev/sda 524288000 5242880 none  /dev/sda2'
++ echo 'part /dev/sda 959127224320 529530880 none  /dev/sda3'

show what is wrong:
A 'part' entry in disklayout.conf has the syntax

part <device> <size> <start> <type|name> <flags> /dev/<partition>

see the section
"Disk layout file syntax" in
https://github.com/rear/rear/blob/master/doc/user-guide/06-layout-configuration.adoc

For example, on my home office laptop I have

part /dev/sda 8388608 1048576 rear-noname bios_grub /dev/sda1
part /dev/sda 4294967296 9437184 rear-noname swap /dev/sda2
part /dev/sda 214748364800 4304404480 rear-noname legacy_boot /dev/sda3
part /dev/sda 107374182400 219052769280 rear-noname none /dev/sda4

In contrast you have

part /dev/sda 4194304 1048576 none  /dev/sda1
part /dev/sda 524288000 5242880 none  /dev/sda2
part /dev/sda 959127224320 529530880 none  /dev/sda3

so one of <type|name> or <flags> is missing.
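A small bash demonstration (not ReaR code) makes the consequence visible:
it feeds one broken line and one well-formed line from above into the same
read that 950_verify_disklayout_file.sh uses. Because read with the default
IFS treats a run of blanks as a single separator, the empty field disappears,
the following values move one position to the left, /dev/sda1 lands in
part_flags, and part_dev stays empty, which matches the test -b '' in the log.
The function name show_fields is made up for this example.

```bash
#!/bin/bash
# Demonstration only: split a 'part' line exactly like the read in
# 950_verify_disklayout_file.sh does and print where the values land.
show_fields() {
    local keyword dummy part_size part_start part_name part_flags part_dev junk
    read keyword dummy part_size part_start part_name part_flags part_dev junk <<< "$1"
    echo "part_name='$part_name' part_flags='$part_flags' part_dev='$part_dev'"
}

# Broken entry from this issue: one value is empty, the double blank
# collapses, so /dev/sda1 ends up in part_flags and part_dev is empty.
show_fields 'part /dev/sda 4194304 1048576 none  /dev/sda1'

# Well-formed entry: all seven values are present.
show_fields 'part /dev/sda 8388608 1048576 rear-noname bios_grub /dev/sda1'
```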

Next step is to find out why <type|name> or <flags> is missing:
Your https://github.com/rear/rear/files/8654468/rear.log
contains (excerpts):

+ source /usr/local/src/rear/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh
.
.
.
++ parted -m -s /dev/sda print
Error: Can't have a partition outside the disk!
+++ grep '^/' /var/tmp/rear.3lGTm845SuRAkd2/tmp/parted
+++ cut -d : -f 6
++ disk_label=unknown
++ cp /var/tmp/rear.3lGTm845SuRAkd2/tmp/partitions /var/tmp/rear.3lGTm845SuRAkd2/tmp/partitions-data
++ declare type
++ [[ unknown = \m\s\d\o\s ]]
++ [[ unknown = \g\p\t ]]
++ [[ unknown = \g\p\t\_\s\y\n\c\_\m\b\r ]]
++ [[ unknown = \d\a\s\d ]]
++ declare flags flaglist

The matching code in
usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh
is (excerpts)

        parted -m -s $device print > $TMP_DIR/parted
        disk_label=$(grep ^/ $TMP_DIR/parted | cut -d ":" -f "6")
...
    if [[ "$disk_label" = "msdos" ]] ; then
...
    if [[ "$disk_label" = "gpt" || "$disk_label" = "gpt_sync_mbr" || "$disk_label" = "dasd" ]] ; then

Your disk_label=unknown does not match
any of "msdos" "gpt" "gpt_sync_mbr" or "dasd",
so the partition type is not set,
which leaves it empty in disklayout.conf.
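The disk_label extraction can be replayed on the two
'parted -m -s /dev/sda print' outputs that were posted later in this issue
(before and after the partition table was corrected). This bash sketch pipes
each saved output through the same grep ^/ | cut -d ":" -f "6" as
200_partition_layout.sh. Only stdout is included because that is what goes
into $TMP_DIR/parted; judging from the log, the "Can't have a partition
outside the disk!" error went to stderr. The helper name disk_label_of is
hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: field 6 of the disk line (the only line that
# starts with '/') in 'parted -m' output is the partition table type.
disk_label_of() {
    grep '^/' | cut -d ':' -f 6
}

# stdout of 'parted -m -s /dev/sda print' before the fix
before='BYT;
/dev/sda:960GB:scsi:512:4096:unknown:AVAGO MR9361-8i:;'

# stdout after the end of /dev/sda3 was corrected
after='BYT;
/dev/sda:960GB:scsi:512:4096:msdos:AVAGO MR9361-8i:;
1:1049kB:5243kB:4194kB:::boot, prep;
2:5243kB:530MB:524MB:ext4::;
3:530MB:960GB:959GB:::lvm;'

echo "before: $(disk_label_of <<< "$before")"
echo "after:  $(disk_label_of <<< "$after")"
```

Before the fix parted reports no usable partition table, so the code sees
'unknown' and none of the msdos/gpt/gpt_sync_mbr/dasd branches runs.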

I guess your disk label is actually one of
"msdos" "gpt" "gpt_sync_mbr" or "dasd"
but the current code in 200_partition_layout.sh
does not work because it does not sufficiently
"Try hard to care about possible errors", cf.
https://github.com/rear/rear/wiki/Coding-Style
because it blindly proceeds in case of errors like

++ parted -m -s /dev/sda print
Error: Can't have a partition outside the disk!

and I assume in your case the

parted -m -s /dev/sda print

output is not what the current code in 200_partition_layout.sh
normally needs and then things arbitrarily fail in ReaR.

@tpatel80
you should fix your partitioning so that parted
does not show errors.

I will have a look at
usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh
how to make it behave more reliably and failsafe,
in particular error out directly therein when things failed
instead of error out later in 950_verify_disklayout_file.sh
(at least I had added 950_verify_disklayout_file.sh
as generic safety net against our many old code places
that blindly proceed instead of caring about errors).

@jsmeix jsmeix self-assigned this May 10, 2022
@jsmeix jsmeix added enhancement Adaptions and new features support / question labels May 10, 2022
@jsmeix jsmeix added this to the ReaR v2.7 milestone May 10, 2022
jsmeix added a commit that referenced this issue May 10, 2022
In layout/save/GNU/Linux/200_partition_layout.sh
ensure the partition name/type entry in disklayout.conf
is always set (and percent encoded if needed)
at least it is set to the fallback value 'rear-noname'
regardless of the 'disk_label' value, cf.
#2801 (comment)
jsmeix added a commit that referenced this issue May 10, 2022
In layout/save/GNU/Linux/200_partition_layout.sh
ensure $disk_label is one of the supported partition tables
cf. #2801 (comment)
@jsmeix
Member

jsmeix commented May 10, 2022

#2802
intended to avoid the difficulty of detecting
the root cause of an error like in this issue
but my attempt made things even worse.

jsmeix added a commit that referenced this issue May 10, 2022
In layout/save/GNU/Linux/200_partition_layout.sh
ensure $disk_label is one of the supported partition tables, cf.
#2801 (comment)
@jsmeix
Member

jsmeix commented May 10, 2022

Now its successor #2803
tries to avoid the difficulty of detecting the
root cause of an error like in this issue here
by ensuring that a supported partition table is found
and erroring out if not.
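The idea of #2803 can be sketched in bash as an explicit whitelist of the
partition tables that the following code handles ('msdos' 'gpt'
'gpt_sync_mbr' 'dasd', as listed in the commit message). This is an
illustration, not the merged code: check_disk_label is a hypothetical name,
and inside ReaR one would call its Error function instead of returning a
status.

```bash
#!/bin/bash
# Sketch in the spirit of #2803 (not the merged code): accept only the
# partition table types that 200_partition_layout.sh knows how to handle.
check_disk_label() {
    local device="$1" disk_label="$2"
    case "$disk_label" in
        (msdos|gpt|gpt_sync_mbr|dasd)
            return 0 ;;
        (*)
            echo "Unsupported partition table '$disk_label' on $device (must be one of 'msdos' 'gpt' 'gpt_sync_mbr' 'dasd')" >&2
            return 1 ;;
    esac
}

check_disk_label /dev/sda msdos && echo "msdos accepted"
check_disk_label /dev/sda unknown || echo "unknown rejected"
```

Failing at this point reports the real cause (an unknown partition table)
instead of the later, indirect "is not a block device" errors.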

@jsmeix jsmeix changed the title Entries in disklayout.conf are broken ('rear recover' would fail) Broken 'part' entries in disklayout.conf in case of 'unknown' partition table May 10, 2022
jsmeix added a commit that referenced this issue May 10, 2022
…conf

In layout/save/GNU/Linux/200_partition_layout.sh
ensure syntactically correct 'disk' and 'part' entries, cf.
#2801 (comment)
@jsmeix
Member

jsmeix commented May 10, 2022

#2804
tries to ensure syntactically correct 'disk' and 'part' entries,
which is also intended to avoid the difficulty of detecting
the root cause of an error like in this issue here.
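A check in the spirit of #2804 ("each value must exist and each value must
be a single non-blank word") could look like the following bash sketch. It
is not the merged code; all_single_words and the sample values are
hypothetical, modeled on the broken sda1 entry with one value left empty.
Checking the values before the echo that writes the entry makes the script
stop exactly where the bad value appears.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if every argument is a non-empty
# single word, which is what each value in a 'disk' or 'part' entry must be.
all_single_words() {
    local value
    for value in "$@" ; do
        # An empty value would vanish when the entry is read back.
        test -n "$value" || return 1
        # A value containing blanks would be split into several fields.
        [[ $value != *[[:space:]]* ]] || return 1
    done
    return 0
}

# Sample values modeled on the broken entry (one value is empty).
device=/dev/sda size=4194304 start=1048576 name="" flags=none partition=/dev/sda1
if all_single_words "$device" "$size" "$start" "$name" "$flags" "$partition" ; then
    echo "part $device $size $start $name $flags $partition"
else
    echo "Refusing to write an invalid 'part' entry for $partition" >&2
fi
```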

@pcahyna
Member

pcahyna commented May 10, 2022

I believe it is needed to know how to reproduce the issue before merging any fix (like #2804). @tpatel80 how is the disk partitioned? I suppose your machine is PowerNV (BareMetal)?

@tpatel80
Author

@pcahyna the machine is a baremetal IBM Power System IC922 (9183-22X) factory installed with Redhat 8.2.

@pcahyna
Member

pcahyna commented May 10, 2022

@tpatel80 I will try (I have used ReaR on POWER bare metal and I have not seen this kind of issue, but it was the version shipped with RHEL, not the development version from Git)

@jsmeix
Member

jsmeix commented May 10, 2022

Neither #2803
nor #2804
intends to fix the initially reported issue here,
i.e. neither makes correct entries in disklayout.conf.

Both are only meant to detect early, i.e. directly
in the code where the entries are generated in
layout/save/GNU/Linux/200_partition_layout.sh,
when invalid entries would be generated
(regardless of what the actual reason is).

I.e. #2803
and #2804
are only meant to better care about possible errors
and exit directly where the error happened
instead of blindly proceeding and letting things fail
at some later place (950_verify_disklayout_file.sh).

@pcahyna
Member

pcahyna commented May 10, 2022

@tpatel80 I am not able to reproduce the issue on RHEL 8.2 / POWER bare metal. What does

parted -m -s /dev/sda print

and

parted -s /dev/sda print

print for you?

The message WARNING: Device /dev/sda3 has size of 1873295360 sectors which is smaller than corresponding PV size of 1874350080 sectors. Was device resized? indicates that there is something wrong with partitioning on your machine.

@pcahyna
Member

pcahyna commented May 10, 2022

Also please add the output of fdisk -l /dev/sda

@jsmeix
Member

jsmeix commented May 10, 2022

Only a side note FYI:
When rear is run in debug mode KEEP_BUILD_DIR is true,
so [/var]/tmp/rear.XXXXXX/tmp/parted, which contains
the parted -m -s output, should be kept.

@tpatel80
Author

I've attempted to correct the end value for sda3 to 1874329599. Will report back if this helps in any manner.

**sh-4.4# parted -m -s /dev/sda print**
Error: Can't have a partition outside the disk!
BYT;
/dev/sda:960GB:scsi:512:4096:unknown:AVAGO MR9361-8i:;

**sh-4.4# parted -s /dev/sda print**
Error: Can't have a partition outside the disk!
Model: AVAGO MR9361-8i (scsi)
Disk /dev/sda: 960GB
Sector size (logical/physical): 512B/4096B
Partition Table: unknown
Disk Flags: 

**sh-4.4# fdisk -l /dev/sda**
Disk /dev/sda: 893.8 GiB, 959656755200 bytes, **1874329600** sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 262144 bytes / 262144 bytes
Disklabel type: dos
Disk identifier: 0x3db2a74e

Device     Boot   Start        End    Sectors   Size Id Type
/dev/sda1  *       2048      10239       8192     4M 41 PPC PReP Boot
/dev/sda2         10240    1034239    1024000   500M 83 Linux
/dev/sda3       1034240 1875384319 **1874350080** 893.8G 8e Linux LVM


**sh-4.4# fdisk -l -u /dev/sda**
Disk /dev/sda: 893.8 GiB, 959656755200 bytes, 1874329600 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 262144 bytes / 262144 bytes
Disklabel type: dos
Disk identifier: 0x3db2a74e

Device     Boot   Start        End    Sectors   Size Id Type
/dev/sda1  *       2048      10239       8192     4M 41 PPC PReP Boot
/dev/sda2         10240    1034239    1024000   500M 83 Linux
/dev/sda3       1034240 1875384319 1874350080 893.8G 8e Linux LVM
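The fdisk numbers above show the problem directly. Assuming 512-byte
sectors as fdisk reports, the last sector of the disk is 1874329599, while
sda3 ends at 1875384319. This plain bash arithmetic (variable names are
illustrative) computes how far the partition reaches past the end of the
disk; the last valid end sector it prints is the value tpatel80 used for the
correction mentioned above.

```bash
#!/bin/bash
# Numbers from 'fdisk -l /dev/sda' above, in 512-byte sectors.
disk_sectors=1874329600                        # "... 1874329600 sectors"
last_disk_sector=$(( disk_sectors - 1 ))       # sectors are numbered from 0

sda3_start=1034240
sda3_sectors=1874350080                        # size in the partition table
sda3_end=$(( sda3_start + sda3_sectors - 1 ))  # 1875384319, as fdisk shows

if (( sda3_end > last_disk_sector )) ; then
    overrun=$(( sda3_end - last_disk_sector ))
    echo "sda3 ends $overrun sectors ($(( overrun * 512 / 1024 / 1024 )) MiB) past the end of the disk"
    # The largest end sector that still lies on the disk.
    echo "last valid end sector: $last_disk_sector"
fi
```

It reports an overrun of 1054720 sectors (515 MiB), which is why parted
refuses to read the partition table and prints "Can't have a partition
outside the disk!".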

@tpatel80
Author

tpatel80 commented May 10, 2022

Updating END value of sda3 seems to have done the trick. Running mkbackup now.

[Tue May 10 14:26:31 root@hpnls4vfs5:~ ] # parted -m -s /dev/sda print
BYT;
/dev/sda:960GB:scsi:512:4096:msdos:AVAGO MR9361-8i:;
1:1049kB:5243kB:4194kB:::boot, prep;
2:5243kB:530MB:524MB:ext4::;
3:530MB:960GB:959GB:::lvm;

[Tue May 10 14:26:40 root@hpnls4vfs5:~ ] # parted -s /dev/sda print
Model: AVAGO MR9361-8i (scsi)
Disk /dev/sda: 960GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos
Disk Flags: 

Number  Start   End     Size    Type     File system  Flags
 1      1049kB  5243kB  4194kB  primary               boot, prep
 2      5243kB  530MB   524MB   primary  ext4
 3      530MB   960GB   959GB   primary               lvm

@pcahyna
Member

pcahyna commented May 11, 2022

@tpatel80 I believe you still have a problem with LVM, it thinks that the PV is longer than the partition that it sits on. Not sure how serious it is or how to correct it...
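The sizes in the LVM warning from rear.log can be checked the same way. With
the corrected end sector 1874329599, /dev/sda3 has exactly the 1873295360
sectors that the warning reports for the device, while the PV still claims
the 1874350080 sectors that sda3 had in the old, too large partition table.
This suggests (an inference from the numbers only) that the PV was created
while sda3 still had its old size, so correcting the partition table did not
change the PV. Variable names are illustrative.

```bash
#!/bin/bash
# Sizes in 512-byte sectors: PV size from the LVM warning in rear.log,
# partition bounds from the corrected partition table.
pv_sectors=1874350080            # "corresponding PV size of 1874350080 sectors"
sda3_start=1034240
sda3_end=1874329599              # corrected end sector of /dev/sda3
sda3_sectors=$(( sda3_end - sda3_start + 1 ))

echo "partition: $sda3_sectors sectors, PV: $pv_sectors sectors"
if (( pv_sectors > sda3_sectors )) ; then
    missing=$(( pv_sectors - sda3_sectors ))
    echo "PV claims $missing sectors ($(( missing * 512 / 1024 / 1024 )) MiB) beyond its partition"
fi
```

The gap of 1054720 sectors (515 MiB) is the same amount by which sda3
previously extended past the end of the disk.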

@jsmeix
Member

jsmeix commented May 11, 2022

@tpatel80
only out of curiosity (I am not a Red Hat user) regarding your
#2801 (comment)

the machine is a baremetal IBM Power System IC922 (9183-22X)
factory installed with Redhat 8.2.

Because you have parted errors and LVM warnings
about partition sizes and PV sizes on your original system
(so it cannot be ReaR that recreated something wrong)
I wonder if your original system is really
"factory installed with Redhat 8.2"?

I assume a Red Hat installation does not result in
wrong partitioning and/or a wrong LVM PV setup
(otherwise I think Red Hat would have noticed already)
so I guess someone or something had somehow changed
the pristine partitioning on your original system.

Alternatively a Red Hat installation may do
something wrong only on special hardware like
your baremetal IBM Power System IC922 (9183-22X).

FYI:
This would not be the first case we had here at ReaR
where disaster recovery setup with ReaR reveals
a broken setup on the original system.
In particular we had "fun" with certain special
third-party software that a bit messed up things
in the original system.

@tpatel80
Author

@jsmeix I suspect the image used to clone these systems had different disk size. We've contacted IBM for details.

@jsmeix
Member

jsmeix commented May 13, 2022

@tpatel80
so the installation on this particular machine
was not an actual Red Hat installation
but an imaged clone of a Red Hat installation
and ReaR revealed that the likely 100% clone
(e.g. like a simple 'dd' whole disk clone)
does not 100% match the clone's target machine hardware
(e.g. the disk image is a bit bigger than the target disk).

In general, regarding slightly different hardware disk sizes
(in contrast, virtual disk sizes can be made exactly the same)
you may have a look at the descriptions of
AUTORESIZE_PARTITIONS
AUTORESIZE_EXCLUDE_PARTITIONS
AUTOSHRINK_DISK_SIZE_LIMIT_PERCENTAGE
AUTOINCREASE_DISK_SIZE_THRESHOLD_PERCENTAGE
in usr/share/rear/conf/default.conf
which explain what ReaR would do during "rear recover" to deal with
slightly different disk sizes on replacement hardware.

To keep issues caused by slightly different disk sizes
out of the way I would recommend not using a disk
up to its very end but leaving a reasonable amount of
disk space unused at the end, so that the somewhat smaller
actually used disk space fits within the smallest
hardware disk size among one kind of disks that one uses
(e.g. among all 1TB disks that one uses).
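As a rough illustration of that recommendation, the bash sketch below
computes an end sector for the last partition that keeps a reserve unused at
the end of the disk and stays aligned to 1 MiB. The 1 percent reserve, the
1 MiB alignment, and the function name last_partition_end are assumptions
for the example, not values recommended by ReaR; the disk size is the
1874329600-sector disk from this issue.

```bash
#!/bin/bash
# Hypothetical helper: end sector (inclusive) for the last partition that
# leaves reserve_percent of the disk unused and is 1 MiB aligned.
last_partition_end() {
    local disk_sectors=$1 reserve_percent=$2
    local align=2048                                  # 1 MiB in 512-byte sectors
    local usable=$(( disk_sectors - disk_sectors * reserve_percent / 100 ))
    # Round down to an alignment boundary; the end sector is inclusive.
    echo $(( usable / align * align - 1 ))
}

# Smallest disk of one kind, here the disk from this issue.
last_partition_end 1874329600 1
```

With a 1 percent reserve on this disk the last partition would end at
sector 1855586303, leaving about 9.6 GB unused.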

This belongs to the general topic
"Prepare for disaster recovery from the very beginning" in
https://en.opensuse.org/SDB:Disaster_Recovery

I think I will add a subsection there about recommended disk usage.

pcahyna pushed a commit to pcahyna/rear that referenced this issue May 16, 2022
In layout/save/GNU/Linux/200_partition_layout.sh
ensure $disk_label is one of the supported partition tables, cf.
rear#2801 (comment)
pcahyna pushed a commit to pcahyna/rear that referenced this issue May 16, 2022
…conf

In layout/save/GNU/Linux/200_partition_layout.sh
ensure syntactically correct 'disk' and 'part' entries, cf.
rear#2801 (comment)
jsmeix added a commit that referenced this issue May 18, 2022
…able

In layout/save/GNU/Linux/200_partition_layout.sh
ensure $disk_label is one of the supported partition tables
(i.e. one of 'msdos' 'gpt' 'gpt_sync_mbr' 'dasd'),
see the last part about "error out directly ... when things failed" in
#2801 (comment)
jsmeix added a commit that referenced this issue May 18, 2022
In layout/save/GNU/Linux/200_partition_layout.sh
ensure syntactically correct 'disk' and 'part' entries in disklayout.conf
(each value must exist and each value must be a single non-blank word),
see the last part about "error out directly ... when things failed" in
#2801 (comment)
@jsmeix
Member

jsmeix commented May 18, 2022

With #2803
and #2804
merged I think this issue is sufficiently solved.

@jsmeix jsmeix closed this as completed May 18, 2022