Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rear Recover finishes but system will not boot. SLES12 SP3 with BRTFS/ppc64le #2094

Closed
kkoehle opened this issue Mar 21, 2019 · 10 comments
Closed

Comments

@kkoehle
Copy link

kkoehle commented Mar 21, 2019

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version: Relax-and-Recover 2.4 / Git

*OS version: SUSE Linux Enterprise Server 12 SP3

  • ReaR configuration files:
OUTPUT=ISO
BACKUP=NETFS
BOOT_OVER_SAN=y
AUTOEXCLUDE_MULTIPATH=n
BACKUP_URL=nfs://9.5.110.188/nfs
REAR_INITRD_COMPRESSION=lzma
REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" resize snapper chattr lsattr )
COPY_AS_IS=( "${COPY_AS_IS[@]}" /usr/lib/snapper/installation-helper /etc/snapper/config-templates/default )
BACKUP_PROG_INCLUDE=(/srv /boot/grub2/powerpc-ieee1275 /var/log /var/opt /tmp /var/tmp /var/cache /opt /usr/local /var/spool /var/lib/libvirt/images /home)
EXCLUDE_MOUNTPOINTS=( /hana/data /hana/log /hana/shared)
  • Hardware: IBM PoverVM LPAR s824

  • System architecture: PPC64LE

  • Firmware: BIOS and GRUB2

  • Storage: SAN NPIV IBM v7000

  • Description of the issue:
    Rear recover finishes without error, but LPAR will not boot.

  • Workaround, if any: none.

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):

rear.txt

@schabrolles
Copy link
Member

@kkoehle,

I tried rear from the git repo on a sles12 without any issue (on KVM not PowerVM)

I notice that you don't have POST_RECOVERY_SCRIPT in your rear local.conf file.

Here is the part I usually add to my SLES12 servers.

## SLES12
#BACKUP_OPTIONS="nfsvers=4,nolock"
REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" snapper chattr lsattr )
COPY_AS_IS=( "${COPY_AS_IS[@]}" /usr/lib/snapper/installation-helper /etc/snapper/config-templates/default )

for subvol in $(findmnt -n -r -t btrfs | cut -d ' ' -f 1 | grep -v '^/$' | egrep -v 'snapshots|crash') ; do
        BACKUP_PROG_INCLUDE=( "${BACKUP_PROG_INCLUDE[@]}" "$subvol" )
done

POST_RECOVERY_SCRIPT=( 'if snapper --no-dbus -r $TARGET_FS_ROOT get-config | grep -q "^QGROUP.*[0-9]/[0-9]" ; then snapper --no-dbus -r $TARGET_FS_ROOT set-config QGROUP= ; snapper --no-dbus -r $TARGET_FS_ROOT setup-quota && echo snapper setup-quota done || echo snapper setup-quota failed ; else echo snapper setup-quota not used ; fi' )

@kkoehle
Copy link
Author

kkoehle commented Mar 21, 2019

@schabrolles

I added the POST_RECOVERY_SCRIPT and no change. I don't imagine it is a KVM vs PowerVM thing that determines if a disk shows bootable. Does your SLES12 install use LVM and BTRFS for the boot disk?:

 PowerPC Firmware
 Version FW860.11 ### (SV860_063)
 SMS (c) Copyright IBM Corp. 2000,2016 All rights reserved.
-------------------------------------------------------------------------------
 Current Boot Sequence
 1.   Device is not bootable or removed.
 2.    None
 3.    None
 4.    None
 5.    None

What can I do to make PowerVM think it is bootable? Why is there no error message saying the bootloader didn't work? Is there anything I should check for?

@schabrolles
Copy link
Member

@kkoehle,

agree, I don't think my KVM test could explain the difference. I think there is something wrong with grub2 but I never had this on my rear on Power test.

you can try to run rear -d recover to have a debug log and check if something wrong happens during the grub installation.
usr/share/rear/finalize/Linux-ppc64le/620_install_grub2.sh

I won't be available from today to end of next week.

@jsmeix do you think it could be related to #2093

@jsmeix
Copy link
Member

jsmeix commented Mar 22, 2019

@kkoehle
use rear -d -D recover (i.e. with -D) to get a full debug log,
cf. "Debugging issues with Relax-and-Recover" at
https://en.opensuse.org/SDB:Disaster_Recovery

@jsmeix
Copy link
Member

jsmeix commented Mar 22, 2019

@schabrolles
#2093 looks totally unrelated
because there 1:1 restore for all fs and disks works fine.

But what happens in this issue here is not a recovery
on 100% compatible hardware because in
https://github.com/rear/rear/files/2993248/rear.txt
there is (excerpts)

Switching to manual disk layout configuration
Original disk /dev/mapper/36005076802818158a400000000000495 does not exist (with same size) in the target system
...
User confirmed disk layout file
Partition primary on /dev/mapper/36005076802818158a4000000000004c0: size reduced to fit on disk.
Doing SLES12-SP1 (and later) btrfs subvolumes setup because the default subvolume path contains '@/.snapshots/'
No code has been generated to recreate pv:/dev/mapper/36005076802818158a4000000000004a2 (lvmdev).
    To recreate it manually add code to /var/lib/rear/layout/diskrestore.sh or abort.
Manually add code that recreates pv:/dev/mapper/36005076802818158a4000000000004a2 (lvmdev)
1) View /var/lib/rear/layout/diskrestore.sh
2) Edit /var/lib/rear/layout/diskrestore.sh
3) Go to Relax-and-Recover shell
4) Continue 'rear recover'
5) Abort 'rear recover'
(default '4' timeout 300 seconds)
1
No code has been generated to recreate pv:/dev/mapper/36005076802818158a40000000000049f (lvmdev).
    To recreate it manually add code to /var/lib/rear/layout/diskrestore.sh or abort.
Manually add code that recreates pv:/dev/mapper/36005076802818158a40000000000049f (lvmdev)
1) View /var/lib/rear/layout/diskrestore.sh
2) Edit /var/lib/rear/layout/diskrestore.sh
3) Go to Relax-and-Recover shell
4) Continue 'rear recover'
5) Abort 'rear recover'
(default '4' timeout 300 seconds)
4
No code has been generated to recreate pv:/dev/mapper/36005076802818158a4000000000004a3 (lvmdev).
    To recreate it manually add code to /var/lib/rear/layout/diskrestore.sh or abort.
Manually add code that recreates pv:/dev/mapper/36005076802818158a4000000000004a3 (lvmdev)
1) View /var/lib/rear/layout/diskrestore.sh
2) Edit /var/lib/rear/layout/diskrestore.sh
3) Go to Relax-and-Recover shell
4) Continue 'rear recover'
5) Abort 'rear recover'
(default '4' timeout 300 seconds)
4
Confirm or edit the disk recreation script
1) Confirm disk recreation script and continue 'rear recover'
2) Edit disk recreation script (/var/lib/rear/layout/diskrestore.sh)
3) View disk recreation script (/var/lib/rear/layout/diskrestore.sh)
4) View original disk space usage (/var/lib/rear/layout/config/df.txt)
5) Use Relax-and-Recover shell and return back to here
6) Abort 'rear recover'
(default '1' timeout 300 seconds)
2
***** At this point I change the line: 
    lvm lvcreate -L 64424509440b -n root system <<<y
****** to: 
    lvm lvcreate -L 55g -n root system <<<y
***** I do this because I am restoring a disk that was 100 GB to a 60 GB disk, and the 64424509440b is too big (especially since swap still needs 2 GB)

So the recovery hapens on a smaller disk and that is a migration
that will for sure not just work.

I was wondering why that

Partition primary on /dev/mapper/36005076802818158a4000000000004c0: size reduced to fit on disk.

happened because that is a far too late "desperate" mesage from
the create_partitions() function when it works aready with
wrong data in disklayout.conf

I would have expected to see messages from
layout/prepare/default/420_autoresize_last_partitions.sh
because of

Device mapper!36005076802818158a400000000000495 does not exist (manual configuration needed)
Switching to manual disk layout configuration

we are in MIGRATION_MODE where in particular scripts like
layout/prepare/default/420_autoresize_last_partitions.sh
should run (regardless that this script alone does not help here
because in case of LVM manual adaptions are needed anyway,
cf. the Resizing partitions in MIGRATION_MODE during "rear recover"
section in default.conf:
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L370

@kkoehle
Copy link
Author

kkoehle commented Mar 22, 2019

Here is the log file: I couldn't see anything that stood out.
rear-hana-n3.log

@jsmeix
Copy link
Member

jsmeix commented Mar 25, 2019

I think I know why layout/prepare/default/420_autoresize_last_partitions.sh
doesn't actually do something in this particular case:
https://github.com/rear/rear/files/2998736/rear-hana-n3.log contains

+ source /usr/share/rear/layout/prepare/default/420_autoresize_last_partitions.sh
...
++ cp /var/lib/rear/layout/disklayout.conf /var/lib/rear/layout/disklayout.conf.resized_last_partition
...
+++ grep '^disk ' /var/lib/rear/layout/disklayout.conf
++ mv /var/lib/rear/layout/disklayout.conf.resized_last_partition /var/lib/rear/layout/disklayout.conf

i.e. grep '^disk ' /var/lib/rear/layout/disklayout.conf does not find anything.

@kkoehle
we also need at least your var/lib/rear/layout/disklayout.conf
plus some more additional files in the /var/lib/rear/ directory
and in its sub-directories in the ReaR recovery system,
cf. "Debugging issues with Relax-and-Recover" at
https://en.opensuse.org/SDB:Disaster_Recovery

Be careful when attaching files here to not make
possibly confidential internal information public here.
I.e. you may have to obfuscate some values in those files
before you upload those files here.
On the other hand you should not obfuscate too much
so that it would become impossible for us to see
what actually goes on on your particular system.

Additionally describe as exact as you can how
your replacement system where you run "rear recover" differs from
your original system where you had run "rear mkbackup/mkrescue".

@kkoehle
Copy link
Author

kkoehle commented Apr 26, 2019

@schabrolles @jsmeix, I found the problem: the boot flag never gets set on the correct partition:

hana-n4:~ # parted /dev/mapper/36005076802818158a4000000000004c0
(parted) print                                                            
Number  Start   End     Size    Type     File system  Flags
 1      1049kB  8389kB  7340kB  primary               prep, type=41
 2      8389kB  64.4GB  64.4GB  primary               lvm, type=8e
(parted) set 1 boot on
 1      1049kB  8389kB  7340kB  primary               boot, prep, type=41
 2      8389kB  64.4GB  64.4GB  primary               lvm, type=8e

@jsmeix
Copy link
Member

jsmeix commented Apr 29, 2019

According to #2094 (comment)
this is a bootloader issue in MIGRATION_MODE so I think the general issue is
#1437

@github-actions
Copy link

Stale issue message

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants