SUSE 12.2 ReaR recovery stuck at system layout restoration #1786

Closed
manums1983 opened this issue Apr 27, 2018 · 39 comments

@manums1983

Relax-and-Recover (ReaR) Issue Template

SUSE 12.2 ReaR recovery stuck at system layout restoration
Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version ("/usr/sbin/rear -V"): 2.31
  • OS version ("cat /etc/rear/os.conf" or "lsb_release -a" or "cat /etc/os-release"):
    SUSE 12.2 (SP2) SAP Server
  • ReaR configuration files ("cat /etc/rear/site.conf" or "cat /etc/rear/local.conf"):
    System is not booting up to get local.conf file
  • System architecture (x86 compatible or POWER and/or what kind of virtual machine):
    Intel 64-bit
  • Are you using BIOS or UEFI or another way to boot?
    UEFI
  • Brief description of the issue:
    The restore is stuck at "system layout restoration"
  • Work-around, if any:
@gozora
Member

gozora commented Apr 27, 2018

@manums1983 I'm afraid that from such a vague problem description, no one will actually be able to help you.
I suspect that you are running rear recover.
Can you relaunch with rear -d -D recover and provide us with the log files from /var/log/rear ?

Thanks

V.
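
A minimal sketch of how the plain-text logs could be bundled for attaching after a debug run. The helper name collect_rear_logs and the default output path are hypothetical and not part of ReaR; only rear -d -D recover and /var/log/rear come from the comment above.

```shell
# collect_rear_logs [LOGDIR] [OUTFILE]
# Hypothetical helper (not part of ReaR): pack all files below LOGDIR
# into one gzip-compressed tar archive and print the archive path.
collect_rear_logs() {
    logdir="${1:-/var/log/rear}"
    outfile="${2:-/tmp/rear-logs.tar.gz}"
    if [ ! -d "$logdir" ]; then
        echo "no such directory: $logdir" >&2
        return 1
    fi
    tar -czf "$outfile" -C "$logdir" . && echo "$outfile"
}
```

Run after rear -d -D recover has stopped or failed, collect_rear_logs with no arguments would leave a single archive that can be copied off the machine and attached as-is.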

@manums1983
Author

manums1983 commented Apr 27, 2018 via email

@gozora
Member

gozora commented Apr 27, 2018

My humble recommendation would be "Do not use backup to local filesystem (file://) until you are 100% sure what you are doing and what the implications are". Try to go with nfs:// or smb:// for the start.

V.

@manums1983
Author

manums1983 commented Apr 27, 2018 via email

@manums1983
Author

rear case log .docx
I can't copy the log from iLO, so I attached the screenshots.

@manums1983
Author

I just noticed that the /tmp directory got full; /tmp is only 10 GB. Is there a way I can point the ReaR log to a different directory?

@jsmeix
Member

jsmeix commented Apr 27, 2018

@manums1983
ReaR is not something where you can "just simply make settings"
what might look to you as if it works. For example
NETFS_KEEP_OLD_BACKUP_COPY=10
does not work this way.
When you use ReaR you need to carefully read the documentation
in particular read default.conf for each config variable and often
you may even have a look at the scripts to really understand
what a particular setting actually does.
In general see
https://en.opensuse.org/SDB:Disaster_Recovery

I will not even try to read any kind of "proprietary" file format like doc or docx
(even if it is possible with huge desktop applications like LibreOffice).
Please provide ReaR's plain text debug log file completely as is.
Usually screenshots won't help because plain error messages
won't tell us the root cause.
Only the ReaR debug log file may tell the root cause.

FYI:
Your TMPDIR issue is explained in default.conf.
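
A minimal sketch of the TMPDIR mechanism, assuming ReaR creates its working area with mktemp, which honours TMPDIR. The directory rear-work is a hypothetical stand-in for a filesystem with more free space than /tmp; for the real setting, follow what default.conf says about exporting TMPDIR in /etc/rear/local.conf.

```shell
# Hypothetical stand-in for a filesystem with more free space than /tmp.
work_prefix="$PWD/rear-work"
mkdir -p "$work_prefix"

# mktemp -t places the new directory below $TMPDIR when it is set.
export TMPDIR="$work_prefix"
build_dir="$(mktemp -d -t rear.XXXXXXXXXXXXXXX)"
echo "working area: $build_dir"

# Show how much space is available where the working area now lives.
df -h "$build_dir"
```

The point of the design is that nothing else needs to change: every temporary directory created through mktemp follows the exported prefix.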

@jsmeix
Member

jsmeix commented Apr 27, 2018

@manums1983

FYI:
For me with a SLES12-SP3 default system (with its default btrfs structure)
on a QEMU/KVM virtual machine with two virtual harddisks
a 20 GiB sda for the system plus a 2 GiB sdb for the backup
the following /etc/rear/local.conf works:

OUTPUT=ISO
BACKUP=NETFS
BACKUP_OPTIONS="nfsvers=3,nolock"
BACKUP_URL=file:///mnt/sdb1
OUTPUT_URL=nfs://10.160.4.244/nfs
REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" snapper chattr lsattr )
COPY_AS_IS=( "${COPY_AS_IS[@]}" /usr/lib/snapper/installation-helper /etc/snapper/config-templates/default )
BACKUP_PROG_INCLUDE=( /srv /var/lib/mailman /tmp /var/lib/libvirt/images /var/opt /var/log /boot/grub2/i386-pc /var/lib/mysql /var/tmp /opt /boot/grub2/x86_64-efi /var/spool /var/lib/pgsql /var/lib/mariadb /usr/local /home /var/lib/machines /var/lib/named /var/cache )
POST_RECOVERY_SCRIPT=( 'if snapper --no-dbus -r $TARGET_FS_ROOT get-config | grep -q "^QGROUP.*[0-9]/[0-9]" ; then snapper --no-dbus -r $TARGET_FS_ROOT set-config QGROUP= ; snapper --no-dbus -r $TARGET_FS_ROOT setup-quota && echo snapper setup-quota done || echo snapper setup-quota failed ; else echo snapper setup-quota not used ; fi' )
SSH_ROOT_PASSWORD="rear"
USE_DHCLIENT="yes"

but one must know what BACKUP_URL=file:///mnt/sdb1 means:

On the original system I did the preparation steps

# parted -s /dev/sdb mklabel msdos

# parted -s /dev/sdb unit MiB mkpart primary 1 2047

# mkfs.ext4 /dev/sdb1

# parted -s /dev/sdb unit MiB print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdb: 2048MiB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 
Number  Start    End      Size     Type     File system  Flags
 1      1.00MiB  2047MiB  2046MiB  primary  ext4         type=83

# mkdir /mnt/sdb1

# mount /dev/sdb1 /mnt/sdb1

and after "rear -D mkbackup" I got

# find /mnt/sdb1

/mnt/sdb1
/mnt/sdb1/lost+found
/mnt/sdb1/f121
/mnt/sdb1/f121/backup.log
/mnt/sdb1/f121/backup.tar.gz

and on my NFS server (OUTPUT_URL=nfs://10.160.4.244/nfs) I got

nfs-server:/nfs # ls -lhrt /nfs/f121

total 85M
-rw------- 1 nobody nogroup  80M Apr 27 10:40 rear-f121.iso
-rw------- 1 nobody nogroup  261 Apr 27 10:40 VERSION
-rw------- 1 nobody nogroup  202 Apr 27 10:40 README
-rw------- 1 nobody nogroup 5.2M Apr 27 10:40 rear-f121.log

For the recovery I set up a second virtual machine
with a new 20 GiB sda for the system
and the existing 2 GiB sdb from the original system with the backup
and a virtual CD-ROM where I have the rear-f121.iso to boot
the ReaR recovery system on the second virtual machine.

In the running ReaR recovery system on the second virtual machine
I did the preparation steps to make the backup on its sdb accessible:

# mkdir /mnt/sdb1

# mount /dev/sdb1 /mnt/sdb1

and then I run "rear -D recover" which then "just works" for me.
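
A minimal sketch of a pre-flight check for the recovery steps above, on the assumption that the backup archive is named backup.tar.gz as in the find output. The function name backup_ready is hypothetical and not part of ReaR.

```shell
# backup_ready DIR
# Hypothetical pre-flight check (not part of ReaR): succeed only if DIR
# contains a non-empty backup.tar.gz.
backup_ready() {
    archive="$1/backup.tar.gz"
    if [ -s "$archive" ]; then
        echo "found $archive"
    else
        echo "missing or empty: $archive" >&2
        return 1
    fi
}
```

In the recovery system this could be chained as backup_ready /mnt/sdb1/f121 && rear -D recover, so the recovery only starts when the mounted disk really holds the backup.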

But personally I find using BACKUP_URL=file:///... too complicated
from my point of view - but I don't know your particular use-case.

Personally I would prefer OUTPUT=USB to get the backup together
with the ReaR recovery system on a bootable disk medium.

I think - with probability one (https://en.wikipedia.org/wiki/Almost_surely) - this issue
is a ReaR configuration issue or a "how to use ReaR" issue.

@jsmeix
Member

jsmeix commented Apr 27, 2018

Bad typo correction in
#1786 (comment)
wrong before and then I run "rear -D mkbackup" which then "just works" for me.
now corrected and then I run "rear -D recover" which then "just works" for me.

@manums1983
Author

manums1983 commented Apr 27, 2018 via email

@manums1983
Author

/mnt/backup/GESPRD1 will hold the gzip backup file and the ISO file. I will copy the ISO to my local laptop, attach it to the iLO console, and boot from the ReaR ISO.

GESPRD1:/mnt/backup/GESPRD1 # ls -l
total 13242768
-rw------- 1 root root 202 Apr 27 14:42 README
-rw------- 1 root root 293 Apr 27 14:42 VERSION
-rw------- 1 root root 26890796 Apr 27 15:14 backup.log
-rw------- 1 root root 13263365582 Apr 27 15:14 backup.tar.gz
-rw------- 1 root root 253259776 Apr 27 14:42 rear-GESPRD1.iso
-rw------- 1 root root 17055276 Apr 27 14:42 rear-GESPRD1.log

@manums1983
Author

Please find the logs attached.
rear-logs.zip.

@gozora
Member

gozora commented Apr 27, 2018

Hello @manums1983
As @jsmeix pointed out in his #1786 (comment), which added the reasoning to my brief "no no no, don't do it" #1786 (comment), what is happening to your rear recover is the following.

  1. Either your /mnt/backup has a remote filesystem or an external device mounted on it, in which case you should use the appropriate protocol (nfs://, smb://, usb://) instead of file://

  2. Or you store the backup on a local filesystem that is part of your OS filesystem structure. This is simply not a good idea; you should use remote storage that is not directly related to the OS you are backing up.

V.
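
A minimal sketch of what the first option could look like in /etc/rear/local.conf, assuming an NFS server is available. The server name and export path are placeholders; OUTPUT=ISO and BACKUP=NETFS are taken from the configuration shown earlier in this thread.

```shell
# /etc/rear/local.conf (sketch; server name and export path are placeholders)
OUTPUT=ISO
BACKUP=NETFS
# The backup archive goes to an NFS share instead of a local file:// path.
BACKUP_URL=nfs://nfs-server.example/export/rear
```

With a remote BACKUP_URL the backup no longer lives on a disk of the machine being recovered, which is the separation both recommendations above aim for.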

@gozora
Member

gozora commented Apr 27, 2018

As I've read your disklayout.conf I've noticed that point 2. of my #1786 (comment) is true.

Your /mnt/backup is located on /dev/sdb1, which is a SmartArray logical volume on a single disk. If this is a test server, all you have to do before starting rear recover is mount your /mnt/backup directory and maybe comment out the following line in /var/lib/rear/layout/disklayout.conf

logicaldrive /dev/sdb 0|A|1 raid=0 drives=1I:1:3, spares= sectors=32 stripesize=256

It is just a guess, but with this line active ReaR might try to re-create your SmartArray configuration, which is not something you want.

If this is, however, a production server, you should reconsider your bare-metal disaster recovery strategy, because storing backups locally on a SmartArray without any redundancy will really not help you in case of a disaster.

V.
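
A minimal sketch of commenting out such a logicaldrive entry while keeping a copy of the unmodified file. The helper name comment_out_logicaldrive is hypothetical and not part of ReaR, and the sketch assumes a sed that supports in-place editing with a backup suffix.

```shell
# comment_out_logicaldrive FILE DEVICE
# Hypothetical helper (not part of ReaR): prefix the "logicaldrive" entry
# for DEVICE with '#'; sed keeps the unmodified file as FILE.orig.
comment_out_logicaldrive() {
    file="$1"
    dev="$2"
    # '|' is used as the sed delimiter because DEVICE contains slashes.
    sed -i.orig -e "\\|^logicaldrive ${dev} |s|^|#|" "$file"
}
```

It would be called in the running recovery system, before rear recover, with the path of disklayout.conf and /dev/sdb as arguments; entries for other devices stay untouched.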

jsmeix added the "special hardware or VM" label Apr 27, 2018
@jsmeix
Member

jsmeix commented Apr 27, 2018

@gozora
as far as I see by quick googling 'SmartArray' is HP specific hardware
so that I added the "special hardware" label to this issue
(remove it if I am wrong).

@gozora
Member

gozora commented Apr 27, 2018

@jsmeix yes, you are right with one tiny correction, it is HPE nowadays :-), this is however not the crux of the problem, but rather (as you already stated)

"I think - with probability one (https://en.wikipedia.org/wiki/Almost_surely) - this issue
is a ReaR configuration issue or a "how to use ReaR" issue."

So I personally would avoid using "special hardware" label ...

V.

jsmeix removed the "special hardware or VM" label Apr 27, 2018
@manums1983
Author

Hardware Specs:

HPE iLO4 "Smart Array P830i".

Hard disk configuration: RAID1 array (2x600 GB) (/dev/sda) ---> This is the local disk where the OS is installed.
RAID-0 (1x600 GB) ---> This is also a local disk; the ReaR backup points to it at /mnt/backup. It has a GPT label with one primary ext4 partition (/dev/sdb1).

All other disks are 3PAR multipath disks. All 3PAR disks are excluded from the backup and re-creation.

@jsmeix
Member

jsmeix commented Apr 27, 2018

I don't know about Smart Array but assume there is real hardware RAID here and
not a B110i software RAID solution based on the Smart Array firmware as in
https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c00687518
so that it is basically the same disk layout as in my
#1786 (comment)
"sda for the system ... sdb for the backup"

@manums1983
when your backup is on local disk sdb
I wonder how you access it from a replacement server?
But I may misunderstand things because I don't know about Smart Array.

@gozora
Member

gozora commented Apr 27, 2018

@jsmeix

"I don't know about Smart Array but assume there is real hardware RAID here and
not a B110i software RAID solution based on the Smart Array firmware as in ..."

Correct ;-)

@gozora
Member

gozora commented Apr 27, 2018

@jsmeix

"I wonder how you access it from a replacement server?"

I'm afraid that there is no replacement server, SmartArray disks are presented to system as any other disk /dev/sda, /dev/sdb ... So I guess that strategy here is just to boot ReaR recovery system and restore content of /dev/sda from /dev/sdb

V.

@manums1983
Author

This is purely a hardware RAID array controller, not software-based. This is a production SAP server at the DR site, so any kind of change is highly critical.
I am wondering why this is not working, because the same backup and restore method worked without issues on a Gen 9 DL 380 box running SUSE 12.2. That server does not have any 3PAR LUNs, though; it only has the local disks sda (system) and sdb (backup).

The difference here is a DL 580 running SUSE 12.2 with lots of 3PAR LUNs. I noticed the restore gets stuck at "Start System Layout Restoration". Is there an issue with multipath?

I wonder how you access it from a replacement server? ---> The OS runs on RAID1 (sda). If it gets corrupted after patching or for any other reason, sda can be restored from the backup on sdb: boot from the ReaR ISO, then in rescue mode mount /dev/sab1 /mnt/backup and start the recovery.

@manums1983
Author

correction:
mount /dev/sdb1 /mnt/backup and start recovery.

@gozora
Member

gozora commented Apr 27, 2018

@manums1983 if you think you are having a problem with multipath and you are doing the recovery solely on local disks, just try to unload the dm-multipath and related kernel modules, plus whatever FC driver (lpfc, bnx, qlaxxx) you are using, prior to rear recover.
This should leave only the local disks visible to the ReaR recovery system, so there should be no interference ...

V.
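
A minimal sketch of checking which of those modules are loaded, under the assumption that the relevant module names are dm_multipath and lpfc (the names on the real system may differ). The helper suggest_unload is hypothetical; it only prints modprobe -r commands and does not unload anything, and a module that is still in use would refuse to unload anyway.

```shell
# suggest_unload MODULES_FILE MODULE...
# Hypothetical helper: print (not run) a "modprobe -r" command for each
# listed module that appears as loaded in MODULES_FILE (/proc/modules format).
suggest_unload() {
    modules_file="$1"
    shift
    for mod in "$@"; do
        if grep -q "^${mod} " "$modules_file" 2>/dev/null; then
            echo "modprobe -r ${mod}"
        fi
    done
}

# Assumed module names; adjust to the drivers actually in use.
suggest_unload /proc/modules dm_multipath lpfc
```

Printing instead of executing keeps the decision with the administrator, which matters on a production machine.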

@manums1983
Author

@gozora,
could you please help with how to unload the multipath kernel modules? This is a production server and I am afraid of making a mistake while doing so.

@gozora
Member

gozora commented Apr 27, 2018

@manums1983 if this is a critical / production server and downtime would mean any kind of SLA breach, you should either escalate this task to a higher support level or ask a more experienced colleague for help.
Community around ReaR can help you with ReaR related problems but I guess that no one will give you step by step guidance through every single step that can go wrong during OS recovery ...

V.

@manums1983
Author

Hi gozora,
ReaR can handle external (multipath) disks very well during backup and recovery. Could you help find what makes the ReaR recovery hang in the disk layout restoration phase? Is there an issue with the local disk configuration, or a ReaR configuration problem?
We use Data Protector in this environment. I hope ReaR can be integrated with DP, but I do not know whether that is a good method. Currently Data Protector backs up /mnt/backup as a flat-file backup, as a second copy.

@gozora
Member

gozora commented Apr 28, 2018

One of the options would be to run recovery with debug options rear -d -D recover and check logs in /var/log/rear .

V.

@manums1983
Author

Sure, I will share the log with you on Monday.

@manums1983
Author

Hi Gozora,
Please find attached the logs from /var/log/rear/ and the layouts. The recovery hung at the very beginning.

I would like to try backup to USB. Is the configuration below OK to boot from USB and do the recovery? Please let me know.
rear format /dev/sdX
OUTPUT=USB
BACKUP=NETFS
BACKUP_URL="usb:///dev/disk/by-label/REAR-000"
rear-GESPRD1.log

@manums1983
Author

I tried backup to USB, but I am getting the following error. The log is attached.
rear-GESPRD1.log

Trying to find what to use as UEFI bootloader...
Trying to find a 'well known file' to be used as UEFI bootloader...
Using '/boot/efi/EFI/sles/grubx64.efi' as UEFI bootloader file
Copying logfile /var/log/rear/rear-GESPRD1.log into initramfs as '/tmp/rear-GESPRD1-partial-2018-04-30T14:13:23+08:00.log'
Copying files and directories
Copying binaries and libraries
Copying kernel modules
Copying all files in /lib*/firmware/
Creating recovery/rescue system initramfs/initrd initrd.cgz with gzip default compression
Created initrd.cgz with gzip default compression (208486998 bytes) in 75 seconds
ERROR: Could not copy /mnt/backup/rear.YnLVHYO5VGq69u4/tmp/initrd.cgz to /tmp/rear-efi.yJHGT//EFI/BOOT/initrd.cgz
Aborting due to an error, check /var/log/rear/rear-GESPRD1.log for details
Exiting rear mkbackup (PID 84124) and its descendant processes
Running exit tasks
Terminated

@manums1983
Author

I noticed that /dev/sdaq1 is getting full, which makes it fail.
/dev/sdaq1 200M 200M 0 100% /tmp/rear-efi.exLw5
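
A short arithmetic sketch of why the copy fails, using only the two sizes reported in this thread: the 208486998-byte initrd.cgz from the mkbackup output and the 200M EFI partition shown by df (assumed here to mean 200 MiB).

```shell
initrd_bytes=208486998   # initrd.cgz size from the mkbackup output
part_mib=200             # size of /dev/sdaq1 as shown by df

# Round the initrd size up to whole MiB (1 MiB = 1048576 bytes).
initrd_mib=$(( (initrd_bytes + 1048575) / 1048576 ))
headroom=$(( part_mib - initrd_mib ))

echo "initrd needs about ${initrd_mib} MiB of a ${part_mib} MiB partition"
echo "left for kernel, bootloader and filesystem overhead: ${headroom} MiB"
```

With roughly 1 MiB left, the kernel and the EFI bootloader that were copied first leave no room for the initrd. Possible ways out are a larger EFI partition or a smaller initrd; some ReaR versions document USB_UEFI_PART_SIZE and REAR_INITRD_COMPRESSION in default.conf for this, but whether the version in use here has them is an assumption to verify.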

manums1983 reopened this Apr 30, 2018
@manums1983
Author

Hi Gozora,
I did the backup both file-based and to USB. Both have the same issue: the recovery hangs at "Disk Layout Restoration".
As you requested, I have attached the logs. Could you please have a look?
rear-GESPRD1.log

###############Logs################################

++ StopIfError 'You must specify either BACKUP_URL or BACKUP_MOUNTCMD and BACKUP_UMOUNTCMD !'
++ StopIfError 'Could not mkdir '''/mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs''''
++ StopIfError 'Mount command '''mount -v -o rw,noatime /dev/disk/by-label/REAR-000 /mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs''' failed.'
++ StopIfError 'Could not find file '''mbr.bin'''. Syslinux version 3.08 or newer is required, 4.x prefered!'
++ StopIfError 'Unmounting '''/mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs''' failed.'
+++ StopIfError 'Partition number '''1''' of partition sda1 is not a valid number.'
+++ StopIfError 'Partition sda1 is numbered '''1'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''2''' of partition sda2 is not a valid number.'
+++ StopIfError 'Partition sda2 is numbered '''2'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''3''' of partition sda3 is not a valid number.'
+++ StopIfError 'Partition sda3 is numbered '''3'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''4''' of partition sda4 is not a valid number.'
+++ StopIfError 'Partition sda4 is numbered '''4'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''5''' of partition sda5 is not a valid number.'
+++ StopIfError 'Partition sda5 is numbered '''5'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''6''' of partition sda6 is not a valid number.'
+++ StopIfError 'Partition sda6 is numbered '''6'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''7''' of partition sda7 is not a valid number.'
+++ StopIfError 'Partition sda7 is numbered '''7'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''8''' of partition sda8 is not a valid number.'
+++ StopIfError 'Partition sda8 is numbered '''8'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''1''' of partition sdaq1 is not a valid number.'
+++ StopIfError 'Partition sdaq1 is numbered '''1'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''2''' of partition sdaq2 is not a valid number.'
+++ StopIfError 'Partition sdaq2 is numbered '''2'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''1''' of partition sdb1 is not a valid number.'
+++ StopIfError 'Partition sdb1 is numbered '''1'''. More than 128 partitions is not supported.'
++ StopIfError 'Failed to save XFS options of /dev/mapper/360002ac000000000000000230001e534-part1'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Failed to save XFS options of /dev/sda4'
++ StopIfError 'Failed to save XFS options of /dev/sda5'
++ StopIfError 'Failed to save XFS options of /dev/sda6'
++ StopIfError 'Failed to save XFS options of /dev/sda7'
++ StopIfError 'Failed to save XFS options of /dev/sda8'
++ StopIfError 'Divide by zero detected'
++ StopIfError 'Divide by zero detected'
++ LogIfError 'Did not find sysfs name for device 360002ac0000000000000002c0001e534 (/sys/block/dm-7)'
++ LogIfError 'Failed to get size of dm-7 with get_disk_size'
+++ StopIfError 'Partition number '''1''' of partition 360002ac0000000000000002c0001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac0000000000000002c0001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''1''' of partition 360002ac0000000000000002c0001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac0000000000000002c0001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
++ LogIfError 'Did not find sysfs name for device 360002ac000000000000000260001e534 (/sys/block/dm-3)'
++ LogIfError 'Failed to get size of dm-3 with get_disk_size'
+++ StopIfError 'Partition number '''1''' of partition 360002ac000000000000000260001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac000000000000000260001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''1''' of partition 360002ac000000000000000260001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac000000000000000260001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
++ LogIfError 'Did not find sysfs name for device 360002ac0000000000000002a0001e534 (/sys/block/dm-9)'
++ LogIfError 'Failed to get size of dm-9 with get_disk_size'
+++ StopIfError 'Partition number '''1''' of partition 360002ac0000000000000002a0001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac0000000000000002a0001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''1''' of partition 360002ac0000000000000002a0001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac0000000000000002a0001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
++ LogIfError 'Did not find sysfs name for device 360002ac000000000000000240001e534 (/sys/block/dm-1)'
++ LogIfError 'Failed to get size of dm-1 with get_disk_size'
+++ StopIfError 'Partition number '''1''' of partition 360002ac000000000000000240001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac000000000000000240001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''1''' of partition 360002ac000000000000000240001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac000000000000000240001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
++ LogIfError 'Did not find sysfs name for device 360002ac0000000000000002d0001e534 (/sys/block/dm-8)'
++ LogIfError 'Failed to get size of dm-8 with get_disk_size'
+++ StopIfError 'Partition number '''1''' of partition 360002ac0000000000000002d0001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac0000000000000002d0001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''1''' of partition 360002ac0000000000000002d0001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac0000000000000002d0001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
++ LogIfError 'Did not find sysfs name for device 360002ac000000000000000270001e534 (/sys/block/dm-4)'
++ LogIfError 'Failed to get size of dm-4 with get_disk_size'
+++ StopIfError 'Partition number '''1''' of partition 360002ac000000000000000270001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac000000000000000270001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''1''' of partition 360002ac000000000000000270001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac000000000000000270001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
++ LogIfError 'Did not find sysfs name for device 360002ac0000000000000002b0001e534 (/sys/block/dm-6)'
++ LogIfError 'Failed to get size of dm-6 with get_disk_size'
+++ StopIfError 'Partition number '''1''' of partition 360002ac0000000000000002b0001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac0000000000000002b0001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''1''' of partition 360002ac0000000000000002b0001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac0000000000000002b0001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
++ LogIfError 'Did not find sysfs name for device 360002ac000000000000000250001e534 (/sys/block/dm-2)'
++ LogIfError 'Failed to get size of dm-2 with get_disk_size'
+++ StopIfError 'Partition number '''1''' of partition 360002ac000000000000000250001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac000000000000000250001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''1''' of partition 360002ac000000000000000250001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac000000000000000250001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
++ LogIfError 'Did not find sysfs name for device 360002ac000000000000000230001e534 (/sys/block/dm-0)'
++ LogIfError 'Failed to get size of dm-0 with get_disk_size'
+++ StopIfError 'Partition number '''1''' of partition 360002ac000000000000000230001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac000000000000000230001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''1''' of partition 360002ac000000000000000230001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac000000000000000230001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
++ LogIfError 'Did not find sysfs name for device 360002ac000000000000000280001e534 (/sys/block/dm-5)'
++ LogIfError 'Failed to get size of dm-5 with get_disk_size'
+++ StopIfError 'Partition number '''1''' of partition 360002ac000000000000000280001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac000000000000000280001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
+++ StopIfError 'Partition number '''1''' of partition 360002ac000000000000000280001e534-part1 is not a valid number.'
+++ StopIfError 'Partition 360002ac000000000000000280001e534-part1 is numbered '''1'''. More than 128 partitions is not supported.'
++ StopIfError 'Could not find a suitable kernel. Maybe you have to set KERNEL_FILE [/boot/vmlinuz-4.4.103-92.56-default] ?'
++ StopIfError 'Failed to create mount point /tmp/rear-efi.MQ5RB'
++ StopIfError 'Failed to mount EFI partition /dev/disk/by-label/REAR-EFI to /tmp/rear-efi.MQ5RB'
++ StopIfError 'Failed to create /tmp/rear-efi.MQ5RB//EFI/BOOT'
++ StopIfError 'Could not copy EFI bootloader to /tmp/rear-efi.MQ5RB//EFI/BOOT/BOOTX64.efi'
++ StopIfError 'Could not copy /boot/vmlinuz-4.4.103-92.56-default to /tmp/rear-efi.MQ5RB//EFI/BOOT/kernel'
++ StopIfError 'Could not copy /mnt/backup/rear.QIWPMl9uTLsWAr5/tmp/initrd.cgz to /tmp/rear-efi.MQ5RB//EFI/BOOT/initrd.cgz'
++ StopIfError 'Failed to create BOOTX64.efi'
++ LogIfError 'Could not remove temporary directory /tmp/rear-efi.MQ5RB, please check manually'
++ StopIfError 'Could not mkdir '''/mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs''''
++ StopIfError 'Mount command '''mount -v -o rw,noatime /dev/disk/by-label/REAR-000 /mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs''' failed.'
++ StopIfError 'Could not mkdir /mnt/backup/rear.QIWPMl9uTLsWAr5/tmp/boot'
++ StopIfError 'Could not mkdir '''/mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs/GESPRD1''''
++ StopIfError 'Could not create '''/mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs/GESPRD1/.lockfile''''
++ StopIfError 'Could not find a working syslinux path.'
++ StopIfError 'Could not create USB ReaR dir [/mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs/rear/GESPRD1/20180430.1537] !'
++ StopIfError 'Could not create USB syslinux dir [/mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs/boot/syslinux] !'
++ StopIfError 'Could not create /mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs/rear/GESPRD1/20180430.1537/kernel'
++ StopIfError 'Could not create /mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs/rear/GESPRD1/20180430.1537/initrd.cgz'
++ StopIfError 'Could not copy /var/log/rear/rear-GESPRD1.log to /mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs/rear/GESPRD1/20180430.1537/rear-GESPRD1.log'
++ BugIfError 'RAW_USB_DEVICE and REAL_USB_DEVICE should be already set'
++ StopIfError 'Problem with extlinux -i /mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs/boot/syslinux'
++ StopIfError 'Problem with writing the mbr.bin to '''/dev/sdaq''''
++ StopIfError 'Could not copy '''/usr/share/rear/conf/templates/RESULT_usage_USB.txt''''
++ StopIfError 'Unmounting '''/mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs''' failed.'
++ StopIfError 'Could not mkdir '''/mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs''''
++ StopIfError 'Mount command '''mount -v -o rw,noatime /dev/disk/by-label/REAR-000 /mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs''' failed.'
++ StopIfError 'Could not remove '''/mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs/rear/GESPRD1/20180430.1537.old''''
++ StopIfError 'Could not move '''/mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs/rear/GESPRD1/20180430.1537''''
++ StopIfError 'Could not mkdir '''/mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs/rear/GESPRD1/20180430.1537''''
++ StopIfError 'Could not create '''/mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs/rear/GESPRD1/20180430.1537/.lockfile''''
++ StopIfError 'Unmounting '''/mnt/backup/rear.QIWPMl9uTLsWAr5/outputfs''' failed.'
GESPRD1:/var/log/rear # ls
rear-GESPRD1.log rear-GESPRD1.log.old
GESPRD1:/var/log/rear # cat rear-GESPRD1.log.old | grep "Error"
GESPRD1:/var/log/rear # ls -l
total 17600
-rw-r----- 1 root root 18013129 Apr 30 16:35 rear-GESPRD1.log
-rw-r----- 1 root root 4455 Apr 30 15:37 rear-GESPRD1.log.old

@gozora
Member

gozora commented Apr 30, 2018

Again, if you are having trouble during the restore phase, please provide the log files created by rear -d -D recover

V.

@manums1983
Author

Hi Gozora,
I was doing it from the iLO console and could not copy it from there. I believe I copied the "rear -d -D recover" logs; I copied them from /var/log/rear/.
OK, let me try once more.

@manums1983
Author

I tried to copy /var/log/rear/gesprd*** to a local drive, but the server hangs and does not respond to any abort commands.
I have a screenshot; can I attach it?

@manums1983
Author

I have attached a screenshot of the "rear -d -D recover" error. I could not copy any logs to a local directory because the whole server hangs. I would really appreciate it if you could see something in the screenshots.
restore_error_log_screenshot.docx

@gozora
Member

gozora commented Apr 30, 2018

I don't really believe that screenshot is enough ...
Are you aware that there is sshd running inside ReaR recovery system ?

V.
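
A minimal sketch of fetching the log over that sshd from another machine, assuming the recovery system is reachable on the network and root login was enabled when the rescue image was built (for example via SSH_ROOT_PASSWORD, as in the local.conf shown earlier). The helper fetch_rear_log is hypothetical and 192.0.2.10 is a placeholder address; by default it only prints the command.

```shell
# fetch_rear_log HOST [DEST]
# Hypothetical helper: copy the ReaR log files from a running recovery
# system. With DRY_RUN=1 (the default) the scp command is only printed.
fetch_rear_log() {
    host="$1"
    dest="${2:-.}"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "scp root@${host}:/var/log/rear/rear-*.log ${dest}"
    else
        scp "root@${host}:/var/log/rear/rear-*.log" "$dest"
    fi
}

# Placeholder address of the recovery system.
fetch_rear_log 192.0.2.10
```

Setting DRY_RUN=0 runs the copy for real, which gives a plain-text log even when the console itself offers no way to export it.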

@manums1983
Author

Hi gozora,
I tried to ssh to the server via PuTTY from my laptop, but it failed.
The other thing I noticed while it was booting from the rescue disk is that it failed at the script "40-start-udev-or-load-modules.sh". Below is the error message as captured from the screen:
############Rescue disk Boot error on screen#############
Running 40-start-udev-or-load-modules.sh...
insmod /lib/modules/4.4.103-92..56-default/weak-update/updates/lpfc.ko lpfc_devloss_tmo=14 ipfc_lun_queue_depth=16 lpfc_discovery_threads=32
grep: write error.
########################################################
I understand that without a proper recovery log it is very difficult to analyse the problem. But in my situation, once I start "rear -d -D recover" the server hangs completely and only responds to the iLO reboot option, which leaves no way to collect the logs. I was also not able to see any real-time logs on screen.
Please let me know whether the above output makes sense and relates to any known issue, because in another recent case it seems to have been a bug:
#1207

@gozora
Member

gozora commented May 1, 2018

@manums1983 I'm sorry, but without proper logs I can't help you.

V.
