Permalink
Fetching contributors…
Cannot retrieve contributors at this time
132 lines (103 sloc) 5.3 KB

Troubleshooting Relax-and-Recover

If you encounter a problem, you may find more information in the log file which is located at /var/log/rear/rear-system.log. During recovery the backup log file is also available from /var/log/rear/, for your convenience the history in rescue mode comes with useful commands for debugging, use the up arrow key in the shell to find those commands.

There are a few options in Relax-and-Recover to help you debug the situation:

  • use the -v option to show progress output during execution

  • use the -d option to have debug information in your log file

  • use the -s option to see what scripts Relax-and-Recover would be using; this is useful to understand how Relax-and-Recover is working internally

  • use the -S option to step through each individual script when troubleshooting

  • use the -D option to dump every function call to log file; this is very convenient during development or when troubleshooting Relax-and-Recover

During backup

During backup Relax-and-Recover creates a description of your system layout in one file (disklayout.conf) and stores this as part of its rescue image. This file describes the configuration of SmartArray RAID, partitions, software RAID, DRBD, logical volumes, filesystems, and possibly more.

Here is a list of known issues during backup:

  1. One or more HP SmartArray controllers have errors

    Relax-and-Recover had detected that one of your HP SmartArray controllers is in ERROR state and as a result it can not trust the information returned from that controller. This can be dangerous because we cannot guarantee that the disk layout is valid when recovering the system.

    We discovered that this problem can be caused by a controller that still has information in its cache that has not been flushed and the only way to solve it was to reboot the system and pressing F2 during the controller initialization when it reports this problem.

  2. USB sticks disappear and re-appear with difference device name

    We have had issues before with a specific batch of JetFlash USB sticks which, during write operations, reset the USB controller because of a bug in the Linux kernel. The behavior is that the device disappears (during write operations!) and reappears with a different device name. The result is that the filesystem becomes corrupt and the stick cannot be used.

    To verify if the USB stick has any issues like this, we recommend using the f3 tool on Linux or the h2testw tool on Windows. If this tool succeeds in a write and verify test, the USB stick is reliable.

During recovery

During restore Relax-and-Recover uses the saved system layout as the basis for recreating a workable layout on your new system. If your new hardware is very different, it’s advised to copy the layout file /var/lib/rear/layout/disklayout.conf to /etc/rear and modify it according to what is required.

cp /var/lib/rear/layout/disklayout.conf /etc/rear/
vi /etc/rear/disklayout.conf

Then restart the recovery process: rear recover

During the recovery process, Relax-and-Recover translates this layout file into a shell procedure (/var/lib/rear/layout/diskrestore.sh) that contains all the needed instructions for recreating your desired layout.

If Relax-and-Recover comes across irreconcilable differences, it provides you with a small menu of options you have. In any case you can Abort the menu, and retry after cleaning up everything Relax-and-Recover may already have done, incl. mdadm --stop --scan or vgchange -a n.

In any case, you will have to look into the issue, see what goes wrong and either fix the layout file (disklayout.conf) and restart the recovery process (rear recover) or instead fix the shell procedure (diskrestore.sh) and choose Retry.

Warning
Customizations to the shell procedure (diskrestore.sh) get lost when restarting rear recover.

Here is a list of known issues during recovery:

  1. Failed to clear HP SmartArray controller 1

    This error may be caused by trying to clear an HP SmartArray controller that does not have a configuration or does not exist. Since we have no means to know whether this is a fatal condition or not we simply try to recreate the logical drive(s) and see what happens.

    This message is harmless, but may help troubleshoot the subsequent error message.

  2. An error has been detected during restore

    The (generated) layout restore script /var/lib/rear/layout/diskrestore.sh was not able to perform all necessary steps without error. The system will provide you with a menu allowing you to fix the diskrestore.sh script manually and continue from where it left off.

    Cannot create array. Cannot add physical drive 2I:1:5
    Could not configure the HP SmartArray controllers

    When the number of physical or logical disks are different, or when other important system characteristics that matter to recovery are incompatible, this will be indicated by a multitude of possible error-messages. Relax-and-Recover makes it possible to recover also in these cases by hand.

    You can find more information about your HP SmartArray setup by running one of the following commands:

    # hpacucli ctrl all show detail
    # hpacucli ctrl all show config
    # hpacucli ctrl all show config detail
Tip
You can find these commands as part of the history of the Relax-and-Recover shell.