grub recovery gets confused by --prefix option, system hangs during boot #210

Closed
abbbi opened this Issue Mar 20, 2013 · 13 comments

Contributor

abbbi commented Mar 20, 2013

hi,

while trying to recover a SLES11 x64 system with REAR, everything went well, but the system was unable to boot. The GRUB installation was reported as successful, but the detailed log shows that it failed:

GNU GRUB version 0.97 (640K lower / 3072K upper memory)

[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename. ]
grub> device (hd0) /dev/sda
grub> root (hd0,1)
Filesystem type is reiserfs, partition type 0x83
grub> setup --stage2=/boot/grub/stage2 --prefix=/grub (hd0)
Checking if "/grub/stage1" exists... no

Error 15: File not found
grub> quit

it seems the --prefix option somehow confused it; with a simple:

setup --stage2=/boot/grub/stage2 (hd0)

everything went well.

Relax-and-Recover 1.14 / Git

Contributor

abbbi commented Mar 20, 2013

the system did not have a dedicated boot partition, and i think that is why it went wrong. According to the script, the prefix stays /grub if no separate disk is found:

finalize/Linux-i386/21_install_grub.sh

bootparts=$( (find_partition fs:/boot; find_partition fs:/) | sort | uniq -u )
grub_prefix=/grub
if [[ -z "$bootparts" ]]; then
    bootparts=$(find_partition fs:/)
    grub_prefix=/boot/grub
fi

i think defaulting to /boot/grub as the prefix makes sense anyway?

Member

gdha commented Mar 26, 2013

do you mean /boot is not mounted by default?

Contributor

abbbi commented Mar 26, 2013

yes, there is no separate partition for /boot; it exists on / however:

/dev/sda2 on / type reiserfs (rw,acl,user_xattr)
/proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
debugfs on /sys/kernel/debug type debugfs (rw)
udev on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)

dbupdate:~ # cat /etc/fstab
/dev/sda2 / reiserfs acl,user_xattr 1 1
/dev/sda1 swap swap defaults 0 0
proc /proc proc defaults 0 0
sysfs /sys sysfs noauto 0 0
debugfs /sys/kernel/debug debugfs noauto 0 0
devpts /dev/pts devpts mode=0620,gid=5 0 0
/dev/fd0 /media/floppy auto noauto,user,sync 0 0

Member

gdha commented Mar 31, 2013

@jhoekx any suggestions on this topic?

Member

jhoekx commented Mar 31, 2013

@dagwieers knows this better. I was just sitting idly next to him when we wrote that, and all I did was nod when he asked a question :-)

But if I look at it, I think we want our initial search for $bootparts to return nothing, so that the prefix is set to /boot/grub. I don't know why we also search for fs:/ and do the sort and uniq there.

mclien commented Apr 9, 2013

Had/have the same issue. I worked around it another way, by creating a symlink /grub -> /boot/grub on the system before doing the backup.
But I think the script should use /boot/grub as the prefix when no separate boot partition is found.
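
For anyone wanting to try the symlink workaround described above, here is a minimal sketch of the idea in a throwaway directory. The sandbox path and the empty stage1 file are purely illustrative assumptions; on a real system the link would be created at / before running the backup, and this sketch does not touch GRUB itself.

```shell
# Build a fake root tree in a temporary directory (illustrative only).
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT

mkdir -p "$sandbox/boot/grub"
touch "$sandbox/boot/grub/stage1"      # placeholder for the real stage1 file

# The workaround: make /grub resolve to /boot/grub inside the same filesystem.
# A relative target keeps the link valid wherever the tree is mounted.
ln -s boot/grub "$sandbox/grub"

# With the link in place, the path that "setup --prefix=/grub" checks exists.
if [ -e "$sandbox/grub/stage1" ]; then
    echo "/grub/stage1 resolves via the symlink"
fi
```

This only hides the wrong prefix; it does not change how the prefix is chosen.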

Contributor

abbbi commented Jun 4, 2013

hi guys,

anything new on that? I just ran into the same troubles again and i think the fix is quite straight-forward :)

Member

gdha commented Jun 4, 2013

Some background info (which is important to understand the why's I guess)

From the mail archive I found the following (from @dagwieers ):
Re: [rear-users] mbr broken after recover

Anyway, the reason there is a distinction between /grub/stage1 and
/boot/grub/stage1 is related to the fact that it could be on a separate
filesystem, in that case /grub/stage1 is correct (in fact in most cases
this is what is happening). So apparently it now fails for cases where it
should be using /boot/grub/stage1.

The reason the grub installation code is more complex is because we had to
support this second possibility. Why it now fails, I don't know. We did
test it when we wrote it ;-)

So how did we do this ?

To know exactly what devices are involved with the boot partition, we
search for the dependencies of fs:/boot and we remove any dependencies we
find for fs:/ as shown below:

 bootparts=$( (find_partition fs:/boot; find_partition fs:/) | sort | uniq -u )

If in this case bootparts is empty, it means that fs:/boot and fs:/ share
the same partition(s). In which case we need to use /boot/grub as
grub_prefix.

The reason we have this complexity is because if you have software raid,
you need to be sure both disks are being updated, so we find more than one
partition ! This explains the complexity, otherwise we could just compare
the partition of fs:/ and fs:/boot.

Once we know the boot partition(s) and we have the correct grub_prefix, we
can go and look for the disks that relate to these partitions and install
grub on these.

So how to proceed to debug the problem ? Enable debugging and check what
Rear reports for find_partition fs:/boot and find_partition fs:/, that
should give an indication to what is going on.

PS: we're talking about code from finalize/Linux-i386/21_install_grub.sh and/or finalize/Linux-i386/22_install_grub2.sh
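
To make the "subtract the root partitions" idea from the mail concrete, here is a small sketch. lookup_partition is a hypothetical stand-in with hard-coded answers, not the real find_partition, and the two layouts are assumptions chosen for illustration.

```shell
# Hypothetical stand-in for find_partition: returns canned answers per layout.
lookup_partition() {
    case "$LAYOUT:$1" in
        separate:fs:/boot) printf '%s\n' /dev/sda1 /dev/sda2 ;;
        separate:fs:/)     printf '%s\n' /dev/sda2 ;;
        shared:fs:/boot)   printf '%s\n' /dev/sda2 ;;
        shared:fs:/)       printf '%s\n' /dev/sda2 ;;
    esac
}

pick_grub_prefix() {
    # uniq -u keeps only lines that occur once, so every partition listed
    # for both fs:/boot and fs:/ cancels out of the result.
    bootparts=$( (lookup_partition fs:/boot; lookup_partition fs:/) | sort | uniq -u )
    if [ -z "$bootparts" ]; then
        echo /boot/grub    # nothing exclusive to /boot: it lives on /
    else
        echo /grub         # /boot has its own partition(s)
    fi
}

LAYOUT=separate
echo "separate /boot -> $(pick_grub_prefix)"
LAYOUT=shared
echo "/boot on /     -> $(pick_grub_prefix)"
```

With these canned answers the first layout yields /grub and the second /boot/grub, which is the behaviour the mail describes as intended.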

Member

dagwieers commented Jun 4, 2013

@gdha I am sorry, I should have replied to the bug report instead of the mailing list :-/

Contributor

sepsesam commented Jun 5, 2013

hi,

i tried to reproduce, the strange thing is:

after recovering the data to /mnt/local and exiting the rear> command prompt with "exit", it does not show any error; the grub failure, however, is reported in the logfile.

if i go back into recovery mode again, skipping the disk partitioning, it then correctly complains about missing the boot partition:

rear> exit
Did you restore the backup to /mnt/local ? Are you ready to continue recovery ? yes
exit
Updated initramfs with new drivers for this system.
Installing GRUB boot loader
ERROR: BUG BUG BUG! Unable to find any /boot partitions
=== Issue report ===
Please report this unexpected issue at: https://github.com/rear/rear/issues
Also include the relevant bits from /var/log/rear/rear-dbupdate.log

HINT: If you can reproduce the issue, try using the -d or -D option !

Aborting due to an error, check /var/log/rear/rear-dbupdate.log for details
Terminated

so it seems there must be a difference between executing the recovery the first and the second time!
Attached you can find the logfile for the first recovery, which does NOT trigger the BugIfError; here is the relevant part with -D

++++ grep -E '^[^ ]+ /dev/sda2 ' /var/opt/sesam/var/lib/rear//var/lib/rear/layout/disktodo.conf
++++ cut -d ' ' -f 3
+++ sort
+++ type=part
+++ [[ part != \p\a\r\t ]]
+++ echo /dev/sda2
+++ for component in '"${ancestors[@]}"'
+++ [[ -n part ]]
++++ get_component_type /dev/sda
++++ grep -E '^[^ ]+ /dev/sda ' /var/opt/sesam/var/lib/rear//var/lib/rear/layout/disktodo.conf
++++ cut -d ' ' -f 3
+++ uniq -u
+++ type=disk
+++ [[ disk != \p\a\r\t ]]
+++ continue
++ bootparts=/dev/sda2
++ grub_prefix=/grub
++ [[ -z /dev/sda2 ]]
++ [[ -n /dev/sda2 ]]
++ BugIfError 'Unable to find any /boot partitions'
++ (( 0 != 0 ))
+++ grep '^disk ' /var/opt/sesam/var/lib/rear//var/lib/rear/layout/disklayout.conf
+++ cut '-d ' -f2

contents of disklayout/todo:

disk /dev/sda 5368709120 msdos
part /dev/sda 12409206784 32256 primary none /dev/sda1
part /dev/sda 4770662400 575769600 primary boot /dev/sda2
fs /dev/sda2 / reiserfs uuid=3c83197e-b9f8-46ed-9586-105a48932e1b label= options=rw,acl,user_xattr
swap /dev/sda1 uuid= label=

done /dev/sda disk
done /dev/sda1 part
done /dev/sda2 part
done fs:/ fs
done swap:/dev/sda1 swap

I think the error may also be caused by the fact that /dev/sda1 is swap and /dev/sda2 holds / as well as /boot/
if you need any further logfiles please tell me!

Member

dagwieers commented Jun 7, 2013

I think I get it now, although I don't know why. Look at this piece of code:

    # Find exclusive partitions belonging to /boot (subtract root partitions from deps)
    bootparts=$( (find_partition fs:/boot; find_partition fs:/) | sort | uniq -u )
    grub_prefix=/grub
    if [[ -z "$bootparts" ]]; then
        bootparts=$(find_partition fs:/)
        grub_prefix=/boot/grub
    fi
    # Should never happen
    [[ "$bootparts" ]]
    BugIfError "Unable to find any /boot partitions"

If in your case /boot is in the root partition, we expect find_partition to return the same partition (/dev/sda2) for both fs:/ and fs:/boot. In that case "sort | uniq -u" drops both entries, because it only prints lines that occur exactly once:

echo -e "/dev/sda2\n/dev/sda2" | sort | uniq -u

And so bootparts is expected to be empty, but in this case it isn't empty and that's the real problem. From your code I cannot tell what is going on (not enough copy&pasted).

Let me do this on my own running system (this is a good way to debug the code, BTW):

[root@moria ~]# LAYOUT_DEPS=/var/lib/rear/layout/diskdeps.conf
[root@moria ~]# LAYOUT_FILE=/var/lib/rear/layout/disklayout.conf 
[root@moria ~]# LAYOUT_TODO=/var/lib/rear/layout/disktodo.conf 
[root@moria ~]# source /usr/share/rear/lib/array-functions.sh 
[root@moria ~]# source /usr/share/rear/lib/layout-functions.sh
[root@moria ~]# find_partition fs:/
/dev/sda2
[root@moria ~]# find_partition fs:/boot
/dev/sda1
/dev/sda2
[root@moria ~]# ( find_partition fs:/; find_partition fs:/boot )
/dev/sda2
/dev/sda1
/dev/sda2
[root@moria ~]# ( find_partition fs:/; find_partition fs:/boot ) | sort
/dev/sda1
/dev/sda2
/dev/sda2
[root@moria ~]# ( find_partition fs:/; find_partition fs:/boot ) | sort | uniq -u
/dev/sda1

This is to be expected in my case (/dev/sda1 is the /boot partition). In your case it should return nothing, which would mean / and /boot are on the same device.

Member

dagwieers commented Jun 7, 2013

Ok, I got it:

[root@moria ~]# find_partition fs:/
/dev/sda2
[root@moria ~]# find_partition fs:/usr

So the find_partition only works for mountpoints :-(
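
A quick sketch of why that breaks the prefix logic. mountpoint_only_lookup is a hypothetical stub that mimics the behaviour seen above (output only for real mountpoints); it is not the real find_partition.

```shell
# Hypothetical stub: only fs:/ is a mountpoint, everything else prints nothing.
mountpoint_only_lookup() {
    case "$1" in
        fs:/) echo /dev/sda2 ;;
        *)    : ;;             # fs:/boot is just a directory on /, so no output
    esac
}

# Same pipeline as in 21_install_grub.sh, with the stub swapped in.
bootparts=$( (mountpoint_only_lookup fs:/boot; mountpoint_only_lookup fs:/) | sort | uniq -u )
grub_prefix=/grub
if [ -z "$bootparts" ]; then
    grub_prefix=/boot/grub
fi

# /dev/sda2 appears only once, so uniq -u keeps it and the fallback never runs.
echo "bootparts=$bootparts grub_prefix=$grub_prefix"
```

This ends with bootparts=/dev/sda2 and grub_prefix=/grub, the same values as in the -D trace posted earlier.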

@ghost ghost assigned dagwieers Jun 7, 2013

dagwieers added a commit to dagwieers/rear that referenced this issue Jun 7, 2013

Fix installing grub when /boot is inside the root filesystem
This fix simplifies the code by using df to find the filesystem name (mountpoint) of a path, and uses this to determine whether /boot is in the root filesystem.

This change patches both grub and grub2 (since it was copied from grub).

This fixes #210.
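
For readers who do not want to dig through the commit, the df idea can be sketched roughly as follows. This is not the code from the referenced commit; mountpoint_of and grub_prefix_for are hypothetical helper names, and the awk field number assumes POSIX "df -P" output with no spaces in the filesystem name or mountpoint.

```shell
# Print the mountpoint of the filesystem that contains the given path.
# Assumes POSIX "df -P" output: one header line, mountpoint in column 6.
mountpoint_of() {
    df -P "$1" | awk 'NR==2 {print $6}'
}

# Choose the GRUB prefix depending on whether the boot directory
# lives on the same filesystem as /.
grub_prefix_for() {
    bootdir=${1:-/boot}
    if [ "$(mountpoint_of "$bootdir")" = "$(mountpoint_of /)" ]; then
        echo /boot/grub    # boot directory is part of the root filesystem
    else
        echo /grub         # boot directory is on its own filesystem
    fi
}

if [ -d /boot ]; then
    echo "prefix on this machine: $(grub_prefix_for /boot)"
fi
```

Unlike the find_partition lookup, df accepts any existing path, not only mountpoints, which is what the original check needed.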

@dagwieers dagwieers closed this in #237 Jun 7, 2013

Member

dagwieers commented Jun 7, 2013

Please test the fix and reopen the bug if it does not work.
