Partitioning errors in RAWDISK creation on Debian 7 and CentOS 6 #1846

Closed
GreenBlood opened this issue Jun 28, 2018 · 16 comments
Labels
enhancement Adaptions and new features fixed / solved / done

@GreenBlood
Contributor

GreenBlood commented Jun 28, 2018

Relax-and-Recover (ReaR) Issue

  • ReaR version:
    Relax-and-Recover 2.4-git.3020.aa7b197.master / 2018-06-21
  • OS version:
    At least CentOS 6 and Debian 7 (fully updated)
  • ReaR configuration files:
BACKUP=NETFS
OUTPUT=RAWDISK
BACKUP_TYPE=incremental
FULLBACKUPDAY="Sun"


SSH_ROOT_PASSWORD="XXXXXXX"
USE_DHCLIENT=yes

BACKUP_URL=cifs://XXXXX/XXXXX
BACKUP_OPTIONS="cred=/etc/rear/.cifs_credentials"
  • System architecture (x86 compatible or POWER and/or what kind of virtual machine):
    amd64
  • Are you using BIOS or UEFI or another way to boot?
    classic BIOS
  • Brief description of the issue:
    I've found an issue while trying to use the RAWDISK output on old Linux distros (CentOS 6 and Debian 7).
    When the mkrescue command reaches the 280_create_bootable_disk_image.sh file, it creates the raw file with dd and correctly adds the rescue partition. However, on line 86 (mkfs.vfat) the program does not find the loop device's partition (/dev/loop0p1).
+++ losetup --show --find /tmp/rear.H1ohZi7mUy6CZWh/tmp/rear-debian7.raw
++ disk_device=/dev/loop0
++ StopIfError 'Could not create loop device on /tmp/rear.H1ohZi7mUy6CZWh/tmp/rear-debian7.raw'
++ ((  0 != 0  ))
++ AddExitTask 'losetup -d /dev/loop0 >&2'
++ EXIT_TASKS=("$*" "${EXIT_TASKS[@]}")
++ Debug 'Added '\''losetup -d /dev/loop0 >&2'\'' as an exit task'
++ test 1
++ Log 'Added '\''losetup -d /dev/loop0 >&2'\'' as an exit task'
+++ date '+%Y-%m-%d %H:%M:%S.%N '
++ local 'timestamp=2018-06-28 14:58:49.154127597 '
++ test 1 -gt 0
++ echo '2018-06-28 14:58:49.154127597 Added '\''losetup -d /dev/loop0 >&2'\'' as an exit task'
2018-06-28 14:58:49.154127597 Added 'losetup -d /dev/loop0 >&2' as an exit task
++ partprobe /dev/loop0
++ local boot_partition=/dev/loop0p1
++ mkfs.vfat -v /dev/loop0p1 -n 'RESCUE SYS'
/dev/loop0p1: No such file or directory
mkfs.vfat 3.0.13 (30 Jun 2012)
++ Error 'Could not create boot file system'
++ LogPrintError 'ERROR: Could not create boot file system'

I tried running the losetup command myself, and indeed no /dev/loop0p1 appears, even though gdisk run on the loop0 device correctly finds the partition created beforehand.

I guess it has to do with how those distros handle "refreshing" the available partitions, but I don't really know how to work around that.

  • Work-around, if any:
    None yet.

Regards,
Green

@jsmeix
Member

jsmeix commented Jun 29, 2018

Support for RAWDISK output
(plus TCG Opal 2-compliant self-encrypting disks)
was implemented by @OliverO2 in
#1659
where it is stated that "the code has been tested successfully on Ubuntu 16.04.3 LTS".

@OliverO2
could you please have a look what goes on here?

@GreenBlood
Contributor Author

GreenBlood commented Jun 29, 2018

Just tried on Ubuntu 14 LTS and it works fine.

Then I tried updating the Debian kernel to 3.16 (using backports), and that changes nothing.
But as Debian 7 is no longer supported (my information was outdated), we might choose to drop it. That still leaves CentOS 6, which is still active.

In other news, there is no issue on Debian 8.

It might be related to the losetup/util-linux version.

@jsmeix
Member

jsmeix commented Jun 29, 2018

I know nothing at all about RAWDISK output
but from plain looking at the code in
usr/share/rear/output/RAWDISK/Linux-i386/280_create_bootable_disk_image.sh

disk_device="$(losetup --show --find "$disk_image")"
...
local boot_partition="${disk_device}p1"
...
mkfs.vfat $v "$boot_partition" ...

it seems one cannot assume that the boot_partition device name
is always of the form ${disk_device}p1.

I assume how partitions are named in this case also depends on
what each particular Linux distribution likes to do in this area
like the various ways how each version of each Linux distribution
implements their naming of multipath device nodes differently, cf.
#1765
in particular see
#1765 (comment)

@GreenBlood
if my above assumption is right we would need to know
how on each version of each of your Linux distributions
the partitions are actually named in case of loop devices.

You could add a line

read -p "Press ENTER to continue ... " 0<&6 1>&7 2>&8

anywhere in output/RAWDISK/Linux-i386/280_create_bootable_disk_image.sh
e.g. directly before the mkfs.vfat ... line so that it stops there
and you could inspect what there actually is
at exactly that state on your system.

@OliverO2
Contributor

@GreenBlood Thanks for reporting and your research done so far.

@jsmeix What you have argued so far sounds reasonable. I'll try to figure out what could be done to improve portability.

@jsmeix jsmeix added the enhancement Adaptions and new features label Jun 29, 2018
@jsmeix jsmeix added this to the ReaR future milestone Jun 29, 2018
@jsmeix jsmeix changed the title Partitionning errors in RAWDISK creation on debian 7 and centOS 6 Partitioning errors in RAWDISK creation on Debian 7 and CentOS 6 Jun 29, 2018
@GreenBlood
Contributor Author

GreenBlood commented Jun 29, 2018

@jsmeix Yeah, I've actually already tried running the commands by hand (dd, gdisk, losetup and such), and loopXp1 does not appear in any case. But running losetup -a or fdisk -l /dev/loop0 shows that the loop device is active.
HOWEVER
Using a more recent kernel on CentOS 6, I managed to get it working.

I noticed that util-linux was at a different version on CentOS 6 and 7, so I grabbed the latest version's source RPM and compiled it.

Using the newly compiled binaries (I have not installed them, as it's a pretty important package), I got this result:

[root@centos6 util-linux-2.23.2]# losetup --show --find /tmp/rear.MOL0gr9BL4uKxck/tmp/rear-centos6.raw
/dev/loop0
[root@centos6 util-linux-2.23.2]# partx -a /dev/loop0
HDIO_GETGEO: Inappropriate ioctl for device
[root@centos6 util-linux-2.23.2]# ./partx -a /dev/loop0
[root@centos6 util-linux-2.23.2]# ls /dev/loop*
/dev/loop0  /dev/loop0p1  /dev/loop1  /dev/loop2  /dev/loop3  /dev/loop4  /dev/loop5  /dev/loop6  /dev/loop7  /dev/loop-control

While the freshly compiled partx reloads the partitions, the system one does not.
With the standard kernel and the compiled partx, it no longer works. So I guess it's the combination of the newer kernel and the partx update that does the trick.

I don't know what to do with this information. I feel like I went too far, but meh. Seems like a dead end to me. My VM is now a CentOS Frankenstein monster.

@OliverO2
Contributor

AFAIK creating device names for partitions is a kernel exercise. Unfortunately, there doesn't seem to be a tool which reports assigned partition device names. For example, partprobe --summary just outputs

/dev/loop0: gpt partitions 1

As it looks like there are different opinions on how to create partition device names and no one to ask, my best guess would be to just rely on the fact that partition names consist of the device name followed by some suffix. In this case, there is only one partition, so its name shouldn't be too hard to guess...

@GreenBlood Could you change the line

local boot_partition="${disk_device}p1"

to

local boot_partition="$(echo "${disk_device}"?*)"

in usr/share/rear/output/RAWDISK/Linux-i386/280_create_bootable_disk_image.sh and see if that works on each distribution?
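The globbing idea above can be sketched as a small helper. This is only an illustration, not ReaR code: the function name `find_single_partition` and its optional second argument are hypothetical. One thing to keep in mind is that an unmatched glob stays literal in bash, so `echo "${disk_device}"?*` prints the pattern itself when no partition node exists; the sketch therefore tests every candidate and insists on exactly one match.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ReaR): resolve the single partition node
# of a disk device by globbing "<disk_device>?*".
# $1: disk device path, e.g. /dev/loop0
# $2: file test operator, defaults to -b (block device); exposed only so the
#     logic can be exercised on plain files without root privileges.
find_single_partition() {
    local disk_device="$1"
    local test_op="${2:--b}"
    local candidates=()
    local node
    for node in "$disk_device"?*; do
        # An unmatched glob is left as a literal string, so check each hit.
        [ "$test_op" "$node" ] && candidates+=("$node")
    done
    # Anything other than exactly one candidate is treated as a failure.
    [ "${#candidates[@]}" -eq 1 ] || return 1
    printf '%s\n' "${candidates[0]}"
}
```

A call such as `boot_partition="$(find_single_partition "$disk_device")" || Error "..."` would be one way to use it. Note that the glob is loose: for /dev/loop1 it would also match /dev/loop10, which is why the sketch refuses to guess when more than one node matches.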

@OliverO2
Contributor

OliverO2 commented Jun 29, 2018

@GreenBlood There was an overlap in our comments: If it's not a naming issue but a kernel failing to update its partition table, there is probably not much we can do here. Modern kernels should update their partition tables automatically. Otherwise partprobe "$disk_device" should instruct them to. Another idea would be to use losetup's --partscan option like this:

disk_device="$(losetup --partscan --show --find "$disk_image")"

Would this make it work?

@GreenBlood
Contributor Author

GreenBlood commented Jun 29, 2018

@OliverO2 Well, it seems that with CentOS 6 being nearly ten years old, its losetup does not include --partscan.

My compiled losetup accepts this argument, but it does not work any better.

[root@centos6 util-linux-2.23.2]# ./losetup --show --find --partscan "/tmp/rear.IdSQo1PE0lrU95R/tmp/rear-centos6.raw"
/dev/loop0
[root@centos6 util-linux-2.23.2]# ls /dev/loop*
/dev/loop0  /dev/loop1  /dev/loop2  /dev/loop3  /dev/loop4  /dev/loop5  /dev/loop6  /dev/loop7

I guess CentOS 6 is off the list for RAWDISK, unless there is a way that I'm unaware of.

@jsmeix
Member

jsmeix commented Jun 29, 2018

@OliverO2
as a minimal improvement could you add a test
that such a $boot_partition actually exists
and error out if not like

local boot_partition="${disk_device}p1"
test -b "$boot_partition" || Error "Cannot create raw disk image (no $boot_partition partition on $disk_device)"

This does not make things work in environments where it currently cannot work
but it would at least tell the user what is unexpected or wrong in his environment.
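As a variation on the check above, the test could be retried a few times before giving up, since device nodes are typically created asynchronously by udev. This is a sketch under assumptions, not what ReaR does: the function name `wait_for_device_node`, the retry count and the delay are all made up for illustration, and waiting will not help on systems where the node never appears at all.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ReaR): poll until a device node exists.
# $1: device path, e.g. /dev/loop0p1
# $2: number of attempts (assumed default: 5)
# $3: delay between attempts in seconds (assumed default: 1)
# $4: file test operator, defaults to -b (block device); exposed only so the
#     loop can be exercised on plain files without root privileges.
wait_for_device_node() {
    local device="$1"
    local attempts="${2:-5}"
    local delay="${3:-1}"
    local test_op="${4:--b}"
    local i
    for (( i = 1; i <= attempts; i++ )); do
        [ "$test_op" "$device" ] && return 0
        sleep "$delay"
    done
    # The node never showed up; the caller decides how to report that.
    return 1
}
```

Within the script this could read `wait_for_device_node "$boot_partition" || Error "..."`, keeping the error message suggested above.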

@jsmeix
Member

jsmeix commented Jun 29, 2018

@OliverO2
in general regarding things like "kernel failing to update its partitions table"
you may have a look at the somewhat similar or related issue
#791

@OliverO2
Contributor

OliverO2 commented Jul 3, 2018

@GreenBlood Maybe there is a solution after all if you're able to install the kpartx tool on your older distributions: Could you just try to apply commit bcb0ed3 in my branch https://github.com/OliverO2/rear/tree/feature/rawdisk-portability-improvements?

@jsmeix This should also improve the error message. I'll create a PR if this has been tested successfully (currently I don't have one of these older kernels available so I could not test it fully).

@GreenBlood
Contributor Author

GreenBlood commented Jul 3, 2018

@OliverO2 OK, so I tried your patch, but no luck.

But I think what @jsmeix suggested earlier might be the issue we're facing. It seems that running losetup and then kpartx -a /dev/loopX makes the device appear in /dev/disk/by-*.
See:

[root@centos6 ~]# losetup -d /dev/loop0
[root@centos6 ~]# losetup --show --find rear-centos6.raw 
/dev/loop0
[root@centos6 ~]# ls /dev/disk/by-id/
(System disks)
[root@centos6 ~]# kpartx -a /dev/loop0
[root@centos6 ~]# ls /dev/disk/by-id/
dm-name-loop0p1   dm-uuid-part1-loop0        (System disks) 

So even though kpartx says that it is adding a /dev/loop0p1 device, that is not the case here. On my server, the symlinks in /dev/disk/by-* pointed to /dev/dm-2.
I integrated this into the ReaR workflow to test with the correct devices on the server, and it works.
Here is a very ugly diff showing what I modified (hardcoded value, but it was just for testing purposes):

@@ -67,9 +67,9 @@
 StopIfError "Could not create loop device on $disk_image"
 AddExitTask "losetup -d $disk_device >&2"
 
-partprobe "$disk_device" || Error "Could not make the kernel recognize loop device partitions"
-local boot_partition="${disk_device}p1"
 
+kpartx -a "$disk_device" || Error "Could not make the kernel recognize loop device partitions"
+local boot_partition="/dev/dm-2"
 
 ### Create and populate the boot file system
 
@@ -144,6 +144,7 @@
 
 umount "$boot_partition_root" || Error "Could not unmount boot file system"
 RemoveExitTask "umount $boot_partition_root >&2"
+kpartx -v -d /dev/loop0
 losetup -d "$disk_device" || Error "Could not delete loop device"
 RemoveExitTask "losetup -d $disk_device >&2"

I don't currently know how to "detect" where the loop device partition ends up, though.

EDIT: It seems that lsblk -n --output "KNAME" $disk_device shows, on its second line, where in /dev the partition is.
Debian 8 :

root@debian8:~# lsblk -n --output "KNAME" /dev/loop0
loop0
loop0p1

CentOS 6 :

[root@centos6 ~]# lsblk -n --output "KNAME" /dev/loop0
loop0
dm-2

Ubuntu 16 LTS :

root@ubuntu16:/tmp# lsblk -n --output "KNAME" /dev/loop0
loop0
loop0p1

I'd have to test on more Linux distros, but as lsblk is part of util-linux it should in principle be available on every distro.
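The lsblk observation could be turned into a lookup along these lines. It is only a sketch based on the three outputs shown above: the function name `first_partition_from_lsblk` is hypothetical, and it assumes that the first kernel name listed after the disk itself is the wanted partition, which holds for a single-partition image but has not been verified on other distros.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ReaR): read the output of
#   lsblk -n --output KNAME <disk_device>
# on stdin and print the /dev path of the first entry below the disk itself.
# $1: kernel name of the disk, e.g. loop0
first_partition_from_lsblk() {
    local disk_kname="$1"
    local kname
    while read -r kname; do
        # Skip empty lines and the line describing the disk itself.
        [ -n "$kname" ] || continue
        [ "$kname" = "$disk_kname" ] && continue
        printf '/dev/%s\n' "$kname"
        return 0
    done
    # No entry besides the disk: the partition is not known to the kernel.
    return 1
}
```

It would be fed like `lsblk -n --output KNAME "$disk_device" | first_partition_from_lsblk "$(basename "$disk_device")"`, yielding /dev/loop0p1 for the Debian 8 and Ubuntu 16 outputs above and /dev/dm-2 for the CentOS 6 one.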

@OliverO2
Contributor

OliverO2 commented Jul 3, 2018

@GreenBlood Thanks for trying. I think we're at least on the right path here. Note that my patch uses the -u option of kpartx, not -a. Unfortunately, kpartx is not well documented so it's trial and error.

Note that the mapping device dm-name-loop0p1 you saw is absolutely OK. kpartx creates device mappings like these but should also create links for the proper loop device path /dev/loop0p1.

Diagnosis

Could you post the relevant section of the rear log when running the code with my patch? Maybe you could run it again and even examine the state after the kpartx call by inserting

read -p "Press ENTER to continue ... " 0<&6 1>&7 2>&8

before the comment

# If unsuccessful, say so.

and look for loop devices.

What does kpartx -l say?

Alternative solution without losetup:

In addition, could you try this?

  1. Replace the lines
local disk_device  # separate 'local' statement to avoid losing $(...) exit status - cf. https://stackoverflow.com/a/10397996
disk_device="$(losetup --show --find "$disk_image")"
StopIfError "Could not create loop device on $disk_image"
AddExitTask "losetup -d $disk_device >&2"

local boot_partition="${disk_device}p1"

with

local kpartx_fields=($(kpartx -asv "$disk_image"))
[[ ${#kpartx_fields[*]} == 6 ]] || Error "kpartx could not create loop device and its partitions (result: ${kpartx_fields[*]})"

local disk_device="${kpartx_fields[4]}"
local boot_partition="/dev/${kpartx_fields[0]}"

AddExitTask "kpartx -d $disk_image >&2"

LogPrint "loop device: $disk_device, boot partition: $boot_partition"
  2. Uncomment these lines:
losetup -d "$disk_device" || Error "Could not delete loop device"
RemoveExitTask "losetup -d $disk_device >&2"
  3. Then run rear and post the relevant section of the log.

@OliverO2
Contributor

OliverO2 commented Jul 3, 2018

Correction - should be

AddExitTask "kpartx -d $disk_image >&2"

(not $disk_device)

@OliverO2
Contributor

OliverO2 commented Jul 3, 2018

@GreenBlood

Update: I have pushed a new commit 26e6eec onto my branch https://github.com/OliverO2/rear/commits/feature/rawdisk-portability-improvements. With that commit I could successfully build a RAWDISK output file on CentOS 6.

Rear configuration:

OUTPUT=RAWDISK
OUTPUT_URL="file://$VAR_DIR/output"

Platform configuration:

  • CentOS-6.10-x86_64-minimal.iso
  • additional packages: gdisk, dosfstools

Terminal log:

Relax-and-Recover 2.4-git.3028.60985ac.featurerawdiskportabilityimprovements.changed / 2018-07-03
Using log file: /var/log/rear/rear-centos6.log
Creating disk layout
Using guessed bootloader 'GRUB' (found in first bytes on /dev/sda)
Creating root filesystem layout
Copying logfile /var/log/rear/rear-centos6.log into initramfs as '/tmp/rear-centos6-partial-2018-07-03T22:38:09+0200.log'
Copying files and directories
Copying binaries and libraries
Copying kernel modules
Copying all files in /lib*/firmware/
Creating recovery/rescue system initramfs/initrd initrd.cgz with gzip default compression
Created initrd.cgz with gzip default compression (74476711 bytes) in 8 seconds
Creating 83 MiB raw disk image "rear-centos6.raw"
Using syslinux to install a Legacy BIOS bootloader
Copying resulting files to file location
Saving /var/log/rear/rear-centos6.log as rear-centos6.log to file location
Exiting rear mkrescue (PID 2930) and its descendant processes
Running exit tasks

@jsmeix
Member

jsmeix commented Jul 9, 2018

@GreenBlood
with #1850 merged
this issue should be fixed where "fixed" means that now
ReaR tries as far as possible to make the needed partition device nodes appear
but if they finally won't appear it must error out because it cannot proceed
without the needed partition device nodes.
