Disks installed incorrectly on real hardware #337

Closed
celskeggs opened this issue Oct 27, 2018 · 12 comments

Comments

@celskeggs
Member

On real hardware, the installer is actually on a USB stick, not on a CDROM. In practice, the USB drive shows up as sda, and the actual disk shows up as sdb.

This causes a few problems:

  • grub cannot install normally (... did we hard-code the path for it or something?), and must be installed manually
  • the disk numbering changes on reboot without the USB stick, which means that sdb no longer exists, and everything gets stuck in the initramfs on boot. (This can be fixed manually, by mounting the disk from the initramfs and then regenerating the initramfs.)
  • the normal automatic partitioning settings are, for an unknown reason, skipped, so the user falls back to the normal Debian partitioning menu.

Given that we need to add Ceph support soon, it would be prudent for us to have a better disk-handling story.

@celskeggs
Member Author

The reason that installation isn't fully automated when using a USB drive is that the code only autoselects a disk when there's only one inserted:

# See if any autopartition disks have been set
disks=""
if db_get partman-auto/disk && [ "$RET" ]; then
        disks="$RET"
fi

# If there's only one disk, then preseeding partman-auto/disk is
# unnecessary, and sometimes inconvenient in heterogeneous environments
if [ "$method" ] && [ -z "$disks" ]; then
        DEVS="$(get_auto_disks)"
        if [ "$(echo "$DEVS" | wc -l)" -eq 1 ]; then
                disks="$(cat "${DEVS%$TAB*}"/device)"
        fi
fi

This comes from the https://salsa.debian.org/installer-team/d-i source code under packages/partman-auto/display.d/initial_auto.
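
To make the failure mode concrete, here is a minimal sketch of the line-count check from the snippet above, run against made-up device lists. `would_autoselect` and the sample device paths are hypothetical stand-ins, not part of partman; the only thing taken from the source is the "exactly one line" condition.

```shell
#!/bin/sh
# Sketch only: mirrors the "exactly one line" test from initial_auto.
# would_autoselect and the sample device lists below are hypothetical.
TAB="$(printf '\t')"

would_autoselect() {
    devs="$1"
    # wc -l may pad its output with spaces on some systems, so normalise
    # the count through arithmetic expansion before comparing.
    n=$(( $(printf '%s\n' "$devs" | wc -l) ))
    [ "$n" -eq 1 ]
}

# One candidate disk: the check passes and the disk would be autoselected.
one_disk="/var/lib/partman/devices/=dev=sda${TAB}sda"

# USB stick plus real disk: two lines, so nothing is autoselected.
two_disks="/var/lib/partman/devices/=dev=sda${TAB}sda
/var/lib/partman/devices/=dev=sdb${TAB}sdb"

if would_autoselect "$one_disk"; then
    echo "one disk: autoselect"
fi
if would_autoselect "$two_disks"; then
    echo "two disks: autoselect"
else
    echo "two disks: fall back to manual partitioning"
fi
```

With the USB stick counted as a second disk, the check fails and the installer drops into the manual partitioning menu, which matches what we saw.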

In our case, we have three options: hardcode the disk we use, patch the source code to autoselect the disk, or add a step that selects it automatically.

@celskeggs
Member Author

celskeggs commented Oct 25, 2019

Relevantly, the only reason a CDROM isn't treated as a valid disk is that there's special-case code for it:

get_auto_disks() {
        local dev device dmtype

        for dev in $DEVICES/*; do
                [ -d "$dev" ] || continue

                device=$(cat $dev/device)
                
                # Skip software RAID (mdadm) devices (/dev/md/X and /dev/mdX)
                # unless it's a whole-disk partitionable array
                if echo "$device" | grep -Eq "/dev/md/?[0-9]*$"; then
                        if ! is_wholedisk_mdraid "$device"; then
                                continue
                        fi
                fi

                # Skip installer disk
                $(mount | grep -qF "$device on /cdrom ") && continue

                # Skip device mapper devices (/dev/mapper/),
                # except for dmraid or multipath devices
                if echo $device | grep -q "^/dev/mapper/"; then
                        if [ ! -f "$dev/sataraid" ] && \
                           ! is_multipath_dev $device; then
                                continue
                        fi
                fi
                printf "$dev\t$(device_name $dev)\n"
        done
}

(via packages/partman-auto/lib/auto-shared.sh)

We could also patch this part of the code. If we wanted to be real sketch, we could even do it at runtime, since it's just a shell script.

@celskeggs
Member Author

oh lol. there's already support in the preseed for running a command to pick disks manually:

# This command is run immediately before the partitioner starts. It may be
# useful to apply dynamic partitioner preseeding that depends on the state
# of the disks (which may not be visible when preseed/early_command runs).
#d-i partman/early_command \
#       string debconf-set partman-auto/disk "$(list-devices disk | head -n1)"

We just have to figure out how we want to select the disk to use.

@celskeggs
Member Author

I've set up mitigations that appear to work for the partitioning and grub installation steps. They work by choosing the only disk that doesn't have a partition with the label ISOIMAGE. The initramfs is still off-target, though, apparently because the generated grub configuration points to /dev/sdb1 instead of /dev/sda1 or (more reasonably) something in /dev/disk/by-uuid/.
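
The selection rule can be sketched as a small POSIX shell function. This is not the code from the actual mitigation: `pick_install_disk` and its stdin format (one `DISK LABEL` pair per partition, label possibly empty) are assumptions made for illustration, and how those pairs get gathered inside the installer is left out because it depends on which tools the image ships.

```shell
#!/bin/sh
# Sketch only: pick_install_disk and its input format are hypothetical.
# Reads "DISK LABEL" pairs on stdin (one per partition; LABEL may be empty)
# and prints the single disk with no partition labelled $1.
# Returns 1 and prints nothing if zero or several disks qualify.
pick_install_disk() {
    skip_label="$1"
    candidates=""
    flagged=""
    while read -r disk label; do
        [ -n "$disk" ] || continue
        # Remember each disk once, in the order first seen.
        case " $candidates " in
            *" $disk "*) ;;
            *) candidates="$candidates $disk" ;;
        esac
        # Flag any disk that carries the installer's label.
        if [ "$label" = "$skip_label" ]; then
            flagged="$flagged $disk"
        fi
    done
    picked=""
    count=0
    for disk in $candidates; do
        case " $flagged " in
            *" $disk "*) continue ;;
        esac
        picked="$disk"
        count=$((count + 1))
    done
    [ "$count" -eq 1 ] || return 1
    printf '%s\n' "$picked"
}

# Example input: sda is the USB stick (ISOIMAGE), sdb is the real disk.
printf '/dev/sda ISOIMAGE\n/dev/sda \n/dev/sdb root\n' | pick_install_disk ISOIMAGE
```

Refusing to pick when more than one disk qualifies is deliberate: on a machine with several data disks, guessing wrong would wipe the wrong one.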

Not entirely sure what the right solution is to get the grub config to be generated correctly.

@cryslith
Member

cryslith commented Oct 26, 2019

something in /dev/disk/by-uuid/

why doesn't this work? Shouldn't the disk have the same uuid?

@celskeggs
Member Author

Yes, this should work, but I can't pass the disk-by-UUID to grub, because grub needs the full disk to install into, and the by-uuid symlinks are (apparently?) to partitions. I'm going to do more investigation on this.
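
For reference, going from a partition path (such as the target of a by-uuid symlink) to the whole disk could look like the sketch below. `parent_disk` is a hypothetical helper that relies only on the usual kernel naming conventions (sdXN, nvmeXnYpZ, mmcblkXpY); it is a string heuristic, not a lookup of the real device topology.

```shell
#!/bin/sh
# Sketch only: parent_disk is a hypothetical helper based on device naming
# conventions. It does not inspect sysfs, so unusual names may be wrong.
parent_disk() {
    case "$1" in
        # NVMe and MMC partitions end in pN, e.g. /dev/nvme0n1p2.
        /dev/nvme*n*p[0-9]*|/dev/mmcblk*p[0-9]*)
            printf '%s\n' "$1" | sed 's/p[0-9][0-9]*$//' ;;
        # NVMe and MMC whole disks end in a digit that must be kept.
        /dev/nvme*|/dev/mmcblk*)
            printf '%s\n' "$1" ;;
        # sdXN style: strip the trailing partition number, if any.
        *)
            printf '%s\n' "$1" | sed 's/[0-9][0-9]*$//' ;;
    esac
}

parent_disk /dev/sdb1
```

In practice the argument would come from resolving the symlink first, for example `parent_disk "$(readlink -f /dev/disk/by-uuid/SOME-UUID)"`, where SOME-UUID is a placeholder.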

@celskeggs
Member Author

celskeggs commented Nov 15, 2019

Aha! I think I figured it out.

Grub is supposed to do the replacement with the UUID path itself. It wasn't doing so, because the UUID it had didn't match the UUID in /dev/disk/by-uuid/. After much painful debugging, it turns out that it's trying to use the UUID extracted from the filesystem header on disk, but the /dev/disk/by-uuid symlinks are maintained by udev, and udev's view is stale at this point. This can be fixed by running udevadm trigger, which asks the kernel to replay device events so that udev re-runs its rules and refreshes the symlinks.

I'm not sure why this isn't run automatically by the installer, but running it fixes the problem, so I intend to run it and then update grub's config.
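
A minimal sketch of that ordering, under some assumptions: the `RUN` prefix and the function name are hypothetical conveniences for dry runs, `udevadm settle` is an extra step added here to wait for udev to finish, and where `update-grub` has to run (installer environment versus the installed system under /target) is left to the caller. This is not necessarily what the eventual fix does.

```shell
#!/bin/sh
# Sketch only: the RUN prefix and function name are hypothetical.
refresh_udev_then_update_grub() {
    # Ask the kernel to replay device events so udev re-runs its rules
    # and recreates the /dev/disk/by-uuid symlinks.
    ${RUN:-} udevadm trigger
    # Wait for udev to finish processing the queued events.
    ${RUN:-} udevadm settle
    # Regenerate the grub config now that the symlinks should match
    # the UUIDs stored in the filesystem headers.
    ${RUN:-} update-grub
}

# Dry run: print the commands instead of executing them.
RUN=echo refresh_udev_then_update_grub
```

Leaving `RUN` unset executes the commands for real, which needs root and a system where udev and grub are installed.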

@celskeggs
Member Author

Honestly, this might actually be an upstream bug in the debian installer...

@cryslith
Member

cryslith commented Nov 15, 2019

Nice work! I agree this could be a bug in d-i.

@celskeggs
Member Author

This bug is theoretically fixed in #435, and needs to be tested by reinstalling on the preproduction cluster.

@celskeggs
Member Author

This has been tested on real hardware. Specifically, I used IPMI to attach the ISO as a virtual hard disk, confirmed that it presented the same scenario as we saw with the USB drive in person (i.e. having sda for the USB drive and sdb for the disk), and confirmed that the installer still functioned in this scenario.

This is not exactly the same as physically plugging in a USB drive, but I believe it's close enough -- and it means we've demonstrated we can actually install on a real machine.

@cryslith I'm closing this; let me know if you think I really need to test the ISO installer on a physical USB drive instead of using the IPMI virtual storage thing.

@cryslith
Member

My feeling is that we should close this issue iff we can install on the hardware we have - from your comment I think this is the case, so I agree with closing
