Multiple Network cards UEFI boot #136

Open
ThomasToka opened this issue Jul 3, 2023 · 16 comments


@ThomasToka

ThomasToka commented Jul 3, 2023

Hi,
we have servers with multiple network cards.
In legacy (BIOS) mode we use IPAPPEND 2 to select the device that is later used for the DHCP request to download the squashfs.
In UEFI mode there is no such possibility.
Our problem is with UEFI.
It works with grml 2020.06: first eth0 is tested, then eth1 is tested, and since DHCP is available there, eth1 is selected as the DHCP interface.
In 2021.07 and 2022.11 this stopped working: eth0 is tested and found to have link, but there is no DHCP on this device. eth1 is never tried, and we land in the initramfs.
As mentioned, this works with 2020.06.
[screenshot]

Thank you.
Thomas

@mika
Member

mika commented Jul 3, 2023

Hi @ThomasToka,

A customer of mine also ran into such a problem; there it shows up as follows:

  1. The nodhcp boot option doesn't actually prevent DHCP requests with PXE / netboot (while ip=.. is set accordingly)
  2. The cdc_ether module gets autoloaded on this system. It belongs to the system's IPMI network device, which gets an address via DHCP and is therefore considered as the boot device, but it fails of course because it's unrelated to our rootfs fetch=...

I'm not sure about your specific situation; I'd need to know what's in your /proc/cmdline (once your system is stuck in the initramfs), but it feels like you're in a similar position with your setup.

The workaround at my customer for now is to use the live-netdev boot option, so in your case that would be live-netdev=eth1. Does this work for you?
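
To check which of these options a stuck system was actually booted with, the kernel command line can be inspected from the initramfs shell. The following is only a minimal sketch for a POSIX shell; the helper name get_cmdline_opt is made up for illustration and is not part of live-boot.

```shell
#!/bin/sh
# get_cmdline_opt NAME [CMDLINE]
# Print the value of the last NAME=value option on a kernel command line.
# Reads /proc/cmdline when no CMDLINE string is given.
# Hypothetical helper for illustration; not part of live-boot.
get_cmdline_opt() (
  name="$1"
  cmdline="${2-$(cat /proc/cmdline)}"
  set -f            # no pathname expansion while word-splitting the line
  found=1
  value=""
  for arg in $cmdline; do
    case "$arg" in
      "$name"=*)
        value="${arg#"$name"=}"
        found=0
        ;;
    esac
  done
  [ "$found" -eq 0 ] && printf '%s\n' "$value"
  exit "$found"
)
```

Called as `get_cmdline_opt live-netdev`, it prints the configured device and returns non-zero when the option is absent.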

@ThomasToka
Author

ThomasToka commented Jul 3, 2023

Hello,
thanks for taking care :-)
live-netdev=eth1 is not a consistent solution, as some servers swap the devices as they please, like this one here:

These are the MAC addresses shown in the BIOS for the devices:

eth0 aa:aa:aa:aa:aa:52
eth1 aa:aa:aa:aa:aa:53

When I boot grml, this is what comes out:
eth0 aa:aa:aa:aa:aa:53
eth1 aa:aa:aa:aa:aa:52
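
A name-to-MAC listing like the one above can be read from sysfs on the booted system, since every interface exposes its hardware address in /sys/class/net/<name>/address. The following is only a sketch; the function name list_macs and its optional directory parameter are assumptions made for illustration.

```shell
#!/bin/sh
# list_macs [SYSFS_NET_DIR]
# Print "<interface> <mac>" for every network interface found.
# SYSFS_NET_DIR defaults to /sys/class/net; the parameter only exists so the
# function can be pointed at a test directory. Hypothetical helper name.
list_macs() {
  net_dir="${1:-/sys/class/net}"
  for dev in "$net_dir"/*; do
    # Skip entries without a readable address attribute
    [ -r "$dev/address" ] || continue
    printf '%s %s\n' "${dev##*/}" "$(cat "$dev/address")"
  done
}
```

Comparing this output with the MAC addresses shown in the BIOS makes a swap of eth0 and eth1 visible immediately.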

our append string looks like this:

linuxefi grml/2021_07/boot/grml64full/vmlinuz boot=live fetch=http://10.0.0.1/grml/2021_07/grml64-full/grml64-full.squashfs lang=de nomce noprompt noeject noswraid vga=791 nolvm nodmraid nomodeset ramdisk=32768 ssh={?pw} nt={$notify_id} ip=dhcp net.ifnames=0 ipv6.disable_ipv6=1 biosdevname=0 netscript=10.0.0.1/grml/autorun0 arcconf_url=10.0.0.1

initrdefi grml/2021_07/boot/grml64full/initrd.img

So we use ip=dhcp. We don't want to use ip= with a static IP config or anything else, because all our installation images use DHCP, and during installation we generate static configurations for the final install.

To me it seems that the network device script no longer works the way it did in 2020.06; 2020.06 is the last release that behaved as it should. This stayed under the radar for so long because we also have our own rescue system that is mainly used for such tasks.

Could you please point me to the script that does the "network discovery"?

@ThomasToka
Author

Additionally: I tested whether live-netdev=eth1 makes it boot from this device, and yes, it does. But as I said, this is only a half-baked solution, because some motherboards/kernels swap devices, like this Supermicro board here.
Maybe you could extend live-netdev to accept a MAC address or PCI address (that's what we use for assigning systemd device names), e.g. live-netdev=aa:aa:aa:aa:aa:52.
That would fix it naturally.
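
For the PCI address variant, sysfs already provides the mapping: /sys/class/net/<name>/device is a symlink whose last path component is the bus address of the underlying device. The following is only a sketch of such a lookup, not something live-boot implements; the function name netdev_by_pci and the example address are assumptions.

```shell
#!/bin/sh
# netdev_by_pci PCI_ADDRESS [SYSFS_NET_DIR]
# Print the interface whose underlying device is PCI_ADDRESS
# (for example 0000:01:00.0). Hypothetical helper, not part of live-boot.
netdev_by_pci() {
  pci="$1"
  net_dir="${2:-/sys/class/net}"
  for dev in "$net_dir"/*; do
    # Virtual interfaces such as lo have no device link
    [ -L "$dev/device" ] || continue
    target=$(readlink "$dev/device")
    if [ "${target##*/}" = "$pci" ]; then
      printf '%s\n' "${dev##*/}"
      return 0
    fi
  done
  return 1
}
```

Unlike the interface name, the PCI address does not depend on the order in which the kernel enumerates the cards.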

@mika
Member

mika commented Jul 3, 2023

Could you please point me to the script that does the "network discovery"?

The relevant component is https://github.com/grml/live-boot-grml/ - which is an "extension" of what's going on in the initramfs itself (which is driven by https://salsa.debian.org/kernel-team/initramfs-tools), and a custom version of upstream's https://salsa.debian.org/live-team/live-boot.

Most relevant for you should be https://github.com/grml/live-boot-grml/blob/master/components/9990-select-eth-device.sh + https://github.com/grml/live-boot-grml/blob/master/components/9990-networking.sh.

Furthermore, unless the boot option debian_networking is present, https://github.com/grml/live-boot-grml/blob/master/components/9990-grml-networking.sh is also worth being aware of (otherwise, i.e. iff the boot option debian_networking is present, it's https://github.com/grml/live-boot-grml/blob/master/components/9990-netbase.sh).

@ThomasToka
Author

ThomasToka commented Jul 3, 2023

We see more and more UEFI-only boards coming up, also with several onboard NICs. The swapping of device names is a general, long-known problem.
Do you see a chance to do something in grml generally to make this work somehow? A solution would be great.
I develop some things myself, but to be honest I have not had a deeper look into your great grml universe here on GitHub. So I would have to invest time to find out things where you, or the devs managing this part, see and know in seconds what has to be done where.
Otherwise I will have to bite the bullet and take a deeper look to see if I can manage somehow.
Thanks for the links.

@ThomasToka
Author

case "${ARGUMENT}" in
  live-netdev=*)
    NETDEV="${ARGUMENT#live-netdev=}"

    # Check if NETDEV is a valid MAC address
    if [[ $NETDEV =~ ^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$ ]]; then
      echo "NETDEV is a valid MAC address."

      # Retrieve the device name associated with the MAC address
      DEVICE_NAME=$(ip -o link | awk -v mac="$NETDEV" '$0 ~ mac{print substr($2, 1, length($2)-1)}')
      if [ -n "$DEVICE_NAME" ]; then
        echo "Device name for MAC address $NETDEV is $DEVICE_NAME."
        NETDEV="$DEVICE_NAME"
      else
        echo "No device found for MAC address $NETDEV."
        # Handle the case when no device is found for the MAC address
      fi

    else
      echo "NETDEV is not a valid MAC address. Assuming it is a device name."
      # Assign NETDEV directly to $NETDEV
      NETDEV="$NETDEV"
    fi

    echo "DEVICE=$NETDEV" >> /conf/param.conf
    echo "Found live-netdev parameter, forcing it to use network device $NETDEV."
    Wait_for_carrier "$NETDEV"
    return
    ;;
esac

something like this would be nice ^^

@mika
Member

mika commented Jul 4, 2023

@ThomasToka thanks for your suggestion, something like this sounds like a good idea! 👍
We definitely need to do something about it, just can't promise anything yet (time-wise) :)

@ThomasToka
Author

is this the correct way to pack the initrd.img?

find . | cpio -o > ../new_initrd.img
cd ..
xz < new_initrd.img > new_initrd.img.xz
mv new_initrd.img.xz /tftpboot/grml/initrd.img

Somehow I get a kernel panic with this.

@mika
Member

mika commented Jul 4, 2023

update-initramfs -k $(uname -r) -u generates the initramfs file (see mkinitramfs(8) for further details)

@ThomasToka
Author

ThomasToka commented Jul 4, 2023

update-initramfs -k $(uname -r) -u generates the initramfs file (see mkinitramfs(8) for further details)

I think you misunderstood me.

I unpacked the initrd.img to modify the script and now want to repack the unpacked folder to use it.

root@xx:/usr/src/grml# ls -la
total 40
drwxr-xr-x  8 root root 4096 Jul  4 09:36 .
drwxr-xr-x 11 root root 4096 Jul  4 09:23 ..
lrwxrwxrwx  1 root root    7 Jul  4 09:16 bin -> usr/bin
drwxr-xr-x  3 root root 4096 Jul  4 09:16 conf
drwxr-xr-x  2 root root 4096 Jul  4 09:16 cryptroot
drwxr-xr-x  9 root root 4096 Jul  4 09:16 etc
-rwxr-xr-x  1 root root 6301 Jul  4 09:16 init
lrwxrwxrwx  1 root root    7 Jul  4 09:16 lib -> usr/lib
lrwxrwxrwx  1 root root    9 Jul  4 09:16 lib64 -> usr/lib64
drwxr-xr-x  2 root root 4096 Jul  4 09:16 run
lrwxrwxrwx  1 root root    8 Jul  4 09:16 sbin -> usr/sbin
drwxr-xr-x  9 root root 4096 Jul  4 09:17 scripts
drwxr-xr-x  6 root root 4096 Jul  4 09:16 usr

AFAIK update-initramfs is for updating the initramfs of a running system.

@mika
Member

mika commented Jul 4, 2023

Yeah, though you can run update-initramfs on a live system and see what it does underneath, e.g. the referenced mkinitramfs(8), which update-initramfs uses underneath.

It basically does something like:

find . | LC_ALL=C sort | cpio --quiet $cpio_owner_root $cpio_reproducible -o -H newc >>"${outfile}"

and then xz -9 --check=crc32 (see https://salsa.debian.org/kernel-team/initramfs-tools/-/commit/332393057e27b3567a3d2c3367a0a5edbe35e86d), but I don't have exact and ready-to-use command lines available right now :)
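
A likely reason the plain `cpio -o` attempt above panics: the kernel's initramfs unpacker expects the newc cpio format, and the in-kernel xz decoder does not handle xz's default CRC64 integrity check. The two steps can be combined roughly as follows. This is only a sketch based on the pipeline quoted above, with `-R 0:0` and `--reproducible` assumed for the two variables; the function name repack_initrd is made up.

```shell
#!/bin/sh
# repack_initrd TREE OUTFILE
# Pack the unpacked initramfs tree TREE into an xz-compressed cpio archive.
# OUTFILE should be an absolute path outside of TREE.
# Hypothetical helper; the flags follow the mkinitramfs pipeline quoted above.
repack_initrd() {
  tree="$1"
  outfile="$2"
  (
    cd "$tree" || exit 1
    # -H newc:  cpio format expected by the kernel's initramfs unpacker
    # -R 0:0:   store every file as owned by root
    # sort + --reproducible: stable archive contents between runs
    find . | LC_ALL=C sort |
      cpio --quiet -R 0:0 --reproducible -o -H newc
  ) | xz -9 --check=crc32 > "$outfile"   # CRC32 instead of xz's default CRC64
}
```

For example, `repack_initrd /usr/src/grml /tftpboot/grml/new_initrd.img` would write the new image directly to the TFTP directory.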

@ThomasToka
Author

ThomasToka commented Jul 4, 2023

find . | LC_ALL=C sort | cpio --quiet -R 0:0 $cpio_reproducible -o -H newc >>"${outfile}"

So cpio_owner_root is probably -R 0:0.

But what is cpio_reproducible?

Sorry for so many questions :-(

@mika
Member

mika commented Jul 4, 2023

$cpio_reproducible := --reproducible, just look at the source :)

@ThomasToka
Author

Thank you very much for the hints.

It works.

unpack:

cp /tftpboot/grml/initrd.img /usr/src/grml/
cd /usr/src/grml
mv initrd.img initrd.img.xz
xz -d initrd.img.xz
cpio -i < initrd.img
rm initrd.img    # keep the old archive out of the tree that gets repacked

I had to adjust my proposal because we are in /bin/sh here (so no bash [[ ... =~ ]]), so I did:
nano usr/lib/live/boot/9990-select-eth-device.sh

case "${ARGUMENT}" in
  live-netdev=*)
    NETDEV="${ARGUMENT#live-netdev=}"

    # Check if NETDEV is a valid MAC address (POSIX sh, so grep instead of [[ =~ ]])
    if echo "$NETDEV" | grep -Eq '^[0-9A-Fa-f]{2}[:-]([0-9A-Fa-f]{2}[:-]){4}[0-9A-Fa-f]{2}$'; then
      echo "NETDEV is a valid MAC address."

      # Retrieve the device name associated with the MAC address:
      # field 2 of "ip -o link" is "<name>:", so strip the trailing colon.
      # Match case-insensitively and stop at the first hit.
      DEVICE_NAME=$(ip -o link | awk -v mac="$NETDEV" 'index(tolower($0), tolower(mac)) { print substr($2, 1, length($2)-1); exit }')
      if [ -n "$DEVICE_NAME" ]; then
        echo "Device name for MAC address $NETDEV is $DEVICE_NAME."
        NETDEV="$DEVICE_NAME"
      else
        echo "No device found for MAC address $NETDEV."
      fi
    else
      echo "NETDEV is not a valid MAC address. Assuming it is a device name."
    fi
    echo "DEVICE=$NETDEV" >> /conf/param.conf
    echo "Found live-netdev parameter, forcing it to use network device $NETDEV."
    Wait_for_carrier "$NETDEV"
    return
    ;;
esac
find . | LC_ALL=C sort | cpio --quiet -R 0:0 --reproducible -o -H newc > /usr/src/initrd.img
xz -9 --check=crc32 < /usr/src/initrd.img > /tftpboot/grml/new_initrd.img

Adding live-netdev=xx:xx:xx:xx:xx:52 to the append line then boots from the desired interface.

Fixed for me; our 2021 image works now. I will patch 2022 for us next ;)
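
Before deploying a repacked image, it can be worth checking that the modified script really ended up in the archive. The following is only a sketch, assuming the image is a single xz-compressed cpio archive as in the steps above; the function name initrd_contains is made up.

```shell
#!/bin/sh
# initrd_contains IMAGE PATH
# Succeed if the xz-compressed cpio archive IMAGE has an entry named PATH.
# Hypothetical helper; assumes a single xz-compressed cpio archive.
initrd_contains() {
  image="$1"
  path="${2#./}"
  # List the archive, normalize a possible leading "./", match the whole line
  xz -dc "$image" | cpio -it --quiet 2>/dev/null |
    sed 's|^\./||' | grep -Fqx "$path"
}
```

For example: `initrd_contains /tftpboot/grml/new_initrd.img usr/lib/live/boot/9990-select-eth-device.sh && echo ok`.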

@mika
Member

mika commented Jul 4, 2023

@ThomasToka glad to hear, you're welcome! That sounds like an approach that we can also use for Grml. Do you feel like you could provide a PR against https://github.com/grml/live-boot-grml/, or would you prefer me to take care of it (based on your patch, with credits of course)?

@ThomasToka
Author

grml/live-boot-grml#16

done.
