debian: Unable to boot Kairos installer #2522

Open
6ixfalls opened this issue Apr 30, 2024 · 18 comments

@6ixfalls

Kairos version:

Fails to boot on kairos-debian-bookworm-standard-amd64-generic-v3.0.8-k3sv1.29.3+k3s1; boots successfully on kairos-debian-bookworm-standard-amd64-generic-v3.0.0-k3sv1.29.0+k3s1

CPU architecture, OS, and Version:

Linux localhost 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux

(output from v3.0.0)

Describe the bug

The Kairos ISO is unable to boot and I'm unable to install Kairos (manually and automatically).

To Reproduce

Try to install the latest version of Kairos. I'm not sure if this is reproducible. This is running in a KVM VM.

Expected behavior

I should be able to boot into the Kairos install ISO.

Logs

[screenshot attached to the original issue]

Additional context

This bug looks exactly like #2467, but trying the fix there and adding that to a Dockerfile doesn't resolve the issue.

@6ixfalls added the bug, triage, and unconfirmed labels Apr 30, 2024
@ci-robbot added the question label Apr 30, 2024
@ci-robbot
Collaborator

Hello, 6ixfalls! I'm an automated bot assisting with GitHub issue audits in the kairos project. I've added the 'question' label to your issue (#2522) because it appears we need more information to properly investigate your report.

To enhance our understanding and help us better address your problem, please provide:

  • A clear description of the issue you're experiencing, including any error messages you receive.
  • Steps to reproduce the problem, if possible.
  • The versions of all relevant artifacts you're using, such as the Kairos version, CPU architecture, OS version, and any specific configurations or dependencies.

Please ensure that your description, steps to reproduce, and version details are explicitly mentioned in your issue. We appreciate your efforts to help us improve Kairos, and don't hesitate to reach out if you have any questions. Note that I am a bot, an experiment of @mudler and @jimmykarily.

Thanks!
kairos-io Githubbot

@6ixfalls
Author

6ixfalls commented Apr 30, 2024

This could be related: I'm using a custom Docker image with auroraboot to generate an ISO. The Dockerfile is here:
https://github.com/6ixfalls/taonet-cloud/blob/main/containers/kairos-debian/Dockerfile

It also appears this issue was introduced between 3.0.0 and 3.0.3. This appears to be a fix for the issue:
tyzbit/kairos-distros@e11adda

@tyzbit

tyzbit commented Apr 30, 2024

A note: that was an attempted fix. It didn't fix it for me on 3.0.3 but I didn't try other versions.

@jimmykarily
Contributor

Maybe this is relevant? https://github.com/kairos-io/packages/blob/718aaa27e4688559433cd889513f1944a7679ef4/packages/static/kairos-overlay-files/files/system/oem/12_nvidia.yaml#L10

Oh wait, you are not on NVIDIA. On the other hand, maybe that module needs to be included somehow?

@jimmykarily
Contributor

Maybe not that irrelevant after all: https://forums.fedoraforum.org/showthread.php?325865-dracut-FATAL-iscsi-requested-but-kernel-initrd-does-not-support-iscsi

You could try omitting iscsi in dracut to see if this helps.
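
For reference, omitting a dracut module is normally done with a drop-in configuration file. The following is a minimal sketch, not the configuration anyone in this thread tested; the file name is hypothetical, and the initramfs has to be regenerated afterwards (for example with `dracut --force`) for the change to take effect.

```shell
# /etc/dracut.conf.d/99-omit-iscsi.conf  (hypothetical file name)
# Ask dracut to leave the "iscsi" dracut module out of the initramfs.
# dracut configuration files are sourced as shell, and the leading and
# trailing spaces inside the quotes are part of dracut's list syntax.
omit_dracutmodules+=" iscsi "
```

Note that this removes iSCSI support from early boot, so it is only reasonable on systems that do not need iSCSI before the root filesystem is mounted.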

@6ixfalls
Author

6ixfalls commented May 1, 2024

I'm not sure this is the correct way to do it, but I tried this configuration and it did not fix the issue.

@tyzbit

tyzbit commented May 1, 2024

If this does what I suspect, this would break compatibility with at least Longhorn. Can we see what it takes for the kernel to support iscsi?

@mudler changed the title from "Unable to boot Kairos installer" to "debian: Unable to boot Kairos installer" May 6, 2024
@mudler
Member

mudler commented May 6, 2024

It looks like we need to disable iscsi as we do already for nvidia:

# Disable ISCSI

@mudler added the chore label and removed the question, triage, and unconfirmed labels May 6, 2024
@mauromorales self-assigned this May 6, 2024
mauromorales added a commit that referenced this issue May 7, 2024
fixes #2522

Signed-off-by: Mauro Morales <mauro.morales@spectrocloud.com>
@Itxaka
Member

Itxaka commented May 7, 2024

It looks like the iscsi modules are not properly set up in the initramfs: the dracut failure indicates that it's checking for the iscsi_tcp module to be available.

You could try installing iscsiuio alongside and regenerating the initramfs, as that seems to bring in the iscsi_tcp module dracut needs.

I'm going to try a quick test here, but I can already see that once that package is installed, the modules are available and iscsi is added to the dracut modules.
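
One way to check for the module on disk is sketched below. This is an illustration only: the helper name `has_kernel_module` is made up for this example, and it only looks for the module file under a modules tree, not inside the initramfs itself.

```shell
#!/bin/sh
# Report whether a kernel module file exists under a modules tree.
# Usage: has_kernel_module <modules_dir> <module_name>
# <modules_dir> is typically /lib/modules/<kernel-version>.
has_kernel_module() {
    modules_dir=$1
    module_name=$2
    # Modules may be compressed (.ko.xz, .ko.zst, ...), hence the trailing glob.
    found=$(find "$modules_dir" -type f -name "${module_name}.ko*" 2>/dev/null | head -n 1)
    [ -n "$found" ]
}

# Example: check every installed kernel for iscsi_tcp.
for dir in /lib/modules/*/; do
    [ -d "$dir" ] || continue
    if has_kernel_module "$dir" iscsi_tcp; then
        echo "$dir: iscsi_tcp present"
    else
        echo "$dir: iscsi_tcp MISSING"
    fi
done
```

To see what actually ended up in a generated initramfs, `lsinitrd` (shipped with dracut) can list its contents, which can then be filtered for iscsi.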

what cmdline are you using?

@Itxaka
Member

Itxaka commented May 7, 2024

With a quick patch to install the package alongside Kairos, and letting dracut regenerate the initramfs, the proper module is available and loaded:

[screenshot attached to the original comment]

@mauromorales removed their assignment May 7, 2024
@mauromorales added the question label May 7, 2024
@athnoc-dev

I can confirm that customizing the Debian image (the only one I tested) from v3.0.0 and up produces the "iscsi error" from dracut. At first I followed this doc: https://kairos.io/docs/advanced/customizing/. Then I used this Dockerfile (https://github.com/kairos-io/kairos/blob/master/images/Dockerfile.kairos-debian) to rebuild the image from scratch while adding the packages I needed. The iscsi error from dracut still appeared. After that I added the "iscsiuio" package, and net booting with AuroraBoot worked... the first time.

The second time I launched AuroraBoot and tried to net boot the server, it gave me the same error. I inspected the temp directory to which AuroraBoot extracts the ISO; the /netboot directory contains all the net boot artifacts. I inspected the kernel file there and compared it to the kernel files in the ISO (which are unpacked in the temp directory).

I found that the net boot kernel (kairos-kernel) was the oldest kernel file, not the most recent one, which is why it did not contain the iscsi module that dracut complains is missing from the kernel during net boot. I copied the latest kernel and used the other artifacts in /tmp/netboot to start pixiecore, and everything worked as expected.

It looks like Auroraboot is picking the wrong kernel (sometimes) for booting, can you confirm?
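
To illustrate what "picking the most recent kernel" could look like, here is a minimal sketch. It is not AuroraBoot's actual selection logic, which this thread does not show. It assumes kernel images named `vmlinuz-<version>` (a hypothetical layout for the unpacked ISO) and GNU `sort` for version-aware ordering.

```shell
#!/bin/sh
# Print the path of the highest-versioned kernel image in a directory.
# Assumption: images are named vmlinuz-<version>; adjust the glob otherwise.
# Usage: newest_kernel <dir>
newest_kernel() {
    for image in "$1"/vmlinuz-*; do
        # Skip the unexpanded glob when no image matches.
        if [ -e "$image" ]; then
            printf '%s\n' "$image"
        fi
    done | sort -V | tail -n 1
}

# Example: show the newest kernel under /boot (or under the directory given as $1).
newest_kernel "${1:-/boot}"
```

A plain lexical sort would rank `6.1.0-9` above `6.1.0-18`, which is one way an older kernel can end up being chosen; `sort -V` compares the numeric components instead.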

@jimmykarily
Contributor

Let's install iscsiuio by default (all flavors?) so that it makes it to the initramfs.

@tyzbit

tyzbit commented May 20, 2024

I tried that and it did not seem to help: tyzbit/kairos-distros@e11adda
It does strongly seem to be an AuroraBoot issue.

@mauromorales removed the question and chore labels May 23, 2024
@mauromorales
Member

Let's try to replicate this in AuroraBoot and see if we can detect what the issue actually is.

@athnoc-dev

Check which kernel AuroraBoot is using in /tmp/netboot

In my case the errors persisted because an older kernel was used, instead of the latest that had the supporting iscsi modules.

I copied the latest kernel from the temp directory (the unpacked ISO) and replaced the kernel file and all worked fine.
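
The check described above can be sketched as a byte-for-byte comparison between the net boot kernel and each kernel file from the unpacked ISO. The helper name `matching_files` is made up for this example, and the location of the unpacked ISO is a placeholder, since it depends on how AuroraBoot was run.

```shell
#!/bin/sh
# Print every regular file in <candidates_dir> that is byte-identical
# to <reference>.
# Usage: matching_files <reference> <candidates_dir>
matching_files() {
    reference=$1
    candidates_dir=$2
    for candidate in "$candidates_dir"/*; do
        [ -f "$candidate" ] || continue
        # cmp -s prints nothing and exits 0 only when both files are identical.
        if cmp -s "$reference" "$candidate"; then
            printf '%s\n' "$candidate"
        fi
    done
}
```

For example, `matching_files /tmp/netboot/kairos-kernel <unpacked-iso-dir>` would show which kernel from the ISO ended up as the net boot kernel; if the match is not the newest one, that is consistent with the behaviour reported here.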

@jimmykarily self-assigned this May 29, 2024
@jimmykarily
Contributor

  • I created a patch in kairos:
~/workspace/kairos/kairos (master)*$ git diff
diff --git a/images/Dockerfile.debian b/images/Dockerfile.debian
index 39d94482..07862509 100644
--- a/images/Dockerfile.debian
+++ b/images/Dockerfile.debian
@@ -64,6 +64,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
     iputils-ping \
     isc-dhcp-common \
     isc-dhcp-client \
+    iscsiuio \
     jq \
     krb5-locales \
     less \
@@ -162,4 +163,4 @@ RUN systemctl enable systemd-networkd
 RUN systemctl enable ssh
 
 # Fixup sudo perms
-RUN chown root:root /usr/bin/sudo && chmod 4755 /usr/bin/sudo
\ No newline at end of file
+RUN chown root:root /usr/bin/sudo && chmod 4755 /usr/bin/sudo
diff --git a/images/Dockerfile.kairos-debian b/images/Dockerfile.kairos-debian
index 60c85c1d..3391363c 100644
--- a/images/Dockerfile.kairos-debian
+++ b/images/Dockerfile.kairos-debian
@@ -63,6 +63,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
     iputils-ping \
     isc-dhcp-common \
     isc-dhcp-client \
+    iscsiuio \
     jq \
     krb5-locales \
     less \
  • I built a container image:
earthly +base-image --VARIANT=core --FLAVOR=debian --FLAVOR_RELEASE=bookworm-slim  --BASE_IMAGE=debian:bookworm-slim  --MODEL=generic --FAMILY=debian
  • I started Auroraboot with the result image:
docker run --rm -ti -v /var/run/docker.sock:/var/run/docker.sock --net host quay.io/kairos/auroraboot --set "container_image=docker://quay.io/kairos/debian:bookworm-slim-core-amd64-generic-v3.0.4-73-g8ddb9092-dirty"
  • I started a VM in netboot mode (with virt-manager)

It successfully boots debian.

Since the docker command I used to run Auroraboot didn't mount any volumes, no data could have been cached between runs. @tyzbit, how are you running Auroraboot? @athnoc-dev's suggestion makes me think that some people might be using a command (from our docs?) that mounts a volume and caches things. Is that the case?

@6ixfalls
Author

Since the docker command I used to run Auroraboot didn't mount any volumes, it's not possible to have cached any data between runs. tyzbit how are you running Auroraboot? athnoc-dev suggestion makes me think that some people might be using some command (from our docs?) that is mounting a volume and caches things. Is that the case?

This is true in my case - I use auroraboot to generate ISOs to upload to my Kairos nodes, and as a result I have a mount so that I can access the completed ISO. I don't think it should be expected behavior for auroraboot to not generate a new kernel if there's an existing one present - but I'm also not sure if reusing the same directory for building has any effect on the speed of the builds themselves either.

I'm actually not too sure this is a kernel issue, because as far as I remember it also occurs with a fresh auroraboot install. However, another thing that appears to be common among everyone who has the issue is that the Kairos Dockerfile is modified. Is it possible that the GitHub Action caching the Docker build steps leads to this issue?

Status: Under review 🔍
8 participants