saved_entry grub options not getting applied on boot after upgrading kairos from 2.0.3 to 2.1.0 #1460

Closed
kpiyush17 opened this issue May 29, 2023 · 11 comments · Fixed by kairos-io/kairos-agent#36
Labels
bug Something isn't working


@kpiyush17
Contributor

Kairos version:

NAME="openSUSE Leap"
VERSION="15.4"
ID="opensuse-leap"
ID_LIKE="suse opensuse"
VERSION_ID="15.4"
PRETTY_NAME="openSUSE Leap 15.4"
KAIROS_NAME="kairos-core-opensuse-leap"
KAIROS_VERSION="v2.1.0"
KAIROS_ID="kairos"
KAIROS_ID_LIKE="kairos-core-opensuse-leap"
KAIROS_VERSION_ID="v2.1.0"
KAIROS_PRETTY_NAME="kairos-core-opensuse-leap v2.1.0"
KAIROS_IMAGE_REPO="quay.io/kairos/core-opensuse-leap"
KAIROS_IMAGE_LABEL="latest"
KAIROS_GITHUB_REPO="kairos-io/kairos"
KAIROS_VARIANT="core"
KAIROS_FLAVOR="opensuse-leap"

CPU architecture, OS, and Version:

5.14.21-150400.24.63-default #1 SMP PREEMPT_DYNAMIC Tue May 2 15:49:04 UTC 2023 (fd0cc4f) x86_64 x86_64 x86_64 GNU/Linux

Describe the bug

We add the saved_entry grub option to the elemental install config through the Kairos package github.com/kairos-io/kairos/pkg/config. The options are passed to the install config by a handler that runs on the agent.install plugin event:

cloudInit.Install.Auto = true
cloudInit.Install.Reboot = !cloudInit.Install.Poweroff
cloudInit.Install.GrubOptions["saved_entry"] = "registration"
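A self-contained sketch of such a handler follows. It assumes simplified, hypothetical stand-ins for the `config` package types (the real structs in github.com/kairos-io/kairos/pkg/config may be shaped differently). One detail worth guarding against in a handler like this: in Go, assigning into a nil map panics, so the sketch initializes `GrubOptions` before setting `saved_entry`.

```go
package main

import "fmt"

// Install is a hypothetical, trimmed-down stand-in for the install section
// of the Kairos cloud config; field names mirror the snippet above.
type Install struct {
	Auto        bool
	Reboot      bool
	Poweroff    bool
	GrubOptions map[string]string
}

// Config is a hypothetical stand-in for the top-level cloud config.
type Config struct {
	Install Install
}

// applyRegistrationDefaults sets the same options as the handler above.
// It creates GrubOptions first, because writing to a nil map panics.
func applyRegistrationDefaults(cloudInit *Config) {
	cloudInit.Install.Auto = true
	cloudInit.Install.Reboot = !cloudInit.Install.Poweroff
	if cloudInit.Install.GrubOptions == nil {
		cloudInit.Install.GrubOptions = map[string]string{}
	}
	cloudInit.Install.GrubOptions["saved_entry"] = "registration"
}

func main() {
	var cfg Config // GrubOptions starts out nil here
	applyRegistrationDefaults(&cfg)
	fmt.Println(cfg.Install.Auto, cfg.Install.Reboot, cfg.Install.GrubOptions["saved_entry"])
}
```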

This is /oem/90_custom.yaml right after the elemental install:

#cloud-config
install:
  auto: true
  grub_options:
    saved_entry: registration
  reboot: true

Here, saved_entry is set to the registration GRUB entry. However, after the elemental installation is done, the next boot does not go to this entry; it goes to the first entry in the GRUB menu instead. This started happening in v2.1.0.

Pasting my grubmenu.cfg here:

kubeadm-12:/oem # cat /etc/kairos/branding/grubmenu.cfg
menuentry "Palette eXtended Kubernetes Edge Reset" --id statereset {
    set img=/cOS/recovery.img
    search.fs_label COS_RECOVERY root
    set label=COS_SYSTEM
    loopback loop0 /$img
    set root=($root)
    source (loop0)/etc/cos/bootargs.cfg
    linux (loop0)$kernel $kernelcmd ${extra_cmdline} ${extra_recovery_cmdline} kairos.reset
    initrd (loop0)$initramfs
}

menuentry "Palette eXtended Kubernetes Edge Registration" --id registration {
    search --no-floppy --label --set=root COS_STATE
    set img=/cOS/active.img
    set label=COS_ACTIVE
    loopback loop0 /$img
    set root=($root)
    source (loop0)/etc/cos/bootargs.cfg
    linux (loop0)$kernel $kernelcmd ${extra_cmdline} ${extra_active_cmdline} stylus.registration
    initrd (loop0)$initramfs
}

I also noticed that in v2.1.0, no grubenv file is created in /oem after the elemental installation.

In v2.0.3

kubeadm-11:/oem # ls
90_custom.yaml	grubenv  lost+found  userdata  userdata.yaml
kubeadm-11:/oem # cat grubenv
# GRUB Environment Block
# WARNING: Do not edit this file by tools other than grub2-editenv!!!
saved_entry=registration
#############################################################################################################################################################################################

In v2.1.0

kubeadm-12:/oem # ls
90_custom.yaml	lost+found  userdata  userdata.yaml
kubeadm-12:/oem #

Expected behavior
The system should boot into the configured GRUB menu entry; in this case, menuentry "Palette eXtended Kubernetes Edge Registration" --id registration. This worked correctly up to v2.0.3.

@kpiyush17 kpiyush17 added the bug Something isn't working label May 29, 2023
@mudler mudler removed their assignment May 29, 2023
@jimmykarily
Contributor

I verified what @venkatnsrinivasan also noted in Slack: on core v2.1.0, installing with this config file:

kairos@localhost:~> cat /oem/90_custom.yaml
#cloud-config

install:
    auto: true
    grub_options:
        saved_entry: registration
users:
    - name: kairos
      passwd: kairos

results in this file being created:

kairos@localhost:~> cat /oem/grubenv 
# GRUB Environment Block
# WARNING: Do not edit this file by tools other than grub2-editenv!!!
saved_entry=registration

The problem lies either in the upgrade process or in the way the config is passed through the handler.

@venkatnsrinivasan
Collaborator

There isn't an upgrade process; I think "upgrade" in the ticket means using 2.0.3 vs 2.1.0. I tried a config YAML similar to the one above, except that the grub_options are set by our provider when kairos-agent sends the install event. It generates 90_custom.yaml with the grub options, but grubenv isn't generated.

@jimmykarily
Contributor

I think I know what's going on. The result from the provider is merged back into the object we pass to RunInstall, but as top-level keys (not merged into the cc key). The RunInstall method, however, uses the cc key as the cloud-init configuration, and that key doesn't contain the keys sent by the provider.
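A simplified model of this mismatch is sketched below. It assumes the options passed to `RunInstall` can be treated as a map whose `"cc"` entry holds the cloud config; the `Config` type and the helper names are hypothetical and do not reproduce kairos-agent's code. The merge is shallow, so a provider `install` key would replace an existing `install` section in `cc` rather than being deep-merged.

```go
package main

import "fmt"

// Config is a hypothetical, simplified stand-in for a decoded cloud config.
type Config map[string]any

// mergeTopLevel mimics the behaviour described above: provider keys land
// next to "cc" instead of inside it.
func mergeTopLevel(opts, provider Config) {
	for k, v := range provider {
		opts[k] = v
	}
}

// mergeIntoCC merges provider keys into the "cc" sub-config, which is the
// only part the install step reads (shallow merge).
func mergeIntoCC(opts, provider Config) {
	cc, ok := opts["cc"].(Config)
	if !ok {
		cc = Config{}
		opts["cc"] = cc
	}
	for k, v := range provider {
		cc[k] = v
	}
}

// installGrubOptions models an install step that only looks at opts["cc"].
func installGrubOptions(opts Config) map[string]string {
	cc, _ := opts["cc"].(Config)
	install, _ := cc["install"].(Config)
	grub, _ := install["grub_options"].(map[string]string)
	return grub
}

func main() {
	provider := Config{"install": Config{"grub_options": map[string]string{"saved_entry": "registration"}}}

	buggy := Config{"cc": Config{"users": []string{"kairos"}}}
	mergeTopLevel(buggy, provider)
	fmt.Println("top-level merge:", installGrubOptions(buggy))

	fixed := Config{"cc": Config{"users": []string{"kairos"}}}
	mergeIntoCC(fixed, provider)
	fmt.Println("merge into cc:", installGrubOptions(fixed))
}
```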

@jimmykarily
Copy link
Contributor

Not only that, but the kairos-install.pre hook builds its own config by reading the CloudInitPaths, completely ignoring the passed options object. This may or may not produce the same config, depending on the paths used to build each.

This also means that the kairos-install.pre hook may run with a different configuration than the installation.

Maybe it's better to fix this as part of kairos-io/kairos-agent#32, which already changes this part of the code to fix some other bugs.

jimmykarily added a commit to kairos-io/kairos-agent that referenced this issue May 30, 2023
This is ugly and hackish. We merge the provider's config back into
`r["cc"]` because that's the only cloud config RunInstall uses.

Ideally, we should consolidate all those configurations and the
RunInstall should only take one argument which will be a struct that
holds all the information on how to perform the installation. This
refactoring will happen separately. This is just a quick fix for the
bug.

kairos-io/kairos#1460

Signed-off-by: Dimitris Karakasilis <dimitris@karakasilis.me>
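The refactoring this commit message describes (a single struct passed to RunInstall) could look roughly like the sketch below, where the pre-install hook and the installer read the same value instead of each rebuilding the config from the CloudInitPaths. `InstallSpec`, `Hook`, and `runInstall` are hypothetical names chosen for illustration, not kairos-agent APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// InstallSpec is a hypothetical struct holding everything an installation
// needs, built once after the provider has answered.
type InstallSpec struct {
	Reboot      bool
	GrubOptions map[string]string
}

// Hook is a hypothetical pre-install hook; it receives the same spec as the installer.
type Hook func(spec *InstallSpec) error

// runInstall runs the pre-install hooks and the install step against one
// shared spec, so they cannot disagree about the configuration.
func runInstall(spec *InstallSpec, preHooks []Hook, install func(*InstallSpec) error) error {
	if spec == nil {
		return errors.New("nil install spec")
	}
	for _, h := range preHooks {
		if err := h(spec); err != nil {
			return fmt.Errorf("pre-install hook: %w", err)
		}
	}
	return install(spec)
}

func main() {
	spec := &InstallSpec{Reboot: true, GrubOptions: map[string]string{"saved_entry": "registration"}}
	logHook := func(s *InstallSpec) error {
		fmt.Println("pre-install hook sees saved_entry =", s.GrubOptions["saved_entry"])
		return nil
	}
	install := func(s *InstallSpec) error {
		fmt.Println("installer sees saved_entry =", s.GrubOptions["saved_entry"])
		return nil
	}
	if err := runInstall(spec, []Hook{logHook}, install); err != nil {
		fmt.Println("error:", err)
	}
}
```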
@jimmykarily
Contributor

We discussed this, and a proper fix requires heavy refactoring. We will try to provide a quick fix first and do the refactoring separately.

I created a possible quick fix here: kairos-io/kairos-agent@1672683, but to try it out I need a mock provider that sends back a cloud config. I will try to create one and test.

@jimmykarily
Contributor

I created a simple provider here: https://github.com/jimmykarily/kairos-provider-mock . I'll try to test it now.

jimmykarily added a commit to kairos-io/kairos-agent that referenced this issue May 30, 2023
This is ugly and hackish. We pass the provider's cloud config to the
collector so that it is included in the `r["cc"]` that we pass to
the `RunInstall` method, since that is what it uses as a cloud config.

Ideally, we should consolidate all those configurations and the
RunInstall should only take one argument which will be a struct that
holds all the information on how to perform the installation. This
refactoring will happen separately. This is just a quick fix for the
bug.

kairos-io/kairos#1460

Signed-off-by: Dimitris Karakasilis <dimitris@karakasilis.me>
@jimmykarily
Contributor

I amended the fix here: kairos-io/kairos-agent@a53d8a0

I tested it like this:

  • Booted a Kairos core image I built from Kairos master (older releases produce some errors with the kairos-agent I built from my branch).
  • Copied the custom kairos-agent binary into the running VM with scp.
  • Compiled the mock provider with CGO_ENABLED=0 go build -ldflags="-extldflags=-static" -o agent-provider-dimitris and copied it into the VM with scp as well.
  • Created the directory /system/providers in the VM and moved agent-provider-dimitris there.
  • Created a /oem/config.yaml file with this content:
#cloud-config
users:
  - name: kairos
    passwd: kairos

so that I could log in after installation.

  • Ran ./kairos-agent install (note: this is the custom binary I copied over, not the one in the PATH).

Installation went through and I can see this in the installed system:

kairos@localhost:~> cat /oem/90_custom.yaml 
#cloud-config
install:
  grub_options:
    saved_entry: registration
users:
- name: kairos
  passwd: kairos

So everything worked, except that there is still no /oem/grubenv. I suspect that's because I didn't set install.auto: true. I'll try that.

@jimmykarily
Contributor

My fix is invalid. I was looking at the install.auto: true path, which is the one that doesn't contact the provider at all. The install.auto: false path already merges the agent-provider's config back. I don't know what's going on.

In the meantime: setting install.auto: true in the cloud config sent from the agent-provider might create some strange situations, because up to that point the value of that key is false (otherwise the provider wouldn't be called at all). After the provider is called, the value changes to true. An installation should have either auto: true or auto: false, not switch in the middle of the process. I don't think this is related to what we see in this issue, but I thought I should bring it up in case it causes other problems.
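The two paths, and one way to keep `auto` from changing mid-installation, can be modelled as below. This is a hypothetical sketch, assuming the provider's answer is merged field by field; it simply ignores an `auto` value coming back from the provider, which is one possible policy and not what kairos-agent does. `Options`, `Provider`, and `resolveOptions` are illustrative names only.

```go
package main

import "fmt"

// Options is a hypothetical, simplified view of the install options.
type Options struct {
	Auto        bool
	GrubOptions map[string]string
}

// Provider is a hypothetical stand-in for the agent-provider call; it
// returns the options it wants merged back.
type Provider func() Options

// resolveOptions models the two paths: with auto=true the provider is never
// contacted; with auto=false its grub options are merged in. The Auto value
// decided before the call is kept, so the mode cannot flip mid-install.
// The second return value reports whether the provider was called.
func resolveOptions(initial Options, provider Provider) (Options, bool) {
	if initial.Auto {
		return initial, false
	}
	fromProvider := provider()
	merged := initial
	merged.GrubOptions = map[string]string{}
	for k, v := range initial.GrubOptions {
		merged.GrubOptions[k] = v
	}
	for k, v := range fromProvider.GrubOptions {
		merged.GrubOptions[k] = v
	}
	// fromProvider.Auto is deliberately ignored.
	return merged, true
}

func main() {
	provider := func() Options {
		return Options{Auto: true, GrubOptions: map[string]string{"saved_entry": "registration"}}
	}
	opts, called := resolveOptions(Options{Auto: false}, provider)
	fmt.Println("provider called:", called, "auto:", opts.Auto, "saved_entry:", opts.GrubOptions["saved_entry"])
}
```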

@jimmykarily
Contributor

The GrubOptions hook runs here: https://github.com/kairos-io/kairos-agent/blob/5fe8eb8cc2617e96e609da282d1079a95ba06e0c/internal/agent/install.go#L345

but the reboot happens a few lines earlier: https://github.com/kairos-io/kairos-agent/blob/5fe8eb8cc2617e96e609da282d1079a95ba06e0c/internal/agent/install.go#L340

(when Reboot is true). This means the system reboots before the hooks ever run. I'll continue tomorrow with a fresh brain, but I think that's why grubenv is not in place.
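The ordering problem reduces to the toy model below, where each function returns the steps it performs, in order. It is a hypothetical illustration of the control flow described above (reboot at L340 before the hooks at L345), not the code at the linked lines; the fix that was eventually merged may be structured differently.

```go
package main

import "fmt"

// buggyOrder mirrors the described flow: when reboot is requested, the
// reboot happens before the post-install hooks such as GrubOptions.
func buggyOrder(reboot bool) []string {
	steps := []string{"install"}
	if reboot {
		// In the real flow the machine reboots here, so nothing below runs.
		return append(steps, "reboot")
	}
	return append(steps, "grub-options-hook")
}

// fixedOrder always runs the hooks first and only then reboots.
func fixedOrder(reboot bool) []string {
	steps := []string{"install", "grub-options-hook"}
	if reboot {
		steps = append(steps, "reboot")
	}
	return steps
}

func main() {
	fmt.Println("buggy:", buggyOrder(true))
	fmt.Println("fixed:", fixedOrder(true))
}
```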

jimmykarily added a commit to kairos-io/kairos-agent that referenced this issue May 31, 2023
otherwise grub is not configured (among other things skipped)

Fixes kairos-io/kairos#1460

Signed-off-by: Dimitris Karakasilis <dimitris@karakasilis.me>
@jimmykarily
Contributor

This seems to work: kairos-io/kairos-agent#36

Of course, passing yet another config to the Run method makes it even more confusing. But given our plans to refactor this code, maybe this dirty fix is ok (?).

jimmykarily added a commit to kairos-io/kairos-agent that referenced this issue May 31, 2023
because the Lifecycle hook will do that

Fixes kairos-io/kairos#1460

Signed-off-by: Dimitris Karakasilis <dimitris@karakasilis.me>
@jimmykarily
Contributor

@Itxaka found the real reason it was failing, and we adapted the PR. We will merge it soon and include it in the upcoming release.

Projects
Archived in project

4 participants