
Consecutive device flashes (UKI) with FDE can start failing #2511

Closed
Tracked by #2582
kreeuwijk opened this issue Apr 25, 2024 · 14 comments · Fixed by kairos-io/kairos-agent#350
Labels: bug, prio: high

@kreeuwijk

Kairos version:

PRETTY_NAME="Ubuntu Noble Numbat (development branch)"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
KAIROS_PRETTY_NAME="kairos-core-ubuntu-24.04 v3.0.4-46-g10012f0"
KAIROS_ARTIFACT="kairos-ubuntu-24.04-core-amd64-generic-v3.0.4-46-g10012f0"
KAIROS_MODEL="generic"
KAIROS_FAMILY="ubuntu"
KAIROS_SOFTWARE_VERSION_PREFIX="k3s"
KAIROS_GITHUB_REPO="kairos-io/kairos"
KAIROS_ID="kairos"
KAIROS_VERSION="v3.0.4-46-g10012f0"
KAIROS_IMAGE_LABEL="24.04-core-amd64-generic-v3.0.4-46-g10012f0"
KAIROS_FLAVOR_RELEASE="24.04"
KAIROS_TARGETARCH="amd64"
KAIROS_REGISTRY_AND_ORG="quay.io/kairos"
KAIROS_HOME_URL="https://github.com/kairos-io/kairos"
KAIROS_BUG_REPORT_URL="https://github.com/kairos-io/kairos/issues"
KAIROS_NAME="kairos-core-ubuntu-24.04"
KAIROS_ID_LIKE="kairos-core-ubuntu-24.04"
KAIROS_VERSION_ID="v3.0.4-46-g10012f0"
KAIROS_IMAGE_REPO="quay.io/kairos/ubuntu:24.04-core-amd64-generic-v3.0.4-46-g10012f0"
KAIROS_FLAVOR="ubuntu"
KAIROS_VARIANT="core"
KAIROS_RELEASE="v3.0.4-46-g10012f0"

KAIROS_NAME="kairos-core-ubuntu"
KAIROS_VERSION="v3.0.6"
KAIROS_ID="ubuntu"
KAIROS_ID_LIKE="kairos-core-ubuntu"
KAIROS_VERSION_ID="v3.0.6"
KAIROS_PRETTY_NAME="kairos-core-ubuntu v3.0.6"
KAIROS_BUG_REPORT_URL="https://github.com/spectrocloud/CanvOS/issues"
KAIROS_HOME_URL="https://github.com/spectrocloud/CanvOS"
KAIROS_IMAGE_REPO="spectrocloud/CanvOS"
KAIROS_IMAGE_LABEL="latest"
KAIROS_GITHUB_REPO=""
KAIROS_VARIANT="ubuntu"
KAIROS_FLAVOR="ubuntu"
KAIROS_ARTIFACT="kairos-core-ubuntu-v3.0.6"

CPU architecture, OS, and Version:

Linux 0727e700-76d4-11e8-8be2-548351533800 6.8.0-31-generic #31-Ubuntu SMP PREEMPT_DYNAMIC Sat Apr 20 00:40:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug
When flashing a device multiple consecutive times with a UKI image and Secure Boot + FDE enabled, the process starts failing at the mounting/unmounting of the LUKS devices. Typically the failure looks like this:

INF Encrypting COS_OEM
New TPM2 token enrolled as key slot 1
INF Done encrypting COS_OEM
INF Encrypting COS_PERSISTENT
New TPM2 token enrolled as key slot 1
INF Done encrypting COS_PERSISTENT
Got luks UUID 85c39d0f-4067-5227-0334-f5eec606d9eb
  for partition nvme0n1p3
Unmounted Luks found at '/dev/nvme0n1p3'
INF Waiting for unlocked partition COS_OEM to appear
INF Partition found, continuing
INF Waiting for unlocked partition COS_PERSISTENT to appear
INF Partition found, continuing
could not mount: mount: /run/cos/oem: unknown filesystem type 'crypto_LUKS' .
            dmesg (1) may have more information after failed mount system call.
exit status 32

ERR could not close /dev/disk/by-label/COS_OEM: Device COS_OEM not found
ERR running uki encryption hooks: exit status 32

INF Unmounting disk partitions
1 error occurred:
             * exit status 32

When retrying another flash immediately after, sometimes this problem occurs:

INF Waiting for unlocked partition COS_OEM to appear
Got luks UUID 85c3940f-4867-5227-8334-f5eec606d9eb
  for partition nvme0n1p3
Device /dev/nvme0n1p3 seems to be mounted at /dev/mapper/nvme0n1p3, skipping
INF Waiting for unlocked partition COS_OEM to appear
INF Partition COS_OEM not found, waiting 7 seconds before retrying

INF Waiting for unlocked partition COS_OEM to appear
Got luks UUID 85c3940f-4867-5227-8334-f5eec606d9eb
  for partition nvme0n1p3
Device /dev/nvme0n1p3 seems to be mounted at /dev/mapper/nvme0n1p3, skipping
INF Waiting for unlocked partition COS_OEM to appear
INF Partition COS_OEM not found, waiting 8 seconds before retrying

INF Waiting for unlocked partition COS_OEM to appear
Got luks UUID 85c3940f-4867-5227-8334-f5eec606d9eb
  for partition nvme0n1p3
Device /dev/nvme0n1p3 seems to be mounted at /dev/mapper/nvme0n1p3, skipping
INF Waiting for unlocked partition COS_OEM to appear
INF Partition COS_OEM not found, waiting 9 seconds before retrying
Got luks UUID 85c3940f-4867-5227-8334-f5eec606d9eb
  for partition nvme0n1p3
Device /dev/nvme0n1p3 seems to be mounted at /dev/mapper/nvme0n1p3, skipping

ERR Partition COS_OEM not unlocked/found after 10 retries
ERR could not close /dev/disk/by-label/COS_OEM: Device /dev/disk/by-label/COS_OEM is not active.

ERR running uki encryption hooks: partition COS_OEM not unlocked/found after 10 retries
INF Unmounting disk partitions
1 error occurred:
            * partition COS_OEM not unlocked/found after 10 retries

To Reproduce
Flash a real device multiple times in a row with a UKI image that has Secure Boot and FDE enabled.

Expected behavior
LUKS partitions are mounted/unmounted normally during every device flash.

Logs

Additional context
If I go into the BIOS and reset the TPM data, the next device flash always seems to succeed.
This might be a coincidence, but it has worked every time so far.

@kreeuwijk added the bug, triage and unconfirmed labels Apr 25, 2024
@ci-robbot
Collaborator

I apologize for the inconvenience. It seems that the issue you're experiencing with consecutive device flashes utilizing a UKI image with Secure Boot and FDE enabled is a valid concern. As a bot, I can't directly fix this issue, but I'll ensure that the problem is recognized as valid by labeling it as "triage". This means the issue is being looked into by our team.

@kreeuwijk
Author

It also happened just now on the first flash, immediately after resetting the TPM, so it seems to happen completely at random.

@mudler mentioned this issue May 2, 2024
@mudler
Member

mudler commented May 6, 2024

Could it be that we are hitting TPM protections? https://ubuntu.com/core/docs/troubleshooting

@kreeuwijk
Copy link
Author

I verified this; the TPM unlocks normally every time. It also only affects the /etc/group file, so I don't think it has anything to do with disk encryption.

@Itxaka self-assigned this May 7, 2024
@Itxaka
Member

Itxaka commented May 7, 2024

It seems that the OEM partition is not discovered properly, even after locking it.

I think this part is wrong:

INF Encrypting COS_OEM
New TPM2 token enrolled as key slot 1
INF Done encrypting COS_OEM
INF Encrypting COS_PERSISTENT
New TPM2 token enrolled as key slot 1
INF Done encrypting COS_PERSISTENT
Got luks UUID 85c39d0f-4067-5227-0334-f5eec606d9eb
  for partition nvme0n1p3
Unmounted Luks found at '/dev/nvme0n1p3'
INF Waiting for unlocked partition COS_OEM to appear
INF Partition found, continuing

The OEM partition is never found (nvme0n1p3 is COS_PERSISTENT), so the check doesn't wait for it properly, and then the mount fails because the partition isn't there:

could not mount: mount: /run/cos/oem: unknown filesystem type 'crypto_LUKS' .
            dmesg (1) may have more information after failed mount system call.
exit status 32

ERR could not close /dev/disk/by-label/COS_OEM: Device COS_OEM not found

So we may be missing a retry, with a proper check, that calls the unlock again in order to unlock OEM as well.

In the second case the issue seems similar, but the check does work: OEM is not available, and calling kcrypt several times doesn't seem to help, as it only finds the persistent partition:

INF Waiting for unlocked partition COS_OEM to appear
Got luks UUID 85c3940f-4867-5227-8334-f5eec606d9eb
  for partition nvme0n1p3
Device /dev/nvme0n1p3 seems to be mounted at /dev/mapper/nvme0n1p3, skipping
INF Waiting for unlocked partition COS_OEM to appear
INF Partition COS_OEM not found, waiting 9 seconds before retrying

@jimmykarily
Contributor

To try to reproduce this on my Asus PN64, I built a UKI image locally with:

earthly +base-image --VARIANT=core --FLAVOR=ubuntu --FLAVOR_RELEASE=24.04 --BASE_IMAGE=ubuntu:24.04  --MODEL=generic --FAMILY=ubuntu

I created a UKI image with enki, but it doesn't boot:

[Photo: photo_2024-05-13_17-17-32 (boot attempt on the PN64)]

I guess it's too big for the PN64 firmware 🤷 ?

@Itxaka
Member

Itxaka commented May 16, 2024


Maybe you could try with Fedora, which is usually smaller?

@kreeuwijk
Author

If you uninstall the linux-firmware and linux-modules-extra packages, the image will be a lot smaller.

@jimmykarily
Contributor

I guess this is the fix for that? #2566

@kreeuwijk
Author

The bug is indeed that the expected device LABEL is sometimes matched to the wrong partition:

  • Sometimes it looks for COS_PERSISTENT on /dev/nvme0n1p2
  • Sometimes it looks for COS_OEM on /dev/nvme0n1p3

When this happens, the device flash fails. I guess the list of LUKS devices does not always come back in the same order, which sometimes causes this mix-up.
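The mix-up described above (a label paired with the wrong partition when the device list comes back in a different order) can be illustrated with a small Go sketch. It is a hypothetical model, not kcrypt's code: luksPartition, pairByIndex and pairByLabel are invented names, and it assumes each locked partition carries an identifier that is readable before unlocking (for example a LUKS2 header label or the GPT partition name).

```go
package main

import "fmt"

// luksPartition is a hypothetical record for a locked LUKS partition.
// ASSUMPTION: Label is readable while the partition is still locked.
type luksPartition struct {
	Device string
	Label  string
}

// pairByIndex is the fragile approach: it assumes the i-th discovered
// device always belongs to the i-th expected label.
func pairByIndex(expected []string, found []luksPartition) map[string]string {
	m := map[string]string{}
	for i, label := range expected {
		if i < len(found) {
			m[label] = found[i].Device
		}
	}
	return m
}

// pairByLabel matches on the identifier stored with each partition, so the
// result does not depend on discovery order.
func pairByLabel(expected []string, found []luksPartition) (map[string]string, error) {
	byLabel := make(map[string]string, len(found))
	for _, p := range found {
		byLabel[p.Label] = p.Device
	}
	m := make(map[string]string, len(expected))
	for _, label := range expected {
		dev, ok := byLabel[label]
		if !ok {
			return nil, fmt.Errorf("no LUKS partition found for label %s", label)
		}
		m[label] = dev
	}
	return m, nil
}

func main() {
	expected := []string{"COS_OEM", "COS_PERSISTENT"}
	// Discovery returned the devices in the "other" order this time.
	found := []luksPartition{
		{Device: "/dev/nvme0n1p3", Label: "COS_PERSISTENT"},
		{Device: "/dev/nvme0n1p2", Label: "COS_OEM"},
	}
	fmt.Println(pairByIndex(expected, found)["COS_OEM"]) // /dev/nvme0n1p3 (wrong)
	good, err := pairByLabel(expected, found)
	if err != nil {
		panic(err)
	}
	fmt.Println(good["COS_OEM"]) // /dev/nvme0n1p2
}
```

With index-based pairing, a change in discovery order silently produces exactly the "COS_OEM on /dev/nvme0n1p3" symptom; label-based pairing either finds the right device or fails loudly.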

@rishi-anand

@jimmykarily Is there any update on this issue?

@jimmykarily
Contributor

Not on my side. I'll give it a try with Ubuntu after first applying #2566, hoping the image will then be small enough to boot on my hardware.

@mudler mentioned this issue May 22, 2024
jimmykarily added a commit to kairos-io/kcrypt that referenced this issue May 23, 2024
because otherwise, sometimes the encrypted partition doesn't show up as
type: crypto_LUKS but as type: unknown making kcrypt skip it completely

Part of kairos-io/kairos#2511

(an additional seems to be needed in kairos-agent when locking the
partitions to fully fix the issue)

Signed-off-by: Dimitris Karakasilis <dimitris@karakasilis.me>
@jimmykarily
Contributor

We found the issue. One PR has been created (kcrypt) and one is pending (kairos-agent).

@kreeuwijk
Author

Great, well done!!

@Itxaka removed the unconfirmed and triage labels May 23, 2024
Itxaka pushed a commit to kairos-io/kcrypt that referenced this issue May 28, 2024