Fedora Lima image gets stuck in reboot loop if mirror is unavailable #4440

@ascopes

Description

@jandubois suggested I raise an issue here as well, regarding runfinch/finch#1632.

It appears that if the Fedora mirrors are unavailable or inaccessible, dnf needs-restarting returns a non-zero exit code, which the cloud-init scripts treat the same as a reboot being needed.

This results in no visible logging (that I can see) and the VM getting stuck in a reboot loop, chewing up significant host resources and being very difficult to debug.

From the original issue, this was my analysis:


When running using a Corporate Proxy that has its own SSL certificates, bringing up a Finch VM is problematic. Right now, upon first boot, we observe that finch vm init just hangs for about 15 minutes and then crashes.

Upon pulling all of this configuration and code to pieces, I found that the problem lies within the ISO that is downloaded. The script causing us problems is the following:

#!/bin/sh

# SPDX-FileCopyrightText: Copyright The Lima Authors
# SPDX-License-Identifier: Apache-2.0

set -eux

# Check if cloud-init forgot to reboot_if_required
# (only implemented for apt at the moment, not dnf)

if command -v dnf >/dev/null 2>&1; then
	# dnf-utils needs to be installed, for needs-restarting
	if dnf -h needs-restarting >/dev/null 2>&1; then
		# needs-restarting returns "false" if needed (!)
		if ! dnf needs-restarting -r >/dev/null 2>&1; then
			systemctl reboot
		fi
	fi
fi

Specifically, take note of the if ! dnf needs-restarting -r >/dev/null 2>&1; then systemctl reboot. While it is true that dnf needs-restarting returns a non-zero exit code if we need to reboot, it also returns a non-zero exit code if it fails to complete, so the script cannot tell the two cases apart.
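As a rough sketch of how the two cases could be told apart (this is not the actual Lima fix): the helper below, under the hypothetical name classify_restart_check, keeps the command's output in a log file instead of /dev/null and only reports a reboot when that output says one is required. The matched phrase "reboot is required" is an assumption about what dnf needs-restarting -r prints, and should be checked against the dnf version in the image.

```shell
#!/bin/sh
# Hypothetical helper, not part of Lima.
# Usage: classify_restart_check LOG_FILE COMMAND [ARGS...]
# Prints "ok", "reboot", or "error" on stdout; the command's own
# stdout/stderr is written to LOG_FILE so failures stay debuggable.
classify_restart_check() {
	log_file=$1
	shift
	if "$@" >"$log_file" 2>&1; then
		# Exit status 0: no reboot needed
		echo ok
		return 0
	fi
	# Non-zero exit status: either a reboot is needed or the check failed.
	# Assumption: a genuine "reboot needed" result says so in its output.
	if grep -qi 'reboot is required' "$log_file"; then
		echo reboot
	else
		echo error
	fi
}

# Possible use in the boot script (the log path is illustrative):
#   case "$(classify_restart_check /var/log/needs-restarting.log dnf needs-restarting -r)" in
#   	reboot) systemctl reboot ;;
#   	error) echo "needs-restarting failed, not rebooting" >&2 ;;
#   esac
```

With this shape, a mirror or certificate failure ends up in the "error" branch and in the log file, rather than triggering a reboot.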

It turns out that dnf needs-restarting dials out to the Fedora repository mirrors... under a corporate proxy that operates on L3/L4 (e.g. as part of a ZTNA), this won't work. You'll just get the following output (which is somewhat unhelpfully suppressed and sent to /dev/null here):

└─[127] <> docker run --rm -it fedora                               
[root@97829c00283b /]# dnf needs-restarting
Updating and loading repositories:
 Fedora 42 - aarch64 - Updates                                                                                             ???% [  <=>             ] |   0.0   B/s |   0.0   B |  00m01s
>>> Curl error (60): SSL peer certificate or SSH remote key was not OK for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f42&arch=aarch64 [SSL certificate problem: u
>>> Curl error (60): SSL peer certificate or SSH remote key was not OK for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f42&arch=aarch64 [SSL certificate problem: u
>>> Curl error (60): SSL peer certificate or SSH remote key was not OK for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f42&arch=aarch64 [SSL certificate problem: u
>>> Curl error (60): SSL peer certificate or SSH remote key was not OK for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f42&arch=aarch64 [SSL certificate problem: u
>>> Curl error (60): SSL peer certificate or SSH remote key was not OK for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f42&arch=aarch64 [SSL certificate problem: u
...

This then exits with a non-zero exit code.

This means if you have no side-loaded CA certificates, finch vm init will get stuck in a loop of repeatedly restarting the VM every 5 seconds or so, while providing no output of what the issue is, since everything is sent to /dev/null.
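Independently of how the exit code is interpreted, the loop itself could be bounded. A minimal sketch, assuming a counter file on storage that survives reboots; the function name reboot_allowed, the file path, and the limit are all hypothetical and not something Lima provides:

```shell
#!/bin/sh
# Hypothetical guard, not part of Lima.
# Usage: reboot_allowed COUNTER_FILE MAX_ATTEMPTS
# Returns 0 and increments the counter while fewer than MAX_ATTEMPTS
# reboots have been recorded; returns 1 once the limit is reached.
# Assumes COUNTER_FILE, if present, contains a single integer.
reboot_allowed() {
	counter_file=$1
	max_attempts=$2
	count=0
	if [ -f "$counter_file" ]; then
		count=$(cat "$counter_file")
	fi
	if [ "$count" -ge "$max_attempts" ]; then
		return 1
	fi
	echo $((count + 1)) >"$counter_file"
	return 0
}

# Possible use (path and limit are illustrative):
#   if reboot_allowed /var/lib/reboot-attempts 1; then
#   	systemctl reboot
#   else
#   	echo "reboot already attempted, continuing boot" >&2
#   fi
```

A guard like this would not fix the misread exit code, but it would turn an endless reboot loop into a single extra reboot followed by a visible message.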

Labels: bug, guest/fedora
