METAL-1720: Add kernel file support for multi-architecture PXE boot#170
METAL-1720: Add kernel file support for multi-architecture PXE boot#170openshift-merge-bot[bot] merged 2 commits intoopenshift:mainfrom
Conversation
|
@honza: This pull request references METAL-1720 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughAdds kernel image support: new EnvInputs ( Changes
Sequence DiagramsequenceDiagram
participant Client
participant ImageProvider
participant ImageHandler
participant ImageFileSystem
Client->>ImageProvider: BuildImage(format=InitRD, arch)
ImageProvider->>ImageHandler: Discover images for arch
ImageHandler->>ImageFileSystem: Open image (kind decision)
ImageFileSystem->>ImageFileSystem: Select base or kernel file
ImageFileSystem-->>ImageHandler: Return image stream / kernel entry
ImageProvider->>ImageHandler: ServeKernel(arch)
ImageHandler->>ImageFileSystem: Lookup kernelFiles[arch]
ImageFileSystem-->>ImageHandler: Kernel path / URL
ImageHandler-->>ImageProvider: KernelURL
ImageProvider-->>Client: BuildImage response (KernelURL, ExtraKernelParams)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 6 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: honza The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/hold |
|
@honza: This pull request references METAL-1720 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/imagehandler/imagefilesystem.go (1)
60-70:⚠️ Potential issue | 🟠 MajorPotential nil pointer dereference when base image is not found.
Both
getKernelandgetBaseImagecan returnnilif no matching image is found. Callingim.Init(baseImage)with anilbaseImage will cause a panic whenInsertIgnitionis invoked on the nil receiver.🐛 Proposed fix to add nil check
var baseImage baseFile if im.kernel { baseImage = f.getKernel(im.arch) } else { baseImage = f.getBaseImage(im.arch, im.initramfs) } + if baseImage == nil { + return nil, fs.ErrNotExist + } if err := im.Init(baseImage); err != nil {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/imagehandler/imagefilesystem.go` around lines 60 - 70, The code calls im.Init(baseImage) without verifying baseImage is non-nil, risking a panic because f.getKernel or f.getBaseImage can return nil; update the logic around f.getKernel(f.getBaseImage) to check if baseImage == nil before calling im.Init, log an explanatory error via f.log.Error (include arch/initramfs/context), and return nil, error (or a new descriptive error) instead of proceeding to im.Init/InsertIgnition when no base image was found.
🧹 Nitpick comments (1)
pkg/imagehandler/imagehandler_test.go (1)
301-304: Consider using a descriptive string for imageKind in test errors.The error message uses
%dfor thekindfield, which outputs the integer value. While functional, it's less readable than a descriptive name.♻️ Optional: Add String() method for imageKind
In
imagehandler.go, you could add aString()method:func (k imageKind) String() string { switch k { case imageKindISO: return "ISO" case imageKindInitramfs: return "Initramfs" case imageKindKernel: return "Kernel" default: return fmt.Sprintf("Unknown(%d)", k) } }Then update test assertions to use
%sor%v:- t.Errorf("kind: expected %d but got %d", tc.kind, ii.kind) + t.Errorf("kind: expected %v but got %v", tc.kind, ii.kind)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/imagehandler/imagehandler_test.go` around lines 301 - 304, The test error prints the numeric imageKind value (ii.kind / tc.kind) which is hard to read; add a String() method on the imageKind type in imagehandler.go (handling imageKindISO, imageKindInitramfs, imageKindKernel and a default) and then update the test in imagehandler_test.go to format the values as strings (use %s or %v with the String method) in the t.Errorf call so failures show descriptive names instead of integers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@pkg/imagehandler/imagefilesystem.go`:
- Around line 60-70: The code calls im.Init(baseImage) without verifying
baseImage is non-nil, risking a panic because f.getKernel or f.getBaseImage can
return nil; update the logic around f.getKernel(f.getBaseImage) to check if
baseImage == nil before calling im.Init, log an explanatory error via
f.log.Error (include arch/initramfs/context), and return nil, error (or a new
descriptive error) instead of proceeding to im.Init/InsertIgnition when no base
image was found.
---
Nitpick comments:
In `@pkg/imagehandler/imagehandler_test.go`:
- Around line 301-304: The test error prints the numeric imageKind value
(ii.kind / tc.kind) which is hard to read; add a String() method on the
imageKind type in imagehandler.go (handling imageKindISO, imageKindInitramfs,
imageKindKernel and a default) and then update the test in imagehandler_test.go
to format the values as strings (use %s or %v with the String method) in the
t.Errorf call so failures show descriptive names instead of integers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 47666b99-0d68-49e9-9f09-0ba23986daf9
📒 Files selected for processing (8)
cmd/static-server/main_test.gopkg/env/env.gopkg/imagehandler/basefile.gopkg/imagehandler/imagefile.gopkg/imagehandler/imagefilesystem.gopkg/imagehandler/imagehandler.gopkg/imagehandler/imagehandler_test.gopkg/imageprovider/rhcos.go
Enable PXE booting non-host architectures (e.g. aarch64 workers on an x86_64 cluster) by resolving two issues: 1. Add IRONIC_ROOTFS_URL to EnvInputs so the rootfs base URL is available to the image provider. 2. In BuildImage(), when building an initrd-format image for a non-host architecture, set ExtraKernelParams with an arch-specific rootfs URL. This overrides Ironic's global kernel_append_params which hardcodes the host architecture rootfs. BMO appends ExtraKernelParams after %default%, and since the last coreos.live.rootfs_url= on the kernel command line wins, the correct arch-specific rootfs is downloaded. Also resolve the Dockerfile merge conflict for the 4.22 rebase, adding the cli stage needed for the oc binary. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@honza: This pull request references METAL-1720 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/imagehandler/imagehandler_test.go`:
- Around line 686-706: The test hard-codes "aarch64" as the kernel-only
architecture which breaks on machines where env.HostArchitecture() == "aarch64";
instead pick an arch different from the host before creating the kernel-only
file and asserting HasImagesForArchitecture returns false (e.g., determine
kernelOnlyArch := if env.HostArchitecture() == "aarch64" then "x86_64" else
"aarch64"), create ipa_<kernelOnlyArch>.kernel, construct the ImageHandler via
NewImageHandler and call handler.HasImagesForArchitecture(kernelOnlyArch) to
assert false; update references to the hard-coded "aarch64" in this test to use
the computed kernelOnlyArch so the test behaves correctly regardless of host
architecture.
In `@pkg/imageprovider/rhcos.go`:
- Around line 115-128: For InitRD images (when data.Format ==
metal3.ImageFormatInitRD) treat an empty kernel URL from
ip.ImageHandler.ServeKernel(data.Architecture) as an error for non-host
architectures: call ServeKernel, if err != nil return it; if kernelURL == "" and
data.Architecture != env.HostArchitecture() return a descriptive error (do not
set generated.KernelURL); otherwise set generated.KernelURL = kernelURL and
continue with the existing arch-specific rootfs override logic. Ensure you
reference ServeKernel, ip.ImageHandler, data.Architecture,
env.HostArchitecture(), data.Format, metal3.ImageFormatInitRD, and
generated.KernelURL when implementing the check.
- Around line 134-140: The archSpecificURL function currently manipulates the
full URL string with filepath.Ext which can misplace the _arch suffix when
IRONIC_ROOTFS_URL contains queries/fragments; instead parse the URL with
net/url.Parse, operate only on u.Path using path.Ext and strings.TrimSuffix to
insert "_<arch>" before the path extension, assign the modified path back to
u.Path, and return u.String() so query and fragment components are preserved;
update references inside archSpecificURL to use url.Parse, u.Path, path.Ext, and
u.String().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6f73f255-8e05-4fba-9524-87bec248cf38
📒 Files selected for processing (8)
cmd/static-server/main_test.gopkg/env/env.gopkg/imagehandler/basefile.gopkg/imagehandler/imagefile.gopkg/imagehandler/imagefilesystem.gopkg/imagehandler/imagehandler.gopkg/imagehandler/imagehandler_test.gopkg/imageprovider/rhcos.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/imagehandler/imagefilesystem.go
- pkg/env/env.go
|
/unhold |
|
@honza: This pull request references METAL-1720 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/imagehandler/imagehandler.go`:
- Around line 147-157: The code currently adds only the literal DeployKernel
path to filePaths when ImageSharedDir is set, which prevents finding
arch-suffixed kernels in the kernel's directory; update the block that handles
deployPaths so that for each p you: 1) if filepath.Dir(p) !=
envInputs.ImageSharedDir, append filepath.Dir(p) to searchDirs so that the
directory is scanned for arch-specific variants; and 2) only append the literal
p to filePaths if the file actually exists (use os.Stat or equivalent) to avoid
registering a missing base path as a host kernel; keep existing behavior for
paths inside envInputs.ImageSharedDir.
- Around line 301-326: ServeKernel is inserting "kernel-<arch>" entries into the
same f.images/f.keys maps used by ServeImage which allows collisions with
user-provided keys; fix by giving kernel artifacts their own namespace: add new
maps (e.g. f.kernelImages map[string]*imageFile and f.kernelKeys
map[string]string) and change the ServeKernel code that currently references
f.images and f.keys to use f.kernelImages and f.kernelKeys instead (preserve the
imageFile struct and kernel flag), and update any lookup/resolve logic that
serves kernel URLs to consult the kernel maps so user images keyed like
"kernel-aarch64" no longer collide.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b020add0-8f46-4bad-89dd-3b9ba0143ed8
📒 Files selected for processing (3)
pkg/imagehandler/imagehandler.gopkg/imagehandler/imagehandler_test.gopkg/imageprovider/rhcos.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/imagehandler/imagehandler_test.go
- pkg/imageprovider/rhcos.go
| deployPaths := []string{envInputs.DeployISO, envInputs.DeployInitrd} | ||
| if envInputs.DeployKernel != "" { | ||
| deployPaths = append(deployPaths, envInputs.DeployKernel) | ||
| } | ||
|
|
||
| if envInputs.ImageSharedDir != "" { | ||
| searchDirs = append(searchDirs, envInputs.ImageSharedDir) | ||
|
|
||
| if filepath.Dir(envInputs.DeployISO) != envInputs.ImageSharedDir { | ||
| filePaths = append(filePaths, envInputs.DeployISO) | ||
| } | ||
| if filepath.Dir(envInputs.DeployInitrd) != envInputs.ImageSharedDir { | ||
| filePaths = append(filePaths, envInputs.DeployInitrd) | ||
| for _, p := range deployPaths { | ||
| if filepath.Dir(p) != envInputs.ImageSharedDir { | ||
| filePaths = append(filePaths, p) | ||
| } |
There was a problem hiding this comment.
Arch-suffixed kernels outside the shared dir still won't be discovered.
If DEPLOY_KERNEL points outside ImageSharedDir, this branch only appends the literal base path. A setup like /var/lib/ipa.kernel plus /var/lib/ipa_aarch64.kernel never scans /var/lib, so the real arch-specific kernel is missed and the missing base path is registered as a host kernel instead. Please scan the external directory here, and only add the direct path when it actually exists.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/imagehandler/imagehandler.go` around lines 147 - 157, The code currently
adds only the literal DeployKernel path to filePaths when ImageSharedDir is set,
which prevents finding arch-suffixed kernels in the kernel's directory; update
the block that handles deployPaths so that for each p you: 1) if filepath.Dir(p)
!= envInputs.ImageSharedDir, append filepath.Dir(p) to searchDirs so that the
directory is scanned for arch-specific variants; and 2) only append the literal
p to filePaths if the file actually exists (use os.Stat or equivalent) to avoid
registering a missing base path as a host kernel; keep existing behavior for
paths inside envInputs.ImageSharedDir.
| key := fmt.Sprintf("kernel-%s", arch) | ||
|
|
||
| f.mu.Lock() | ||
| defer f.mu.Unlock() | ||
|
|
||
| if img, exists := f.images[key]; exists { | ||
| p, err := url.Parse(fmt.Sprintf("/%s", img.name)) | ||
| if err != nil { | ||
| return "", err | ||
| } | ||
| return f.baseURL.ResolveReference(p).String(), nil | ||
| } | ||
|
|
||
| name := key | ||
| p, err := url.Parse(fmt.Sprintf("/%s", name)) | ||
| if err != nil { | ||
| return "", err | ||
| } | ||
|
|
||
| f.keys[name] = key | ||
| f.images[key] = &imageFile{ | ||
| name: name, | ||
| arch: arch, | ||
| size: size, | ||
| kernel: true, | ||
| } |
There was a problem hiding this comment.
ServeKernel can collide with normal image keys.
These kernel-<arch> entries go into the same f.images/f.keys maps that ServeImage uses for caller-supplied keys. If a normal image is ever keyed as kernel-aarch64 (or similar), one entry aliases the other and the wrong artifact gets served. Kernel records need their own map or a truly separate namespace.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/imagehandler/imagehandler.go` around lines 301 - 326, ServeKernel is
inserting "kernel-<arch>" entries into the same f.images/f.keys maps used by
ServeImage which allows collisions with user-provided keys; fix by giving kernel
artifacts their own namespace: add new maps (e.g. f.kernelImages
map[string]*imageFile and f.kernelKeys map[string]string) and change the
ServeKernel code that currently references f.images and f.keys to use
f.kernelImages and f.kernelKeys instead (preserve the imageFile struct and
kernel flag), and update any lookup/resolve logic that serves kernel URLs to
consult the kernel maps so user images keyed like "kernel-aarch64" no longer
collide.
|
/lgtm |
|
/test e2e-metal-ipi-ovn-ipv6 |
|
/verified later @sgoveas |
|
@sgoveas: This PR has been marked to be verified later by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest-required |
|
@honza: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This is the image-customization-controller counterpart to the BMO commit "Add multi-architecture PXE boot support" (a7fd6912f). BMO now selects a per-architecture kernel URL from GeneratedImage.KernelURL when available, falling back to the static DEPLOY_KERNEL_URL only when the image provider doesn't supply one. This commit makes the image-customization-controller populate that field.
Previously, the controller never set KernelURL in GeneratedImage. For PXE/InitRD boot, Ironic would fall back to BMO's single static DEPLOY_KERNEL_URL environment variable -- a single URL for all architectures. This breaks multi-arch PXE boot because different architectures need different kernel binaries.
Changes:
Add optional DEPLOY_KERNEL environment variable to EnvInputs. Not required since kernel files are only needed for InitRD/PXE boot.
Add baseKernel type (basefile.go) that serves kernel files unmodified via os.Open -- no ignition injection is needed for kernel binaries.
Extend osImage from a boolean iso flag to a kind enum (ISO, initramfs, kernel) to cleanly distinguish all three image types.
Add kernel file discovery to imagehandler using the same _ naming convention as ISO and initramfs files (e.g. ipa_aarch64.kernel).
Add ServeKernel(arch) method to ImageHandler interface. Returns a serving URL for the kernel file, or empty string if no kernel is available for the architecture.
Update BuildImage in rhcos.go to call ServeKernel for InitRD format requests and populate GeneratedImage.KernelURL.
Add tests for kernel pattern matching, ServeKernel URL generation, idempotency, architecture fallback, and verification that kernel presence doesn't affect HasImagesForArchitecture.