Skip to content

Reintroduce RHCOS Stream Diff Support with efficiency improvements#753

Open
sdodson wants to merge 5 commits intoopenshift:mainfrom
sdodson:rhcos-stream-changelogs
Open

Reintroduce RHCOS Stream Diff Support with efficiency improvements#753
sdodson wants to merge 5 commits intoopenshift:mainfrom
sdodson:rhcos-stream-changelogs

Conversation

@sdodson
Copy link
Copy Markdown
Member

@sdodson sdodson commented Apr 8, 2026

Summary

  • Concurrency cap: Introduces a process-wide semaphore (buffered channel, capacity 10) that gates all oc adm release info invocations using --rpmdb or --rpmdb-diff. These commands pull image layers and build RPM databases, making them significantly heavier than normal oc calls. When the limit is reached callers receive an explicit error rather than silently queuing, keeping groupcache error-caching semantics correct (errors are not cached, so callers can retry).
  • Error wrapping: Switches RpmListForStream, RpmDiff, and RpmDiffForStream from %v to %w so the full error chain (including the semaphore-exhaustion error) is preserved for callers using errors.Is/errors.As.

Performance

  • Extensions metadata caching: Caches extracted extensions.json files in /tmp/rpmdb/extensions-<sha256> keyed by the image digest. This avoids repeated oc image extract calls for the same extensions image across different requests or release versions, significantly improving response time for subsequent requests.

Test plan

  • Verify the binary builds cleanly: go build ./...
  • Confirm RPM list and diff pages render correctly on a dev release-controller instance
  • Confirm that requests beyond the concurrency limit return a clear error message rather than hanging
  • Verify extensions metadata is cached and reused across requests

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Query RPM lists/diffs per stream, lookup component image references, and discover machine OS streams with display names and titles
    • New standalone changelog-preview command to render changelogs locally
  • Caching

    • Expanded cache coverage for stream/component and machine-OS queries with stricter key validation and empty-result handling
  • Performance

    • Process-wide concurrency limiter for heavy release operations to fail fast when saturated
  • Changelog / Rendering

    • Improved node-image section generation and added Dual RHCOS (RHCOS 10) support

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 8, 2026

📝 Walkthrough

Walkthrough

Add a semaphore-protected wrapper for rpmdb-heavy oc adm release info calls; extend ReleaseInfo API and cache keys for stream/component RPM data and image resolution; implement machine-OS stream discovery and node-image rendering helpers; update changelog preview CLI and changelog HTTP handling to use the new flows.

Changes

Cohort / File(s) Summary
Concurrency & guarded oc invocation
pkg/release-controller/release_info.go
Introduce maxConcurrentRpmdbOCCalls and rpmdbOCSlots semaphore plus ocCmdRpmdb(...) wrapper to limit concurrent rpmdb oc calls and return a clear error when saturated; switch rpmdb-backed calls to use the guarded invocation.
ReleaseInfo API, caching keys & rpmdb stream implementations
pkg/release-controller/release_info.go
Extend ReleaseInfo and CachingReleaseInfo with RpmListForStream, RpmDiffForStream, ImageReferenceForComponent, ListMachineOSStreams; add cache key kinds (rpmdiffstream, rpmliststream, imagefor, machineosstreams) and stricter key validation; implement ExecReleaseInfo methods to resolve image-for, run guarded oc rpmdb invocations, extract extensions image digests, load/write /tmp/rpmdb/extensions-<sha256>, and improve error wrapping. Also update RpmList/RpmDiff to use ocCmdRpmdb(...).
Machine-OS stream discovery & formatting
pkg/release-controller/machine_os_tags.go, pkg/release-controller/machine_os_tags_test.go
Add MachineOSStreamInfo, ListMachineOSStreams(releaseImage), parsing logic for references.spec.tags to find *-extensions/coreos streams, extract optional display names from io.openshift.build.version-display-names, sort/prioritize tags, and MachineOSTitle; include unit tests.
Node image rendering & rhcos changes
pkg/rhcos/node_image.go, pkg/rhcos/rhcos.go, pkg/rhcos/rhcos_test.go
Add NodeImageSectionMarkdown that chooses single-stream or multi-stream (dual RHCOS) rendering, introduce CoreOS stream types and helpers, add RHCOS 10 support and regexes, refactor JSON enrichment, update markdown transformation for multiple CoreOS entries, and add tests.
Changelog CLI & API adjustments
hack/changelog-preview/main.go, cmd/release-controller-api/http_changelog.go
Add a changelog-preview CLI that renders changelog + optional node info without a cluster; modify API changelog handler to return early on JSON and to request node info when #node-image-info or discovered machine-OS streams exist; simplify node-info gathering to use NodeImageSectionMarkdown.
Semver helper & tests
pkg/release-controller/semver.go, pkg/release-controller/semver_dual_rhcos_test.go
Add ReleaseTagIsDualRHCOS to detect dual-RHCOS tags (>= 4.21) and associated unit tests.
Misc
.gitignore
Ignore hack/changelog-preview/sample-changelog.md generated output.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Exec as ExecReleaseInfo
    participant oc as "oc adm release info"
    participant Cache
    participant FS as Filesystem
    rect rgba(200,230,201,0.5)
      Client->>Exec: Request (ImageReferenceForComponent / RpmListForStream / RpmDiffForStream / ListMachineOSStreams)
      Exec->>Cache: Check cached key
      Cache-->>Exec: Hit / Miss
    end
    alt cache miss & rpmdb operation
      rect rgba(187,222,251,0.5)
        Exec->>oc: ocCmdRpmdb(...) (acquire semaphore slot)
        oc-->>Exec: stdout (image/pull-spec/rpmdb JSON)
        Exec->>FS: Read /tmp/rpmdb/extensions-<sha256> (best-effort)
        alt extensions missing
          Exec->>oc: oc image extract ... (extract extensions.json)
          oc-->>Exec: extensions.json
          Exec->>FS: Atomic write /tmp/rpmdb/extensions-<sha256>
        end
        Exec->>Cache: Store parsed result
        Exec-->>Client: Return result
      end
    end
    alt semaphore saturated
      Exec-->>Client: Error "too many concurrent rpmdb oc adm release info operations"
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hop through slots both snug and bright,
I guard each heavy oc call through the night,
I fetch pull-specs, diffs, and lists with glee,
Cache a crumb of JSON, set it free,
A little rabbit keeps rpmdb polite.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 30.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'Reintroduce RHCOS Stream Diff Support with efficiency improvements' directly and clearly describes the main objective: reintroducing RHCOS stream diff functionality with performance enhancements, which aligns with the core changes introducing machine-OS stream discovery, concurrency limiting, and extensions metadata caching.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

Tools execution failed with the following error:

Failed to run tools: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error)


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 8, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Apr 8, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sdodson
Once this PR has been reviewed and has the lgtm label, please assign hoxhaeris for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/release-controller/release_info.go (1)

63-72: Consider adding a shutdown mechanism for the background goroutine.

The goroutine starts unconditionally at package init and runs forever with no way to stop it. While this works fine in containerized production (process termination handles cleanup), it can cause goroutine leaks in tests and makes graceful shutdown impossible.

💡 Optional: Add cancellation support
+var (
+	pruneCtx    context.Context
+	pruneCancel context.CancelFunc
+)
+
 func init() {
+	pruneCtx, pruneCancel = context.WithCancel(context.Background())
 	go func() {
 		pruneRpmdbCache()
 		ticker := time.NewTicker(24 * time.Hour)
 		defer ticker.Stop()
-		for range ticker.C {
-			pruneRpmdbCache()
+		for {
+			select {
+			case <-ticker.C:
+				pruneRpmdbCache()
+			case <-pruneCtx.Done():
+				return
+			}
 		}
 	}()
 }
+
+// StopCachePruner stops the background cache pruning goroutine.
+func StopCachePruner() {
+	if pruneCancel != nil {
+		pruneCancel()
+	}
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/release-controller/release_info.go` around lines 63 - 72, The background
goroutine started in init() runs forever calling pruneRpmdbCache() and needs a
shutdown mechanism; modify init() and the package to create a cancellable
context or stop channel (e.g., a package-level context.WithCancel or stopChan),
have the goroutine listen for ctx.Done()/stop signal in its select loop (stop
the ticker and return when signaled), and expose a package-level Shutdown/Stop
function that calls cancel() so tests and graceful shutdowns can stop the
goroutine; ensure ticker.Stop() is still called on exit and that
pruneRpmdbCache() behavior is unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/release-controller/release_info.go`:
- Around line 74-87: The function pruneRpmdbCache currently checks file
modification time via info.ModTime() but the comment incorrectly says "accessed
in 30 days"; update the comment on pruneRpmdbCache (referencing rpmdbCacheDir
and the WalkDir callback using info.ModTime()) to state that it removes files
not modified in 30 days (or otherwise reflect that modification time is used) so
the comment matches the implementation.

---

Nitpick comments:
In `@pkg/release-controller/release_info.go`:
- Around line 63-72: The background goroutine started in init() runs forever
calling pruneRpmdbCache() and needs a shutdown mechanism; modify init() and the
package to create a cancellable context or stop channel (e.g., a package-level
context.WithCancel or stopChan), have the goroutine listen for ctx.Done()/stop
signal in its select loop (stop the ticker and return when signaled), and expose
a package-level Shutdown/Stop function that calls cancel() so tests and graceful
shutdowns can stop the goroutine; ensure ticker.Stop() is still called on exit
and that pruneRpmdbCache() behavior is unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8bebf5ec-cf5a-49f5-9748-afb7b1890e96

📥 Commits

Reviewing files that changed from the base of the PR and between 8fd3465 and 60baf4e.

📒 Files selected for processing (1)
  • pkg/release-controller/release_info.go

@sdodson sdodson force-pushed the rhcos-stream-changelogs branch from 60baf4e to c27fc2e Compare April 8, 2026 20:34
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 8, 2026
@sdodson sdodson force-pushed the rhcos-stream-changelogs branch from c27fc2e to 782d798 Compare April 8, 2026 20:35
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/release-controller/release_info.go (1)

399-406: Consider using a sentinel error for rpmdb saturation if selective error matching is needed.

The current approach creates a fresh fmt.Errorf on each saturation, which prevents callers from using errors.Is() or errors.As() to programmatically detect this specific condition. However, there are currently no callers attempting such matching; all error handling is simple if err != nil checks.

If future callers need to distinguish "rpmdb limit reached" from other failures, export a sentinel error and wrap it with %w:

🔧 Optional improvement
+var ErrRpmdbConcurrencyLimit = errors.New("too many concurrent oc adm release info --rpmdb/--rpmdb-diff operations")
+
 func ocCmdRpmdb(args ...string) ([]byte, []byte, error) {
 	select {
 	case rpmdbOCSlots <- struct{}{}:
 	default:
-		return nil, nil, fmt.Errorf(
-			"too many concurrent oc adm release info --rpmdb/--rpmdb-diff operations (limit %d)",
-			maxConcurrentRpmdbOCCalls,
-		)
+		return nil, nil, fmt.Errorf("%w (limit %d)", ErrRpmdbConcurrencyLimit, maxConcurrentRpmdbOCCalls)
 	}

This enables future callers to use errors.Is(err, ErrRpmdbConcurrencyLimit) if needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/release-controller/release_info.go` around lines 399 - 406, Create an
exported sentinel error (e.g., ErrRpmdbConcurrencyLimit := errors.New("rpmdb
concurrency limit reached")) and change ocCmdRpmdb so the saturation return
wraps that sentinel (use fmt.Errorf("%w: too many concurrent ... (limit %d)",
ErrRpmdbConcurrencyLimit, maxConcurrentRpmdbOCCalls)) instead of creating a
fresh fmt.Errorf; this preserves the existing message while allowing callers to
detect the condition with errors.Is(err, ErrRpmdbConcurrencyLimit).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/release-controller/release_info.go`:
- Around line 103-107: The switch-case handling malformed cache keys (cases
"rpmdiffstream", "rpmliststream", "imagefor", "machineosstreams" in
release_info.go) currently checks for embedded NULs with strings.Contains after
strings.Split(key, "\x00"), which is ineffective; replace those contains checks
with exact-arity checks (use len(parts) != expectedCount) immediately after
splitting to reject keys with extra or missing fields (e.g., for "rpmdiffstream"
require len(parts) == 4, update the other cases to their correct expected
counts), and remove the redundant strings.Contains(..., "\x00") conditions so
the code consistently rejects malformed keys by exact arity.
- Around line 654-660: The extTag defaulting logic doesn't handle an empty
extTag (callers may pass ""), so set extTag to the canonical default before
special-casing CentOS Stream: if extTag == "" assign it from coreosTagName (or
from extensionsTagName fallback as appropriate), then apply the existing
centos-stream check and call ImageReferenceForComponent(releaseImage, extTag);
update the extTag variable initialization near its use in release_info.go so
that extTag is never empty when passed to r.ImageReferenceForComponent.

---

Nitpick comments:
In `@pkg/release-controller/release_info.go`:
- Around line 399-406: Create an exported sentinel error (e.g.,
ErrRpmdbConcurrencyLimit := errors.New("rpmdb concurrency limit reached")) and
change ocCmdRpmdb so the saturation return wraps that sentinel (use
fmt.Errorf("%w: too many concurrent ... (limit %d)", ErrRpmdbConcurrencyLimit,
maxConcurrentRpmdbOCCalls)) instead of creating a fresh fmt.Errorf; this
preserves the existing message while allowing callers to detect the condition
with errors.Is(err, ErrRpmdbConcurrencyLimit).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ae986f49-0fd0-4e29-8781-0c4a3f1be700

📥 Commits

Reviewing files that changed from the base of the PR and between 60baf4e and 782d798.

📒 Files selected for processing (1)
  • pkg/release-controller/release_info.go

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/release-controller/release_info.go (1)

103-107: ⚠️ Potential issue | 🟡 Minor

Reject malformed cache keys by exact arity.

This still has the same bug from the earlier review: after strings.Split(key, "\x00"), the strings.Contains(..., "\x00") branches can never fire, and the current < N checks still accept extra fields. Please switch these cases to len(parts) != expected and drop the dead NUL checks.

Also applies to: 134-137, 152-155, 160-163

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/release-controller/release_info.go` around lines 103 - 107, The switch
branches (e.g., case "rpmdiffstream") currently check len(parts) < N and then
test parts for "\x00" which is dead after strings.Split; change each of these to
reject keys with the exact expected arity by using len(parts) != expectedCount
(e.g., for rpmdiffstream require len(parts) == 4) and remove the
strings.Contains(..., "\x00") checks; apply the same change to the other similar
case blocks referenced in the review so every cache-key case enforces exact
arity and drops the NUL containment tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/release-controller/release_info.go`:
- Around line 52-59: Create an exported sentinel error (e.g.,
ErrRpmdbOCCallsSaturated) and replace the ad-hoc fmt.Errorf(...) returned when
the rpmdbOCSlots channel is full with a wrapped error using %w (so callers can
reliably detect the saturation via errors.Is/As). Update the declaration near
rpmdbOCSlots/maxConcurrentRpmdbOCCalls and change the return that currently
constructs the unique fmt.Errorf to return fmt.Errorf("...: %w",
ErrRpmdbOCCallsSaturated) (or similar), keeping the original message but
wrapping the exported sentinel.

---

Duplicate comments:
In `@pkg/release-controller/release_info.go`:
- Around line 103-107: The switch branches (e.g., case "rpmdiffstream")
currently check len(parts) < N and then test parts for "\x00" which is dead
after strings.Split; change each of these to reject keys with the exact expected
arity by using len(parts) != expectedCount (e.g., for rpmdiffstream require
len(parts) == 4) and remove the strings.Contains(..., "\x00") checks; apply the
same change to the other similar case blocks referenced in the review so every
cache-key case enforces exact arity and drops the NUL containment tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: be33a834-1646-415c-896c-2d59bfcb4e03

📥 Commits

Reviewing files that changed from the base of the PR and between 782d798 and 4407df8.

📒 Files selected for processing (1)
  • pkg/release-controller/release_info.go

Comment on lines +52 to 59
// maxConcurrentRpmdbOCCalls limits parallel `oc adm release info` invocations that use
// --rpmdb / --rpmdb-diff (heavy image pulls and RPM work). Additional callers get a clear error.
const maxConcurrentRpmdbOCCalls = 10

var (
ocPath = ""
ocPath = ""
rpmdbOCSlots = make(chan struct{}, maxConcurrentRpmdbOCCalls)
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Go, if a package returns a new fmt.Errorf("same text")value on each call, can callers in another package reliably detect that condition witherrors.Is/errors.As, or do they need an exported sentinel error or custom error type?

💡 Result:

No. If a package returns a freshly-constructed fmt.Errorf("same text") on each call (and it does not wrap an underlying error with %w), callers in another package cannot reliably detect “that specific condition” with errors.Is/errors.As.

  • errors.Is(err, target) matches by walking the unwrap chain and checking whether any error is equal to target (or implements an Is(error) bool method). If you create a new error value each time, it won’t be == to any previously-held “target” error value, even if the text is identical. [1]
  • The standard docs explicitly note that creating errors with the same text yields distinct values; matching by value requires reusing a sentinel. [1]
  • errors.As only helps if there’s a detectable type in the chain (or an As(any) bool method). The concrete type returned by fmt.Errorf("...") in this case is not something callers can name/use as a stable API, so As won’t give you a reliable semantic discriminator. [1]

If callers need to branch on that condition, you should expose either:

  1. an exported sentinel error (and return/wrap it, typically fmt.Errorf("context: %w", ErrSomething) so callers use errors.Is), or
  2. an exported custom error type (so callers can use errors.As), optionally with an Is method for matching behavior. [1][2]

Sources: [1] Go errors package docs (Is, and note about distinct error values / sentinels) (pkg.go.dev), [2] Go blog on Go 1.13 error wrapping and API patterns (sentinels/types and errors.Is) (go.dev)

Citations:


🏁 Script executed:

# First, find and examine the release_info.go file
fd -type f -name "release_info.go"

Repository: openshift/release-controller

Length of output: 1896


🏁 Script executed:

# Once found, read lines around 52-59 and 403-406 to verify the issue
cat -n pkg/release-controller/release_info.go | sed -n '45,70p'

Repository: openshift/release-controller

Length of output: 1909


🏁 Script executed:

# Also check the saturation error creation around lines 403-406
cat -n pkg/release-controller/release_info.go | sed -n '395,415p'

Repository: openshift/release-controller

Length of output: 836


🏁 Script executed:

# Check if there are wrapped errors with %w as mentioned in the review comment
rg -n "%w" pkg/release-controller/release_info.go

Repository: openshift/release-controller

Length of output: 922


Export a stable error sentinel for the saturation condition.

Lines 403-406 create a fresh fmt.Errorf(...) on each invocation. Unlike the downstream errors wrapped with %w elsewhere in this file, this saturation error cannot be reliably detected by callers using errors.Is or errors.As because each call produces a distinct error value. Callers outside this package cannot safely classify this condition without string matching. Add an exported sentinel error and wrap it with %w.

🔧 Proposed fix
 var (
-	ocPath       = ""
-	rpmdbOCSlots = make(chan struct{}, maxConcurrentRpmdbOCCalls)
+	ocPath                      = ""
+	rpmdbOCSlots                = make(chan struct{}, maxConcurrentRpmdbOCCalls)
+	ErrTooManyConcurrentRpmdbOC = stdErrors.New("too many concurrent oc adm release info --rpmdb/--rpmdb-diff operations")
 )
@@
 	default:
-		return nil, nil, fmt.Errorf(
-			"too many concurrent oc adm release info --rpmdb/--rpmdb-diff operations (limit %d)",
-			maxConcurrentRpmdbOCCalls,
-		)
+		return nil, nil, fmt.Errorf("%w (limit %d)", ErrTooManyConcurrentRpmdbOC, maxConcurrentRpmdbOCCalls)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/release-controller/release_info.go` around lines 52 - 59, Create an
exported sentinel error (e.g., ErrRpmdbOCCallsSaturated) and replace the ad-hoc
fmt.Errorf(...) returned when the rpmdbOCSlots channel is full with a wrapped
error using %w (so callers can reliably detect the saturation via errors.Is/As).
Update the declaration near rpmdbOCSlots/maxConcurrentRpmdbOCCalls and change
the return that currently constructs the unique fmt.Errorf to return
fmt.Errorf("...: %w", ErrRpmdbOCCallsSaturated) (or similar), keeping the
original message but wrapping the exported sentinel.

@sdodson sdodson changed the title release-controller: rpmdb concurrency cap, cache cleanup, and error wrapping Reintroduce RHCOS Stream Diff Support with concurrency cap Apr 9, 2026
@sdodson sdodson changed the title Reintroduce RHCOS Stream Diff Support with concurrency cap Reintroduce RHCOS Stream Diff Support with efficiency improvements Apr 9, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
pkg/release-controller/release_info.go (2)

103-107: ⚠️ Potential issue | 🟡 Minor

Reject malformed cache keys by exact field count.

After strings.Split(key, "\x00"), the strings.Contains(..., "\x00") checks are dead. A key with extra separators still passes the current < N guards, and the trailing fields are silently ignored. Require exact arity instead.

🔧 Proposed fix
 		case "rpmdiffstream":
-			if len(parts) < 4 {
+			if len(parts) != 4 {
 				s, err = "", fmt.Errorf("invalid rpmdiffstream key")
-			} else if strings.Contains(parts[1], "\x00") || strings.Contains(parts[2], "\x00") || strings.Contains(parts[3], "\x00") {
-				s, err = "", fmt.Errorf("invalid from/to/component")
 			} else {
@@
 		case "rpmliststream":
-			if len(parts) < 4 {
+			if len(parts) != 4 {
 				s, err = "", fmt.Errorf("invalid rpmliststream key")
-			} else if strings.Contains(parts[1], "\x00") || strings.Contains(parts[2], "\x00") || strings.Contains(parts[3], "\x00") {
-				s, err = "", fmt.Errorf("invalid release/tag")
 			} else {
@@
 		case "imagefor":
-			if len(parts) < 3 {
+			if len(parts) != 3 {
 				s, err = "", fmt.Errorf("invalid imagefor key")
-			} else if strings.Contains(parts[1], "\x00") || strings.Contains(parts[2], "\x00") {
-				s, err = "", fmt.Errorf("invalid release/component")
 			} else {
@@
 		case "machineosstreams":
-			if len(parts) < 2 {
+			if len(parts) != 2 {
 				s, err = "", fmt.Errorf("invalid machineosstreams key")
-			} else if strings.Contains(parts[1], "\x00") {
-				s, err = "", fmt.Errorf("invalid release")
 			} else {

Also applies to: 134-137, 152-155, 160-163

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/release-controller/release_info.go` around lines 103 - 107, The
rpmdiffstream cache-key parsing currently allows keys with extra NUL separators
because it only checks for minimum field count and then tests for NUL in fields;
change the logic to require an exact number of fields after strings.Split(key,
"\x00") (e.g., len(parts) == 4 for the "rpmdiffstream" case) and reject any key
with a different arity, removing the redundant strings.Contains(..., "\x00")
checks; apply the same exact-arity check fix to the other similar branches
referenced (around the cases at the other locations corresponding to lines
~134-137, ~152-155, ~160-163) so each case validates len(parts) == expectedCount
and returns an error otherwise.

397-406: ⚠️ Potential issue | 🟠 Major

Export a stable saturation sentinel.

Line 403 returns a new error value on every saturation. That means the new %w wrapping in RpmList, RpmDiff, and RpmDiffForStream still does not let callers detect this condition via errors.Is, which is the main contract this PR is trying to preserve.

🔧 Proposed fix
 var (
-	ocPath       = ""
-	rpmdbOCSlots = make(chan struct{}, maxConcurrentRpmdbOCCalls)
+	ocPath                   = ""
+	rpmdbOCSlots             = make(chan struct{}, maxConcurrentRpmdbOCCalls)
+	ErrRpmdbOCCallsSaturated = stdErrors.New("too many concurrent oc adm release info --rpmdb/--rpmdb-diff operations")
 )
@@
 	default:
-		return nil, nil, fmt.Errorf(
-			"too many concurrent oc adm release info --rpmdb/--rpmdb-diff operations (limit %d)",
-			maxConcurrentRpmdbOCCalls,
-		)
+		return nil, nil, fmt.Errorf("%w (limit %d)", ErrRpmdbOCCallsSaturated, maxConcurrentRpmdbOCCalls)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/release-controller/release_info.go` around lines 397 - 406, The
saturation branch in ocCmdRpmdb creates a new error on each call, preventing
callers from detecting saturation via errors.Is; declare and export a
package-level sentinel error (e.g., ErrTooManyConcurrentRpmdbCalls or
ErrRpmdbSaturated) using errors.New with the same message, and return that
sentinel (or a wrapped form with %w) instead of building a fresh fmt.Errorf each
time; update ocCmdRpmdb's default case to return nil, nil, sentinelError and
ensure callers that wrap it use errors.Wrapf/fmt.Errorf("%w", ...) so errors.Is
works.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/release-controller/release_info.go`:
- Around line 673-703: The cache write is racy: avoid os.WriteFile(cachedPath,
...) and instead write the downloaded extensions bytes to a temporary file in
the same directory (e.g. cachedPath+".tmp" or use ioutil.TempFile), fsync the
file, close it, then atomically replace the cache via os.Rename(tempPath,
cachedPath); update the code paths around cachedPath and the Write step (where
extensions is written) to use this approach; additionally make json.Unmarshal
failures when reading cachedPath (the cachedPath read + json.Unmarshal block
used by RpmListForStream) treated as a cache miss (delete or ignore the bad
cache and proceed to ocCmdExt extraction) so a partial/corrupt file does not
cause permanent failures.

---

Duplicate comments:
In `@pkg/release-controller/release_info.go`:
- Around line 103-107: The rpmdiffstream cache-key parsing currently allows keys
with extra NUL separators because it only checks for minimum field count and
then tests for NUL in fields; change the logic to require an exact number of
fields after strings.Split(key, "\x00") (e.g., len(parts) == 4 for the
"rpmdiffstream" case) and reject any key with a different arity, removing the
redundant strings.Contains(..., "\x00") checks; apply the same exact-arity check
fix to the other similar branches referenced (around the cases at the other
locations corresponding to lines ~134-137, ~152-155, ~160-163) so each case
validates len(parts) == expectedCount and returns an error otherwise.
- Around line 397-406: The saturation branch in ocCmdRpmdb creates a new error
on each call, preventing callers from detecting saturation via errors.Is;
declare and export a package-level sentinel error (e.g.,
ErrTooManyConcurrentRpmdbCalls or ErrRpmdbSaturated) using errors.New with the
same message, and return that sentinel (or a wrapped form with %w) instead of
building a fresh fmt.Errorf each time; update ocCmdRpmdb's default case to
return nil, nil, sentinelError and ensure callers that wrap it use
errors.Wrapf/fmt.Errorf("%w", ...) so errors.Is works.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7fedefaf-7a09-4a12-add0-2d8a3b80994c

📥 Commits

Reviewing files that changed from the base of the PR and between 4407df8 and 1a008de.

📒 Files selected for processing (1)
  • pkg/release-controller/release_info.go

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/release-controller/release_info.go`:
- Around line 623-625: The error returns in ImageReferenceForComponent and the
JSON-unmarshal code paths are flattening errors by formatting with %v/%s;
replace those fmt.Errorf calls to wrap the underlying error with %w so error
chains are preserved (e.g., change fmt.Errorf("failed to resolve image for
component %q in %s: %v", componentName, releaseImage, err) to fmt.Errorf("failed
to resolve image for component %q in %s: %w", componentName, releaseImage,
err)), and do the same conversion for the JSON unmarshaling error returns
referenced (unmarshaling RPM list/extensions/RPM diff) to enable callers to use
errors.Is / errors.As. Ensure each modified return still includes the same
descriptive text but uses %w with the original err variable.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 09ad2ef5-4f0f-41bd-816d-ccfe731d7b6a

📥 Commits

Reviewing files that changed from the base of the PR and between 1a008de and 629058f.

📒 Files selected for processing (1)
  • pkg/release-controller/release_info.go

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/release-controller/machine_os_tags.go (1)

138-142: Consider moving priority map to package level to avoid repeated allocations.

The prio map is allocated on every comparison call during sorting. For n tags, this creates O(n log n) small maps. While the impact is minimal for small slices, moving it to package level is more idiomatic.

♻️ Suggested refactor
+var machineOSTagPriority = map[string]int{
+	"rhel-coreos":   0,
+	"stream-coreos": 1,
+}
+
 // machineOSTagLess is a comparison function for machine OS tag names.
 // Returns true if a should sort before b. Prioritizes rhel-coreos (0) and stream-coreos (1),
 // then falls back to lexicographic ordering for all other tags.
 func machineOSTagLess(a, b string) bool {
-	prio := map[string]int{
-		"rhel-coreos":   0,
-		"stream-coreos": 1,
-	}
-	pa, okA := prio[a]
-	pb, okB := prio[b]
+	pa, okA := machineOSTagPriority[a]
+	pb, okB := machineOSTagPriority[b]
 	switch {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/release-controller/machine_os_tags.go` around lines 138 - 142, The local
priority map `prio` inside machineOSTagLess is allocated on every call; move it
to a package-level variable (e.g., unexported var machineOSTagPriority =
map[string]int{ "rhel-coreos": 0, "stream-coreos": 1 }) and update
machineOSTagLess to reference that package-level map instead of creating a new
one, removing the local declaration to avoid repeated allocations and improve
clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/release-controller/machine_os_tags.go`:
- Around line 91-93: Update the function comment for
machineOSDisplayNameFromAnnotations to reference the correct annotation key
"io.openshift.build.version-display-names" (the constant defined earlier)
instead of "release.openshift.io/version-display-names"; keep the rest of the
comment (format description and behavior) unchanged so it accurately describes
the code.

---

Nitpick comments:
In `@pkg/release-controller/machine_os_tags.go`:
- Around line 138-142: The local priority map `prio` inside machineOSTagLess is
allocated on every call; move it to a package-level variable (e.g., unexported
var machineOSTagPriority = map[string]int{ "rhel-coreos": 0, "stream-coreos": 1
}) and update machineOSTagLess to reference that package-level map instead of
creating a new one, removing the local declaration to avoid repeated allocations
and improve clarity.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9e37f1d9-aa12-44c7-8578-a8775c51a0df

📥 Commits

Reviewing files that changed from the base of the PR and between 1a008de and b8ceb2b.

📒 Files selected for processing (2)
  • pkg/release-controller/machine_os_tags.go
  • pkg/release-controller/release_info.go

@sdodson sdodson force-pushed the rhcos-stream-changelogs branch from 277bd7e to faa2e25 Compare April 10, 2026 13:22
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (4)
pkg/release-controller/machine_os_tags.go (1)

91-93: ⚠️ Potential issue | 🟡 Minor

Comment references incorrect annotation key name.

The comment mentions release.openshift.io/version-display-names but the constant on line 28 defines io.openshift.build.version-display-names. The code is correct but the comment is misleading.

📝 Suggested fix
 // machineOSDisplayNameFromAnnotations extracts the machine-os display name from the
-// release.openshift.io/version-display-names annotation. Returns empty string if not found.
+// io.openshift.build.version-display-names annotation. Returns empty string if not found.
 // Typical format: "machine-os=Red Hat Enterprise Linux CoreOS" (comma-separated key=value pairs).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/release-controller/machine_os_tags.go` around lines 91 - 93, Update the
comment above machineOSDisplayNameFromAnnotations to reference the correct
annotation key name used by the code (the constant
io.openshift.build.version-display-names) instead of
release.openshift.io/version-display-names so the docstring matches the actual
constant and behavior; ensure the example format still shows the "machine-os=Red
Hat Enterprise Linux CoreOS" comma-separated key=value pairs and remove or
replace the incorrect annotation name.
pkg/release-controller/release_info.go (3)

103-107: ⚠️ Potential issue | 🟡 Minor

Reject malformed cache keys by exact arity.

After strings.Split(key, "\x00"), the strings.Contains(..., "\x00") checks are redundant (an embedded separator becomes an extra field). The < N checks accept keys with extra parts. Use len(parts) != N for strict validation.

🔧 Proposed fix
 		case "rpmdiffstream":
-			if len(parts) < 4 {
+			if len(parts) != 4 {
 				s, err = "", fmt.Errorf("invalid rpmdiffstream key")
-			} else if strings.Contains(parts[1], "\x00") || strings.Contains(parts[2], "\x00") || strings.Contains(parts[3], "\x00") {
-				s, err = "", fmt.Errorf("invalid from/to/component")
 			} else {

Apply the same pattern (len(parts) != expectedCount and remove the strings.Contains checks) to rpmliststream, imagefor, and machineosstreams.

Also applies to: 134-137, 152-155, 160-163

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/release-controller/release_info.go` around lines 103 - 107, The
rpmdiffstream branch currently uses `len(parts) < 4` and redundant
`strings.Contains(..., "\x00")` checks which allow extra fields; change the
arity checks to require exact counts (e.g. use `len(parts) != 4` for the
"rpmdiffstream" case) and remove the `strings.Contains` validations; apply the
same exact-arity pattern to the other affected cases (`rpmliststream`,
`imagefor`, `machineosstreams`) so each switch case verifies `len(parts) ==
expectedCount` (use `!=` to reject malformed keys) and drop the
`strings.Contains` checks.

573-573: ⚠️ Potential issue | 🟡 Minor

Convert remaining error wrappers from %v/%s to %w for proper error chain preservation.

Several error returns flatten the underlying error, preventing callers from using errors.Is/errors.As. This undermines the error-chain preservation being added elsewhere.

Lines needing conversion:

  • Line 573: unmarshaling RPM list: %s
  • Line 628: failed to resolve image for component: %v
  • Line 656: unmarshaling RPM list: %s
  • Line 684: unmarshaling extensions from cache: %s
  • Line 721: unmarshaling extensions: %s
  • Line 748: unmarshaling RPM diff: %s
  • Line 778: unmarshaling RPM diff: %s
🔧 Proposed fix (example)
-		return RpmList{}, fmt.Errorf("unmarshaling RPM list: %s", err)
+		return RpmList{}, fmt.Errorf("unmarshaling RPM list: %w", err)

Apply similar changes to all listed lines.

Also applies to: 628-628, 656-656, 684-684, 721-721, 748-748, 778-778

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/release-controller/release_info.go` at line 573, Replace the error
formatting verbs for the listed error returns so the original errors are wrapped
(use %w) instead of flattened (%v/%s): update the return that currently does
`return RpmList{}, fmt.Errorf("unmarshaling RPM list: %s", err)` and the other
returns whose messages contain "failed to resolve image for component: %v",
"unmarshaling RPM list: %s" (second occurrence), "unmarshaling extensions from
cache: %s", "unmarshaling extensions: %s", "unmarshaling RPM diff: %s" (two
occurrences) to use `fmt.Errorf("...: %w", err)` so callers can use
errors.Is/errors.As; locate each by the unique message strings in
release_info.go and replace the formatting verb only.

52-59: ⚠️ Potential issue | 🟠 Major

Export a stable error sentinel for the semaphore saturation condition.

The error returned when rpmdbOCSlots is full (lines 403-406) is created fresh each time via fmt.Errorf. Callers cannot reliably detect this condition using errors.Is/errors.As. Export a sentinel and wrap it.

🔧 Proposed fix
+// ErrTooManyConcurrentRpmdbOC is returned when the rpmdb operation concurrency limit is reached.
+var ErrTooManyConcurrentRpmdbOC = stdErrors.New("too many concurrent oc adm release info --rpmdb/--rpmdb-diff operations")
+
 var (
 	ocPath       = ""
 	rpmdbOCSlots = make(chan struct{}, maxConcurrentRpmdbOCCalls)
 )

Then at lines 403-406:

 	default:
-		return nil, nil, fmt.Errorf(
-			"too many concurrent oc adm release info --rpmdb/--rpmdb-diff operations (limit %d)",
-			maxConcurrentRpmdbOCCalls,
-		)
+		return nil, nil, fmt.Errorf("%w (limit %d)", ErrTooManyConcurrentRpmdbOC, maxConcurrentRpmdbOCCalls)
 	}

Also applies to: 403-406

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/release-controller/release_info.go` around lines 52 - 59, Create and
export a sentinel error (e.g., var ErrRpmdbOCSlotsFull = errors.New("rpmdb oc
slots full")) and change the logic that detects rpmdbOCSlots being full (the
code that currently returns a freshly created fmt.Errorf at the rpmdbOCSlots
channel acquire failure) to return a wrapped error using fmt.Errorf("%w:
additional context", ErrRpmdbOCSlotsFull) so callers can reliably detect the
semaphore saturation with errors.Is; update any tests or callers that compared
the previous string error to use errors.Is against ErrRpmdbOCSlotsFull.
🧹 Nitpick comments (1)
pkg/release-controller/machine_os_tags.go (1)

135-155: Consider hoisting the priority map to avoid repeated allocation.

The prio map is recreated on every comparison call. For small slices this is negligible, but could be hoisted for clarity.

♻️ Optional improvement
+var machineOSTagPriority = map[string]int{
+	"rhel-coreos":   0,
+	"stream-coreos": 1,
+}
+
 // machineOSTagLess is a comparison function for machine OS tag names.
 // Returns true if a should sort before b. Prioritizes rhel-coreos (0) and stream-coreos (1),
 // then falls back to lexicographic ordering for all other tags.
 func machineOSTagLess(a, b string) bool {
-	prio := map[string]int{
-		"rhel-coreos":   0,
-		"stream-coreos": 1,
-	}
-	pa, okA := prio[a]
-	pb, okB := prio[b]
+	pa, okA := machineOSTagPriority[a]
+	pb, okB := machineOSTagPriority[b]
 	switch {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/release-controller/machine_os_tags.go` around lines 135 - 155, Hoist the
priority map out of machineOSTagLess so it isn't allocated on every call: move
the prio map literal into a package-level variable (e.g., var machineOSPriority
= map[string]int{...}) and update machineOSTagLess to reference
machineOSPriority instead of redeclaring prio; keep the same keys
("rhel-coreos", "stream-coreos") and preserve the existing logic that uses pa/pb
and okA/okB.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/release-controller/machine_os_tags.go`:
- Around line 91-93: Update the comment above
machineOSDisplayNameFromAnnotations to reference the correct annotation key name
used by the code (the constant io.openshift.build.version-display-names) instead
of release.openshift.io/version-display-names so the docstring matches the
actual constant and behavior; ensure the example format still shows the
"machine-os=Red Hat Enterprise Linux CoreOS" comma-separated key=value pairs and
remove or replace the incorrect annotation name.

In `@pkg/release-controller/release_info.go`:
- Around line 103-107: The rpmdiffstream branch currently uses `len(parts) < 4`
and redundant `strings.Contains(..., "\x00")` checks which allow extra fields;
change the arity checks to require exact counts (e.g. use `len(parts) != 4` for
the "rpmdiffstream" case) and remove the `strings.Contains` validations; apply
the same exact-arity pattern to the other affected cases (`rpmliststream`,
`imagefor`, `machineosstreams`) so each switch case verifies `len(parts) ==
expectedCount` (use `!=` to reject malformed keys) and drop the
`strings.Contains` checks.
- Line 573: Replace the error formatting verbs for the listed error returns so
the original errors are wrapped (use %w) instead of flattened (%v/%s): update
the return that currently does `return RpmList{}, fmt.Errorf("unmarshaling RPM
list: %s", err)` and the other returns whose messages contain "failed to resolve
image for component: %v", "unmarshaling RPM list: %s" (second occurrence),
"unmarshaling extensions from cache: %s", "unmarshaling extensions: %s",
"unmarshaling RPM diff: %s" (two occurrences) to use `fmt.Errorf("...: %w",
err)` so callers can use errors.Is/errors.As; locate each by the unique message
strings in release_info.go and replace the formatting verb only.
- Around line 52-59: Create and export a sentinel error (e.g., var
ErrRpmdbOCSlotsFull = errors.New("rpmdb oc slots full")) and change the logic
that detects rpmdbOCSlots being full (the code that currently returns a freshly
created fmt.Errorf at the rpmdbOCSlots channel acquire failure) to return a
wrapped error using fmt.Errorf("%w: additional context", ErrRpmdbOCSlotsFull) so
callers can reliably detect the semaphore saturation with errors.Is; update any
tests or callers that compared the previous string error to use errors.Is
against ErrRpmdbOCSlotsFull.

---

Nitpick comments:
In `@pkg/release-controller/machine_os_tags.go`:
- Around line 135-155: Hoist the priority map out of machineOSTagLess so it
isn't allocated on every call: move the prio map literal into a package-level
variable (e.g., var machineOSPriority = map[string]int{...}) and update
machineOSTagLess to reference machineOSPriority instead of redeclaring prio;
keep the same keys ("rhel-coreos", "stream-coreos") and preserve the existing
logic that uses pa/pb and okA/okB.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 42c8aead-4ae5-4150-9edc-8e0bedc7679c

📥 Commits

Reviewing files that changed from the base of the PR and between 1a008de and faa2e25.

📒 Files selected for processing (2)
  • pkg/release-controller/machine_os_tags.go
  • pkg/release-controller/release_info.go

sdodson and others added 5 commits April 10, 2026 12:19
Adds support for discovering and rendering multiple RHCOS streams in the
Node Image Info section of release changelogs. Starting with 4.21, releases
can include multiple CoreOS streams (e.g., rhel-coreos for el9.6 and
rhel-coreos-10 for el10.2).

New functionality:
- ListMachineOSStreams: Discovers available CoreOS streams in a release by
  identifying machine-os components with -extensions tags
- RpmListForStream/RpmDiffForStream: Query RPM data for specific streams
  using --rpmdb-image flag
- MachineOSTitle: Generates consistent markdown section titles with display
  names from release annotations
- Sorted output with rhel-coreos prioritized first

The extensions tag defaulting logic now handles empty extensionsTagName by
deriving the default from coreosTagName, ensuring backwards compatibility
with older releases.

Includes changelog-preview CLI tool for testing output locally without a
running cluster.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Introduce a process-wide semaphore (buffered channel, capacity 10) that
gates all oc invocations using --rpmdb or --rpmdb-diff. These commands
pull image layers and build RPM databases, making them significantly
heavier than normal oc calls. When the limit is reached callers receive
an explicit error rather than silently queueing, keeping groupcache error
caching semantics correct (errors are not cached, so callers can retry).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Avoid repeated `oc image extract` calls for the same extensions image by
caching extensions.json in /tmp/rpmdb/extensions-<sha256>. The cache is
keyed by the image digest from the pullspec, so identical images (even
across different release versions) reuse the same cached file.

Uses atomic writes (temp file + rename) to prevent race conditions when
multiple processes write to the same cache file simultaneously. The atomic
rename ensures readers never observe partial writes - they see either the
complete old file, no file, or the complete new file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wraps errors with %w for proper error chain preservation and fixes cache
key validation to reject malformed keys.

Error wrapping: Converts error formatting from %v/%s to %w in RpmList,
RpmDiff, RpmDiffForStream, and ImageReferenceForComponent so callers can
use errors.Is/errors.As to detect specific error conditions.

Cache key validation: Uses exact arity checks (len(parts) != N) instead of
minimum checks (< N) to reject keys with extra fields. Removes dead
strings.Contains("\x00") checks that never fire after strings.Split.
Affects rpmdiffstream, rpmliststream, imagefor, and machineosstreams cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…port

Adds docstrings for all functions introduced or modified for multi-stream
RHCOS support.

rpmdb functions:
- RpmListForStream: Documents extensions metadata caching behavior
- RpmDiff: Describes semaphore usage and /tmp/rpmdb/ cache sharing
- RpmDiffForStream: Documents stream-specific diffs and delegation
- ImageReferenceForComponent: Documents component pullspec resolution

machine OS helpers:
- machineOSDisplayNameFromAnnotations: Annotation extraction and format
- sortMachineOSTags: Stable sort with priority ordering
- machineOSTagLess: Comparison function with explicit priority values

Corrects annotation key reference to match actual constant
(io.openshift.build.version-display-names).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sdodson sdodson force-pushed the rhcos-stream-changelogs branch from faa2e25 to 0113575 Compare April 10, 2026 16:19
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (9)
pkg/rhcos/rhcos_test.go (1)

248-265: Consider adding assertion for both components having populated URLs.

The test checks that "versionUrl" appears at least twice, but doesn't verify that both CoreOS components (RHCOS 9.x and RHCOS 10) were actually enriched. A more robust assertion could unmarshal the output and verify each component has the expected URL fields populated.

💡 Suggested enhancement
// Optionally unmarshal and check each component:
var result releasecontroller.ChangeLog
if err := json.Unmarshal([]byte(out), &result); err != nil {
    t.Fatal(err)
}
for i, c := range result.Components {
    if c.VersionUrl == "" {
        t.Errorf("component %d (%s) missing versionUrl", i, c.Name)
    }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/rhcos/rhcos_test.go` around lines 248 - 265, Update the
TestTransformJsonOutputDualCoreOS to unmarshal the returned JSON from
TransformJsonOutput and assert each component's VersionUrl is non-empty instead
of only counting occurrences; specifically, unmarshal into
releasecontroller.ChangeLog (or the appropriate struct used by
TransformJsonOutput), iterate result.Components and for each component (use
c.Name to identify RHCOS 9.x and RHCOS 10) assert c.VersionUrl != "" and fail
the test with a clear message if any component is missing its URL.
hack/changelog-preview/main.go (1)

42-52: Architecture normalization duplicates logic from http_changelog.go.

The architecture mapping (amd64→x86_64, arm64→aarch64) is duplicated between this file (lines 43-52) and http_changelog.go (lines 57-66). Consider extracting this to a shared helper function to maintain consistency.

💡 Suggested approach

Extract to a shared function in pkg/rhcos/ or a common utility package:

// NormalizeArchitecture converts Go architecture names to RHCOS names and returns the extension.
func NormalizeArchitecture(arch string) (name, ext string) {
    switch arch {
    case "amd64":
        return "x86_64", ""
    case "arm64":
        name = "aarch64"
        return name, "-" + name
    default:
        return arch, "-" + arch
    }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hack/changelog-preview/main.go` around lines 42 - 52, Architecture
normalization logic is duplicated between hack/changelog-preview/main.go and
http_changelog.go; extract it into a shared helper (e.g.,
NormalizeArchitecture(arch string) (name, ext string) placed in pkg/rhcos or a
common util package) that maps "amd64"→"x86_64", "arm64"→"aarch64" and returns
the extension (empty for x86_64, "-<name>" otherwise); then replace the local
switch in hack/changelog-preview/main.go (where archName and archExt are set)
and the switch in http_changelog.go (lines 57-66) to call NormalizeArchitecture
and use its returned values.
pkg/rhcos/rhcos.go (3)

380-388: Heading level logic could be simplified.

The heading level determination has overlapping conditions: when stream.Title != "", h becomes "####". When dual is true but stream.Title is empty, h also becomes "####". The dual flag only affects the heading when Title is empty, which is a subtle distinction.

Consider adding a brief comment explaining the heading hierarchy: ### for single-stream without title, #### for multi-stream or when a title subsection is present.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/rhcos/rhcos.go` around lines 380 - 388, The heading-level logic in
renderOneNodeStream is a bit subtle: set h to "###" by default, but use "####"
when either dual is true (multi-stream context) or when stream.Title is
non-empty (a titled subsection); update the code around the h assignment in
renderOneNodeStream to make that intent explicit (e.g., consolidate the
conditions into a single clear conditional) and add a short comment explaining
the hierarchy: "### for single-stream without title; #### for multi-stream or
when a title subsection is present," referencing h, stream.Title and dual so
future readers understand why both conditions map to "####".

362-378: RenderDualNodeImageInfo returns empty string for empty streams, but caller may expect different behavior.

When streams is empty, the function returns "". However, RenderNodeImageInfo (line 358-360) calls this with a single-element slice, so it won't hit this case. The empty check is defensive but may silently produce no output when called directly with an empty slice, which could mask issues upstream.

Consider whether returning an empty string is the intended behavior or if it should return a placeholder message indicating no streams were found.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/rhcos/rhcos.go` around lines 362 - 378, RenderDualNodeImageInfo currently
returns an empty string when streams is empty which can silently suppress
output; change the empty-case to emit a meaningful result instead—e.g., write a
short placeholder like "No node streams found." (or nothing) followed by
baseLayerFooter(markdown) and return that. Update the branch in
RenderDualNodeImageInfo that checks len(streams) == 0 to build and return the
footer plus an optional placeholder so callers always receive the footer and a
clear indicator when there are no streams.

44-46: RHCOS 10 regex pattern may not match the intended format.

The regex reMdRHCoS10Diff uses Red Hat Enterprise Linux CoreOS 10(?: \d+\.\d+)? which expects an optional space followed by a minor version like 10.0. However, looking at the test case in rhcos_test.go line 224, the input is "Red Hat Enterprise Linux CoreOS 10 10.0 upgraded from..." — note the space between "10" and "10.0". The current regex expects 10 immediately followed by an optional \d+\.\d+, which will match 10 10.0 only if the space is present. This works but may be fragile if the format varies.

Consider whether the intent is to match "CoreOS 10" (major version only) with an optional minor version suffix, or "CoreOS 10.x" as a combined version identifier. The current pattern works for the test case but the comment should clarify the expected format.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/rhcos/rhcos.go` around lines 44 - 46, The RHCOS-10 regexes
(reMdRHCoS10Diff and reMdRHCoS10Version) are fragile about spacing between the
literal "10" and an optional minor version; update both patterns to allow
flexible whitespace and an optional minor version by changing the fragment
`10(?: \d+\.\d+)?` to `10(?:\s+\d+\.\d+)?` (or an equivalent that accepts one or
more spaces) and adjust the surrounding comment to state you match "CoreOS 10"
with an optional minor version suffix; keep the rest of the capture groups
intact so existing upgrade-from/to groups remain unchanged.
pkg/rhcos/node_image.go (1)

50-55: Error from ImageReferenceForComponent is silently ignored.

When ImageReferenceForComponent fails (line 50), errFrom is checked but not logged or returned. While this is intentional (to skip diff when the from-release lacks the stream), silently ignoring errors could mask unexpected failures.

Consider logging when the from-release doesn't have the stream at debug level, which would help with troubleshooting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/rhcos/node_image.go` around lines 50 - 55, The call to
ImageReferenceForComponent(fromReleasePullSpec, ms.Tag) checks errFrom but
ignores it; update the branch so that when ImageReferenceForComponent returns an
error you emit a debug-level log containing the error and context (including
fromReleasePullSpec and ms.Tag) but preserve the existing behavior of skipping
the diff; i.e., inside the if/else around ImageReferenceForComponent (the
errFrom check) add a debug log statement using the package's existing logger
(same logger used elsewhere in pkg/rhcos/node_image.go) to record the error
before proceeding to skip calling info.RpmDiffForStream.
pkg/release-controller/machine_os_tags_test.go (1)

71-78: Test could cover additional MachineOSTitle fallback cases.

The test covers DisplayName present and empty (custom-stream) cases, but doesn't test the hardcoded fallbacks for rhel-coreos, rhel-coreos-10, or stream-coreos when DisplayName is empty. Consider adding cases to ensure these fallbacks work correctly.

💡 Suggested additional test cases
// Test hardcoded fallbacks
if got := MachineOSTitle(MachineOSStreamInfo{Tag: "rhel-coreos", DisplayName: ""}); got != "Red Hat Enterprise Linux CoreOS (`rhel-coreos`)" {
    t.Errorf("rhel-coreos fallback: got %q", got)
}
if got := MachineOSTitle(MachineOSStreamInfo{Tag: "rhel-coreos-10", DisplayName: ""}); got != "Red Hat Enterprise Linux CoreOS 10 (`rhel-coreos-10`)" {
    t.Errorf("rhel-coreos-10 fallback: got %q", got)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/release-controller/machine_os_tags_test.go` around lines 71 - 78, Add
tests in TestMachineOSTitle to cover hardcoded fallback titles when
MachineOSStreamInfo.DisplayName is empty: call MachineOSTitle with Tag
"rhel-coreos" and empty DisplayName and assert it returns "Red Hat Enterprise
Linux CoreOS (`rhel-coreos`)", with Tag "rhel-coreos-10" assert "Red Hat
Enterprise Linux CoreOS 10 (`rhel-coreos-10`)", and with Tag "stream-coreos"
assert the expected CoreOS fallback title (using the same format with the tag);
locate these checks near existing TestMachineOSTitle which uses
MachineOSStreamInfo and MachineOSTitle.
cmd/release-controller-api/http_changelog.go (1)

183-192: Duplicate GetImageInfo call for the same toPull image.

GetImageInfo(c.releaseInfo, c.architecture, toPull) is called here, but the same call was already made in getChangeLog (line 34). Since renderChangeLog calls getChangeLog via goroutine and waits for results, the image info could potentially be passed through the channel or cached to avoid redundant work.

This is a minor efficiency concern since GetImageInfo likely has internal caching, but worth noting for optimization if this becomes a hot path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/release-controller-api/http_changelog.go` around lines 183 - 192, The
code redundantly calls GetImageInfo(c.releaseInfo, c.architecture, toPull) in
renderChangeLog after getChangeLog already fetched it; fix by having
getChangeLog (called by renderChangeLog) return or send the obtained ImageInfo
through the existing result channel (or store it in a small request-scoped
cache) so renderChangeLog can reuse that ImageInfo instead of calling
GetImageInfo again for the same toPull; update getChangeLog’s return/result type
and renderChangeLog to read and use the passed ImageInfo (references:
GetImageInfo, getChangeLog, renderChangeLog, variable toPull).
pkg/release-controller/machine_os_tags.go (1)

139-142: Priority map allocated on every comparison call.

The machineOSTagLess function allocates a new map on every comparison. For small slices this is negligible, but for consistency and efficiency, consider moving the priority map to package level or using a switch statement.

♻️ Optional refactor using switch
 func machineOSTagLess(a, b string) bool {
-	prio := map[string]int{
-		"rhel-coreos":   0,
-		"stream-coreos": 1,
-	}
-	pa, okA := prio[a]
-	pb, okB := prio[b]
+	priority := func(s string) (int, bool) {
+		switch s {
+		case "rhel-coreos":
+			return 0, true
+		case "stream-coreos":
+			return 1, true
+		default:
+			return 0, false
+		}
+	}
+	pa, okA := priority(a)
+	pb, okB := priority(b)
 	switch {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/release-controller/machine_os_tags.go` around lines 139 - 142, The
priority map is being recreated on every call to machineOSTagLess; extract the
prio map to a package-level variable (e.g., machineOSTagPriority map[string]int)
initialized once, or replace the map with a switch in machineOSTagLess to avoid
per-call allocation; update machineOSTagLess to reference machineOSTagPriority
(or the switch branches) instead of allocating prio inside the function so
comparisons reuse the single shared priority definition.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@cmd/release-controller-api/http_changelog.go`:
- Around line 183-192: The code redundantly calls GetImageInfo(c.releaseInfo,
c.architecture, toPull) in renderChangeLog after getChangeLog already fetched
it; fix by having getChangeLog (called by renderChangeLog) return or send the
obtained ImageInfo through the existing result channel (or store it in a small
request-scoped cache) so renderChangeLog can reuse that ImageInfo instead of
calling GetImageInfo again for the same toPull; update getChangeLog’s
return/result type and renderChangeLog to read and use the passed ImageInfo
(references: GetImageInfo, getChangeLog, renderChangeLog, variable toPull).

In `@hack/changelog-preview/main.go`:
- Around line 42-52: Architecture normalization logic is duplicated between
hack/changelog-preview/main.go and http_changelog.go; extract it into a shared
helper (e.g., NormalizeArchitecture(arch string) (name, ext string) placed in
pkg/rhcos or a common util package) that maps "amd64"→"x86_64",
"arm64"→"aarch64" and returns the extension (empty for x86_64, "-<name>"
otherwise); then replace the local switch in hack/changelog-preview/main.go
(where archName and archExt are set) and the switch in http_changelog.go (lines
57-66) to call NormalizeArchitecture and use its returned values.

In `@pkg/release-controller/machine_os_tags_test.go`:
- Around line 71-78: Add tests in TestMachineOSTitle to cover hardcoded fallback
titles when MachineOSStreamInfo.DisplayName is empty: call MachineOSTitle with
Tag "rhel-coreos" and empty DisplayName and assert it returns "Red Hat
Enterprise Linux CoreOS (`rhel-coreos`)", with Tag "rhel-coreos-10" assert "Red
Hat Enterprise Linux CoreOS 10 (`rhel-coreos-10`)", and with Tag "stream-coreos"
assert the expected CoreOS fallback title (using the same format with the tag);
locate these checks near existing TestMachineOSTitle which uses
MachineOSStreamInfo and MachineOSTitle.

In `@pkg/release-controller/machine_os_tags.go`:
- Around line 139-142: The priority map is being recreated on every call to
machineOSTagLess; extract the prio map to a package-level variable (e.g.,
machineOSTagPriority map[string]int) initialized once, or replace the map with a
switch in machineOSTagLess to avoid per-call allocation; update machineOSTagLess
to reference machineOSTagPriority (or the switch branches) instead of allocating
prio inside the function so comparisons reuse the single shared priority
definition.

In `@pkg/rhcos/node_image.go`:
- Around line 50-55: The call to ImageReferenceForComponent(fromReleasePullSpec,
ms.Tag) checks errFrom but ignores it; update the branch so that when
ImageReferenceForComponent returns an error you emit a debug-level log
containing the error and context (including fromReleasePullSpec and ms.Tag) but
preserve the existing behavior of skipping the diff; i.e., inside the if/else
around ImageReferenceForComponent (the errFrom check) add a debug log statement
using the package's existing logger (same logger used elsewhere in
pkg/rhcos/node_image.go) to record the error before proceeding to skip calling
info.RpmDiffForStream.

In `@pkg/rhcos/rhcos_test.go`:
- Around line 248-265: Update the TestTransformJsonOutputDualCoreOS to unmarshal
the returned JSON from TransformJsonOutput and assert each component's
VersionUrl is non-empty instead of only counting occurrences; specifically,
unmarshal into releasecontroller.ChangeLog (or the appropriate struct used by
TransformJsonOutput), iterate result.Components and for each component (use
c.Name to identify RHCOS 9.x and RHCOS 10) assert c.VersionUrl != "" and fail
the test with a clear message if any component is missing its URL.

In `@pkg/rhcos/rhcos.go`:
- Around line 380-388: The heading-level logic in renderOneNodeStream is a bit
subtle: set h to "###" by default, but use "####" when either dual is true
(multi-stream context) or when stream.Title is non-empty (a titled subsection);
update the code around the h assignment in renderOneNodeStream to make that
intent explicit (e.g., consolidate the conditions into a single clear
conditional) and add a short comment explaining the hierarchy: "### for
single-stream without title; #### for multi-stream or when a title subsection is
present," referencing h, stream.Title and dual so future readers understand why
both conditions map to "####".
- Around line 362-378: RenderDualNodeImageInfo currently returns an empty string
when streams is empty which can silently suppress output; change the empty-case
to emit a meaningful result instead—e.g., write a short placeholder like "No
node streams found." (or nothing) followed by baseLayerFooter(markdown) and
return that. Update the branch in RenderDualNodeImageInfo that checks
len(streams) == 0 to build and return the footer plus an optional placeholder so
callers always receive the footer and a clear indicator when there are no
streams.
- Around line 44-46: The RHCOS-10 regexes (reMdRHCoS10Diff and
reMdRHCoS10Version) are fragile about spacing between the literal "10" and an
optional minor version; update both patterns to allow flexible whitespace and an
optional minor version by changing the fragment `10(?: \d+\.\d+)?` to
`10(?:\s+\d+\.\d+)?` (or an equivalent that accepts one or more spaces) and
adjust the surrounding comment to state you match "CoreOS 10" with an optional
minor version suffix; keep the rest of the capture groups intact so existing
upgrade-from/to groups remain unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 07cf25a5-ab40-4644-b616-8eb624b12c16

📥 Commits

Reviewing files that changed from the base of the PR and between faa2e25 and 0113575.

📒 Files selected for processing (11)
  • .gitignore
  • cmd/release-controller-api/http_changelog.go
  • hack/changelog-preview/main.go
  • pkg/release-controller/machine_os_tags.go
  • pkg/release-controller/machine_os_tags_test.go
  • pkg/release-controller/release_info.go
  • pkg/release-controller/semver.go
  • pkg/release-controller/semver_dual_rhcos_test.go
  • pkg/rhcos/node_image.go
  • pkg/rhcos/rhcos.go
  • pkg/rhcos/rhcos_test.go
✅ Files skipped from review due to trivial changes (1)
  • .gitignore
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/release-controller/release_info.go

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Apr 10, 2026

@sdodson: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@bradmwilliams
Copy link
Copy Markdown
Collaborator

/label tide/merge-method-squash

@openshift-ci openshift-ci bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Apr 10, 2026
@sdodson
Copy link
Copy Markdown
Member Author

sdodson commented Apr 10, 2026

@bradmwilliams Thanks for you help with this, I think this is about as good as I'm going to get it for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants