Skip to content

OCPBUGS-78420: preserve user-provided configs when multiple on-disk assets coexist#10390

Open
pawanpinjarkar wants to merge 1 commit intoopenshift:mainfrom
pawanpinjarkar:consume-nmstateconfig
Open

OCPBUGS-78420: preserve user-provided configs when multiple on-disk assets coexist#10390
pawanpinjarkar wants to merge 1 commit intoopenshift:mainfrom
pawanpinjarkar:consume-nmstateconfig

Conversation

@pawanpinjarkar
Copy link
Contributor

@pawanpinjarkar pawanpinjarkar commented Mar 13, 2026

The asset store's dependency tracking previously marked any asset as "dirty" when a parent dependency was loaded from disk, triggering regeneration even when both parent and child were user-provided configuration files that should coexist.

This broke the OVE use case where agent-config.yaml (rendezvousIP only) and nmstateconfig.yaml (network config) can coexist without install-config.yaml. The installer would discard nmstateconfig.yaml, then fail to regenerate it since agent-config has no hosts.

Fix by adding "dirtyDueToUserConfigs" tracking to assetState:

  • Propagates through dependency chains including intermediate assets
  • Preserves on-disk assets when dirty only due to user configs AND not in state file (newly user-provided)
  • Still regenerates assets in state file when dependencies change (backward compatibility)

Co-Authored-By: Claude Sonnet 4.5 noreply@anthropic.com

Summary by CodeRabbit

  • Tests
    • Added test case for agent configuration with RendezvousIP-only setup and NMStateConfig integration
    • Extended test coverage for user-provided configuration scenarios with state file interactions, asset preservation, and regeneration behavior

…ssets coexist

The asset store's dependency tracking previously marked any asset as
"dirty" when a parent dependency was loaded from disk, triggering
regeneration even when both parent and child were user-provided
configuration files that should coexist.

This broke the OVE use case where agent-config.yaml (rendezvousIP only)
and nmstateconfig.yaml (network config) can coexist without
install-config.yaml. The installer would discard nmstateconfig.yaml,
then fail to regenerate it since agent-config has no hosts.

Fix by adding "dirtyDueToUserConfigs" tracking to assetState:
- Propagates through dependency chains including intermediate assets
- Preserves on-disk assets when dirty only due to user configs AND not
in state file (newly user-provided)
- Still regenerates assets in state file when dependencies change
(backward compatibility)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 13, 2026
@openshift-ci-robot
Copy link
Contributor

@pawanpinjarkar: This pull request references Jira Issue OCPBUGS-78420, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @bmanzari

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

The asset store's dependency tracking previously marked any asset as "dirty" when a parent dependency was loaded from disk, triggering regeneration even when both parent and child were user-provided configuration files that should coexist.

This broke the OVE use case where agent-config.yaml (rendezvousIP only) and nmstateconfig.yaml (network config) can coexist without install-config.yaml. The installer would discard nmstateconfig.yaml, then fail to regenerate it since agent-config has no hosts.

Fix by adding "dirtyDueToUserConfigs" tracking to assetState:

  • Propagates through dependency chains including intermediate assets
  • Preserves on-disk assets when dirty only due to user configs AND not in state file (newly user-provided)
  • Still regenerates assets in state file when dependencies change (backward compatibility)

Co-Authored-By: Claude Sonnet 4.5 noreply@anthropic.com

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Mar 13, 2026
@openshift-ci openshift-ci bot requested a review from bmanzari March 13, 2026 00:38
@coderabbitai
Copy link

coderabbitai bot commented Mar 13, 2026

Walkthrough

The changes add a new test case for agent configuration with RendezvousIP only and introduce tracking for configuration-induced dirtiness in the asset store to preserve on-disk assets and manage state-file interactions.

Changes

Cohort / File(s) Summary
Agent Config Test
pkg/asset/agent/image/unconfigured_ignition_test.go
Added test fixture and helper function for agent-config with RendezvousIP only; introduced new test case verifying generated files and enabled services with NMStateConfig.
Asset Store State Tracking
pkg/asset/store/store.go
Introduced dirtyDueToUserConfigs field to track dirtiness caused by user-provided configurations; adjusted load logic to preserve on-disk assets when parent dependencies are dirty due to user configs; extended asset state construction with dirtiness propagation constraints.
Asset Store Tests
pkg/asset/store/store_test.go
Added state-file simulation infrastructure; introduced new test function TestStoreFetchUserProvidedConfigs with multiple cases examining on-disk and state-file asset interactions; enhanced existing test setup to handle state-file assets and generation sources.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ❓ Inconclusive Test files use standard Go testing (testing.T) rather than Ginkgo framework, making Ginkgo-specific quality criteria inapplicable. Clarify whether quality checks should apply only to Ginkgo tests or also to standard Go tests with appropriate criteria, and provide actual test file changes for review.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly addresses the main change: preserving user-provided configs when multiple on-disk assets coexist, which is the core issue fixed in this PR.
Stable And Deterministic Test Names ✅ Passed All test case names in the modified test files use static string literals with no dynamic content like timestamps, UUIDs, or generated identifiers.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan
  • Generate coding plan for human review comments

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can make CodeRabbit's review stricter and more nitpicky using the `assertive` profile, if that's what you prefer.

Change the reviews.profile setting to assertive to make CodeRabbit's nitpick more issues in your PRs.

@pawanpinjarkar
Copy link
Contributor Author

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tthvo for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot requested a review from jhixson74 March 13, 2026 00:39
@openshift-ci-robot
Copy link
Contributor

@pawanpinjarkar: This pull request references Jira Issue OCPBUGS-78420, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @bmanzari

Details

In response to this:

The asset store's dependency tracking previously marked any asset as "dirty" when a parent dependency was loaded from disk, triggering regeneration even when both parent and child were user-provided configuration files that should coexist.

This broke the OVE use case where agent-config.yaml (rendezvousIP only) and nmstateconfig.yaml (network config) can coexist without install-config.yaml. The installer would discard nmstateconfig.yaml, then fail to regenerate it since agent-config has no hosts.

Fix by adding "dirtyDueToUserConfigs" tracking to assetState:

  • Propagates through dependency chains including intermediate assets
  • Preserves on-disk assets when dirty only due to user configs AND not in state file (newly user-provided)
  • Still regenerates assets in state file when dependencies change (backward compatibility)

Co-Authored-By: Claude Sonnet 4.5 noreply@anthropic.com

Summary by CodeRabbit

  • Tests
  • Added test case for agent configuration with RendezvousIP-only setup and NMStateConfig integration
  • Extended test coverage for user-provided configuration scenarios with state file interactions, asset preservation, and regeneration behavior

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/asset/store/store.go (1)

331-335: ⚠️ Potential issue | 🔴 Critical

Keep discarded on-disk assets attached to unfetched states.

When this branch sees foundOnDisk, presentOnDisk stays true but assetToStore remains nil. The new Line 326 fast path can then return from fetch() before this descendant is regenerated, and purge() later dereferences assetState.asset for every presentOnDisk entry on Lines 372-376. That will panic for graphs like A(on disk, not in state) -> C(on disk + in state) -> B(on disk).

Possible fix
 	case anyParentsDirty:
 		if foundOnDisk {
 			logrus.Warningf("%sDiscarding the %s that was provided in the target directory because its dependencies are dirty and it needs to be regenerated", indent, a.Name())
+			assetToStore = onDiskAsset
 		}
 		source = unfetched
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/store/store.go` around lines 331 - 335, In the anyParentsDirty
branch, when foundOnDisk is true we currently set source = unfetched but clear
assetToStore, which leaves presentOnDisk true while assetState.asset is nil and
later causes a panic in purge(); retain the on-disk asset instead of discarding
it: in the case anyParentsDirty block (where foundOnDisk is checked) leave
assetToStore (or reattach assetState.asset) populated when foundOnDisk is true
so fetch()'s fast path and later purge() see a non-nil assetState.asset; ensure
you only mark source = unfetched (so it will be regenerated) but do not drop
assetToStore/assetState.asset for presentOnDisk entries.
🧹 Nitpick comments (1)
pkg/asset/store/store_test.go (1)

521-549: Please cover the public Fetch() path here.

These cases call store.fetch() and seed stateFileAssets by hand, so they never exercise Fetch()'s saveStateFile() / purge() path in pkg/asset/store/store.go (Lines 84-89 and Lines 167-195). A Fetch() + newStore() round-trip would catch the exact regressions this change is trying to prevent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/store/store_test.go` around lines 521 - 549, The test currently
calls the unexported store.fetch() and mutates store.stateFileAssets directly,
so update the test to exercise the public Fetch() path by creating the store via
newStore() (or whichever public constructor creates a storeImpl and its on-disk
state file handling), seed the on-disk state by writing the expected state JSON
to the store's state file location (not by setting store.stateFileAssets), then
call store.Fetch(ctx, assets[target]) and assert no error; this will exercise
saveStateFile() and purge() paths. Reference storeImpl, Fetch(), fetch(),
newStore(), saveStateFile(), purge(), and stateFileAssets when locating the code
to change. Ensure the test still builds deterministically by using the same
tempDir and assets setup as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/asset/store/store.go`:
- Around line 320-329: The current logic treats !foundInStateFile as
"user-provided" but saveStateFile() still writes every onDiskSource asset into
the state file, so after one run the provenance is lost; fix by persisting
explicit provenance/source in the stored state and restoring it on load instead
of inferring from presence. Specifically: extend the state entry struct written
by saveStateFile() to include a Source/Provenance enum/flag (e.g.,
"user-provided" vs "generated") and set that when you populate entries from
onDiskSource/onDiskAsset; update the state load path to read that field and set
a new boolean (or use it to compute foundInStateFile/provenanceUserProvided)
used by the branch that checks anyParentsDirty, dirtyDueToUserConfigs,
foundOnDisk, and foundInStateFile so the branch uses the persisted provenance
value rather than mere presence; ensure saveStateFile still writes the
provenance for on-disk assets created by users.

---

Outside diff comments:
In `@pkg/asset/store/store.go`:
- Around line 331-335: In the anyParentsDirty branch, when foundOnDisk is true
we currently set source = unfetched but clear assetToStore, which leaves
presentOnDisk true while assetState.asset is nil and later causes a panic in
purge(); retain the on-disk asset instead of discarding it: in the case
anyParentsDirty block (where foundOnDisk is checked) leave assetToStore (or
reattach assetState.asset) populated when foundOnDisk is true so fetch()'s fast
path and later purge() see a non-nil assetState.asset; ensure you only mark
source = unfetched (so it will be regenerated) but do not drop
assetToStore/assetState.asset for presentOnDisk entries.

---

Nitpick comments:
In `@pkg/asset/store/store_test.go`:
- Around line 521-549: The test currently calls the unexported store.fetch() and
mutates store.stateFileAssets directly, so update the test to exercise the
public Fetch() path by creating the store via newStore() (or whichever public
constructor creates a storeImpl and its on-disk state file handling), seed the
on-disk state by writing the expected state JSON to the store's state file
location (not by setting store.stateFileAssets), then call store.Fetch(ctx,
assets[target]) and assert no error; this will exercise saveStateFile() and
purge() paths. Reference storeImpl, Fetch(), fetch(), newStore(),
saveStateFile(), purge(), and stateFileAssets when locating the code to change.
Ensure the test still builds deterministically by using the same tempDir and
assets setup as before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c447f9d9-2bfd-4f13-8070-a190947c4147

📥 Commits

Reviewing files that changed from the base of the PR and between b9c6f8e and bb16a4f.

📒 Files selected for processing (3)
  • pkg/asset/agent/image/unconfigured_ignition_test.go
  • pkg/asset/store/store.go
  • pkg/asset/store/store_test.go

Comment on lines +320 to +329
// Special case: A parent is dirty, but the dirtiness is only due to user-provided configs from disk,
// AND this asset is also found on disk AND is not in the state file (newly user-provided).
// In this case, preserve the on-disk version instead of regenerating, since multiple user-provided
// configs should coexist (e.g., agent-config.yaml with rendezvousIP + nmstateconfig.yaml with
// network config in the OVE use case). If the asset IS in the state file, it was previously generated
// and should be regenerated when its inputs change.
case anyParentsDirty && dirtyDueToUserConfigs && foundOnDisk && !foundInStateFile:
logrus.Debugf("%sUsing %s loaded from target directory (preserving user-provided config despite on-disk dependencies)", indent, a.Name())
assetToStore = onDiskAsset
source = onDiskSource
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

This preservation disappears after the first save/load cycle.

This branch treats !foundInStateFile as "user-provided", but saveStateFile() on Lines 172-180 still writes every onDiskSource asset into .openshift_install_state.json. After one successful run, the same on-disk config will now look state-file-backed on the next invocation, so it falls out of this branch and gets regenerated/discarded again even though it was never generated in the first place. You need to persist provenance/source, not infer it from state-file presence alone.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/store/store.go` around lines 320 - 329, The current logic treats
!foundInStateFile as "user-provided" but saveStateFile() still writes every
onDiskSource asset into the state file, so after one run the provenance is lost;
fix by persisting explicit provenance/source in the stored state and restoring
it on load instead of inferring from presence. Specifically: extend the state
entry struct written by saveStateFile() to include a Source/Provenance enum/flag
(e.g., "user-provided" vs "generated") and set that when you populate entries
from onDiskSource/onDiskAsset; update the state load path to read that field and
set a new boolean (or use it to compute foundInStateFile/provenanceUserProvided)
used by the branch that checks anyParentsDirty, dirtyDueToUserConfigs,
foundOnDisk, and foundInStateFile so the branch uses the persisted provenance
value rather than mere presence; ensure saveStateFile still writes the
provenance for on-disk assets created by users.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 13, 2026

@pawanpinjarkar: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-agent-two-node-fencing-ipv4 bb16a4f link false /test e2e-agent-two-node-fencing-ipv4
ci/prow/unit bb16a4f link true /test unit

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

// dirtyDueToUserConfigs is true if this asset is dirty only because of user-provided
// WritableAssets loaded from disk (not because of actual code changes or regeneration needs).
// This is used to preserve on-disk WritableAssets when multiple user configs coexist.
dirtyDueToUserConfigs bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The store type is a very fundamental type for the whole installer and I don't think it's a good idea to introduce such low level change with potentially a huge impact area across the repo. It looks like more an issue on how we shaped the specific assets dependencies, so I'd prefer to review that first

/hold

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree this is NOT the best way as it changes the core asset store functionality.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants