🌱 Ensure COS phase immutability for referenced object approach#2635
🌱 Ensure COS phase immutability for referenced object approach#2635pedjak wants to merge 1 commit intooperator-framework:mainfrom
Conversation
✅ Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Enforces ClusterObjectSet phase immutability when objects are sourced via referenced Secrets by requiring referenced Secrets to be immutable and by detecting Secret content changes using recorded hashes.
Changes:
- Add Secret verification in the COS reconciler (immutable requirement + SHA-256 hash comparison) and block reconciliation when verification fails.
- Persist referenced Secret hashes in
.status.observedObjectContainersand extend CRD/applyconfigurations accordingly. - Add/extend unit + e2e coverage and update docs to reflect the new behavior.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| test/e2e/steps/steps.go | Adds new e2e steps for triggering reconciliation, message fragment matching, and checking observed Secret hashes in status. |
| test/e2e/features/revision.feature | Adds e2e scenarios for “mutable Secret” and “recreated Secret with changed content” blocking behavior. |
| manifests/experimental.yaml | Extends CRD schema with .status.observedObjectContainers. |
| manifests/experimental-e2e.yaml | Extends e2e CRD schema with .status.observedObjectContainers. |
| internal/operator-controller/controllers/resolve_ref_test.go | Updates ref-resolution tests to use immutable Secrets. |
| internal/operator-controller/controllers/clusterobjectset_controller_internal_test.go | Adds unit tests for Secret hashing and referenced-Secret verification. |
| internal/operator-controller/controllers/clusterobjectset_controller.go | Implements referenced Secret verification + hashing and blocks reconciliation on violations. |
| helm/olmv1/base/operator-controller/crd/experimental/olm.operatorframework.io_clusterobjectsets.yaml | Mirrors CRD schema changes for Helm packaging. |
| docs/draft/concepts/large-bundle-support.md | Updates documented behavior/conventions for referenced Secrets (immutability + hash enforcement). |
| applyconfigurations/utils.go | Registers apply-configuration kind for ObservedObjectContainer. |
| applyconfigurations/api/v1/observedobjectcontainer.go | Adds generated apply configuration for the new status type. |
| applyconfigurations/api/v1/clusterobjectsetstatus.go | Adds apply support for .status.observedObjectContainers. |
| api/v1/zz_generated.deepcopy.go | Adds deepcopy support for ObservedObjectContainer and status field. |
| api/v1/clusterobjectset_types.go | Introduces ObservedObjectContainer API type and status field. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal/operator-controller/controllers/clusterobjectset_controller.go
Outdated
Show resolved
Hide resolved
internal/operator-controller/controllers/clusterobjectset_controller.go
Outdated
Show resolved
Hide resolved
| func ClusterObjectSetReportsConditionWithMessageFragment(ctx context.Context, revisionName, conditionType, conditionStatus, conditionReason string, msgFragment *godog.DocString) error { | ||
| msgCmp := alwaysMatch | ||
| if msgFragment != nil { | ||
| expectedMsgFragment := substituteScenarioVars(strings.Join(strings.Fields(msgFragment.Content), " "), scenarioCtx(ctx)) | ||
| msgCmp = func(actualMsg string) bool { | ||
| return strings.Contains(actualMsg, expectedMsgFragment) | ||
| } | ||
| } | ||
| return waitForCondition(ctx, "clusterobjectset", substituteScenarioVars(revisionName, scenarioCtx(ctx)), conditionType, conditionStatus, &conditionReason, msgCmp) | ||
| } |
There was a problem hiding this comment.
This step normalizes whitespace in the expected fragment but not in the actual condition message; if the controller formats messages with newlines/multiple spaces (common when wrapping), strings.Contains may fail unexpectedly. Consider normalizing actualMsg with the same strings.Fields + Join approach before Contains to reduce e2e flakiness while still matching by fragment.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
internal/operator-controller/controllers/clusterobjectset_controller.go:1
- Treating
Progressing=False, Reason=Blockedas terminal at the predicate level can prevent the controller from ever reconciling again, even if the user fixes the root cause (e.g., recreates the Secret as immutable, or changes COS spec/generation). If “Blocked” is intended to be recoverable, remove it from this predicate or gate suppression more narrowly (e.g., only suppress when generation hasn’t changed, while still allowing spec updates / explicit triggers to enqueue reconciles).
//go:build !standard
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal/operator-controller/controllers/clusterobjectset_controller.go
Outdated
Show resolved
Hide resolved
internal/operator-controller/controllers/clusterobjectset_controller.go
Outdated
Show resolved
Hide resolved
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2635 +/- ##
==========================================
+ Coverage 68.92% 68.94% +0.02%
==========================================
Files 140 141 +1
Lines 9905 10005 +100
==========================================
+ Hits 6827 6898 +71
- Misses 2566 2593 +27
- Partials 512 514 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
api/v1/clusterobjectset_types.go
Outdated
| // +listType=map | ||
| // +listMapKey=name | ||
| // +optional | ||
| ObservedObjectContainers []ObservedObjectContainer `json:"observedObjectContainers,omitempty"` |
There was a problem hiding this comment.
Instead of requiring the individual objects to never change (and keeping track of per-object hashes), WDYT about computing a single hash of the resulting phases content. If that remains stable, do we care if the underlying organization of the secrets changed?
If we do that, it also means we abstract away any details of how we ultimately end up with those phases (i.e. we don't care if it is inline vs from secrets vs from a bundle ref, etc.), which would be more futureproof.
There was a problem hiding this comment.
If we do that, it also means we abstract away any details of how we ultimately end up with those phases (i.e. we don't care if it is inline vs from secrets vs from a bundle ref, etc.), which would be more futureproof.
interesting idea! implemented, ptal.
|
|
||
| func (c *ClusterObjectSetReconciler) SetupWithManager(mgr ctrl.Manager) error { | ||
| skipProgressDeadlineExceededPredicate := predicate.Funcs{ | ||
| skipTerminallyBlockedPredicate := predicate.Funcs{ |
There was a problem hiding this comment.
A terminally blocked COS could become unblocked if the user puts back the expected content though, right? We should continue reconciling to check if the expected content is back in place.
There was a problem hiding this comment.
Done — removed ClusterObjectSetReasonBlocked from the skip predicate so blocked COS can be re-reconciled. If the original content is restored, the digest check passes and the COS resumes normal operation. Added an e2e recovery scenario to verify this.
api/v1/clusterobjectset_types.go
Outdated
| // object content at first successful reconciliation. | ||
| // | ||
| // +required | ||
| Hash string `json:"hash"` |
There was a problem hiding this comment.
By creating the resolution digest we still needing this one: https://github.com/operator-framework/operator-controller/pull/2610/changes#diff-c804d15cab34d4c7c30fcd68d40d0269084c11db0736d844bd6ee8f60262a924R68
If so, could we change the name PhaseDigest?
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| currentPhases := make([]ocv1.ObservedPhase, 0, len(phases)) | ||
| for _, phase := range phases { | ||
| hash, err := computePhaseHash(phase) | ||
| if err != nil { | ||
| setRetryingConditions(cos, err.Error()) | ||
| return ctrl.Result{}, fmt.Errorf("computing phase hash: %v", err) | ||
| } | ||
| currentPhases = append(currentPhases, ocv1.ObservedPhase{Name: phase.GetName(), Hash: hash}) | ||
| } |
There was a problem hiding this comment.
The phase hash is computed from the boxcutter phases returned by buildBoxcutterPhases(), after the reconciler mutates each resolved object (e.g., injecting labels.OwnerNameKey). This couples the stored .status.observedPhases hashes to controller-internal mutations and COS metadata, so future controller changes (or label changes) could falsely appear as “referenced content changed” and permanently block reconciliation. Consider hashing the pre-mutation resolved manifests (or normalizing/stripping controller-added fields) so the hash reflects only the referenced object content you’re trying to make immutable.
There was a problem hiding this comment.
+1 to this comment. We should hash what the user provides.
There was a problem hiding this comment.
Done — digest is now computed before controller mutations (label injection, collision protection). The hash reflects only the user-provided resolved content.
internal/operator-controller/controllers/clusterobjectset_controller.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if len(cos.Status.ObservedPhases) == 0 { | ||
| cos.Status.ObservedPhases = currentPhases | ||
| } else if err := verifyObservedPhases(cos.Status.ObservedPhases, currentPhases); err != nil { | ||
| l.Error(err, "resolved phases content changed, blocking reconciliation") | ||
| markAsNotProgressing(cos, ocv1.ClusterObjectSetReasonBlocked, err.Error()) | ||
| return ctrl.Result{}, nil | ||
| } |
There was a problem hiding this comment.
ObservedPhases is only populated when empty (line 172), but is never updated to include newly added phases. This breaks the immutability guarantee for phases added after initial reconciliation. When a new phase is added, it bypasses verification (line 796 checks only stored phases), and subsequent content changes to that phase won't be detected because it was never added to ObservedPhases. After verification passes, any new phases in currentPhases should be appended to cos.Status.ObservedPhases so they're tracked for future tamper detection.
There was a problem hiding this comment.
Not a valid concern. The COS spec (including phases) is immutable — enforced by a CEL validation rule that prevents spec changes after creation. Phases can never be "added after initial reconciliation." The set of phases is fixed at COS creation time, so ObservedPhases will always capture all phases on the first successful resolution, and subsequent reconciliations will always have the same set of phases to verify against.
camilamacedo86
left a comment
There was a problem hiding this comment.
It seems fine for me 🎉
Thank you for looking on that.
/approve
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: camilamacedo86 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| // name is the phase name matching a phase in spec.phases. | ||
| // | ||
| // +required | ||
| Name string `json:"name"` |
There was a problem hiding this comment.
Include validation markers to match spec phase name?
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=63
// +kubebuilder:validation:XValidation:rule=`!format.dns1123Label().validate(self).hasValue()`,message="the value must consist of only lowercase alphanumeric characters and hyphens, and must start with an alphabetic character and end with an alphanumeric character."
There was a problem hiding this comment.
Done — added MinLength, MaxLength, and DNS1123 label validation markers matching spec phase name.
| // object content at first successful resolution. | ||
| // | ||
| // +required | ||
| Digest string `json:"digest"` |
There was a problem hiding this comment.
Include validation markers that reject invalid digest strings? For futureproofing, should we make the format of this string like the following?
<digestAlgoritm>:<digest>
There was a problem hiding this comment.
Done — digest now uses <algorithm>:<hex> format (e.g., sha256:abcdef...). Added validation marker: self.matches('^[a-z0-9]+:[a-f0-9]+$').
| // +listType=map | ||
| // +listMapKey=name | ||
| // +optional | ||
| ObservedPhases []ObservedPhase `json:"observedPhases,omitempty"` |
There was a problem hiding this comment.
Similar validations for the spec phases list?
// +kubebuilder:validation:XValidation:rule="self == oldSelf || oldSelf.size() == 0", message="observedPhases is immutable"
// +kubebuilder:validation:MaxItems=20
There was a problem hiding this comment.
Done — added immutability CEL rule (self == oldSelf || oldSelf.size() == 0) and MaxItems=20.
| if cnd := meta.FindStatusCondition(rev.Status.Conditions, ocv1.ClusterObjectSetTypeProgressing); cnd != nil && cnd.Status == metav1.ConditionFalse { | ||
| if cnd.Reason == ocv1.ReasonProgressDeadlineExceeded { | ||
| return false | ||
| } |
There was a problem hiding this comment.
Before and after are exactly equivalent here, right? If so, revert this change to avoid the extra conditional nesting?
There was a problem hiding this comment.
Reverted — flattened back to a single condition.
| func computePhaseDigest(phase boxcutter.Phase) (string, error) { | ||
| h := sha256.New() | ||
| fmt.Fprintf(h, "phase:%s\n", phase.GetName()) | ||
| for _, obj := range phase.GetObjects() { |
There was a problem hiding this comment.
Double checking: phase.GetObjects() returns the list of objects in a deterministic order?
There was a problem hiding this comment.
Yes — GetObjects() returns objects in the same order as specPhase.Objects, which is an immutable spec slice. The order is deterministic across reconciliations.
| // produces a canonical encoding with sorted map keys. | ||
| func computePhaseDigest(phase boxcutter.Phase) (string, error) { | ||
| h := sha256.New() | ||
| fmt.Fprintf(h, "phase:%s\n", phase.GetName()) |
There was a problem hiding this comment.
WDYT about this:
phaseMap := map[string]any{
"name": phase.GetName(),
"objects": phase.GetObjects(),
}
phaseData, err := json.Marshal(phaseMap)
// handleErr
h.Write(phaseData)There was a problem hiding this comment.
Done — switched to single json.Marshal of a map[string]any{"name": ..., "objects": ...}.
There was a problem hiding this comment.
Done — adopted this approach. computePhaseDigest now marshals a map with name + objects into a single JSON blob before hashing.
| if prev, ok := storedMap[c.Name]; ok && prev != c.Digest { | ||
| return fmt.Errorf( | ||
| "resolved content of phase %q has changed (expected digest %s, got %s): "+ | ||
| "a referenced object source may have been deleted and recreated with different content", | ||
| c.Name, prev, c.Digest) | ||
| } |
There was a problem hiding this comment.
Nit: collect all the diffs and report all of them?
There was a problem hiding this comment.
Done — verifyObservedPhases now collects all mismatched phases and reports them in a single error message.
| if secret.Immutable == nil || !*secret.Immutable { | ||
| return fmt.Errorf("secret %s/%s is not immutable: referenced secrets must have immutable set to true", ref.namespace, ref.name) | ||
| } |
There was a problem hiding this comment.
Same nit: collect all mutable secrets and report all of them?
There was a problem hiding this comment.
Done — verifyReferencedSecretsImmutable now collects all mutable secrets and reports them in a single error message.
| } | ||
|
|
||
| t.Run("deterministic for same content", func(t *testing.T) { | ||
| phase := makePhase("deploy", makeObj("v1", "ConfigMap", "cm1")) |
There was a problem hiding this comment.
Put 100 different objects into the phase and compute the digest 100 times and make sure it's always the same?
Seems like that would cover my earlier question about deterministic ordering of the objects with a very high probability.
Also maybe a different test with the same objects but provided in a different order, which would prove that the input order matches the output order, which would cause the digest to change?
There was a problem hiding this comment.
Done — added a "deterministic with many objects" test that creates 100 ConfigMaps, computes the digest 100 times, and asserts they're all identical. Also added a "different order produces different digest" test.
| t.Run("passes when current has new phase not in stored", func(t *testing.T) { | ||
| stored := []ocv1.ObservedPhase{{Name: "deploy", Digest: "sha256:abc123"}} | ||
| current := []ocv1.ObservedPhase{ | ||
| {Name: "deploy", Digest: "sha256:abc123"}, | ||
| {Name: "crds", Digest: "sha256:def456"}, | ||
| } | ||
| assert.NoError(t, verifyObservedPhases(stored, current)) | ||
| }) |
There was a problem hiding this comment.
In theory, this should never happen, right? But if it does happen shouldn't this return an error?
There was a problem hiding this comment.
Done — verifyObservedPhases now returns an error when stored is empty. Also added a length check (stored != current count) to catch phase count changes.
| require.NoError(t, err) | ||
| }) | ||
|
|
||
| t.Run("deduplicates refs to same secret", func(t *testing.T) { |
There was a problem hiding this comment.
Nit: the test doesn't seem like it proves anything about deduplicating refs to the same secret.
There was a problem hiding this comment.
Done — renamed to "checks secret only once when referenced multiple times".
test/e2e/features/revision.feature
Outdated
| And resource "deployment/test-httpd" is installed | ||
| And ClusterObjectSet "${COS_NAME}" has observed phase "resources" with a non-empty digest | ||
|
|
||
| Scenario: ClusterObjectSet blocks reconciliation when referenced Secret is not immutable |
There was a problem hiding this comment.
Nit: double negative
| Scenario: ClusterObjectSet blocks reconciliation when referenced Secret is not immutable | |
| Scenario: ClusterObjectSet blocks reconciliation when referenced Secret is mutable |
There was a problem hiding this comment.
Done — scenario title is now "ClusterObjectSet blocks reconciliation when referenced Secret is mutable".
| secret ${TEST_NAMESPACE}/${COS_NAME}-mutable-secret is not immutable: referenced secrets must have immutable set to true | ||
| """ | ||
|
|
||
| Scenario: ClusterObjectSet blocks reconciliation when referenced Secret content changes |
There was a problem hiding this comment.
This (or another) scenario that verifies that we will start reconciling again if the original content is put back and the hashes match once again?
There was a problem hiding this comment.
Done — added recovery steps to the content change scenario: after the COS is blocked, we restore the original secret content, trigger reconciliation, and verify the COS resumes with Progressing=True/Succeeded.
| // verifyObservedPhases compares current per-phase digests against stored | ||
| // digests. Returns an error naming the first mismatched phase. | ||
| func verifyObservedPhases(stored, current []ocv1.ObservedPhase) error { | ||
| storedMap := make(map[string]string, len(stored)) |
There was a problem hiding this comment.
A couple of checks we should add for doing the pairwise comparison:
if len(stored) == 0 {
return nil
}
if len(stored) != len(current) {
return fmt.Errorf("number of phases has changed (expected %d phases, got %d)", len(stored), len(current))
}There was a problem hiding this comment.
Done — verifyObservedPhases now checks: (1) stored is not empty, (2) len(stored) == len(current), (3) all digests match. Tests cover all three cases.
.bingo/gojq.mod
Outdated
There was a problem hiding this comment.
Something wrong with the rebase? We removed gojq in a different commit recently, right?
There was a problem hiding this comment.
yeah, Claude messed things up... fixed now.
ClusterObjectSet phases are immutable by design, but when objects are stored in external Secrets via refs, the Secret content could be changed by deleting and recreating the Secret. This enforces phase immutability by: - Verifying that referenced Secrets have `immutable: true` set - Computing a per-phase SHA-256 content digest of pre-mutation resolved objects and recording it in `.status.observedPhases` - Blocking reconciliation (`Progressing=False, Reason=Blocked`) if any referenced Secret is mutable or any phase's digest has changed - Allowing blocked COS to recover when original content is restored The digest is source-agnostic — it covers fully resolved phase content regardless of whether objects are inline or from Secrets, making it forward-compatible with future object sources. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Description
ClusterObjectSet phases are immutable by design, but when objects are stored in
external Secrets via refs, the Secret content could be changed by deleting and
recreating the Secret with the same name. This enforces phase immutability for
the referenced object approach by:
immutable: truesetand recording it in
.status.observedPhasesProgressing=False, Reason=Blocked) if any referencedSecret is mutable or any phase's resolved content digest has changed
The digest is source-agnostic — it covers the fully resolved phase content
regardless of whether objects are inline or referenced from Secrets, making it
forward-compatible with future object sources (e.g., bundle refs).
Blocked COS resources are recoverable: they can be re-reconciled when triggered
(e.g., via annotation), re-evaluating the condition.
Reviewer Checklist