Introduce MutateAndPatchSpec and adopt across spec-patch sites#5004
Introduce MutateAndPatchSpec and adopt across spec-patch sites#5004jhrozek wants to merge 3 commits intostacklok:mainfrom
Conversation
The inline DeepCopy + Patch(MergeFromWithOptimisticLock) pattern from stacklok#4914 landed at five MCPServer spec-write sites, each carrying the same four-line rationale comment. Extract it into a MutateAndPatchSpec[T] generic helper in cmd/thv-operator/pkg/controllerutil, siblinged with the existing MutateAndPatchStatus. The helper mirrors its sibling exactly -- same reflection-based nil-guard, same DeepCopy-then-mutate-then-Patch flow -- with two deliberate differences: - Uses MergeFromWithOptions(original, MergeFromWithOptimisticLock{}) so concurrent writers get 409-and-requeue instead of silent clobber. This is the property that defends spec.authzConfig, which the forthcoming authorization controller will own, from being zeroed on every reconcile. - No no-op short-circuit. MergeFromWithOptimisticLock always emits metadata.resourceVersion into the body, so the status helper's "body == {}" check never fires; and every current spec call site carries a real mutation. All five inline sites (mcpserver_controller.go finalizer add/remove and restart-annotation stamp; toolconfig_controller.go and mcpexternalauthconfig_controller.go config-hash stamps) adopt the helper. Pure refactor at the call sites -- no behavior change. Tests mirror the status helper's shape: happy-path + optimistic-lock wire signal, DeepCopy isolation, 409 Conflict propagation, nil-obj rejection, and disjoint-spec-field preservation (the regression guard that would fire if the helper were ever swapped back to r.Update). Existing TestMCPServerSpecPatchesAreOptimisticLock and the mcpserver_spec_patch_integration_test.go envtest remain green. The operator-rules section on spec/metadata patching is updated to reference the helper as the canonical pattern.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5004 +/- ##
==========================================
- Coverage 69.11% 69.05% -0.07%
==========================================
Files 554 555 +1
Lines 73176 73171 -5
==========================================
- Hits 50577 50528 -49
- Misses 19590 19633 +43
- Partials 3009 3010 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
ChrisJBurns
left a comment
There was a problem hiding this comment.
Multi-Agent Consensus Review
Agents consulted: code-reviewer, kubernetes-go-expert, unit-test-writer
Consensus Summary
| # | Finding | Consensus | Severity | Action |
|---|---|---|---|---|
| 1 | No test pins documented "no short-circuit" divergence from status helper | 8/10 | MEDIUM | Add test |
| 2 | Doc example uses controllerutil. prefix, not the ctrlutil alias used at every call site |
7/10 | LOW | Fix example |
| 3 | Doc should note obj is mutated even when Patch errors |
8/10 | LOW | One-sentence doc addition |
| 4 | DeepCopy isolation test could strengthen with NotContains "existing" |
7/10 | LOW | One assertion |
| 5 | Nil mutate not guarded, asymmetric with nil-obj guard |
7/10 | LOW | Decide the contract |
Overall
Clean, well-scoped refactor. The helper mirrors MutateAndPatchStatus exactly with two deliberate divergences (MergeFromWithOptimisticLock, no no-op short-circuit), both justified in the docstring and the PR body. The five adoption sites are behaviorally equivalent to the pre-refactor inline code, the ctrlutil alias is consistent with existing usage, and the 4-line rationale comment is cleanly consolidated into the helper's godoc. Wire-level OL semantics and 409 propagation are preserved.
The findings are all polish-level — no blockers. The single MEDIUM is the absence of a no-op-mutate test that would pin the documented "no short-circuit" contract: a future refactor that copy-pasted the status helper's short-circuit into the spec helper would silently pass every existing test while breaking OL-on-every-reconcile semantics. The four LOWs are mechanical one-line fixes (doc example alias, doc sentence on post-error obj state, one test assertion, nil-mutate symmetry).
The #4767 regression guard (TestMutateAndPatchSpec_PreservesDisjointSpecFields) is the strongest test in the file and directly exercises the reason the helper exists.
Documentation
.claude/rules/operator.md update lands correctly — new example uses the ctrlutil. alias, the removed "don't mutate between DeepCopy and Patch" warning is now implicit in the helper's closure-based API, and the forward-reference to #4633 is resolved. doc.go index entry added in the expected position.
Generated with Claude Code
| // "mutated":"after" by the time MergeFrom computes the diff, and the | ||
| // body would lack the new annotation. Its presence proves the snapshot | ||
| // captured the pre-mutation state. | ||
| assert.Contains(t, body, "mutated", |
There was a problem hiding this comment.
[LOW] DeepCopy isolation test could assert pre-existing annotation is absent from patch body (consensus 7/10)
The test seeds Annotations["existing"] = "before" and mutates to add Annotations["mutated"] = "after". Asserting the mutation is in the body proves it lands; asserting "existing" is absent is the cleaner signal that the snapshot captured pre-mutation state (an alias bug could still produce the mutated value otherwise).
Suggest adding after the existing assertions:
assert.NotContains(t, body, "existing",
"pre-mutation fields must not leak into the merge-patch diff; body=%s", body)Raised by: kubernetes-go-expert
There was a problem hiding this comment.
Taking a pass on this one, open to revisiting if you disagree. Walking through the alias-bug case: if DeepCopy aliased original to obj, then at MergeFrom(original).Data(post) we'd have original == post, and an empty {} body — which already fails the existing assert.Contains(body, "mutated"). The proposed NotContains "existing" asserts a property of client.MergeFrom (unchanged fields are omitted from diffs), not of this helper — .claude/rules/testing.md's "Test Scope" rule steers against dependency testing. Happy to add it anyway if you'd rather have the redundancy.
ChrisJBurns
left a comment
There was a problem hiding this comment.
Some minor low/mediums, can address if you want or leave
Three of five review findings applied (two skipped with rebuttal in the
PR thread):
- Add TestMutateAndPatchSpec_NoOpMutateStillPatches. The existing
AppliesMutationWithOptimisticLock test asserts resourceVersion is
present in the body when a patch is issued; it cannot detect a
short-circuit regression that issues zero patches. Pin the
documented divergence from MutateAndPatchStatus (which DOES
short-circuit) with a direct no-op-mutate wire-level assertion.
- Fix the godoc "Typical usage" example in both patch.go and
status.go to use the ctrlutil alias. operator.md already uses this
alias, and every real caller aliases the package to avoid colliding
with controller-runtime's own controllerutil package. patch.go is
updated to match; status.go is brought along for symmetry.
- Document that obj is mutated before Patch is issued, so on error
the caller's in-memory copy is post-mutation. Callers must re-fetch
rather than retrying in place. Added to both helpers. The standard
reconciler return-and-requeue pattern is the correct retry path;
the sentence names it explicitly so a future caller reading the
doc does not invent an in-place retry loop.
Also tightened the "don't use this for status" cross-reference in
status.go to name the sibling MutateAndPatchSpec directly.
|
@ChrisJBurns addressed a subset :-) |
Summary
r.Updatecall sites, plus two additional annotation-stamp sites intoolconfig_controller.goandmcpexternalauthconfig_controller.go, to an inlineDeepCopy + r.Patch(MergeFromWithOptimisticLock{})pattern. Each site carries the same four-line rationale comment, and the reviewer on PR Patch MCPServer spec instead of Update #4914 explicitly suggested extracting a helper. This PR implements that suggestion.controllerutil.MutateAndPatchSpec[T]incmd/thv-operator/pkg/controllerutil/patch.go, siblinged with the existingMutateAndPatchStatus. Same reflection-based nil-guard, same DeepCopy-then-mutate-then-Patch flow — but usingMergeFromWithOptimisticLockfor 409-on-stale-resourceVersion semantics, which is what defendsspec.authzConfig(owned by a forthcoming authorization controller) from being zeroed on every reconcile. Intentionally omits the status helper's no-op short-circuit because the OL header always embedsmetadata.resourceVersioninto the body; docstring explains why..claude/rules/operator.md"Spec / metadata patching" to referenceMutateAndPatchSpecas the canonical pattern, replacing the inline snippet.Type of change
Test plan
task test) — the new helper has five tests (happy path + OL wire signal, DeepCopy isolation, 409 Conflict propagation, nil-obj rejection, disjoint-spec-field preservation); the existingTestMCPServerSpecPatchesAreOptimisticLockremains green; all controller unit tests pass.task lint-fix) — clean.ginkgo -pagainstsetup-envtest1.31.0. All green, includingmcpserver_spec_patch_integration_test.gowhich writesspec.authzConfigout-of-band and asserts it survives both the finalizer-add reconcile and the restart-annotation reconcile.Changes
cmd/thv-operator/pkg/controllerutil/patch.goMutateAndPatchSpec[T]helper.cmd/thv-operator/pkg/controllerutil/patch_test.gocmd/thv-operator/pkg/controllerutil/doc.gocmd/thv-operator/controllers/mcpserver_controller.gocmd/thv-operator/controllers/toolconfig_controller.gocmd/thv-operator/controllers/mcpexternalauthconfig_controller.go.claude/rules/operator.mdDoes this introduce a user-facing change?
No.
Implementation plan
Approved implementation plan
Approach
Mirror
MutateAndPatchStatusexactly — same generic signature, same reflection-based nil-guard, same DeepCopy-then-mutate-then-Patch flow — with two deliberate differences:client.MergeFromWithOptions(original, client.MergeFromWithOptimisticLock{})instead of plainclient.MergeFrom. Attaches theresourceVersionprecondition so concurrent writers get 409-and-requeue instead of silent clobber.metadata.resourceVersioninto the body, so a naive{}check never fires; and none of the five call sites execute a "maybe no-op" mutate. Docstring explains the decision.Each call site becomes a single
MutateAndPatchSpec(ctx, r.Client, obj, func(o *T) { ... })invocation with the existing mutation lifted verbatim into the closure; the stale four-line rationale comment is deleted from each site (the rationale now lives once in the helper's doc comment).Reference implementations mirrored
cmd/thv-operator/pkg/controllerutil/status.go— nil-guard pattern, doc-comment shape, SPDX header.cmd/thv-operator/pkg/controllerutil/status_test.go—statusPatchRecordingClientis the template for the spec helper's recording client (adapted to intercept top-levelPatchinstead ofStatus().Patch);TestMutateAndPatchStatus_RejectsNilObjis the exact template for the nil-guard test; error text parity:"MutateAndPatchSpec: obj must be non-nil".MoE review panel (completed before push)
Three agents reviewed in parallel:
r.Update). Addressed:TestMutateAndPatchSpec_PreservesDisjointSpecFieldsinpatch_test.go.ctrlutilalias consistent with existing usage.r.Status().Update(ctx, toolConfig)advancestoolConfig.resourceVersion, not the server list-copy); the underlying "should fan-out sites fresh-Get before Patch" question is a pre-existing design discussion that this refactor does not change.Special notes for reviewers
main— PR Patch MCPServer spec instead of Update #4914 is in74a11845b, this PR is the single commit2873a373fon top.patch.gorather thanspec.go(my first cut) because the siblingstatus.gois already scoped to the status subresource, andspec.goas a generic name in a 20-file package was weak signal. Did not renamestatus.goto keep this PR's scope focused — that's a trivial rename a future PR can do if we want full symmetry.