
Introduce MutateAndPatchSpec and adopt across spec-patch sites#5004

Open
jhrozek wants to merge 3 commits into stacklok:main from jhrozek:update-to-patch-spec-helper

Contributor

@jhrozek jhrozek commented Apr 22, 2026

Summary

  • PR #4914 ("Patch MCPServer spec instead of Update") migrated three MCPServer r.Update call sites, plus two additional annotation-stamp sites in toolconfig_controller.go and mcpexternalauthconfig_controller.go, to an inline DeepCopy + r.Patch(MergeFromWithOptimisticLock{}) pattern. Each site carries the same four-line rationale comment, and the reviewer on PR #4914 explicitly suggested extracting a helper. This PR implements that suggestion.
  • Introduces controllerutil.MutateAndPatchSpec[T] in cmd/thv-operator/pkg/controllerutil/patch.go, a sibling of the existing MutateAndPatchStatus. It has the same reflection-based nil-guard and the same DeepCopy-then-mutate-then-Patch flow, but uses MergeFromWithOptimisticLock so that a stale resourceVersion yields a 409. That is what defends spec.authzConfig (owned by a forthcoming authorization controller) from being zeroed on every reconcile. It intentionally omits the status helper's no-op short-circuit because the optimistic-lock (OL) option always embeds metadata.resourceVersion into the patch body; the docstring explains why.
  • Adopts the helper at all five call sites. Pure refactor — no behavior change. The four-line rationale comment is removed from each site and lives in the helper's doc comment.
  • Updates .claude/rules/operator.md "Spec / metadata patching" to reference MutateAndPatchSpec as the canonical pattern, replacing the inline snippet.
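
For readers who have not seen the status sibling, the DeepCopy-then-mutate-then-Patch flow can be sketched with the standard library alone. This is a minimal model, not the helper in patch.go: the Server type, the deepCopier constraint, and the patch callback (standing in for a client Patch call) are all hypothetical names introduced for illustration.

```go
package main

import "fmt"

// Server is a hypothetical stand-in for an MCPServer-like object.
type Server struct {
	Finalizers []string
}

// DeepCopy returns a copy that shares no memory with the receiver.
func (s *Server) DeepCopy() *Server {
	return &Server{Finalizers: append([]string(nil), s.Finalizers...)}
}

// deepCopier is the only capability this model needs from T.
type deepCopier[T any] interface{ DeepCopy() T }

// mutateAndPatch snapshots obj, applies mutate in place, then hands both
// versions to patch, which stands in for a client Patch call that would
// diff the two.
func mutateAndPatch[T deepCopier[T]](obj T, mutate func(T), patch func(original, modified T) error) error {
	original := obj.DeepCopy() // snapshot BEFORE mutating
	mutate(obj)
	return patch(original, obj)
}

func main() {
	srv := &Server{}
	var before, after int
	err := mutateAndPatch(srv,
		func(s *Server) { s.Finalizers = append(s.Finalizers, "example/finalizer") },
		func(original, modified *Server) error {
			before, after = len(original.Finalizers), len(modified.Finalizers)
			return nil
		})
	fmt.Println(err, before, after) // <nil> 0 1
}
```

The closure-based shape means a caller cannot accidentally mutate before the snapshot is taken, which is the ordering mistake the inline pattern left open.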

Type of change

  • Refactoring (no behavior change)

Test plan

  • Unit tests (task test) — the new helper has five tests (happy path + OL wire signal, DeepCopy isolation, 409 Conflict propagation, nil-obj rejection, disjoint-spec-field preservation); the existing TestMCPServerSpecPatchesAreOptimisticLock remains green; all controller unit tests pass.
  • Linting (task lint-fix) — clean.
  • Manual testing (describe below) — ran the mcp-server, mcp-toolconfig, and mcp-external-auth envtest integration suites via ginkgo -p against setup-envtest 1.31.0. All green, including mcpserver_spec_patch_integration_test.go which writes spec.authzConfig out-of-band and asserts it survives both the finalizer-add reconcile and the restart-annotation reconcile.

Changes

| File | Change |
| --- | --- |
| cmd/thv-operator/pkg/controllerutil/patch.go | New: MutateAndPatchSpec[T] helper. |
| cmd/thv-operator/pkg/controllerutil/patch_test.go | New: five unit tests mirroring the status helper's shape. |
| cmd/thv-operator/pkg/controllerutil/doc.go | Index entry for the new file. |
| cmd/thv-operator/controllers/mcpserver_controller.go | Adopt at three sites (RemoveFinalizer, AddFinalizer, restart-annotation stamp). |
| cmd/thv-operator/controllers/toolconfig_controller.go | Adopt at the config-hash annotation stamp in the fan-out loop. |
| cmd/thv-operator/controllers/mcpexternalauthconfig_controller.go | Adopt at the config-hash annotation stamp in the fan-out loop. |
| .claude/rules/operator.md | Point the "Spec / metadata patching" section at the helper. |

Does this introduce a user-facing change?

No.

Implementation plan

Approved implementation plan

Approach

Mirror MutateAndPatchStatus exactly — same generic signature, same reflection-based nil-guard, same DeepCopy-then-mutate-then-Patch flow — with two deliberate differences:

  1. Use client.MergeFromWithOptions(original, client.MergeFromWithOptimisticLock{}) instead of plain client.MergeFrom. This attaches the resourceVersion precondition, so concurrent writers get 409-and-requeue instead of a silent clobber.
  2. No no-op short-circuit. The OL option always emits metadata.resourceVersion into the body, so a naive {} check never fires, and none of the five call sites execute a "maybe no-op" mutate. The docstring explains the decision.
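
To make the second point concrete, here is a simplified sketch of how a merge-patch body changes once the optimistic-lock precondition is embedded. The Server type and the patchBody function are hypothetical and handle only annotations; they model the wire shape described above rather than reproduce controller-runtime's implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Server is a hypothetical object with just enough metadata for the demo.
type Server struct {
	ResourceVersion string
	Annotations     map[string]string
}

// patchBody builds a simplified JSON merge patch covering only annotations.
// With optimisticLock set it also embeds metadata.resourceVersion, modelling
// what the optimistic-lock option adds to the body.
func patchBody(original, modified Server, optimisticLock bool) string {
	changed := map[string]any{}
	for k, v := range modified.Annotations {
		if old, ok := original.Annotations[k]; !ok || old != v {
			changed[k] = v
		}
	}
	for k := range original.Annotations {
		if _, ok := modified.Annotations[k]; !ok {
			changed[k] = nil // a null value deletes the key in a merge patch
		}
	}
	meta := map[string]any{}
	if len(changed) > 0 {
		meta["annotations"] = changed
	}
	if optimisticLock {
		meta["resourceVersion"] = original.ResourceVersion
	}
	body := map[string]any{}
	if len(meta) > 0 {
		body["metadata"] = meta
	}
	out, _ := json.Marshal(body)
	return string(out)
}

func main() {
	obj := Server{ResourceVersion: "7", Annotations: map[string]string{"a": "1"}}
	fmt.Println(patchBody(obj, obj, false)) // {}
	fmt.Println(patchBody(obj, obj, true))  // {"metadata":{"resourceVersion":"7"}}
}
```

With the precondition present, a no-op mutate never produces an empty body, so an "is the body {}" check borrowed from the status helper would be dead code.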

Each call site becomes a single MutateAndPatchSpec(ctx, r.Client, obj, func(o *T) { ... }) invocation with the existing mutation lifted verbatim into the closure; the stale four-line rationale comment is deleted from each site (the rationale now lives once in the helper's doc comment).

Reference implementations mirrored

  • cmd/thv-operator/pkg/controllerutil/status.go: nil-guard pattern, doc-comment shape, SPDX header.
  • cmd/thv-operator/pkg/controllerutil/status_test.go: statusPatchRecordingClient is the template for the spec helper's recording client (adapted to intercept top-level Patch instead of Status().Patch); TestMutateAndPatchStatus_RejectsNilObj is the exact template for the nil-guard test; the error text keeps parity: "MutateAndPatchSpec: obj must be non-nil".
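
A reflection-based nil-guard is needed because a generic helper cannot compare a type parameter to nil directly. The sketch below shows one way such a guard might look; requireNonNil and Server are hypothetical names, and only the error text is taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// Server is a hypothetical object type.
type Server struct{ Name string }

// requireNonNil rejects an untyped nil as well as a typed nil pointer.
// Converting with any(obj) and comparing to nil would miss the typed nil
// case, because the resulting interface value still carries a type.
func requireNonNil[T any](obj T) error {
	v := reflect.ValueOf(obj)
	if !v.IsValid() || (v.Kind() == reflect.Pointer && v.IsNil()) {
		return errors.New("MutateAndPatchSpec: obj must be non-nil")
	}
	return nil
}

func main() {
	var missing *Server
	fmt.Println(requireNonNil(missing))            // MutateAndPatchSpec: obj must be non-nil
	fmt.Println(requireNonNil(&Server{Name: "a"})) // <nil>
}
```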

MoE review panel (completed before push)

Three agents reviewed in parallel:

  • go-architect — flagged a missing disjoint-spec-field preservation test (the regression guard that would fire if the helper were ever swapped back to r.Update). Addressed: TestMutateAndPatchSpec_PreservesDisjointSpecFields in patch_test.go.
  • code-reviewer — LGTM. Confirmed all five sites converted faithfully, no behavior drift, imports clean, ctrlutil alias consistent with existing usage.
  • kubernetes-go-expert — confirmed OL wire-level semantics preserved, DeepCopy ordering correct, 409 propagation sound, no-op short-circuit decision justified. Raised a freshness concern on the two loop-iteration sites that on closer reading conflates two objects (our r.Status().Update(ctx, toolConfig) advances toolConfig.resourceVersion, not the server list-copy); the underlying "should fan-out sites fresh-Get before Patch" question is a pre-existing design discussion that this refactor does not change.

Special notes for reviewers

  • Stacks cleanly on merged main: PR #4914 ("Patch MCPServer spec instead of Update") is in 74a11845b, and this PR is the single commit 2873a373f on top.
  • The helper file is named patch.go rather than spec.go (my first cut) because the sibling status.go is already scoped to the status subresource, and spec.go as a generic name in a 20-file package was weak signal. Did not rename status.go to keep this PR's scope focused — that's a trivial rename a future PR can do if we want full symmetry.

The inline DeepCopy + Patch(MergeFromWithOptimisticLock) pattern from
stacklok#4914 landed at five MCPServer spec-write sites, each carrying the same
four-line rationale comment. Extract it into a MutateAndPatchSpec[T]
generic helper in cmd/thv-operator/pkg/controllerutil, siblinged with
the existing MutateAndPatchStatus.

The helper mirrors its sibling exactly -- same reflection-based
nil-guard, same DeepCopy-then-mutate-then-Patch flow -- with two
deliberate differences:

  - Uses MergeFromWithOptions(original, MergeFromWithOptimisticLock{})
    so concurrent writers get 409-and-requeue instead of silent clobber.
    This is the property that defends spec.authzConfig, which the
    forthcoming authorization controller will own, from being zeroed on
    every reconcile.
  - No no-op short-circuit. MergeFromWithOptimisticLock always emits
    metadata.resourceVersion into the body, so the status helper's
    "body == {}" check never fires; and every current spec call site
    carries a real mutation.

All five inline sites (mcpserver_controller.go finalizer add/remove and
restart-annotation stamp; toolconfig_controller.go and
mcpexternalauthconfig_controller.go config-hash stamps) adopt the
helper. Pure refactor at the call sites -- no behavior change.

Tests mirror the status helper's shape: happy-path + optimistic-lock
wire signal, DeepCopy isolation, 409 Conflict propagation, nil-obj
rejection, and disjoint-spec-field preservation (the regression guard
that would fire if the helper were ever swapped back to r.Update).
Existing TestMCPServerSpecPatchesAreOptimisticLock and the
mcpserver_spec_patch_integration_test.go envtest remain green.
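
The idea behind the disjoint-spec-field guard can be shown in a few lines: a full-object write replaces every field with the writer's view, while a diff-based patch touches only what the writer changed. The sketch assumes a writer whose in-memory copy does not carry AuthzConfig; why that happens in the operator is not modelled here, and the spec type and both functions are hypothetical.

```go
package main

import "fmt"

// spec is a hypothetical two-field spec; AuthzConfig is owned by another writer.
type spec struct {
	Image       string
	AuthzConfig string
}

// update models a full-object Update: the stored spec is replaced wholesale.
func update(stored *spec, mine spec) { *stored = mine }

// patch models a merge patch: only fields that differ between the caller's
// snapshot and its mutated copy are written to the stored object.
func patch(stored *spec, original, modified spec) {
	if original.Image != modified.Image {
		stored.Image = modified.Image
	}
	if original.AuthzConfig != modified.AuthzConfig {
		stored.AuthzConfig = modified.AuthzConfig
	}
}

func main() {
	// Assumption: the writer's view lacks the out-of-band AuthzConfig value.
	snapshot := spec{Image: "v1"}
	mine := snapshot
	mine.Image = "v2"

	a := spec{Image: "v1", AuthzConfig: "policy"}
	update(&a, mine)
	fmt.Printf("update: %+v\n", a) // AuthzConfig is now empty

	b := spec{Image: "v1", AuthzConfig: "policy"}
	patch(&b, snapshot, mine)
	fmt.Printf("patch:  %+v\n", b) // AuthzConfig is still "policy"
}
```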

The operator-rules section on spec/metadata patching is updated to
reference the helper as the canonical pattern.
@github-actions github-actions Bot added the size/M Medium PR: 300-599 lines changed label Apr 22, 2026
@codecov

codecov Bot commented Apr 22, 2026

Codecov Report

❌ Patch coverage is 82.05128% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.05%. Comparing base (bc5b9a3) to head (9c7bcdf).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...d/thv-operator/controllers/mcpserver_controller.go | 58.33% | 2 Missing and 3 partials ⚠️ |
| ...or/controllers/mcpexternalauthconfig_controller.go | 85.71% | 1 Missing ⚠️ |
| .../thv-operator/controllers/toolconfig_controller.go | 83.33% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5004      +/-   ##
==========================================
- Coverage   69.11%   69.05%   -0.07%     
==========================================
  Files         554      555       +1     
  Lines       73176    73171       -5     
==========================================
- Hits        50577    50528      -49     
- Misses      19590    19633      +43     
- Partials     3009     3010       +1     

Collaborator

@ChrisJBurns ChrisJBurns left a comment

Multi-Agent Consensus Review

Agents consulted: code-reviewer, kubernetes-go-expert, unit-test-writer

Consensus Summary

| # | Finding | Consensus | Severity | Action |
| --- | --- | --- | --- | --- |
| 1 | No test pins documented "no short-circuit" divergence from status helper | 8/10 | MEDIUM | Add test |
| 2 | Doc example uses controllerutil. prefix, not the ctrlutil alias used at every call site | 7/10 | LOW | Fix example |
| 3 | Doc should note obj is mutated even when Patch errors | 8/10 | LOW | One-sentence doc addition |
| 4 | DeepCopy isolation test could strengthen with NotContains "existing" | 7/10 | LOW | One assertion |
| 5 | Nil mutate not guarded, asymmetric with nil-obj guard | 7/10 | LOW | Decide the contract |

Overall

Clean, well-scoped refactor. The helper mirrors MutateAndPatchStatus exactly with two deliberate divergences (MergeFromWithOptimisticLock, no no-op short-circuit), both justified in the docstring and the PR body. The five adoption sites are behaviorally equivalent to the pre-refactor inline code, the ctrlutil alias is consistent with existing usage, and the 4-line rationale comment is cleanly consolidated into the helper's godoc. Wire-level OL semantics and 409 propagation are preserved.

The findings are all polish-level — no blockers. The single MEDIUM is the absence of a no-op-mutate test that would pin the documented "no short-circuit" contract: a future refactor that copy-pasted the status helper's short-circuit into the spec helper would silently pass every existing test while breaking OL-on-every-reconcile semantics. The four LOWs are mechanical one-line fixes (doc example alias, doc sentence on post-error obj state, one test assertion, nil-mutate symmetry).
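
The gap behind the MEDIUM finding can be sketched as follows: a test that only inspects the body of an issued patch cannot notice when no patch is issued at all. In this model, recorder is a hypothetical recording client, and statusStyle and specStyle are hypothetical stand-ins for the two helpers' differing no-op behaviour.

```go
package main

import "fmt"

// recorder is a hypothetical recording client that keeps every patch body.
type recorder struct{ bodies []string }

func (r *recorder) Patch(body string) { r.bodies = append(r.bodies, body) }

// diff returns the annotations that were added or changed.
func diff(original, modified map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range modified {
		if old, ok := original[k]; !ok || old != v {
			out[k] = v
		}
	}
	return out
}

// statusStyle short-circuits when there is nothing to send.
func statusStyle(r *recorder, original, modified map[string]string) {
	d := diff(original, modified)
	if len(d) == 0 {
		return
	}
	r.Patch(fmt.Sprintf("changes=%v", d))
}

// specStyle always issues the patch, carrying the version precondition.
func specStyle(r *recorder, resourceVersion string, original, modified map[string]string) {
	d := diff(original, modified)
	r.Patch(fmt.Sprintf("resourceVersion=%s changes=%v", resourceVersion, d))
}

func main() {
	same := map[string]string{"a": "1"}
	status, spec := &recorder{}, &recorder{}
	statusStyle(status, same, same)
	specStyle(spec, "7", same, same)
	fmt.Println(len(status.bodies), len(spec.bodies)) // 0 1
}
```

A pinning test therefore has to assert on the number of recorded patches for a no-op mutate, not only on the contents of a body.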

The #4767 regression guard (TestMutateAndPatchSpec_PreservesDisjointSpecFields) is the strongest test in the file and directly exercises the reason the helper exists.

Documentation

.claude/rules/operator.md update lands correctly — new example uses the ctrlutil. alias, the removed "don't mutate between DeepCopy and Patch" warning is now implicit in the helper's closure-based API, and the forward-reference to #4633 is resolved. doc.go index entry added in the expected position.


Generated with Claude Code

Comment thread cmd/thv-operator/pkg/controllerutil/patch.go Outdated
Comment thread cmd/thv-operator/pkg/controllerutil/patch.go
Comment thread cmd/thv-operator/pkg/controllerutil/patch.go
// "mutated":"after" by the time MergeFrom computes the diff, and the
// body would lack the new annotation. Its presence proves the snapshot
// captured the pre-mutation state.
assert.Contains(t, body, "mutated",
Collaborator

[LOW] DeepCopy isolation test could assert pre-existing annotation is absent from patch body (consensus 7/10)

The test seeds Annotations["existing"] = "before" and mutates to add Annotations["mutated"] = "after". Asserting the mutation is in the body proves it lands; asserting "existing" is absent is the cleaner signal that the snapshot captured pre-mutation state (an alias bug could still produce the mutated value otherwise).

Suggest adding after the existing assertions:

assert.NotContains(t, body, "existing",
    "pre-mutation fields must not leak into the merge-patch diff; body=%s", body)

Raised by: kubernetes-go-expert

Contributor Author

Taking a pass on this one, open to revisiting if you disagree. Walking through the alias-bug case: if DeepCopy aliased original to obj, then at MergeFrom(original).Data(post) we'd have original == post, and an empty {} body — which already fails the existing assert.Contains(body, "mutated"). The proposed NotContains "existing" asserts a property of client.MergeFrom (unchanged fields are omitted from diffs), not of this helper — .claude/rules/testing.md's "Test Scope" rule steers against dependency testing. Happy to add it anyway if you'd rather have the redundancy.
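
The alias-bug walkthrough above can be reproduced in a small model. The functions below are hypothetical and use a plain map in place of an object's annotations: when the "snapshot" shares memory with the live object, the diff comes out empty, so an assertion that the body contains "mutated" already fails.

```go
package main

import "fmt"

// changed lists keys whose values differ between original and modified.
func changed(original, modified map[string]string) []string {
	var keys []string
	for k, v := range modified {
		if old, ok := original[k]; !ok || old != v {
			keys = append(keys, k)
		}
	}
	return keys
}

// snapshotThenMutate returns the diff keys seen after adding "mutated".
// With alias set, the "snapshot" shares the live map instead of copying it.
func snapshotThenMutate(alias bool) []string {
	live := map[string]string{"existing": "before"}
	original := live
	if !alias {
		original = map[string]string{}
		for k, v := range live {
			original[k] = v
		}
	}
	live["mutated"] = "after"
	return changed(original, live)
}

func main() {
	fmt.Println(snapshotThenMutate(false)) // [mutated]
	fmt.Println(snapshotThenMutate(true))  // []
}
```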

Comment thread cmd/thv-operator/pkg/controllerutil/patch_test.go
ChrisJBurns
ChrisJBurns previously approved these changes Apr 22, 2026
Collaborator

@ChrisJBurns ChrisJBurns left a comment

Some minor low/mediums, can address if you want or leave

Three of five review findings applied (two skipped with rebuttal in the
PR thread):

  - Add TestMutateAndPatchSpec_NoOpMutateStillPatches. The existing
    AppliesMutationWithOptimisticLock test asserts resourceVersion is
    present in the body when a patch is issued; it cannot detect a
    short-circuit regression that issues zero patches. Pin the
    documented divergence from MutateAndPatchStatus (which DOES
    short-circuit) with a direct no-op-mutate wire-level assertion.

  - Fix the godoc "Typical usage" example in both patch.go and
    status.go to use the ctrlutil alias. operator.md already uses this
    alias, and every real caller aliases the package to avoid colliding
    with controller-runtime's own controllerutil package. patch.go is
    updated to match; status.go is brought along for symmetry.

  - Document that obj is mutated before Patch is issued, so on error
    the caller's in-memory copy is post-mutation. Callers must re-fetch
    rather than retrying in place. Added to both helpers. The standard
    reconciler return-and-requeue pattern is the correct retry path;
    the sentence names it explicitly so a future caller reading the
    doc does not invent an in-place retry loop.

Also tightened the "don't use this for status" cross-reference in
status.go to name the sibling MutateAndPatchSpec directly.
@github-actions github-actions Bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Apr 22, 2026
@jhrozek
Contributor Author

jhrozek commented Apr 22, 2026

@ChrisJBurns addressed a subset :-)

@github-actions github-actions Bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Apr 22, 2026