Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4914 +/- ##
==========================================
- Coverage 69.03% 69.01% -0.03%
==========================================
Files 552 553 +1
Lines 72996 73035 +39
==========================================
+ Hits 50395 50407 +12
- Misses 19601 19621 +20
- Partials 3000 3007 +7 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
dee0e05 to
0513e4e
Compare
0513e4e to
3187fdc
Compare
Relates to #4633. A shared helper that collapses the "DeepCopy → mutate → r.Status().Patch(MergeFrom(original))" idiom to a single call so remaining r.Status().Update sites can migrate without each one re-implementing the DeepCopy-before-mutate discipline by hand. Status writes deliberately use a plain merge patch, not an optimistic-lock one: the operator and the runtime reporter write disjoint status fields on every reconcile and must coexist without forcing a 409 on every overlap. Spec and metadata writes still require optimistic locking — see #4767 (tracking) / #4914 (MCPServer migration). The helper does not make every multi-writer pattern safe. The Caller contract in the doc comment spells out two footguns it cannot defend against: - JSON merge-patch replaces arrays wholesale for CRDs, so a writer to Status.Conditions must be the sole owner of the entire array. Any concurrent writer whose Patch lands between this caller's Get and Patch — on any condition type, including ones this caller does not touch — will be erased. A fresh Get narrows but does not eliminate the TOCTOU window. - A scalar re-computed from a stale snapshot that differs from the live value will overwrite a concurrent writer's update. The codified checklist for new call sites lives in .claude/rules/operator.md. Operational safeguards in the helper itself: - No-op mutations (empty merge-patch body) short-circuit before the wire call; the apiserver runs admission and audit for every PATCH regardless of body content, so steady-state reconcilers must not generate {} traffic. - A nil obj returns a descriptive error rather than panicking in the downstream type assertion. The helper lives in cmd/thv-operator/pkg/controllerutil alongside the existing controller helpers. It may move to a shared location later if a non-operator caller needs it. Pure addition — no call-site changes in this PR. Tests (cmd/thv-operator/pkg/controllerutil/status_test.go) cover: - Happy path and DeepCopy isolation. - No-op mutate skips the wire call. - Disjoint-writer preservation: with a stale snapshot, a second writer owning disjoint scalar fields survives the patch. - Stale snapshot clobbers conditions from another writer — guards the documented Caller contract so the behaviour stays load- bearing against future changes. - Stale scalar computation: re-assigning the read value is a no-op at the wire level (concurrent writer preserved); assigning a differing value overwrites live state. - Nil obj is rejected with a descriptive error, no PATCH issued. - Error propagation: apiserver failures from Status().Patch are returned unchanged for the controller's requeue decision.
Relates to #4633. A shared helper that collapses the "DeepCopy → mutate → r.Status().Patch(MergeFrom(original))" idiom to a single call so remaining r.Status().Update sites can migrate without each one re-implementing the DeepCopy-before-mutate discipline by hand. Status writes deliberately use a plain merge patch, not an optimistic-lock one: the operator and the runtime reporter write disjoint status fields on every reconcile and must coexist without forcing a 409 on every overlap. Spec and metadata writes still require optimistic locking — see #4767 (tracking) / #4914 (MCPServer migration). The helper does not make every multi-writer pattern safe. The Caller contract in the doc comment spells out two footguns it cannot defend against: - JSON merge-patch replaces arrays wholesale for CRDs, so a writer to Status.Conditions must be the sole owner of the entire array. Any concurrent writer whose Patch lands between this caller's Get and Patch — on any condition type, including ones this caller does not touch — will be erased. A fresh Get narrows but does not eliminate the TOCTOU window. - A scalar re-computed from a stale snapshot that differs from the live value will overwrite a concurrent writer's update. The codified checklist for new call sites lives in .claude/rules/operator.md. Operational safeguards in the helper itself: - No-op mutations (empty merge-patch body) short-circuit before the wire call; the apiserver runs admission and audit for every PATCH regardless of body content, so steady-state reconcilers must not generate {} traffic. - A nil obj returns a descriptive error rather than panicking in the downstream type assertion. The helper lives in cmd/thv-operator/pkg/controllerutil alongside the existing controller helpers. It may move to a shared location later if a non-operator caller needs it. Pure addition — no call-site changes in this PR. Tests (cmd/thv-operator/pkg/controllerutil/status_test.go) cover: - Happy path and DeepCopy isolation. - No-op mutate skips the wire call. - Disjoint-writer preservation: with a stale snapshot, a second writer owning disjoint scalar fields survives the patch. - Stale snapshot clobbers conditions from another writer — guards the documented Caller contract so the behaviour stays load- bearing against future changes. - Stale scalar computation: re-assigning the read value is a no-op at the wire level (concurrent writer preserved); assigning a differing value overwrites live state. - Nil obj is rejected with a descriptive error, no PATCH issued. - Error propagation: apiserver failures from Status().Patch are returned unchanged for the controller's requeue decision.
| task crdref-gen # Generate CRD API docs (run from cmd/thv-operator/) | ||
| ``` | ||
|
|
||
| ## Spec / metadata patching |
3a9a6d3 to
1c137de
Compare
Relates to #4633. A shared helper that collapses the "DeepCopy → mutate → r.Status().Patch(MergeFrom(original))" idiom to a single call so remaining r.Status().Update sites can migrate without each one re-implementing the DeepCopy-before-mutate discipline by hand. Status writes deliberately use a plain merge patch, not an optimistic-lock one: the operator and the runtime reporter write disjoint status fields on every reconcile and must coexist without forcing a 409 on every overlap. Spec and metadata writes still require optimistic locking — see #4767 (tracking) / #4914 (MCPServer migration). The helper does not make every multi-writer pattern safe. The Caller contract in the doc comment spells out two footguns it cannot defend against: - JSON merge-patch replaces arrays wholesale for CRDs, so a writer to Status.Conditions must be the sole owner of the entire array. Any concurrent writer whose Patch lands between this caller's Get and Patch — on any condition type, including ones this caller does not touch — will be erased. A fresh Get narrows but does not eliminate the TOCTOU window. - A scalar re-computed from a stale snapshot that differs from the live value will overwrite a concurrent writer's update. The codified checklist for new call sites lives in .claude/rules/operator.md. Operational safeguards in the helper itself: - No-op mutations (empty merge-patch body) short-circuit before the wire call; the apiserver runs admission and audit for every PATCH regardless of body content, so steady-state reconcilers must not generate {} traffic. - A nil obj returns a descriptive error rather than panicking in the downstream type assertion. The helper lives in cmd/thv-operator/pkg/controllerutil alongside the existing controller helpers. It may move to a shared location later if a non-operator caller needs it. Pure addition — no call-site changes in this PR. Tests (cmd/thv-operator/pkg/controllerutil/status_test.go) cover: - Happy path and DeepCopy isolation. - No-op mutate skips the wire call. - Disjoint-writer preservation: with a stale snapshot, a second writer owning disjoint scalar fields survives the patch. - Stale snapshot clobbers conditions from another writer — guards the documented Caller contract so the behaviour stays load- bearing against future changes. - Stale scalar computation: re-assigning the read value is a no-op at the wire level (concurrent writer preserved); assigning a differing value overwrites live state. - Nil obj is rejected with a descriptive error, no PATCH issued. - Error propagation: apiserver failures from Status().Patch are returned unchanged for the controller's requeue decision.
There was a problem hiding this comment.
Multi-Agent Consensus Review
Agents consulted: kubernetes-expert, go-expert-developer, code-reviewer, toolhive-expert
Consensus Summary
| # | Finding | Consensus | Severity | Action |
|---|---|---|---|---|
| 2 | Other MCPServer-writing reconcilers still Update spec | 8/10 | HIGH | Discuss scope |
| 3 | Verbatim 4-line rationale comment duplicated 3× → helper | 7/10 | MEDIUM | Fix |
| 4 | Hardcoded finalizer string; other CRDs have exported constants | 7/10 | MEDIUM | Fix |
| 5 | Rule text misstates why optimistic lock defends finalizers | 7/10 | MEDIUM | Fix |
| 6 | envtest finalizer-add case can degenerate into a no-op | 7/10 | LOW | Fix |
| 7 | patchRecordingClient silently discards patch.Data() errors |
7/10 | LOW | Fix |
| 8 | envtest cleanupServer swallows errors; uses non-lock MergeFrom |
7/10 | LOW | Fix |
Overall
The controller migration itself is correct: DeepCopy → mutate → Patch(MergeFromWithOptions(orig, MergeFromWithOptimisticLock{})) is the right tool for the "external controller owns a disjoint spec field" problem, the finalizer-defense reasoning is sound, and MergeFromWithOptimisticLock strictly improves on Update for this use case.
The PR scope closes the clobber hazard in mcpserver_controller.go, but toolconfig_controller.go:167 and mcpexternalauthconfig_controller.go:186 still call r.Update(ctx, &server) on MCPServer — so an external SSA writer of spec.authzConfig can still lose its write on any MCPToolConfig or MCPExternalAuthConfig reconcile.
The medium findings are polish: a DRY helper for the 3-site copy-paste pattern, an exported MCPServerFinalizerName constant to match every other CRD, and a correction to how the rule file explains the optimistic lock's role in defending finalizers.
Documentation
The rule file addition in .claude/rules/operator.md is load-bearing for the convention being introduced. See finding #5 — the "merge-patch has no array-merge semantics" paragraph conflates two independent mechanisms. The optimistic lock does not change merge-patch's array-replacement behavior; it forces a requeue on staleness, which then triggers a fresh Get that observes the concurrently-added finalizer.
Generated with Claude Code
Relates to #4633. A shared helper that collapses the "DeepCopy → mutate → r.Status().Patch(MergeFrom(original))" idiom to a single call so remaining r.Status().Update sites can migrate without each one re-implementing the DeepCopy-before-mutate discipline by hand. Status writes deliberately use a plain merge patch, not an optimistic-lock one: the operator and the runtime reporter write disjoint status fields on every reconcile and must coexist without forcing a 409 on every overlap. Spec and metadata writes still require optimistic locking — see #4767 (tracking) / #4914 (MCPServer migration). The helper does not make every multi-writer pattern safe. The Caller contract in the doc comment spells out two footguns it cannot defend against: - JSON merge-patch replaces arrays wholesale for CRDs, so a writer to Status.Conditions must be the sole owner of the entire array. Any concurrent writer whose Patch lands between this caller's Get and Patch — on any condition type, including ones this caller does not touch — will be erased. A fresh Get narrows but does not eliminate the TOCTOU window. - A scalar re-computed from a stale snapshot that differs from the live value will overwrite a concurrent writer's update. The codified checklist for new call sites lives in .claude/rules/operator.md. Operational safeguards in the helper itself: - No-op mutations (empty merge-patch body) short-circuit before the wire call; the apiserver runs admission and audit for every PATCH regardless of body content, so steady-state reconcilers must not generate {} traffic. - A nil obj returns a descriptive error rather than panicking in the downstream type assertion. The helper lives in cmd/thv-operator/pkg/controllerutil alongside the existing controller helpers. It may move to a shared location later if a non-operator caller needs it. Pure addition — no call-site changes in this PR. Tests (cmd/thv-operator/pkg/controllerutil/status_test.go) cover: - Happy path and DeepCopy isolation. - No-op mutate skips the wire call. - Disjoint-writer preservation: with a stale snapshot, a second writer owning disjoint scalar fields survives the patch. - Stale snapshot clobbers conditions from another writer — guards the documented Caller contract so the behaviour stays load- bearing against future changes. - Stale scalar computation: re-assigning the read value is a no-op at the wire level (concurrent writer preserved); assigning a differing value overwrites live state. - Nil obj is rejected with a descriptive error, no PATCH issued. - Error propagation: apiserver failures from Status().Patch are returned unchanged for the controller's requeue decision. Co-authored-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
Fixes #4767. The controller writes finalizers, finalizer removal, and the restart-processed annotation via r.Update. Update is a full PUT, so any spec field the operator does not track — most importantly spec.authzConfig, which a separate authorization controller will soon own — is zeroed on every reconcile. Replace the three Update call sites with an optimistic-lock merge patch. The merge-patch body carries only fields the caller changed, so untouched fields never hit the wire and cannot be clobbered. MergeFromWithOptimisticLock sends resourceVersion as a precondition, giving 409-on-collision semantics for concurrent writers and defending metadata.finalizers (which has no array-merge semantics under merge-patch) against wholesale replacement when another controller is mid-flight adding its own entry. Tests: - Envtest suite writes spec.authzConfig out-of-band and asserts it survives both the finalizer-add reconcile and the restart-annotation reconcile. - Unit suite uses a patch-recording client to assert each migrated call site emits a body carrying the resourceVersion precondition — a deterministic wire-level signal that MergeFromWithOptimisticLock is in effect. A regression to plain MergeFrom would drop the precondition and fail the assertion independent of the higher-level survival test. Also: - .claude/rules/operator.md: new "Spec / metadata patching" section documenting the pattern for future CR writes. Status patching is a separate follow-up (#4633). - Rename the mock-client flag failOnMCPServerUpdate → failOnMCPServerWrite; it now intercepts both Update and Patch on MCPServer, so the name matches reality.
1c137de to
daf9140
Compare
Summary
r.Update.Updateis a full PUT, so any spec field the operator does not track — most importantlyspec.authzConfig, which an external operator will soon own via server-side apply — is zeroed on every reconcile.Updatecall sites with an optimistic-lock merge patch (client.MergeFromWithOptions(orig, client.MergeFromWithOptimisticLock{})). The merge-patch body carries only fields the caller changed, so untouched fields never hit the wire and cannot be clobbered.MergeFromWithOptimisticLocksendsresourceVersionas a precondition, giving 409-on-collision semantics for concurrent writers and defendingmetadata.finalizers(which has no array-merge semantics under merge-patch) against wholesale replacement when another controller is mid-flight adding its own entry.Fixes #4767.
Type of change
Test plan
task test)task lint-fix)Unit tests:
cmd/thv-operator/controllers/mcpserver_spec_patch_test.gouses a patch-recording client wrapper to assert that each of the three migrated call sites (AddFinalizer, RemoveFinalizer, RestartAnnotation) emits a merge-patch body carrying theresourceVersionprecondition — a deterministic wire-level signal thatMergeFromWithOptimisticLockis in effect. A regression to plainMergeFromwould drop the precondition and fail the assertion independent of the higher-level survival test.Envtest integration:
cmd/thv-operator/test-integration/mcp-server/mcpserver_spec_patch_integration_test.gocreates an MCPServer, writesspec.authzConfigout-of-band from a second client, and asserts the field survives both the finalizer-add reconcile and the restart-annotation reconcile. Run viatask operator-test-integration.Manual:
kubectl applyan MCPServer,kubectl patch --type=mergeto setspec.authzConfig, wait for reconcile,kubectl get mcpserver -o yaml—spec.authzConfigpersists across reconciles.Does this introduce a user-facing change?
No.
Implementation plan
Approved implementation plan
First PR of the
r.Update→r.Patchmigration tracked in #4767 and #4633. This PR covers Track A (#4767): the three MCPServer spec writes that must move to an optimistic-lock merge patch before an external operator starts writingspec.authzConfigvia SSA. Status-subresource migration (#4633) is the separate Track B, starting in PR 2.Key design decisions:
MergeFromWithOptimisticLock{}for MCPServer spec patches — preserves conflict-detection parity withUpdate, forces requeue on concurrent SSA instead of silent clobber, defends againstmetadata.finalizersarray replacement.Special notes for reviewers
cmd/thv-operator/controllers/mcpserver_controller.go:196(RemoveFinalizer),:212(AddFinalizer), and:768(restart annotation). Each follows the sameDeepCopy → mutate → Patch(MergeFromWithOptions)idiom.mcpserver_restart_test.gorenames the mock-client flagfailOnMCPServerUpdate→failOnMCPServerWritebecause it now intercepts bothUpdateandPatchon MCPServer..claude/rules/operator.mddocumenting the pattern for future CR writes.Generated with Claude Code