Introduce MutateAndPatchStatus helper for status writes #4942

Merged
ChrisJBurns merged 3 commits into main from update-to-patch-1b-status-helper
Apr 21, 2026

Conversation

@jhrozek
Contributor

@jhrozek jhrozek commented Apr 20, 2026

Summary

  • The operator's status writes use r.Status().Update, which sends a full
    PUT of the status stanza and zeros any field a disjoint writer owns
    (e.g. a runtime reporter writing status on the same CR). #4633 (Switch
    status writes from Update to Patch across all controllers) tracks
    migrating to merge-patch; this PR introduces the helper the migration
    will call at every site so the DeepCopy-before-mutate discipline is
    captured in one place instead of re-implemented per call.
  • MutateAndPatchStatus[T client.Object](ctx, c, obj, mutate) collapses
    the DeepCopy → mutate → Status().Patch(MergeFrom(original)) idiom to
    a single call. Plain merge patch, not optimistic-lock: status-subresource
    writers with disjoint fields must coexist without forcing a 409 on every
    overlap. Spec and metadata writes still require optimistic locking;
    see #4767 (Fix r.Update to r.Patch plus regression guard in MCPServer
    controller; tracking) and #4914 (Patch MCPServer spec instead of
    Update; MCPServer migration).
  • The helper does not make every multi-writer pattern safe. The doc
    comment's Caller contract spells out what it cannot defend against:
    Conditions-array writers must be the sole owner of the entire array
    (merge-patch replaces arrays wholesale for CRDs; +listType=map is
    only honored by strategic-merge-patch), and a scalar re-computed from
    a stale snapshot that differs from the live value will clobber a
    concurrent writer.
  • Operational safeguards: no-op mutations skip the wire call to avoid
    steady-state PATCH noise on fast-requeue reconcilers, and a nil obj
    returns a descriptive error instead of panicking in the .(T) assertion.
  • Codified checklist for new call sites added to .claude/rules/operator.md.
  • Pure addition — no existing call-site touched in this PR. Migration of
    the existing r.Status().Update call sites will follow.
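
To make the merge-patch reasoning above concrete, here is a minimal stdlib-only model of the DeepCopy → mutate → patch-from-original idiom. It is a sketch under stated assumptions, not the helper's code: objects are plain maps, createMergePatch and applyMergePatch are hand-written approximations of RFC 7396 diff and apply, and the field names (phase, reportedTools) are invented for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// createMergePatch returns an RFC 7396-style patch turning original into
// modified. Unchanged keys are omitted, removed keys map to nil, and any
// non-object value (arrays included) is emitted whole when it differs.
func createMergePatch(original, modified map[string]any) map[string]any {
	patch := map[string]any{}
	for k, mv := range modified {
		ov, ok := original[k]
		if !ok {
			patch[k] = mv
			continue
		}
		om, oIsMap := ov.(map[string]any)
		mm, mIsMap := mv.(map[string]any)
		if oIsMap && mIsMap {
			if sub := createMergePatch(om, mm); len(sub) > 0 {
				patch[k] = sub
			}
			continue
		}
		if !reflect.DeepEqual(ov, mv) {
			patch[k] = mv
		}
	}
	for k := range original {
		if _, ok := modified[k]; !ok {
			patch[k] = nil
		}
	}
	return patch
}

// applyMergePatch applies patch to target in place, RFC 7396 style.
func applyMergePatch(target, patch map[string]any) {
	for k, pv := range patch {
		if pv == nil {
			delete(target, k)
			continue
		}
		if pm, ok := pv.(map[string]any); ok {
			tm, ok := target[k].(map[string]any)
			if !ok {
				tm = map[string]any{}
				target[k] = tm
			}
			applyMergePatch(tm, pm)
			continue
		}
		target[k] = pv
	}
}

// clone makes a shallow copy. That is enough here because the demo status
// holds only scalars; real objects need a true deep copy.
func clone(m map[string]any) map[string]any {
	out := make(map[string]any, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

// patchFromStale models Get, a concurrent write, then copy → mutate → patch.
func patchFromStale(live map[string]any, concurrent, mutate func(map[string]any)) {
	snapshot := clone(live)     // caller's Get
	concurrent(live)            // another writer lands before our patch
	original := clone(snapshot) // copy before mutating
	mutate(snapshot)
	applyMergePatch(live, createMergePatch(original, snapshot))
}

func main() {
	live := map[string]any{"phase": "Pending"}
	patchFromStale(live,
		func(m map[string]any) { m["reportedTools"] = 3 }, // hypothetical reporter field
		func(m map[string]any) { m["phase"] = "Running" })
	fmt.Println(live) // the patched phase and the reporter's field are both present
}
```

In this model the patch body carries only the keys the caller changed, which is why a second writer's disjoint field survives. Passing a mutate function that assigns a value different from the stale snapshot to a field the concurrent writer also touched overwrites the live value, which is the scalar footgun the Caller contract describes.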

Relates to #4633

Type of change

  • Refactoring (no behavior change)

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)

Tests in cmd/thv-operator/pkg/controllerutil/status_test.go cover:

  • Happy path: mutation applied in-memory AND reflected in the patch body.
  • DeepCopy isolation: regression guard for snapshot aliasing.
  • No-op mutate: helper skips the wire call when the diff is {}.
  • Disjoint-writer preservation: stale snapshot + second writer on a different
    scalar field; fresh Get confirms both the mutated field and the second
    writer's fields survive.
  • Stale snapshot clobbers conditions from another writer: guards the
    documented Caller contract (distinct condition types do not survive a
    stale writer's merge patch, so callers must own the entire array).
  • Stale scalar computation: re-assigning the read value is a wire-level
    no-op (concurrent writer preserved); assigning a differing value
    overwrites live state.
  • Nil obj rejected with a descriptive error, no PATCH issued.
  • Error propagation: apiserver failure returned unchanged for requeue.

Does this introduce a user-facing change?

No.

Generated with Claude Code

@github-actions github-actions Bot added the size/M Medium PR: 300-599 lines changed label Apr 20, 2026
@codecov

codecov Bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 85.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.00%. Comparing base (a543b35) to head (2987cb0).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
cmd/thv-operator/pkg/controllerutil/status.go 85.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4942      +/-   ##
==========================================
- Coverage   69.00%   69.00%   -0.01%     
==========================================
  Files         552      553       +1     
  Lines       72996    73016      +20     
==========================================
+ Hits        50370    50383      +13     
- Misses      19628    19633       +5     
- Partials     2998     3000       +2     


Collaborator

@ChrisJBurns ChrisJBurns left a comment


Multi-Agent Consensus Review

Agents consulted: kubernetes-expert, go-expert-developer, code-reviewer, toolhive-expert

Consensus Summary

| # | Finding | Consensus | Severity | Action |
|---|---------|-----------|----------|--------|
| 1 | Conditions-array doc claim is only true when writers Get before mutating | 8/10 | HIGH | Clarify doc |
| 2 | Comment references non-existent mcpserver_spec_patch_test.go | 10/10 | MEDIUM | Fix / remove |
| 3 | #4767 referenced as if spec-side helper exists | 8/10 | MEDIUM | Clarify doc |
| 4 | Typed-nil obj triggers raw panic | 7/10 | MEDIUM | Document or guard |
| 5 | Stale-snapshot scalar-field clobber: doc gap | 7/10 | MEDIUM | Document contract |
| 6 | No-op mutate still PATCHes | 9/10 | LOW | One-line doc note |
| 7 | Envtest variant preferred over fake-client for merge-patch semantics | 7/10 | LOW | Follow-up test |
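
Finding 4 turns on a Go subtlety: an interface value holding a nil pointer is not equal to nil, so a plain `obj == nil` check lets it through. The sketch below shows one possible guard using reflect. The object interface and server type are hypothetical stand-ins for client.Object and a CR type; how the helper itself handles this case is not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// object is a hypothetical stand-in for client.Object.
type object interface{ GetName() string }

type server struct{ name string }

func (s *server) GetName() string { return s.name }

// isNil reports whether obj is an untyped nil interface or an interface
// wrapping a nil pointer (a "typed nil").
func isNil(obj object) bool {
	if obj == nil {
		return true
	}
	v := reflect.ValueOf(obj)
	return v.Kind() == reflect.Pointer && v.IsNil()
}

// guard returns a descriptive error instead of letting a nil object reach
// code that would dereference it.
func guard(obj object) error {
	if isNil(obj) {
		return errors.New("obj must not be nil")
	}
	return nil
}

func main() {
	var typedNil *server
	fmt.Println(object(typedNil) == nil) // false: the interface still carries a type
	fmt.Println(guard(typedNil))         // obj must not be nil
}
```

The reflect check costs little on a status-write path; documenting the precondition instead, as the Action column suggests, is the other option.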

Overall

The helper itself is correct, small (48 lines), well-tested, and solves a real problem from #4633. The DeepCopyObject().(T) pattern, the decision to skip optimistic locking for status, the test structure (parallel, wire-body assertions, disjoint-writer regression), and the package placement are all sound. Four specialized agents reviewed the change and none disputed the fundamental design.

The findings cluster around documentation precision rather than code behavior. The HIGH finding is the most important: the doc claim that "Conditions-array overlap stays safe via the per-condition idiom" depends on an invariant the helper does not enforce. client.MergeFrom produces RFC 7396 merge patches (confirmed in controller-runtime@v0.23.3/pkg/client/patch.go → jsonpatch.CreateMergePatch), which REPLACE arrays wholesale; the +listType=map marker on Conditions is only honored by strategic-merge-patch, which CRDs don't support. So the claim is true in practice only if every writer Gets the live object immediately before calling the helper. Worth stating that precondition explicitly so future call-site migrations don't take the doc at face value.
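
The wholesale array replacement can be reproduced without a cluster. The following is a stdlib-only model of applying an RFC 7396 merge patch, not controller-runtime code. The JSON documents and the ToolsDiscovered condition type are invented for illustration, and the patch body is what a writer holding a stale snapshot with only the Ready condition would plausibly send.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical live object: two writers each own one condition type.
const liveJSON = `{"status":{"phase":"Running","conditions":[
  {"type":"Ready","status":"False"},
  {"type":"ToolsDiscovered","status":"True"}]}}`

// Patch body from a writer whose stale snapshot held only Ready.
const stalePatchJSON = `{"status":{"conditions":[{"type":"Ready","status":"True"}]}}`

// mergePatch applies an RFC 7396 merge patch to target and returns the result.
func mergePatch(target, patch any) any {
	p, ok := patch.(map[string]any)
	if !ok {
		return patch // scalars and arrays replace the target outright
	}
	t, ok := target.(map[string]any)
	if !ok {
		t = map[string]any{}
	}
	for k, v := range p {
		if v == nil {
			delete(t, k) // JSON null removes the key
			continue
		}
		t[k] = mergePatch(t[k], v)
	}
	return t
}

// conditionTypes patches live and lists the condition types that survive.
func conditionTypes(live, patch string) ([]string, error) {
	var l, p any
	if err := json.Unmarshal([]byte(live), &l); err != nil {
		return nil, err
	}
	if err := json.Unmarshal([]byte(patch), &p); err != nil {
		return nil, err
	}
	merged, err := json.Marshal(mergePatch(l, p))
	if err != nil {
		return nil, err
	}
	var obj struct {
		Status struct {
			Conditions []struct {
				Type string `json:"type"`
			} `json:"conditions"`
		} `json:"status"`
	}
	if err := json.Unmarshal(merged, &obj); err != nil {
		return nil, err
	}
	types := make([]string, 0, len(obj.Status.Conditions))
	for _, c := range obj.Status.Conditions {
		types = append(types, c.Type)
	}
	return types, nil
}

func main() {
	types, err := conditionTypes(liveJSON, stalePatchJSON)
	fmt.Println(types, err) // only Ready survives; ToolsDiscovered is erased
}
```

The scalar phase field is untouched because the patch never mentions it, while the conditions array is taken from the patch as a unit.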

The two cross-references (mcpserver_spec_patch_test.go and #4767 "spec-side rationale") point at code that does not yet exist in the repo. #4767 is an open tracking issue, not a landed pattern, and there is no spec-patch-recording test file anywhere. These are fixable with a one-line edit each: either remove the forward references or rephrase to make clear they describe planned follow-up work.

None of these block merge in my view; a follow-up or amended commit covers them. The migration PRs that land on top of this helper will benefit from tighter documentation here.

Documentation

Per CLAUDE.md, architecture doc sync was considered. This is an internal helper with no user-facing or architecture-doc surface — docs/arch/ does not need updates. The in-package doc.go was correctly updated.


Generated with Claude Code

Comment thread cmd/thv-operator/pkg/controllerutil/status.go Outdated
Comment thread cmd/thv-operator/pkg/controllerutil/status_test.go Outdated
Comment thread cmd/thv-operator/pkg/controllerutil/status.go Outdated
Comment thread cmd/thv-operator/pkg/controllerutil/status.go
Comment thread cmd/thv-operator/pkg/controllerutil/status.go
Comment thread cmd/thv-operator/pkg/controllerutil/status.go
Comment thread cmd/thv-operator/pkg/controllerutil/status_test.go
@jhrozek jhrozek force-pushed the update-to-patch-1b-status-helper branch from 8034149 to f6fab6a Compare April 21, 2026 16:04
@jhrozek jhrozek requested a review from rdimitrov as a code owner April 21, 2026 16:04
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/M Medium PR: 300-599 lines changed size/L Large PR: 600-999 lines changed labels Apr 21, 2026
@jhrozek jhrozek force-pushed the update-to-patch-1b-status-helper branch from f6fab6a to 9ba7b41 Compare April 21, 2026 16:44
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Apr 21, 2026
Relates to #4633. A shared helper that collapses the
"DeepCopy → mutate → r.Status().Patch(MergeFrom(original))" idiom
to a single call so remaining r.Status().Update sites can migrate
without each one re-implementing the DeepCopy-before-mutate
discipline by hand.

Status writes deliberately use a plain merge patch, not an
optimistic-lock one: the operator and the runtime reporter write
disjoint status fields on every reconcile and must coexist without
forcing a 409 on every overlap. Spec and metadata writes still
require optimistic locking — see #4767 (tracking) / #4914
(MCPServer migration).

The helper does not make every multi-writer pattern safe. The
Caller contract in the doc comment spells out two footguns it
cannot defend against:

- JSON merge-patch replaces arrays wholesale for CRDs, so a writer
  to Status.Conditions must be the sole owner of the entire array.
  Any concurrent writer whose Patch lands between this caller's
  Get and Patch — on any condition type, including ones this
  caller does not touch — will be erased. A fresh Get narrows but
  does not eliminate the TOCTOU window.
- A scalar re-computed from a stale snapshot that differs from the
  live value will overwrite a concurrent writer's update.

The codified checklist for new call sites lives in
.claude/rules/operator.md.

Operational safeguards in the helper itself:

- No-op mutations (empty merge-patch body) short-circuit before
  the wire call; the apiserver runs admission and audit for every
  PATCH regardless of body content, so steady-state reconcilers
  must not generate {} traffic.
- A nil obj returns a descriptive error rather than panicking in
  the downstream type assertion.
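
A rough model of these two safeguards, assuming a simplified status type and a fake patcher that only counts calls. The status struct, patcher interface, and mutateAndPatch function are illustrative names and do not reproduce the signature or internals of MutateAndPatchStatus.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// status is a hypothetical status stanza used only for this sketch.
type status struct {
	Phase   string `json:"phase,omitempty"`
	Message string `json:"message,omitempty"`
}

// patcher stands in for a status-subresource client. It is not the
// controller-runtime interface.
type patcher interface {
	PatchStatus(before, after []byte) error
}

// countingPatcher records how many wire calls would have been made.
type countingPatcher struct{ calls int }

func (c *countingPatcher) PatchStatus(_, _ []byte) error {
	c.calls++
	return nil
}

// mutateAndPatch serializes obj before and after mutate and calls the
// patcher only when the serialized form changed.
func mutateAndPatch(p patcher, obj *status, mutate func(*status)) error {
	if obj == nil {
		return errors.New("mutateAndPatch: obj must not be nil")
	}
	before, err := json.Marshal(obj)
	if err != nil {
		return err
	}
	mutate(obj)
	after, err := json.Marshal(obj)
	if err != nil {
		return err
	}
	if bytes.Equal(before, after) {
		return nil // empty diff: skip the wire call
	}
	return p.PatchStatus(before, after)
}

func main() {
	p := &countingPatcher{}
	s := &status{Phase: "Running"}
	_ = mutateAndPatch(p, s, func(s *status) { s.Phase = "Running" }) // no-op
	_ = mutateAndPatch(p, s, func(s *status) { s.Phase = "Failed" })  // real change
	fmt.Println(p.calls) // 1
}
```

Comparing serialized bytes is one simple way to detect an empty diff in a model like this; the thread only states that the helper skips the call when the merge-patch body is empty.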

The helper lives in cmd/thv-operator/pkg/controllerutil alongside
the existing controller helpers. It may move to a shared location
later if a non-operator caller needs it.

Pure addition — no call-site changes in this PR.

Tests (cmd/thv-operator/pkg/controllerutil/status_test.go) cover:

- Happy path and DeepCopy isolation.
- No-op mutate skips the wire call.
- Disjoint-writer preservation: with a stale snapshot, a second
  writer owning disjoint scalar fields survives the patch.
- Stale snapshot clobbers conditions from another writer: pins the
  documented Caller contract so that a future change to this
  behaviour fails the test.
- Stale scalar computation: re-assigning the read value is a no-op
  at the wire level (concurrent writer preserved); assigning a
  differing value overwrites live state.
- Nil obj is rejected with a descriptive error, no PATCH issued.
- Error propagation: apiserver failures from Status().Patch are
  returned unchanged for the controller's requeue decision.
@jhrozek jhrozek force-pushed the update-to-patch-1b-status-helper branch from 9ba7b41 to 02cdcbc Compare April 21, 2026 21:11
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Apr 21, 2026
@ChrisJBurns ChrisJBurns merged commit 2d809bc into main Apr 21, 2026
43 checks passed
@ChrisJBurns ChrisJBurns deleted the update-to-patch-1b-status-helper branch April 21, 2026 22:23

Labels

size/L Large PR: 600-999 lines changed


2 participants