Skip to content

Persist config ref condition on recovery from transient errors#4533

Merged
ChrisJBurns merged 2 commits intomainfrom
fix/status-condition-persistence-4511
Apr 3, 2026
Merged

Persist config ref condition on recovery from transient errors#4533
ChrisJBurns merged 2 commits intomainfrom
fix/status-condition-persistence-4511

Conversation

@ChrisJBurns
Copy link
Copy Markdown
Collaborator

Summary

When a referenced MCPOIDCConfig or MCPTelemetryConfig became temporarily unavailable and then returned with the same content hash, the OIDCConfigRefValidated / TelemetryConfigRefValidated condition was set to True in memory but never persisted — the Status().Update() call only ran inside the "hash changed" branch. This left a stale False condition visible in kubectl until the next hash-changing event.

The fix tracks whether the condition is transitioning from non-True to True and includes that in the "needs status update" decision. The fetch-and-validate logic is extracted into helper methods to keep cyclomatic complexity within the linter threshold.

Fixes #4511

Type of change

  • Bug fix

Changes

File Change
mcpserver_controller.go Extract fetchAndValidateOIDCConfig helper; persist status when condition transitions to True
mcpremoteproxy_controller.go Extract fetchAndValidateOIDCConfig helper; same condition persistence fix
mcpserver_telemetryconfig.go Same condition persistence fix for TelemetryConfigRefValidated
mcpserver_oidcconfig_test.go Add regression test: condition persisted when hash unchanged but condition was previously False

Test plan

  • Unit tests (task test) — all pass, including new regression test
  • Linting (task lint) — 0 issues

Special notes for reviewers

  • VirtualMCPServer is not affected by this bug because it uses a batched StatusManager that persists conditions independently of the hash-change branch.
  • The fetchAndValidateOIDCConfig extraction is a no-op refactor that moves existing error-handling logic to reduce cyclomatic complexity in the main handleOIDCConfig methods.

Generated with Claude Code

When a referenced MCPOIDCConfig or MCPTelemetryConfig became temporarily
unavailable and then returned with the same content hash, the
OIDCConfigRefValidated/TelemetryConfigRefValidated condition was set to
True in memory but never persisted because the status update only ran
inside the "hash changed" branch.

Track whether the condition is transitioning from non-True to True and
include that in the "needs update" decision. Extract fetch-and-validate
logic into helper methods to keep cyclomatic complexity within limits.

Fixes #4511

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the size/S Small PR: 100-299 lines changed label Apr 3, 2026
@ChrisJBurns
Copy link
Copy Markdown
Collaborator Author

@claude Please review this with the agents. only raise critical/high issues and medium issues that are non controversial. ignore low/informational style / nits. recommend approval or rejection.

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 3, 2026

Claude finished @ChrisJBurns's task in 2m 4s —— View job


Code Review

  • Gather context and read changed files
  • Review mcpserver_controller.go changes
  • Review mcpremoteproxy_controller.go changes
  • Review mcpserver_telemetryconfig.go changes
  • Review mcpserver_oidcconfig_test.go changes
  • Provide final recommendation

Summary

The fix is correct and the approach is sound. One medium issue found with the regression test; no critical or high issues.


Medium Issue: Regression Test Doesn't Actually Verify Persistence

File: cmd/thv-operator/controllers/mcpserver_oidcconfig_test.go, lines 224–235

The new test case "condition persisted when recovering from transient error with same hash" validates the wrong object. It checks:

for _, cond := range tt.mcpServer.Status.Conditions {
    if cond.Type == mcpv1alpha1.ConditionOIDCConfigRefValidated {
        assert.Equal(t, string(*tt.expectConditionStatus), string(cond.Status))

tt.mcpServer is the in-memory Go struct passed directly to handleOIDCConfig. meta.SetStatusCondition mutates this struct in memory unconditionally — before Status().Update() is ever called. This means the test would pass even with the bug still present, because tt.mcpServer.Status.Conditions shows True regardless of whether the Kubernetes API was updated.

To actually prove the condition is persisted, the test must re-read from the fake client after the call:

var persisted mcpv1alpha1.MCPServer
require.NoError(t, fakeClient.Get(ctx, client.ObjectKeyFromObject(tt.mcpServer), &persisted))
// then check persisted.Status.Conditions

Without this, the regression test does not guard against a future revert of the needsUpdate logic.

Fix this →


What's Right

  • Core logic is correct: checking prevCondition before calling meta.SetStatusCondition ensures the needsUpdate flag is set before the condition changes to True, avoiding a race in its own evaluation.
  • No double-write risk introduced: the error paths in fetchAndValidateOIDCConfig call Status().Update() then return the error; this pre-existing pattern is unchanged.
  • Consistent across all three files: the same needsUpdate pattern is applied identically to mcpserver_controller.go, mcpremoteproxy_controller.go, and mcpserver_telemetryconfig.go.
  • Cyclomatic complexity reduction: the fetchAndValidateOIDCConfig extraction is clean and semantics-preserving.

Recommendation: Approve with fix requested

The logic fix is correct and the refactoring is safe. The medium issue (test checks in-memory state instead of persisted state) should be addressed before merging to ensure the regression test actually catches future regressions. No blocking correctness issues.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 3, 2026

Codecov Report

❌ Patch coverage is 35.38462% with 42 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.83%. Comparing base (d7e37b8) to head (de19062).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
...-operator/controllers/mcpremoteproxy_controller.go 0.00% 32 Missing ⚠️
...d/thv-operator/controllers/mcpserver_controller.go 82.14% 3 Missing and 2 partials ⚠️
...-operator/controllers/mcpserver_telemetryconfig.go 0.00% 5 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4533   +/-   ##
=======================================
  Coverage   68.83%   68.83%           
=======================================
  Files         505      505           
  Lines       52322    52342   +20     
=======================================
+ Hits        36015    36030   +15     
- Misses      13514    13521    +7     
+ Partials     2793     2791    -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

JAORMX
JAORMX previously approved these changes Apr 3, 2026
The previous regression test checked the in-memory Go struct after
handleOIDCConfig, which would pass even without the fix since
meta.SetStatusCondition mutates in memory before Status().Update() is
called. Replace with a standalone test that re-reads the MCPServer from
the fake client to verify the condition was actually persisted to the
API server.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Apr 3, 2026
@ChrisJBurns
Copy link
Copy Markdown
Collaborator Author

@claude Please re-review to check if the original review comments has been addressed

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 3, 2026

Claude finished @ChrisJBurns's task in 42s —— View job


Re-review

  • Read original review feedback
  • Check if medium issue (test verifies in-memory state vs persisted state) was addressed
  • Provide final re-review verdict

Medium Issue: Resolved

The original concern was that the regression test checked tt.mcpServer.Status.Conditions (in-memory) rather than re-reading from the fake client, meaning the test would pass even with the bug present.

The fix: A new dedicated test function TestMCPServerReconciler_handleOIDCConfig_ConditionPersistedOnRecovery (lines 260–319) was added that correctly:

  1. Pre-populates the fake client with an MCPServer whose hash matches but whose condition is a stale False
  2. Calls handleOIDCConfig
  3. Re-reads from the fake client via fakeClient.Get(ctx, client.ObjectKeyFromObject(mcpServer), &persisted) (line 312)
  4. Asserts persisted.Status.Conditions — the API-server-persisted state, not the in-memory struct

This is exactly what the review requested. The test will now fail if the needsUpdate logic is reverted.


Recommendation: Approve

The core logic fix is correct, the regression test now properly verifies persistence against the fake client, and linting/unit tests pass. The secondary suggestion (adding analogous persistence tests for mcpremoteproxy_controller.go and mcpserver_telemetryconfig.go) remains a nice-to-have but is not blocking.

@ChrisJBurns ChrisJBurns merged commit 69ea2de into main Apr 3, 2026
40 checks passed
@ChrisJBurns ChrisJBurns deleted the fix/status-condition-persistence-4511 branch April 3, 2026 20:57
MatteoManzoni pushed a commit to DocPlanner/toolhive that referenced this pull request Apr 4, 2026
…lok#4533)

* Persist config ref condition on recovery from transient errors

When a referenced MCPOIDCConfig or MCPTelemetryConfig became temporarily
unavailable and then returned with the same content hash, the
OIDCConfigRefValidated/TelemetryConfigRefValidated condition was set to
True in memory but never persisted because the status update only ran
inside the "hash changed" branch.

Track whether the condition is transitioning from non-True to True and
include that in the "needs update" decision. Extract fetch-and-validate
logic into helper methods to keep cyclomatic complexity within limits.

Fixes stacklok#4511

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Verify condition persistence via fake client in regression test

The previous regression test checked the in-memory Go struct after
handleOIDCConfig, which would pass even without the fix since
meta.SetStatusCondition mutates in memory before Status().Update() is
called. Replace with a standalone test that re-reads the MCPServer from
the fake client to verify the condition was actually persisted to the
API server.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/S Small PR: 100-299 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Status condition not persisted when OIDCConfigHash unchanged after transient error

2 participants