Skip to content

Extract backendCheck type for per-backend goroutine lifecycle#4532

Merged
ChrisJBurns merged 3 commits intomainfrom
cburns/refactor-backend-check-lifecycle
Apr 3, 2026
Merged

Extract backendCheck type for per-backend goroutine lifecycle#4532
ChrisJBurns merged 3 commits intomainfrom
cburns/refactor-backend-check-lifecycle

Conversation

@ChrisJBurns
Copy link
Copy Markdown
Collaborator

Summary

Addresses review feedback from #4524: the activeChecks map held raw context.CancelFunc values, and goroutine lifecycle (cancel, create context, add to waitgroup, launch goroutine) was duplicated across Start() and UpdateBackends().

This introduces a backendCheck struct that owns both the backend snapshot and the cancel function for a single health check goroutine, with start() and stop() methods. activeChecks now holds *backendCheck instead of bare cancel funcs, which:

  • Consolidates goroutine lifecycle into one place
  • Eliminates the separate oldBackends map in UpdateBackends since activeChecks now carries backend data directly
  • Gives a natural place to hang future per-backend state (e.g., circuit breaker)

Type of change

  • Refactoring (no functional changes)

Changes

File Change
pkg/vmcp/health/monitor.go Add backendCheck struct with start()/stop() methods; update Monitor.activeChecks type from map[string]context.CancelFunc to map[string]*backendCheck; simplify Start() and UpdateBackends()

Test plan

  • All existing health monitor tests pass (no behavioral change)
  • Lint passes with zero issues
  • Full build succeeds

Generated with Claude Code

Introduce a backendCheck struct that owns the backend snapshot and cancel
function for a single health check goroutine. This replaces the raw
context.CancelFunc values in the activeChecks map and consolidates the
scattered goroutine start/stop mechanics into start() and stop() methods.

The change decouples per-backend lifecycle management from the Monitor's
coordination logic in Start() and UpdateBackends(), making the
restart-on-property-change logic clearer. It also eliminates the need for
a separate oldBackends map in UpdateBackends since activeChecks now
carries the backend data directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the size/XS Extra small PR: < 100 lines changed label Apr 3, 2026
@ChrisJBurns
Copy link
Copy Markdown
Collaborator Author

@claude Please review this with the agents. only raise critical/high issues and medium issues that are non controversial. ignore low/informational style / nits. recommend approval or rejection.

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 3, 2026

Claude finished @ChrisJBurns's task in 2m 50s —— View job


PR Review — Extract backendCheck for goroutine lifecycle

  • Gather context
  • Read changed file
  • Run code review analysis
  • Post findings and recommendation

Findings

No critical issues found. Two medium issues worth noting:

Medium 1: backendCheck methods require external locking but don't document it

backendCheck.start() must be called while holding both m.mu AND m.backendsMu (to prevent wg.Add() racing with wg.Wait() in Stop()). backendCheck.stop() must be called while holding backendsMu. Neither the struct nor its methods document these invariants — the nil-check in stop() implies it's self-contained/concurrent-safe, which it isn't.

This is contained today because backendCheck is unexported and only called from Monitor methods that hold the correct locks. But the encapsulation is misleading for future maintainers.

Fix this →

Medium 2: Brief goroutine overlap when a backend changes (pre-existing)

When UpdateBackends() detects a changed backend (URL updated), existing.stop() signals the old goroutine's context, but it may still be mid-performHealthCheck when the new goroutine starts. Both goroutines briefly write health results for the same backend ID to the status tracker. The status tracker is mutex-protected so there's no data race, but stale results from the old goroutine can briefly overwrite new ones. This behavior is identical to the pre-refactor code — not a regression.


Summary

This is a clean, well-motivated refactoring. The backendCheck struct successfully consolidates goroutine lifecycle, eliminates the redundant oldBackends map, and gives a natural home for future per-backend state. The locking protocol is preserved correctly. The behavior is semantically equivalent to the code before the refactor.

The two medium issues above are about documentation and pre-existing behavior, not correctness bugs. Both are non-blocking.

Recommendation: APPROVE

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.82%. Comparing base (bd4a04f) to head (c514a92).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4532      +/-   ##
==========================================
- Coverage   68.85%   68.82%   -0.03%     
==========================================
  Files         505      505              
  Lines       52318    52316       -2     
==========================================
- Hits        36022    36009      -13     
- Misses      13506    13515       +9     
- Partials     2790     2792       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Add thread-safety comments to backendCheck struct, start(), and stop()
documenting that callers must hold the Monitor's locks. start() requires
both m.mu and m.backendsMu to prevent wg.Add() racing with wg.Wait() in
Stop(). stop() requires m.backendsMu.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Apr 3, 2026
@ChrisJBurns
Copy link
Copy Markdown
Collaborator Author

@claude Please re-review this with the agents. only raise critical/high issues and medium issues that are non controversial. ignore low/informational style / nits. recommend approval or rejection.

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 3, 2026

Claude finished @ChrisJBurns's task in 2m 19s —— View job


PR Review — Extract backendCheck for goroutine lifecycle

  • Gather context
  • Read changed file
  • Run code review analysis
  • Post findings and recommendation

Findings

No critical or high issues found. Two medium issues identified:

Medium 1: Redundant backendsMu.RLock in Stop() is misleading

Location: Stop(), lines 312–314

m.backendsMu.RLock()
backendCount := len(m.backends)
m.backendsMu.RUnlock()

Stop() holds m.mu at this point. UpdateBackends() also acquires m.mu first before touching m.backends, so UpdateBackends cannot interleave here. The backendsMu.RLock is redundant — m.mu already provides the needed exclusion — and creates a false impression of what it is protecting. The logged backendCount will always be consistent without it.

Fix this →

Medium 2: nolint:gosec // G118 references a nonexistent gosec rule

Location: backendCheck.start(), line 75

ctx, cancel := context.WithCancel(parentCtx) //nolint:gosec // G118 - cancel stored in bc.cancel, called via stop()

gosec rule G118 does not exist in the gosec ruleset. The suppression annotation may not be suppressing anything meaningful, and the incorrect rule ID could confuse future readers or linter tooling. This is pre-existing but worth correcting.

Fix this →


Summary

This is a clean, correct refactoring. The backendCheck struct properly encapsulates goroutine lifecycle, the locking protocol is accurately documented on start()/stop(), and no new data races or goroutine leaks are introduced. The two medium issues above are both pre-existing or annotation problems — neither is a correctness regression from this change.

Recommendation: APPROVE

The directive referenced G118 which does not exist in the gosec ruleset.
The cancel func is stored in bc.cancel on the next line so no linter
complains without the suppression.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Apr 3, 2026
@ChrisJBurns ChrisJBurns merged commit fcc244f into main Apr 3, 2026
40 checks passed
@ChrisJBurns ChrisJBurns deleted the cburns/refactor-backend-check-lifecycle branch April 3, 2026 20:32
func (bc *backendCheck) start(parentCtx context.Context, m *Monitor, isInitial bool) {
ctx, cancel := context.WithCancel(parentCtx)
bc.cancel = cancel
m.wg.Add(1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit weird to reach back into the parent's state here. Fine for now, since we can gradually separate these two.

MatteoManzoni pushed a commit to DocPlanner/toolhive that referenced this pull request Apr 4, 2026
…ok#4532)

* Extract backendCheck type for per-backend goroutine lifecycle

Introduce a backendCheck struct that owns the backend snapshot and cancel
function for a single health check goroutine. This replaces the raw
context.CancelFunc values in the activeChecks map and consolidates the
scattered goroutine start/stop mechanics into start() and stop() methods.

The change decouples per-backend lifecycle management from the Monitor's
coordination logic in Start() and UpdateBackends(), making the
restart-on-property-change logic clearer. It also eliminates the need for
a separate oldBackends map in UpdateBackends since activeChecks now
carries the backend data directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Document locking requirements on backendCheck methods

Add thread-safety comments to backendCheck struct, start(), and stop()
documenting that callers must hold the Monitor's locks. start() requires
both m.mu and m.backendsMu to prevent wg.Add() racing with wg.Wait() in
Stop(). stop() requires m.backendsMu.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove bogus nolint:gosec directive on context.WithCancel

The directive referenced G118 which does not exist in the gosec ruleset.
The cancel func is stored in bc.cancel on the next line so no linter
complains without the suppression.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/S Small PR: 100-299 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants