@therealnb
Contributor

Summary

This PR fixes a race condition in the discovery manager that causes duplicate capability aggregations when multiple concurrent requests arrive simultaneously at startup.

Problem

Without this fix, when multiple requests arrive concurrently:

  1. Each request checks the cache (all miss)
  2. Each request triggers expensive capability aggregation
  3. This results in duplicate work and confusing, recursive-looking duplicate "Starting capability aggregation" log lines

This is a distinct issue from the race condition fixed in #3450.

Solution

Uses singleflight.Group to deduplicate concurrent capability aggregation requests:

  • Multiple concurrent requests for the same cache key wait for a single aggregation
  • The first request performs the work, subsequent requests reuse the result
  • Includes a double-check of the cache inside the singleflight call, in case an earlier call populated it in the meantime
  • Prevents wasted compute and confusing duplicate logs

Changes

  • Added singleflight.Group field to DefaultManager
  • Modified Discover() method to wrap aggregation in singleFlight.Do()
  • Added double-check cache pattern inside the singleflight call

Testing

Existing tests pass. The race condition is most visible under high concurrency at startup when multiple clients connect simultaneously.

Use singleflight to ensure only one capability aggregation happens per cache key, even when multiple concurrent requests arrive. This prevents the recursive-looking duplicate 'Starting capability aggregation' logs.
@github-actions github-actions bot added the size/XS Extra small PR: < 100 lines changed label Jan 27, 2026
@therealnb therealnb marked this pull request as draft January 27, 2026 17:00
@codecov
codecov bot commented Jan 27, 2026

Codecov Report

❌ Patch coverage is 72.72727% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.93%. Comparing base (2acfcfc) to head (8d92893).

Files with missing lines         Patch %   Lines
pkg/vmcp/discovery/manager.go    72.72%    2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3471      +/-   ##
==========================================
- Coverage   64.95%   64.93%   -0.02%     
==========================================
  Files         396      396              
  Lines       38492    38499       +7     
==========================================
- Hits        25001    25000       -1     
- Misses      11542    11550       +8     
  Partials     1949     1949              

