Prevent response cache mutex from serializing concurrent HTTP action requests #21880

Merged
george-dorin merged 5 commits into develop from CRE-3247-http-action-cache-mutex
Apr 16, 2026

@george-dorin george-dorin commented Apr 7, 2026

  • Fix responseCache.Fetch() holding a global mutex during the entire HTTP round-trip, which serialized all concurrent HTTP action requests
    • Split the mutex into short-lived locks around cache map access only
    • Add singleflight to deduplicate concurrent requests to the same cache key without blocking unrelated requests
  • Change cacheMu from sync.Mutex to sync.RWMutex — cache lookups use RLock, only writes take the exclusive lock
  • Simplify unlock pattern: unlock immediately after map access instead of branching across code paths
  • Remove unused workflowID parameter from isExpiredOrNotCached, Fetch, Set, and the ResponseCache interface; it was never used in cache key generation despite the comment claiming otherwise
  • Remove dead extractWorkflowIDFromRequestPath function (only caller was removed)
  • Fix misleading struct comment that claimed cache keys were prefixed by workflowID
  • Add tests verifying panic propagation through singleflight to all concurrent waiters
  • Use synctest for deterministic, non-flaky concurrency tests

Problem: responseCache.Fetch() acquires cacheMu.Lock() and holds it until the HTTP request completes. Every concurrent goroutine blocks on the mutex, so requests execute sequentially even when Store: false (caching disabled), as long as MaxAgeMs > 0.

Fix: The mutex now only protects cache map reads and writes (microseconds). The HTTP request runs outside the lock. singleflight.Group ensures concurrent requests to the same cache key are deduplicated (one fetch, shared result), preserving the original cache semantics.

github-actions Bot commented Apr 7, 2026

👋 george-dorin, thanks for creating this pull request!

To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team.

Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!

github-actions Bot commented Apr 7, 2026

I see you updated files related to core. Please run make gocs in the root directory to add a changeset, and include at least one of the following tags in its text:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

github-actions Bot commented Apr 7, 2026

✅ No conflicts with other open PRs targeting develop

Comment thread core/services/gateway/handlers/capabilities/v2/response_cache.go Outdated
Comment thread core/services/gateway/handlers/capabilities/v2/response_cache.go Outdated
Comment thread core/services/gateway/handlers/capabilities/v2/response_cache_test.go Outdated
pavel-raykov
pavel-raykov previously approved these changes Apr 8, 2026
@mchain0 mchain0 requested review from bolekk and nolag April 9, 2026 12:42
Comment thread core/services/gateway/handlers/capabilities/v2/response_cache_test.go Outdated
@mchain0 mchain0 requested a review from jinhoonbang April 9, 2026 14:08
// requests to the same cache key — only one fetchFn runs, others
// wait for its result. Requests to different keys run in parallel.
result, _, _ := rc.flight.Do(cacheKey, func() (interface{}, error) {
	return fetchFn(), nil

Contributor

Should we move the cacheKey read and write inside flight.Do()? One requirement for HTTP actions is best-effort, single-attempt delivery. As it stands, an HTTP action with caching enabled can fail to deduplicate outgoing HTTP requests.

Contributor Author

Fixed in 6e8de5b by moving the cache read and write inside flight.Do(). The singleflight key now stays active until the result is cached, closing the window. Added an inner cache re-check (double-checked locking) so back-to-back flights for the same key find the previous flight's cached result instead of fetching again.

Comment thread core/services/gateway/handlers/capabilities/v2/response_cache_test.go Outdated
@george-dorin george-dorin added this pull request to the merge queue Apr 16, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 16, 2026
@george-dorin george-dorin added this pull request to the merge queue Apr 16, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 16, 2026
@george-dorin george-dorin added this pull request to the merge queue Apr 16, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 16, 2026
@george-dorin george-dorin added this pull request to the merge queue Apr 16, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 16, 2026
@george-dorin george-dorin added this pull request to the merge queue Apr 16, 2026
Merged via the queue into develop with commit 6b6d76b Apr 16, 2026
258 of 262 checks passed
@george-dorin george-dorin deleted the CRE-3247-http-action-cache-mutex branch April 16, 2026 13:52
4 participants